MaineCoon Senior Newbie
Joined: 24 Apr 2008 Posts: 6
|
Posted: Thu Apr 24, 2008 2:40 pm Post subject: |
|
|
| edavis wrote: | I am anxiously awaiting this feature!
Currently I use Amazon S3 for backup storage and transfer data to/from it using s3sync. It's quirky, and one thing I really miss is the ability to store incremental linked backups à la rsync.
|
I initially looked at s3sync but didn't like the lack of versioning or incremental backups.
My choice has been duplicity (the latest version). It has worked great for backups. I've configured mine to maintain 6 months of backups, making a complete backup on the first of the month, and using incremental backups in the meantime. It bundles, compresses and encrypts using GPG.
Start with this guide, then use the script below:
http://www.randys.org/2007/11/16/how-to-automated-backups-to-amazon-s-s3-with-duplicity
And make sure to store a copy of your GPG key in a safe place off the server.
You'll need to change YOUR_ACCESS_KEY, YOUR_SECRET_KEY, YOUR_GPG_PASSPHRASE, YOUR_GPG_KEY, and YOUR_BUCKET_NAME.
A note about includes/excludes: if you want to exclude something inside a directory, the --exclude for the file/subdirectory has to come before the --include for the directory, because includes/excludes work on a 'first match' basis.
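To make the 'first match' rule concrete, here is a toy model in plain bash. It is not duplicity's own matching code: the `decide` function and its `+prefix` / `-prefix` rule notation are made up for this illustration, and `-/` stands in for the final catch-all exclude.

```shell
#!/bin/bash
# Toy model of "first match wins" file selection. NOT duplicity's code,
# only an illustration of why the order of the rules matters.
# Rules are "+<prefix>" (include) or "-<prefix>" (exclude).
decide () {
    local path=$1; shift
    local rule action prefix
    for rule in "$@"; do
        action=${rule:0:1}
        prefix=${rule:1}
        # A rule matches the prefix itself and anything beneath it
        if [ "$path" = "$prefix" ] || [ "${path#"${prefix%/}"/}" != "$path" ]; then
            if [ "$action" = "+" ]; then echo include; else echo exclude; fi
            return
        fi
    done
    echo include   # assumed default when no rule matches
}

decide /var/tmp/junk  -/var/tmp +/var -/   # prints "exclude"
decide /var/tmp/junk  +/var -/var/tmp -/   # prints "include"
```

In the second call the include for /var is seen first, so the later exclude for /var/tmp never gets a chance to apply.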
| Code: |
#!/bin/bash
# Append a timestamped message to the backup log
trace () {
    stamp=$(date +%Y-%m-%d_%H:%M:%S)
    echo "$stamp: $*" >> /var/log/backup.log
}
# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
export PASSPHRASE=YOUR_GPG_PASSPHRASE
GPG_KEY=YOUR_GPG_KEY
OLDER_THAN="6M"
# The source of your backup
SOURCE=/
# The destination
# Note that the bucket need not exist
# but does need to be unique amongst all
# Amazon S3 users. So, choose wisely.
DEST="s3+http://YOUR_BUCKET_NAME"
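# Force a full backup on the first day of the month; otherwise
# FULL stays empty and duplicity makes an incremental backup
FULL=
if [ "$(date +%d)" -eq 1 ]; then
    FULL=full
fi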
trace "Backup for local filesystem started"
trace "... removing old backups"
duplicity remove-older-than ${OLDER_THAN} ${DEST} >> /var/log/backup.log 2>&1
trace "... backing up filesystem"
duplicity \
    ${FULL} \
    --encrypt-key="${GPG_KEY}" \
    --sign-key="${GPG_KEY}" \
    --include=/boot \
    --include=/etc \
    --include=/home \
    --include=/lib \
    --exclude=/root/.jungledisk/cache \
    --exclude=/root/.cpan \
    --include=/root \
    --include=/usr \
    --exclude=/var/tmp \
    --include=/var \
    --exclude='/**' \
    "${SOURCE}" "${DEST}" >> /var/log/backup.log 2>&1
trace "Backup for local filesystem complete"
trace "------------------------------------"
# Clear the secrets from the environment. Don't need them sitting around
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY PASSPHRASE
|
You can restore a file with this script:
| Code: |
#!/bin/bash
# Export some ENV variables so you don't have to type anything
export AWS_ACCESS_KEY_ID="YOUR_ACCESS_KEY"
export AWS_SECRET_ACCESS_KEY="YOUR_SECRET_KEY"
export PASSPHRASE=YOUR_GPG_PASSPHRASE
GPG_KEY=YOUR_GPG_KEY
# The destination
# Note that the bucket need not exist
# but does need to be unique amongst all
# Amazon S3 users. So, choose wisely.
DEST="s3+http://YOUR_BUCKET_NAME"
if [ $# -lt 3 ]; then echo "Usage: $0 <time> <file> <restore-to>" >&2; exit 1; fi
duplicity \
    --encrypt-key="${GPG_KEY}" \
    --sign-key="${GPG_KEY}" \
    --file-to-restore "$2" \
    --restore-time "$1" \
    "${DEST}" "$3"
# Clear the secrets from the environment. Don't need them sitting around
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY PASSPHRASE
|
Note that paths are relative, not absolute. /etc/apache2 would be backed up as etc/apache2. You can restore whole directories but the destination needs to exist... for example, to restore /etc/apache2 from April 23rd to a local directory 'restore', doing the following would fail because ./etc does not exist:
# cd ~
# mkdir restore
# cd restore
# duplicity-restore.sh "2008-04-23" etc/apache2 etc/apache2
However, doing:
# duplicity-restore.sh "2008-04-23" etc/apache2 apache2
would restore the directory to ./apache2 |
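The failing case above can be avoided by creating the parent of the restore target first. A small sketch, assuming the restore script is saved as duplicity-restore.sh as in the example; the `ensure_parent` helper is made up here and does not call duplicity itself.

```shell
#!/bin/bash
# Create the parent directory of a restore target, but not the target
# itself, so that a nested destination such as etc/apache2 can be used.
# 'ensure_parent' is a hypothetical helper for illustration.
ensure_parent () {
    local target=$1
    mkdir -p -- "$(dirname -- "$target")"
}
```

With that in place, something like `ensure_parent etc/apache2 && duplicity-restore.sh "2008-04-23" etc/apache2 etc/apache2` should get past the missing ./etc directory.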
|
chrisnolan Senior Newbie
Joined: 02 Jun 2006 Posts: 17
|
Posted: Mon May 05, 2008 8:15 am Post subject: |
|
|
MaineCoon, many thanks for this. This is the S3 backup solution that I've been searching for.
Works very very nicely here on CentOS 5 - the only slightly tricky bit is getting all the dependencies installed for duplicity as the CentOS duplicity RPM is way out of date. |
|
memenode Senior Newbie
Joined: 20 Nov 2008 Posts: 17
|
Posted: Thu Nov 27, 2008 5:15 pm Post subject: |
|
|
I followed the above guide, but running the backup script results in this:
| Code: | 2008-11-27_17:11:24: Backup for local filesystem started
2008-11-27_17:11:24: ... removing old backups
No old backup sets found, nothing deleted.
2008-11-27_17:11:30: ... backing up filesystem
No signatures found, switching to full backup.
Traceback (most recent call last):
File "/usr/bin/duplicity", line 463, in <module>
with_tempdir(main)
File "/usr/bin/duplicity", line 458, in with_tempdir
fn()
File "/usr/bin/duplicity", line 449, in main
full_backup(col_stats)
File "/usr/bin/duplicity", line 155, in full_backup
bytes_written = write_multivol("full", tarblock_iter, globals.backend)
File "/usr/bin/duplicity", line 87, in write_multivol
globals.gpg_profile,globals.volsize)
File "/usr/lib/python2.5/site-packages/duplicity/gpg.py", line 225, in GPGWriteFile
file.close()
File "/usr/lib/python2.5/site-packages/duplicity/gpg.py", line 132, in close
self.gpg_process.wait()
File "/var/lib/python-support/python2.5/GnuPGInterface.py", line 639, in wait
raise IOError, "GnuPG exited non-zero, with code %d" % (e << 8)
IOError: GnuPG exited non-zero, with code 131072
close failed: [Errno 32] Broken pipe
2008-11-27_17:11:31: Backup for local filesystem complete
2008-11-27_17:11:31: ------------------------------------
|
My backitup script is:
| Code: | #!/bin/bash
# Export some ENV variables so you don't have to type anything
trace () {
stamp=`date +%Y-%m-%d_%H:%M:%S`
echo "$stamp: $*" >> /var/log/backup.log
}
export AWS_ACCESS_KEY_ID="xxxxxxxxxxxxxxxxx..."
export AWS_SECRET_ACCESS_KEY="xxxxxxxxxxxxxxxxxxxx..."
export PASSPHRASE=$(cat pwtextfile)
GPG_KEY=XXXXXXXX
OLDER_THAN="6M"
# The source of your backup
SOURCE=/
# The destination
# Note that the bucket need not exist
# but does need to be unique amongst all
# Amazon S3 users. So, choose wisely.
DEST="s3+http://mybucketname.s3.amazonaws.com"
FULL=
if [ $(date +%d) -eq 1 ]; then
FULL=full
fi;
trace "Backup for local filesystem started"
trace "... removing old backups"
duplicity remove-older-than ${OLDER_THAN} ${DEST} >> /var/log/backup.log 2>&1
trace "... backing up filesystem"
duplicity \
${FULL} \
--encrypt-key=${GPG_KEY} \
--sign-key=${GPG_KEY} \
--include=/boot \
--exclude=/** \
${SOURCE} ${DEST} >> /var/log/backup.log 2>&1
trace "Backup for local filesystem complete"
trace "------------------------------------"
# Reset the ENV variables. Don't need them sitting around
export AWS_ACCESS_KEY_ID=
export AWS_SECRET_ACCESS_KEY=
export PASSPHRASE=
|
I wanted to include only /boot to start with, so as not to waste bandwidth on testing. The /boot directory seemed to be empty, but I created a test text file there and it still didn't work, so that doesn't seem to be the issue.
I used the defaults when generating my key.
EDIT: Hm, the output actually changed when I ran it with the test file created in /boot. It's this now:
| Code: | 2008-11-27_17:14:55: Backup for local filesystem started
2008-11-27_17:14:55: ... removing old backups
No old backup sets found, nothing deleted.
2008-11-27_17:14:55: ... backing up filesystem
No signatures found, switching to full backup.
Traceback (most recent call last):
File "/usr/bin/duplicity", line 463, in <module>
with_tempdir(main)
File "/usr/bin/duplicity", line 458, in with_tempdir
fn()
File "/usr/bin/duplicity", line 449, in main
full_backup(col_stats)
File "/usr/bin/duplicity", line 155, in full_backup
bytes_written = write_multivol("full", tarblock_iter, globals.backend)
File "/usr/bin/duplicity", line 87, in write_multivol
globals.gpg_profile,globals.volsize)
File "/usr/lib/python2.5/site-packages/duplicity/gpg.py", line 213, in GPGWriteFile
data = block_iter.next(bytes_to_go).data
File "/usr/lib/python2.5/site-packages/duplicity/diffdir.py", line 407, in next
result = self.process(self.input_iter.next(), size)
File "/usr/lib/python2.5/site-packages/duplicity/diffdir.py", line 284, in get_delta_iter_w_sig
sigTarFile.close()
File "/usr/lib/python2.5/site-packages/duplicity/tarfile.py", line 508, in close
self.fileobj.write("\0" * (RECORDSIZE - remainder))
File "/usr/lib/python2.5/site-packages/duplicity/dup_temp.py", line 101, in write
return self.fileobj.write(buf)
File "/usr/lib/python2.5/site-packages/duplicity/gpg.py", line 125, in write
return self.gpg_input.write(buf)
IOError: [Errno 32] Broken pipe
close failed: [Errno 32] Broken pipe
2008-11-27_17:14:55: Backup for local filesystem complete
2008-11-27_17:14:55: ------------------------------------
|
I'm quite far off from understanding these sorts of errors.
Any ideas?
Thanks |
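One possible reading of the first traceback: 131072 is 2 << 16, which is consistent with gpg itself exiting with status 2 before anything is uploaded. cboshuizen's later report in this thread also points at keys and files being created by a different user than the one running the script. A minimal pre-flight sketch along those lines follows. The `preflight` function and the example path are assumptions for illustration; `gpg --list-secret-keys` and `id -un` are standard commands. Note too that `$(cat pwtextfile)` in the script above is a relative path, so it only works when the script is started from the directory that holds the file.

```shell
#!/bin/bash
# Pre-flight checks before calling duplicity. 'preflight' is a hypothetical
# helper for illustration, not part of duplicity or the guide.
preflight () {
    local key=$1 pwfile=$2

    # A relative path breaks as soon as cron starts the script elsewhere
    case $pwfile in
        /*) ;;
        *)  echo "passphrase file must be an absolute path: $pwfile" >&2
            return 1 ;;
    esac

    # The file has to be readable by the user actually running the backup
    if [ ! -r "$pwfile" ]; then
        echo "cannot read $pwfile as user $(id -un)" >&2
        return 1
    fi

    # The secret key has to be in this user's keyring for signing
    if ! gpg --list-secret-keys "$key" > /dev/null 2>&1; then
        echo "no secret key $key in the keyring of $(id -un)" >&2
        return 1
    fi
}
```

In the backup script this could be called as `preflight "$GPG_KEY" /etc/backup/pwtextfile || exit 1` before the first duplicity command, with the path adjusted to wherever the passphrase file actually lives.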
|
freedom_is_chaos Senior Member
Joined: 12 Sep 2008 Posts: 166
|
|
poetics5 Senior Newbie
Joined: 23 Oct 2008 Posts: 13
|
|
memenode Senior Newbie
Joined: 20 Nov 2008 Posts: 17
|
Posted: Mon Dec 01, 2008 7:15 pm Post subject: |
|
|
Thanks guys. Sorry for not responding earlier. I kinda just dropped this case for the time being and went on with other things, but I appreciate your help.
If duplicity is indeed buggy, I'm leaning towards not using it at this point. The JungleDisk option seems good, but I dislike the fact that it's proprietary. I guess, though, that if no better option is available I might go with it.
Thanks again. |
|
mnordhoff Senior Member
Joined: 03 May 2008 Posts: 179
|
Posted: Mon Dec 01, 2008 9:39 pm Post subject: |
|
|
| memenode wrote: | Thanks guys. Sorry for not responding earlier. I kinda just dropped this case for the time being and went on with other things, but I appreciate your help.
If duplicity is indeed buggy, I'm leaning towards not using it at this point. The JungleDisk option seems good, but I dislike the fact that it's proprietary. I guess, though, that if no better option is available I might go with it.
Thanks again. |
FWIW, JungleDisk has (or had, last I heard) some GPL software to extract the data from your S3 account. So if JungleDisk goes under or you lose your license or whatever, you won't lose your data, though you'll obviously have to find a new solution for new data. |
|
Kev@hearSAY Newbie
Joined: 25 Feb 2008 Posts: 2
|
Posted: Tue Dec 02, 2008 12:27 am Post subject: Duplicity making full backups... every time now?! |
|
|
| So I noticed that my alerts have been going haywire for the last week, and then I looked at my AWS account. Far too many GB used. Wow. I took a look, and apparently last week sometime duplicity started making full backups every time it runs on BOTH my linodes. I have no idea what I changed that caused this, but it must have been something I've changed on both linodes, and I'm racking my brain trying to figure it out. Does anyone have an idea what might be causing this? |
|
Kev@hearSAY Newbie
Joined: 25 Feb 2008 Posts: 2
|
Posted: Tue Dec 02, 2008 12:33 am Post subject: |
|
|
| Update: It's not random. More than half the storage space used in my AWS account for duplicity backups was filled today. Today is the first of a new month, the first time a new month has occurred since I installed duplicity. Hmm... I know I've got it set to make a full backup on the first of every month, but I didn't realize it was going to do it during ANY hour of that first day that backups are scheduled. Oh well.... I suppose I'll delete all of today's backups except for the most recent. |
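The day-of-month test in the backup script is true for every run on the 1st, which is why an hourly schedule produces a full backup each hour that day. One way around it is to record that this month's full backup has already been started. A sketch, assuming a writable stamp directory; `STAMP_DIR`, the stamp file naming and the `full_or_incremental` function are all made up for this example. Duplicity also has a `--full-if-older-than` option aimed at the same problem, if your version includes it.

```shell
#!/bin/bash
# Do at most one full backup per month, however often cron fires.
# STAMP_DIR and the stamp file naming are assumptions for illustration.
STAMP_DIR=${STAMP_DIR:-/var/lib/backup-stamps}

# Prints "full" the first time it is called in a given month and
# nothing on later calls. The month defaults to the current one.
full_or_incremental () {
    local month stamp
    month=${1:-$(date +%Y-%m)}
    stamp="$STAMP_DIR/full-$month"
    mkdir -p "$STAMP_DIR" || return 1
    if [ ! -e "$stamp" ]; then
        touch "$stamp" && echo full
    fi
}
```

In the backup script, `FULL=$(full_or_incremental)` would replace the date check. Note that the stamp is written before the backup runs, so a failed full backup is not retried until the stamp file is removed.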
|
memenode Senior Newbie
Joined: 20 Nov 2008 Posts: 17
|
Posted: Tue Dec 02, 2008 10:55 am Post subject: |
|
|
| mnordhoff wrote: |
FWIW, JungleDisk has (or had, last I heard) some GPL software to extract the data from your S3 account. So if JungleDisk goes under or you lose your license or whatever, you won't lose your data, though you'll obviously have to find a new solution for new data. |
Yeah, I noticed something like that. Unfortunately, though, this option didn't quite work. Running jungledisk /mnt/s3 didn't mount anything; it remained just a local directory. Maybe there's a solution to that, but frankly I'd really like something better.
Well, I guess if all else fails I can just try s3sync or s3fs. |
|
cboshuizen Senior Newbie
Joined: 07 Jan 2009 Posts: 10
|
Posted: Sun Jan 25, 2009 1:13 pm Post subject: Solving problems with Duplicity |
|
|
I had the same problems as mentioned by memenode, and I fixed them by checking three things: install location, permissions, and packages installed.
For install location, I picked a location for the root or backup user to run the script from. I chose /etc/backup, and performed all of the steps there.
For permissions, I made sure that I created all the necessary files, including the gpg key, with the correct user. This seemed to be an important step.
I also chown'd everything to the correct user.
Then, I made sure I installed the correct packages. It seemed to me that missing dependencies were to blame, so I followed these directions:
| Code: | | sudo aptitude build-dep duplicity |
or, in the case of my Ubuntu version (8.04),
| Code: | | sudo apt-get build-dep duplicity |
and then followed the rest of the steps:
| Code: | $ sudo aptitude install python-boto ncftp
$ wget http://savannah.nongnu.org/download/duplicity/duplicity-0.5.03.tar.gz
$ tar xvzf duplicity-0.5.03.tar.gz
$ cd duplicity-0.5.03/
$ sudo python setup.py install |
after first checking for the latest version of duplicity, which at this time is 0.5.06.
As a final step, make sure that the log can be written by the desired user.
I then made sure that I ran the script as the correct user, and it worked without error.
However, while the files were being correctly transferred on the first test run, it wasn't noticing changes or new files, so something is still not quite right. I will review that and post another topic. |
|
cboshuizen Senior Newbie
Joined: 07 Jan 2009 Posts: 10
|
Posted: Sun Jan 25, 2009 4:09 pm Post subject: My set up seems to be working, despite misleading log files |
|
|
I just mentioned a problem with the report in the log, which stated that 0 files were changed/added when in fact some had been.
To check what is going on, I created some more scripts to do listings and status reports, and the listings do match my changes to the file system.
I did some test backups, adding/changing/removing files in between, and could successfully restore any file from any point in time. Thus, it is all working despite the incorrect messages in the log.
*shrug* |
|
poetics5 Senior Newbie
Joined: 23 Oct 2008 Posts: 13
|
Posted: Thu Jan 29, 2009 11:47 pm Post subject: Re: 'Storage' Linodes? |
|
|
| PaulC wrote: | I know that we can add additional storage to our Linodes, but at $5/Gib/Mo it's not so attractive for an off-site backup of your photo or music collection.
So... any thought to some sort of 'storage'-oriented linode package(s)? I'm thinking for 'personal backup' usage, not some sort of file download site.
Just wondering if it's something others would find useful, or it's just me. Dunno if it makes business sense, but thought I'd ask. |
Why not just mount S3 on your server? Then you'll have endless disk space. |
|
hybinet Senior Member
Joined: 02 May 2008 Posts: 445
|
Posted: Fri Jan 30, 2009 2:34 am Post subject: Re: 'Storage' Linodes? |
|
|
| poetics5 wrote: | | Why not just mount S3 on your server? Then you'll have endless disk space. |
Yeah, there's S3FS which lets you mount your S3 bucket as if it were a regular filesystem.
Problem?
1) There seem to be at least 3 different programs named S3FS (in addition to JungleDisk which is not free), and each of them uses its own (proprietary) way of storing data in S3. These protocols are not compatible with each other. Accordingly, you can't access the data you stored on S3 unless you go through the same program you used to store it.
2) None of those 3 programs has reliable error handling. They'll tell you that a file has been copied to S3, but they won't tell you if a few bytes in the middle got corrupted in the process. Coupled with the apparently high failure rate of connections to S3, this is a total deal breaker. What good is a gazillion gigabytes of space if a data transmission error could go unnoticed?
Seriously, S3 is a fantastic service but it has so many limitations if one were to use it like a regular filesystem. Of course there are ways to work around those limitations (usually by storing extra information in the metadata), but I'm not jumping on the bandwagon until I have a standardized and reliable protocol for doing so.
In the meantime, I have a couple of low-cost, high-storage VPSes (50GB) and a backup package from BQBackup and the like (100GB) mounted on my Linode using SSHFS. That protocol is damn slow, but at least it's standardized and reliable. |
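On the silent-corruption worry in point 2 above, one tool-independent safeguard is to keep a checksum manifest next to the originals and check any copy fetched back from remote storage against it. This is only a sketch: it assumes GNU coreutils and findutils (`sha256sum`, `sort -z`, `xargs -r`), the function names are invented here, and it says nothing about how any particular S3FS program stores its data.

```shell
#!/bin/bash
# record_sums DIR MANIFEST: write SHA-256 checksums for every file under DIR
record_sums () {
    ( cd "$1" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$2"
}

# verify_sums DIR MANIFEST: exit non-zero if any listed file under DIR
# is missing or no longer matches its recorded checksum
verify_sums () {
    ( cd "$1" && sha256sum --quiet -c - ) < "$2"
}
```

Typical use would be `record_sums /home /root/home.sha256` before uploading, then `verify_sums /mnt/restored-home /root/home.sha256` against a copy pulled back from the remote side. It costs a full download, so it suits occasional spot checks rather than every run.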
|
phvt Senior Member
Joined: 28 Feb 2005 Posts: 74
|
Posted: Fri Jan 30, 2009 9:08 am Post subject: |
|
|
I believe the 'FuseOverAmazon' s3fs is the most advanced of the s3fs variants out there, but I haven't tested any of them. s3fs is quite proprietary in file format because it creates a virtual block device in the bucket.
I've been testing JungleDisk (JD) on my Linode and the two main issues are:
(1) lots of little files cause a performance bottleneck because each must be posted in its own transaction with S3. JD does this file-based uploading because that's the only way S3 will guarantee "consistency".
(2) The JD cache isn't yet available offline, so files in the JD mount are only visible if there is connectivity to S3. It was designed this way to ensure cache validity when the bucket is shared between multiple machines.
Another downside of JD is that for Linux CLI use, you'll need to download the USB version and create the XML configuration file using it (on a Mac or PC), then paste the config to your Linode. There's no published spec for the XML file, but it is easy to edit by hand after creating the initial file.
JungleDisk can create and use compatibility buckets which are compatible with other software (but S3 doesn't have any "standard" for this, anyway). The biggest advantage of JD is that it can encrypt everything before it leaves your machine, which obviously makes those buckets incompatible with other software. There is an open source tool to access your JD encrypted bucket data.
We need both fast and cheaper storage, and backups that are easy to verify and restore. Handling both with the same system might not be optimal (see how Amazon separates EBS from snapshot backups to S3). I really look forward to seeing what Linode comes up with because everything else here rocks! |
|