Trying to figure out a backup solution...

I'm using the beta backup service but as I've been reading it seems that you don't want to rely on that.

I've never had to resort to restoring a backup so I'd just like to ask a few questions if you don't mind.

Some points:

My main server is a PHP/Apache/MySQL vBulletin server. The MySQL database is 800MB.


Backups are categorized daily, weekly, and monthly when done correctly, right? So if I backed up to S3, I'd have to store at least 800MB a day, for a total of 800MB x 30? That seems like a lot of space, and it would add up in $$ really fast.

Is Amazon S3 the ideal way to back it up?

How do I back up MySQL properly without crashing the server during a mysqldump (the server gets slow when I do this)? Will the SQL dump be fine? What if it happens to run in the middle of someone trying to post something?

What are the alternatives to Amazon S3? Something cheaper? Easier to restore?


I use GigaPros FTP hosting, and I encrypt the packages before FTPing them. All done automatically, daily. No issues so far (I've been using them for a few months), and they have FTP plans ranging from $2.50/mo for 1GB+2GB, to $25/mo for 20GB+400GB (space+bandwidth).

As for MySQL dumping, I don't know; I don't use MySQL. But PostgreSQL, which I do use, dumps transactionally (with pg_dump), i.e. the dump is a snapshot of the db at the time it was taken.

Ah, hmm I think that's more expensive than Amazon S3 isn't it?

I'm really confused at how much space I'll be needing.

There's always the DIY solution of home backups. If you have a few gigs at home, you can use rsync to only send the differences. So of your 800MB database, if only 10MB changes per day, you can do daily backups and only use 10MB of bandwidth on your home connection.

The problem with S3 is that it can't run rsync on its end, so both the source and the destination are effectively local filesystems to the linode. That means rsync can't tell what parts of a file have changed without actually reading the file (unchanged files can be detected by filesize/modification time, but telling WHAT changed requires reading it).

Whereas when you're doing a local/remote scenario, you can run rsync on the remote end (in this case, the home server), where rsync can scan its copies of the files without transferring data over the net.


Ah, hmm I think that's more expensive than Amazon S3 isn't it?

I'm really confused at how much space I'll be needing.

Well, don't you know your current usage? Besides, I'm not affiliated with GigaPros, but the S3 online calculator shows S3 is twice as expensive. Granted, it also says that as of June inbound traffic will be free, which is mostly what you'd have, I guess.

As for the backup plan, it depends on what kind of backups you need/want. I currently keep daily backups only, meaning not 30 monthly archives but just the last 24 hours. That's all I need, since the backup is just to prevent data loss if my node decides to push up daisies for whatever reason.

So unless you need historical snapshots throughout the month, the last 24 hours plus maybe a couple of extras per week should be more than enough. Or just keep the last 5 days' worth of backups or something.
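A "keep the last N days" rotation can be done with a one-line find (a sketch; the directory here is a throwaway stand-in for wherever your dumps land):

```shell
#!/bin/sh
# Keep only the last 5 days of dump files in the backup directory.
BACKUP_DIR=$(mktemp -d)   # placeholder; e.g. /var/dumps in real use

# simulate an old backup and a fresh one (GNU touch -d)
touch -d "10 days ago" "$BACKUP_DIR/old-backup.sql.gz"
touch "$BACKUP_DIR/fresh-backup.sql.gz"

# delete anything older than 5 days
find "$BACKUP_DIR" -name '*.sql.gz' -mtime +5 -delete
ls "$BACKUP_DIR"
```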

Ah, that's where my confusion came from… how many total backups one would have.

So if I kept a week's worth of backups, that would be 800MB x 7 then, correct?

Anyone have answers on the MySQL concern I had?

What scripts do you guys use?

So Amazon isn't good because you can't rsync? If I had a daily backup with rsync, wouldn't it be 800MB+ per transfer no matter what? It doesn't just diff against the previous version of the file…?

Database dumps compress well. Unless you have large blobs in there (which you shouldn't), the compressed dump file is likely to be only 100M or so.
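A quick way to see this for yourself, using a throwaway file full of repetitive INSERT statements as a stand-in for a real dump:

```shell
#!/bin/sh
# SQL dumps are plain text full of repeated keywords and column names,
# so gzip typically shrinks them dramatically. Toy demonstration:
DUMP=$(mktemp)
for i in $(seq 1 5000); do
    echo "INSERT INTO post (id, userid, title) VALUES ($i, 42, 'hello');"
done > "$DUMP"
gzip -c "$DUMP" > "$DUMP.gz"
ORIG=$(wc -c < "$DUMP")
COMP=$(wc -c < "$DUMP.gz")
echo "original: $ORIG bytes, compressed: $COMP bytes"
```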

Have you considered backing up the binary logs (binlogs) instead of shipping an entire database dump every time? If only a small part of the database changes during the day, this might be more space-efficient. Binary logs are also more rsync-friendly than raw database dumps, because they're append-only. The downside to binary logs is that they're tricky to verify.
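If you go that route, binary logging has to be enabled first; a minimal my.cnf fragment might look like this (the log path and retention values are assumptions, not from the thread):

```ini
[mysqld]
# write append-only binary logs that can be shipped off-box with rsync
log-bin          = /var/log/mysql/mysql-bin
# rotate before files get unwieldy
max_binlog_size  = 100M
# let the server purge logs older than two weeks
expire_logs_days = 14
```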

Just a couple of reminders:

1) If you use the MyISAM storage engine, mysqldump will lock all your tables while the dump is in progress. If your application tries to insert or update some rows during this time, the page is likely to hang until the dump is complete.

2) If you use InnoDB tables, inserts and updates can happen even while the dump is in progress, so you should pass the --single-transaction --quick options to mysqldump to make sure you get a consistent snapshot.
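Put together, the invocation would look something like the following sketch (the database name, user, and output path are placeholders; the script only prints the command rather than contacting a live server):

```shell
#!/bin/sh
# Hypothetical mysqldump invocation for InnoDB tables.
#   --single-transaction : dump from a consistent snapshot without locking
#   --quick              : stream rows instead of buffering each table in RAM
DB='vbulletin'                          # placeholder database name
OPTS='--single-transaction --quick'
CMD="mysqldump $OPTS -u backupuser -p $DB"
echo "$CMD | gzip > /var/dumps/$DB.sql.gz"
```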

I use Webbycart's backup service. 120GB storage with unmetered bandwidth for $15/month.

Nightly backups with rsync.

Before the backup, I dump the MySQL tables and .tgz them.


I use bqbackup (rsync) and s3 (duplicity).

Nightly I rsync bqbackup to my home mac.

I use duplicity with amazon s3.

Dump of mysql databases then duplicity backup.

I use a script that makes incremental backup every day and full backup every month.

With 4Gb of (uncompressed) data to backup, one year of backup (12 full and more than 300 incremental) my amazon bill is 6-7 dollars a month.

Every month I also make a full backup from my home box with rdiff-backup.

And every time I shut down I also take some minute to make a clone of the disk image, shrink it to the minimum size and keep it as an additional backup: now that I have two linodes (one for production and one for testing) I want to keep the last image on the testing linode.

A bit paranoid, I know ;-)



I like your style. Could you share your script perhaps?




This is the script for the db backup (one file per database; I like it this way rather than all the DBs in one file):

# backup mysql - every DB in its own file
# (MUSER, MPASS and MDBAK were not shown in the original post;
#  the values below are placeholders)
MUSER='backupuser'
MPASS='yourpassword'
MDBAK='/var/dumps'

MYSQLDUMP="$(which mysqldump)"
MYSQL="$(which mysql)"

# clean old backups
rm $MDBAK/*bak > /dev/null 2>&1

# save db list
DBS="$($MYSQL -u $MUSER -p$MPASS -Bse 'show databases')"

# dump every database
for db in $DBS; do
    MFILE="$MDBAK/$db.$(date +%Y%m%d).bak"
    #echo "$db -> $MFILE"
    $MYSQLDUMP -u $MUSER -p$MPASS $db > $MFILE
done

exit 0

And this is the script for duplicity backup:


## NEX: full bkup on Amazon S3
# NOTE: in a shared environment it is not safe to export env vars

# Export variables
export AWS_ACCESS_KEY_ID='your AWS Access Key ID'
export AWS_SECRET_ACCESS_KEY='your AWS Secret Key'
export PASSPHRASE='your passphrase'

GPG_KEY='your GPG key'

# day of the month
DDATE=`date +%d`

# full backup only on 1st of the month, otherwise incremental
if [ $DDATE = 01 ]; then
    DO_FULL='full'
else
    DO_FULL=''
fi

# Backup source
SOURCE=/

# Bucket backup destination (placeholder bucket name)
DEST='s3+http://your-bucket-name'

#NEX - enable for removing old backups
#duplicity remove-older-than 1Y --force ${DEST}

duplicity ${DO_FULL} \
    --encrypt-key=${GPG_KEY} \
    --sign-key=${GPG_KEY} \
    --exclude=/root/download/** \
    --exclude=/var/www/** \
    --exclude=/var/www/web23/user/** \
    --include=/etc \
    --include=/home \
    --include=/root \
    --include=/usr/local \
    --include=/var/www \
    --include=/var/dumps \
    --include=/var/mail \
    --exclude=/** \
    ${SOURCE} ${DEST}

# Reset env variables
unset AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY PASSPHRASE

exit 0


And this is the very simple script for the backup with rdiff-backup.

The difference from duplicity is this:

duplicity can encrypt, so it is safe to use with untrusted backup destinations (like Amazon S3);

rdiff-backup stores files without encryption, so it is easier to use in a trusted environment (your home pc, I hope :-) ) and very easy to restore.


## NEX - full rdiff bkup
##       must be executed manually with root privileges (sudo)
##       better to create ssh account only for backup, so it can be launched unattended

# backup source (your linode) - placeholder hostname
SOURCE=root@your.linode.example.com::/

# Local destination (on your home pc) - placeholder path
DEST=/backups/linode

# Replace 12345 with your ssh port number, or remove "-p 12345" if it is on standard 22
rdiff-backup -v5 --print-statistics \
        --remote-schema 'ssh -p 12345 -C %s rdiff-backup --server' \
        --exclude=/root/download/** \
        --exclude=/var/www/** \
        --exclude=/var/www/web23/** \
        --exclude /lost+found \
        --exclude /media \
        --exclude /mnt \
        --exclude /proc \
        --exclude /sys \
        --exclude /tmp \
        ${SOURCE} ${DEST}

exit 0

As a reminder, rdiff won't be useful with S3 or an S3-backed tool like duplicity unless you keep local snapshots of what you backed up against. rdiff and rsync have the same problem: they need to fully read each copy of the file at least once in order to determine what changed (in rdiff's case) or generate the set of checksums (in rsync's case).

So your backup procedure for file "foo" could be:

1) Backup time! Copy /mystuff/foo to /backups/2010-01-28/foo

2) rdiff /backups/2010-01-28/foo against /backups/2010-01-27/foo

3) Compress the diff

4) Send the diff to S3 for storage

The next day, repeat the process, diffing against the previous day. You would want to do a full backup periodically. Perhaps every week you would compress the whole shebang and send it, and then send diffs for each day of the week.

In this scenario, you also only ever need to keep the most recent backup snapshot locally; each day, you just need to diff against the previous day, unless it's full-backup day.

Restoring a backup just involves taking the latest full backup and then applying the diffs sequentially until you reach the desired date. That can also be automated with scripts.
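Here's a toy run of steps 1-4 plus the restore, using plain diff/patch as a stand-in for rdiff and leaving out the actual S3 upload:

```shell
#!/bin/sh
# Day 1: full snapshot. Day 2: snapshot plus a compressed diff against
# day 1 (this is what would be shipped to S3). Restore: start from the
# day-1 full backup and apply the diff.
WORK=$(mktemp -d)
mkdir -p "$WORK/2010-01-27" "$WORK/2010-01-28"

printf 'line1\nline2\n' > "$WORK/2010-01-27/foo"          # day 1 full
printf 'line1\nline2\nline3\n' > "$WORK/2010-01-28/foo"   # day 2 snapshot

# 2) diff the new snapshot against the previous one
diff -u "$WORK/2010-01-27/foo" "$WORK/2010-01-28/foo" > "$WORK/foo.diff"

# 3) compress the diff
gzip -c "$WORK/foo.diff" > "$WORK/foo.diff.gz"

# 4) (upload foo.diff.gz to S3 here)

# Restore: copy the day-1 full backup, then apply the diff in order
cp "$WORK/2010-01-27/foo" "$WORK/restored"
gunzip -c "$WORK/foo.diff.gz" | patch -s "$WORK/restored"

cmp -s "$WORK/restored" "$WORK/2010-01-28/foo" && echo "restore OK"
```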

Of course, this is all far more complex than the old "rsync home each night and then let your home backup solution take care of things like incremental stuff for history".

As far as incremental diff strategy goes, what I do is something like:

Day 1: Full backup.

Day 2: Incremental against Day 1

Day 3: Incremental against Day 2

Day 4: Incremental against Day 1

Day 5: Incremental against Day 4

Day 6: Incremental against Day 1

(and so forth; I actually do it every 0.7 days, but you get the idea)

Basically, ensure that the distance from the full backup to your most recent incremental is reasonably short. This will make your incremental backups larger, more often than not, but it will reduce the amount of work needed to restore. Also, if one component of the backup gets corrupted or deleted, you've got a better shot of not losing everything.

Bandwidth is cheap, storage is cheap, but neither your data nor your time is. Have a backup strategy that works, is automatic, assures you that everything is up to date, and has a restore method you know how to use. And practice a restore… grab a Linode 360 for the day and restore to it. It'll cost you a buck or two, but you'll sleep better.

And if you're me, you'll find out why backing up to home really sucks for full restores :-)

EDIT: And I might as well plug my personal backup methods:

0. Linode's backup service (ideal for full restores, not to be relied upon yet)

1. BackupPC on my home server (ideal for full LAN restores and single-file restores; stores ~3 months of data with pooling across machines)

2. Keyfobs with tarballs generated by BackupPC and moved off-site monthly (ideal for full restores and sphincter-clenching disasters)

3. Experimental backups from BackupPC to S3 (ideal for full restores, somewhat more automated than #2 but slow due to upstream bandwidth constraints)

Also, most of my works-in-progress are stored on Dropbox, which is synced across all of my computers and backed up by BackupPC. I use git for revision control and a script I wrote to back up my remote IMAP accounts (gmail, live@edu, etc).

I… think of too many worst-case scenarios.

We've had the backup discussion several times. I'll elaborate on how I (personally) do my backups on my Linode 360. I'll also note that I run a small webhosting business of about 10 websites.

I currently have three backup solutions:

1. Linode Backup Beta - Never had to restore; I really don't even look at the tab anymore to make sure it's backing up. A quick check (just now) shows that the backups are being made successfully, but they are not to be relied upon.

2. rdiff-backup daily to a virtual machine running Ubuntu 9.04. I'll elaborate below.

3. Custom made GMail backup script. I'll elaborate below.

Daily rdiff-backup

So this is my main fallback. If I lose data I just restore from my virtual machine. I run an Ubuntu 9.04 server install inside VMware Workstation 7. Every morning before class I get up, boot the VM, log in, type "./backup" at the prompt, and go to class. When I come back, it tells me how much has changed. If it was successful (occasionally it will fail), I shut down the OS, power off the VM, and go about my daily work.

I have the VM using a 100GB disk image. The disks were not allocated at creation, so the image grows as the disks fill. This means it's slightly slower, but it isn't always taking 100GB of disk space when there's only 5GB of data in the VM. Every month I copy the VM to one of my external hard drives, where it sits until the next month. I have a two-month rotation for VM backups. This means at any point I have:

1. 2 month old Backup VM image

2. 1 month old Backup VM image

3. Currently used Backup VM image

I would normally have an old computer set up for this, but as a college student, I don't have the space to keep an extra desktop around. A VM works just fine for me.

Custom GMail Backups

Once I got my first web hosting customer I started looking for more redundancy as far as backups. The VM was for my own use. I would experiment and mess something up, I could simply restore it. After looking around for different backup solutions I decided to not spend any money and work on a customized GMail backup solution.

I have three different backup categories:

1. MySQL Database dump backups - Daily

2. Web (htdocs) folder backups - Weekly

3. Nothing here yet - Monthly

Let me lay out how the GMail backup script works. It is sloppy and needs to be rewritten, but here's how it works:

1. Create a temporary folder in /tmp/

2. Compress the command-line parameter (a filename or directory) using tar/gzip and save it in the folder created in step 1. Redirect the output of the tar command (the file list) to a temporary text file.

3. Encrypt the archive using GPG with a special passphrase.

4. Split the encrypted file into 24MB chunks.

5. Delete the old encrypted file.

6. For each of the chunks created in 4:

6a. Generate a subject line (m/d/Y - Filename - X of Y)

6b. Generate the body. The body has instructions for recombining the files, decrypting, and extracting them.

6c. Send an email using mutt to a predetermined email address attaching the file list from 2 and the current chunk from 4/6.

7. Cleanup (remove the temp folder from step 1, and a sent folder that mutt creates)

8. Append a line to the backup log.
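The tar/split/recombine portion of those steps can be sketched like this (GPG encryption and the mutt email, steps 3 and 6, are left out, and the chunk size is shrunk from 24MB to 1KB for the demo):

```shell
#!/bin/sh
# Steps 2 and 4 plus the restore path: tar up a directory, split the
# archive into chunks, then recombine with cat and extract.
WORK=$(mktemp -d)
mkdir -p "$WORK/htdocs"
echo "hello world" > "$WORK/htdocs/index.html"

# 2) compress the target directory
tar -czf "$WORK/htdocs.tar.gz" -C "$WORK" htdocs

# 4) split into chunks (24MB in the real script; 1KB here)
split -b 1024 "$WORK/htdocs.tar.gz" "$WORK/chunk."

# Restore: recombine the chunks and extract
cat "$WORK"/chunk.* > "$WORK/rebuilt.tar.gz"
mkdir "$WORK/restore"
tar -xzf "$WORK/rebuilt.tar.gz" -C "$WORK/restore"
cat "$WORK/restore/htdocs/index.html"
```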

So that's the script itself. I have a wrapper script that contains a bunch of calls to it with various filenames as the parameters. For instance, the database wrapper dumps all my MySQL tables to a temp file, then calls the backup script with the temp file as a parameter. When it's done it deletes the temp file and is done for the day. I also have a script that runs it on all of my htdocs directories (with some exceptions), along with system logs, repositories, etc.

All system-related backups go to a predetermined email. Once a Gmail account fills up (I've been using this system account since August and it's only 43% full) I just register a new one, change the email in the script, and let it continue running. The web-related backup email was created at the same time and is 75% full.

It's not even close to an industry-standard backup plan, but it's worked for me in the past. If you're interested in the script I can go ahead and post it.

Wow this is a lot to take in, I have to go over it.

Couple questions…

So if I backup my database today for example:

backup-jan28-2010.sql (800MB)

and then the backup script runs again tomorrow, will it back up to backup-jan29-2010.sql but only transfer what has changed beyond the original 800MB?

Another question… if I backed up over S3 how would I restore it? What about the other solutions?

The problem is that bandwidth and storage space on S3 isn't cheap.

Say I want to back up my Linode, which has 20GB of data.

Assume my weekly incremental backups are only 5% the size of a full backup on average, and that I only keep the previous/current week of data. That's two full backups and 12 incrementals, or 52GB of data.

A full month schedule would involve transferring roughly four full and 24 incrementals, let's say. 104GB to transfer.

Total costs:

S3 storage, 52GB @ $0.15/GB: $7.80/mth

S3 transfer, 104GB @ $0.10/GB: $10.40/mth

Linode transfer, 104GB @ $0.10/GB: $10.40/mth

Total: $28.60/mth

That's not cheap to back up a $30 linode!

The same approach when backing up to home would only cost the last bit, $10.40 total, assuming you already have the storage to spare. If you want to account for the cost of home storage, my costs for building my file server are actually somewhat similar to Amazon's… $0.10/GB (including drives/hardware) without redundancy, $0.12 per gig with…

It depends on what you want to do. If you run rdiff-backup (or any rsync-like backup solution) you could name your file the current date, then delete it the next day. Assuming you ran rdiff-backup before the file was deleted, you will always be able to restore it. You could also simply keep deleting and recreating backup.sql. If you run rdiff-backup daily you can revert the file to any date you want.

How rdiff-backup works is that you can pull any file from any date, assuming you make backups between changes.

With rdiff-backup, if you name each file uniquely (i.e. backup-jan28-2010.sql) you will have to re-download all 800MB (compression aside) once a day. If you simply overwrite backup.sql every time you dump your database, rdiff-backup will only transfer the changed data.

Basically rdiff-backup works like this: the first time it sees a file it HAS to download the whole thing; after that it will only download the changes. If you move a file, it will have to re-download the entire thing at its new location. If the file hasn't changed, it'll simply make a note in its internal tracking system that the file has not changed and download nothing.

You can run all kinds of commands against your rdiff-backup repository, such as the following, which checks my home directory for all the files that have changed in the past day. Note that these are all run on the virtual machine where the backups are kept and require no communication with my actual linode.

root@linode-backup:/backups# rdiff-backup --list-changed-since 1D linode_current/home/smark/
changed home/smark/.bash_history
changed home/smark/.lastip
changed home/smark/
changed home/smark/
changed home/smark/psybnc/psybnc-oftc/log/psybnc.log
changed home/smark/
changed home/smark/
changed home/smark/

Or the following which lists all my backups since I recreated the VM:

root@linode-backup:/backups# rdiff-backup --list-increments --list-increment-sizes linode_current/
        Time                       Size        Cumulative size
Thu Jan 28 01:04:22 2010         11.8 GB           11.8 GB   (current mirror)
Wed Jan 27 12:54:44 2010         2.26 MB           11.8 GB
Mon Jan 25 14:07:15 2010         35.3 MB           11.9 GB
Wed Jan 20 02:22:05 2010         39.1 MB           11.9 GB
Thu Jan 14 13:14:55 2010         42.6 MB           11.9 GB
Tue Jan 12 21:30:25 2010         37.1 MB           12.0 GB
Tue Jan 12 00:57:17 2010         36.7 MB           12.0 GB
Sat Jan  9 10:26:20 2010         35.3 MB           12.0 GB
Mon Jan  4 01:25:00 2010         41.1 MB           12.1 GB
Mon Jan  4 01:12:35 2010         1.18 MB           12.1 GB
Wed Dec 30 12:26:25 2009         49.8 MB           12.1 GB
Tue Dec 29 17:46:06 2009         34.3 MB           12.2 GB
Mon Dec 28 21:21:21 2009         32.7 MB           12.2 GB
Thu Dec 10 11:07:38 2009          305 MB           12.5 GB
Wed Dec  9 14:11:12 2009         16.9 MB           12.5 GB
Tue Dec  8 00:01:27 2009         41.2 MB           12.6 GB
Sun Dec  6 14:59:15 2009         17.0 MB           12.6 GB
Fri Dec  4 11:59:59 2009         16.3 MB           12.6 GB
Thu Dec  3 12:04:18 2009         15.5 MB           12.6 GB
Tue Dec  1 11:54:55 2009         20.0 MB           12.6 GB
Mon Nov 30 11:30:45 2009         22.0 MB           12.6 GB
Sun Nov 29 14:04:16 2009         13.5 MB           12.7 GB
Wed Nov 25 11:32:36 2009         25.4 MB           12.7 GB
Tue Nov 24 12:20:49 2009         20.3 MB           12.7 GB
Mon Nov 23 12:52:05 2009         21.1 MB           12.7 GB
Sun Nov 22 18:59:23 2009         21.2 MB           12.7 GB
Sun Nov 22 15:28:17 2009         1.47 MB           12.7 GB
Sat Nov 21 11:44:32 2009         21.9 MB           12.8 GB
Thu Nov 19 15:57:10 2009         18.0 MB           12.8 GB
Mon Nov 16 11:25:33 2009         16.5 MB           12.8 GB
Fri Nov 13 15:18:25 2009         17.8 MB           12.8 GB
Thu Nov 12 01:33:37 2009         29.1 MB           12.8 GB
Mon Nov  9 13:33:38 2009         29.7 MB           12.9 GB
Fri Nov  6 11:49:26 2009         42.5 MB           12.9 GB
Wed Nov  4 15:02:38 2009         37.6 MB           13.0 GB
Mon Nov  2 12:57:26 2009         35.1 MB           13.0 GB
Wed Oct 28 12:54:22 2009         34.6 MB           13.0 GB
Tue Oct 27 16:13:07 2009         17.9 MB           13.0 GB
Mon Oct 26 12:11:58 2009         15.9 MB           13.1 GB
Mon Oct 26 00:41:27 2009         15.7 MB           13.1 GB
Fri Oct 23 12:04:53 2009         7.58 MB           13.1 GB
Thu Oct 22 12:19:17 2009         6.16 MB           13.1 GB
Wed Oct 21 11:46:55 2009         7.27 MB           13.1 GB
Mon Oct 19 11:33:05 2009         7.30 MB           13.1 GB
Sat Oct 17 16:29:55 2009         8.01 MB           13.1 GB
Fri Oct 16 17:48:02 2009         7.23 MB           13.1 GB
Wed Oct 14 11:14:16 2009         7.47 MB           13.1 GB
Tue Oct 13 11:57:40 2009         7.50 MB           13.1 GB
Mon Oct 12 12:06:24 2009         7.73 MB           13.1 GB
Sun Oct 11 14:10:16 2009         6.34 MB           13.1 GB
Sat Oct 10 17:53:44 2009         6.86 MB           13.1 GB
Fri Oct  9 11:49:11 2009         6.84 MB           13.2 GB
Thu Oct  8 11:47:08 2009         7.78 MB           13.2 GB
Wed Oct  7 13:41:54 2009         8.81 MB           13.2 GB
Tue Oct  6 15:49:47 2009         38.3 MB           13.2 GB
Mon Oct  5 11:22:37 2009         8.22 MB           13.2 GB
Sun Oct  4 17:08:00 2009         8.07 MB           13.2 GB
Sat Oct  3 13:44:13 2009         9.80 MB           13.2 GB
Fri Oct  2 11:39:14 2009         8.07 MB           13.2 GB
Thu Oct  1 17:13:01 2009         7.10 MB           13.2 GB
Wed Sep 30 13:09:17 2009         23.9 MB           13.3 GB
Tue Sep 29 11:59:40 2009         5.95 MB           13.3 GB
Mon Sep 28 15:03:24 2009         5.25 MB           13.3 GB
Sat Sep 26 15:14:45 2009         4.83 MB           13.3 GB
Fri Sep 25 14:05:57 2009         2.23 MB           13.3 GB
Thu Sep 24 11:56:54 2009         3.12 MB           13.3 GB
Wed Sep 23 11:40:27 2009         6.71 MB           13.3 GB
Tue Sep 22 15:55:41 2009         1.24 MB           13.3 GB
Mon Sep 21 21:10:56 2009         1.50 MB           13.3 GB
Sun Sep 20 11:29:00 2009          103 MB           13.4 GB
Sat Sep 19 16:34:42 2009         1.39 MB           13.4 GB
Fri Sep 18 10:52:50 2009         1.17 MB           13.4 GB
Thu Sep 17 10:19:15 2009         1.28 MB           13.4 GB
Tue Sep 15 23:01:54 2009         1.65 MB           13.4 GB
Tue Sep 15 18:32:13 2009          930 KB           13.4 GB

Size = Changed size of the mirror files (?)

Cumulative Size = Total size of the backup

Okay sorry I'm asking so many questions guys, I'm really just trying to absorb all of this and learn. It's been great so far and I really appreciate it.

So I saw someone mention that to do proper MySQL dumps I have to lock the db so nothing odd will happen if restored, correct?

Alright, he said if it was running a certain storage engine (I think MyISAM) then I wouldn't need to run a specific command to back it up, otherwise I would?

Firstly, how do I figure out which storage engine my MySQL tables are using?

Second, if it's not running that specific engine, then I have to run mysqldump with an option to lock the tables, correct?

