Offsite Backups HOW-TO

I have written a little howto for setting up offsite backups using rdiff-backup. This is the method I use for backing up my linode to my home fileserver via cable modems. I will probably edit this post and insert the how-to here later today (it's 2:30 am here now).

Please let me know what you think and post any suggestions/complains/questions here :-)

http://thegrebs.com/docs/rdiff-backup.html

Overview

rdiff-backup provides a simple means of making offsite backups. This document is intended to be an easy to follow set of instructions for implementing them. There are some great advantages to using rdiff-backup for this; here are some of my favorites:

  • Files are stored on the remote site as normal files, not in some
    proprietary format, making looking for something easy using standard
    *nix tools such as find, grep and locate.

  • A full history of previous revisions is kept, so you can restore a
    copy of that file you accidentally changed as it was several days ago,
    before the change was made (see the example just after this list).

  • Rsync's algorithms are used, so only the changes from the previous day
    are transferred, not a full image.

  • Previous revisions of files are stored as differences from the current
    version, saving space where the backups are stored.

  • Can use ssh for transferring files, making the transmissions secure
    from packet sniffing.

  • Can use RSA or DSA public key cryptography for authenticating to the
    remote host, so automated backups are still secure without a clear
    text password in a config file.
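
For example, restoring a file as it existed a few days ago is a single command run on the backup machine. The repository path below matches the /backups/linode location used later in this document, the file path is arbitrary, and 3D ("three days ago") is only a sample time spec:

    offsite# rdiff-backup -r 3D /backups/linode/home/michael/notes.txt /tmp/notes.txt.old

The -r (--restore-as-of) option selects the point in time to restore from; the most recent copy of any file can also simply be copied out of the mirror with normal tools such as cp.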

If you have any corrections or suggestions for this document, either technical or regarding readability, please do drop me a line at michael@michaelandheidi.net so your suggestions can help other people.

Conventions Used

Offsite will be used to refer to the location you are backing up to while linode will refer to the location you are backing up from.

This document assumes you already have rdiff-backup installed. If you need help with this, feel free to email me at the address above with any questions you may have. If there is enough demand, I will add instructions for getting it installed on the most popular distributions.
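
Note that rdiff-backup needs to be present on both ends (offsite and linode). On most distributions it ships as a standard package, so something along these lines, run on each machine, should be all that is required (package names assume the distro defaults):

    # Debian/Ubuntu
    apt-get install rdiff-backup
    # Gentoo
    emerge rdiff-backup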

Setting up SSH

This step gets ssh set up for secure, unattended, password-less use. Public and private RSA keys will be generated. The private key is stored on the machine you are backing up to (offsite), while the public key gets sent to the server you are backing up (linode). These keys are used to authenticate a connection without passwords in a secure manner. Then we set up the remote end to authenticate against this key and allow only rdiff-backup to run.

1. Use the ssh-keygen program to generate a password-less RSA key pair by executing the following command. In this example /root is root's home directory. Adjust accordingly for your system.
````
offsite# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/michael/.ssh/id_rsa): /root/.ssh/id_rsa_backup
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa_backup.
Your public key has been saved in /root/.ssh/id_rsa_backup.pub.
The key fingerprint is:
aa:63:1a:f8:0f:44:78:e9:d2:e1:25:40:51:2f:59:29 root@offsite

````
Set this new private key as readable only by root for security. SSH will complain if it doesn't have secure file permissions.
````
offsite# chmod go-r /root/.ssh/id_rsa_backup
````
  2. Now we must transfer the public key to the remote system.

    offsite# scp /root/.ssh/id_rsa_backup.pub linode:/root/.ssh/id_rsa_backup.pub
    root@linode's password: <enter root password>
    id_rsa_backup.pub                             100%  224     0.0KB/s   00:00
    
  3. Now set up the entry on the remote system. On linode, open id_rsa_backup.pub in your favorite text editor. This should be one really long line, and you want to add command="…" to the beginning of it. Note that this line has been significantly shortened here.

    command="rdiff-backup --server" ssh-rsa AAAAB3Nz[....]iNM= root@offsite
    
  4. Add this new line to authorized_keys2. If there is no /root/.ssh/authorized_keys2 already, then execute the following command:

    linode# mv /root/.ssh/id_rsa_backup.pub /root/.ssh/authorized_keys2
    

    If this file already exists, use these commands instead to add your new line to it:

    linode# cat /root/.ssh/id_rsa_backup.pub >> /root/.ssh/authorized_keys2
    linode# rm /root/.ssh/id_rsa_backup.pub
    

    Finally, set the permissions on this file to only readable by root:

    linode# chmod go-r /root/.ssh/authorized_keys2
    
  5. Now we need to tell ssh to use our new private key and a few other settings that will lessen the CPU load on both ends. Open in your favorite editor, creating it if it doesn't already exist, the file /root/.ssh/config.

    You want to add the following section to this file.

    host linode-backup
            hostname linode
            identityfile /root/.ssh/id_rsa_backup
            compression yes
            cipher blowfish
            protocol 2
    

    This causes ssh to use these settings when you ask it to connect to linode-backup. The hostname specifies the real host to connect to. identityfile specifies the name of the private key to use when connecting.

  6. Now you get to test out your settings for ssh.

    offsite# rdiff-backup --test-server linode-backup::/ignored
    Testing server started by:  ssh -C linode-backup rdiff-backup --server
    Server OK
    

    If your output doesn't match, hopefully you will receive an error that makes sense. The most important thing here is that ssh shouldn't ask you for your password. If it doesn't work out, check your filenames and permissions. If you're still at a loss, feel free to send some output my way and we'll see if we can get it knocked out.
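
    If you need to dig further, running ssh by hand with verbose output is one way to see which key is being offered and whether it is accepted:

    offsite# ssh -v linode-backup

    Because of the command="…" entry in authorized_keys2, a successful login starts rdiff-backup --server instead of a shell, so the session will appear to hang waiting for input; press Ctrl+C to end it once you have seen the authentication go through.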

Setting Up rdiff-backup

In this section we will setup a location for storing the backups, create an rdiff-backup config file, and set things up for automatic backup via a system cronjob.

1. You need to decide on a location on offsite to store the backups. I store my backups in /backups/. Once you decide where you want to put your backups, create the path. For example, to create the location I use you would type:

    offsite# mkdir -p /backups/linode

  2. We need to specify the files to include and exclude when backing up with rdiff-backup. This can be done on the command line when rdiff-backup is run, but for an automated system it is much easier to create a file with this information. I use /etc/rdiff-backup.conf but you may use something else if you prefer. It is important to note that this file is located on offsite, not on linode as you might expect. Mine looks like this:

    - /dev
    - /proc
    - /tmp
    - /var/tmp
    - /usr/portage
    - /home/michael/dl
    

    In this file a dash at the start of a line excludes paths matching the contents of the line. To include a path, you just place the path on the line with no preceding symbols.

    My example is for a Gentoo system, though the only Gentoo-specific item is /usr/portage. /home/michael/dl is the location where I download, extract and compile software and is easily replaceable, hence its exclusion. The remainder of the entries you will most likely want to keep. The way we are setting things up, the root dir will be included, causing everything to be backed up except for the items listed in this file.
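
    If you ever need to keep one path inside an otherwise excluded directory, includes and excludes can be mixed in the same file; earlier lines should take precedence, so a more specific include placed above an exclude keeps just that path. For example (these paths are only an illustration):

    /var/tmp/important
    - /var/tmp
    - /proc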

  3. Finally we set up rdiff-backup to run daily from cron. You will need to locate the directory holding scripts for system cronjobs that run daily. In most cases this is /etc/cron.daily. In this directory you need to create a new shell script that looks like this:

    #!/bin/sh
    export HOME=/root
    rdiff-backup --print-statistics --include-globbing-filelist /etc/rdiff-backup.conf linode-backup::/ /backups/linode
    

    Note that this should all be one line.

    You may choose whatever name you desire for this script. I use rdiff-backup.sh. This script will run rdiff-backup using the config file you created earlier. The --print-statistics argument is optional and produces output that looks like this:

    --------------[ Session statistics ]--------------
    StartTime 1066622417.00 (Mon Oct 20 00:00:17 2003)
    EndTime 1066623159.32 (Mon Oct 20 00:12:39 2003)
    ElapsedTime 742.32 (12 minutes 22.32 seconds)
    SourceFiles 45652
    SourceFileSize 429389316 (409 MB)
    MirrorFiles 44428
    MirrorFileSize 414270694 (395 MB)
    NewFiles 1226
    NewFileSize 15059120 (14.4 MB)
    DeletedFiles 2
    DeletedFileSize 3501 (3.42 KB)
    ChangedFiles 119
    ChangedSourceSize 1755983 (1.67 MB)
    ChangedMirrorSize 1692980 (1.61 MB)
    IncrementFiles 1349
    IncrementFileSize 117377 (115 KB)
    TotalDestinationSizeChange 15235999 (14.5 MB)
    --------------------------------------------------
    
  4. Lastly, you may want to run it for the first time right now. The first run will be the longest as it will need to transfer everything that needs to be backed up.

    A great way to do this and test out your script at the same time is by running it:

    offsite# /etc/cron.daily/rdiff-backup.sh
    

    Note that this could take a while depending on the amount of stuff to transfer and the speed of your connection.
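
    Once backups have been running for a while, rdiff-backup can also show what history the repository holds and trim old increments. Something like the following works (30D is only a sample retention period, and the path assumes the /backups/linode location used above):

    offsite# rdiff-backup --list-increments /backups/linode
    offsite# rdiff-backup --remove-older-than 30D /backups/linode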

Conclusion

michael@thegrebs.com

22 Replies

rdiff-backup is pretty neat and looking even better now that 1.0 is out. Sadly, Debian 3.1 has version 0.13.4-5 in the stable branch, so stable users should grab 1.0.1 from testing. Version 1.0.2 is the latest.

Some other backup utils based on rsync I'm evaluating are:

dirvish - http://www.dirvish.org/

selected by OSL at Oregon State University in Oct 2005 to backup their servers (includes projects such as Mozilla, Gentoo, kernel.org, and others). New maintainer took over in 2004 after the original author, jw, passed away at age 42.

rlbackup - http://www.math.ualberta.ca/imaging/rlbackup/

actively maintained but not well-known or publicized. only popular among math and science gurus? used by phy.bnl.gov on Debian Sarge but there's no .deb package yet

rsnapshot - http://www.rsnapshot.org/

looks nice but maintainer started looking to pass the torch last week. rsnapshot is mentioned in the book BSD Hacks as tip #39.

Others (non-rsync):

dar - http://dar.linux.free.fr/

actively maintained, great for backup to cd/dvd, once restored a 1.4 terabyte backup, can use openssl for encryption. It doesn't use rsync but seems pretty popular. kdar is available for kde fans.

hdup 1.6 - super-easy daily/weekly/monthly backups, uses non-proprietary format tar.gz/tar.bz2. This was replaced by hdup2 which I don't recommend due to loss of directory attributes (unless used with patched tar).

http://www.miek.nl/projects/hdup/hdup.shtml

arnie - http://furius.ca/arnie/

requires python 2.4 and doesn't yet support all attributes (uid, guid, ctime, mtime). pretty new project and very simple.

bacula - http://www.bacula.org/

network backup, full-featured and complex, well-documented and appears to be good for those with many machines to backup

amanda - http://www.amanda.org/

network backup to a single large tape, etc.

BackupPC is a high-performance, enterprise-grade system for backing up Linux and WinXX PCs and laptops to a server's disk. BackupPC is highly configurable and easy to install and maintain.

It can backup client PCs (Windows and Linux) using Samba, tar over ssh/rsh/nfs, or rsync.

Being a little fearful of rdiff-backup I have just been using the following script:

#!/bin/sh

rsync --verbose  --progress --stats --compress --rsh="/usr/bin/ssh -p 22" \
      -a --delete \
      --exclude "#*#" \
      --exclude "*~" \
      --exclude "/proc/*" \
      --exclude "/dev/*" \
      --exclude "/var/cache/*" \
      root@www.mylinode.com:/ /mnt/mylinode/backup/

Simple enough, and I then burn a copy of the local backup to DVD+RW (I have 5 DVDs that I rotate).
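
The DVD step can be scripted as well; growisofs from dvd+rw-tools will write a directory tree straight to a DVD+RW, along these lines (the device path is just an example):

growisofs -Z /dev/dvd -R -J /mnt/mylinode/backup/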

Mike, firstly great howto and thanks for the good work in general. I haven't implemented it on my linode yet but will be later this evening after I attend to a previous engagement. One thing though, I have never used rdiff-backup before and although I hope I never need to, how would I go about restoring said backup to my linode in the event of some kind of failure?

Once again thanks for all your efforts to make our linode experience as painless as possible.

Cheers.

@monarch:

Being a little fearful of rdiff-backup I have just been using the following script:

#!/bin/sh

rsync --verbose  --progress --stats --compress --rsh="/usr/bin/ssh -p 22" \
      -a --delete \
      --exclude "#*#" \
      --exclude "*~" \
      --exclude "/proc/*" \
      --exclude "/dev/*" \
      --exclude "/var/cache/*" \
      root@www.mylinode.com:/ /mnt/mylinode/backup/

Simple enough, and I then burn a copy of the local backup to DVD+RW (I have 5 DVDs that I rotate).

I do the same but also use -H to handle hard links and --numeric-ids to stop rsync changing the UID of files where the username exists on both machines.
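
With those two options added, the command above becomes something like:

rsync --verbose  --progress --stats --compress -H --numeric-ids \
      --rsh="/usr/bin/ssh -p 22" \
      -a --delete \
      --exclude "#*#" \
      --exclude "*~" \
      --exclude "/proc/*" \
      --exclude "/dev/*" \
      --exclude "/var/cache/*" \
      root@www.mylinode.com:/ /mnt/mylinode/backup/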

Burning these to DVDs for point-in-time backups is a good idea.

I am very keen to implement this kind of unattended backup procedure, but am having trouble with the SSH side of things. I am unfamiliar with the method of key-based ssh, so can anyone help me out with a link to a howto for setting it up?

When I use the scp command as per the above howto, it tells me that the local file can't be found, i.e. /root/.ssh/id_rsa.pub, and obviously doesn't copy the file. I'm not sure where I'm going wrong as this file does exist.

Any Pointers?

Cheers.

I have figured this out and feel quite foolish, I wasn't typing an absolute path when creating my key, but using one after that.

Oops :oops:

Cheers Guys.

Nice tutorial, much appreciated.

A few notes for people trying to pull this off using Cygwin and Windows:

There's a great Cygwin + rdiff-backup tutorial at:

http://katastrophos.net/andre/blog/?p=19

For the file ".ssh/config" on "Offsite" (aka your Windows machine), you will probably need to add a line that goes:

> user root

… under "hostname linode" if your Windows login is "Administrator" or anything other than "root."

Note: Unix files with illegal Windows chars -- the asterisk in my case ("*") -- will choke rdiff-backup and derail your whole backup.

@ryantate:

Note: Unix files with illegal Windows chars -- the asterisk in my case ("*") -- will choke rdiff-backup and derail your whole backup.

I had the same problem backing up Maildir files which use the ':' character. You can work around this problem in Cygwin by using a managed mount. This will provide name mangling that supports case-sensitive file names and special characters not allowed in Windows. Within Cygwin you can have a file named foo* under a managed mount; if you look at the actual file with Windows it will be named foo%2A.
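
Setting one up should just be a matter of remounting the backup destination with the managed option; assuming the Cygwin 1.5 mount syntax, it is something like this (the paths are only an example):

mount -o managed "C:/backups" /backups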

Roy

Thanks Roy. I ran Google searches for site:cygwin.com "managed mount" and cygwin "managed mount" but got nothing definitive in the first page of hits, so I'll probably look into that later when I have more energy. In the meantime I'll just hope I don't have any weird filenames!

I have to say that rdiff-backup is incredibly brittle. If there is a single fatal error during the course of your 3GB download -- an asterisk in a filename, a wireless connection hiccup (yes, I have to use wireless because of where my PC is located) -- rdiff-backup refuses to recognize ANY of the metadata it compiled on files already downloaded. You have to nuke the whole metadata directory and start over. It then has to hash all the downloaded files all over again and run laborious comparisons.

My hope is that once I finally get the Linode fully downloaded, the incremental backups will be small. In the meantime I'm hoping not to nuke my bandwidth allowance -- I'm on about the 10th download try, averaging at least 1 GB each.

@ryantate:

Thanks Roy. I ran Google searches for site:cygwin.com "managed mount" and cygwin "managed mount" but got nothing definitive in the first page of hits, so I'll probably look into that later when I have more energy. In the meantime I'll just hope I don't have any weird filenames!

http://cygwin.com/faq/faq.using.html#faq.using.case-sensitive

Thanks! Looks pretty durn cool. I'll try it for my future backups – one running now following a wireless issue.

Hi all. I have been using a script that I wrote myself for over 2 years and have had zero problems with it. It has never failed once. It has the following nice features:

  • Automatically rotates backups each night and keeps a fixed number that you specify (for example, 14 gives you 14 days worth of backups at any time, so you can go back 2 weeks if necessary)

  • Uses rsync's options for using hard links so that files that are unchanged are not duplicated on the hard drive but instead are hard linked. This means that each backup looks like a complete backup, but only really has changed files. You can look at any backup directory and see a complete filesystem. This makes restoring the backup easier and less error-prone. This is similar in spirit to the "incremental" backups that rdiff-backup is making, but it makes every backup "look" complete except that there is only 1 copy of each file, not one copy per backup (unless the file has changed, in which case each change is a new file in that backup).
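
The core of that trick is rsync's --link-dest option; stripped down to a single rotation step, it looks something like this (the paths are placeholders, and the full script appears further down in this thread):

# unchanged files are hard linked against yesterday's backup instead of copied again
rsync -a --delete --link-dest=/backups/host/backup.01/ \
      root@host:/ /backups/host/backup.02/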

I am attaching the posting that I made about this over 2 years ago in this thread: http://www.linode.com/forums/viewtopic.php?t=666

@mcowger:

Im all setup now….would love to see that rsync script…

Happy to oblige. A few caveats:

  • This isn't "productized" in any way. You have to know what you are doing and you may need to modify this to work the way you want.

  • This script is run on the system which is being backed up to, and it contacts the system which is being backed up.

  • In order to run this script unattended, you will need to set up keys so that the system which is doing the backup can ssh into the system which is being backed up, and as root no less. This could be a security concern because it means that having root on the backup machine is as good as having root on the machine being backed up.

  • I run this script on both systems, so that they back each other up. It is important to "exclude" the backups directory of one system from being backed up on the other, otherwise you will have "recursive" backups that will grow exponentially.

  • Usage is as follows:

rsync_backup.sh [remote_system] [remote_root] [local_root] [excludes_file]

[remote_system] is the hostname of the remote system to be backed up.

[remote_root] is the directory from the remote system to be backed up. I use '/' to back up everything.

[local_root] is the local directory under which the backups will be written. They are rotated so that if you are keeping 14 backups, after 14 days you will have backup.01, backup.02, …, backup.14, with backup.01 being the oldest.

[excludes_file] is a file on the local system which lists, with one file or directory per line, the files and directories from the remote system which should not be backed up.

You can set the following environment variable also:

NUM_BACKUPS is the number of backups to keep (I back up once per day and use 14 to have two weeks' worth of backups at all times).

This script uses hard linking to eliminate redundancy of files, so each incremental backup only actually uses disk space for those files which have changed since the last backup, although each backup.XX directory will look like a complete backup.

Here is what my crontab entry for doing the backups looks like:

# At 5:13 every morning, back up mitya.ischo.com
13 5 * * *      /data/backup/rsync_backup.sh mitya.ischo.com / /data/backup /data/backup/excludes.txt

And here is what the /data/backup/excludes.txt file looks like:

/proc
/data/backup/eva.ischo.com
/data/rsync/modules
/data/share
/data/tmp

Have fun!

NOTE: I'm having some problems with the way that this forum system formats the code lines; it's wrapping some even though it doesn't show them as wrapped in the "preview". It looks like it's only some of the comment lines that have a lot of dashes in them, though. Be careful when/if you copy the script text into a file to fix that before trying to run it.

#!/bin/bash

echo "rsync_backup.sh started at " `date` "."

# ------------------------------- Constants ----------------------------------

NUM_BACKUPS=${NUM_BACKUPS:-3}

# -------------------------------- Commands ----------------------------------

MKDIR=/bin/mkdir
MV=/bin/mv
RM=/bin/rm
RSYNC=/usr/bin/rsync
SEQ=/usr/bin/seq

# --------------------------------- Paths ------------------------------------

BACKUP_ROOT="$3"
BACKUP_SYSTEM=$1
EXCLUDES_FILE="$4"
EXCLUDES_ARG=""
REMOTE_ROOT="$2"
TMP_OUT=$BACKUP_ROOT/rsync.out
TMP_TOUCH=$BACKUP_ROOT/rsync.touch

# --------------------------- Check Requirements -----------------------------

if [ -z "$BACKUP_ROOT" ]; then
    echo "Usage: rsync_backup.sh [remote_system] [remote_root] [local_root] [excludes file]"
    exit -1;
fi

if [ -z "$BACKUP_SYSTEM" ]; then
    echo "Usage: rsync_backup.sh [remote_system] [remote_root] [local_root] [excludes file]"
    exit -2;
fi

if [ -z "$REMOTE_ROOT" ]; then
    echo "Usage: rsync_backup.sh [remote_system] [remote_root] [local_root] [excludes file]"
    exit -3;
fi

BACKUP_DIR="$BACKUP_ROOT/$BACKUP_SYSTEM"
echo "Back up local root: $BACKUP_DIR"

if [ \! -d "$BACKUP_DIR" ]; then
    echo "Making directory: $BACKUP_DIR"
    $MKDIR -p "$BACKUP_DIR"
fi

# Figure out which is the newest backup
NEWEST_EXISTING=0
for i in `$SEQ 1 $NUM_BACKUPS`; do
    if [ $i -lt 10 ]; then
        if [ -d "$BACKUP_DIR/backup.0$i" ]; then
            NEWEST_EXISTING=$i
        fi
    else
        if [ -d "$BACKUP_DIR/backup.$i" ]; then
            NEWEST_EXISTING=$i
        fi
    fi
done

if [ $NEWEST_EXISTING -gt 0 ]; then
    if [ $NEWEST_EXISTING -lt 10 ]; then
        BACKUP_PREV="$BACKUP_DIR/backup.0$NEWEST_EXISTING"
    else
        BACKUP_PREV="$BACKUP_DIR/backup.$NEWEST_EXISTING"
    fi
    echo "Newest backup: $BACKUP_PREV"
else
    echo "No previous backups"
fi

NEW_NUM=$[NEWEST_EXISTING + 1]

if [ $NEW_NUM -lt 10 ]; then
    BACKUP_DEST="$BACKUP_DIR/backup.0$NEW_NUM"
else
    BACKUP_DEST="$BACKUP_DIR/backup.$NEW_NUM"
fi

echo "New backup dir: $BACKUP_DEST"

# ------------ Compose rsync args ------------

# Copy all files recursively including all file attributes, verbosely,
# and use compression over the wire, also delete any deleted files
RSYNC_ARGS="-avz --delete"

# Use ssh to the remote system
RSYNC_ARGS="$RSYNC_ARGS -e ssh"

# Exclude file, if present
if [ -n "$EXCLUDES_FILE" ]; then
    RSYNC_ARGS="$RSYNC_ARGS --exclude-from=$EXCLUDES_FILE"
fi

# Link dest, if we already have a previous rsync
if [ $NEWEST_EXISTING -gt 0 ]; then
    RSYNC_ARGS="$RSYNC_ARGS --link-dest=$BACKUP_PREV/"
fi

# Source location
RSYNC_ARGS="$RSYNC_ARGS root@$BACKUP_SYSTEM:$REMOTE_ROOT/"

# Destination
RSYNC_ARGS="$RSYNC_ARGS $BACKUP_DEST/"

# Do the rsync
$RM -f $TMP_OUT
echo "rsync command: $RSYNC $RSYNC_ARGS"
$RSYNC $RSYNC_ARGS > $TMP_OUT

# Touch the rsync directory to set the backup time
touch "$BACKUP_DEST"

tail -2 $TMP_OUT

# --------------------------------- Rotate -----------------------------------

# This function is unfortunately necessary because Linux is messing with the
# modification times of the directories on mv
function correct_mv() {
    $RM -f $TMP_TOUCH;
    touch -r $1 $TMP_TOUCH;
    $MV $1 $2;
    touch -r $TMP_TOUCH $2;
    $RM -f $TMP_TOUCH;
}

if [ $NEW_NUM -gt $NUM_BACKUPS ]; then
    # Remove number 1
    $RM -rf "$BACKUP_DIR/backup.01"
    # Renumber the others
    for i in `seq 2 $NEW_NUM`; do
        if [ $i -lt 10 ]; then
            FROM=0$i
        else
            FROM=$i
        fi
        j=$[i - 1]
        if [ $j -lt 10 ]; then
            TO=0$j
        else
            TO=$j
        fi
        correct_mv "$BACKUP_DIR/backup.$FROM" "$BACKUP_DIR/backup.$TO"
    done
fi

echo "rsync_backup.sh finished at " `date` "."` [/i]

Is the above post still current?

Thanks in advance..

More recent instructions on the Linode wiki.

Funny that this topic got bumped, since I've got a question about it now!

I've used this method for a long time now to backup my Linode to a machine here at home, and it's worked pretty well for me. Unfortunately, that machine's hard drive decided to fail horribly the other day, so I was left to set that machine back up from scratch with a new drive. It originally ran a stripped down Ubuntu, but now it's straight Debian. Don't need a GUI since it runs headless these days.

I followed the instructions here again for setting up rdiff-backup like before, all of which was familiar, and it's working, but there's a problem: it's not obeying my excludes file. So, instead of the "include-globbing-filelist" option, I modified my exclude file (removing the minus signs) and tried using "exclude-globbing-filelist". But still it downloads directories like /proc and such anyway.

I've double-checked the paths and made sure it's looking for the include/exclude file in the right place, and that's all fine. I haven't tried putting them directly on the rdiff-backup command line, because that'd end up being kind of long. Are there any sort of known bugs or anything that could be causing this? If anyone has any suggestions, I'd be happy to hear'em!

My client has given me their old server, which is now sitting here back at my offices; the client wants me to use this machine as an offsite backup solution.

I've never done this before and I'm pretty clueless about stuff like FTP, so could someone point me towards a website with a description of how to set up something like this?

Is it just me, or are spammers like "Mckechnie" just getting dumber by the day?

@vonskippy:

Is it just me, or are spammers like "Mckechnie" just getting dumber by the day?
It's not just you - they really are getting dumber. Tragedy is - they were stupefyingly dumb to begin with.

Just a note to anyone attempting this on Ubuntu 8.04 with a source that uses a later version of rdiff-backup: rdiff-backup was throwing errors because of the different version numbers. After a lot of help from chesty on the Linode IRC, I was able to use an Ubuntu PPA to upgrade to the latest version of rdiff-backup. Hope this helps anyone who runs into this issue, and thanks to chesty for all the help. :D
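
A quick way to confirm you are hitting this is to compare the two ends; rdiff-backup will print its own version with:

offsite# rdiff-backup --version
linode# rdiff-backup --version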

[EDIT: minor bugfix for rsync return value]

I used to use an rsync script called RIBS to back up my Linode, but I recently built an OpenSolaris storage server for my home network, and decided I wanted something that would take advantage of ZFS's features such as on-the-fly compression and deduplication, and (especially) snapshotting.

I've written a Perl script that works in a manner similar to RIBS, which I have christened ZIBS. Instead of using hard links, it just uses ZFS snapshots to accomplish the same basic thing. The snapshots are rotated according to the configuration in the script. Only the "hourly" backups actually run rsync; others merely create and rotate snapshots. It can back up mounted NFS filesystems as well as using SSH to a remote location, and it can also just create and rotate snapshots on a local ZFS data set.

Since ZFS snapshots are copy-on-write, storage is very efficient; furthermore, you can turn on compression. If you have multiple systems using the same distro, you can save further space by using deduplication. Dedup and compression should be turned on before making backups, since they're in-line operations rather than after-the-fact.

This script could be used on FreeBSD (which has native ZFS support in 7.2 and 8.0) or on Linux with ZFS-FUSE, but you'll need Solaris or OpenSolaris if you want deduplication.

Usage example:

# Create ZFS datasets for your backups (assuming a ZFS pool called "tank", and ZIBS itself set up as below):
zfs create -o compression=gzip -o dedup=on tank/my_centos_site1
zfs create -o compression=gzip -o dedup=on tank/my_centos_site2
zfs create -o compression=gzip -o dedup=on tank/my_debian_site
# Run this four times daily (changeable in script)
zibs ALL hourly
# Run this once a week
zibs ALL weekly
# Run once a month
zibs ALL monthly
# Run once a year
zibs ALL annual
# Rename a snapshot to take it out of the rotation
zfs rename tank/my_centos_site1@hourly.0 specialsnap
# List snapshots in your ZIBS datasets
zibs ALL list
# Back up only one system
zibs my_debian_site hourly

The script:

#!/usr/bin/perl

# ZIBS: ZFS Incremental Backup Script
# A backup script loosely inspired by RIBS, but making use of
# ZFS snapshots (and written in Perl instead of PHP).

# by Russ Price

use strict;

# Location of SSH key for offsite backups.
my $ssh_key = '/path/to/ssh-backup-key';

# Configuration data. Master keys are backup sets. Each backup set gets
# its own configuration. The subkeys are as follows:

# source:   Where the backup comes from; either a filesystem path for local
#           backups (e.g. via NFS), or host:path for offsite backups via SSH.
#           Be sure to include trailing slash.
#           If source is not present, assume a local ZFS data set, and
#           make snapshots only.
# offsite:  Flag to indicate offsite backup via SSH.
# dataset:  The destination ZFS dataset for the backups. This is not
#           an absolute path.
# sched:    The number of snapshots to keep for each schedule type. The
#           "hourly" entry is mandatory.
# excludes: An array of paths to exclude from the backup.

my %conf = (
        # Local file storage
        u => {
                dataset => 'tank/u',
                sched => {
                        hourly => 4,
                        daily => 7,
                        weekly => 4,
                        monthly => 12,
                        annual => 10
                }
        },
        # Mounted NFS filesystem from another server
        server2 => {
                source => '/server2/www',
                offsite => 0,
                dataset => 'tank/server2-www',
                sched => {
                        hourly => 4,
                        daily => 7,
                        weekly => 4,
                        monthly => 12,
                        annual => 10
                }
        },
        my_centos_site1 => {
                source => 'site1.example.com:/',
                offsite => 1,
                dataset => 'tank/my_centos_site1',
                sched => {
                        hourly => 4,
                        daily => 7,
                        weekly => 4,
                        monthly => 12,
                        annual => 10
                },
                excludes => [
                        '/proc/**',
                        '/dev/pts/**',
                        '/dev/shm/**',
                        '/aquota.group',
                        '/aquota.user',
                        '/etc/mtab',
                        '/var/spool/mqueue/**',
                        '/var/mail/*',
                        '/var/named/chroot/proc/**',
                        '/var/named/chroot/dev/**',
                        '/sys/**'
                ]
        },
        my_centos_site2 => {
                source  => 'site2.example.com:/',
                offsite => 1,
                dataset => 'tank/my_centos_site2',
                sched   => {
                        hourly => 4,
                        daily => 7,
                        weekly => 4,
                        monthly => 12,
                        annual => 10
                },
                excludes => [
                        '/proc/**',
                        '/dev/pts/**',
                        '/dev/shm/**',
                        '/aquota.group',
                        '/aquota.user',
                        '/etc/mtab',
                        '/var/spool/mqueue/**',
                        '/var/mail/*',
                        '/var/named/chroot/proc/**',
                        '/var/named/chroot/dev/**',
                        '/old/**',
                        '/sys/**'
                ]
        },
        my_debian_site => {
                source  => 'site3.example.com:/',
                offsite => 1,
                dataset => 'tank/my_debian_site',
                sched   => {
                        hourly => 4,
                        daily => 7,
                        weekly => 4,
                        monthly => 12,
                        annual => 10
                },
                excludes => [
                        '/proc/**',
                        '/dev/pts/**',
                        '/dev/shm/**',
                        '/aquota.group',
                        '/aquota.user',
                        '/etc/mtab',
                        '/var/spool/mqueue/**',
                        '/sys/**'
                ]
        }
);

sub get_snaps($$) {
        my ($dataset, $type) = @_;

        # Get list of relevant snapshots
        open SNAPS, "zfs list -H -t snapshot | grep '$dataset' |";
        my @snaps;
        while(<SNAPS>) {
                my $s;
                if($type) {
                        ($s) = (m/^(\S+\@\Q$type\E\.\d+)\t.*$/o);
                } else {
                        ($s) = (m/^(\S+\@\S+)\t.*$/o);
                }
                @snaps = (@snaps, $s) if($s);
        }
        close SNAPS;

        return @snaps;
}

sub do_backup($$) {
        my ($set, $type) = @_;
        my %info = %{$conf{$set}};
        my $dataset = $info{dataset};
        my $source = $info{source};
        my $offsite = $info{offsite};
        my %h = %{$info{sched}};
        my $schedmax = $h{$type};
        print "\nStarting $type backup of $set (max $schedmax)\n";

        my @snaps = get_snaps($dataset, $type);

        # Only use rsync if we have a source.
        if($source) {
                # Set up excludes
                my @excludes = @{$info{excludes}};
                my $exclude_args = '';
                foreach my $e (@excludes) {
                        $exclude_args .= "--exclude=\"$e\" ";
                }

                # Use SSH key and compression if offsite.
                my $offsite_args = '';
                if($offsite) {
                        $offsite_args = "-z -e \"ssh -i $ssh_key -p 22\"";
                }

                # If it's hourly, do an rsync
                if($type eq 'hourly') {
                        my $result = system "rsync -artplv --numeric-ids --delete --delete-excluded --stats $offsite_args $exclude_args $source /$dataset";
                        # Get actual return value
                        $result >>= 8;
                        if ($result) {
                                if($result == 24) {
                                        print "WARNING: File(s) vanished before they could be transferred.\n";
                                } else {
                                        print "WARNING: rsync returned error code $result - NOT rotating snapshots\n";
                                        return;
                                }
                        }
                }
        }

        # Rotate the snaps, destroying those beyond the limit
        foreach my $r (reverse sort @snaps) {
                my ($snapname, $snapnum) = ($r =~ m/^(\S+\@\Q$type\E)\.(\d+)$/o);
                if ($snapnum >= $schedmax - 1) {
                        system "zfs destroy $r";
                } else {
                        system "zfs rename $r $snapname." . ($snapnum + 1);
                }
        }

        # Create latest snapshot for given type
        print "Creating snapshot $dataset\@$type.0\n";
        system "zfs snapshot $dataset\@$type.0";
}

sub do_list($) {
        my ($system) = @_;
        my %info = %{$conf{$system}};
        my $dataset = $info{dataset};
        my @snaps = get_snaps($dataset, 0);

        foreach my $snap(@snaps) {
                open PROPS, "zfs get -H creation $snap |";
                while(<PROPS>) {
                        my ($creation) = (m/^\S+\tcreation\t(.*)\t-$/o);
                        printf("%-55s %s\n", $snap, $creation) if($creation);
                }
                close PROPS;
        }
}

if(scalar @ARGV != 2) {
        print "Usage: $0 system|ALL hourly|daily|weekly|monthly|annual|list\n\n";
        print "Systems defined:\n";
        foreach my $system(keys %conf) {
                print "$system\n";
        }
        exit(1);
}

print "ZIBS: ZFS Incremental Backup Script\n";

if($ARGV[0] ne 'ALL') {
        if($ARGV[1] eq 'list') {
                do_list($ARGV[0]);
        } else {
                print "Performing " . $ARGV[1] . " backup for " . $ARGV[0] . "\n";
                do_backup($ARGV[0], $ARGV[1]);
        }
} else {
        print "Performing " . $ARGV[1] . " backup for " . $ARGV[0] . "\n" if($ARGV[1] ne 'list');
        foreach my $system (keys %conf) {
                if($ARGV[1] eq 'list') {
                        do_list($system);
                } else {
                        do_backup($system, $ARGV[1]);
                }
        }
}
