Creating an offsite binary replica

Hi guys

Recently some of you guys on ThePlanet had some power issues which may have caused loss of data or corruption of the file system.

While the above may be unavoidable, the following is a measure anyone can take to guard against total data loss.

Instead of just creating normal backups (with tar cvpfsSW), this time around, one will back up the actual partitions bit for bit.

So now you'll ask: Why would one ever want to do this?

The reason for this is simple. First, if anything happens to your file system and fsck'ing does not work, you can always dump the most recent binary image of your partition right back onto the partition. Additionally, if you are in a situation where intricate fsck'ing is needed, you can always learn from your binary dump.

You may also ask: How does this work?! Partitions aren't files one can copy!

Oh, they are. Partitions very much are super large files that you can copy. There is a single UNIX rule: "treat everything as a file". That sounds absurd, but it really isn't. You just have to realize that files are not always created to hold data.

In this case, partitions can be accessed via block device files such as /dev/ubda2 (for this example).
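To see this for yourself, here is a minimal sketch (GNU coreutils assumed; describe_path and first_bytes are made-up helper names, not standard commands). It reports what kind of file a path is and hex-dumps its first bytes, the same way for a partition as for an ordinary file.

```shell
# Hypothetical helpers for illustration; not part of any standard tool.

# Report what kind of file a path refers to.
describe_path() {
    if [ -b "$1" ]; then
        echo "block device"
    elif [ -c "$1" ]; then
        echo "character device"
    elif [ -f "$1" ]; then
        echo "regular file"
    else
        echo "other"
    fi
}

# Hex-dump the first N bytes of a path, whether it is a device or not.
first_bytes() {
    head -c "$2" "$1" | od -An -tx1
}
```

On a Linode, `describe_path /dev/ubda2` should print "block device", and `first_bytes /dev/ubda2 16` (run as root) reads the first 16 bytes of the partition.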

NOTE #1: You might be aware of dump/restore/snap, which should also work just fine (but are not covered here).

NOTE #2: While doing an offsite dump, bandwidth is usually the bottleneck. However, if the bandwidth is fast enough, disk IO can easily become the bottleneck instead. IO bottlenecks affect an entire server, so please be considerate of your fellow linoders.

1 - Stop any non-critical services

bash# /etc/init.d/foobar stop

2a - Unmount the partition

bash# umount /dev/ubda2

OR

2b - Remount as readonly

bash# mount /whatever -o remount,ro

3 - Begin the actual backup

bash# dd if=/dev/ubda2 | ssh user@host "dd of=ubda2.img"

4 - Depending on the size, grab some hot cocoa

5 - Compare the MD5 hashes to ensure integrity

bash# md5sum /dev/ubda2

bash# ssh user@host "md5sum ubda2.img"
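Steps 3 and 5 can be tried out safely on one machine before pointing them at a real partition. The sketch below assumes GNU dd and md5sum; image_and_verify is a made-up helper name, and it writes to a local file instead of going over ssh.

```shell
# Hypothetical helper: copy a source (block device or plain file) to an
# image file, then confirm both have the same MD5 hash.
image_and_verify() {
    local src=$1 img=$2
    local src_sum img_sum

    # Bit-for-bit copy; bs=1M only means fewer, larger reads and writes.
    dd if="$src" of="$img" bs=1M 2>/dev/null || return 1

    # Hash both sides. The source must not change in between, which is
    # why the partition is unmounted or read-only in step 2.
    src_sum=$(md5sum < "$src" | cut -d' ' -f1)
    img_sum=$(md5sum < "$img" | cut -d' ' -f1)

    if [ "$src_sum" = "$img_sum" ]; then
        echo "OK $src_sum"
    else
        echo "MISMATCH $src_sum $img_sum" >&2
        return 1
    fi
}
```

For the offsite case, the dd line becomes the pipe from step 3, and the second hash is taken on the remote host as in step 5.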

HTH

Bill Clinton

13 Replies

Hi Bill,

That's a nice post. This will help many to have backup of their newly installed linodes.

Listing some of the cons/side effects to expect when making this backup and when restoring it would make it even nicer…

Strike

I use rsync http://samba.anu.edu.au/rsync/features.html to back up critical directories.

The advantage of rsync is that it only transmits the incremental changes since the last backup across the network, so it is very fast.

AND it was developed in the computer science department where I studied.

@gmt:

The advantage of rsync is that it only transmits the incremental changes since the last backup across the network, so it is very fast.

You fail to realize that the backup I have described works at a much lower level and has different aims.

The average backup job only aims to create backups of the files and whatnot. Realistically these types of backups are the most helpful, the easiest to make, and more efficient when needing to stay up-to-date.

However when there is file system corruption and other random voodoo, your backup becomes useless until the file system itself is fixed.

My backup creates an actual binary image of the partition itself. This way, when fsck is really giving you problems one day, you can always restore the file system to some given point, which might be a whole lot better than digging in /lost+found.
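Restoring is the same copy run in the other direction. A minimal sketch, assuming GNU dd; restore_image is a made-up helper name, and it overwrites whatever is on the target, so the partition must be unmounted first.

```shell
# Hypothetical helper: write an image back over a target (a block device,
# or a plain file when testing). This OVERWRITES the target.
restore_image() {
    local img=$1 target=$2

    if [ ! -r "$img" ]; then
        echo "cannot read image: $img" >&2
        return 1
    fi

    # conv=fsync makes dd flush the data to the target before it exits.
    dd if="$img" of="$target" bs=1M conv=fsync 2>/dev/null
}
```

With the image sitting on another host, the equivalent would be something like `ssh user@host "dd if=ubda2.img" | dd of=/dev/ubda2`.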

The ultimate solution would be to COMBINE both backups: monthly binary dumps of the file systems, plus regular incremental backups of the userland files. This way, for minor issues, the user could always use his tar/rsync/etc backup. However, if major file system corruption were to occur, the user could restore the file system from the binary replica and then use the tar/rsync/etc backup to bring it up to date.

I hope this clears things up.

Bill Clinton

Interestingly enough, wouldn't rsync work on the drive level? It can be viewed as a single large file after all.

A nice idea to consider is to use two linodes in different locations as hot backups (or even have both of them running at the same time).

What I'm thinking of is two hosts kept in sync at all times, using rsync run from a cron job or whenever some config/data changes. Both servers would run e-mail, DNS and web servers: the web servers can be round-robin DNS'ed, the DNS servers both listed at the registry, and the e-mail servers acting as backup mail servers for each other.

Obviously, though, it ramps up the cost from $20/mo to $40/mo for a Linode64. If one is on a high-end linode, a low-end one can serve as the backup for those unfortunate times.

rsync itself won't work, at least that's what a short Google search says.

There is always the option to code up something with librsync but that is more than just applying an existing tool.

> You fail to realize that the backup I have described works at a much lower level and has different aims.
>
> The average backup job only aims to create backups of the files and whatnot. Realistically these types of backups are the most helpful, the easiest to make, and more efficient when needing to stay up-to-date.

Hardly, I've been using Unix and its variants for 20 years.

I have very busy sites and large databases (over 2GB) and can't afford to umount file systems. I use my own database hot backups & rsync to ensure 24x7 operations.

@gmt:

Hardly, I've been using Unix and its variants for 20 years.

Then I cannot see why you chose to compare your backup solution with mine, when clearly they work on different levels of the file system and the reasoning behind each differs greatly as well.

It is like comparing tar to dump/restore, only this time with rsync and dd.

Bill Clinton

If you do make a drive image while your system is live (shutting down non-critical services aside), make sure you have a second type of backup as well. The method below will result in a non-consistent filesystem in your backup. It is likely that fsck will be able to repair it if you have to restore it, but you'd be a fool to rely on this method only.

There are also other disadvantages, such as only being able to restore to the same sized block devices, and not being able to see what's in there or restore an individual file (short of mounting it up on a loopback device).
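The size restriction is easy to check before a restore. A rough sketch, assuming GNU stat and util-linux blockdev; size_of and fits_on are made-up helper names.

```shell
# Hypothetical helpers for illustration.

# Print the size in bytes of a block device or a regular file.
size_of() {
    if [ -b "$1" ]; then
        blockdev --getsize64 "$1"
    else
        stat -c %s "$1"
    fi
}

# Succeed only if the image is no larger than the target.
fits_on() {
    local img=$1 target=$2
    [ "$(size_of "$img")" -le "$(size_of "$target")" ]
}
```

To pull a single file out of an image, it can be mounted read-only with something like `mount -o loop,ro ubda2.img /mnt/image` (the mount point is just an example).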

rdiff-backup, rsync, or even tar or dump are far better solutions. In case of a disaster, all you need to do is load up a small Linux dist on a separate block device (the 80 meg Debian dist would do you nicely), and restore across the network from your backups. You'd have to do this anyway to restore your drive images.

Step 5 will also likely fail due to your drive image being inconsistent with what is on the drive when you take your md5sum. If it does succeed it's only because you are very lucky. Your choice of wording in step 5 is interesting to say the least, as your backup method is almost the opposite definition of "integrity".

Please read my posts before replying.

@dmuench:

The method below will result in a non-consistent filesystem in your backup. It is likely that fsck will be able to repair it if you have to restore it, but you'd be a fool to rely on this method only.

I have already stated in my second post that higher level backup solutions like rsync, tar, etc are the preferred solution.

I will quote myself:

@Bill Clinton:

The average backup job only aims to create backups of the files and whatnot. Realistically these types of backups are the most helpful, the easiest to make, and more efficient when needing to stay up-to-date.

Additionally, I have stated that this is not intended for normal backup purposes, as it is a big hog of bandwidth and IO. It is solely a safeguard against bizarre file system problems where fsck'ing doesn't seem to help. (And it provides a great source of information about your file system when expert fsck'ing is needed.)

@dmuench:

Step 5 will also likely fail due to your drive image being inconsistent with what is on the drive when you take your md5sum. If it does succeed it's only because you are very lucky. Your choice of wording in step 5 is interesting to say the least, as your backup method is almost the opposite definition of "integrity".

You are wrong. Step 5 is done when the partition is either unmounted or in a read only state right after the time of the backup. Therefore none of the information has changed (duh)

The following is an example with my main machine. I am doing this on /boot (23 megs, JFS formatted.)

[root@atticus root]# mount /boot -o remount,ro
[root@atticus root]# dd if=/dev/hda2 | ssh sunny@linode "dd of=hda2.img"
sunny@linode's password:
60480+0 records in
60480+0 records out
60480+0 records in
60480+0 records out
[root@atticus root]# md5sum /dev/hda2
2eafd49b0109f6bded6358d8cf695065  /dev/hda2
[root@atticus root]# ssh sunny@linode "md5sum hda2.img"
sunny@linode's password:
2eafd49b0109f6bded6358d8cf695065  hda2.img

Bill Clinton

You are right, I did miss the unmount step, and that would make the MD5's match and the backup consistent.

However, it is still an almost useless, inefficient solution.

1. You have to have the filesystems unmounted or read only, which excludes root unless you temporarily boot off of something else. (I'm not going to switch my root to ro on the fly when the machine is in use)

2. You have to have the machine in a near idle state, or completely down if you want to back up root (again, by booting off of something else).

3. It's a waste of space and bandwidth.

4. It's no easier to restore than a proper file based backup.

5. It's much more work and impacts the machine vs. a proper file based backup.

Again, look at rdiff-backup. This is a forum for sharing tips with newer Linux users, and they don't need dumb "tips" leading them astray from proper solutions. Especially when it concerns data integrity.

Why must you repeat the same arguments over and over again, especially when I have done my best to answer them line-for-line?

This is getting highly pointless and repetitive. Especially on your part.

@dmuench:

1. You have to have the filesystems unmounted or read only, which excludes root unless you temporarily boot off of something else. (I'm not going to switch my root to ro on the fly when the machine is in use)

2. You have to have the machine in a near idle state, or completely down if you want to back up root (again, by booting off of something else).

You should learn about single-user mode. Uptime is overrated.

@dmuench:

3. It's a waste of space and bandwidth.

I have already stated this in my first and fourth posts.

@dmuench:

4. It's no easier to restore than a proper file based backup.

It is pointless to make an argument that applies equally to either type of backup solution. Such an argument is not getting anyone anywhere.

@dmuench:

Again, look at rdiff-backup. This is a forum for sharing tips with newer Linux users, and they don't need dumb "tips" leading them astray from proper solutions. Especially when it concerns data integrity.

You are wrong.

First, higher level backups do not give me juicy information about my file system.

Secondly, there are countless system administrators who use dump/restore to do the exact same thing. Such a binary backup via dump/restore/dd/cat/etc is required knowledge for anyone and everyone. (Additionally, if we lived in a world where certifications mattered, you should notice that Red Hat's RHCE, SAIR, and the LPI all demand knowledge of binary dumping.)

Thirdly, I have already stated that this type of backup is to be made once, and that a higher level type of backup (tar/rsync/whatever) is the preferred backup solution.

Lastly, I refuse to answer to any post of yours until you have something new and useful to say.

Bill Clinton

You're right sunny, I don't know why I even replied in the first place. You've proven in the past that you really don't know what you're talking about and aren't willing to learn from more experienced peoples' advice.

Just the fact that you think dd and dump/restore are similar shows your ignorance. I'm done trying to help people who are a lost cause.

I get the last word.

Thread locked.

-Chris
