Shared disk / synced disk between Linodes

My webapp runs on a cluster of two Apache servers, on different Linodes, behind a load balancer.

When a file is uploaded to one server, I'd like it to magically also be on the other server (for example when a user updates their profile picture).

How do I do that? NFS? Hadoop? GlusterFS? At the moment the web server shells out to rsync, which is nasty nasty nasty.

Thanks in advance.

14 Replies

The same question was asked on ServerFault, and the consensus answer was to use NFS:

http://serverfault.com/questions/14577/what-network-file-sharing-protocol-has-the-best-performance-and-reliability

@graham:

> When a file is uploaded to one server, I'd like it to magically also be on the other server (for example when a user updates their profile picture).

If you're willing to accept some small latency, I like Unison for this: it keeps two filesystem trees in sync, even bi-directionally. I typically set up a periodic cron job (every 1-5 minutes, depending on requirements) that automatically syncs the two locations.

If uploads will always go to one server, you can use Unison in a purely mirrored approach, but if they might get updated on either server depending on which one handled the web request, Unison will figure out which needs to be updated.

You can certainly also trigger Unison from the web server so it only happens following a known update or to cut down on latency during an update.

It'll work fine directly over two filesystems, if you have the remote system mounted via NFS, for example, but can also run just fine over SSH.
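For anyone wanting to trigger Unison from a cron job or from the web application, here is a minimal Python sketch of a non-interactive run over SSH. The host name and paths in the usage note are made up for illustration, and the choice of `-prefer newer` for conflict handling is an assumption, not something the thread settled on.

```python
import subprocess

def unison_command(local_root, remote_host, remote_root):
    """Build an argv list for a non-interactive, two-way Unison run over SSH."""
    # Unison's SSH roots look like ssh://host//abs/path: an absolute remote
    # path needs a second slash after the host name.
    remote = f"ssh://{remote_host}/{remote_root}"
    return [
        "unison",
        local_root,
        remote,
        "-batch",            # ask no questions, so it is safe to run unattended
        "-times",            # propagate modification times
        "-prefer", "newer",  # assumption: on conflict, keep the newer copy
    ]

def sync_once(local_root, remote_host, remote_root):
    """Run one sync pass and return Unison's exit status (0 means all propagated)."""
    return subprocess.run(unison_command(local_root, remote_host, remote_root)).returncode
```

A cron entry or an upload handler could then call something like `sync_once("/var/www/uploads", "web2.example.com", "/var/www/uploads")`, where both the host and the path are placeholders.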

– David

Unison is rsync-like (and in fact uses the rsync algorithm). The OP mentioned that he's not pleased with rsync as a means to synchronizing the two environments, presumably due to the delay between syncs and the overhead of having to scan everything each time.

I'd recommend Dropbox (LAN syncing is supported in the experimental builds), but a lot of people don't like using it for business purposes since the data is stored by a third party (Dropbox, on Amazon S3).

My biggest concern with Dropbox (which I love for my personal use) on a server is that they push updates rather than notify you of them. They have been good about not pushing duds so far, since they wait quite a while before rolling updates out, but on a production server, having them push out something that you haven't tested yourself before committing to it seems all kinds of concerning.

That and their product seems to be getting bloaty-er and slower… IMHO.

Still a great service if indeed you need to sync 25+ Gigs of data to multiple locations.

@Guspaz:

> Unison is rsync-like (and in fact uses the rsync algorithm). The OP mentioned that he's not pleased with rsync as a means to synchronizing the two environments, presumably due to the delay between syncs and the overhead of having to scan everything each time.

Actually, it's not clear what was "nasty nasty nasty" - I assumed it was the need to shell out the rsync command from within the web server environment. I'd be surprised if it was the rsync algorithm itself, which is well suited to synchronization duties, and to be honest, something along its lines should be desired in any synchronization solution if only to cut down on bandwidth.

Yes, while Unison uses an implementation of the rsync protocol as part of its transfer methodology, it's not really fair to equate the overall applications. For one big difference, unison is bi-directional, while rsync (the utility) is uni-directional. Also, Unison can recognize file moves, and not send any file contents or deltas at all in that case. It also keeps a cached state, so runs across even very large filesystems with minimal changes can be very efficient.

– David

I cast my vote for Unison. I've used it in production to successfully keep (as of today) 5 servers in sync for a high availability setup on Linode.

I have a pretty little graphic of our general setup that you can take a look at:

http://bit.ly/buqO8

Guspaz, you're right: by 'nasty nasty nasty' I meant the delay of shelling out to rsync, and scanning hundreds of images when only a single one has changed. I would like an uploaded file to be visible from the other server almost immediately.

rsync is great for eventual consistency, especially backups, whereas I would like near-realtime updates, under a second. The uploaded image needs to get to the media server before the next request comes in from the same user.

I will investigate unison, thanks for the recommendation.

On a Linode or other VPS I would suggest NFS.

On dedicated hardware I'd set up DRBD if it's a simple pair.

Otherwise it'd be NFS.

The only problem with two servers is that if one dies, the other loses the data, so it's not really replicated data, merely copied.

For replicating the data in real time, the only thing I can think of is DRBD, which still should not be too intensive, as it'll only be as intensive as the actual writes.

If you need that kind of speed, you can write off rsync, Unison, anything that scans all the files each time.

Dropbox is far faster, since it notices files as soon as they change and syncs them, but there's added latency while one machine syncs to Dropbox's service, and then back down to the other machine. Sync times between machines when a file changes are probably 1-5 seconds (on a reasonably sized file), but that seems too slow for your desires.

Looks like you're going to have to go with some sort of filesystem-based solution.

DRBD isn't what you want, and is unusable for you. It's network RAID-1, which is great for reliability, but only one machine can mount the virtual disk at any given time. So you can't use it to share data.

One of the concerns about various networked or replicated file systems seems to be that if one machine is down, then the other machine may not be able to access the file system. However, if your setup has the client machine relying on the server machine anyhow, that may not be relevant; the client couldn't do anything anyhow.

Maybe a mixed approach would work well - use a fairly rapid, periodic sync like Unison to ensure that the two systems stay in sync, but then have critical files (like avatar images) attempt to be written to both servers by whichever server receives it in real time. The latter could be handled by an NFS mount, inter-server communication, or whatever, and could be willing to fail if it doesn't happen quickly, knowing the periodic sync will catch the change subsequently.

It's certainly not realtime, but just to add a data point, I can run Unison to sync two trees on machines at different physical locations (connected by an OpenVPN tunnel, which Unison runs ssh over) with sub-second run times when there are no changes. This is for a filesystem tree with about 6000 files and 8-9GB of data. I run Unison via a cron job that checks for an existing copy running, so you can run it at a small time interval without worrying about the occasional longer runtime needed when some big files change.
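The "checks for an existing copy running" part is not spelled out above, so the following is only one way it might be done: a small Python sketch that uses an advisory file lock (Unix only) so an overlapping cron invocation exits instead of starting a second sync. The function names and the lock path are hypothetical.

```python
import fcntl

def try_lock(lock_path):
    """Return an open file holding an exclusive lock, or None if another run holds it."""
    handle = open(lock_path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the holder.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle

def run_exclusively(lock_path, job):
    """Run job() only if no other run holds the lock; return True if it ran."""
    handle = try_lock(lock_path)
    if handle is None:
        return False
    try:
        job()
        return True
    finally:
        handle.close()  # closing the file releases the lock
```

With this in place the cron interval can be short, since a run that finds the lock taken (for example at a hypothetical `/tmp/unison-sync.lock`) returns immediately, and a crashed run cannot leave a stale lock behind because the kernel drops the lock when the process exits.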

Mixing the two approaches would avoid any single point of failure dependency (such as with a common central filesystem), and provide a safety net if the real-time update got lost for some reason (temporary disconnection between servers), so that you'd know the two systems would eventually come back in sync once re-connected.
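As a rough illustration of the mixed approach, here is a Python sketch in which the receiving server saves the upload locally and then makes a best-effort copy into the peer's directory. It assumes `mirror_dir` is the other server's upload directory mounted locally (for example over NFS); `save_upload` and its parameters are invented for this example. It does not implement the "fail if it doesn't happen quickly" part: a hung mount could still block, so a real version would need a timeout.

```python
import os
import shutil

def save_upload(data, name, local_dir, mirror_dir):
    """Write an upload locally, then try to mirror it; return True if mirrored."""
    local_path = os.path.join(local_dir, name)
    with open(local_path, "wb") as f:
        f.write(data)
    try:
        # Copy under a temporary name, then rename into place so the peer's
        # web server never serves a half-written file.
        tmp_path = os.path.join(mirror_dir, name + ".tmp")
        shutil.copyfile(local_path, tmp_path)
        os.replace(tmp_path, os.path.join(mirror_dir, name))
        return True
    except OSError:
        # Peer unreachable or mount missing: the local copy is still saved,
        # and the periodic sync is expected to catch up later.
        return False
```

The return value only reports whether the immediate mirror succeeded; the upload itself is kept either way, which is what lets the periodic sync act as the safety net.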

– David

@bezerker:

> On dedicated hardware I'd set up DRBD if it's a simple pair.

DRBD replicates disks at the block level. Just be aware that this means one system MUST be read-only unless you use a cluster-aware file system. So DRBD isn't a complete solution for what you want to do (write from both nodes). Commonly used cluster file systems are GlusterFS, GFS, and OCFS. NFS is NOT a cluster-aware file system.

Instead of using rsync to scan and sync an entire directory tree, you could use inotifywait in a shell script and just sync the file that's changed.
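The suggestion is a shell script; the same idea can be sketched in Python by reading `inotifywait`'s output and pushing only the file that changed. This assumes inotify-tools and rsync are installed, that SSH keys are already set up, and that `remote_dest` is an rsync destination such as a hypothetical `web2:/var/www/uploads`. The function names are made up for the example.

```python
import os
import subprocess

def rsync_command(changed_path, watch_root, remote_dest):
    """Build an rsync argv that sends one file, keeping its path relative to watch_root."""
    rel = os.path.relpath(changed_path, watch_root)
    # With -R (--relative), everything after the "/./" marker is recreated on
    # the remote side, so new subdirectories are created as needed.
    source = f"{watch_root.rstrip('/')}/./{rel}"
    return ["rsync", "-aR", source, remote_dest.rstrip("/") + "/"]

def watch_and_sync(watch_root, remote_dest):
    """Stream finished-write events from inotifywait and push each file as it lands."""
    watcher = subprocess.Popen(
        ["inotifywait", "-m", "-r", "-q",
         "-e", "close_write", "-e", "moved_to",
         "--format", "%w%f", watch_root],
        stdout=subprocess.PIPE, text=True)
    for line in watcher.stdout:
        # One rsync per event; a failed transfer is not retried in this sketch.
        subprocess.run(rsync_command(line.rstrip("\n"), watch_root, remote_dest))
```

Watching `close_write` rather than `modify` means a file is sent once, after the writer has finished with it. Deletions are not propagated here, so a periodic full sync would still be needed to tidy up.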

@Vance:

> Instead of using rsync to scan and sync an entire directory tree, you could use inotifywait in a shell script and just sync the file that's changed.

Or incron

I ended up setting up NFS, using these instructions:

http://ubuntuforums.org/showthread.php?t=249889

It's working smoothly so far.

Additionally, a nightly cron job on each NFS client copies the contents of the NFS share to local disk, as a backup.
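The exact backup command isn't given, so this is only a guess at the shape of such a job: a short Python sketch (3.8 or later, for `dirs_exist_ok`) that copies the mounted share into a local directory. `backup_share` and both paths are hypothetical.

```python
import shutil

def backup_share(share_dir, backup_dir):
    """Copy the whole share into backup_dir, overwriting files that already exist there."""
    # dirs_exist_ok lets the nightly run reuse the same backup directory.
    shutil.copytree(share_dir, backup_dir, dirs_exist_ok=True)
```

Note that this never deletes anything from the backup, so files removed from the share stay in the local copy.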

Thanks again for all the help and advice.
