One biggy or multiple smallies
Just got my Linode set up recently with Debian 4, Apache, MySQL, and Ruby on Rails with Mongrel, and I'm very happy with everything.
I'm already thinking about the future, and wondering whether I should get another Linode and clone my existing one before I get too far down the line with hosting websites for people, or whether I should just beef up the Linode I have if and when that's required.
One thing I'm considering: I already run a Subversion server on my Linode and store all the code for my projects there. I could set up a separate Linode as a test server, then set up Capistrano to deploy to my 'production' server when everything is ready, or just do it all on the one Linode and expand as required.
I quite like the notion of separation, but I feel the test server with Subversion would get very little use and may not be worth it financially. On the other hand, it may be a safer way to work, with fewer potential problems.
Does anyone have an opinion on this? I know it all sounds a bit airy-fairy, but I'm just wondering if anyone has been through this scenario of late.
Thanks,
Paul
17 Replies
These days I never do anything on my production server, no matter how trivial it seems. I try all my changes on my development server first. I develop a lot faster, since nothing I do risks breaking anything. Linode doesn't charge bandwidth quota for transferring an image from one Linode account to another, but it takes longer than making a local copy, so I only do these host-to-host transfers when I mess something up and my latest backup isn't very recent.
Five years ago I worked at a company that did all its development live. We were four programmers, and downtime for our web service could be 30-60 minutes per day because we always managed to break /something/.
And if you have, let's say, a 600 Linode for $40/month, you still only need a 300 Linode for $20/month for your development environment, because you would be the only user on the development version of your site.
I made a Perl script that looks at the output of "ifconfig" to see what IP DHCP gave it. Once the script knows which IP it has, it can run "ifup eth0:3", for example, since it knows it has been booted from the development account (I have two IPs for each Linode). It can also swap in a different cron file that has "rss2email" disabled. It was quite annoying to get two emails every time one of my RSS feeds got updates: one from the production environment and one from the development one.
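For anyone who wants to do the same without Perl, here is a minimal shell sketch of the idea. The addresses and interface name are made-up placeholders, not from the original script:

```shell
#!/bin/sh
# Decide prod vs. dev at boot by looking at the DHCP-assigned address.
current_ip() {
    # Pull the first IPv4 address out of ifconfig-style output on stdin.
    grep -o 'inet addr:[0-9.]*' | head -n 1 | cut -d: -f2
}

# On a real box (DEV_IP is a placeholder for your development IP):
#   ip=$(ifconfig eth0 | current_ip)
#   [ "$ip" = "$DEV_IP" ] && ifup eth0:3
```

The same branch could also swap in the alternate cron file harmone mentions.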
The documentation says that one /must/ edit the cron file using the CLI command "crontab -e" and /not/ edit the actual crontab file by hand. I tried ignoring that instruction to see if it would work anyway, and it seems it does. So ignore this crontab instruction at your own risk.
So I would run all services on one Linode account and use the other only for development/testing stuff.
Oh, and don't forget to shut down your production server before copying it to your other Linode account. I tried ignoring that warning too, just to see what would happen, and got two rows of corrupted MySQL data in one of my MediaWiki tables. Since that day I always shut my production server down before making the copy. I always make a local copy first, before transferring it to my development account, so I can boot the production server back up as fast as possible. Also, don't allocate all your available disk space to one image if you don't need that much: a smaller image gets copied and transferred faster. You can always shut down your production server and resize the image later if you need more space. That is, if you use ext2 or ext3; in those cases the Dashboard can do it for you. With a different type of partition I don't think the Dashboard can resize it, and you'd have to do it yourself. But I may be wrong, since I only use ext3 partitions and haven't tried to ignore that particular warning yet.
On the prod box, however, would you go with one box and keep upgrading disk space, RAM, bandwidth, etc. until you can go no further and then get a new Linode, or go with multiple smaller Linodes?
i.e. three or four 300 or 450 Linodes, or just keep going until you get everything running on a 1200.
Do you feel there is any real difference or advantage to be gained from one or the other?
Thanks again for your replies,
Paul
The money may end up as beer tokens in the end but I hope to make more than just a beer or two
Cheers
OTOH, if your users are distributed over multiple Linodes, the amount of damage that any one of them can do is more limited.
Should one of my Linodes go down, or should there be any problem with my physical host, all my clients suffer. Splitting clients across two hosts, with fewer clients on each machine, seems a bit safer to me; it also means I could have the client config waiting to be enabled on another machine (a hot standby, effectively) should there be any problems.
I agree that two installs for two users is worse than two users on one install, but does this scale?
Luckily I am not even close to having to worry about that, but I wanted to think about it now to prepare myself, and to see if anyone has had any concrete scenarios where they had to move to another box.
Should I decide to go with the multiple-box scenario (hoping I get enough business to care), I will write back and let people know why I made that decision.
All the best.
@harmone:
The documentation says that one /must/ edit the cron file using the cli command "crontab -e" and /not/ edit the actual crontab file by hand. I tried to ignore that instruction to see if it would work anyway. And it seems it does. So ignore this crontab instruction at your own risk.
"crontab -e" isn't the only switch that crontab(1) supports. In fact, if you run "crontab -u username /path/to/file", it'll replace the crontab of that user with that file. If you want to use environment variables, try "crontab -u username - <
@Ciaran:
@harmone:The documentation says that one /must/ edit the cron file using the cli command "crontab -e" and /not/ edit the actual crontab file by hand. I tried to ignore that instruction to see if it would work anyway. And it seems it does. So ignore this crontab instruction at your own risk.
"crontab -e" isn't the only switch that crontab(1) supports. In fact, if you run "crontab -u username /path/to/file", it'll replace the crontab of that user with that file. If you want to use environment variables, try "crontab -u username - <
:)
Oh cool. I didn't know that. I'll change my script so it behaves canonically, just in case. I also didn't know about heredocs; I've used them in Perl but never knew it was a general concept rather than just a Perl feature. Here is a good article for everyone else who has never heard of heredocs:
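To make the connection back to Ciaran's tip, here is a minimal shell sketch of driving crontab from a heredoc. The username, schedule, and script name are placeholders for illustration:

```shell
#!/bin/sh
# Heredocs work in plain sh too, not just Perl: everything between <<EOF
# and the closing EOF is fed to the command's standard input.
make_crontab() {
    # Print a crontab body; on a real box you would pipe this into
    # `crontab -u username -` as Ciaran describes.
    cat <<EOF
# m h dom mon dow command
0 4 * * * /usr/local/bin/nightly-backup.sh
EOF
}

# Real usage (don't run blindly): make_crontab | crontab -u username -
```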
@macforum:
I agree that two installs for two users is worse than two users on one install, but does this scale? :) I would have thought Apache/MySQL would reach a point where, regardless of resources, performance on one machine would be worse than on two separate ones.
The answer to this is highly dependent on the actual load. For the extreme cases of bandwidth-limited or disk-I/O-limited loads, two machines will, obviously, be superior. If your load is memory-limited, I'd guess that a machine with twice as much usable memory would be superior to two machines. Of course, you can't increase memory arbitrarily, so eventually you'll need multiple machines anyway.
But the split may not be the obvious "half the people on one, half on the other". It's conceivable that splitting between a web server and an RDBMS server would be superior. But the whole thing really depends on what your users are doing.
Yes, splitting the DB away from the web server was always the case for me, and coming from a Java/J2EE background I have generally run clustered app servers on separate machines; but this just isn't as easy when you have to buy and administer your own servers.
I will continue with one big server and see how things go, although another concern is that I want to run Ruby on Rails apps with Mongrel, and the common approach seems to be running a Mongrel cluster per application; it will be interesting to see what kind of resources that takes up!
I just bought another Linode and placed it in a separate data center for backup and for testing things out (thanks for the replies about crontab too; I will set that up when I start my cloned server). If anyone has any advice on backup strategies, please shout.
It seems pretty related to this thread; I am interested in backups relating to one fat prod server and one test/dev/backup box.
I am looking at rdiff-backup just now for nightly backups; I am certainly not going to shut down my prod server daily to clone the image across.
Any advice would be appreciated.
All the best, Paul
A more interesting question is what to back up. I set up a /srv partition that has all my real data: web apps and data, images, and such. I back up that, plus /etc and /var/backup, which the Debian system updates with current package selections and such. Also, a script runs before the backup job to do a mysqldump, rather than trying to back up the live DB files. I don't back up the actual system files (/usr and such), and I don't keep any vital data in /home. So I can't do a "bare-metal" restore; I'd need to start with a basic Debian image, restore my package selections and install, restore /etc, and then /srv.
Oh, I also track /etc with Mercurial. I don't put everything in, but any time I change a config file, I add it. This isn't really a backup, but it does let me track the config changes I make, and it makes recovery less painful when I do something stupid.
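The pre-backup job described above might look something like this minimal sketch. The paths, the --all-databases choice, and the archive location are illustrative assumptions, not my actual setup:

```shell
#!/bin/sh
# Sketch of a nightly pre-backup job: dump the database to a file,
# then archive the directories worth keeping. Paths are placeholders.
set -eu

BACKUP_DIR="${BACKUP_DIR:-/tmp/nightly-backup}"
ARCHIVE="${ARCHIVE:-/tmp/nightly.tar.gz}"

mkdir -p "$BACKUP_DIR"

# Dump MySQL rather than copying its live files; skipped here when no
# client or server is available, so the sketch still runs elsewhere.
if command -v mysqldump >/dev/null 2>&1 && mysqladmin ping >/dev/null 2>&1; then
    mysqldump --all-databases > "$BACKUP_DIR/mysql.sql"
fi

# Archive the dump directory; on a real box add /etc and /srv too.
tar czf "$ARCHIVE" "$BACKUP_DIR"
```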
Well, I plan to have multiple domains on my server, with different clients, so I would need access to certain backed-up data to restore without restoring the whole server. rdiff-backup seemed quite nice for this, but I'm happy to consider other options, as I have no experience with any of these types of tools.
I was thinking of keeping a reasonably up-to-date clone of my prod server on my backup/dev machine, so I could restore that onto prod and then run rdiff-backup to bring the previous night's application data back to a reasonable state. That way I would only need to rdiff the app data.
However, if I could just do a base Debian install and then bring my previous evening's data back across with rdiff-backup (or whatever tool I choose), such that that data was sufficient to get my user accounts and everything else up and running again, that would be preferable: I would not have to take clones of prod, which means shutting down the server.
Main thing is: can you back up sufficient data to restore directly from an rdiff-backup (or whatever) after doing a straight Debian install?
Cheers,
Paul
@macforum:
Well, I plan to have multiple domains on my server, with different clients, so I would need access to certain backed-up data to restore without restoring the whole server.
Oh, certainly, and Bacula can do that. One of the many ways to restore files in Bacula is to a) select a client, b) select "most recent" or "as of date", and c) browse the "filesystem" that Bacula then presents, selecting directories and/or files to restore. I don't mean to be too pushy with Bacula; it's definitely a beast. But it is a serious backup solution, with lots of users and testing. I trust it with my data.
@macforum:
Main thing is: can you back up sufficient data to restore directly from an rdiff-backup (or whatever) after doing a straight Debian install?
Sure. The price you pay is that you end up backing up all your applications as well. For ME, I'd rather use the backup space for longer retention of data, and let Debian "back up" the applications. The price I pay is a more complicated restore procedure. That's okay for me, because a longer downtime during restoration is acceptable. One important point: the base install must have the tools needed to restore, or at least be able to install them. It's really important to work through your restore procedure step by step and think about the problems that might occur at each one.
Another thing to consider: backups to a non-Linode. Linode.com seems to be a successful business, and I've no reason to expect any problems, but… things happen. Consider what happens to your business if linode.com disappears and you scramble to find alternative hosting. If you don't have your clients' data, you're still screwed.
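The step-by-step restore drill is worth scripting so it's rehearsed rather than improvised. A rough sketch, with placeholder archive names and paths:

```shell
#!/bin/sh
# Sketch of restoring onto a fresh Debian image from tar archives.
# Archive names and paths below are placeholders for illustration.
set -eu

restore_archive() {
    # Unpack one backup archive under the given root directory.
    archive="$1"; root="$2"
    mkdir -p "$root"
    tar xzf "$archive" -C "$root"
}

# On a real box the sequence would be roughly:
#   1. apt-get install the backup/restore tools
#   2. restore_archive etc-backup.tar.gz /
#   3. restore_archive srv-backup.tar.gz /
#   4. reload the MySQL dump with `mysql < mysql.sql`
```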
Good point to note, too, that obviously the Debian install would require the tools to actually connect and restore from backup.
That was one reason I considered storing a clone of my image on my backup server: copy it across to prod on catastrophic failure, then just run the restore for the previous evening's data. My cloned image would have all the required tools installed, removing a step that may turn out to take time or be error-prone, especially under the pressure of clients' sites being down. I may be able to script the install of the backup software, though, and remove any human-pressure stupidities. We'll see.
I'll try running both scenarios and see what makes most sense.
If you have any links to good tutorials on bacula I would be interested.
Thanks again Steve,
Paul
The best source for Bacula tutorials is the Bacula site; follow the documentation link.
It runs with a push rather than a pull paradigm, so it's better suited to smaller deployments; but the upside is that it runs unprivileged on the remote host and as root on the local host.
I've been running this both locally, safeguarding my workstation, and on my Linode since I got one about 18 months ago. I've lost track of how many partial restores I've had to do (trigger-happy with rm -r, and busy breaking dev databases), but it's never failed me.
Firstly, apologies for the late feedback; I've been away a few days.
I gave Backup Ninja a shot and it worked very well, and quickly, which is good.
I have not given Bacula a go yet, but to be honest, given how quickly I got Backup Ninja to the point where I could restore what I needed, I think I will stick with that for the time being. Steve's comment that Bacula is a bit of a monster is keeping me away just now, as I am swamped with work.
My original hope, having looked at rdiff-backup (as you pointed out, backupninja is just a wrapper for rdiff-backup plus some extra goodies), was that I would be able to back up my whole file system: a heavy hit the first time, but only small amounts going across to the remote backup each night, and in the case where my Linode went boom, I could essentially just pull the whole file system back on top of a stock Debian install.
However, since with databases it seems important to take a dump that you can back up, the notion of just copying the underlying file system does not seem to be a goer. I wouldn't feel confident that the restore would leave my MySQL databases in working condition; even if I were to restore the individual databases, other parts of the filesystem may not have been stable when the backup ran.
With this in mind, the only backup strategy I can see where I could get up and going again with confidence in a reasonable time is:
Take a local copy of my prod disk image
Clone that across to my remote host
Set up my backup software for MySQL dumps, Subversion copies, and rdiff-backup of the following:
/var (all web sites, email, cron jobs, and the MySQL and Subversion dumps that run before rdiff-backup)
/home/me (all I care about as I run virtual users for clients)
/root
/etc
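The nightly job for the list above might look something like this crontab fragment. The host name, user, dump script, and timings are all placeholders, but the `source host::dest` form is rdiff-backup's standard remote syntax:

```shell
# m h dom mon dow  command  (host/user/paths are placeholders)
30 3 * * *  /usr/local/bin/dump-mysql-and-svn.sh
0  4 * * *  rdiff-backup /var      backup@dev.example.com::/backups/var
10 4 * * *  rdiff-backup /home/me  backup@dev.example.com::/backups/home-me
20 4 * * *  rdiff-backup /root     backup@dev.example.com::/backups/root
30 4 * * *  rdiff-backup /etc      backup@dev.example.com::/backups/etc
```

A second set of entries pointing at a local directory would give the on-prod copies mentioned below.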
There are two rdiff-backup runs: one to the remote host and one local to prod. Should my remote server be the one to go boom, I would not have access to incremental backups there (unless I do a nightly backup of my remote host too, which I am not going to do).
With this setup, I have a local disk image to start up, or a remote one should my whole Linode die and I need to start again, and I have enough in my backups to restore everything to the previous evening on top of the image.
I need to make new copies of my production image, locally and remotely, whenever I install new software, but this should be rare once I get going, and prod only needs to be down long enough to do the local disk-image copy; even with a large file system I am hoping for 30 minutes at the very worst.
Outside of the complete-disaster scenario, it is very easy to cp the previous evening's database, Subversion repository, or a missing file and restore it should I need to, as the data is both local and remote.
That's my thinking, anyway, and what I am trying to put in place and, obviously, test out. The downside is the amount of space required to do all of this; the upside is confidence in getting up and going quickly in a complete disaster.
If anyone has a simpler model, I'd love to hear it.
Cheers,
Paul