Need architectural advice!

I have the following setup today:

Linode 768: Webserver, Nginx + php-fpm (handles all the traffic on port 80)

Linode 512: Database, MySQL

Linode 512: In-Memory DB, Redis

This works very well. The communication between the Linodes is done over the private network.

My traffic is increasing and I would like to add another webserver.

So I can add a NodeBalancer and, behind it, another webserver Linode. Fine.

My questions are as follows:

1) Today I have a major image library on the webserver Linode, since my site is image-heavy and lets users upload images. This needs to be centralized. What would be the best solution? Could I, for instance, use two Linode 512s behind the load balancer as webservers (replacing today's 768) and then have a Linode 1024 for the images, mounted on the two webserver Linodes? Or should I go offsite completely for the images? It feels costly to have a Linode 1024 just to store images.

2) Can this be done without disrupting traffic? (adding the NodeBalancer, I mean)

I appreciate the help.

10 Replies

There are a couple possibilities, each of which has its pros and cons:

  • Shunt everything off to something like Amazon S3; uploads go directly there, downloads come from there

PRO: Easy (if your application can be modified to handle it), scales very nicely and is quite reliable

CON: Will probably cost money… but not a lot. Much cheaper than Linode per GB. Also, if your application has a lot of assumptions about how files are handled, you're going to have a bad time.

  • Have a Linode dedicated to image storage

PRO: Keeps everything on Linode, and you can optimize the beezers out of it for static file serving if you point URLs directly at it. You can mount it via NFS from your application servers, so they can write files as they'd expect.

CON: Much more expensive than S3, more difficult to scale. Also, a considerable SPOF. (NFS acts… poorly when the server goes away.)

  • Replicate the images across all of your web servers.

PRO: No code or URL changes.

CON: Monumental waste of resources. :-)

I'd have to have a very compelling reason to NOT go with the S3 option, really.
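If the S3 route wins out, the application change is mostly at upload time: push the file to S3 and store/serve its URL. A minimal sketch, assuming the boto3 package with AWS credentials configured; the bucket name and key layout are hypothetical, not anything from this thread:

```python
# Sketch of pushing an uploaded image to S3 and building its public URL.
# The bucket name and key layout below are assumptions; adapt to your setup.
BUCKET = "example-image-bucket"  # hypothetical bucket name

def image_url(key: str) -> str:
    """Public URL for an object in the bucket (naming style varies by region)."""
    return f"https://{BUCKET}.s3.amazonaws.com/{key}"

def store_image(local_path: str, key: str) -> str:
    """Upload a local file to S3 and return the URL to serve it from."""
    import boto3  # imported lazily; the URL helper above works without it
    s3 = boto3.client("s3")
    s3.upload_file(local_path, BUCKET, key,
                   ExtraArgs={"ContentType": "image/jpeg"})
    return image_url(key)
```

From there the page templates point image URLs at S3 instead of the local filesystem, which is the "if your application can be modified to handle it" caveat above.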

Thank you VERY much for the swift and thorough answer.

I'll check out the S3 option.

Just to get back to this thread for completeness.

Having looked into Amazon S3, it's hardly the cheaper choice, unless I'm reading it wrong.

Their pricing calculator shows it will cost some 90 dollars/month (300 GB outbound traffic, 50,000,000 GET requests), so that just became the worse choice from a money perspective.

If I'm miscalculating or misthinking this, please feel free to say so.
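For what it's worth, an estimate like this is easy to sanity-check with a few lines. The per-unit rates below are illustrative assumptions (the GET rate is derived from the "10x the GET pricing" remark later in the thread), not current AWS pricing:

```python
# Back-of-the-envelope S3 cost check for the numbers in this thread.
# Rates are assumptions for illustration; plug in the current pricing page.
TRANSFER_OUT_PER_GB = 0.12   # $ per GB outbound (assumed)
GET_PER_10K = 0.01           # $ per 10,000 GETs (assumed, 1/10 the PUT rate)
PUT_PER_1K = 0.01            # $ per 1,000 PUTs (figure quoted in this thread)

def monthly_cost(gb_out: float, gets: int, puts: int = 0) -> float:
    """Rough monthly S3 bill, ignoring storage itself."""
    transfer = gb_out * TRANSFER_OUT_PER_GB
    get_cost = gets / 10_000 * GET_PER_10K
    put_cost = puts / 1_000 * PUT_PER_1K
    return transfer + get_cost + put_cost

# 300 GB out + 50,000,000 GETs, as in the estimate above:
# monthly_cost(300, 50_000_000) -> 36 + 50 = ~86, the same ballpark as ~$90
```

At these assumed rates the request charges, not the storage, dominate the bill, which is exactly the pattern being described.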

@adergaard:

I have the following setup today:

Linode 768: Webserver, Nginx + php-fpm (handles all the traffic on port 80)

Linode 512: Database, MySQL

Linode 512: In-Memory DB, Redis

It sounds like you are running a big site here. How much disk do you need for the image store you mentioned?

Do you have one application using both Redis and MySQL? Is there more going on here than serving a single website?

Why do you want another web server? Are you close to CPU bound? IO bound? Bandwidth bound? Disk space bound?

I think you might be better off going with one big Linode rather than a bunch of small ones but I really don't know enough to be sure.

@adergaard:

Just to get back to this thread for completeness.

Having looked into Amazon S3, it's hardly the cheaper choice, unless I'm reading it wrong.

I don't think so. S3 costs are very dependent on your usage patterns: the raw disk storage can be reasonable in and of itself, but a lot of data transfer adds up really quickly. Of course, if your monthly average transfer volume is relatively stable (or growing slowly), it can still be a win: you'll take an initial hit moving to S3 pricing, but the cost will grow more slowly (and to any scale) than adding extra Linodes just for space, since the pure storage portion of the equation is cheaper.

BTW, you include outbound data and GETs, but don't forget the PUTs that load the data into the system. I don't know how much new inbound volume you'd have monthly, but while inbound data transfer is free, the PUT transactions aren't, and at $0.01 per 1,000 (10x the GET pricing), a lot of new media each month can add up quickly too.

Depending on how much control you have over your application stack, and on your typical usage patterns, you may be able to do better by combining the two systems. For example, use as much space as you can on your Linode as an LRU cache for recently accessed files: pull a file back from S3 as you serve the first request, then use the local copy for subsequent requests as long as it's available. Of course, this only helps if you tend to have repeat requests or your average monthly working set is small, and depending on the implementation it may have a worst case more expensive than S3 alone. But if you can keep most of the storage and transfer local to your Linode, you only pay S3 costs for a fraction of the monthly transfer volume while still getting the benefit of the long-term archival pricing (compared to extra Linodes just for space).
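A minimal sketch of that pull-through idea, with the S3 fetch abstracted as a callable; the names and sizes are illustrative, not a real implementation:

```python
# Sketch of a bounded pull-through cache: serve from local storage when
# possible, fall back to S3 (here an injected fetch callable) on a miss.
from collections import OrderedDict
from typing import Callable

class PullThroughCache:
    def __init__(self, fetch: Callable[[str], bytes], max_items: int = 1000):
        self._fetch = fetch      # e.g. a function that GETs the key from S3
        self._max = max_items
        self._items = OrderedDict()
        self.misses = 0          # each miss is a billable S3 request

    def get(self, key: str) -> bytes:
        if key in self._items:
            self._items.move_to_end(key)     # mark as recently used
            return self._items[key]
        self.misses += 1                     # only misses hit S3
        data = self._fetch(key)
        self._items[key] = data
        if len(self._items) > self._max:
            self._items.popitem(last=False)  # evict least recently used
        return data
```

Repeat requests are served locally; only the first request per key (or a request after eviction) pays S3 transfer and request charges.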

– David

@db3l:


Thanks for the input, David. This was good food for thought. The PUTs are not in the same vicinity as the GETs, so that won't be too bad cost-wise, but the GETs and the data transfer still kill it, or at least it isn't the Nirvana I first thought it could be.

There are other cheaper options than S3 for storing large amounts of data and delivering it cheaply, like OVH in Montreal. Of course, you don't get it served through a CDN at that point, just one server.

s3cmd, glacierfs, or rsync are your best bets.

You can use s3cmd to synchronize your files across multiple web servers and only push/pull new/changed files. You won't need to send every request to S3. You can even set up an nginx reverse proxy cache very easily on your Linode if you don't go the s3cmd route. This will save requests to S3 and bandwidth.

This may vary a bit depending on whether you reuse file names (modify files) or just add new files.

How many images, average size, how many GBs?

I'd grab another Linode for a second identical web server, plus a NodeBalancer.

Set up lsyncd both ways between the webservers (rsync checks the timestamp on the second pass, so you don't get into a sync loop. TIAS).

If you've got heaps of directories you may need something like this to increase the number of kernel inotify watches:

/etc/sysctl.d/notify-sync.conf

fs.inotify.max_user_watches = 1024000

Then set up an nginx fallback so that a file not found locally is proxied through to the other web server. This eliminates the timing hole where an image has been uploaded but not yet copied to the other server.

location ~ ^/img/(.............)\.jpg {
    expires 2592000;
    add_header Cache-Control public;
    alias /var/www/images/$1.jpg;
    error_page 404 @fallback;
}

location @fallback {
    proxy_set_header X-Real-IP $remote_addr;
    proxy_pass http://myotherserver;
}

Now you have a redundant web server. I've got two configurations like this that work well.

Thanks for the suggestions.

I've been a bit tied up with further developing the site (and another project altogether), so I haven't gotten around to this yet, but it's drawing closer to the time when waiting is no longer an option.

So again, thanks for all suggestions.
