Help setting up Amazon EC2 for high traffic Facebook App

Hi,

I posted the following job on a marketplace, but I wonder whether Linode would be a good fit for this kind of requirement. Can anybody advise?

We are developing a Facebook app that we expect to receive very high traffic, and the server must be able to handle it. The server must also cope with millions of database rows and thousands of photos uploaded to the app. We are currently evaluating the Amazon EC2 platform to host the app, but we are open to your suggestions. In a nutshell, we need:

1) A suggested server that is small enough to start with but can grow to handle high traffic

2) Installation and configuration of the LAMP stack

3) Backup solutions and scaling strategies, and how you will handle them

4) Suggestions on moving uploaded images to S3 or a CDN service for fast delivery, and on using Amazon RDS instead of conventional MySQL

5) Suggestions on alternative providers such as Rackspace or Linode

4 Replies

Not to disrespect any VPS provider, but won't you be more in the territory of dedicated hardware with a GigE NIC here?

No, he'd be in the territory of scaling horizontally with multiple linodes…

Hosting a VPS at Linode isn't all that different from hosting it at Amazon, except that Linode is far more cost effective (you get much more performance per dollar, for example). If you can build a scalable cloud platform for your application at Amazon, you can do it at Linode cheaper. You can still host uploaded images or other files in S3, even.

Scaling horizontally can be done equally well with dedicated machines, but that's beside the point.

I'd say start with a 512 MB node. As traffic grows, you'll see whether you need to scale horizontally or vertically. For example, if you max out memory with the number of application processes while CPU and IOPS are relatively idle, you'll benefit from vertical scaling. Using too much CPU without using all the memory means you should scale out horizontally.
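That rule of thumb could be written down roughly like this. It is only a sketch: the function name, the 85%/50% thresholds, and the assumption that each figure is a percentage of the node's total capacity are all illustrative, not anything Linode or EC2 prescribes.

```python
def suggest_scaling(cpu_pct, mem_pct, iops_pct, high=85.0, low=50.0):
    """Turn utilisation figures into a scaling hint.

    All inputs are percentages (0-100) of the node's total capacity.
    The `high` and `low` thresholds are arbitrary assumptions.
    """
    # Memory exhausted by application processes while CPU and disk idle:
    # a bigger plan (more RAM) on the same node helps.
    if mem_pct >= high and cpu_pct < low and iops_pct < low:
        return "vertical"
    # CPU saturated while memory is still free: add more nodes.
    if cpu_pct >= high and mem_pct < high:
        return "horizontal"
    # Anything else needs a closer look before spending money.
    return "keep watching"


print(suggest_scaling(cpu_pct=20, mem_pct=95, iops_pct=10))
print(suggest_scaling(cpu_pct=95, mem_pct=40, iops_pct=10))
```

Feed it averages from whatever monitoring you already run; a single spike is not a reason to resize.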

Now, before someone jumps in telling me that CPU will seldom be the bottleneck here… I happen to manage a node for a client that runs an extremely inefficient PHP/MySQL application. Each request scans tens of thousands of rows with complex sorted joins and no proper indices, which translates to an average of 120% CPU for a puny 4 requests per second (application requests; some 20-30 r/s overall including static content), with frequent spikes above 300%.

Another angle here is that, if your app is well written and optimized for your RDBMS, you'll likely max out bandwidth before you max out CPU. In that case you need to scale horizontally.

Horizontal scaling is always nice because it brings redundancy into the equation, but it is more difficult to manage; e.g. you need either to offload uploaded content to a CDN or to do some NFS magic with a common static server. At any rate, you'll need to design your app to be server-agnostic, i.e. never rely on assets being available on the local machine: files, images, sessions, …
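One way to keep uploads server-agnostic is to give every file a storage key that does not depend on which web node received it. The sketch below derives the key from the file's content; the `upload_key` name and the `uploads/` prefix are made up for illustration, and the key would then be used with whatever shared store you pick (S3, an NFS mount, and so on).

```python
import hashlib
import os


def upload_key(data, original_name, prefix="uploads"):
    """Build a storage key from file content, not from the local path.

    Any web node computes the same key for the same bytes, so no node
    needs to remember where a file "lives". Names here are hypothetical.
    """
    digest = hashlib.sha256(data).hexdigest()
    # Keep only the (lower-cased) extension of the user's file name.
    ext = os.path.splitext(original_name)[1].lower()
    # The two-character shard avoids one huge flat directory on NFS.
    return f"{prefix}/{digest[:2]}/{digest}{ext}"


print(upload_key(b"fake image bytes", "Holiday.JPG"))
```

Sessions need the same treatment: keep them in the database or a shared cache rather than in files on one node's disk.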

However, scaling the database horizontally is not as easy and will require planning. It also depends on your read-to-write ratio, i.e. whether you can manage with a single master for writes and many replicated nodes for reads, which is not as difficult to achieve.
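The single-master, many-replicas setup usually shows up in the application as a small routing layer. Here is a minimal sketch, assuming the servers are identified by plain strings and that anything not starting with `SELECT` counts as a write; the class and method names are invented for the example.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the master and spread reads over the replicas."""

    def __init__(self, master, replicas):
        self.master = master
        # With no replicas configured, reads simply go to the master too.
        self._replicas = itertools.cycle(replicas or [master])

    def pick(self, sql):
        words = sql.split()
        # Simplification: only plain SELECT statements count as reads.
        if words and words[0].upper() == "SELECT":
            return next(self._replicas)  # round-robin over replicas
        return self.master


router = ReadWriteRouter("db-master", ["db-replica-1", "db-replica-2"])
print(router.pick("SELECT * FROM photos WHERE user_id = 1"))
print(router.pick("INSERT INTO photos (user_id) VALUES (1)"))
```

Replication is normally asynchronous, so a read that must see a row you have just written may still have to go to the master.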

Linode offers backups, but I don't know how well that works for busy nodes (being a file-based backup). Personally I'm not using it; I do offsite backups, which can go to another Linode: tar -mtime … | curl (S)FTP(S) for images, and database dumps for the database, although that locks the database. I don't know about MySQL, but with PostgreSQL I can do continuous WAL archiving and PITR without locking the db.
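For the image part, the idea is "archive only what changed recently, then ship it off-site". The following is a hedged Python equivalent of the archiving half only, not the exact command above; the function name and the `max_age_days` parameter are assumptions, and the upload step is left out.

```python
import os
import tarfile
import time


def archive_recent_files(src_dir, archive_path, max_age_days):
    """Write files modified within `max_age_days` into a .tar.gz archive.

    Returns the archived paths, relative to `src_dir`.
    Keep `archive_path` outside `src_dir` so the archive does not
    end up trying to include itself.
    """
    cutoff = time.time() - max_age_days * 86400
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        for root, _dirs, files in os.walk(src_dir):
            for name in sorted(files):
                path = os.path.join(root, name)
                # Skip anything last modified before the cutoff.
                if os.path.getmtime(path) >= cutoff:
                    arcname = os.path.relpath(path, src_dir)
                    tar.add(path, arcname=arcname)
                    added.append(arcname)
    return added
```

Run nightly with `max_age_days=1`, this produces a small incremental archive that can then be pushed to the off-site machine with whatever transfer tool you prefer.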

Also design the app to use an RDBMS at the core of your data management, but think ahead: some day you might want to introduce layers of memcached or other NoSQL trickery to help performance.
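Planning for a cache layer mostly means funnelling reads through one place. A minimal cache-aside sketch follows, with a plain dict standing in for memcached and a caller-supplied `load_from_db` function standing in for the real query; both are assumptions made for the example.

```python
class CacheAside:
    """Check the cache first; on a miss, load from the database and store."""

    def __init__(self, load_from_db, cache=None):
        self._load = load_from_db
        # A dict stands in for memcached in this sketch.
        self._cache = {} if cache is None else cache

    def get(self, key):
        if key in self._cache:
            return self._cache[key]   # cache hit, no database work
        value = self._load(key)       # cache miss, ask the database
        self._cache[key] = value
        return value

    def invalidate(self, key):
        # Call this after writing to the database so stale data is dropped.
        self._cache.pop(key, None)


profiles = CacheAside(lambda user_id: {"id": user_id, "name": "user%d" % user_id})
print(profiles.get(7))  # first call loads from the "database"
print(profiles.get(7))  # second call is served from the cache
```

If the app already reads through an object like this, swapping the dict for a real memcached client later touches one class instead of every query.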

Linode backups are done with LVM snapshots, so they should work fine with busy nodes. The key is that they should only be one component of your backup strategy. They're fine for your on-site backups, but you should also have off-site backups going.

My disaster recovery plan is basically, in order of execution:

1) Try to fix server/data in-place

2) If that fails, spin up a new linode instance from on-site backups

3) If that fails, build a new linode from off-site backups
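If you ever script that plan, it boils down to trying each step in order and stopping at the first one that works. A small sketch, where the step names and the callables are placeholders for whatever your real procedures are:

```python
def recover(steps):
    """Run (name, attempt) pairs in order; return the first name that succeeds.

    Each `attempt` is a callable returning True on success. An exception
    is treated as a failed step. Returns None if every step fails.
    """
    for name, attempt in steps:
        try:
            if attempt():
                return name
        except Exception:
            # A crashed step counts as a failure; fall through to the next.
            continue
    return None


plan = [
    ("fix in place", lambda: False),             # pretend the repair failed
    ("restore from on-site backup", lambda: True),
    ("rebuild from off-site backup", lambda: True),
]
print(recover(plan))
```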
