Alternatives to MySQL on Linode?

Hello!

I am writing an API that'll be running on my Apache Linode instance. This API will not be calculation heavy, but it could be very read-heavy (many users requesting stored data). Since I have not written the database component of the project yet, I am looking at not using MySQL. There are several ports or improvements upon MySQL out there, and I was wondering what others have thought of the newer systems available.

For example: Percona Server offers itself as a "backwards-compatible replacement for MySQL that is much faster and more scalable, easier to monitor and tune, and has features to make operational tasks easier." Meanwhile, Drizzle is a MySQL port that has "removed non-essential code, re-factored the remaining code, and converted the code to modern C++ and modern libraries." Both claim to have better performance, and you can find plenty of testimonials talking about the new systems. Frankly, given the stagnation of MySQL I'd be happier using an alternative. The main concerns I have are:

1) Ease of use for multiple applications: Whatever I end up using, I'd like to use for future projects as well. The current project will probably talk to the database with Perl, which I suspect can connect to any database, but large compatibility is a plus.

2) Ease of installation: I'm running Debian 6, and (for example) it appears Drizzle is not yet available through apt-get. None of my work is production-critical so I am okay playing with early versions of software, as long as it's not too painful to install and maintain.

3) Efficient use of memory: I am currently using a Linode 512, which also hosts a few other small programs. I will upgrade to a 1024 if my API ends up being very widely used, but I'd like to remain on a small host if possible.

4) Choosing a sane system: I'll be honest, I don't know enough about SQL to make an informed decision. For example, the benchmarks for Percona are more impressive, but it feels very "astroturfed" and it's hard to get solid information about it. And I've only looked at these two systems; I'm sure there are many more.

So, my open-ended question: which do I actually use?!

I only mentioned the two most obvious choices, but of late the SQL world seems to have exploded.

I would be interested to hear any thoughts or comments.

Thank you for your time :D

21 Replies

If you're willing to try newer software for better performance, try nginx instead of apache.

I'd also like to know a more scalable/faster mysql product to use.

You left out MariaDB, which is seen by many as the "official" continuation of MySQL, since it's developed by Monty Widenius, the creator of MySQL (My and Maria are the names of his daughters).

Drizzle seems to be more for cloud use (and while linode is a cloud provider, one VPS does not a cloud make), and isn't updating from MySQL as it goes; they want to use MySQL only as a starting point. It's also not stable; they're still in "beta".

That leaves Percona and MariaDB. My understanding is that MariaDB ships with Percona's XtraDB storage engine (replaces InnoDB) anyhow, so that's not really an advantage either way. I'd personally lean towards MariaDB since it seems to want to hew closer to MySQL in terms of compatibility, but they're both drop-in replacements, so I can't really say which is better.

Are you limited to databases that are compatible with MySQL?

If not, you might want to take a look at PostgreSQL.

+1 for PostgreSQL. I know a heck of a lot about tuning MySQL and working around its problems, but very little about tuning PostgreSQL. It just does its job, and supports (almost) all of the stuff you expect an RDBMS to support.

Also, one word: caching. You don't want to do the same (read) query twice if you can avoid it. Any decent framework/ORM supports memcached(*), which is a great way to get key-value performance with relational consistency. -rt

(*) I'm using the word "decent" here like I'd use it in "Any decent RDBMS supports transactions", which is to imply that any RDBMS that doesn't support transactions is crap. (Hi there, defaults for MySQL pre-5.5)
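The cache-aside flow hoopycat is describing (check memcached first, fall back to the database, invalidate on write) can be sketched in a few lines. This is a minimal illustration only: a plain dict stands in for memcached, and `fetch_user`, `update_user`, and `db_query` are hypothetical names, not part of any real framework. A real deployment would use an actual client such as pymemcache with the same get/set/delete flow plus a TTL.

```python
# Minimal cache-aside sketch: a dict stands in for memcached.
cache = {}

def db_query(user_id):
    """Stand-in for an expensive read query; counts its own calls."""
    db_query.calls += 1
    return {"id": user_id, "name": f"user-{user_id}"}

db_query.calls = 0

def fetch_user(user_id):
    key = f"user:{user_id}"
    hit = cache.get(key)          # 1. try the cache first
    if hit is not None:
        return hit
    row = db_query(user_id)       # 2. miss: go to the database
    cache[key] = row              # 3. populate the cache for next time
    return row

def update_user(user_id, name):
    # On write, invalidate so readers don't see stale data.
    cache.pop(f"user:{user_id}", None)

fetch_user(42)
fetch_user(42)                    # second call is served from the cache
assert db_query.calls == 1
```

The invalidate-on-write step is what gives you the "relational consistency" part: readers either hit a fresh cache entry or fall through to the database.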

@hoopycat:

Also, one word: caching. You don't want to do the same (read) query twice if you can avoid it. Any decent framework/ORM supports memcached(*), which is a great way to get key-value performance with relational consistency. -rt

I haven't played with memcached, but most RDBMSes will do query caching if you pass them the same read query twice; I'd tend to lean towards letting the RDBMS worry about caching rather than doing it by hand, but perhaps there are big performance improvements from using memcached rather than the database server's query cache?

@hoopycat:

(*) I'm using the word "decent" here like I'd use it in "Any decent RDBMS supports transactions", which is to imply that any RDBMS that doesn't support transactions is crap. (Hi there, defaults for MySQL pre-5.5)

Defaults or not, let's be fair, MySQL has supported transactions with InnoDB for over a decade. Then again, I frequently mock Apache for being completely out of touch with reality and choosing ludicrous httpd default settings, so default settings do have some weight.

On caching: There's ordinarily some processing that happens after the query results are returned, which can be fairly intensive depending on your application. It's not too much of a stretch to think of a HTML page being built of 10-20 queries and perhaps a half-dozen components (header, footer, content, content tools, ads, …). The closer to the end you can cache the output, the happier things will be. (tl;dr: cache the steak, not the grass)
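The "cache the steak, not the grass" idea above can be made concrete: cache the assembled page, not the individual queries that feed it, so a hit skips both the queries and the post-processing. This is an illustrative sketch only; `render_page`, `run_query`, and the component names are made up for the example.

```python
# "Cache the steak, not the grass": cache final output, not inputs.
page_cache = {}
query_count = 0

def run_query(sql):
    """Stand-in for one of the 10-20 queries behind a page."""
    global query_count
    query_count += 1
    return f"<!-- result of {sql} -->"

def render_page(path):
    cached = page_cache.get(path)
    if cached is not None:
        return cached                     # skip all queries AND assembly
    parts = [run_query(q) for q in ("header", "content", "footer")]
    html = "\n".join(parts)               # the expensive "steak" step
    page_cache[path] = html
    return html

render_page("/home")
render_page("/home")                      # second render: zero queries
assert query_count == 3
```

Caching the query results alone would still leave the assembly work on every request; caching at the output layer eliminates both.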

On the MyISAM jab: Yes, I was hesitant to include it, but it's an important reflection on MySQL-as-a-project's decision-making process, and I did qualify it to reflect the change in 5.5. That, and Apache, have assured me that even blatantly dangerous and poorly-justified default settings in software won't get you sued out of existence. :-)

PostgreSQL does not have query caching because such a thing is theoretically impossible in the context of a transaction.

HOWEVER, Pgsql relies a lot on filesystem cache and mmaps, so at least something is cached.

In my Python apps I can easily integrate Beaker caching framework with SQLAlchemy and basically have query-level caching.

@Azathoth:

PostgreSQL does not have query caching because such a thing is theoretically impossible in the context of a transaction.

Umm, no, don't be silly. Transactions don't make query caching terribly more difficult, and a lot of transaction-supporting databases (SQL Server, MySQL, etc) do support it.

@Guspaz:

Umm, no, don't be silly. Transactions don't make query caching terribly more difficult, and a lot of transaction-supporting databases (SQL Server, MySQL, etc) do support it.

Well, I could be wrong of course, but as far as I know, PostgreSQL does not have query cache like MySQL for the same reason it can't cache the row count but has to count row by row – because each transaction can see different set of available rows. PostgreSQL's caching is done differently and relies on OS system cache.

MySQL invalidates the query cache every time something changes in the same table. So if your app does anything as simple as incrementing a hit-count field at the beginning of a request, every query that involves that table will have to start from scratch instead of using the query cache. This happens on top of the caching that the OS already does, so there is a bit of overhead. According to the Manual, the overhead can be as large as 13% if none of your queries are cache-able.
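The coarse, per-table invalidation described above is easy to simulate. This sketch is illustrative only (the function names and cache structure are made up), but it shows why a single cheap write, like the hit-count increment mentioned, throws away every cached result for that table.

```python
# Sketch of MySQL-style query-cache invalidation: any write to a table
# discards every cached result that touched that table.
query_cache = {}   # (tables, sql) -> cached result
misses = 0

def cached_query(sql, tables, run):
    global misses
    key = (tables, sql)
    if key not in query_cache:
        misses += 1
        query_cache[key] = run()   # run the real query on a miss
    return query_cache[key]

def invalidate(table):
    # Coarse-grained: drop everything that references the table.
    for key in [k for k in query_cache if table in k[0]]:
        del query_cache[key]

cached_query("SELECT * FROM posts LIMIT 10", ("posts",), lambda: ["p1"])
cached_query("SELECT * FROM posts LIMIT 10", ("posts",), lambda: ["p1"])
invalidate("posts")   # e.g. incrementing a hit-count field
cached_query("SELECT * FROM posts LIMIT 10", ("posts",), lambda: ["p1"])
assert misses == 2    # cache hit in the middle, rebuilt after the write
```

With a write on every request, the cache never gets a chance to pay for its own bookkeeping, which is where the overhead figure comes from.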

PostgreSQL works differently. It just tells the OS to cache parts of the database file(s), and then reads from those files as usual. The OS decides whether to read from RAM or the hard drive. Unlike MySQL, PostgreSQL does not maintain a separate cache for itself, except for some things that make sense to keep in RAM at all times. This approach makes the caching system simpler and sometimes even faster, especially in write-heavy use cases. But you often need to fiddle with kernel parameters in order for PostgreSQL to cache things properly.

Or at least that's what I remember. Databases change all the time, so the caching behavior may be different in current versions.
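The "fiddle with kernel parameters" point usually comes down to a couple of real PostgreSQL settings. A hedged example, with values that are only illustrative for a small host like a Linode 512:

```ini
# postgresql.conf fragment (illustrative values for a ~512 MB host).
# shared_buffers is PostgreSQL's own buffer pool; effective_cache_size
# changes nothing about memory use, it only hints to the query planner
# how much of the database the OS page cache is likely holding.
shared_buffers = 64MB
effective_cache_size = 256MB
```

On older kernels, raising `shared_buffers` could also require bumping the `kernel.shmmax` sysctl, which is the kernel-parameter fiddling referred to above.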


@hybinet:

PostgreSQL works differently. It just tells the OS to cache parts of the database file(s), and then reads from those files as usual. The OS decides whether to read from RAM or the hard drive. Unlike MySQL, PostgreSQL does not maintain a separate cache for itself, except for some things that make sense to keep in RAM at all times. This approach makes the caching system simpler and sometimes even faster, especially in write-heavy use cases. But you often need to fiddle with kernel parameters in order for PostgreSQL to cache things properly.

That's just caching the disk access, not any of the data manipulation. A query cache doesn't cache the source data that is being accessed, it caches the results of the query. If there is math involved, or sorting, or even if you're just selecting a small amount of data from a large table, the query cache is going to save you all that work. A disk cache doesn't help with any of that.

@Guspaz:

That's just caching the disk access, not any of the data manipulation. A query cache doesn't cache the source data that is being accessed, it caches the results of the query. If there is math involved, or sorting, or even if you're just selecting a small amount of data from a large table, the query cache is going to save you all that work. A disk cache doesn't help with any of that.
You're absolutely right. But unless you want your database to return stale data, the database needs to keep track of a lot of things if it is going to maintain a cache for query results. MySQL does this in a rather rudimentary fashion: whenever anything changes in a table, invalidate all cache entries that are related to that table. Even this approach results in a non-negligible amount of overhead.

A disk cache, on the other hand, works fine even after a table changes, because the changed blocks are also cached by the OS. There is no need to invalidate any cache entries. So MySQL's query caching would be faster in read-heavy use cases, whereas PostgreSQL's approach would be faster in write-heavy use cases. (Of course, there are many other factors to DB performance, too.)

I don't think this characterization is too far off from the way the two databases are usually thought of. It seems, however, that there is now a program called PostgreSQL Query Cache. I wonder if this would give us the best of both worlds.

@Guspaz:

Transactions don't make query caching terribly more difficult

Can't speak to transactions, but MVCC, which both InnoDB and Postgres use, does. See http://bit.ly/rdASo5.

The net of all of this is that caches sometimes help and sometimes hurt, they're hard to get right, and MySQL has a query cache but Postgres does not. (There's a recent add-on query cache for Postgres, but it's not well vetted at this point.)

@hybinet:

A disk cache, on the other hand, works fine even after a table changes, because the changed blocks are also cached by the OS. There is no need to invalidate any cache entries. So MySQL's query caching would be faster in read-heavy use cases, whereas PostgreSQL's approach would be faster in write-heavy use cases. (Of course, there are many other factors to DB performance, too.)

But MySQL is going to be using disk caching too. It may not tell the OS what parts to cache explicitly, but the OS is generally smart enough to say, this data is being accessed a lot, I should keep it in RAM.

I wrote a search engine a long time ago, and query caching was really handy. The data in the search engine wasn't refreshed all that often (well, we're still talking minutes, but there were a rather large number of queries between each DB refresh pass), and a large number of queries were very common searches. Certainly there was a bit of a long tail type thing going on, but in my case, query caching helped performance a ton (and I did compare).

Disk caching was also irrelevant in my case, since I was keeping the entire table in RAM (then called heap tables, now I think MySQL calls them memory tables). Basically, I did all my updates and inserts on a regular table, and after each set of inserts/updates (which happened in bursts every so often), I'd nuke the heap table and replace it with a copy from disk. It was the first web programming I'd ever done (and the first time I'd touched a database in my life), so it was all pretty badly written, but it worked well enough at the time to handle a few hundred thousand hits a day on a crappy old celeron.
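The rebuild-and-swap pattern described above (serve all reads from an in-memory copy, send writes to the regular table, periodically replace the copy wholesale) can be sketched like this. The names and the dict-based "tables" are purely illustrative; in MySQL terms this would be a MEMORY table periodically re-populated from the on-disk source table.

```python
# Sketch of rebuild-and-swap: reads hit an in-memory snapshot, writes
# go to the authoritative "disk" table, and a periodic refresh swaps in
# a fresh snapshot all at once.
disk_table = {1: "apple", 2: "banana"}   # authoritative store
snapshot = dict(disk_table)              # serving copy

def search(term):
    return [k for k, v in snapshot.items() if term in v]

def write(key, value):
    disk_table[key] = value              # readers don't see this yet

def refresh():
    global snapshot
    snapshot = dict(disk_table)          # swap the whole copy at once

write(3, "pineapple")
assert search("pine") == []              # stale until the next refresh
refresh()
assert search("pine") == [3]
```

The trade-off is the same one as in the post: readers get fast, uncontended access and never block on writes, at the cost of serving slightly stale data between refresh passes.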

@Guspaz:

The data in the search engine wasn't refreshed all that often
@Guspaz:

I was keeping the entire table in RAM.
Yep, that sounds like a case where the MySQL style of query caching would help a lot.

Thank you all for your informative replies!

Guspaz was quite right that I had left out MariaDB; I believe I bumped into it after making this thread. The distinctions between Percona and MariaDB aren't very clear to me, and I get the feeling Percona is more a corporate offering than anything anyway. MariaDB is definitely a contender, then.

To answer hybinet's question about MySQL compatibility: for this project, no, but in general it'd be a plus. My main concern is that I intend to host some form of blog/CMS/forum software, and it seems everything is MySQL. I've heard many good things about PostgreSQL, but I have not researched it thoroughly. I haven't finalized the products I want to use, hence my nervousness about breaking MySQL compatibility. The memory or speed advantages would have to be significant (and maybe they are!).

To answer Ericson578's comment about nginx, one thing at a time eh? :D No, you're quite right about trying smaller web servers as well. However, I already have an existing deploy of Apache doing all sorts of cool custom Apache-things, and re-configuring that would be a considerable amount of work. I probably will consider porting at some point, but for now I'm sticking with Apache.

The comments about caching are interesting, and I will have to research this topic more. Incidentally, further down the road I'll probably be making a thread about caching at the output/Apache level :)

This sounds promising:

http://www.memsql.com/

(I'm on the waiting list.)

"Your application can instantly communicate with MemSQL using the MySQL protocol. The entire MySQL ecosystem remains at your disposal."

@ferodynamics:

memsql
Interesting. How is it different from using MySQL with the MEMORY storage engine? Would it keep my data safe if my server suddenly got Fremonted?

It seems in the recent SQL explosion, there's a lot of start-ups out there that are heavy on outlandish claims & light on facts.

MemSQL claims it is "30x faster" but that claim isn't very interesting until someone can verify it; NuoDB is another I ran into that claims to be "a revolutionary database solution."

Unfortunately, for the time being those websites are for capital investment, not practical use :/

I'm finding it difficult to take MemSQL seriously… their web site has more technical information about the capabilities of their office than it does their product.

NuoDB at least has something, but it sounds a lot like MySQL Cluster, but without the problems inherent in being a mature, widely-used system with ample documentation and a familiar interface.

(I… think this is the nicest thing I've ever said about MySQL.)
