Relational Database Management Systems (RBDMS) using SQL have been used to store application information for decades. Serving as the backbone of major industries like healthcare and finance, the relational model of organizing data into tables with an identifying key for each row proved to be reliable and efficient. Modern SQL databases including MySQL and PostgreSQL remain some of the most popular databases in use today. But when is SQL not enough?
The rise of NoSQL (NotOnlySQL) databases starting in the late 2000s coincided with many other advancements. While multicore processors and virtualization were becoming commonplace, the cloud was taking off, and millions of users around the world were going online for the first time with smartphones. Everything needed to grow, and the most practical way to accomplish this much needed scale is horizontal scaling. We often see SQL versus NoSQL oversimplified to “SQL can scale vertically and NoSQL can scale horizontally,” but this is incomplete and incorrect.
When we talk about scaling horizontally, we mean growing our environment by adding more nodes or machines. While SQL databases can scale vertically with relative ease by adding more RAM and compute to a single node, spreading your dataset across multiple nodes is more challenging. This can be done via a technique called sharding. When working with large data sets and high throughput, sharding helps decrease the load on a single server and enables scaling through the addition or removal of servers, depending on the need.
MySQL Sharding and Limitations
SQL databases can scale horizontally by sharding. The method and supported features will vary significantly between databases but caveats need to be considered. Let’s focus on one of the more common examples – MySQL using the NDB storage engine. MySQL supports NDB clusters that can split a single, large table into multiple smaller tables. The process of splitting a table is referred to as partitioning. When stored across multiple servers, these smaller tables make up the shards. The databases in your cluster each store one of the shards. Together, the databases in the cluster constitute your full data set.
Using sharding in SQL databases can offer very high scaling of dataset size, but it can also make your application logic more complex. You need to carefully configure how your data is partitioned into multiple shards, because this decision impacts overall database performance. In addition to the complexity and high time requirement, there are technical hurdles to consider. To counter a commonly stated limitation, MySQL can be configured to perform join operations across multiple shards but at a cost to performance at larger scales. This can make analytical functions impractical in these environments.
Many different types of NoSQL databases have exploded in use since their inception in the late 2000s. For this example, we’re going to focus on the most popular NoSQL database, MongoDB. MongoDB (derived from the word ‘humongous’) is document-oriented. Data is stored in documents similar to JSON objects and each document contains pairs of fields and values. This is opposed to SQL databases that use tables and rows to format data. You may have read that NoSQL databases like MongoDB are typically better suited for horizontal scaling, but let’s dive into why this is the case.
Note that MongoDB specifically uses a format called BSON, which is derived from JSON, but this will vary with each database.
Schemas and Shards
MongoDB is schemaless (or schema-free), meaning it does not require a defined organizational structure at the database level. The schema is instead built into your code at the application level and this gives us a lot of flexibility to change the structure later while preserving our data. While they lack the rigidly enforced consistency of ACID compliant SQL databases, MongoDB and other NoSQL databases excel at availability and partition tolerance.
When we looked at horizontally scaling SQL databases, we went over the process of splitting a table into shards. While possible, it brought a great deal of limitations due to the rigid structure built into the database. MongoDB and other NoSQL databases, on the other hand, are designed to accommodate sharding at a structural level. A shard is a subset of data and MongoDB lets us horizontally scale by deploying shards as replica sets. Replica sets are clusters of at least three nodes with redundant copies of the same data. They provide availability and redundancy when spread across large environments and are not confined by a predetermined scheme.
From this, we can immediately see the concessions that NoSQL databases make to be scalable. NoSQL databases often use significantly more storage than SQL databases because of the amount of redundant data needed to achieve availability across large horizontal deployments. NoSQL write speeds tend to outperform SQL databases but queries are slower. Lacking a defined structure, NoSQL databases are inherently not ACID compliant, which makes them less practical for applications handling high volumes of financial transactions. Alternatively, we can configure massive distributed NoSQL clusters that maintain performance, making them ideal candidates for Big Data and analytics.
So when is SQL not enough? As we would expect, the answer is not straightforward but there are some general guidelines we can take into account when designing our applications. What does our app need to do and how big does it need to be? From there, we can decide our number one priority. Saying “SQL scales vertically and NoSQL scales horizontally” is not true, but we can say “most SQL databases were designed with consistency in mind while most NoSQL databases were designed to accommodate scaling.”
There will always be counterpoints to that general guideline. You can scale MySQL horizontally, and MongoDB started supporting Multi-Document ACID Transactions. The more we understand how these databases are designed, we gain the insight to pick the best tool for the job.
Deploying Databases on Linode
Learn more about Linode Managed Databases or sign up to receive updates on your preferred database engine. You can also deploy database managed systems like MongoDB from the Linode Marketplace or follow our guides to install a database on a variety of Linux distros, like Installing MongoDB on CentOS 7.