BlogDatabasesDjango and SQL: Your Dynamic Duo for Scaling Databases

Django and SQL: Your Dynamic Duo for Scaling Databases

August 3, 2022

ScaleDatabaseswithDjangoandSQL-BlogHeader

Scaling and optimizing databases to meet the needs of your applications can be a significant challenge. If you haven’t read my recent blog about how Django can do the heavy lifting for Python and SQL database applications, I highly recommend you check it out. But the TL;DR version is that SQL is optimized for SQL databases, Python is not, and Django is a great intermediary to help you build more effective applications, with less friction, complexity, and code when using these two languages together.

So while Django does the heavy lifting of creating the database app, you are still responsible for the day-to-day management and monitoring of your databases. Some of these management tasks can be deferred to your cloud provider, using services like Linode Managed Databases, but you might discover new roadblocks as you scale, such as:

Database Migrations. Converting an existing database to a new, desired state with controlled changes to the database scheme.

Multi-Database Deployments. To optimize performance, developers can design their applications to use separate databases for segmented functions. For example, a primary read/write database and a read replica database for common queries.

If one of your databases uses SQL, you can use Django to reduce friction and make your life a lot easier while handling a significant amount of data.

This introduction to two key database management concepts pairs with the step-by-step instructions to building a production-ready Django application found in the Understanding Databases ebook and my new educational video series. Either learning path will help you get Django to do the SQL heavy lifting for you.

Database Migrations
When you’re starting out, getting the data types right for any given column can be a bit tricky, especially since your data needs will inevitably change over time. What if you wanted your title field to be just 80 characters long? What if you need to add a timestamp field so you can track exactly when items were added to the database?

Changing a table after it has been created can get pretty messy for a few reasons:

What do you do with pre-existing values?
What if pre-existing rows are missing data for new columns/fields?
What if you remove a column/field? What happens to the data?
What if you add a relation that didn’t exist before (ie foreign keys)?

Luckily for Django developers, we have something called makemigrations and migrate.

Let’s take a look at how it works in action.

Here’s our example Django data model:

class BlogArticle(models.Model):
    user = models.ForeignKey(User, default=1, on_delete=models.SET_DEFAULT)
    title = models.CharField(max_length=120)
    slug = models.SlugField(blank=True, null=True)
    content = models.TextField(blank=True, null=True)
    publish_timestamp = models.DateTimeField(
        auto_now_add=False,
        auto_now=False,
        blank=True,
        null=True,
    )

Let’s add the field:

updated_by = models.ForeignKey(
        User, related_name="editor", null=True, blank=True, on_delete=models.SET_NULL
)

This field will allow us to track the last user to make a change to our model.

Let’s update our model:

class BlogArticle(models.Model):
    user = models.ForeignKey(User, default=1, on_delete=models.SET_DEFAULT)
    title = models.CharField(max_length=120)
    slug = models.SlugField(blank=True, null=True)
    content = models.TextField(blank=True, null=True)
    publish_timestamp = models.DateTimeField(
        auto_now_add=False,
        auto_now=False,
        blank=True,
        null=True,
    )
    # our new field
    updated_by = models.ForeignKey(
        User, related_name="editor", null=True, blank=True, on_delete=models.SET_NULL
    )

Now, after we save this file this BlogArticle class is declared in (models.py), how do we let our database know this change occurred?

There’s two ways:

python manage.py makemigrations
python manage.py migrate

Let’s discuss what these two commands do:

python manage.py makemigrations

python manage.py makemigrations looks for changes in all models.py files across your Django project and looks for changes. If changes are found, a new python file will be created with the proposed changes that our SQL database needs to make. The proposed changes look something like:

from django.conf import settings
from django.db import migrations, models
import django.db.models.deletion


class Migration(migrations.Migration):

    dependencies = [
        migrations.swappable_dependency(settings.AUTH_USER_MODEL),
        ('articles', '0001_initial'),
    ]

    operations = [
        migrations.AddField(
            model_name='article',
            name='updated_by',
            field=models.ForeignKey(blank=True, null=True, on_delete=django.db.models.deletion.SET_NULL, related_name='editor', to=settings.AUTH_USER_MODEL),
        ),
    ]

This, of course, is just another Python file. This file is letting us (the developers) know what should happen in our database. It’s written in Python and not SQL to maintain cohesion and to leverage the Django ORM’s built-in features.

But why is this a file for what should happen? Well, there’s a few reasons for this:

If we need to review what should happen before it does, we can catch it here.
This makemigrations command does not check with the database to see if this change can even happen.
The database may have already been changed to fit these requirements (depending on a number of factors related to who/what is managing the database).
If we need to run tests prior to changing a production database, right now would be an amazing time to do so.

Assuming that this change is valid (as far as we can tell), we can commit the changes:

python manage.py migrate

python manage.py migrate will attempt to change our database for us — all fields, columns, tables, foreign keys, you name it — Django will do the work for us to help ensure the database is updated in the way we intended.

It’s important to note that Django might fail to make these changes for a number of reasons. For new Django developers, this is almost always due to adding and removing fields and columns and failing to run migrations correctly.

When done correctly, python manage.py migrate ensures a stable system that matches our Python code with our SQL tables, thus allowing us all the awesomeness that both Django and SQL databases provide.

How does this give us more flexibility?

Python has broad applications where SQL does not. Structured Query Language has its limitations written in the name. Who’s creating Pixar animations with just SQL?

OK, all of this is to say that the simplicity of Python actually helps developers adopt the power of SQL and possibly without even knowing it.

Why Managed Databases and Django Make Sense

When it comes to creating a web applications, including Django, you’ll need to decide a few things:

Which data storage solution(s) do we want? MySQL, Postgres, MongoDB, Redis, Object Storage, etc
How will we run/integrate with the data storage solution?
How will we recover from interruption or downtime?
How will we maintain the storage solution?
How will we secure our storage solution?
How will we backup our storage solution?

The answers to these questions may change as your project grows in complexity but they all start in the same place: deciding between self-managed versus third-party managed.

Self-managed:

Pros: Control and cost.
(Significant) Con: You’re responsible for everything.

Managed services often cost more money from the start, whereas self-managing means you can use your preferred Linux distro that’s somehow (or somewhat) optimized for what you need. This can include running a forked version of MySQL that your team has modified. You might save dollars on running your service, but this will always take more time to maintain.

Third-party managed databases:

Yes, it might be slightly more expensive in dollars and cents but it will take significantly less time to maintain. This option and managed data storage solutions are my de facto choice for my web applications. In this example, we’re already utilizing Django to manage database transactions. SQLAlchemy also shares this strength as it is used with frameworks such as FastAPI, Flask, and many others. If you’re already outsourcing your SQL writing to a Python package, why not outsource running your SQL servers?

Now, given the effectiveness of Python ORMs (like Django ORM and SQLAlchemy), I recommend that you use managed database and/or managed data storage services whenever possible, here’s what you stand to gain if you do:

Reduced development time
Reduced management time
Reduced recovery time
Reduced service interruptions
Reduced deployment and development complexity
Reduced complexity in Database migrations (from other services)
Reduced repetitive/ineffective/inefficient activities for SQL developers
Reduced DevOps/Ops complexity
Increased effectiveness of non-SQL developers
Increased deployment and development speed
Increased reliability (often backed by a Service Level Agreement)
Increased security
Increased maintainability
Increased in backups and redundancy
Marginal increase in cost

I made the list above with the mindset of using a Managed MySQL Database Cluster on Linode as well Linode Object Storage (for storing files like CSS, JavaScript, images, videos, etc). Practically speaking, using these services helps us maintain focus on building an excellent web application with Django, FastAPI, Flask, Node.js, or whatever. To put it another way, we shift the focus on building the tools and software your users actually want. You know, where the real value is to them.

MySQL, PostgreSQL, Redis, and Django

For a long time, Django’s leading database was PostgreSQL. I would argue this is, in large part, due to the fact that you could use a JSONField within Postgres only. With Django 3.2+ and MySQL 5.7.8+, the JSONField is now available for MySQL as well.

Why is this important?

Storing unstructured data, like JSON, is often required when handling user-generated content or storing data from other API services. Let’s see how:

from django.db import models

class Pet(models.Model):
    name = models.CharField(max_length=200)
    data = models.JSONField(null=True)

    def __str__(self):
        return self.name

Here’s the data I want to store in relation to this Pet:

pet1 = {
    "name": "Bruno",
    "type": "Rat",
    "nickname": "We don't talk about it",
    "age": 2,
    "age_interval": "months"
}

pet2 = {
    "name": "Tom",
    "type": "Cat",
    "breed": "Mixed"
    "age": 4,
    "age_interval: "years",
    "favorite_food": [{"brand": "Acme", "flavor": "Tuna" }]
}

pet3 = {
    "name": "Stewey",
    "type": "Dog",
    "breed": "unknown"
    "age": 34,
    "age_interval: "dog years",
    "nickname": "Football"
}

This data shows us when we might need a JSONField . We can store all the pet names (using the name key) and keep the rest to be stored in the JSONField. The cool thing about JSONFields is they can be queried much like any other standard Django field even with these varying schemas.

There’s an ongoing debate among Django developers as to which database to use: MySQL or PostgreSQL. For the longest time, I always opted for PostgreSQL due to the fact the JSONField was only available on PostgreSQL, and that’s no longer the case. I say pick one and stick with it until it no longer serves your needs.

But what do we use Redis for?

Redis is an in-memory datastore that’s incredibly fast and often used as a temporary database (more on this in a second), a caching service, and/or a messaging queue. The reason I call it a temporary database is due to the fact that it’s in-memory. Memory is often more expensive than disk storage and thus making storing data long term in-memory is often not feasible.

My primary use case for Redis and Django are caching and queuing

Caching: Let’s say you have several web pages that users visit a lot. You want the data in those pages to be shown to users as quickly as possible. Redis, as a caching system for Django, makes doing this incredibly easy. The data within these pages might be rendered from a SQL database but Redis can store that rendered data from the cache. In other words, using Redis with SQL can often speed up your responses while reducing the amount of queries to your SQL databases.

Queuing: Another popular use case of Redis is to offload long-running tasks to another process (often through a Python package called Celery). When you need to do this, you can use Redis as a queue of the tasks that should be completed at another time.

For example, if you have a user that needs a report of all of their transactions for the past five years, the software might take hours to actually generate that report. Obviously, no one is going to stare at a machine for hours. So we would offload this request from our user to a Redis queue. Once in Redis, we can have a worker process running (like using Celery with Django) to actually generate the report. Once the report is done, no matter how long it took, the user would be notified. This notification, as with other notifications, could also be done through a Redis Queue coupled with a Celery/Django worker process.

This is all to say that Redis and MySQL actually complement each other very well. You can deploy a self-managed Redis database server via the Linode Marketplace.

Object Storage

The last data-related managed service I recommend using is Linode Object Storage. Object Storage is responsible for all the other kinds of data you may need to store. For example, we would not store all the bytes in a video in MySQL. Instead, we’d store metadata related to that video and store the video in Object Storage.

Here are a few things you would use object storage for:

Cascading Style Sheets (CSS)
JavaScript (like React.js, Vue.js, Vanilla.js, etc)
Videos
Images (raw and compressed)
CSVs, XLSX
Database Backups
Docker Container Image Layers (if self-managed)
Iterations of Trained Machine Learning Algorithms
Terraform State Files
PDFs (both large and small)
Any persistent file that needs to be downloaded often (or uploaded)

Summary

After reading this, I hope you feel motivated to leverage the power of managed services with your web application projects. Django is an excellent solution for building web apps on top of SQL databases, but it’s certainly not the only one. If you want to dive into the internals of SQL and SQL servers, I think it’s a worthwhile exercise to see how many successful applications leverage Django to handle the bulk of what Django can do.

Here’s a few (or many) highlights that make Django with Managed MySQL on Linode awesome:

Django does the heavy SQL lifting for you (so do tools like SQLAlchemy for Flask/FastAPI)
Django enables raw SQL commands too (again, so do tools like SQLAlchemy)
Django helps beginners learn SQL commands
Django has built-in support for MySQL and PostgreSQL (in addition to a db-specific python client)
Increases speed to production deployments
Increased reliability and recoverability
Enables development and production environments to match database technology almost exactly
Makes container-based Django easier and more reliable
Unlocks scaling from a single-node deployment to multi-node or even full fledge transition to Kubernetes
Easier for new Django/Python developers to use production-grade systems
Sharing databases across multiple python-based apps is easier and more secure (such as a FastAPI application reading/writing from/to a Django-based MySQL database).
Django’s JSONField now supported using MySQL (previously only PostgreSQL)
Easy to test (during CI/CD or in local development environments)
Scales to meet Django demands
Support for multiple databases in a single Django project such as: using MySQL as primary read/write database and a MySQL read replica database for common queries.
Strict Access Controls (Linode Private IPs, local development)
Requires SSL Certificate for connection (add complexity to deployments but also increases security)
Enables private connection (in same region; lowers connection costs)

If you’re interested in a modern approach to deploying Django applications on Linode along with a managed MySQL Database, GitHub Actions for CI/CD, Terraform, and Ansible, jump in to tons of free step-by-step educational content:

Watch our Understanding Databases Video Tutorial Series on Linode LIVE!
Download the Understanding Databases: Extended Edition ebook

To help get you started, the Coding for Entrepreneurs GitHub has a repository of code that goes with each step in the series. Good luck, and be sure to let me know how things are going via Twitter @JustinMitchel.

Compute

Storage

Databases

Networking

Developer Tools

Delivery

Security

Services

Industries

Pricing

Community

Engage With Us

Django and SQL: Your Dynamic Duo for Scaling Databases

python manage.py makemigrations

python manage.py migrate

You might also like...

Comments

Leave a Reply Cancel reply

Django and SQL: Your Dynamic Duo for Scaling Databases

python manage.py makemigrations

python manage.py migrate

You might also like...

Getting Started with LLMs: Managing Data Collection

How to Start Delivering Global-Scale Services | HarperDB Install and Set Up

Distributed Database Computing | Apache Cassandra Running in Multiple Data Centers

Comments

Leave a Reply Cancel reply

Sign up for the “In the Node” Newsletter