BlogDatabasesHarnessing NoSQL Capabilities in PostgreSQL

Harnessing NoSQL Capabilities in PostgreSQL

July 5, 2023

Image with symbol representing a database connected to other databases and the text Harnessing NoSQL Capabilities in PostgreSQL at the bottom.

NoSQL document stores can be ideal for managing large amounts of unstructured data. However, some organizations work with unstructured data but still want the capabilities that come with traditional SQL databases. For example, media or news content agencies may run high-traffic websites centered around vast amounts of text and image content. Although they need to store this unstructured data, they perhaps don’t really need the flexible schemas or horizontal scalability that come with NoSQL databases. Instead, they need the database-management ease and consistency that comes with a relational database like PostgreSQL.

Is it possible to get the best of both worlds? Yes.

With its data types meant to support unstructured data, PostgreSQL offers a happy medium, enabling you to harness NoSQL capabilities within a relational database that is cost-effective and simple to manage. In this article, we’ll look at how you can use the HStore and JSONB data types in PostgreSQL to work with unstructured data.

Before we dive in, let’s look briefly at the main differences between SQL and NoSQL databases.

Understanding SQL versus NoSQL

SQL and NoSQL databases each have their unique strengths and weaknesses. Making an informed decision about which will best meet your data needs depends on a strong understanding of their differences.

SQL (relational) databases, like PostgreSQL and MySQL, represent data with a clear and predictable structure in tables, rows, and columns. They adhere to ACID properties (atomicity, consistency, isolation, and durability), which yield a strong foundation for data integrity by ensuring that database transactions are reliably processed.

SQL databases shine where data consistency and integrity are crucial, such as when dealing with complex queries and transactional systems (like with financial applications).

In contrast, NoSQL databases (document stores) cater to large and varied data sets not necessarily suited for tabular representation. Examples of NoSQL databases include MongoDB, Cassandra, and Couchbase. NoSQL databases work with flexible schemas, allowing data structures to evolve over time. They also support horizontal scalability, distributing data across multiple servers for improved handling of large data loads and high traffic.

NoSQL databases are often used in applications where scalability is crucial, such as for handling large quantities of data in real-time applications or large language models (LLMs). NoSQL databases are also beneficial when dealing with varied and evolving data structures, as they allow organizations to adapt as their data needs change.

Why Might You Use PostgreSQL as a Document Store?

PostgreSQL is a relational database, so it may seem unconventional to consider it an option to meet NoSQL needs. However, your situation may have a strong case for using PostgreSQL as a document store.

If your data storage needs are diverse—requiring both structured, ACID-compliant data storage and flexible, schema-less document storage—then you can leverage PostgreSQL to combine relational and non-relational models. Or, perhaps you want certain NoSQL capabilities but also want the data consistency guarantees that come with ACID properties. Finally, as a mature technology with an active community, PostgreSQL brings comprehensive SQL support, advanced indexing, and full-text search. These features—combined with its NoSQL capabilities—make PostgreSQL a versatile data storage solution.

Limitations of Using PostgreSQL for NoSQL-Style Data

Despite its versatility, PostgreSQL has certain limitations compared to traditional NoSQL databases. While PostgreSQL can scale up vertically, it doesn’t inherently support horizontal scaling or distributed data with automatic sharding, features that NoSQL databases typically offer. PostgreSQL also does not offer optimizations for certain NoSQL data structures like wide-column stores or graph databases. Finally, PostgreSQL does not offer tunable consistency for optimizing performance, which you might get from some NoSQL databases.

As you consider using PostgreSQL for large, unstructured data sets, know that these limitations may impact performance and your ability to scale. In addition, mixing SQL and NoSQL data operations introduces complexity. Careful planning and understanding of both paradigms will help you avoid potential pitfalls.

However, with the right understanding and use case, PostgreSQL can serve as a powerful tool, providing the best of both SQL and NoSQL worlds.

HStore and JSONB in PostgreSQL

As we consider the possibilities of using PostgreSQL as a NoSQL solution, we encounter three data types that offer NoSQL-like functionality, but they each have unique characteristics and use cases.

HStore: This data type allows you to store key-value pairs in a single PostgreSQL value. It is useful for storing semi-structured data that does not have a fixed schema.
JSONB: This is a binary representation of JSON-like data. It can store more complex structures compared to HStore and supports full JSON capabilities. JSONB is indexable, making it a good choice for large amounts of data.
JSON: This is similar to JSONB, though it lacks many of JSONB’s capabilities and efficiencies. The JSON data type stores an exact copy of the input text, which includes white space and duplicate keys.

We mention the JSON data type as a valid choice for storing JSON-formatted data when you don’t need the full capabilities provided by JSONB. However, our primary focus for the remainder of this article will be HStore and JSONB.

HStore

The PostgreSQL documentation describes HStore as useful when you have “rows with many attributes that are rarely examined, or semi-structured data.” Before you can work with the HStore data type, make sure to enable the HStore extension:

> CREATE EXTENSION hstore;

HStore is represented as zero or more key => value separated by commas. The order of the pairs is not significant or reliably retained on output.

> SELECT 'foo => bar, prompt => "hello world", pi => 3.14'::hstore;
                      hstore                       
-----------------------------------------------------
"pi"=>"3.14", "foo"=>"bar", "prompt"=>"hello world"
(1 row)

Each HStore key is unique. If an HStore declaration is made with duplicate keys, only one of the duplicates will be stored, and there is no guarantee about which one that will be.

> SELECT 'key => value1, key => value2'::hstore;
    hstore     
-----------------
"key"=>"value1"
(1 row)

With its flat key-value structure, HStore offers simplicity and fast querying, making it ideal for straightforward scenarios. However, HStore only supports text data and does not support nested data, making it limited for complex data structures.

On the other hand, JSONB can handle a wider variety of data types.

JSONB

The JSONB data type accepts JSON-formatted input text and then stores it in a decomposed binary format. Although this conversion makes input slightly slow, the result is fast processing and efficient indexing. JSONB does not preserve white space or the order of object keys.

> SELECT '{"foo": "bar", "pi": 3.14, "nested": { "prompt": "hello", "count": 5 } }'::jsonb;
                                jsonb                                
-----------------------------------------------------------------------
{"pi": 3.14, "foo": "bar", "nested": {"count": 5, "prompt": "hello"}}
(1 row)

If duplicate object keys are given, the last value is kept.

> SELECT '{"key": "value1", "key": "value2"}'::jsonb;
      jsonb      
-------------------
{"key": "value2"}
(1 row)

Because JSONB supports complex structures and full JSON capabilities, it is the ideal choice for complex or nested data, preferable over HStore or JSON. However, using JSONB introduces some performance overhead and increased storage usage compared to HStore.

Practical Examples: Working with HStore and JSONB

Let’s consider some practical examples to demonstrate how to work with these data types. We’ll look at creating tables, basic querying and operations, and indexing.

Basic HStore Operations

As you would with any other data type, you can define fields in your PostgreSQL data table as an HStore data type.

> CREATE TABLE articles (    id serial primary key,    title varchar(64),    meta hstore  );

Inserting a record with an HStore attribute looks like this:

> INSERT INTO articles (title, meta)
  VALUES (
    'Data Types in PostgreSQL',
    'format => blog, length => 1350, language => English, license => "Creative Commons"');

> SELECT * FROM articles;
id |          title           | meta                                     ----+--------------------------+------------------------------------------  1 | Data Types in PostgreSQL | "format"=>"blog", "length"=>"1350", "license"=>"Creative Commons", "language"=>"English"(1 row)

With HStore fields, you can fetch specific key-value pairs from the field as specified by keys you supply:

> SELECT title,          meta -> 'license' AS license,         meta -> 'format' AS format  FROM articles;
              title              |     license      |   format  
---------------------------------+------------------+------------
Data Types in PostgreSQL        | Creative Commons | blog
Advanced Querying in PostgreSQL | None             | blog
Scaling PostgreSQL              | MIT              | blog
PostgreSQL Fundamentals         | Creative Commons | whitepaper
(4 rows)

You can also query with criteria based on specific values within an HStore field.

> SELECT id, title FROM articles WHERE meta -> 'license' = 'Creative Commons';

id |          title          
----+--------------------------
  1 | Data Types in PostgreSQL
  4 | PostgreSQL Fundamentals
(2 rows)

You may at times only want to query for rows that contain a specific key in the HStore field. For example, the following query only returns rows where the meta HStore contains the note key. To do this, you would use the ? operator.

> SELECT title, meta->'note' AS note FROM articles WHERE meta ? 'note';
              title              |      note      
---------------------------------+-----------------
PostgreSQL Fundamentals         | hold for review
Advanced Querying in PostgreSQL | needs edit
(2 rows)

A list of useful HStore operators and functions can be found here. For example, you can extract the keys for an HStore to an array, or you can convert an HStore to a JSON representation.

> SELECT title, akeys(meta) FROM articles where id=1;
          title           |              akeys              
--------------------------+----------------------------------
Data Types in PostgreSQL | {format,length,license,language}
(1 row)

> SELECT title, hstore_to_json(meta) FROM articles where id=1;
          title           |            hstore_to_json
--------------------------+------------------------------------------------
Data Types in PostgreSQL | {"format": "blog", "length": "1350", "license": "Creative Commons", "language": "English"}
(1 row)

Basic JSONB Operations

Working with the JSONB data type in PostgreSQL is straightforward. Table creation and record insertion look like this:

> CREATE TABLE authors (id serial primary key, name varchar(64), meta jsonb);

> INSERT INTO authors (name, meta)  VALUES    ('Adam Anderson',     '{ "active":true, "expertise": ["databases", "data science"], "country": "UK" }');

Notice that the jsonb meta field is supplied as a text string in JSON format. PostgreSQL will complain if the value you provide is not a valid JSON.

> INSERT INTO authors (name, meta)  VALUES ('Barbara Brandini', '{ "this is not valid JSON" }');
ERROR:  invalid input syntax for type json

Unlike with the HStore type, JSONB supports nested data.

> INSERT INTO authors (name, meta)  VALUES ('Barbara Brandini',          '{ "active":true,             "expertise": ["AI/ML"],             "country": "CAN",             "contact": {               "email": "barbara@example.com",               "phone": "111-222-3333"             }           }');

Similar to HStore, JSONB fields can be retrieved partially, with only certain keys. For example:

> SELECT name, meta -> 'country' AS country FROM authors;
      name       | country ------------------+--------- Adam Anderson    | "UK" Barbara Brandini | "CAN" Charles Cooper   | "UK"(3 rows)

The JSONB data type has many operators that are similar in usage to HStore. For example, the following use of the ? operator retrieves only those rows where the meta field contains the contact key.

> SELECT name,         meta -> 'active' AS active,         meta -> 'contact' AS contact  FROM authors  WHERE meta ? 'contact';
      name       | active |                 contact                         
------------------+--------+-----------------------------------------------
Barbara Brandini | true   | {"email": "barbara@example.com", "phone": "111-222-3333"}
Charles Cooper   | false  | {"email": "charles@example.com"}
(2 rows)

Working with Indexes

As per the documentation, the HStore data type “has GiST and GIN index support for the @>, ?, ?& and ?| operators.” For a detailed explanation of the differences between the two types of indexes, please see here. Indexing for JSONB uses GIN indexes to facilitate the efficient search for keys or key-value pairs.

The statement to create an index is as one would expect:

> CREATE INDEX idx_hstore ON articles USING GIN(meta);
> CREATE INDEX idx_jsonb ON authors USING GIN(meta);

SQL Structure with NoSQL Flexibility

Let’s revisit the original use case that we mentioned in the introduction. Imagine a news content agency that stores its articles in much the same way as one would with a NoSQL document store. Perhaps the article can be represented in JSON as an ordered array of objects representing sections, each with text content, notations, and formatting. In addition, a host of metadata is associated with each article, and those metadata attributes are inconsistent from one article to the next.

The above description encapsulates the lion’s share of the organization’s NoSQL needs, but everything else about how it manages and organizes its data aligns closely with a relational data model.

By combining the NoSQL capabilities of a data type like JSONB with PostgreSQL’s traditional SQL strengths, the organization can enjoy flexible schemas and fast querying in nested data while still being able to perform joint operations and enforce data relationships. PostgreSQL’s HStore and JSONB data types offer powerful options to developers that need the structure of a relational database but also require NoSQL-style data storage.

PostgreSQL at Scale

Are you looking to support NoSQL-style data storage and querying while staying within the framework of a traditional relational database? Perhaps your organization deals with documents similarly to how we’ve described in this post. Or perhaps you’re looking for options to handle the storage of unstructured data for a large language model (LLM) or some other AI/ML undertaking.

The PostgreSQL Cluster in the Linode Marketplace gives you the relational model and structure of a SQL database along with the horizontal scalability of a NoSQL database. Combine this with using HStore or JSONB data types, and you have an ideal hybrid solution for harnessing NoSQL capabilities as you work within PostgreSQL.

Compute

Storage

Databases

Networking

Developer Tools

Delivery

Security

Services

Industries

Pricing

Community

Engage With Us

Harnessing NoSQL Capabilities in PostgreSQL

You might also like...

Comments

Leave a Reply Cancel reply

Harnessing NoSQL Capabilities in PostgreSQL

You might also like...

Getting Started with LLMs: Managing Data Collection

How to Start Delivering Global-Scale Services | HarperDB Install and Set Up

Distributed Database Computing | Apache Cassandra Running in Multiple Data Centers

Comments

Leave a Reply Cancel reply

Sign up for the “In the Node” Newsletter