How to Create a HarperDB Cluster
HarperDB is a versatile database solution that combines SQL and NoSQL functionality. It includes a comprehensive built-in API for easy integration with other applications. This guide provides a brief introduction to HarperDB and explains how to install it. It also explains how to configure multiple database instances into a cluster and replicate data.
What is HarperDB?
HarperDB combines a flexible database, built-in API, and distribution logic into a single backend. This solution, known as an embedded database, allows developers to more quickly and easily create integrated web applications. HarperDB allows both NoSQL and SQL tables to be mixed together in the same database and schema. SQL tables are highly structured and normalized, while NoSQL permits more freeform data. This combination enables access to legacy data and operational systems in the same place as new business intelligence analytics.
HarperDB is available through the HarperDB Cloud or as a self-hosted solution. The optional HarperDB Studio provides a visual GUI for storing or retrieving data but requires registration. Users can configure HarperDB through either the comprehensive API or the HarperDB CLI. Unfortunately, the CLI only supports a subset of the current functionality. API calls can be embedded into an application or sent as stand-alone requests using curl or a similar utility.
HarperDB is optimized for fast performance and scalability, with sub-millisecond latency between the API and data layer. NoSQL data can be accessed as quickly as SQL tables in traditional relational database management systems (RDBMS). HarperDB is particularly useful for gaming, media, manufacturing, status reporting, and real-time updates.
HarperDB also supports clustering and per-table data replication. Data replication can be configured in one or both directions. Administrators have full control over how the data is replicated within the cluster. A HarperDB instance can both send updates for one table to a second node and receive updates from it in return. At the same time, it can send changes to a different table to another node in one direction only. HarperDB minimizes data latency between nodes, allowing clusters to span different regions and continents. Clusters can grow very large, permitting virtually unlimited horizontal scaling.
Some advantages of HarperDB are:
- The HarperDB API provides applications with direct database access. This allows the application and its data to be bundled together in a single distribution.
- Each HarperDB node is atomic and guarantees “exactly-once” delivery. It avoids unnecessary data duplication.
- Every node in the cluster can read, write, and replicate data.
- HarperDB features a fast and resilient caching mechanism.
- Connections are self-healing, allowing for fast replication even in an unstable network.
- HarperDB supports data streaming and edge processing. This technique pre-processes data, only storing or transmitting the most important information.
- NoSQL tables support dynamic schemas, which can seamlessly change as new data arrives. HarperDB provides an auto-indexing function for more efficient hashing.
- HarperDB allows SQL queries on both structured and unstructured data.
- HarperDB’s Custom Functions allow developers to add their own API endpoints and manage authentication and authorization.
Before You Begin
If you have not already done so, create a Linode account and Compute Instance. See our Getting Started with Linode and Creating a Compute Instance guides.
Follow our Setting Up and Securing a Compute Instance guide to update your system. You may also wish to set the timezone, configure your hostname, create a limited user account, and harden SSH access.
On a multi-user system, it is best to create a dedicated HarperDB user account with sudo access. Use this account for the instructions in this guide.
Note Commands in this guide that require elevated privileges are prefixed with sudo. If you are not familiar with the sudo command, see the Linux Users and Groups guide.
How To Install HarperDB
Run these instructions on every node in the cluster. Each cluster must contain at least two nodes. These guidelines are designed for Ubuntu 22.04 LTS users, but the steps are similar on other Linux distributions. HarperDB is also available as a Docker container or as a .tgz file for offline installation. For more details on these options and the standard installation procedure, see the HarperDB installation instructions.
Ensure the system is up to date by executing the following command:
sudo apt-get update -y && sudo apt-get upgrade -y
HarperDB requires Node.js to run properly. To install Node.js, first install the Node Version Manager (NVM). To download and install NVM, use the following command.
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.3/install.sh | bash
Log out and log back into the terminal to activate NVM.
exit
ssh username@system_IP
Use NVM to install Node.js. This command installs Node.js release 20, the current LTS release as of this writing. It also installs the NPM package manager for Node.js.
Note HarperDB requires Node.js release 14 or higher.
nvm install 20
NVM installs the latest Node.js 20 LTS patch version and sets that as the default.
Create a swap file for the system.
sudo dd if=/dev/zero of=/swapfile bs=128M count=16
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile
echo "/swapfile swap swap defaults 0 0" | sudo tee -a /etc/fstab
Increase the open file limits for the account. Replace accountname with the name of the actual account.
echo "accountname soft nofile 500000" | sudo tee -a /etc/security/limits.conf
echo "accountname hard nofile 1000000" | sudo tee -a /etc/security/limits.conf
Use NPM to install HarperDB.
npm install -g harperdb
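The prerequisites above can be sanity-checked with a short script. The following is a minimal sketch, not part of the HarperDB tooling: the function names node_version_ok and swap_size_mib are hypothetical, and the version string v20.11.0 is only a sample value. It checks a Node.js version string against the minimum release of 14 stated above and works out the size of the swap file that the dd command creates.

```shell
# Hypothetical helper: succeed if a Node.js version string such as "v20.11.0"
# meets the minimum major release this guide states for HarperDB (14).
node_version_ok() {
  major=${1#v}        # strip the leading "v"
  major=${major%%.*}  # keep only the major release number
  [ "$major" -ge 14 ]
}

# Hypothetical helper: size of the swap file written by dd,
# calculated as block size (in MiB) multiplied by block count.
swap_size_mib() {
  echo $(( $1 * $2 ))
}

# "v20.11.0" is a sample value; on a real node pass "$(node --version)" instead.
node_version_ok "v20.11.0" && echo "Node.js version is supported"
echo "Swap file size: $(swap_size_mib 128 16) MiB"
```

On a live system, swapon --show and free -h report whether the swap file is active, and ulimit -Sn and ulimit -Hn display the soft and hard open file limits after logging back in.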
How To Configure and Initialize the HarperDB Cluster
This section explains the steps required to initialize and run HarperDB. It also describes the additional configuration required to create and enable a HarperDB cluster. Each cluster must contain at least two nodes.
Some cluster attributes can be passed as parameters to the initial harperdb start command. If the system is initially configured as a stand-alone instance, it can be added to a cluster later on. However, further changes cannot be made through the command line. They must be implemented in either the harperdb-config.yaml file or through API calls.
For simplicity and consistency, this guide appends most of the required cluster configuration to the initial harperdb start command. It then completes the configuration process using API calls. For more information on the HarperDB API, see the HarperDB API documentation.
Replication occurs on a per-table basis and is configured after the schema and table are defined. See the following section for a more complete explanation. Follow these steps to enable clustering on your HarperDB nodes.
On the first node, use harperdb start to launch the application. Provide the following configuration attributes in the command.
- For TC_AGREEMENT, indicate yes to accept the terms of the agreement.
- Define the ROOTPATH directory for persistent data. This example sets the directory to /home/user/hdb. Replace user with the actual name of the user account.
- Set HDB_ADMIN_USERNAME to the name of the administrative user.
- Provide a password for the administrative account in HDB_ADMIN_PASSWORD. Replace password with a more secure password.
- Set OPERATIONSAPI_NETWORK_PORT to 9925.
- Choose a name for the CLUSTERING_USER and provide a password for the user in CLUSTERING_PASSWORD. These values must be the same for all nodes in the cluster.
- Set CLUSTERING_ENABLED to true.
- Identify the node using CLUSTERING_NODENAME. This name must be unique within the cluster.
Note HTTPS is recommended for better security on production systems or with sensitive data. To use HTTPS, add the parameters --OPERATIONSAPI_NETWORK_HTTPS "true" and --CUSTOMFUNCTIONS_NETWORK_HTTPS "true".
harperdb start \
  --TC_AGREEMENT "yes" \
  --ROOTPATH "/home/user/hdb" \
  --OPERATIONSAPI_NETWORK_PORT "9925" \
  --HDB_ADMIN_USERNAME "HDB_ADMIN" \
  --HDB_ADMIN_PASSWORD "password" \
  --CLUSTERING_ENABLED "true" \
  --CLUSTERING_USER "cluster_user" \
  --CLUSTERING_PASSWORD "password" \
  --CLUSTERING_NODENAME "hdb1"
|------------- HarperDB 4.1.2 successfully started ------------|
(Optional) To launch HarperDB at bootup, create a crontab entry for the application. Substitute the name of the administrative account for user and ensure the path matches the Node.js release that NVM installed. In this example, the path entry reflects Node.js release 18.17.0. If a different release is installed, such as the Node.js 20 release installed earlier in this guide, adjust the version directory accordingly. The nvm which current command displays the path of the active Node.js binary.
Note To integrate HarperDB with systemd and start/stop it using systemctl, see the HarperDB Linux documentation.
(crontab -l 2>/dev/null; echo "@reboot PATH=\"/home/user/.nvm/versions/node/v18.17.0/bin:$PATH\" && harperdb start") | crontab -
Start HarperDB on the remaining nodes. Change the value of CLUSTERING_NODENAME to a different value. In this example, it is set to hdb2. The remaining attributes are the same as on the first node.
harperdb start \
  --TC_AGREEMENT "yes" \
  --ROOTPATH "/home/user/hdb" \
  --OPERATIONSAPI_NETWORK_PORT "9925" \
  --HDB_ADMIN_USERNAME "HDB_ADMIN" \
  --HDB_ADMIN_PASSWORD "password" \
  --CLUSTERING_ENABLED "true" \
  --CLUSTERING_USER "cluster_user" \
  --CLUSTERING_PASSWORD "password" \
  --CLUSTERING_NODENAME "hdb2"
Run harperdb status on each node to confirm HarperDB is active. The status field should indicate running.
harperdb status
harperdb:
  status: running
  pid: 1726
clustering:
  hub server:
    status: running
    pid: 1698
  leaf server:
    status: running
    pid: 1715
  network:
    - name: hdb1
      response time: 6
      connected nodes: []
      routes: []
  replication:
    node name: hdb1
    is enabled: true
    connections: []
Determine the network topology for the cluster. A full mesh of connections is not required. Data can be replicated to any cluster node provided it is connected to the rest of the cluster. Design some measure of resiliency into the network. If a hub-and-spoke architecture is configured, the remaining nodes would be isolated if the central node suffers an outage. As a general guideline, connect each node to two other nodes. It is not necessary to add the route in both directions. For instance, a connection between node1 and node2 can be added to either node1 or node2. Successful negotiation establishes a bidirectional route.
Authentication is required to send messages to HarperDB using the API. To derive the AuthorizationKey from the name and password of the administrator account, use the JavaScript btoa() function. Run the command btoa("HDB_ADMIN:password") to convert the account credentials into a Base64 string. Replace password with the actual password.
Note JavaScript commands can be executed in a web browser console. On Firefox, select Tools->Browser Tools->Web Developer Tools to access the console. Choose the Console option within the developer window, then enter the command. Alternatively, online JavaScript emulators are widely available for the same purpose. Use the result for the AuthorizationKey values in the following API calls. See the Mozilla documentation for more information.
btoa("HDB_ADMIN:password")
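The same Base64 string can also be produced directly in the terminal, which avoids switching to a browser. The following is a minimal sketch that assumes the base64 utility is installed, as it is on a standard Ubuntu system. The variable names HDB_USER, HDB_PASS, and AUTH_KEY are chosen for illustration, and the credentials are the example values used in this guide.

```shell
# Example credentials from this guide; replace them with the real values.
HDB_USER="HDB_ADMIN"
HDB_PASS="password"

# printf is used instead of echo so that no trailing newline is encoded.
AUTH_KEY=$(printf '%s:%s' "$HDB_USER" "$HDB_PASS" | base64)

# This is the value to send in the Authorization header.
echo "Authorization: Basic $AUTH_KEY"
```

For the example credentials, the resulting key is SERCX0FETUlOOnBhc3N3b3Jk. As an alternative, curl can build the same header itself when given the --user 'HDB_ADMIN:password' option in place of the Authorization header.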
Add routes until the network architecture is fully implemented. If a cluster consists of node1, node2, and node3, add a route on node1 to reach node2 and another on node2 to reach node3. On node hdb1, run the curl command shown below to install a route to hdb2. Include the following information:
- Send the POST request to the local HarperDB process at http://localhost:9925.
- Include an Authorization header. Use the AuthorizationKey derived from the administrator account and password in the previous step.
- Inside the request body, passed with the --data option, set operation to cluster_set_routes and set server to hub.
- Use routes to specify a list of one or more routes to install. Each route consists of a host and a port, which is typically 9932. The host is the IP address of the peer system. In the following example, replace 192.0.2.10 with the actual IP address of the peer.
curl --location --request POST 'http://localhost:9925' \
  --header 'Authorization: Basic AuthorizationKey' \
  --header 'Content-Type: application/json' \
  --data '{
    "operation": "cluster_set_routes",
    "server": "hub",
    "routes": [ {"host": "192.0.2.10", "port": 9932} ]
  }'
{"message":"cluster routes successfully set","set":[{"host":"192.0.2.10","port":9932}],"skipped":[]}
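When a node needs routes to several peers, the request body can be generated rather than typed by hand. The following is a minimal sketch that assumes every peer listens on the typical clustering port 9932. The function name build_routes_payload is hypothetical, and the second address, 192.0.2.11, is a placeholder for an additional peer.

```shell
# Hypothetical helper: print a cluster_set_routes request body containing
# one route per host argument. Assumes each peer uses port 9932.
build_routes_payload() {
  routes=""
  for host in "$@"; do
    # Add a comma separator before every route except the first.
    routes="${routes:+$routes, }{\"host\": \"$host\", \"port\": 9932}"
  done
  printf '{"operation": "cluster_set_routes", "server": "hub", "routes": [%s]}\n' "$routes"
}

# Print the body for two peers (placeholder addresses).
build_routes_payload 192.0.2.10 192.0.2.11
```

The printed JSON can be passed to the --data option of the curl command shown above.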
Stop and start the HarperDB instance to quickly negotiate the route.
harperdb stop
harperdb start
Run the harperdb status command again. Ensure the route is displayed under routes.
harperdb status
harperdb:
  status: running
  pid: 20926
clustering:
  hub server:
    status: running
    pid: 20899
  leaf server:
    status: running
    pid: 20914
  network:
    - name: hdb1
      response time: 18
      connected nodes:
        - hdb2
      routes:
        - host: 192.0.2.10
          port: 9932
    - name: hdb2
      response time: 92
      connected nodes:
        - hdb1
      routes: []
  replication:
    node name: hdb1
    is enabled: true
    connections: []
How to Add and Replicate Data on HarperDB
The cluster is now ready for replication. Replication occurs on a per-table basis in HarperDB, so data is not automatically replicated. Instead, one or more subscriptions define how to manage the table data. The schema and table must be created first before adding any subscriptions. Each subscription references a single peer node. To replicate data to multiple nodes, multiple subscriptions must be added.
A subscription contains the name of the schema and table to replicate, along with Boolean values for publish and subscribe. When publish is set to true, transactions on the local node are replicated to the remote node. Setting subscribe to true means any changes to the remote table are sent to the local node. Both values can be set to true, resulting in bidirectional replication. In all cases, the local node is the one receiving the subscription request.
The following example demonstrates how to create a schema, table, and subscription on node hdb1. The subscription both publishes and subscribes to the dog table, resulting in two-way replication between nodes hdb1 and hdb2.
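The mapping between replication direction and the two Boolean values can be captured in a small helper. The following is a minimal sketch: the function name subscription_json and the direction keywords publish, subscribe, and both are hypothetical conveniences, not part of the HarperDB API. It prints a single subscription entry in the format described above.

```shell
# Hypothetical helper: print one subscription entry.
# $1 = schema, $2 = table, $3 = direction (publish | subscribe | both)
subscription_json() {
  case "$3" in
    publish)   pub=true;  sub=false ;;  # local changes go to the remote node
    subscribe) pub=false; sub=true  ;;  # remote changes come to the local node
    both)      pub=true;  sub=true  ;;  # bidirectional replication
    *) echo "unknown direction: $3" >&2; return 1 ;;
  esac
  printf '{"schema": "%s", "table": "%s", "subscribe": %s, "publish": %s}\n' \
    "$1" "$2" "$sub" "$pub"
}

# Two-way replication for the dev.dog table used in this guide.
subscription_json dev dog both
```

The entry printed for the both direction has subscribe and publish set to true, which is the final state this section configures for the dog table.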
Create the dev schema on node hdb1 through the HarperDB API using the create_schema operation. Provide the correct value for the AuthorizationKey as described earlier.
curl --location --request POST 'http://localhost:9925' \
  --header 'Authorization: Basic AuthorizationKey' \
  --header 'Content-Type: application/json' \
  --data '{
    "operation": "create_schema",
    "schema": "dev"
  }'
{"message":"schema 'dev' successfully created"}
Create the dog table within the dev schema. This API call invokes the create_table operation and sets the hash_attribute to id. This is a NoSQL table, so columns and types are not defined.
curl --location --request POST 'http://localhost:9925' \
  --header 'Authorization: Basic AuthorizationKey' \
  --header 'Content-Type: application/json' \
  --data '{
    "operation": "create_table",
    "schema": "dev",
    "table": "dog",
    "hash_attribute": "id"
  }'
{"message":"table 'dev.dog' successfully created."}
Add a subscription to the dog table using the API add_node operation. Add the following information to the request.
- Set node_name to hdb2 to designate it as the peer for replication.
- Specify the schema and table to replicate. In this example, the schema is dev and the table is dog.
- To transmit updates to hdb2, set publish to true. This configures replication in one direction only.
Note The add_node operation can create multiple subscriptions for several schemas/tables at the same time. However, all subscriptions in the request must relate to the same peer. Separate the subscriptions with commas and enclose the list in [] brackets. To replicate more tables to a different node, call the add_node API again and provide the new node_name.
curl --location --request POST 'http://localhost:9925' \
  --header 'Authorization: Basic AuthorizationKey' \
  --header 'Content-Type: application/json' \
  --data '{
    "operation": "add_node",
    "node_name": "hdb2",
    "subscriptions": [
      {
        "schema": "dev",
        "table": "dog",
        "subscribe": false,
        "publish": true
      }
    ]
  }'
{"message":"Successfully added 'hdb2' to manifest","added":[{"schema":"dev","table":"dog","publish":true,"subscribe":false}],"skipped":[]}
Use harperdb status to confirm HarperDB is aware of the subscription.
harperdb status
...
replication:
  node name: hdb1
  is enabled: true
  connections:
    - node name: hdb2
      status: open
      ports:
        clustering: 9932
        operations api: 9925
      latency ms: 132
      uptime: 6h 49m 43s
      subscriptions:
        - schema: dev
          table: dog
          publish: true
          subscribe: false
To subscribe to updates to the dog table from hdb2, use the update_node operation. Set both subscribe and publish to true in the API call.
Note subscribe and publish could have been both set to true in the original add_node operation. This method demonstrates how to update an existing subscription. To completely remove the subscription, use the remove_node operation and include the name of the node under node_name.
curl --location --request POST 'http://localhost:9925' \
  --header 'Authorization: Basic AuthorizationKey' \
  --header 'Content-Type: application/json' \
  --data '{
    "operation": "update_node",
    "node_name": "hdb2",
    "subscriptions": [
      {
        "schema": "dev",
        "table": "dog",
        "subscribe": true,
        "publish": true
      }
    ]
  }'
{"message":"Successfully updated 'hdb2'","updated":[{"schema":"dev","table":"dog","publish":true,"subscribe":true}],"skipped":[]}
Add a record to the table to ensure replication is working. Either SQL or NoSQL can be used to add data to the dog table. This example adds a record using the NoSQL insert operation. Specify dev as the schema and dog as the table. Use the records attribute to add one or more entries to the table. Because NoSQL is very free-form, a variable number of key-value fields can be appended to the record. The hash_attribute is set to id in the table, so each new record must provide a unique value for the id field.
curl --location --request POST 'http://localhost:9925' \
  --header 'Authorization: Basic AuthorizationKey' \
  --header 'Content-Type: application/json' \
  --data '{
    "operation": "insert",
    "schema": "dev",
    "table": "dog",
    "records": [
      {
        "id": 1,
        "dog_name": "Penny",
        "age": 7,
        "weight": 38
      }
    ]
  }'
{"message":"inserted 1 of 1 records","inserted_hashes":[1],"skipped_hashes":[]}
To confirm the record has been added, retrieve the data using an SQL query. To send an SQL query to HarperDB, specify sql for the operation and set sql to the desired SQL statement. The query "SELECT * FROM dev.dog" retrieves all records from the table. The output confirms Penny has been added to the table.
Note NoSQL data is not normalized or columnar, so the key-value pairs do not necessarily appear in any particular order.
curl --location --request POST 'http://localhost:9925' \
  --header 'Authorization: Basic AuthorizationKey' \
  --header 'Content-Type: application/json' \
  --data '{
    "operation": "sql",
    "sql": "SELECT * FROM dev.dog"
  }'
[{"weight":38,"id":1,"dog_name":"Penny","__updatedtime__":1690742615459.453,"__createdtime__":1690742615459.453,"age":7}]
Change to the console of the hdb2 node and run the same command. The output should contain the same record, although the fields may appear in a different order, indicating the record has been replicated to this node.
curl --location --request POST 'http://localhost:9925' \
  --header 'Authorization: Basic AuthorizationKey' \
  --header 'Content-Type: application/json' \
  --data '{
    "operation": "sql",
    "sql": "SELECT * FROM dev.dog"
  }'
[{"id":1,"age":7,"__createdtime__":1690742615459.453,"weight":38,"dog_name":"Penny","__updatedtime__":1690742615459.453}]
Confirm replication works in the opposite direction. Using the console for the hdb2 node, add a second entry to the dev.dog table. Increment the id to 2 to ensure it is unique within the table.
curl --location --request POST 'http://localhost:9925' \
  --header 'Authorization: Basic AuthorizationKey' \
  --header 'Content-Type: application/json' \
  --data '{
    "operation": "insert",
    "schema": "dev",
    "table": "dog",
    "records": [
      {
        "id": 2,
        "dog_name": "Rex",
        "age": 2,
        "weight": 68
      }
    ]
  }'
{"message":"inserted 1 of 1 records","inserted_hashes":[2],"skipped_hashes":[]}
Return to the first node and retrieve all records from the dev.dog table. The reply should now list two dogs, including the entry added on hdb2. This confirms data is replicating in both directions.
curl --location --request POST 'http://localhost:9925' \
  --header 'Authorization: Basic AuthorizationKey' \
  --header 'Content-Type: application/json' \
  --data '{
    "operation": "sql",
    "sql": "SELECT * FROM dev.dog"
  }'
[{"weight":38,"id":1,"dog_name":"Penny","__updatedtime__":1690742615459.453,"__createdtime__":1690742615459.453,"age":7},{"weight":68,"id":2,"dog_name":"Rex","__updatedtime__":1690744053074.6084,"__createdtime__":1690744053074.6084,"age":2}]
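The API calls in this section differ only in their JSON body, so a small wrapper function can reduce the repetition. The following is a minimal sketch rather than an official HarperDB tool. The function name hdb_op and the environment variables HDB_URL, HDB_USER, HDB_PASS, and HDB_DRY_RUN are hypothetical names chosen for illustration. It relies on the curl --user option, which builds the Basic Authorization header from the account name and password.

```shell
# Hypothetical wrapper: send one operation to the HarperDB operations API.
# $1 = JSON request body.
# HDB_URL defaults to the local instance used throughout this guide.
# With HDB_DRY_RUN=1, the request is printed instead of being sent.
hdb_op() {
  url="${HDB_URL:-http://localhost:9925}"
  if [ "${HDB_DRY_RUN:-0}" = "1" ]; then
    printf 'POST %s %s\n' "$url" "$1"
    return 0
  fi
  # curl --user encodes "name:password" into the Basic Authorization header.
  curl --silent --show-error --location --request POST "$url" \
    --user "${HDB_USER:-HDB_ADMIN}:${HDB_PASS:?HDB_PASS is not set}" \
    --header 'Content-Type: application/json' \
    --data "$1"
}

# Dry run: show the request that would retrieve all records from dev.dog.
HDB_DRY_RUN=1 hdb_op '{"operation": "sql", "sql": "SELECT * FROM dev.dog"}'
```

To send a real request, export HDB_PASS with the administrator password and call hdb_op without HDB_DRY_RUN. Running the same query on hdb1 and hdb2 should return the same records on both nodes.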
More Information
You may wish to consult the following resources for additional information on this topic. While these are provided in the hope that they will be useful, please note that we cannot vouch for the accuracy or timeliness of externally hosted materials.
This page was originally published on