Deploy A Production-Ready Cassandra Node Cluster on Ubuntu 16.04 and CentOS 7

Updated by Andrew Lescher Contributed by Andrew Lescher

Contribute on GitHub

View Project | View File | Edit File

Banner Image

What is Apache Cassandra

Apache Cassandra is an open-source application that is managed in a simple command line interface using the CQL language. CQL, or Cassandra Query Language, is syntactically similar to Structured Query Language, making it easy to pick up for those familiar with SQL.

Cassandra NoSQL databases are ideal for situations requiring maximum data redundancy and uptime, ease of horizontal scaling across multiple unique servers, and evolving project needs during the software development lifecycle, which would otherwise be heavily restricted by traditional relational database implementations.

Introduction To This Tutorial

This guide is Part 2 in a series detailing the implementation of Apache Cassandra on Ubuntu 16.04 and CentOS 7 distributions. To complete this guide, you must have at least two Cassandra nodes setup on two separate Linodes. By following these instructions, you will learn how to link your Cassandra nodes together to form a true cluster.

You will also learn how to secure communication between your nodes, as well as reinforce your cluster against typical failure points. The resulting cluster will be production-ready and configured for maximum uptime.

Before You Begin

  1. You must have at least two Cassandra nodes set up and configured according to the instructions in the Deploy A Scalable And Development-Driven NoSQL DB With Apache Cassandra guide. The Cassandra nodes should have equal or similar hardware specs; otherwise, bottlenecks can occur.

  2. A working firewall is a necessary security measure. Firewall-specific instructions will be presented for UFW, FirewallD, and IPtables. Steps for setting up UFW can be found at How to Configure a Firewall with UFW. FirewallD instructions are located at Introduction to FirewallD on CentOS.

  3. Most of the commands in this guide require root privileges in order to execute. You may work through the guide as-is if you can run the commands under the root account in your system. Alternatively, an elevated user account with sudo privileges can be used as long as each command is prefixed with sudo.

Initial Node Preparation

Prepare The Nodes For Clustering

The instructions here must be executed on each Cassandra node to be clustered. Apply the exact same configuration to each node, unless otherwise indicated.

  1. Clear the default data from the Cassandra system table in order to import a new values set in the cassandra.yaml config file.

    1
    2
    3
     systemctl stop cassandra
    
     rm -rf /var/lib/cassandra/data/system/*
    
  2. Edit the cassandra.yaml file. Set the appropriate values for each variable indicated below.

    1
     vim /etc/cassandra/conf/cassandra.yaml
    
Property Explanation
cluster_name Choose your cluster name here.
seed_provider This contains a comma-delimited list of each public IP address of each node to be clustered. Input the list in the line that reads - seeds: "127.0.0.1".
listen_address Other nodes in the cluster will use the IP address listed here to find each other. Change from “localhost” to the specific node’s public IP address.
rpc_address The listen address for client communication. Change from “localhost” to the public IP address or loopback address of the node.
endpoint_snitch Snitches determine how Cassandra replicates data. Change this to “GossipingPropertyFileSnitch,” as this is more suitable to a multi-datacenter configuration.
auto_bootstrap Add this property anywhere in the file. If you have yet to add data to your nodes - that is, you would start with a fresh cluster - set this to “false.” If your node(s) already contains data, do not add this property.
num_tokens This property defines the proportion of data stored on each node. For nodes with equal hardware capabilities, this number should be set equally between them so the data is more likely to be evenly distributed. The default value of 256 is likely to ensure equal data distribution. For more information on this topic, see the “How data is distributed across a cluster” link in the “External Resources” section.

Example

/etc/cassandra/conf/cassandra.yaml
1
2
3
4
5
6
7
8
9
10
cluster_name: '[Your Cluster Name]'
listen_address: [public_ip_address]
rpc_address: [public_ip_address]
num_tokens: 256
seed_provider:
  - class_name: org.apache.cassandra.locator.SimpleSeedProvider
    parameters:
          - seeds: "[node1_ip_address],[node2_ip_address]"
endpoint_snitch: GossipingPropertyFileSnitch
auto_bootstrap: false
  1. Edit the cassandra-rackdc.properties file. Assign each node the same datacenter and rack name.

Example

/etc/cassandra/conf/cassandra-rackdc.properties
1
2
3
4
# These properties are used with GossipingPropertyFileSnitch and will
# indicate the rack and dc for this node
dc=DC1
rack=RACK1

Edit Firewall Settings

Open Cassandra Communication Ports

  1. Ports 7000 and 9042 must be available for external nodes to connect to. As a security measure, limit connections to these ports to only the IP addresses of any other nodes in the cluster. Depending on your preference, you may use UFW, FirewallD, or iptables to proceed.

    UFW

    1
     ufw allow proto tcp from [external_node_ip_address] to any port 7000,9042 comment "Cassandra TCP"
    

    FirewallD

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
     firewall-cmd --permanent --zone=public --add-rich-rule='
         rule family="ipv4"
         source address="[external_node_ip_address]"
         port protocol="tcp" port="7000" accept'
    			
     firewall-cmd --permanent --zone=public --add-rich-rule='
         rule family="ipv4"
         source address="[external_node_ip_address]"
         port protocol="tcp" port="9042" accept'
    		
     firewall-cmd --reload
    

    iptables

    1
     -A INPUT -p tcp -s [external_node_ip_address] -m multiport --dports 7000,9042 -m state --state NEW,ESTABLISHED -j ACCEPT
    

Test the Cluster Setup

Boot Cassandra

Start Cassandra on each node, one after another, with systemctl start cassandra. Run nodetool status, and you should see each node in your cluster listed in the output.

Enable Node-to-Node Encryption

Setting up encryption between nodes offers additional security and protects the data that is transferred between Cassandra nodes. The commands in this section need only to be run on one node in your cluster, with the appropriate files then distributed across the rest of the cluster.

Generate SSL Files

  1. Create a new directory called “.keystore” in the Cassandra config directory. Navigate to the newly created directory.

    1
     mkdir /etc/cassandra/conf/.keystore
    
  2. Create a configuration file for openssl to help automate the certificate creation process. Copy the contents below into a new file and title it rootCAcert.conf. The contents in bold should be updated with your specific information.

~/.keystore/rootCAcert.conf
1
2
3
4
5
6
7
8
9
10
11
12
  [ req ]
  distinguished_name     = req_distinguished_name
  prompt                 = no
  output_password        = set_strong_password_here
  default_bits           = 4096

  [ req_distinguished_name ]
  C                      = US
  ST                     = WA
  L                      = Seattle
  OU                     = Cluster_Name
  CN                     = Cluster_Name_MasterCA
  1. Create the public and private key files.

    1
     openssl req -config rootCAcert.conf -new -x509 -nodes -keyout ca-cert.key -out ca-cert.cert -days 365
    
  2. Generate a keystore for each node in your cluster. Below, the command sequence is demonstrated as if two nodes comprised this cluster.

    1
    2
     keytool -genkeypair -keyalg RSA -alias node1 -keystore node1-keystore.jks -storepass cassandra -keypass cassandra -validity 365 -keysize 4096 -dname "CN=node1, OU=[cluster_name]"	
     keytool -genkeypair -keyalg RSA -alias node2 -keystore node2-keystore.jks -storepass cassandra -keypass cassandra -validity 365 -keysize 4096 -dname "CN=node2, OU=[cluster_name]"
    
  3. Verify the key. A successful verification will print out the certificate fingerprint. Repeat this command for each certificate file.

    1
     keytool -list -keystore node1-keystore.jks -storepass [password]
    
  4. Generate the signing-request file. Repeat this command for each node in your cluster, using each .jks file for the -keystore option. Below, the command sequence is demonstrated as if two nodes comprised this cluster.

    1
    2
    3
     keytool -certreq -keystore node1-keystore.jks -alias node1 -file node1-cert.csr -keypass cassandra -storepass cassandra -dname "CN=node1, OU=[cluster_name]"
    		
     keytool -certreq -keystore node2-keystore.jks -alias node2 -file node2-cert.csr -keypass cassandra -storepass cassandra -dname "CN=node2, OU=[cluster_name]"
    
  5. Sign each node’s certificate. Run the following command for each node in your cluster, using each .csr file you created earlier. Set the certificate to expire in 365 days for best practice. Below, the command sequence is demonstrated as if two nodes comprised this cluster.

    1
    2
     openssl x509 -req -CA ca-cert.cert -CAkey ca-cert.key -in node1-cert.csr -out node1-signed.cert -days 365 -CAcreateserial -passin pass:cassandra	
     openssl x509 -req -CA ca-cert.cert -CAkey ca-cert.key -in node2-cert.csr -out node2-signed.cert -days 365 -CAcreateserial -passin pass:cassandra
    
  6. Verify the certificates generated for each node.

    1
     openssl verify -CAfile ca-cert.cert node1-signed.cert
    
  7. Import the original certificate into the keystore for each node. Below, the command sequence is demonstrated as if two nodes comprised this cluster.

    1
    2
     keytool -importcert -keystore node1-keystore.jks -alias ca-cert -file ca-cert.cert -noprompt -keypass cassandra -storepass cassandra	
     keytool -importcert -keystore node2-keystore.jks -alias ca-cert -file ca-cert.cert -noprompt -keypass cassandra -storepass cassandra
    
  8. Now, import the signed certificate into the keystore for each node. Below, the command sequence is demonstrated as if two nodes comprised this cluster.

    1
    2
     keytool -importcert -keystore node1-keystore.jks -alias node1 -file node1-signed.cert -noprompt -keypass cassandra -storepass cassandra	
     keytool -importcert -keystore node2-keystore.jks -alias node2 -file node2-signed.cert -noprompt -keypass cassandra -storepass cassandra 
    
  9. Create a Cassandra server truststore file. This essentially acts as a certificate authority, allowing all nodes whose client certificates were signed here to communicate.

    1
    keytool -importcert -keystore cassandra-truststore.jks -alias truststore -file ca-cert.cert -noprompt -keypass [password] -storepass [password]
    

Copy Files to Each Node in The Cluster

  1. Copy the truststore file and keystore files into Cassandra’s conf directory for each node. Depending on your installation, the conf directory could be located at /etc/cassandra/conf, or /etc/cassandra.

    If you receive a “Permission denied” error upon executing the following command, your destination server user does not have permissions to access Cassandra’s config directory.

    1
    2
    3
     scp ~/.keystore/cassandra-truststore.jks username@<dest_server_public_ip>:/cassandra/config/directory/cassandra-truststore.jks
    		
     scp ~/.keystore/[Cluster_Name].jks username@<dest_server_public_ip>:/cassandra/config/directory/[Cluster_Name]-keystore.jks
    

    Use the “-i” option if your destination server requires a certificate to login.

    1
     scp -i /local_path/to/private_key_file ~/.keystore/cassandra-truststore.jks username@<dest_server_public_ip>:/cassandra/config/directory/cassandra-truststore.jks
    

Configure Encryption Settings

  1. Edit the cassandra.yaml file on each node to match the following. Replace text in [brackets] with the indicated information.
/etc/cassandra/conf/cassandra.yaml
1
2
3
4
5
6
7
8
9
10
11
12
13
server_encryption_options:
  internode_encryption: all
  keystore: /etc/cassandra/conf/[keystore_file.jks]
  keystore_password: cassandra
  truststore: /etc/cassandra/conf/[truststore_file.jks]
  truststore_password: cassandra
  # More advanced defaults below:
  protocol: TLS
  algorithm: SunX509
  store_type: JKS
  cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA]
  require_client_auth: true

You may want to configure the internode_encryption setting to better meet the needs of your specific environment. A breakdown of available values are shown below.

Property Property description
all All traffic between nodes is encrypted.
none No traffic is encrypted.
dc Only traffic between datacenters is encrypted.
rack Only traffic between server racks is encrypted.

Verify SSL Setup

Run the following commands on each server node.

  1. Reboot Cassandra

    1
    systemctl restart cassandra
    
  2. Verify the nodes are online and communicating.

    1
     nodetool status
    
  3. Check log file to verify ssl encryption status.

    1
     grep SSL /var/log/cassandra/system.log 2>&1 | tail -1
    

If successful, your console output should read similar to the following:

1
INFO  [main] 2017-07-19 14:35:14,212 MessagingService.java:521 - Starting Encrypted Messaging Service on SSL port 7001

Automate SSL Certificate Generation

If you have many Cassandra nodes for which to create and distribute certificates, the process outlined above can quickly become tedious. Now that you understand how SSL certificates are generated for Cassandra, the process can be automated with a bash script. You can automate the process with a script. The rest of this guide shows you how to use the SSL certificate generator to automate the generation of SSL certificates.

  1. Navigate to any folder and pull the script from Github:

    1
     git pull https://github.com/Darkstar90/cassandra-keygen.git
    
  2. Run the script with the -h or --help option

    1
     bash cassandra-keygen.sh --help
    
  3. The output of the command will demonstrate the capabilities of the script and define its usage. The script has 3 modes:

    1
    2
     a. Generate Node Certificates  b. Import Node Certificates
     c. Generate Truststore
    

An example usage of each mode is demonstrated below, for a cluster of 6 nodes:

Generate Node Certificates

1
bash cassandra-keygen.sh  --nodes 6 --directory /etc/cassandra/conf/.keystore --cluster Cassandra_Cluster --password cassandra --sslconfig /etc/cassandra/conf/rootCAcert.conf --keysize 4096

Import Node Certificates

1
bash cassandra-keygen.sh -n 6 -d /etc/cassandra/conf/.keystore -p cassandra

Generate Truststore

1
bash cassandra-keygen.sh --directory /etc/cassandra/conf/.keystore --truststore cassandra

Do All Actions

1
bash cassandra-keygen.sh  --nodes 6 --directory /etc/cassandra/conf/.keystore --cluster Cassandra_Cluster --password cassandra --truststore cassandra --sslconfig /etc/cassandra/conf/rootCAcert.conf --keysize 4096

Where to Go From Here

Now that your Cassandra cluster is up and running with node-to-node SSL encryption, you are prepared to deploy production-ready databases. Logging into each node in the cluster with cqlsh can also be accomplished with encryption. See the “[Client-to-node encryption]” link the external resources section for information on setting that up.

More Information

You may wish to consult the following resources for additional information on this topic. While these are provided in the hope that they will be useful, please note that we cannot vouch for the accuracy or timeliness of externally hosted materials.

This guide is published under a CC BY-ND 4.0 license.