Install and Configure Apache Kafka on Ubuntu


Apache Kafka, often known simply as Kafka, is a popular open-source platform for managing and processing event streams. Kafka is structured around the concept of an event. External clients send events to Kafka and receive events from it, independently and asynchronously. Kafka accepts a continuous stream of events from multiple clients, stores them, and can forward them to a second set of clients for further processing. It is flexible, robust, reliable, and self-contained, and it offers low latency along with high throughput. LinkedIn originally developed Kafka, which is now an open-source project of the Apache Software Foundation.

Before You Begin

  1. Familiarize yourself with our Getting Started with Linode guide and complete the steps for setting your Linode’s hostname and timezone.

  2. This guide uses sudo wherever possible. Complete the sections of our How to Secure Your Server guide to create a standard user account, harden SSH access, and remove unnecessary network services. Do not follow the Configure a Firewall section yet.

  3. Update your system:

     sudo apt-get update && sudo apt-get upgrade
    
Note
This guide is written for non-root users. Commands that require elevated privileges are prefixed with sudo. If you’re not familiar with the sudo command, see the Linux Users and Groups guide.

A Summary of the Apache Kafka Installation Process

A complete Kafka installation consists of the high-level steps listed below. Each step is described in a separate section. These instructions are designed for Ubuntu 20.04 but are generally valid for any Debian-based Linux distribution.

  1. Install Java
  2. Download and Install Apache Kafka
  3. Run Kafka
  4. Create a Kafka Topic
  5. Write and Read Kafka Events
  6. Process Data with Kafka Streams
  7. Create System Files for Zookeeper and Kafka
  8. Shut Down the Kafka Environment

Install Java

You must install Java before you can use Apache Kafka. This guide explains how to install OpenJDK, an open-source version of Java.

  1. Update the package index.

     sudo apt update
    
  2. Install OpenJDK with apt.

     sudo apt install openjdk-11-jdk
    
  3. Confirm you installed the expected version of Java.

     java -version
    

    Java returns some basic information about the installation. The information can vary based on the version you have installed.

    openjdk version "11.0.9.1" 2020-11-04
    OpenJDK Runtime Environment (build 11.0.9.1+1-Ubuntu-0ubuntu1.20.04)
    OpenJDK 64-Bit Server VM (build 11.0.9.1+1-Ubuntu-0ubuntu1.20.04, mixed mode, sharing)
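If you automate this check, note that Java version strings changed format with Java 9 (JEP 223). Java 8 reports versions such as 1.8.0_275, while later releases report versions such as the 11.0.9.1 shown above. The following sketch extracts the major version from either form. The class name JavaVersionCheck is hypothetical. The minimum of 8 that it checks is an assumption based on Kafka 2.7 supporting Java 8 and 11, so adjust it for the release you install.

```java
// JavaVersionCheck.java: hypothetical helper, not part of Kafka or the JDK.
class JavaVersionCheck {

    // Assumed minimum major version; adjust for your Kafka release.
    static final int MIN_MAJOR = 8;

    // Returns the major version from strings such as "1.8.0_275" (legacy scheme)
    // or "11.0.9.1" and "17" (JEP 223 scheme used since Java 9).
    static int majorVersion(String version) {
        String[] parts = version.split("[._+-]");
        int first = Integer.parseInt(parts[0]);
        // Legacy versions start with "1.", so the major version is the second field.
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        String version = System.getProperty("java.version");
        int major = majorVersion(version);
        System.out.println("java.version=" + version + " major=" + major);
        if (major < MIN_MAJOR) {
            System.err.println("Java " + MIN_MAJOR + " or later is required");
            System.exit(1);
        }
    }
}
```

With Java 11 you can run the file directly, without compiling it first, using java JavaVersionCheck.java.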

Download and Install Apache Kafka

Tar archives for Apache Kafka can be downloaded directly from the Apache Site and installed with the process outlined in this section. The name of the Kafka download varies based on the release version. Substitute the name of your own file wherever you see kafka_2.13-2.7.0.tgz.
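Each binary archive name combines two version numbers. The first is the Scala version the binaries were built with, and the second is the Kafka release. For example, kafka_2.13-2.7.0.tgz is Kafka 2.7.0 built for Scala 2.13. If you script the following steps for a different release, a small parser can extract both numbers. The sketch below uses a hypothetical class named KafkaArchiveName and assumes the kafka_<scala>-<kafka>.tgz naming shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: splits a Kafka binary archive name into its two version numbers.
class KafkaArchiveName {

    // kafka_<Scala version>-<Kafka version>.tgz, for example kafka_2.13-2.7.0.tgz
    private static final Pattern NAME =
        Pattern.compile("kafka_(\\d+\\.\\d+)-(\\d+(?:\\.\\d+)+)\\.tgz");

    // Returns {scalaVersion, kafkaVersion}, or throws if the name does not match.
    static String[] parse(String fileName) {
        Matcher m = NAME.matcher(fileName);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected archive name: " + fileName);
        }
        return new String[] {m.group(1), m.group(2)};
    }

    public static void main(String[] args) {
        String[] v = parse(args.length > 0 ? args[0] : "kafka_2.13-2.7.0.tgz");
        System.out.println("Scala " + v[0] + ", Kafka " + v[1]);
    }
}
```

Run without arguments, it prints Scala 2.13, Kafka 2.7.0. Source archives such as kafka-2.7.0-src.tgz are rejected, because they are not the binary download this guide uses.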

  1. Navigate to the Apache Kafka Downloads page and choose the Kafka release you want. We recommend the latest release, which was Apache Kafka 2.7 when this guide was written. The release link opens a landing page where you can download the tar file over either HTTP or FTP.

  2. If you downloaded the software onto a different computer than the host, transfer the Apache Kafka files to the host via scp, ftp, or another file transfer method. Replace the user and yourhost values with your user name and host IP address:

     scp /localpath/kafka_2.13-2.7.0.tgz user@yourhost:~/
    
    Note
    If the transfer is blocked, check whether your firewall is blocking the connection. Run sudo ufw allow 22/tcp to let SSH connections through ufw, because scp transfers files over SSH.
  3. (Optional) You can confirm you downloaded the file correctly with a SHA512 checksum. You can find the checksum file on the Apache Kafka Downloads page. Each release includes a link to a corresponding sha512 file. Download this file and transfer it to your Kafka host using scp. Place the checksum file in the same directory as your tar file. Execute the following command to generate a checksum for the tar file:

     gpg --print-md SHA512 kafka_2.13-2.7.0.tgz
    

    Compare the output from this command against the contents of the SHA512 file. The two checksums should match. This step confirms the integrity of the file (that it was not corrupted or truncated in transit), not its authenticity. The checksum output has the following format:

    kafka_2.13-2.7.0.tgz: F3DD1FD8 8766D915 0D3D395B 285BFA75 F5B89A83 58223814
                          90C8428E 6E568889 054DDB5F ADA1EB63 613A6441 989151BC
                          7C7D6CDE 16A871C6 674B909C 4EDD4E28

  4. For extra security, confirm the file is signed. Download the .asc file and the signing keys associated with the release. You can find these files on the Apache Kafka Downloads page. The link to the KEYS file is located at the top of the page. Each release includes a link to its asc file. Download these files and transfer them to your Kafka host using scp. Place these files in the same directory as your tar file.

    • Import the keys from the KEYS file. This installs the entire key set.

      gpg --import KEYS
      
    • Use gpg to verify the signature.

      gpg --verify kafka_2.13-2.7.0.tgz.asc  kafka_2.13-2.7.0.tgz
      
    • The output should list the actual RSA key and the person who signed it.

      gpg: Signature made Wed Dec 16 14:03:36 2020 UTC
      gpg:                using RSA key DFB5ABA9CD50A02B5C2A511662A9813636302260
      gpg:                issuer "[email protected]"
      gpg: Good signature from "Bill Bejeck (CODE SIGNING KEY) <[email protected]>" [unknown]
      Note
      GPG might warn you that the “key is not certified with a trusted signature”. This means the signing key has not been verified through your own chain of trust. There is no easy way to confirm the signer’s identity, and for most deployments this is not necessary. For stronger authentication in high-security deployments, follow the steps for Validating Authenticity of a Key on the Apache Kafka Authentication page.

  5. Extract the files with the tar utility. After the extraction process is complete, either delete the archive or store it in a secure place elsewhere on your system.

     tar -zxvf kafka_2.13-2.7.0.tgz
    
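  6. (Optional) Create a new centralized directory for Kafka and move the extracted files to this new Kafka home directory. The rest of this guide assumes Kafka is located in /home/kafka/kafka_2.13-2.7.0. If you use a different location, adjust the paths accordingly.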

     sudo mkdir /home/kafka
     sudo mv kafka_2.13-2.7.0 /home/kafka
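If gpg is not available on the host, you can recompute the SHA-512 checksum from step 3 with the JDK you installed earlier. The sketch below assumes the .sha512 file uses the same layout as the gpg output shown in step 3: the file name, a colon, and then space-separated hex groups. When comparing, it ignores the file name, spacing, and letter case. Sha512Check is a hypothetical name. Like step 3, this sketch checks integrity only and does not replace the signature check in step 4.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper: recomputes a file's SHA-512 digest and compares it with a
// checksum file laid out like gpg --print-md output ("name: HEX HEX ...").
class Sha512Check {

    private static MessageDigest newDigest() {
        try {
            return MessageDigest.getInstance("SHA-512");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-512 is not available in this JDK", e);
        }
    }

    // Lowercase hex SHA-512 of an in-memory byte array.
    static String sha512Hex(byte[] data) {
        return toHex(newDigest().digest(data));
    }

    // Lowercase hex SHA-512 of a file, read in chunks so large archives are not loaded into memory.
    static String sha512HexOfFile(String path) throws IOException {
        MessageDigest md = newDigest();
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    // Drops the "filename:" prefix (if present), all whitespace, and letter case.
    static String normalize(String checksumText) {
        String hex = checksumText.substring(checksumText.indexOf(':') + 1);
        return hex.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }

    private static String toHex(byte[] digest) {
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Usage: java Sha512Check.java kafka_2.13-2.7.0.tgz kafka_2.13-2.7.0.tgz.sha512
    public static void main(String[] args) throws IOException {
        String actual = sha512HexOfFile(args[0]);
        String expected = normalize(
            new String(Files.readAllBytes(Paths.get(args[1])), StandardCharsets.US_ASCII));
        System.out.println(actual.equals(expected) ? "Checksum OK" : "Checksum MISMATCH");
    }
}
```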
    

Run Kafka

Kafka can be launched directly from the command line. You must start Zookeeper before you start Kafka, because this Kafka release uses Zookeeper to manage cluster metadata.

  1. Review the settings in the kafka_2.13-2.7.0/config/server.properties file within your Kafka directory. The default settings are fine for now. Optionally, add delete.topic.enable = true at the end of the file so you can delete any topics you create during testing. Topic deletion is already enabled by default in Kafka 1.0 and later, so this line only makes the setting explicit.

    File: /home/kafka/kafka_2.13-2.7.0/config/server.properties
    
    ...
    delete.topic.enable = true
        

  2. Change to the Kafka home directory and start Zookeeper.

     cd /home/kafka/kafka_2.13-2.7.0/
     bin/zookeeper-server-start.sh config/zookeeper.properties
    
    Note
    Leave all settings in zookeeper.properties at their defaults for most deployments.
  3. Open a new console session and launch Kafka.

     cd /home/kafka/kafka_2.13-2.7.0/
     bin/kafka-server-start.sh config/server.properties
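To check which value the broker will use for delete.topic.enable, you can read server.properties with java.util.Properties, which parses the key = value layout shown above. In this hypothetical BrokerConfigCheck sketch, a missing key falls back to true, which has been the broker default since Kafka 1.0.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper: reports the effective delete.topic.enable value of a broker config file.
class BrokerConfigCheck {

    // Value the broker uses when the key is absent (the default since Kafka 1.0).
    static final String DELETE_TOPIC_DEFAULT = "true";

    // Properties.load accepts "key=value" as well as "key = value", and skips # comments.
    static Properties load(Reader reader) throws IOException {
        Properties props = new Properties();
        props.load(reader);
        return props;
    }

    static Properties fromString(String text) {
        try {
            return load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
    }

    static boolean deleteTopicEnabled(Properties props) {
        return Boolean.parseBoolean(
            props.getProperty("delete.topic.enable", DELETE_TOPIC_DEFAULT).trim());
    }

    // Usage: java BrokerConfigCheck.java /home/kafka/kafka_2.13-2.7.0/config/server.properties
    public static void main(String[] args) throws IOException {
        try (Reader r = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.ISO_8859_1)) {
            System.out.println("delete.topic.enable=" + deleteTopicEnabled(load(r)));
        }
    }
}
```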
    

Create a Kafka Topic

Before you can send any events to Kafka, you must create a topic to contain the events. An explanation of topics can be found in Linode’s Introduction to Kafka.

  1. Open a new console session.

  2. Change the directory to your Kafka directory and create a new topic named test-events.

     cd /home/kafka/kafka_2.13-2.7.0/
     bin/kafka-topics.sh --create --topic test-events --bootstrap-server localhost:9092
    

    Kafka confirms the topic has been created.

    Created topic test-events.
  3. Generate a list of all the topics on the cluster with the --list option. You should see test-events listed in the output.

     bin/kafka-topics.sh --list --bootstrap-server localhost:9092
    
  4. Use the describe flag to display all information about the new topic.

     bin/kafka-topics.sh --describe --topic test-events --bootstrap-server localhost:9092
    

    Kafka returns a summary of the topic, including the number of partitions and the replication factor. The Configs field lists any configuration overrides and can be empty.

    Topic: test-events PartitionCount: 1 ReplicationFactor: 1 Configs:
    Topic: test-events Partition: 0 Leader: 0 Replicas: 0 Isr: 0
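Kafka places restrictions on topic names. A name can contain only ASCII letters, digits, periods, underscores, and hyphens. It must be 1 to 249 characters long and cannot be . or .. on its own. Kafka also replaces periods with underscores in metric names, so topics such as my.topic and my_topic would collide. For this reason, kafka-topics.sh warns about names that contain either character. The hypothetical TopicNames sketch below checks a name against these rules before you run kafka-topics.sh --create. The broker still makes the final decision.

```java
import java.util.regex.Pattern;

// Hypothetical helper mirroring Kafka's topic-name rules: 1 to 249 characters drawn
// from ASCII letters, digits, '.', '_' and '-', and not exactly "." or "..".
class TopicNames {

    static final int MAX_LENGTH = 249;
    private static final Pattern LEGAL = Pattern.compile("[a-zA-Z0-9._-]+");

    static boolean isValid(String name) {
        if (name == null || name.isEmpty() || name.length() > MAX_LENGTH) {
            return false;
        }
        if (name.equals(".") || name.equals("..")) {
            return false;
        }
        return LEGAL.matcher(name).matches();
    }

    // Metric names replace '.' with '_', so two topics that differ only in those
    // characters (for example my.topic and my_topic) collide.
    static boolean collide(String a, String b) {
        return a.replace('.', '_').equals(b.replace('.', '_'));
    }

    public static void main(String[] args) {
        for (String name : new String[] {"test-events", "streams-plaintext-input", "bad name", ".."}) {
            System.out.println(name + " -> " + (isValid(name) ? "valid" : "invalid"));
        }
    }
}
```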

Write and Read Kafka Events

Kafka’s command-line tools let you quickly test the new topic. Use the console producer to write some events into the topic. Then, use the console consumer to read the events you wrote.

  1. Open a new console session for the producer and change the directory to the Kafka root directory.

     cd /home/kafka/kafka_2.13-2.7.0/
    
  2. Start a console producer and specify the topic for its events. This does not create any events yet. It only starts a client that can send them. When the producer is ready, it displays a > prompt.

     bin/kafka-console-producer.sh --topic test-events --bootstrap-server localhost:9092
    
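  3. Send a few events to Kafka. The examples below use a key: value layout. However, with the command above, the console producer treats each entire line as the event value and assigns the event a NULL key. To store the text before the first : as a separate key, start the producer with the additional options --property parse.key=true --property key.separator=: and start the consumer with --property print.key=true to display the keys.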

     >key1: This is event 1
     >key2: This is event 2
     >key1: This is event 3
    
  4. Open a new console session to run the consumer and change the directory to the root Kafka directory.

     cd /home/kafka/kafka_2.13-2.7.0/
    
  5. Create the consumer, specifying the test-events topic it should read from. The --from-beginning flag indicates it should read all events starting from the beginning of the topic.

     bin/kafka-console-consumer.sh --topic test-events --from-beginning --bootstrap-server localhost:9092
    
    Note

    The console consumer supports options for formatting the incoming events. Run it without any arguments to print the full list.

    bin/kafka-console-consumer.sh
    
  6. The consumer immediately polls Kafka for any outstanding events in the topic and displays them onscreen. You should be able to see all the events you sent earlier.

    key1: This is event 1
    key2: This is event 2
    key1: This is event 3
  7. Return to the producer console (the producer should still be running) and generate another new event.

     >key2: This is event 4
    
  8. The event immediately appears in the consumer console.

    key2: This is event 4
  9. Stop the producer or consumer at any time with Ctrl-C.

    Note
    Events are durable and can be read as many times as you want. You can create a second consumer for the same topic and have it read all the same events.
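The note above follows from how a topic stores events: as an append-only log in which each event has a sequential offset. Reading an event does not remove it. Each consumer (more precisely, each consumer group) tracks its own offset, so a new consumer started with --from-beginning reads from offset 0 again. The in-memory sketch below models only this idea, for a single partition. It is not how Kafka is implemented, and EventLog and Cursor are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (not Kafka's implementation) of a single-partition topic:
// an append-only list of events plus independent read positions.
class EventLog {

    private final List<String> events = new ArrayList<>();

    // Appends an event and returns its offset, i.e. its position in the log.
    long append(String event) {
        events.add(event);
        return events.size() - 1;
    }

    // Returns a cursor starting at the given offset; 0 is like --from-beginning.
    Cursor reader(long startOffset) {
        return new Cursor(startOffset);
    }

    class Cursor {
        private long position;

        Cursor(long startOffset) {
            this.position = startOffset;
        }

        // Returns the events from the current position to the end and advances the position.
        // Nothing is removed from the log, so other cursors are unaffected.
        List<String> poll() {
            List<String> batch = new ArrayList<>(events.subList((int) position, events.size()));
            position = events.size();
            return batch;
        }
    }

    public static void main(String[] args) {
        EventLog topic = new EventLog();
        topic.append("key1: This is event 1");
        topic.append("key2: This is event 2");
        topic.append("key1: This is event 3");

        EventLog.Cursor first = topic.reader(0);
        System.out.println(first.poll().size()); // 3
        topic.append("key2: This is event 4");
        System.out.println(first.poll());        // [key2: This is event 4]

        EventLog.Cursor second = topic.reader(0); // a second consumer reading from the beginning
        System.out.println(second.poll().size()); // 4
    }
}
```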

Process Data with Kafka Streams

Kafka Streams is a library for performing real-time transformations and analysis on a stream. A Kafka Streams application typically acts as both a consumer and a producer. It polls a topic for new events, processes the data, and transmits its output as events to a second topic. Other applications are consumers of this second topic. Kafka Streams is explained in Linode’s Introduction to Apache Kafka.

You can use the WordCountDemo Java application included with Kafka Streams to run a quick demo. WordCountDemo consumes events from the streams-plaintext-input topic. It splits each line into words and keeps a running count of each word in a table. Whenever a count changes, it converts the updated count into an event and sends it to the streams-wordcount-output topic. The core of the application’s logic is shown below.

File: WordCountDemo.java
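import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class WordCountDemo {
    public static void main(final String[] args) {
        // Minimal settings: the application ID also names the consumer group and internal topics.
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "streams-wordcount");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        // Serializers/deserializers (serde) for String and Long types
        final Serde<String> stringSerde = Serdes.String();
        final Serde<Long> longSerde = Serdes.Long();
        final StreamsBuilder builder = new StreamsBuilder();

        // Construct a `KStream` from the input topic "streams-plaintext-input", where message
        // values represent lines of text (message keys are ignored in this example).
        final KStream<String, String> textLines =
            builder.stream("streams-plaintext-input", Consumed.with(stringSerde, stringSerde));

        final KTable<String, Long> wordCounts = textLines
            // Split each text line into lowercase words on runs of non-word characters.
            .flatMapValues(value -> Arrays.asList(value.toLowerCase().split("\\W+")))
            // Re-key each record by its word; explicit serdes are used for the repartition topic.
            .groupBy((key, word) -> word, Grouped.with(stringSerde, stringSerde))
            // Count the occurrences of each word (message key).
            .count();

        // Store the running counts as a changelog stream to the output topic.
        wordCounts.toStream().to("streams-wordcount-output", Produced.with(stringSerde, longSerde));

        final KafkaStreams streams = new KafkaStreams(builder.build(), props);
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close)); // close cleanly on Ctrl-C
        streams.start();
    }
}
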
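The consumer in the steps below prints a new line for every word it counts. The following plain-Java sketch, which does not need Kafka, shows why by applying the topology's logic to one line at a time. It lowercases the line, splits it on \W+, increments a running total for each word, and emits a (word, count) update for each increment. WordCountModel is a hypothetical name. Unlike the topology, it skips the empty token that split() returns when a line starts with a non-word character. In a real Kafka Streams application, record caching can combine some intermediate updates before they reach the output topic.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Plain-Java model of the word-count topology; it does not use Kafka at all.
class WordCountModel {

    // Running totals per word, playing the role of the KTable.
    private final Map<String, Long> counts = new HashMap<>();

    // Processes one input line and returns the (word, new count) updates it causes,
    // in order, like the records written to streams-wordcount-output.
    List<Map.Entry<String, Long>> process(String line) {
        List<Map.Entry<String, Long>> updates = new ArrayList<>();
        for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split() yields "" when a line starts with a non-word character
            }
            Long updated = counts.merge(word, 1L, Long::sum);
            updates.add(new SimpleEntry<>(word, updated));
        }
        return updates;
    }

    long count(String word) {
        return counts.getOrDefault(word, 0L);
    }

    public static void main(String[] args) {
        WordCountModel model = new WordCountModel();
        for (String line : new String[] {"This is not the end", "The end of the line"}) {
            for (Map.Entry<String, Long> update : model.process(line)) {
                System.out.println(update.getKey() + " " + update.getValue());
            }
        }
    }
}
```

Running main prints the same ten updates that the consumer displays in steps 7 and 9 below, including the two separate updates for the.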
  1. Create a topic on the Kafka cluster to hold the input text for the word count demo.

     cd /home/kafka/kafka_2.13-2.7.0/
     bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --topic streams-plaintext-input
    

    Kafka confirms it has created the topic.

  2. Create a second topic to store the output of the Kafka Streams application. Set the cleanup policy to compact entries, so only the updated word counts are stored.

     bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --topic streams-wordcount-output --config cleanup.policy=compact
    

    Kafka again confirms it has created the topic.

  3. Run the WordCountDemo application.

     bin/kafka-run-class.sh org.apache.kafka.streams.examples.wordcount.WordCountDemo
    
  4. Open a new console session, change to your Kafka directory, and launch a producer that sends test data to the streams-plaintext-input topic.

     bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic streams-plaintext-input
    
  5. In another new console session, change to your Kafka directory and create a consumer for the streams-wordcount-output topic. This topic contains the updated results of the WordCountDemo application. Set the following formatting properties to make the output easier to read.

     bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic streams-wordcount-output --from-beginning --formatter kafka.tools.DefaultMessageFormatter --property print.key=true --property print.value=true --property key.deserializer=org.apache.kafka.common.serialization.StringDeserializer --property value.deserializer=org.apache.kafka.common.serialization.LongDeserializer
    
  6. Enter some test data at the producer prompt.

     This is not the end
    
  7. Verify the word counts are displayed in the consumer window.

    this 1
    is 1
    not 1
    the 1
    end 1

  8. Use the producer to write more test input.

     The end of the line
    
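  9. Review the new output from the consumer. Notice how the word counts have been updated. Each occurrence of a word produces a new count, so the appears twice: once with the count 2 and again with the count 3.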

    the 2
    end 2
    of 1
    the 3
    line 1

  10. When you are finished with the demo, use Ctrl-C to stop the producer, the consumer, and the WordCountDemo application.
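Step 2 created streams-wordcount-output with cleanup.policy=compact. With compaction, Kafka always retains at least the latest record for each key and can discard older records for the same key. Compaction runs in the background and does not touch the active log segment, so a consumer reading from the beginning might still see superseded counts for some time. The toy model below computes the state that compaction converges toward: one record per key, holding the latest value, in the order in which those latest records were written. CompactionModel is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of log compaction: keep only the latest value for each key,
// ordered by the position (offset) of that latest record.
class CompactionModel {

    static <K, V> Map<K, V> compact(List<Map.Entry<K, V>> log) {
        Map<K, V> latest = new LinkedHashMap<>();
        for (Map.Entry<K, V> record : log) {
            latest.remove(record.getKey()); // forget the older position of this key
            latest.put(record.getKey(), record.getValue());
        }
        return latest;
    }

    public static void main(String[] args) {
        // The ten updates written to streams-wordcount-output in the demo above.
        List<Map.Entry<String, Long>> log = List.of(
            Map.entry("this", 1L), Map.entry("is", 1L), Map.entry("not", 1L),
            Map.entry("the", 1L), Map.entry("end", 1L), Map.entry("the", 2L),
            Map.entry("end", 2L), Map.entry("of", 1L), Map.entry("the", 3L),
            Map.entry("line", 1L));
        System.out.println(compact(log)); // {this=1, is=1, not=1, end=2, of=1, the=3, line=1}
    }
}
```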

Create System Files for Zookeeper and Kafka

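Until now, you have been starting Zookeeper and Kafka from the command line inside the Kafka directory. This works, but running them as systemd services is more convenient. systemd can start them at boot and restart them after an abnormal exit, and you can manage both with systemctl. Before you continue, stop any Zookeeper and Kafka processes that you started from the command line.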

  1. Create a system file for Zookeeper called /etc/systemd/system/zookeeper.service.

     sudo vi /etc/systemd/system/zookeeper.service
    
  2. Edit the file and add the following information. Use the location of your Kafka directory in the path names.

    File: /etc/systemd/system/zookeeper.service
    
    [Unit]
    Description=Apache Zookeeper Server
    Requires=network.target remote-fs.target
    After=network.target remote-fs.target
    
    [Service]
    Type=simple
    ExecStart=/home/kafka/kafka_2.13-2.7.0/bin/zookeeper-server-start.sh /home/kafka/kafka_2.13-2.7.0/config/zookeeper.properties
    ExecStop=/home/kafka/kafka_2.13-2.7.0/bin/zookeeper-server-stop.sh
    
    Restart=on-abnormal
    
    [Install]
    WantedBy=multi-user.target
  3. Create a second file for the Kafka server called /etc/systemd/system/kafka.service.

     sudo vi /etc/systemd/system/kafka.service
    
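  4. Edit the file and add the following information. Set JAVA_HOME to the directory of your Java installation. For the openjdk-11-jdk package on 64-bit x86 systems, this directory is /usr/lib/jvm/java-11-openjdk-amd64. The command readlink -f $(which java) prints the path to the java binary inside that directory.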

    File: /etc/systemd/system/kafka.service
    
    [Unit]
    Description=Apache Kafka Server
    Requires=zookeeper.service
    After=zookeeper.service
    
    [Service]
    Type=simple
    Environment="JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64"
    ExecStart=/home/kafka/kafka_2.13-2.7.0/bin/kafka-server-start.sh /home/kafka/kafka_2.13-2.7.0/config/server.properties
    ExecStop=/home/kafka/kafka_2.13-2.7.0/bin/kafka-server-stop.sh
    
    Restart=on-abnormal
    
    [Install]
    WantedBy=multi-user.target
  5. Reload the systemd daemon, then enable and start both services. The --now option starts each service immediately, in addition to enabling it at boot.

     sudo systemctl daemon-reload
     sudo systemctl enable --now zookeeper
     sudo systemctl enable --now kafka
    
  6. Confirm that Kafka and Zookeeper are both running as expected by checking their status with systemctl status.

     sudo systemctl status kafka zookeeper
    

    The entries should both show as active.

    kafka.service - Apache Kafka Server
        Loaded: loaded (/etc/systemd/system/kafka.service; enabled; vendor preset: enabled)
        Active: active (running) since Thu 2021-01-21 15:13:45 UTC; 4s ago
    ...
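systemctl status shows whether the processes are running, but not whether they are accepting connections yet. Kafka, in particular, can take a few seconds to start listening. One quick check is to open a TCP connection to each port. Port 2181 is the clientPort in the default zookeeper.properties, and 9092 is the broker port used throughout this guide. The hypothetical PortCheck sketch below performs only this check. A successful connection does not prove that the broker is fully healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: reports whether something accepts TCP connections on host:port.
class PortCheck {

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host unreachable
        }
    }

    public static void main(String[] args) {
        // Default ports from this guide: Zookeeper clientPort and the Kafka listener.
        String[][] services = {{"zookeeper", "2181"}, {"kafka", "9092"}};
        for (String[] s : services) {
            boolean up = isListening("localhost", Integer.parseInt(s[1]), 2000);
            System.out.println(s[0] + " on port " + s[1] + ": " + (up ? "listening" : "not reachable"));
        }
    }
}
```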

Shut Down the Kafka Environment

When you are finished with Kafka, we recommend that you shut down all components gracefully and delete any test data you no longer need.

  1. Shut down any Kafka consumers, producers, and Kafka Streams applications with Ctrl-C.

  2. Shut down Kafka and then Zookeeper with systemctl stop. If you started them from the command line instead of registering them with systemd, stop Kafka first and then Zookeeper with Ctrl-C in each console.

     sudo systemctl stop kafka
     sudo systemctl stop zookeeper
    
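  3. Clean up any test data. The following command removes /tmp/kafka-logs and /tmp/zookeeper, which are the default data directories set by log.dirs in server.properties and dataDir in zookeeper.properties. If you changed those settings, delete the directories you configured instead.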

     rm -rf /tmp/kafka-logs /tmp/zookeeper
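The rm -rf command above assumes the default locations from the shipped configuration files. If you changed them, the hypothetical DataDirs sketch below reads both files and prints the directories to delete. It assumes the standard keys. In server.properties, log.dirs can hold a comma-separated list and takes precedence over log.dir, whose default is /tmp/kafka-logs. In zookeeper.properties, the key is dataDir.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper: lists the data directories named in server.properties and zookeeper.properties.
class DataDirs {

    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory reader
        }
        return props;
    }

    // log.dirs (a comma-separated list) overrides log.dir; if neither is set,
    // the broker falls back to /tmp/kafka-logs.
    static List<String> brokerLogDirs(Properties server) {
        String value = server.getProperty("log.dirs", server.getProperty("log.dir", "/tmp/kafka-logs"));
        List<String> dirs = new ArrayList<>();
        for (String dir : value.split(",")) {
            if (!dir.trim().isEmpty()) {
                dirs.add(dir.trim());
            }
        }
        return dirs;
    }

    // Usage: java DataDirs.java config/server.properties config/zookeeper.properties
    public static void main(String[] args) throws IOException {
        Properties server = parse(read(args[0]));
        Properties zookeeper = parse(read(args[1]));
        for (String dir : brokerLogDirs(server)) {
            System.out.println("Kafka data dir:     " + dir);
        }
        System.out.println("Zookeeper data dir: " + zookeeper.getProperty("dataDir", "(not set)"));
    }

    private static String read(String path) throws IOException {
        return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.ISO_8859_1);
    }
}
```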
    


This page was originally published on
