
      How To Install Apache Kafka on Ubuntu 20.04


      The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      Apache Kafka is a popular distributed message broker designed to handle large volumes of real-time data. A Kafka cluster is highly scalable and fault-tolerant. It also has a much higher throughput compared to other message brokers like ActiveMQ and RabbitMQ. Though it is generally used as a publish/subscribe messaging system, a lot of organizations also use it for log aggregation because it offers persistent storage for published messages.

      A publish/subscribe messaging system allows one or more producers to publish messages without considering the number of consumers or how they will process the messages. Subscribed clients are notified automatically about updates and the creation of new messages. This system is more efficient and scalable than systems where clients poll periodically to determine if new messages are available.

      In this tutorial, you will install and use Apache Kafka 2.6.1 on Ubuntu 20.04.

      Prerequisites

      To follow along, you will need:

      • One Ubuntu 20.04 server with at least 4 GB of RAM and a non-root user with sudo privileges. Installations with less RAM may cause the Kafka service to fail.
      • OpenJDK 11 installed on your server. Kafka is written in Java, so it requires a JVM.

      Step 1 — Creating a User for Kafka

      Because Kafka can handle requests over a network, your first step is to create a dedicated kafka user for the service. This minimizes damage to your Ubuntu machine in the event that someone compromises the Kafka server.

      Logged in as your non-root sudo user, create a user called kafka:

      • sudo adduser kafka

      Follow the prompts to set a password and create the kafka user.

      Next, add the kafka user to the sudo group with the adduser command. You need these privileges to install Kafka’s dependencies:

      • sudo adduser kafka sudo
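
      If you want to confirm that the change took effect, you can list the kafka user’s group memberships with the standard groups utility (an optional check, not part of the original steps):

      • groups kafka

      The output should include sudo.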

      Your kafka user is now ready. Log into the account using su:

      • su -l kafka

      Now that you’ve created a Kafka-specific user, you are ready to download and extract the Kafka binaries.

      Step 2 — Downloading and Extracting the Kafka Binaries

      Let’s download and extract the Kafka binaries into dedicated folders in our kafka user’s home directory.

      To start, create a directory in /home/kafka called Downloads to store your downloads:

      • mkdir ~/Downloads

      Use curl to download the Kafka binaries:

      • curl "https://downloads.apache.org/kafka/2.6.1/kafka_2.13-2.6.1.tgz" -o ~/Downloads/kafka.tgz
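
      Optionally, you can verify the integrity of the download. Apache publishes a SHA-512 checksum alongside each release; fetch it and compare it by eye against the checksum of your local file (the .sha512 URL below follows Apache’s usual naming convention for release artifacts):

      • curl "https://downloads.apache.org/kafka/2.6.1/kafka_2.13-2.6.1.tgz.sha512"
      • sha512sum ~/Downloads/kafka.tgz

      The two values should match.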

      Create a directory called kafka and change to this directory. This will be the base directory of the Kafka installation:

      • mkdir ~/kafka && cd ~/kafka

      Extract the archive you downloaded using the tar command:

      • tar -xvzf ~/Downloads/kafka.tgz --strip 1

      We specify the --strip 1 flag to ensure that the archive’s contents are extracted in ~/kafka/ itself and not in another directory (such as ~/kafka/kafka_2.13-2.6.1/) inside of it.
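
      You can confirm the layout with a quick listing; you should see directories such as bin, config, and libs directly under ~/kafka:

      • ls ~/kafka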

      Now that we’ve downloaded and extracted the binaries successfully, we can start configuring our Kafka server.

      Step 3 — Configuring the Kafka Server

      Kafka’s default behavior will not allow you to delete a topic. A Kafka topic is the category, group, or feed name to which messages can be published. To modify this, you must edit the configuration file.

      Kafka’s configuration options are specified in server.properties. Open this file with nano or your favorite editor:

      • nano ~/kafka/config/server.properties

      First, add a setting that will allow us to delete Kafka topics. Add the following to the bottom of the file:

      ~/kafka/config/server.properties

      delete.topic.enable = true
      

      Second, change the directory where the Kafka logs are stored by modifying the log.dirs property:

      ~/kafka/config/server.properties

      log.dirs=/home/kafka/logs
      

      Save and close the file. Now that you’ve configured Kafka, your next step is to create systemd unit files for running and enabling the Kafka server on startup.
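
      Before you do, you can optionally confirm that both settings were saved; grep pulls them straight from the file:

      • grep -E "delete.topic.enable|log.dirs" ~/kafka/config/server.properties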

      Step 4 — Creating Systemd Unit Files and Starting the Kafka Server

      In this section, you will create systemd unit files for the Kafka service. This will help you perform common service actions such as starting, stopping, and restarting Kafka in a manner consistent with other Linux services.

      Zookeeper is a service that Kafka uses to manage its cluster state and configurations. It is used in many distributed systems. If you would like to know more about it, visit the official Zookeeper docs.

      Create the unit file for zookeeper:

      • sudo nano /etc/systemd/system/zookeeper.service

      Enter the following unit definition into the file:

      /etc/systemd/system/zookeeper.service

      [Unit]
      Requires=network.target remote-fs.target
      After=network.target remote-fs.target
      
      [Service]
      Type=simple
      User=kafka
      ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
      ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
      Restart=on-abnormal
      
      [Install]
      WantedBy=multi-user.target
      

      The [Unit] section specifies that Zookeeper requires networking and the filesystem to be ready before it can start.

      The [Service] section specifies that systemd should use the zookeeper-server-start.sh and zookeeper-server-stop.sh shell files for starting and stopping the service. It also specifies that Zookeeper should be restarted if it exits abnormally.

      After adding this content, save and close the file.
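
      At this point you can optionally start Zookeeper on its own to confirm the unit file works; the kafka service you define next will also start it automatically through its Requires= directive:

      • sudo systemctl start zookeeper
      • sudo systemctl status zookeeper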

      Next, create the systemd service file for kafka:

      • sudo nano /etc/systemd/system/kafka.service

      Enter the following unit definition into the file:

      /etc/systemd/system/kafka.service

      [Unit]
      Requires=zookeeper.service
      After=zookeeper.service
      
      [Service]
      Type=simple
      User=kafka
      ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
      ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
      Restart=on-abnormal
      
      [Install]
      WantedBy=multi-user.target
      

      The [Unit] section specifies that this unit file depends on zookeeper.service. This will ensure that zookeeper gets started automatically when the kafka service starts.

      The [Service] section specifies that systemd should use the kafka-server-start.sh and kafka-server-stop.sh shell files for starting and stopping the service. It also specifies that Kafka should be restarted if it exits abnormally.

      Now that you have defined the units, start Kafka with the following command:

      • sudo systemctl start kafka

      To ensure that the server has started successfully, check the status of the kafka unit:

      • sudo systemctl status kafka

      You will receive output like this:

      Output

      ● kafka.service
           Loaded: loaded (/etc/systemd/system/kafka.service; disabled; vendor preset: enabled)
           Active: active (running) since Wed 2021-02-10 00:09:38 UTC; 1min 58s ago
         Main PID: 55828 (sh)
            Tasks: 67 (limit: 4683)
           Memory: 315.8M
           CGroup: /system.slice/kafka.service
                   ├─55828 /bin/sh -c /home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1
                   └─55829 java -Xmx1G -Xms1G -server -XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:InitiatingHeapOccupancyPercent=35 -XX:+ExplicitGCInvokesConcurrent -XX:MaxInlineLevel=15 -Djava.awt.headless=true -Xlog:gc*:file=>

      Feb 10 00:09:38 cart-67461-1 systemd[1]: Started kafka.service.

      You now have a Kafka server listening on port 9092.
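
      If you’d like to verify the listener directly, you can check for a socket on port 9092 with ss, which ships with Ubuntu 20.04:

      • sudo ss -tlnp | grep 9092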

      You have started the kafka service. But if you rebooted your server, Kafka would not restart automatically. To enable the kafka service on server boot, run the following commands:

      • sudo systemctl enable zookeeper
      • sudo systemctl enable kafka
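
      You can confirm that both units are now enabled:

      • sudo systemctl is-enabled zookeeper
      • sudo systemctl is-enabled kafka

      Each command should print enabled.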

      In this step, you started and enabled the kafka and zookeeper services. In the next step, you will check the Kafka installation.

      Step 5 — Testing the Kafka Installation

      In this step, you will test your Kafka installation. Specifically, you will publish and consume a “Hello World” message to make sure the Kafka server is behaving correctly.

      Publishing messages in Kafka requires:

      • A producer, which publishes records and data to topics.
      • A consumer, which reads messages and data from topics.

      To begin, create a topic named TutorialTopic:

      • ~/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic TutorialTopic
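
      To confirm that the topic was created, you can list all topics on the cluster with the same script (an optional check):

      • ~/kafka/bin/kafka-topics.sh --list --zookeeper localhost:2181

      You should see TutorialTopic in the output.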

      You can create a producer from the command line using the kafka-console-producer.sh script. It expects the Kafka server’s hostname, a port, and a topic as arguments.

      Now publish the string "Hello, World" to the TutorialTopic topic:

      • echo "Hello, World" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic > /dev/null
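
      The echo pipe above sends a single message and exits. If you prefer, you can also run the producer interactively: invoke the script without piping input, type one message per line, and press CTRL+C when you are done:

      • ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic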

      Next, create a Kafka consumer using the kafka-console-consumer.sh script. It expects the Kafka server’s hostname and port, along with a topic name, as arguments.

      The following command consumes messages from TutorialTopic. Note the use of the --from-beginning flag, which allows the consumption of messages that were published before the consumer was started:

      • ~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TutorialTopic --from-beginning

      If there are no configuration issues, you will see Hello, World appear in your terminal:

      Output

      Hello, World

      The script will continue to run, waiting for more messages to be published. To test this, open a new terminal window and log into your server.

      In this new terminal, start a producer to publish another message:

      • echo "Hello World from Sammy at DigitalOcean!" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic > /dev/null

      You will see this message in the consumer’s output:

      Output

      Hello, World
      Hello World from Sammy at DigitalOcean!

      When you are done testing, press CTRL+C to stop the consumer script.
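
      If you want to clean up the test topic, this is where the delete.topic.enable setting from Step 3 pays off; the following command removes TutorialTopic (optional cleanup):

      • ~/kafka/bin/kafka-topics.sh --delete --zookeeper localhost:2181 --topic TutorialTopic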

      You have now installed and configured a Kafka server on Ubuntu 20.04. In the next step, you will perform a few quick tasks to harden the security of your Kafka server.

      Step 6 — Hardening the Kafka Server

      With your installation complete, you can remove the kafka user’s admin privileges. Before you do so, log out and log back in as any other non-root sudo user. If you are still running the same shell session that you started this tutorial with, type exit.

      Remove the kafka user from the sudo group:

      • sudo deluser kafka sudo

      To further improve your Kafka server’s security, lock the kafka user’s password using the passwd command. This makes sure that nobody can directly log into the server using this account:

      • sudo passwd -l kafka

      At this point, only root or a sudo user can log in as kafka by typing in the following command:

      • sudo su kafka

      In the future, if you want to unlock it, use passwd with the -u option:

      • sudo passwd -u kafka

      You have now successfully restricted the kafka user’s admin privileges. You are ready to begin using Kafka, or you can follow the next optional step, which will add KafkaT to your system.

      Step 7 — Installing KafkaT (Optional)

      KafkaT is a tool that Airbnb developed. It makes it easier to view details about your Kafka cluster and perform certain administrative tasks from the command line. But because it is a Ruby gem, you will need Ruby to use it. You will also need the build-essential package to build the other gems that KafkaT depends on. Install them using apt:

      • sudo apt install ruby ruby-dev build-essential
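
      You can confirm the Ruby installation before proceeding; ruby -v prints the installed version (an optional check):

      • ruby -v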

      You can now install KafkaT using the gem command:

      • sudo CFLAGS=-Wno-error=format-overflow gem install kafkat

      The Wno-error=format-overflow compilation flag is required to suppress warnings and errors that occur while building the zookeeper gem, one of kafkat’s dependencies.

      KafkaT uses .kafkatcfg as the configuration file to determine the installation and log directories of your Kafka server. It should also have an entry pointing KafkaT to your ZooKeeper instance.

      Create a new file called .kafkatcfg:

      • nano ~/.kafkatcfg

      Add the following lines to specify the required information about your Kafka server and Zookeeper instance:

      ~/.kafkatcfg

      {
        "kafka_path": "~/kafka",
        "log_path": "/home/kafka/logs",
        "zk_path": "localhost:2181"
      }
      

      You are now ready to use KafkaT. For a start, here’s how you would use it to view details about all Kafka partitions:

      • kafkat partitions

      You will see the following output:

      Output

      [DEPRECATION] The trollop gem has been renamed to optimist and will no longer be supported. Please switch to optimist as soon as possible.
      /var/lib/gems/2.7.0/gems/json-1.8.6/lib/json/common.rb:155: warning: Using the last argument as keyword parameters is deprecated
      ...
      Topic                 Partition   Leader      Replicas        ISRs
      TutorialTopic         0           0           [0]             [0]
      __consumer_offsets    0           0           [0]             [0]
      ...
      ...

      You will see TutorialTopic, as well as __consumer_offsets, an internal topic used by Kafka for storing client-related information. You can safely ignore lines starting with __consumer_offsets.

      To learn more about KafkaT, refer to its GitHub repository.

      Conclusion

      You now have Apache Kafka running securely on your Ubuntu server. You can integrate Kafka into your favorite programming language using Kafka clients.

      To learn more about Kafka, you can also consult its documentation.


