One place for hosting & domains

      Debian

      How To Create a Kubernetes Cluster Using Kubeadm on Debian 9


      The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      Kubernetes is a container orchestration system that manages containers at scale. Initially developed by Google based on its experience running containers in production, Kubernetes is open source and actively developed by a community around the world.

      Note: This tutorial uses version 1.14 of Kubernetes, the official supported version at the time of this article’s publication. For up-to-date information on the latest version, please see the current release notes in the official Kubernetes documentation.

      Kubeadm automates the installation and configuration of Kubernetes components such as the API server, Controller Manager, and Kube DNS. It does not, however, create users or handle the installation of operating-system-level dependencies and their configuration. For these preliminary tasks, it is possible to use a configuration management tool like Ansible or SaltStack. Using these tools makes creating additional clusters or recreating existing clusters much simpler and less error prone.

      In this guide, you will set up a Kubernetes cluster from scratch using Ansible and Kubeadm, and then deploy a containerized Nginx application to it.

      Goals

      Your cluster will include the following physical resources:

      The master node (a node in Kubernetes refers to a server) is responsible for managing the state of the cluster. It runs Etcd, which stores cluster data among components that schedule workloads to worker nodes.

      Worker nodes are the servers where your workloads (i.e. containerized applications and services) will run. A worker will continue to run your workload once they’re assigned to it, even if the master goes down once scheduling is complete. A cluster’s capacity can be increased by adding workers.

      After completing this guide, you will have a cluster ready to run containerized applications, provided that the servers in the cluster have sufficient CPU and RAM resources for your applications to consume. Almost any traditional Unix application including web applications, databases, daemons, and command line tools can be containerized and made to run on the cluster. The cluster itself will consume around 300-500MB of memory and 10% of CPU on each node.

      Once the cluster is set up, you will deploy the web server Nginx to it to ensure that it is running workloads correctly.

      Prerequisites

      Step 1 — Setting Up the Workspace Directory and Ansible Inventory File

      In this section, you will create a directory on your local machine that will serve as your workspace. You will configure Ansible locally so that it can communicate with and execute commands on your remote servers. Once that’s done, you will create a hosts file containing inventory information such as the IP addresses of your servers and the groups that each server belongs to.

      Out of your three servers, one will be the master with an IP displayed as master_ip. The other two servers will be workers and will have the IPs worker_1_ip and worker_2_ip.

      Create a directory named ~/kube-cluster in the home directory of your local machine and cd into it:

      • mkdir ~/kube-cluster
      • cd ~/kube-cluster

      This directory will be your workspace for the rest of the tutorial and will contain all of your Ansible playbooks. It will also be the directory inside which you will run all local commands.

      Create a file named ~/kube-cluster/hosts using nano or your favorite text editor:

      • nano ~/kube-cluster/hosts

      Add the following text to the file, which will specify information about the logical structure of your cluster:

      ~/kube-cluster/hosts

      [masters]
      master ansible_host=master_ip ansible_user=root
      
      [workers]
      worker1 ansible_host=worker_1_ip ansible_user=root
      worker2 ansible_host=worker_2_ip ansible_user=root
      
      [all:vars]
      ansible_python_interpreter=/usr/bin/python3
      

      You may recall that inventory files in Ansible are used to specify server information such as IP addresses, remote users, and groupings of servers to target as a single unit for executing commands. ~/kube-cluster/hosts will be your inventory file and you’ve added two Ansible groups (masters and workers) to it specifying the logical structure of your cluster.

      In the masters group, there is a server entry named “master” that lists the master node’s IP (master_ip) and specifies that Ansible should run remote commands as the root user.

      Similarly, in the workers group, there are two entries for the worker servers (worker_1_ip and worker_2_ip) that also specify the ansible_user as root.

      The last line of the file tells Ansible to use the remote servers’ Python 3 interpreters for its management operations.

      Save and close the file after you’ve added the text.

      Having set up the server inventory with groups, let’s move on to installing operating system level dependencies and creating configuration settings.

      Step 2 — Creating a Non-Root User on All Remote Servers

      In this section you will create a non-root user with sudo privileges on all servers so that you can SSH into them manually as an unprivileged user. This can be useful if, for example, you would like to see system information with commands such as top/htop, view a list of running containers, or change configuration files owned by root. These operations are routinely performed during the maintenance of a cluster, and using a non-root user for such tasks minimizes the risk of modifying or deleting important files or unintentionally performing other dangerous operations.

      Create a file named ~/kube-cluster/initial.yml in the workspace:

      • nano ~/kube-cluster/initial.yml

      Next, add the following play to the file to create a non-root user with sudo privileges on all of the servers. A play in Ansible is a collection of steps to be performed that target specific servers and groups. The following play will create a non-root sudo user:

      ~/kube-cluster/initial.yml

      - hosts: all
        become: yes
        tasks:
          - name: create the 'sammy' user
            user: name=sammy append=yes state=present createhome=yes shell=/bin/bash
      
          - name: allow 'sammy' to have passwordless sudo
            lineinfile:
              dest: /etc/sudoers
              line: 'sammy ALL=(ALL) NOPASSWD: ALL'
              validate: 'visudo -cf %s'
      
          - name: set up authorized keys for the sammy user
            authorized_key: user=sammy key="{{item}}"
            with_file:
              - ~/.ssh/id_rsa.pub
      

      Here’s a breakdown of what this playbook does:

      • Creates the non-root user sammy.

      • Configures the sudoers file to allow the sammy user to run sudo commands without a password prompt.

      • Adds the public key in your local machine (usually ~/.ssh/id_rsa.pub) to the remote sammy user’s authorized key list. This will allow you to SSH into each server as the sammy user.

      Save and close the file after you’ve added the text.

      Next, execute the playbook by locally running:

      • ansible-playbook -i hosts ~/kube-cluster/initial.yml

      The command will complete within two to five minutes. On completion, you will see output similar to the following:

      Output

      PLAY [all] **** TASK [Gathering Facts] **** ok: [master] ok: [worker1] ok: [worker2] TASK [create the 'sammy' user] **** changed: [master] changed: [worker1] changed: [worker2] TASK [allow 'sammy' user to have passwordless sudo] **** changed: [master] changed: [worker1] changed: [worker2] TASK [set up authorized keys for the sammy user] **** changed: [worker1] => (item=ssh-rsa AAAAB3...) changed: [worker2] => (item=ssh-rsa AAAAB3...) changed: [master] => (item=ssh-rsa AAAAB3...) PLAY RECAP **** master : ok=5 changed=4 unreachable=0 failed=0 worker1 : ok=5 changed=4 unreachable=0 failed=0 worker2 : ok=5 changed=4 unreachable=0 failed=0

      Now that the preliminary setup is complete, you can move on to installing Kubernetes-specific dependencies.

      Step 3 — Installing Kubernetes Dependencies

      In this section, you will install the operating-system-level packages required by Kubernetes with Debian’s package manager. These packages are:

      • Docker – a container runtime. It is the component that runs your containers. Support for other runtimes such as rkt is under active development in Kubernetes.

      • kubeadm – a CLI tool that will install and configure the various components of a cluster in a standard way.

      • kubelet – a system service/program that runs on all nodes and handles node-level operations.

      • kubectl – a CLI tool used for issuing commands to the cluster through its API Server.

      Create a file named ~/kube-cluster/kube-dependencies.yml in the workspace:

      • nano ~/kube-cluster/kube-dependencies.yml

      Add the following plays to the file to install these packages to your servers:

      ~/kube-cluster/kube-dependencies.yml

      - hosts: all
        become: yes
        tasks:
         - name: install remote apt deps
           apt:
             name: "{{ item }}"
             state: present
           with_items:
             - apt-transport-https
             - ca-certificates
             - gnupg2
             - software-properties-common
      
         - name: add Docker apt-key
           apt_key:
             url: https://download.docker.com/linux/debian/gpg
             state: present
      
         - name: add Docker's APT repository
           apt_repository:
            repo: deb https://download.docker.com/linux/debian stretch stable
            state: present
            filename: 'docker'
      
         - name: install Docker
           apt:
             name: docker-ce
             state: present
             update_cache: true
      
         - name: add Kubernetes apt-key
           apt_key:
             url: https://packages.cloud.google.com/apt/doc/apt-key.gpg
             state: present
      
         - name: add Kubernetes' APT repository
           apt_repository:
            repo: deb http://apt.kubernetes.io/ kubernetes-xenial main
            state: present
            filename: 'kubernetes'
      
         - name: install kubelet
           apt:
             name: kubelet=1.14.0-00
             state: present
             update_cache: true
      
         - name: install kubeadm
           apt:
             name: kubeadm=1.14.0-00
             state: present
      
      - hosts: master
        become: yes
        tasks:
         - name: install kubectl
           apt:
             name: kubectl=1.14.0-00
             state: present
             force: yes
      

      The first play in the playbook does the following:

      • Add dependencies for adding, verifying and installing packages from remote repositories.

      • Adds the Docker APT repository’s apt-key for key verification.

      • Installs Docker, the container runtime.

      • Adds the Kubernetes APT repository’s apt-key for key verification.

      • Adds the Kubernetes APT repository to your remote servers’ APT sources list.

      • Installs kubelet and kubeadm.

      The second play consists of a single task that installs kubectl on your master node.

      Note: While the Kubernetes documentation recommends you use the latest stable release of Kubernetes for your environment, this tutorial uses a specific version. This will ensure that you can follow the steps successfully, as Kubernetes changes rapidly and the latest version may not work with this tutorial.

      Save and close the file when you are finished.

      Next, execute the playbook by locally running:

      • ansible-playbook -i hosts ~/kube-cluster/kube-dependencies.yml

      On completion, you will see output similar to the following:

      Output

      PLAY [all] **** TASK [Gathering Facts] **** ok: [worker1] ok: [worker2] ok: [master] TASK [install Docker] **** changed: [master] changed: [worker1] changed: [worker2] TASK [install APT Transport HTTPS] ***** ok: [master] ok: [worker1] changed: [worker2] TASK [add Kubernetes apt-key] ***** changed: [master] changed: [worker1] changed: [worker2] TASK [add Kubernetes' APT repository] ***** changed: [master] changed: [worker1] changed: [worker2] TASK [install kubelet] ***** changed: [master] changed: [worker1] changed: [worker2] TASK [install kubeadm] ***** changed: [master] changed: [worker1] changed: [worker2] PLAY [master] ***** TASK [Gathering Facts] ***** ok: [master] TASK [install kubectl] ****** ok: [master] PLAY RECAP **** master : ok=9 changed=5 unreachable=0 failed=0 worker1 : ok=7 changed=5 unreachable=0 failed=0 worker2 : ok=7 changed=5 unreachable=0 failed=0

      After execution, Docker, kubeadm, and kubelet will be installed on all of the remote servers. kubectl is not a required component and is only needed for executing cluster commands. Installing it only on the master node makes sense in this context, since you will run kubectl commands only from the master. Note, however, that kubectl commands can be run from any of the worker nodes or from any machine where it can be installed and configured to point to a cluster.

      All system dependencies are now installed. Let’s set up the master node and initialize the cluster.

      Step 4 — Setting Up the Master Node

      In this section, you will set up the master node. Before creating any playbooks, however, it’s worth covering a few concepts such as Pods and Pod Network Plugins, since your cluster will include both.

      A pod is an atomic unit that runs one or more containers. These containers share resources such as file volumes and network interfaces in common. Pods are the basic unit of scheduling in Kubernetes: all containers in a pod are guaranteed to run on the same node that the pod is scheduled on.

      Each pod has its own IP address, and a pod on one node should be able to access a pod on another node using the pod’s IP. Containers on a single node can communicate easily through a local interface. Communication between pods is more complicated, however, and requires a separate networking component that can transparently route traffic from a pod on one node to a pod on another.

      This functionality is provided by pod network plugins. For this cluster, you will use Flannel, a stable and performant option.

      Create an Ansible playbook named master.yml on your local machine:

      • nano ~/kube-cluster/master.yml

      Add the following play to the file to initialize the cluster and install Flannel:

      ~/kube-cluster/master.yml

      - hosts: master
        become: yes
        tasks:
          - name: initialize the cluster
            shell: kubeadm init --pod-network-cidr=10.244.0.0/16 >> cluster_initialized.txt
            args:
              chdir: $HOME
              creates: cluster_initialized.txt
      
          - name: create .kube directory
            become: yes
            become_user: sammy
            file:
              path: $HOME/.kube
              state: directory
              mode: 0755
      
          - name: copy admin.conf to user's kube config
            copy:
              src: /etc/kubernetes/admin.conf
              dest: /home/sammy/.kube/config
              remote_src: yes
              owner: sammy
      
          - name: install Pod network
            become: yes
            become_user: sammy
            shell: kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/a70459be0084506e4ec919aa1c114638878db11b/Documentation/kube-flannel.yml >> pod_network_setup.txt
            args:
              chdir: $HOME
              creates: pod_network_setup.txt
      

      Here’s a breakdown of this play:

      • The first task initializes the cluster by running kubeadm init. Passing the argument --pod-network-cidr=10.244.0.0/16 specifies the private subnet that the pod IPs will be assigned from. Flannel uses the above subnet by default; we’re telling kubeadm to use the same subnet.

      • The second task creates a .kube directory at /home/sammy. This directory will hold configuration information such as the admin key files, which are required to connect to the cluster, and the cluster’s API address.

      • The third task copies the /etc/kubernetes/admin.conf file that was generated from kubeadm init to your non-root user’s home directory. This will allow you to use kubectl to access the newly-created cluster.

      • The last task runs kubectl apply to install Flannel. kubectl apply -f descriptor.[yml|json] is the syntax for telling kubectl to create the objects described in the descriptor.[yml|json] file. The kube-flannel.yml file contains the descriptions of objects required for setting up Flannel in the cluster.

      Save and close the file when you are finished.

      Execute the playbook locally by running:

      • ansible-playbook -i hosts ~/kube-cluster/master.yml

      On completion, you will see output similar to the following:

      Output

      PLAY [master] **** TASK [Gathering Facts] **** ok: [master] TASK [initialize the cluster] **** changed: [master] TASK [create .kube directory] **** changed: [master] TASK [copy admin.conf to user's kube config] ***** changed: [master] TASK [install Pod network] ***** changed: [master] PLAY RECAP **** master : ok=5 changed=4 unreachable=0 failed=0

      To check the status of the master node, SSH into it with the following command:

      Once inside the master node, execute:

      You will now see the following output:

      Output

      NAME STATUS ROLES AGE VERSION master Ready master 1d v1.14.0

      The output states that the master node has completed all initialization tasks and is in a Ready state from which it can start accepting worker nodes and executing tasks sent to the API Server. You can now add the workers from your local machine.

      Step 5 — Setting Up the Worker Nodes

      Adding workers to the cluster involves executing a single command on each. This command includes the necessary cluster information, such as the IP address and port of the master's API Server, and a secure token. Only nodes that pass in the secure token will be able join the cluster.

      Navigate back to your workspace and create a playbook named workers.yml:

      • nano ~/kube-cluster/workers.yml

      Add the following text to the file to add the workers to the cluster:

      ~/kube-cluster/workers.yml

      - hosts: master
        become: yes
        gather_facts: false
        tasks:
          - name: get join command
            shell: kubeadm token create --print-join-command
            register: join_command_raw
      
          - name: set join command
            set_fact:
              join_command: "{{ join_command_raw.stdout_lines[0] }}"
      
      
      - hosts: workers
        become: yes
        tasks:
          - name: join cluster
            shell: "{{ hostvars['master'].join_command }} >> node_joined.txt"
            args:
              chdir: $HOME
              creates: node_joined.txt
      

      Here's what the playbook does:

      • The first play gets the join command that needs to be run on the worker nodes. This command will be in the following format:kubeadm join --token <token> <master-ip>:<master-port> --discovery-token-ca-cert-hash sha256:<hash>. Once it gets the actual command with the proper token and hash values, the task sets it as a fact so that the next play will be able to access that info.

      • The second play has a single task that runs the join command on all worker nodes. On completion of this task, the two worker nodes will be part of the cluster.

      Save and close the file when you are finished.

      Execute the playbook by locally running:

      • ansible-playbook -i hosts ~/kube-cluster/workers.yml

      On completion, you will see output similar to the following:

      Output

      PLAY [master] **** TASK [get join command] **** changed: [master] TASK [set join command] ***** ok: [master] PLAY [workers] ***** TASK [Gathering Facts] ***** ok: [worker1] ok: [worker2] TASK [join cluster] ***** changed: [worker1] changed: [worker2] PLAY RECAP ***** master : ok=2 changed=1 unreachable=0 failed=0 worker1 : ok=2 changed=1 unreachable=0 failed=0 worker2 : ok=2 changed=1 unreachable=0 failed=0

      With the addition of the worker nodes, your cluster is now fully set up and functional, with workers ready to run workloads. Before scheduling applications, let's verify that the cluster is working as intended.

      Step 6 — Verifying the Cluster

      A cluster can sometimes fail during setup because a node is down or network connectivity between the master and worker is not working correctly. Let's verify the cluster and ensure that the nodes are operating correctly.

      You will need to check the current state of the cluster from the master node to ensure that the nodes are ready. If you disconnected from the master node, you can SSH back into it with the following command:

      Then execute the following command to get the status of the cluster:

      You will see output similar to the following:

      Output

      NAME STATUS ROLES AGE VERSION master Ready master 1d v1.14.0 worker1 Ready <none> 1d v1.14.0 worker2 Ready <none> 1d v1.14.0

      If all of your nodes have the value Ready for STATUS, it means that they're part of the cluster and ready to run workloads.

      If, however, a few of the nodes have NotReady as the STATUS, it could mean that the worker nodes haven't finished their setup yet. Wait for around five to ten minutes before re-running kubectl get nodes and inspecting the new output. If a few nodes still have NotReady as the status, you might have to verify and re-run the commands in the previous steps.

      Now that your cluster is verified successfully, let's schedule an example Nginx application on the cluster.

      Step 7 — Running An Application on the Cluster

      You can now deploy any containerized application to your cluster. To keep things familiar, let's deploy Nginx using Deployments and Services to see how this application can be deployed to the cluster. You can use the commands below for other containerized applications as well, provided you change the Docker image name and any relevant flags (such as ports and volumes).

      Still within the master node, execute the following command to create a deployment named nginx:

      • kubectl create deployment nginx --image=nginx

      A deployment is a type of Kubernetes object that ensures there's always a specified number of pods running based on a defined template, even if the pod crashes during the cluster's lifetime. The above deployment will create a pod with one container from the Docker registry's Nginx Docker Image.

      Next, run the following command to create a service named nginx that will expose the app publicly. It will do so through a NodePort, a scheme that will make the pod accessible through an arbitrary port opened on each node of the cluster:

      • kubectl expose deploy nginx --port 80 --target-port 80 --type NodePort

      Services are another type of Kubernetes object that expose cluster internal services to clients, both internal and external. They are also capable of load balancing requests to multiple pods, and are an integral component in Kubernetes, frequently interacting with other components.

      Run the following command:

      This will output text similar to the following:

      Output

      NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 1d nginx NodePort 10.109.228.209 <none> 80:nginx_port/TCP 40m

      From the third line of the above output, you can retrieve the port that Nginx is running on. Kubernetes will assign a random port that is greater than 30000 automatically, while ensuring that the port is not already bound by another service.

      To test that everything is working, visit http://worker_1_ip:nginx_port or http://worker_2_ip:nginx_port through a browser on your local machine. You will see Nginx's familiar welcome page.

      If you would like to remove the Nginx application, first delete the nginx service from the master node:

      • kubectl delete service nginx

      Run the following to ensure that the service has been deleted:

      You will see the following output:

      Output

      NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 1d

      Then delete the deployment:

      • kubectl delete deployment nginx

      Run the following to confirm that this worked:

      Output

      No resources found.

      Conclusion

      In this guide, you've successfully set up a Kubernetes cluster on Debian 9 using Kubeadm and Ansible for automation.

      If you're wondering what to do with the cluster now that it's set up, a good next step would be to get comfortable deploying your own applications and services onto the cluster. Here's a list of links with further information that can guide you in the process:

      • Dockerizing applications - lists examples that detail how to containerize applications using Docker.

      • Pod Overview - describes in detail how Pods work and their relationship with other Kubernetes objects. Pods are ubiquitous in Kubernetes, so understanding them will facilitate your work.

      • Deployments Overview - provides an overview of deployments. It is useful to understand how controllers such as deployments work since they are used frequently in stateless applications for scaling and the automated healing of unhealthy applications.

      • Services Overview - covers services, another frequently used object in Kubernetes clusters. Understanding the types of services and the options they have is essential for running both stateless and stateful applications.

      Other important concepts that you can look into are Volumes, Ingresses and Secrets, all of which come in handy when deploying production applications.

      Kubernetes has a lot of functionality and features to offer. The Kubernetes Official Documentation is the best place to learn about concepts, find task-specific guides, and look up API references for various objects.



      Source link

      How To Back Up, Import, and Migrate Your Apache Kafka Data on Debian 9


      The author selected Tech Education Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      Backing up your Apache Kafka data is an important practice that will help you recover from unintended data loss or bad data added to the cluster due to user error. Data dumps of cluster and topic data are an efficient way to perform backups and restorations.

      Importing and migrating your backed up data to a separate server is helpful in situations where your Kafka instance becomes unusable due to server hardware or networking failures and you need to create a new Kafka instance with your old data. Importing and migrating backed up data is also useful when you are moving the Kafka instance to an upgraded or downgraded server due to a change in resource usage.

      In this tutorial, you will back up, import, and migrate your Kafka data on a single Debian 9 installation as well as on multiple Debian 9 installations on separate servers. ZooKeeper is a critical component of Kafka’s operation. It stores information about cluster state such as consumer data, partition data, and the state of other brokers in the cluster. As such, you will also back up ZooKeeper’s data in this tutorial.

      Prerequisites

      To follow along, you will need:

      • A Debian 9 server with at least 4GB of RAM and a non-root sudo user set up by following the tutorial.
      • A Debian 9 server with Apache Kafka installed, to act as the source of the backup. Follow the How To Install Apache Kafka on Debian 9 guide to set up your Kafka installation, if Kafka isn’t already installed on the source server.
      • OpenJDK 8 installed on the server. To install this version, follow these instructions on installing specific versions of OpenJDK.
      • Optional for Step 7 — Another Debian 9 server with Apache Kafka installed, to act as the destination of the backup. Follow the article link in the previous prerequisite to install Kafka on the destination server. This prerequisite is required only if you are moving your Kafka data from one server to another. If you want to back up and import your Kafka data to a single server, you can skip this prerequisite.

      Step 1 — Creating a Test Topic and Adding Messages

      A Kafka message is the most basic unit of data storage in Kafka and is the entity that you will publish to and subscribe from Kafka. A Kafka topic is like a container for a group of related messages. When you subscribe to a particular topic, you will receive only messages that were published to that particular topic. In this section you will log in to the server that you would like to back up (the source server) and add a Kafka topic and a message so that you have some data populated for the backup.

      This tutorial assumes you have installed Kafka in the home directory of the kafka user (/home/kafka/kafka). If your installation is in a different directory, modify the ~/kafka part in the following commands with your Kafka installation’s path, and for the commands throughout the rest of this tutorial.

      SSH into the source server by executing:

      • ssh sammy@source_server_ip

      Run the following command to log in as the kafka user:

      Create a topic named BackupTopic using the kafka-topics.sh shell utility file in your Kafka installation's bin directory, by typing:

      • ~/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic BackupTopic

      Publish the string "Test Message 1" to the BackupTopic topic by using the ~/kafka/bin/kafka-console-producer.sh shell utility script.

      If you would like to add additional messages here, you can do so now.

      • echo "Test Message 1" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic BackupTopic > /dev/null

      The ~/kafka/bin/kafka-console-producer.sh file allows you to publish messages directly from the command line. Typically, you would publish messages using a Kafka client library from within your program, but since that involves different setups for different programming languages, you can use the shell script as a language-independent way of publishing messages during testing or while performing administrative tasks. The --topic flag specifies the topic that you will publish the message to.

      Next, verify that the kafka-console-producer.sh script has published the message(s) by running the following command:

      • ~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic BackupTopic --from-beginning

      The ~/kafka/bin/kafka-console-consumer.sh shell script starts the consumer. Once started, it will subscribe to messages from the topic that you published in the "Test Message 1" message in the previous command. The --from-beginning flag in the command allows consuming messages that were published before the consumer was started. Without the flag enabled, only messages published after the consumer was started will appear. On running the command, you will see the following output in the terminal:

      Output

      Test Message 1

      Press CTRL+C to stop the consumer.

      You've created some test data and verified that it's persisted. Now you can back up the state data in the next section.

      Step 2 — Backing Up the ZooKeeper State Data

      Before backing up the actual Kafka data, you need to back up the cluster state stored in ZooKeeper.

      ZooKeeper stores its data in the directory specified by the dataDir field in the ~/kafka/config/zookeeper.properties configuration file. You need to read the value of this field to determine the directory to back up. By default, dataDir points to the /tmp/zookeeper directory. If the value is different in your installation, replace /tmp/zookeeper with that value in the following commands.

      Here is an example output of the ~/kafka/config/zookeeper.properties file:

      ~/kafka/config/zookeeper.properties

      ...
      ...
      ...
      # the directory where the snapshot is stored.
      dataDir=/tmp/zookeeper
      # the port at which the clients will connect
      clientPort=2181
      # disable the per-ip limit on the number of connections since this is a non-production config
      maxClientCnxns=0
      ...
      ...
      ...
      

      Now that you have the path to the directory, you can create a compressed archive file of its contents. Compressed archive files are a better option over regular archive files to save disk space. Run the following command:

      • tar -czf /home/kafka/zookeeper-backup.tar.gz /tmp/zookeeper/*

      The command's output tar: Removing leading / from member names you can safely ignore.

      The -c and -z flags tell tar to create an archive and apply gzip compression to the archive. The -f flag specifies the name of the output compressed archive file, which is zookeeper-backup.tar.gz in this case.

      You can run ls in your current directory to see zookeeper-backup.tar.gz as part of your output.

      You have now successfully backed up the ZooKeeper data. In the next section, you will back up the actual Kafka data.

      Step 3 — Backing Up the Kafka Topics and Messages

      In this section, you will back up Kafka's data directory into a compressed tar file like you did for ZooKeeper in the previous step.

      Kafka stores topics, messages, and internal files in the directory that the log.dirs field specifies in the ~/kafka/config/server.properties configuration file. You need to read the value of this field to determine the directory to back up. By default and in your current installation, log.dirs points to the /tmp/kafka-logs directory. If the value is different in your installation, replace /tmp/kafka-logs in the following commands with the correct value.

      Here is an example output of the ~/kafka/config/server.properties file:

      ~/kafka/config/server.properties

      ...
      ...
      ...
      ############################# Log Basics #############################
      
      # A comma separated list of directories under which to store log files
      log.dirs=/tmp/kafka-logs
      
      # The default number of log partitions per topic. More partitions allow greater
      # parallelism for consumption, but this will also result in more files across
      # the brokers.
      num.partitions=1
      
      # The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
      # This value is recommended to be increased for installations with data dirs located in RAID array.
      num.recovery.threads.per.data.dir=1
      ...
      ...
      ...
      

      First, stop the Kafka service so that the data in the log.dirs directory is in a consistent state when creating the archive with tar. To do this, return to your server's non-root user by typing exit and then run the following command:

      • sudo systemctl stop kafka

      After stopping the Kafka service, log back in as your kafka user with:

      It is necessary to stop/start the Kafka and ZooKeeper services as your non-root sudo user because in the Apache Kafka installation prerequisite you restricted the kafka user as a security precaution. This step in the prerequisite disables sudo access for the kafka user, which leads to commands failing to execute.

      Now, create a compressed archive file of the directory's contents by running the following command:

      • tar -czf /home/kafka/kafka-backup.tar.gz /tmp/kafka-logs/*

      Once again, you can safely ignore the command's output (tar: Removing leading / from member names).

      You can run ls in the current directory to see kafka-backup.tar.gz as part of the output.

      You can start the Kafka service again — if you do not want to restore the data immediately — by typing exit, to switch to your non-root sudo user, and then running:

      • sudo systemctl start kafka

      Log back in as your kafka user:

      You have successfully backed up the Kafka data. You can now proceed to the next section, where you will be restoring the cluster state data stored in ZooKeeper.

      Step 4 — Restoring the ZooKeeper Data

      In this section you will restore the cluster state data that Kafka creates and manages internally when the user performs operations such as creating a topic, adding/removing additional nodes, and adding and consuming messages. You will restore the data to your existing source installation by deleting the ZooKeeper data directory and restoring the contents of the zookeeper-backup.tar.gz file. If you want to restore data to a different server, see Step 7.

      You need to stop the Kafka and ZooKeeper services as a precaution against the data directories receiving invalid data during the restoration process.

      First, stop the Kafka service by typing exit, to switch to your non-root sudo user, and then running:

      • sudo systemctl stop kafka

      Next, stop the ZooKeeper service:

      • sudo systemctl stop zookeeper

      Log back in as your kafka user:

      You can then safely delete the existing cluster data directory with the following command:

      Now restore the data you backed up in Step 2:

      • tar -C /tmp/zookeeper -xzf /home/kafka/zookeeper-backup.tar.gz --strip-components 2

      The -C flag tells tar to change to the directory /tmp/zookeeper before extracting the data. You specify the --strip 2 flag to make tar extract the archive's contents in /tmp/zookeeper/ itself and not in another directory (such as /tmp/zookeeper/tmp/zookeeper/) inside of it.

      You have restored the cluster state data successfully. Now, you can proceed to the Kafka data restoration process in the next section.

      Step 5 — Restoring the Kafka Data

      In this section you will restore the backed up Kafka data to your existing source installation (or the destination server if you have followed the optional Step 7) by deleting the Kafka data directory and restoring the compressed archive file. This will allow you to verify that restoration works successfully.

      You can safely delete the existing Kafka data directory with the following command:

      Now that you have deleted the data, your Kafka installation resembles a fresh installation with no topics or messages present in it. To restore your backed up data, extract the files by running:

      • tar -C /tmp/kafka-logs -xzf /home/kafka/kafka-backup.tar.gz --strip-components 2

      The -C flag tells tar to change to the directory /tmp/kafka-logs before extracting the data. You specify the --strip 2 flag to ensure that the archive's contents are extracted in /tmp/kafka-logs/ itself and not in another directory (such as /tmp/kafka-logs/kafka-logs/) inside of it.

      Now that you have extracted the data successfully, you can start the Kafka and ZooKeeper services again by typing exit, to switch to your non-root sudo user, and then executing:

      • sudo systemctl start kafka

      Start the ZooKeeper service with:

      • sudo systemctl start zookeeper

      Log back in as your kafka user:

      You have restored the kafka data, you can move on to verifying that the restoration is successful in the next section.

      Step 6 — Verifying the Restoration

      To test the restoration of the Kafka data, you will consume messages from the topic you created in Step 1.

      Wait a few minutes for Kafka to start up and then execute the following command to read messages from the BackupTopic:

      • ~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic BackupTopic --from-beginning

      If you get a warning like the following, you need to wait for Kafka to start fully:

      Output

      [2018-09-13 15:52:45,234] WARN [Consumer clientId=consumer-1, groupId=console-consumer-87747] Connection to node -1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

      Retry the previous command in another few minutes or run sudo systemctl restart kafka as your non-root sudo user. If there are no issues in the restoration, you will see the following output:

      Output

      Test Message 1

      If you do not see this message, you can check if you missed out any commands in the previous section and execute them.

      Now that you have verified the restored Kafka data, this means you have successfully backed up and restored your data in a single Kafka installation. You can continue to Step 7 to see how to migrate the cluster and topics data to an installation in another server.

      Step 7 — Migrating and Restoring the Backup to Another Kafka Server (Optional)

      In this section, you will migrate the backed up data from the source Kafka server to the destination Kafka server. To do so, you will first use the scp command to download the compressed tar.gz files to your local system. You will then use scp again to push the files to the destination server. Once the files are present in the destination server, you can follow the steps used previously to restore the backup and verify that the migration is successful.

      You are downloading the backup files locally and then uploading them to the destination server, instead of copying it directly from your source to destination server, because the destination server will not have your source server's SSH key in its /home/sammy/.ssh/authorized_keys file and cannot connect to and from the source server. Your local machine can connect to both servers however, saving you an additional step of setting up SSH access from the source to destination server.

      Download the zookeeper-backup.tar.gz and kafka-backup.tar.gz files to your local machine by executing:

      • scp sammy@source_server_ip:/home/kafka/zookeeper-backup.tar.gz .

      You will see output similar to:

      Output

      zookeeper-backup.tar.gz 100% 68KB 128.0KB/s 00:00

      Now run the following command to download the kafka-backup.tar.gz file to your local machine:

      • scp sammy@source_server_ip:/home/kafka/kafka-backup.tar.gz .

      You will see the following output:

      Output

      kafka-backup.tar.gz 100% 1031KB 488.3KB/s 00:02

      Run ls in the current directory of your local machine, you will see both of the files:

      Output

      kafka-backup.tar.gz zookeeper.tar.gz

      Run the following command to transfer the zookeeper-backup.tar.gz file to /home/kafka/ of the destination server:

      • scp zookeeper-backup.tar.gz sammy@destination_server_ip:/home/sammy/zookeeper-backup.tar.gz

      Now run the following command to transfer the kafka-backup.tar.gz file to /home/kafka/ of the destination server:

      • scp kafka-backup.tar.gz sammy@destination_server_ip:/home/sammy/kafka-backup.tar.gz

      You have uploaded the backup files to the destination server successfully. Since the files are in the /home/sammy/ directory and do not have the correct permissions for access by the kafka user, you can move the files to the /home/kafka/ directory and change their permissions.

      SSH into the destination server by executing:

      • ssh sammy@destination_server_ip

      Now move zookeeper-backup.tar.gz to /home/kafka/ by executing:

      • sudo mv zookeeper-backup.tar.gz /home/sammy/zookeeper-backup.tar.gz

      Similarly, run the following command to copy kafka-backup.tar.gz to /home/kafka/:

      • sudo mv kafka-backup.tar.gz /home/kafka/kafka-backup.tar.gz

      Change the owner of the backup files by running the following command:

      • sudo chown kafka /home/kafka/zookeeper-backup.tar.gz /home/kafka/kafka-backup.tar.gz

      The previous mv and chown commands will not display any output.

      Now that the backup files are present in the destination server at the correct directory, follow the commands listed in Steps 4 to 6 of this tutorial to restore and verify the data for your destination server.

      Conclusion

      In this tutorial, you backed up, imported, and migrated your Kafka topics and messages from both the same installation and installations on separate servers. If you would like to learn more about other useful administrative tasks in Kafka, you can consult the operations section of Kafka's official documentation.

      To store backed up files such as zookeeper-backup.tar.gz and kafka-backup.tar.gz remotely, you can explore DigitalOcean Spaces. If Kafka is the only service running on your server, you can also explore other backup methods such as full instance backups.



      Source link

      How To Install Apache Kafka on Debian 9


      The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      Apache Kafka is a popular distributed message broker designed to efficiently handle large volumes of real-time data. A Kafka cluster is not only highly scalable and fault-tolerant, but it also has a much higher throughput compared to other message brokers such as ActiveMQ and RabbitMQ. Though it is generally used as a publish/subscribe messaging system, a lot of organizations also use it for log aggregation because it offers persistent storage for published messages.

      A publish/subscribe messaging system allows one or more producers to publish messages without considering the number of consumers or how they will process the messages. Subscribed clients are notified automatically about updates and the creation of new messages. This system is more efficient and scalable than systems where clients poll periodically to determine if new messages are available.

      In this tutorial, you will install and use Apache Kafka 1.1.1 on Debian 9.

      Prerequisites

      To follow along, you will need:

      • One Debian 9 server and a non-root user with sudo privileges. Follow the steps specified in this guide if you do not have a non-root user set up.
      • At least 4GB of RAM on the server. Installations without this amount of RAM may cause the Kafka service to fail, with the Java virtual machine (JVM) throwing an “Out Of Memory” exception during startup.
      • OpenJDK 8 installed on your server. To install this version, follow these instructions on installing specific versions of OpenJDK. Kafka is written in Java, so it requires a JVM; however, its startup shell script has a version detection bug that causes it to fail to start with JVM versions above 8.

      Step 1 — Creating a User for Kafka

      Since Kafka can handle requests over a network, you should create a dedicated user for it. This minimizes damage to your Debian machine should the Kafka server be compromised. We will create a dedicated kafka user in this step, but you should create a different non-root user to perform other tasks on this server once you have finished setting up Kafka.

      Logged in as your non-root sudo user, create a user called kafka with the useradd command:

      The -m flag ensures that a home directory will be created for the user. This home directory, /home/kafka, will act as our workspace directory for executing commands in the sections below.

      Set the password using passwd:

      Add the kafka user to the sudo group with the adduser command, so that it has the privileges required to install Kafka's dependencies:

      Your kafka user is now ready. Log into this account using su:

      Now that we've created the Kafka-specific user, we can move on to downloading and extracting the Kafka binaries.

      Step 2 — Downloading and Extracting the Kafka Binaries

      Let's download and extract the Kafka binaries into dedicated folders in our kafka user's home directory.

      To start, create a directory in /home/kafka called Downloads to store your downloads:

      Install curl using apt-get so that you'll be able to download remote files:

      • sudo apt-get update && sudo apt-get install -y curl

      Once curl is installed, use it to download the Kafka binaries:

      • curl "http://www-eu.apache.org/dist/kafka/1.1.1/kafka_2.12-1.1.1.tgz" -o ~/Downloads/kafka.tgz

      Create a directory called kafka and change to this directory. This will be the base directory of the Kafka installation:

      • mkdir ~/kafka && cd ~/kafka

      Extract the archive you downloaded using the tar command:

      • tar -xvzf ~/Downloads/kafka.tgz --strip 1

      We specify the --strip 1 flag to ensure that the archive's contents are extracted in ~/kafka/ itself and not in another directory (such as ~/kafka/kafka_2.12-1.1.1/) inside of it.

      Now that we've downloaded and extracted the binaries successfully, we can move on to configuring Kafka to allow for topic deletion.

      Step 3 — Configuring the Kafka Server

      Kafka's default behavior will not allow us to delete a topic, the category, group, or feed name to which messages can be published. To modify this, let's edit the configuration file.

      Kafka's configuration options are specified in server.properties. Open this file with nano or your favorite editor:

      • nano ~/kafka/config/server.properties

      Let's add a setting that will allow us to delete Kafka topics. Add the following to the bottom of the file:

      ~/kafka/config/server.properties

      delete.topic.enable = true
      

      Save the file, and exit nano. Now that we've configured Kafka, we can move on to creating systemd unit files for running and enabling it on startup.

      Step 4 — Creating Systemd Unit Files and Starting the Kafka Server

      In this section, we will create systemd unit files for the Kafka service. This will help us perform common service actions such as starting, stopping, and restarting Kafka in a manner consistent with other Linux services.

      ZooKeeper is a service that Kafka uses to manage its cluster state and configurations. It is commonly used in many distributed systems as an integral component. If you would like to know more about it, visit the official ZooKeeper docs.

      Create the unit file for zookeeper:

      • sudo nano /etc/systemd/system/zookeeper.service

      Enter the following unit definition into the file:

      /etc/systemd/system/zookeeper.service

      [Unit]
      Requires=network.target remote-fs.target
      After=network.target remote-fs.target
      
      [Service]
      Type=simple
      User=kafka
      ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
      ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
      Restart=on-abnormal
      
      [Install]
      WantedBy=multi-user.target
      

      The [Unit] section specifies that ZooKeeper requires networking and the filesystem to be ready before it can start.

      The [Service] section specifies that systemd should use the zookeeper-server-start.sh and zookeeper-server-stop.sh shell files for starting and stopping the service. It also specifies that ZooKeeper should be restarted automatically if it exits abnormally.

      Next, create the systemd service file for kafka:

      • sudo nano /etc/systemd/system/kafka.service

      Enter the following unit definition into the file:

      /etc/systemd/system/kafka.service

      [Unit]
      Requires=zookeeper.service
      After=zookeeper.service
      
      [Service]
      Type=simple
      User=kafka
      ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
      ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
      Restart=on-abnormal
      
      [Install]
      WantedBy=multi-user.target
      

      The [Unit] section specifies that this unit file depends on zookeeper.service. This will ensure that zookeeper gets started automatically when the kafka service starts.

      The [Service] section specifies that systemd should use the kafka-server-start.sh and kafka-server-stop.sh shell files for starting and stopping the service. It also specifies that Kafka should be restarted automatically if it exits abnormally.

      Now that the units have been defined, start Kafka with the following command:

      • sudo systemctl start kafka

      To ensure that the server has started successfully, check the journal logs for the kafka unit:

      You should see output similar to the following:

      Output

      Mar 23 13:31:48 kafka systemd[1]: Started kafka.service.

      You now have a Kafka server listening on port 9092.

      While we have started the kafka service, if we were to reboot our server, it would not be started automatically. To enable kafka on server boot, run:

      • sudo systemctl enable kafka

      Now that we've started and enabled the services, let's check the installation.

      Step 5 — Testing the Installation

      Let's publish and consume a "Hello World" message to make sure the Kafka server is behaving correctly. Publishing messages in Kafka requires:

      • A producer, which enables the publication of records and data to topics.
      • A consumer, which reads messages and data from topics.

      First, create a topic named TutorialTopic by typing:

      • ~/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic TutorialTopic

      You can create a producer from the command line using the kafka-console-producer.sh script. It expects the Kafka server's hostname, port, and a topic name as arguments.

      Publish the string "Hello, World" to the TutorialTopic topic by typing:

      • echo "Hello, World" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic > /dev/null

      Next, you can create a Kafka consumer using the kafka-console-consumer.sh script. It expects the ZooKeeper server's hostname and port, along with a topic name as arguments.

      The following command consumes messages from TutorialTopic. Note the use of the --from-beginning flag, which allows the consumption of messages that were published before the consumer was started:

      • ~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TutorialTopic --from-beginning

      If there are no configuration issues, you should see Hello, World in your terminal:

      Output

      Hello, World

      The script will continue to run, waiting for more messages to be published to the topic. Feel free to open a new terminal and start a producer to publish a few more messages. You should be able to see them all in the consumer's output.

      When you are done testing, press CTRL+C to stop the consumer script. Now that we have tested the installation, let's move on to installing KafkaT.

      Step 6 — Install KafkaT (Optional)

      KafkaT is a tool from Airbnb that makes it easier for you to view details about your Kafka cluster and perform certain administrative tasks from the command line. Because it is a Ruby gem, you will need Ruby to use it. You will also need the build-essential package to be able to build the other gems it depends on. Install them using apt:

      • sudo apt install ruby ruby-dev build-essential

      You can now install KafkaT using the gem command:

      KafkaT uses .kafkatcfg as the configuration file to determine the installation and log directories of your Kafka server. It should also have an entry pointing KafkaT to your ZooKeeper instance.

      Create a new file called .kafkatcfg:

      Add the following lines to specify the required information about your Kafka server and Zookeeper instance:

      ~/.kafkatcfg

      {
        "kafka_path": "~/kafka",
        "log_path": "/tmp/kafka-logs",
        "zk_path": "localhost:2181"
      }
      

      You are now ready to use KafkaT. For a start, here's how you would use it to view details about all Kafka partitions:

      You will see the following output:

      Output

      Topic Partition Leader Replicas ISRs TutorialTopic 0 0 [0] [0] __consumer_offsets 0 0 [0] [0] ... ...

      You will see TutorialTopic, as well as __consumer_offsets, an internal topic used by Kafka for storing client-related information. You can safely ignore lines starting with __consumer_offsets.

      To learn more about KafkaT, refer to its GitHub repository.

      Step 7 — Setting Up a Multi-Node Cluster (Optional)

      If you want to create a multi-broker cluster using more Debian 9 machines, you should repeat Step 1, Step 4, and Step 5 on each of the new machines. Additionally, you should make the following changes in the server.properties file for each:

      • The value of the broker.id property should be changed such that it is unique throughout the cluster. This property uniquely identifies each server in the cluster and can have any string as its value. For example, "server1", "server2", etc.

      • The value of the zookeeper.connect property should be changed such that all nodes point to the same ZooKeeper instance. This property specifies the ZooKeeper instance's address and follows the <HOSTNAME/IP_ADDRESS>:<PORT> format. For example, "203.0.113.0:2181", "203.0.113.1:2181" etc.

      If you want to have multiple ZooKeeper instances for your cluster, the value of the zookeeper.connect property on each node should be an identical, comma-separated string listing the IP addresses and port numbers of all the ZooKeeper instances.

      Step 8 — Restricting the Kafka User

      Now that all of the installations are done, you can remove the kafka user's admin privileges. Before you do so, log out and log back in as any other non-root sudo user. If you are still running the same shell session you started this tutorial with, simply type exit.

      Remove the kafka user from the sudo group:

      To further improve your Kafka server's security, lock the kafka user's password using the passwd command. This makes sure that nobody can directly log into the server using this account:

      At this point, only root or a sudo user can log in as kafka by typing in the following command:

      In the future, if you want to unlock it, use passwd with the -u option:

      You have now successfully restricted the kafka user's admin privileges.

      Conclusion

      You now have Apache Kafka running securely on your Debian server. You can make use of it in your projects by creating Kafka producers and consumers using Kafka clients, which are available for most programming languages. To learn more about Kafka, you can also consult its documentation.



      Source link