
      How To Use Traefik as a Reverse Proxy for Docker Containers on CentOS 7


      The author selected Girls Who Code to receive a donation as part of the Write for DOnations program.

      Introduction

      Docker can be an efficient way to run web applications in production, but you may want to run multiple applications on the same Docker host. In this situation, you’ll need to set up a reverse proxy since you only want to expose ports 80 and 443 to the rest of the world.

      Traefik is a Docker-aware reverse proxy that includes its own monitoring dashboard. In this tutorial, you’ll use Traefik to route requests to two different web application containers: a WordPress container and an Adminer container, each talking to a MySQL database. You’ll configure Traefik to serve everything over HTTPS using Let’s Encrypt.

      Prerequisites

      To follow along with this tutorial, you will need the following:

      • One CentOS 7 server set up with a non-root user with sudo privileges.
      • Docker installed on your server.
      • Docker Compose installed on your server.
      • A domain and three A records, blog.your_domain, db-admin.your_domain, and monitor.your_domain, each pointing to your server's public IP address.

      Step 1 — Configuring and Running Traefik

      The Traefik project has an official Docker image, so we will use that to run Traefik in a Docker container.

      But before we get our Traefik container up and running, we need to create a configuration file and set up an encrypted password so we can access the monitoring dashboard.

      We’ll use the htpasswd utility to create this encrypted password. First, install the utility, which is included in the httpd-tools package:

      • sudo yum install -y httpd-tools

      Then generate the password with htpasswd. Substitute secure_password with the password you’d like to use for the Traefik admin user:

      • htpasswd -nb admin secure_password

      The output from the program will look like this:

      Output

      admin:$apr1$kEG/8JKj$yEXj8vKO7HDvkUMI/SbOO.

      You’ll use this output in the Traefik configuration file to set up HTTP Basic Authentication for the Traefik health check and monitoring dashboard. Copy the entire output line so you can paste it later.

      To configure the Traefik server, we’ll create a new configuration file called traefik.toml using the TOML format. TOML is a configuration language similar to INI files, but standardized. This file lets us configure the Traefik server and various integrations, or providers, we want to use. In this tutorial, we will use three of Traefik’s available providers: api, docker, and acme, which is used to support TLS using Let’s Encrypt.

      Open up your new file in Vi or your favorite text editor:

      • vi traefik.toml

      Enter insert mode by pressing i, then add two named entry points, http and https, that all backends will have access to by default:

      traefik.toml

      defaultEntryPoints = ["http", "https"]
      

      We'll configure the http and https entry points later in this file.

      Next, configure the api provider, which gives you access to a dashboard interface. This is where you'll paste the output from the htpasswd command:

      traefik.toml

      ...
      [entryPoints]
        [entryPoints.dashboard]
          address = ":8080"
          [entryPoints.dashboard.auth]
            [entryPoints.dashboard.auth.basic]
              users = ["admin:your_encrypted_password"]
      
      [api]
      entrypoint="dashboard"
      

      The dashboard is a separate web application that will run within the Traefik container. We set the dashboard to run on port 8080.

      The entrypoints.dashboard section configures how we'll be connecting with the api provider, and the entrypoints.dashboard.auth.basic section configures HTTP Basic Authentication for the dashboard. Use the output from the htpasswd command you just ran for the value of the users entry. You could specify additional logins by separating them with commas.

      We've defined our first entryPoint, but we'll need to define others for standard HTTP and HTTPS communication that isn't directed towards the api provider. The entryPoints section configures the addresses that Traefik and the proxied containers can listen on. Add these lines to the file underneath the entryPoints heading:

      traefik.toml

      ...
        [entryPoints.http]
          address = ":80"
            [entryPoints.http.redirect]
              entryPoint = "https"
        [entryPoints.https]
          address = ":443"
            [entryPoints.https.tls]
      ...
      

      The http entry point handles port 80, while the https entry point uses port 443 for TLS/SSL. We automatically redirect all of the traffic on port 80 to the https entry point to force secure connections for all requests.

      Next, add this section to configure Let's Encrypt certificate support for Traefik:

      traefik.toml

      ...
      [acme]
      email = "your_email@your_domain"
      storage = "acme.json"
      entryPoint = "https"
      onHostRule = true
        [acme.httpChallenge]
        entryPoint = "http"
      

      This section is called acme because ACME is the name of the protocol used to communicate with Let's Encrypt to manage certificates. The Let's Encrypt service requires registration with a valid email address, so in order to have Traefik generate certificates for our hosts, set the email key to your email address. We then specify that we will store the information that we will receive from Let's Encrypt in a JSON file called acme.json. The entryPoint key needs to point to the entry point handling port 443, which in our case is the https entry point.

      The key onHostRule dictates how Traefik should go about generating certificates. We want to fetch our certificates as soon as our containers with specified hostnames are created, and that's what the onHostRule setting will do.

      The acme.httpChallenge section allows us to specify how Let's Encrypt can verify that the certificate should be generated. We're configuring it to serve a file as part of the challenge through the http entrypoint.

      Finally, configure the docker provider by adding these lines to the file:

      traefik.toml

      ...
      [docker]
      domain = "your_domain"
      watch = true
      network = "web"
      

      The docker provider enables Traefik to act as a proxy in front of Docker containers. We've configured the provider to watch for new containers on the web network (that we'll create soon) and expose them as subdomains of your_domain.

      At this point, traefik.toml should have the following contents:

      traefik.toml

      defaultEntryPoints = ["http", "https"]
      
      [entryPoints]
        [entryPoints.dashboard]
          address = ":8080"
          [entryPoints.dashboard.auth]
            [entryPoints.dashboard.auth.basic]
              users = ["admin:your_encrypted_password"]
        [entryPoints.http]
          address = ":80"
            [entryPoints.http.redirect]
              entryPoint = "https"
        [entryPoints.https]
          address = ":443"
            [entryPoints.https.tls]
      
      [api]
      entrypoint="dashboard"
      
      [acme]
      email = "your_email@your_domain"
      storage = "acme.json"
      entryPoint = "https"
      onHostRule = true
        [acme.httpChallenge]
        entryPoint = "http"
      
      [docker]
      domain = "your_domain"
      watch = true
      network = "web"
      

      Once you have added the contents, hit ESC to leave insert mode. Type :x then ENTER to save and exit the file. With all of this configuration in place, we can fire up Traefik.

      Step 2 — Running the Traefik Container

      Next, create a Docker network for the proxy to share with containers. The Docker network is necessary so that we can use it with applications that are run using Docker Compose. Let's call this network web.

      • docker network create web

      When the Traefik container starts, we will add it to this network. Then we can add additional containers to this network later for Traefik to proxy to.

      Next, create an empty file which will hold our Let's Encrypt information. We'll share this into the container so Traefik can use it:

      • touch acme.json

      Traefik will only be able to use this file if the root user inside of the container has sole read and write access to it. To do this, lock down the permissions on acme.json so that only the owner of the file has read and write permission:

      • chmod 600 acme.json

      Once the file gets passed to Docker, the owner will automatically change to the root user inside the container.

      Finally, create the Traefik container with this command:

      • docker run -d \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -v $PWD/traefik.toml:/traefik.toml \
        -v $PWD/acme.json:/acme.json \
        -p 80:80 \
        -p 443:443 \
        -l traefik.frontend.rule=Host:monitor.your_domain \
        -l traefik.port=8080 \
        --network web \
        --name traefik \
        traefik:1.7.6-alpine

      The command is a little long so let's break it down.

      We use the -d flag to run the container in the background as a daemon. We then share our docker.sock file into the container so that the Traefik process can listen for changes to containers. We also share the traefik.toml configuration file and the acme.json file we created into the container.

      Next, we map ports 80 and 443 of our Docker host to the same ports in the Traefik container so Traefik receives all HTTP and HTTPS traffic to the server.

      Then we set up two Docker labels that tell Traefik to direct traffic to the hostname monitor.your_domain to port 8080 within the Traefik container, exposing the monitoring dashboard.

      We set the network of the container to web, and we name the container traefik.

      Finally, we use the traefik:1.7.6-alpine image for this container, because it's small.

      A Docker image's ENTRYPOINT is a command that always runs when a container is created from the image. In this case, the command is the traefik binary within the container. You can pass additional arguments to that command when you launch the container, but we've configured all of our settings in the traefik.toml file.
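      If you're curious which arguments the traefik binary accepts, you can optionally run a throwaway container from the same image and pass --help after the image name; assuming the 1.7.x binary behaves as expected, it should print its usage text and exit:

      • docker run --rm traefik:1.7.6-alpine --help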

      With the container started, you now have a dashboard you can access to see the health of your containers. You can also use this dashboard to visualize the frontends and backends that Traefik has registered. Access the monitoring dashboard by pointing your browser to https://monitor.your_domain. You will be prompted for your username and password, which are admin and the password you configured in Step 1.
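      If you'd rather check from the command line first, the api provider in Traefik 1.7 also exposes a /health endpoint on the dashboard entry point. Assuming your DNS record for monitor.your_domain is already in place, a request like the following should return a small JSON document with uptime and request statistics:

      • curl -u admin:secure_password https://monitor.your_domain/health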

      Once logged in, you'll see an interface similar to this:

      Empty Traefik dashboard

      There isn't much to see just yet, but leave this window open, and you will see the contents change as you add containers for Traefik to work with.

      We now have our Traefik proxy running, configured to work with Docker, and ready to monitor other Docker containers. Let's start some containers for Traefik to act as a proxy for.

      Step 3 — Registering Containers with Traefik

      With the Traefik container running, you're ready to run applications behind it. Let's launch the following containers behind Traefik:

      1. A blog using the official WordPress image.
      2. A database management server using the official Adminer image.

      We'll manage both of these applications with Docker Compose using a docker-compose.yml file. Open the docker-compose.yml file in your editor:

      • vi docker-compose.yml

      Add the following lines to the file to specify the version and the networks we'll use:

      docker-compose.yml

      version: "3"
      
      networks:
        web:
          external: true
        internal:
          external: false
      

      We use Docker Compose version 3 because it's the newest major version of the Compose file format.

      For Traefik to recognize our applications, they must be part of the same network, and since we created the network manually, we pull it in by specifying the network name of web and setting external to true. Then we define another network so that we can connect our exposed containers to a database container that we won't expose through Traefik. We'll call this network internal.

      Next, we'll define each of our services, one at a time. Let's start with the blog container, which we'll base on the official WordPress image. Add this configuration to the file:

      docker-compose.yml

      version: "3"
      ...
      
      services:
        blog:
          image: wordpress:4.9.8-apache
          environment:
            WORDPRESS_DB_PASSWORD:
          labels:
            - traefik.backend=blog
            - traefik.frontend.rule=Host:blog.your_domain
            - traefik.docker.network=web
            - traefik.port=80
          networks:
            - internal
            - web
          depends_on:
            - mysql
      

      The environment key lets you specify environment variables that will be set inside of the container. By not setting a value for WORDPRESS_DB_PASSWORD, we're telling Docker Compose to get the value from our shell and pass it through when we create the container. We will define this environment variable in our shell before starting the containers. This way we don't hard-code passwords into the configuration file.

      The labels section is where you specify configuration values for Traefik. Docker labels don't do anything by themselves, but Traefik reads these so it knows how to treat containers. Here's what each of these labels does:

      • traefik.backend specifies the name of the backend service in Traefik (which points to the actual blog container).
      • traefik.frontend.rule=Host:blog.your_domain tells Traefik to examine the host requested and if it matches the pattern of blog.your_domain it should route the traffic to the blog container.
      • traefik.docker.network=web specifies which network to look under for Traefik to find the internal IP for this container. Since our Traefik container has access to all of the Docker info, it would potentially take the IP for the internal network if we didn't specify this.
      • traefik.port specifies the exposed port that Traefik should use to route traffic to this container.

      With this configuration, all traffic sent to our Docker host's port 80 for the host blog.your_domain will be routed to the blog container.

      We assign this container to two different networks so that Traefik can find it via the web network and it can communicate with the database container through the internal network.

      Lastly, the depends_on key tells Docker Compose that this container needs to start after its dependencies are running. Since WordPress needs a database to run, we must run our mysql container before starting our blog container.

      Next, configure the MySQL service by adding this configuration to your file:

      docker-compose.yml

      services:
      ...
        mysql:
          image: mysql:5.7
          environment:
            MYSQL_ROOT_PASSWORD:
          networks:
            - internal
          labels:
            - traefik.enable=false
      

      We're using the official MySQL 5.7 image for this container. You'll notice that we're once again using an environment item without a value. The MYSQL_ROOT_PASSWORD and WORDPRESS_DB_PASSWORD variables will need to be set to the same value to make sure that our WordPress container can communicate with MySQL. We don't want to expose the mysql container to Traefik or the outside world, so we're only assigning this container to the internal network. Since Traefik has access to the Docker socket, the process will still expose a frontend for the mysql container by default, so we'll add the label traefik.enable=false to specify that Traefik should not expose this container.

      Finally, add this configuration to define the Adminer container:

      docker-compose.yml

      services:
      ...
        adminer:
          image: adminer:4.6.3-standalone
          labels:
            - traefik.backend=adminer
            - traefik.frontend.rule=Host:db-admin.your_domain
            - traefik.docker.network=web
            - traefik.port=8080
          networks:
            - internal
            - web
          depends_on:
            - mysql
      

      This container is based on the official Adminer image. The network and depends_on configurations for this container exactly match what we're using for the blog container.

      However, since we're directing all of the traffic to port 80 on our Docker host directly to the blog container, we need to configure this container differently in order for traffic to make it to our adminer container. The line traefik.frontend.rule=Host:db-admin.your_domain tells Traefik to examine the host requested. If it matches the pattern of db-admin.your_domain, Traefik will route the traffic to the adminer container.

      At this point, docker-compose.yml should have the following contents:

      docker-compose.yml

      version: "3"
      
      networks:
        web:
          external: true
        internal:
          external: false
      
      services:
        blog:
          image: wordpress:4.9.8-apache
          environment:
            WORDPRESS_DB_PASSWORD:
          labels:
            - traefik.backend=blog
            - traefik.frontend.rule=Host:blog.your_domain
            - traefik.docker.network=web
            - traefik.port=80
          networks:
            - internal
            - web
          depends_on:
            - mysql
        mysql:
          image: mysql:5.7
          environment:
            MYSQL_ROOT_PASSWORD:
          networks:
            - internal
          labels:
            - traefik.enable=false
        adminer:
          image: adminer:4.6.3-standalone
          labels:
            - traefik.backend=adminer
            - traefik.frontend.rule=Host:db-admin.your_domain
            - traefik.docker.network=web
            - traefik.port=8080
          networks:
            - internal
            - web
          depends_on:
            - mysql
      

      Save the file and exit the text editor.

      Next, set values in your shell for the WORDPRESS_DB_PASSWORD and MYSQL_ROOT_PASSWORD variables before you start your containers:

      • export WORDPRESS_DB_PASSWORD=secure_database_password
      • export MYSQL_ROOT_PASSWORD=secure_database_password

      Substitute secure_database_password with your desired database password. Remember to use the same password for both WORDPRESS_DB_PASSWORD and MYSQL_ROOT_PASSWORD.

      With these variables set, run the containers using docker-compose:

      • docker-compose up -d
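      If you want to confirm that all three containers started before checking the dashboard, you can list them along with their current state:

      • docker-compose ps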

      Now take another look at the Traefik admin dashboard. You'll see that there is now a backend and a frontend for the two exposed servers:

      Populated Traefik dashboard

      Navigate to blog.your_domain, substituting your_domain with your domain. You'll be redirected to a TLS connection and can now complete the WordPress setup:

      WordPress setup screen

      Now access Adminer by visiting db-admin.your_domain in your browser, again substituting your_domain with your domain. The mysql container isn't exposed to the outside world, but the adminer container has access to it through the internal Docker network that they share using the mysql container name as a host name.

      On the Adminer login screen, use the username root, use mysql for the server, and use the value you set for MYSQL_ROOT_PASSWORD for the password. Once logged in, you'll see the Adminer user interface:

      Adminer connected to the MySQL database

      Both sites are now working, and you can use the dashboard at monitor.your_domain to keep an eye on your applications.

      Conclusion

      In this tutorial, you configured Traefik to proxy requests to other applications in Docker containers.

      Traefik's declarative configuration at the application container level makes it easy to configure more services, and there's no need to restart the traefik container when you add new applications to proxy traffic to since Traefik notices the changes immediately through the Docker socket file it's monitoring.

      To learn more about what you can do with Traefik, head over to the official Traefik documentation. If you'd like to explore Docker containers further, check out How To Set Up a Private Docker Registry on Ubuntu 18.04 or How To Secure a Containerized Node.js Application with Nginx, Let's Encrypt, and Docker Compose. Although these tutorials are written for Ubuntu 18.04, many of the Docker-specific commands can be used for CentOS 7.




      How To Install Elasticsearch, Logstash, and Kibana (Elastic Stack) on CentOS 7


      The author selected Software in the Public Interest to receive a donation as part of the Write for DOnations program.

      Introduction

      The Elastic Stack — formerly known as the ELK Stack — is a collection of open-source software produced by Elastic which allows you to search, analyze, and visualize logs generated from any source in any format, a practice known as centralized logging. Centralized logging can be very useful when attempting to identify problems with your servers or applications, as it allows you to search through all of your logs in a single place. It’s also useful because it allows you to identify issues that span multiple servers by correlating their logs during a specific time frame.

      The Elastic Stack has four main components:

      • Elasticsearch: a distributed RESTful search engine which stores all of the collected data.
      • Logstash: the data processing component of the Elastic Stack which sends incoming data to Elasticsearch.
      • Kibana: a web interface for searching and visualizing logs.
      • Beats: lightweight, single-purpose data shippers that can send data from hundreds or thousands of machines to either Logstash or Elasticsearch.

      In this tutorial, you will install the Elastic Stack on a CentOS 7 server. You will learn how to install all of the components of the Elastic Stack — including Filebeat, a Beat used for forwarding and centralizing logs and files — and configure them to gather and visualize system logs. Additionally, because Kibana is normally only available on the localhost, you will use Nginx to proxy it so it will be accessible over a web browser. At the end of this tutorial, you will have all of these components installed on a single server, referred to as the Elastic Stack server.

      Note: When installing the Elastic Stack, you should use the same version across the entire stack. This tutorial uses the latest versions of each component, which are, at the time of this writing, Elasticsearch 6.5.2, Kibana 6.5.2, Logstash 6.5.2, and Filebeat 6.5.2.

      Prerequisites

      To complete this tutorial, you will need the following:

      • One CentOS 7 server set up by following Initial Server Setup with CentOS 7, including a non-root user with sudo privileges and a firewall. The amount of CPU, RAM, and storage that your Elastic Stack server will require depends on the volume of logs that you intend to gather. For this tutorial, you will be using a VPS with the following specifications for our Elastic Stack server:

        • OS: CentOS 7.5
        • RAM: 4GB
        • CPU: 2
      • Java 8 — which is required by Elasticsearch and Logstash — installed on your server. Note that Java 9 is not supported. To install this, follow the “Install OpenJDK 8 JRE” section of our guide on how to install Java on CentOS.

      • Nginx installed on your server, which you will configure later in this guide as a reverse proxy for Kibana. Follow our guide on How To Install Nginx on CentOS 7 to set this up.

      Additionally, because the Elastic Stack is used to access valuable information about your server that you would not want unauthorized users to access, it’s important that you keep your server secure by installing a TLS/SSL certificate. This is optional but strongly encouraged. Because you will ultimately make changes to your Nginx server block over the course of this guide, we suggest putting this security in place by completing the Let’s Encrypt on CentOS 7 guide immediately after this tutorial’s second step.

      If you do plan to configure Let’s Encrypt on your server, you will need the following in place before doing so:

      • A fully qualified domain name (FQDN). This tutorial will use example.com throughout. You can purchase a domain name on Namecheap, get one for free on Freenom, or use the domain registrar of your choice.

      • Both of the following DNS records set up for your server. You can follow this introduction to DigitalOcean DNS for details on how to add them.

        • An A record with example.com pointing to your server’s public IP address.
        • An A record with www.example.com pointing to your server’s public IP address.

      Step 1 — Installing and Configuring Elasticsearch

      The Elastic Stack components are not available through the package manager by default, but you can install them with yum by adding Elastic’s package repository.

      All of the Elastic Stack’s packages are signed with the Elasticsearch signing key in order to protect your system from package spoofing. Packages which have been authenticated using the key will be considered trusted by your package manager. In this step, you will import the Elasticsearch public GPG key and add the Elastic repository in order to install Elasticsearch.

      Run the following command to download and install the Elasticsearch public signing key:

      • sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

      Next, add the Elastic repository. Use your preferred text editor to create the file elasticsearch.repo in the /etc/yum.repos.d/ directory. Here, we’ll use the vi text editor:

      • sudo vi /etc/yum.repos.d/elasticsearch.repo

      To provide yum with the information it needs to download and install the components of the Elastic Stack, enter insert mode by pressing i and add the following lines to the file.

      /etc/yum.repos.d/elasticsearch.repo

      [elasticsearch-6.x]
      name=Elasticsearch repository for 6.x packages
      baseurl=https://artifacts.elastic.co/packages/6.x/yum
      gpgcheck=1
      gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
      enabled=1
      autorefresh=1
      type=rpm-md
      

      Here you have included the human-readable name of the repo, the baseurl of the repo’s data directory, and the gpgkey required to verify Elastic packages.

      When you’re finished, press ESC to leave insert mode, then :wq and ENTER to save and exit the file. To learn more about the text editor vi and its successor vim, check out our Installing and Using the Vim Text Editor on a Cloud Server tutorial.

      With the repo added, you can now install the Elastic Stack. According to the official documentation, you should install Elasticsearch before the other components. Installing in this order ensures that the components each product depends on are correctly in place.

      Install Elasticsearch with the following command:

      • sudo yum install elasticsearch

      Once Elasticsearch is finished installing, open its main configuration file, elasticsearch.yml, in your editor:

      • sudo vi /etc/elasticsearch/elasticsearch.yml
      

      Note: Elasticsearch’s configuration file is in YAML format, which means that indentation is very important! Be sure that you do not add any extra spaces as you edit this file.

      Elasticsearch listens for traffic from everywhere on port 9200. You will want to restrict outside access to your Elasticsearch instance to prevent outsiders from reading your data or shutting down your Elasticsearch cluster through the REST API. Find the line that specifies network.host, uncomment it, and replace its value with localhost so it looks like this:

      /etc/elasticsearch/elasticsearch.yml

      . . .
      network.host: localhost
      . . .
      

      Save and close elasticsearch.yml. Then, start the Elasticsearch service with systemctl:

      • sudo systemctl start elasticsearch

      Next, run the following command to enable Elasticsearch to start up every time your server boots:

      • sudo systemctl enable elasticsearch

      You can test whether your Elasticsearch service is running by sending an HTTP request:

      • curl -X GET "localhost:9200"

      You will see a response showing some basic information about your local node, similar to this:

      Output

      { "name" : "8oSCBFJ", "cluster_name" : "elasticsearch", "cluster_uuid" : "1Nf9ZymBQaOWKpMRBfisog", "version" : { "number" : "6.5.2", "build_flavor" : "default", "build_type" : "rpm", "build_hash" : "9434bed", "build_date" : "2018-11-29T23:58:20.891072Z", "build_snapshot" : false, "lucene_version" : "7.5.0", "minimum_wire_compatibility_version" : "5.6.0", "minimum_index_compatibility_version" : "5.0.0" }, "tagline" : "You Know, for Search" }

      Now that Elasticsearch is up and running, let’s install Kibana, the next component of the Elastic Stack.

      Step 2 — Installing and Configuring the Kibana Dashboard

      According to the installation order in the official documentation, you should install Kibana as the next component after Elasticsearch. After setting Kibana up, we will be able to use its interface to search through and visualize the data that Elasticsearch stores.

      Because you already added the Elastic repository in the previous step, you can just install the remaining components of the Elastic Stack using yum:

      • sudo yum install kibana

      Then enable and start the Kibana service:

      • sudo systemctl enable kibana
      • sudo systemctl start kibana

      Because Kibana is configured to only listen on localhost, we must set up a reverse proxy to allow external access to it. We will use Nginx for this purpose, which should already be installed on your server.

      First, use the openssl command to create an administrative Kibana user which you'll use to access the Kibana web interface. As an example, we will name this account kibanaadmin, but to ensure greater security we recommend that you choose a non-standard name for your user that would be difficult to guess.

      The following command will create the administrative Kibana user and password, and store them in the htpasswd.users file. You will configure Nginx to require this username and password and read this file momentarily:

      • echo "kibanaadmin:`openssl passwd -apr1`" | sudo tee -a /etc/nginx/htpasswd.users

      Enter and confirm a password at the prompt. Remember or take note of this login, as you will need it to access the Kibana web interface.
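      If you'd like to confirm that the entry was written, you can print the file; you should see a single line beginning with kibanaadmin followed by a password hash:

      • sudo cat /etc/nginx/htpasswd.users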

      Next, we will create an Nginx server block file. As an example, we will refer to this file as example.com.conf, although you may find it helpful to give yours a more descriptive name. For instance, if you have a FQDN and DNS records set up for this server, you could name this file after your FQDN:

      • sudo vi /etc/nginx/conf.d/example.com.conf

      Add the following code block into the file, being sure to update example.com and www.example.com to match your server's FQDN or public IP address. This code configures Nginx to direct your server's HTTP traffic to the Kibana application, which is listening on localhost:5601. Additionally, it configures Nginx to read the htpasswd.users file and require basic authentication.

      Note that if you followed the prerequisite Nginx tutorial through to the end, you may have already created this file and populated it with some content. In that case, delete all the existing content in the file before adding the following:

      /etc/nginx/conf.d/example.com.conf

      server {
          listen 80;
      
          server_name example.com www.example.com;
      
          auth_basic "Restricted Access";
          auth_basic_user_file /etc/nginx/htpasswd.users;
      
          location / {
              proxy_pass http://localhost:5601;
              proxy_http_version 1.1;
              proxy_set_header Upgrade $http_upgrade;
              proxy_set_header Connection 'upgrade';
              proxy_set_header Host $host;
              proxy_cache_bypass $http_upgrade;
          }
      }
      

      When you're finished, save and close the file.

      Then check the configuration for syntax errors:

      • sudo nginx -t

      If any errors are reported in your output, go back and double check that the content you placed in your configuration file was added correctly. Once you see syntax is ok in the output, go ahead and restart the Nginx service:

      • sudo systemctl restart nginx

      By default, SELinux security policy is set to be enforced. Run the following command to allow Nginx to access the proxied service:

      • sudo setsebool -P httpd_can_network_connect 1

      You can learn more about SELinux in the tutorial An Introduction to SELinux on CentOS 7.
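      If you want to confirm that the boolean was changed and made persistent, you can query it with getsebool, which should report that it is on:

      • getsebool httpd_can_network_connect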

      Kibana is now accessible via your FQDN or the public IP address of your Elastic Stack server. You can check the Kibana server's status page by navigating to the following address and entering your login credentials when prompted:

      http://your_server_ip/status
      

      This status page displays information about the server’s resource usage and lists the installed plugins.

      Kibana status page

      Note: As mentioned in the Prerequisites section, it is recommended that you enable SSL/TLS on your server. You can follow this tutorial now to obtain a free SSL certificate for Nginx on CentOS 7. After obtaining your SSL/TLS certificates, you can come back and complete this tutorial.

      Now that the Kibana dashboard is configured, let's install the next component: Logstash.

      Step 3 — Installing and Configuring Logstash

      Although it's possible for Beats to send data directly to the Elasticsearch database, we recommend using Logstash to process the data first. This will allow you to collect data from different sources, transform it into a common format, and export it to another database.

      Install Logstash with this command:

      • sudo yum install logstash

      After installing Logstash, you can move on to configuring it. Logstash's configuration files reside in the /etc/logstash/conf.d directory and are written in Logstash's own configuration syntax. As you configure it, it's helpful to think of Logstash as a pipeline which takes in data at one end, processes it in one way or another, and sends it out to its destination (in this case, the destination being Elasticsearch). A Logstash pipeline has two required elements, input and output, and one optional element, filter. The input plugins consume data from a source, the filter plugins process the data, and the output plugins write the data to a destination.

      Logstash pipeline
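      If you'd like to see the pipeline concept in action before writing any configuration files, you can optionally run a throwaway pipeline straight from the command line. The -e flag accepts an inline configuration; this minimal sketch reads from stdin and writes each line back to stdout as a structured event. It may print warnings about missing settings files, and you can press CTRL+C to stop it:

      • sudo -u logstash /usr/share/logstash/bin/logstash -e 'input { stdin { } } output { stdout { } }'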

      Create a configuration file called 02-beats-input.conf where you will set up your Filebeat input:

      • sudo vi /etc/logstash/conf.d/02-beats-input.conf

      Insert the following input configuration. This specifies a beats input that will listen on TCP port 5044.

      /etc/logstash/conf.d/02-beats-input.conf

      input {
        beats {
          port => 5044
        }
      }
      

      Save and close the file. Next, create a configuration file called 10-syslog-filter.conf, which will add a filter for system logs, also known as syslogs:

      • sudo vi /etc/logstash/conf.d/10-syslog-filter.conf

      Insert the following syslog filter configuration. This example system logs configuration was taken from official Elastic documentation. This filter is used to parse incoming system logs to make them structured and usable by the predefined Kibana dashboards:

      /etc/logstash/conf.d/10-syslog-filter.conf

      filter {
        if [fileset][module] == "system" {
          if [fileset][name] == "auth" {
            grok {
              match => { "message" => ["%{SYSLOGTIMESTAMP:[system][auth][timestamp]} %{SYSLOGHOST:[system][auth][hostname]} sshd(?:[%{POSINT:[system][auth][pid]}])?: %{DATA:[system][auth][ssh][event]} %{DATA:[system][auth][ssh][method]} for (invalid user )?%{DATA:[system][auth][user]} from %{IPORHOST:[system][auth][ssh][ip]} port %{NUMBER:[system][auth][ssh][port]} ssh2(: %{GREEDYDATA:[system][auth][ssh][signature]})?",
                        "%{SYSLOGTIMESTAMP:[system][auth][timestamp]} %{SYSLOGHOST:[system][auth][hostname]} sshd(?:[%{POSINT:[system][auth][pid]}])?: %{DATA:[system][auth][ssh][event]} user %{DATA:[system][auth][user]} from %{IPORHOST:[system][auth][ssh][ip]}",
                        "%{SYSLOGTIMESTAMP:[system][auth][timestamp]} %{SYSLOGHOST:[system][auth][hostname]} sshd(?:[%{POSINT:[system][auth][pid]}])?: Did not receive identification string from %{IPORHOST:[system][auth][ssh][dropped_ip]}",
                        "%{SYSLOGTIMESTAMP:[system][auth][timestamp]} %{SYSLOGHOST:[system][auth][hostname]} sudo(?:[%{POSINT:[system][auth][pid]}])?: s*%{DATA:[system][auth][user]} 🙁 %{DATA:[system][auth][sudo][error]} ;)? TTY=%{DATA:[system][auth][sudo][tty]} ; PWD=%{DATA:[system][auth][sudo][pwd]} ; USER=%{DATA:[system][auth][sudo][user]} ; COMMAND=%{GREEDYDATA:[system][auth][sudo][command]}",
                        "%{SYSLOGTIMESTAMP:[system][auth][timestamp]} %{SYSLOGHOST:[system][auth][hostname]} groupadd(?:[%{POSINT:[system][auth][pid]}])?: new group: name=%{DATA:system.auth.groupadd.name}, GID=%{NUMBER:system.auth.groupadd.gid}",
                        "%{SYSLOGTIMESTAMP:[system][auth][timestamp]} %{SYSLOGHOST:[system][auth][hostname]} useradd(?:[%{POSINT:[system][auth][pid]}])?: new user: name=%{DATA:[system][auth][user][add][name]}, UID=%{NUMBER:[system][auth][user][add][uid]}, GID=%{NUMBER:[system][auth][user][add][gid]}, home=%{DATA:[system][auth][user][add][home]}, shell=%{DATA:[system][auth][user][add][shell]}$",
                        "%{SYSLOGTIMESTAMP:[system][auth][timestamp]} %{SYSLOGHOST:[system][auth][hostname]} %{DATA:[system][auth][program]}(?:[%{POSINT:[system][auth][pid]}])?: %{GREEDYMULTILINE:[system][auth][message]}"] }
              pattern_definitions => {
                "GREEDYMULTILINE"=> "(.|n)*"
              }
              remove_field => "message"
            }
            date {
              match => [ "[system][auth][timestamp]", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
            }
            geoip {
              source => "[system][auth][ssh][ip]"
              target => "[system][auth][ssh][geoip]"
            }
          }
          else if [fileset][name] == "syslog" {
            grok {
              match => { "message" => ["%{SYSLOGTIMESTAMP:[system][syslog][timestamp]} %{SYSLOGHOST:[system][syslog][hostname]} %{DATA:[system][syslog][program]}(?:[%{POSINT:[system][syslog][pid]}])?: %{GREEDYMULTILINE:[system][syslog][message]}"] }
              pattern_definitions => { "GREEDYMULTILINE" => "(.|n)*" }
              remove_field => "message"
            }
            date {
              match => [ "[system][syslog][timestamp]", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
            }
          }
        }
      }
      

      Save and close the file when finished.

      Lastly, create a configuration file called 30-elasticsearch-output.conf:

      • sudo vi /etc/logstash/conf.d/30-elasticsearch-output.conf

      Insert the following output configuration. This output configures Logstash to store the Beats data in Elasticsearch, which is running at localhost:9200, in an index named after the Beat used. The Beat used in this tutorial is Filebeat:

      /etc/logstash/conf.d/30-elasticsearch-output.conf

      output {
        elasticsearch {
          hosts => ["localhost:9200"]
          manage_template => false
          index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
        }
      }
      

      Save and close the file.

      If you want to add filters for other applications that use the Filebeat input, be sure to name the files so they're sorted between the input and the output configuration, meaning that the file names should begin with a two-digit number between 02 and 30.

      Test your Logstash configuration with this command:

      • sudo -u logstash /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t

      If there are no syntax errors, your output will display Configuration OK after a few seconds. If you don't see this, check your output for errors and update your configuration to correct them.

      If your configuration test is successful, start and enable Logstash to put the configuration changes into effect:

      • sudo systemctl start logstash
      • sudo systemctl enable logstash

      Now that Logstash is running correctly and is fully configured, let's install Filebeat.

      Step 4 — Installing and Configuring Filebeat

      The Elastic Stack uses several lightweight data shippers called Beats to collect data from various sources and transport them to Logstash or Elasticsearch. Here are the Beats that are currently available from Elastic:

      • Filebeat: collects and ships log files.
      • Metricbeat: collects metrics from your systems and services.
      • Packetbeat: collects and analyzes network data.
      • Winlogbeat: collects Windows event logs.
      • Auditbeat: collects Linux audit framework data and monitors file integrity.
      • Heartbeat: monitors services for their availability with active probing.

      In this tutorial, we will use Filebeat to forward local logs to our Elastic Stack.

      Install Filebeat using yum:

      • sudo yum install filebeat

      Next, configure Filebeat to connect to Logstash. Here, we will modify the example configuration file that comes with Filebeat.

      Open the Filebeat configuration file:

      • sudo vi /etc/filebeat/filebeat.yml

      Note: As with Elasticsearch, Filebeat's configuration file is in YAML format. This means that proper indentation is crucial, so be sure to use the same number of spaces that are indicated in these instructions.

      Filebeat supports numerous outputs, but you’ll usually only send events directly to Elasticsearch or to Logstash for additional processing. In this tutorial, we'll use Logstash to perform additional processing on the data collected by Filebeat. Filebeat will not need to send any data directly to Elasticsearch, so let's disable that output. To do so, find the output.elasticsearch section and comment out the following lines by preceding them with a #:

      /etc/filebeat/filebeat.yml

      ...
      #output.elasticsearch:
        # Array of hosts to connect to.
        #hosts: ["localhost:9200"]
      ...
      

      Then, configure the output.logstash section. Uncomment the lines output.logstash: and hosts: ["localhost:5044"] by removing the #. This will configure Filebeat to connect to Logstash on your Elastic Stack server at port 5044, the port for which we specified a Logstash input earlier:

      /etc/filebeat/filebeat.yml

      output.logstash:
        # The Logstash hosts
        hosts: ["localhost:5044"]
      

      Save and close the file.
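      Before moving on, you can optionally ask Filebeat to validate these changes. Filebeat 6.x ships test subcommands for this: the first checks the syntax of filebeat.yml, and the second attempts a connection to the configured Logstash output, which should succeed if the Logstash service from the previous step is running and listening on port 5044:

      • sudo filebeat test config
      • sudo filebeat test output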

      You can now extend the functionality of Filebeat with Filebeat modules. In this tutorial, you will use the system module, which collects and parses logs created by the system logging service of common Linux distributions.

      Let's enable it:

      • sudo filebeat modules enable system

      You can see a list of enabled and disabled modules by running:

      • sudo filebeat modules list

      You will see a list similar to the following:

      Output

      Enabled:
      system
      
      Disabled:
      apache2
      auditd
      elasticsearch
      haproxy
      icinga
      iis
      kafka
      kibana
      logstash
      mongodb
      mysql
      nginx
      osquery
      postgresql
      redis
      suricata
      traefik

      By default, Filebeat is configured to use default paths for the syslog and authorization logs. In the case of this tutorial, you do not need to change anything in the configuration. You can see the parameters of the module in the /etc/filebeat/modules.d/system.yml configuration file.

      Next, load the index template into Elasticsearch. An Elasticsearch index is a collection of documents that have similar characteristics. Indexes are identified with a name, which is used to refer to the index when performing various operations within it. The index template will be automatically applied when a new index is created.

      To load the template, use the following command:

      • sudo filebeat setup --template -E output.logstash.enabled=false -E 'output.elasticsearch.hosts=["localhost:9200"]'

      This will give the following output:

      Output

      Loaded index template

      Filebeat comes packaged with sample Kibana dashboards that allow you to visualize Filebeat data in Kibana. Before you can use the dashboards, you need to create the index pattern and load the dashboards into Kibana.

      As the dashboards load, Filebeat connects to Elasticsearch to check version information. To load dashboards when Logstash is enabled, you need to manually disable the Logstash output and enable Elasticsearch output:

      • sudo filebeat setup -e -E output.logstash.enabled=false -E output.elasticsearch.hosts=['localhost:9200'] -E setup.kibana.host=localhost:5601

      You will see output that looks like this:

      Output

      . . .
      2018-12-05T21:23:33.806Z  INFO  elasticsearch/client.go:163  Elasticsearch url: http://localhost:9200
      2018-12-05T21:23:33.811Z  INFO  elasticsearch/client.go:712  Connected to Elasticsearch version 6.5.2
      2018-12-05T21:23:33.815Z  INFO  template/load.go:129  Template already exists and will not be overwritten.
      Loaded index template
      Loading dashboards (Kibana must be running and reachable)
      2018-12-05T21:23:33.816Z  INFO  elasticsearch/client.go:163  Elasticsearch url: http://localhost:9200
      2018-12-05T21:23:33.819Z  INFO  elasticsearch/client.go:712  Connected to Elasticsearch version 6.5.2
      2018-12-05T21:23:33.819Z  INFO  kibana/client.go:118  Kibana url: http://localhost:5601
      2018-12-05T21:24:03.981Z  INFO  instance/beat.go:717  Kibana dashboards successfully loaded.
      Loaded dashboards
      2018-12-05T21:24:03.982Z  INFO  elasticsearch/client.go:163  Elasticsearch url: http://localhost:9200
      2018-12-05T21:24:03.984Z  INFO  elasticsearch/client.go:712  Connected to Elasticsearch version 6.5.2
      2018-12-05T21:24:03.984Z  INFO  kibana/client.go:118  Kibana url: http://localhost:5601
      2018-12-05T21:24:04.043Z  WARN  fileset/modules.go:388  X-Pack Machine Learning is not enabled
      2018-12-05T21:24:04.080Z  WARN  fileset/modules.go:388  X-Pack Machine Learning is not enabled
      Loaded machine learning job configurations

      Now you can start and enable Filebeat:

      • sudo systemctl start filebeat
      • sudo systemctl enable filebeat

      If you've set up your Elastic Stack correctly, Filebeat will begin shipping your syslog and authorization logs to Logstash, which will then load that data into Elasticsearch.

      To verify that Elasticsearch is indeed receiving this data, query the Filebeat index with this command:

      • curl -X GET 'http://localhost:9200/filebeat-*/_search?pretty'

      You will see an output that looks similar to this:

      Output

      { "took" : 1, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 3225, "max_score" : 1.0, "hits" : [ { "_index" : "filebeat-6.5.2-2018.12.05", "_type" : "doc", "_id" : "vf5GgGcB_g3p-PRo_QOw", "_score" : 1.0, "_source" : { "@timestamp" : "2018-12-05T19:00:34.000Z", "source" : "/var/log/secure", "meta" : { "cloud" : { . . .

      If your output shows 0 total hits, Elasticsearch is not loading any logs under the index you searched for, and you will need to review your setup for errors. If you received the expected output, continue to the next step, in which you'll become familiar with some of Kibana's dashboards.
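      If the hit count is zero and you're not sure whether the Filebeat index was created at all, Elasticsearch's _cat API gives a quick overview of every index and its document count:

      • curl -X GET 'http://localhost:9200/_cat/indices?v'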

      Step 5 — Exploring Kibana Dashboards

      Let's look at Kibana, the web interface that we installed earlier.

      In a web browser, go to the FQDN or public IP address of your Elastic Stack server. After entering the login credentials you defined in Step 2, you will see the Kibana homepage:

      Kibana Homepage

      Click the Discover link in the left-hand navigation bar. On the Discover page, select the predefined filebeat-* index pattern to see Filebeat data. By default, this will show you all of the log data over the last 15 minutes. You will see a histogram with log events, and some log messages below:

      Discover page

      Here, you can search and browse through your logs and also customize your dashboard. At this point, though, there won't be much in there because you are only gathering syslogs from your Elastic Stack server.

      Use the left-hand panel to navigate to the Dashboard page and search for the Filebeat System dashboards. Once there, you can select the sample dashboards that come with Filebeat's system module.

      For example, you can view detailed stats based on your syslog messages:

      Syslog Dashboard

      You can also view which users have used the sudo command and when:

      Sudo Dashboard

      Kibana has many other features, such as graphing and filtering, so feel free to explore.

      Conclusion

      In this tutorial, you installed and configured the Elastic Stack to collect and analyze system logs. Remember that you can send just about any type of log or indexed data to Logstash using Beats, but the data becomes even more useful if it is parsed and structured with a Logstash filter, as this transforms the data into a consistent format that can be read easily by Elasticsearch.




      How To Install Apache Kafka on CentOS 7


      The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      Apache Kafka is a popular distributed message broker designed to efficiently handle large volumes of real-time data. A Kafka cluster is not only highly scalable and fault-tolerant, but it also has a much higher throughput compared to other message brokers such as ActiveMQ and RabbitMQ. Though it is generally used as a publish/subscribe messaging system, a lot of organizations also use it for log aggregation because it offers persistent storage for published messages.

      A publish/subscribe messaging system allows one or more producers to publish messages without considering the number of consumers or how they will process the messages. Subscribed clients are notified automatically about updates and the creation of new messages. This system is more efficient and scalable than systems where clients poll periodically to determine if new messages are available.

      In this tutorial, you will install and use Apache Kafka 1.1.0 on CentOS 7.

      Prerequisites

      To follow along, you will need:

      • One CentOS 7 server and a non-root user with sudo privileges. Follow the steps specified in this guide if you do not have a non-root user set up.
      • At least 4GB of RAM on the server. Installations without this amount of RAM may cause the Kafka service to fail, with the Java virtual machine (JVM) throwing an “Out Of Memory” exception during startup.
      • OpenJDK 8 installed on your server. To install this version, follow these instructions on installing specific versions of OpenJDK. Kafka is written in Java, so it requires a JVM; however, its startup shell script has a version detection bug that causes it to fail to start with JVM versions above 8. (See the quick check after this list.)
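      Before moving on, you can confirm which JVM is on your PATH; the output should report a 1.8.x version:

      • java -version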

      Step 1 — Creating a User for Kafka

      Since Kafka can handle requests over a network, you should create a dedicated user for it. This minimizes damage to your CentOS machine should the Kafka server be compromised. We will create a dedicated kafka user in this step, but you should create a different non-root user to perform other tasks on this server once you have finished setting up Kafka.

      Logged in as your non-root sudo user, create a user called kafka with the useradd command:

      • sudo useradd -m kafka

      The -m flag ensures that a home directory will be created for the user. This home directory, /home/kafka, will act as our workspace directory for executing commands in the sections below.

      Set the password using passwd:

      • sudo passwd kafka

      Add the kafka user to the wheel group with the usermod command, so that it has the privileges required to install Kafka's dependencies:

      • sudo usermod -aG wheel kafka

      Your kafka user is now ready. Log into this account using su:

      • su -l kafka

      Now that we've created the Kafka-specific user, we can move on to downloading and extracting the Kafka binaries.

      Step 2 — Downloading and Extracting the Kafka Binaries

      Let's download and extract the Kafka binaries into dedicated folders in our kafka user's home directory.

      To start, create a directory in /home/kafka called Downloads to store your downloads:

      • mkdir ~/Downloads

      Use curl to download the Kafka binaries:

      • curl "http://www-eu.apache.org/dist/kafka/1.1.0/kafka_2.12-1.1.0.tgz" -o ~/Downloads/kafka.tgz

      Create a directory called kafka and change to this directory. This will be the base directory of the Kafka installation:

      • mkdir ~/kafka && cd ~/kafka

      Extract the archive you downloaded using the tar command:

      • tar -xvzf ~/Downloads/kafka.tgz --strip 1

      We specify the --strip 1 flag to ensure that the archive's contents are extracted in ~/kafka/ itself and not in another directory (such as ~/kafka/kafka_2.12-1.1.0/) inside of it.
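      To double-check that the extraction landed where we expect, list the base directory; you should see entries such as bin, config, and libs:

      • ls ~/kafka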

      Now that we've downloaded and extracted the binaries successfully, we can move on to configuring Kafka to allow for topic deletion.

      Step 3 — Configuring the Kafka Server

      Kafka's default behavior will not allow us to delete a topic, the category, group, or feed name to which messages can be published. To modify this, let's edit the configuration file.

      Kafka's configuration options are specified in server.properties. Open this file with vi or your favorite editor:

      • vi ~/kafka/config/server.properties

      Let's add a setting that will allow us to delete Kafka topics. Press i to insert text, and add the following to the bottom of the file:

      ~/kafka/config/server.properties

      delete.topic.enable = true
      

      When you are finished, press ESC to exit insert mode and :wq to write the changes to the file and quit. Now that we've configured Kafka, we can move on to creating systemd unit files for running and enabling it on startup.

      Step 4 — Creating Systemd Unit Files and Starting the Kafka Server

      In this section, we will create systemd unit files for the Kafka service. This will help us perform common service actions such as starting, stopping, and restarting Kafka in a manner consistent with other Linux services.

      Zookeeper is a service that Kafka uses to manage its cluster state and configurations. It is commonly used in many distributed systems as an integral component. If you would like to know more about it, visit the official Zookeeper docs.

      Create the unit file for zookeeper:

      • sudo vi /etc/systemd/system/zookeeper.service

      Enter the following unit definition into the file:

      /etc/systemd/system/zookeeper.service

      [Unit]
      Requires=network.target remote-fs.target
      After=network.target remote-fs.target
      
      [Service]
      Type=simple
      User=kafka
      ExecStart=/home/kafka/kafka/bin/zookeeper-server-start.sh /home/kafka/kafka/config/zookeeper.properties
      ExecStop=/home/kafka/kafka/bin/zookeeper-server-stop.sh
      Restart=on-abnormal
      
      [Install]
      WantedBy=multi-user.target
      

      The [Unit] section specifies that Zookeeper requires networking and the filesystem to be ready before it can start.

      The [Service] section specifies that systemd should use the zookeeper-server-start.sh and zookeeper-server-stop.sh shell files for starting and stopping the service. It also specifies that Zookeeper should be restarted automatically if it exits abnormally.

      Next, create the systemd service file for kafka:

      • sudo vi /etc/systemd/system/kafka.service

      Enter the following unit definition into the file:

      /etc/systemd/system/kafka.service

      [Unit]
      Requires=zookeeper.service
      After=zookeeper.service
      
      [Service]
      Type=simple
      User=kafka
      ExecStart=/bin/sh -c '/home/kafka/kafka/bin/kafka-server-start.sh /home/kafka/kafka/config/server.properties > /home/kafka/kafka/kafka.log 2>&1'
      ExecStop=/home/kafka/kafka/bin/kafka-server-stop.sh
      Restart=on-abnormal
      
      [Install]
      WantedBy=multi-user.target
      

      The [Unit] section specifies that this unit file depends on zookeeper.service. This will ensure that zookeeper gets started automatically when the kafka service starts.

      The [Service] section specifies that systemd should use the kafka-server-start.sh and kafka-server-stop.sh shell files for starting and stopping the service. It also specifies that Kafka should be restarted automatically if it exits abnormally.

      Now that the units have been defined, start Kafka with the following command:

      • sudo systemctl start kafka

      To ensure that the server has started successfully, check the journal logs for the kafka unit:

      • sudo journalctl -u kafka

      You should see output similar to the following:

      Output

      Jul 17 18:38:59 kafka-centos systemd[1]: Started kafka.service.

      You now have a Kafka server listening on port 9092.
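      If you'd like to verify the listener yourself, you can check which process is bound to port 9092; the grep simply returns nothing if Kafka isn't listening yet:

      • sudo ss -plnt | grep 9092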

      While we have started the kafka service, if we were to reboot our server, it would not be started automatically. To enable kafka on server boot, run:

      • sudo systemctl enable kafka

      Now that we've started and enabled the services, let's check the installation.

      Step 5 — Testing the Installation

      Let's publish and consume a "Hello World" message to make sure the Kafka server is behaving correctly. Publishing messages in Kafka requires:

      • A producer, which enables the publication of records and data to topics.
      • A consumer, which reads messages and data from topics.

      First, create a topic named TutorialTopic by typing:

      • ~/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic TutorialTopic

      You can create a producer from the command line using the kafka-console-producer.sh script. It expects the Kafka server's hostname, port, and a topic name as arguments.

      Publish the string "Hello, World" to the TutorialTopic topic by typing:

      • echo "Hello, World" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic TutorialTopic > /dev/null

      Next, you can create a Kafka consumer using the kafka-console-consumer.sh script. It expects the Kafka server's hostname and port, along with a topic name, as arguments.

      The following command consumes messages from TutorialTopic. Note the use of the --from-beginning flag, which allows the consumption of messages that were published before the consumer was started:

      • ~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic TutorialTopic --from-beginning

      If there are no configuration issues, you should see Hello, World in your terminal:

      Output

      Hello, World

      The script will continue to run, waiting for more messages to be published to the topic. Feel free to open a new terminal and start a producer to publish a few more messages. You should be able to see them all in the consumer's output.

      When you are done testing, press CTRL+C to stop the consumer script. Now that we have tested the installation, let's move on to installing KafkaT.

      Step 6 — Installing KafkaT (Optional)

      KafkaT is a tool from Airbnb that makes it easier for you to view details about your Kafka cluster and perform certain administrative tasks from the command line. Because it is a Ruby gem, you will need Ruby to use it. You will also need ruby-devel and build-related packages such as make and gcc to be able to build the other gems it depends on. Install them using yum:

      • sudo yum install ruby ruby-devel make gcc patch

      You can now install KafkaT using the gem command:

      • sudo gem install kafkat

      KafkaT uses .kafkatcfg as the configuration file to determine the installation and log directories of your Kafka server. It should also have an entry pointing KafkaT to your ZooKeeper instance.

      Create a new file called .kafkatcfg:

      • vi ~/.kafkatcfg

      Add the following lines to specify the required information about your Kafka server and Zookeeper instance:

      ~/.kafkatcfg

      {
        "kafka_path": "~/kafka",
        "log_path": "/tmp/kafka-logs",
        "zk_path": "localhost:2181"
      }
      

      You are now ready to use KafkaT. For a start, here's how you would use it to view details about all Kafka partitions:

      • kafkat partitions

      You will see the following output:

      Output

      Topic                 Partition   Leader      Replicas      ISRs
      TutorialTopic         0             0           [0]           [0]
      __consumer_offsets    0             0           [0]           [0]
      ...
      ...

      You will see TutorialTopic, as well as __consumer_offsets, an internal topic used by Kafka for storing client-related information. You can safely ignore lines starting with __consumer_offsets.

      To learn more about KafkaT, refer to its GitHub repository.

      Step 7 — Setting Up a Multi-Node Cluster (Optional)

      If you want to create a multi-broker cluster using more CentOS 7 machines, you should repeat Step 1, Step 4, and Step 5 on each of the new machines. Additionally, you should make the following changes in the server.properties file for each:

      • The value of the broker.id property should be changed such that it is unique throughout the cluster. This property uniquely identifies each server in the cluster and takes an integer value, for example 1, 2, and so on (see the sample snippet after this list).

      • The value of the zookeeper.connect property should be changed such that all nodes point to the same ZooKeeper instance. This property specifies the Zookeeper instance's address and follows the <HOSTNAME/IP_ADDRESS>:<PORT> format. For example, "203.0.113.0:2181", "203.0.113.1:2181" etc.

      If you want to have multiple ZooKeeper instances for your cluster, the value of the zookeeper.connect property on each node should be an identical, comma-separated string listing the IP addresses and port numbers of all the ZooKeeper instances.
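      As an illustration only, the relevant lines in server.properties on a hypothetical second broker might look like this, assuming two ZooKeeper instances at the example addresses used above:

      ~/kafka/config/server.properties

      broker.id=1
      zookeeper.connect=203.0.113.0:2181,203.0.113.1:2181
      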

      Step 8 — Restricting the Kafka User

      Now that all of the installations are done, you can remove the kafka user's admin privileges. Before you do so, log out and log back in as any other non-root sudo user. If you are still running the same shell session you started this tutorial with, simply type exit.

      Remove the kafka user from the wheel group:

      • sudo gpasswd -d kafka wheel

      To further improve your Kafka server's security, lock the kafka user's password using the passwd command. This makes sure that nobody can directly log into the server using this account:

      At this point, only root or a sudo user can log in as kafka by typing in the following command:

      • sudo su - kafka

      In the future, if you want to unlock it, use passwd with the -u option:

      • sudo passwd -u kafka

      You have now successfully restricted the kafka user's admin privileges.

      Conclusion

      You now have Apache Kafka running securely on your CentOS server. You can make use of it in your projects by creating Kafka producers and consumers using Kafka clients, which are available for most programming languages. To learn more about Kafka, you can also consult its documentation.


