

How To Use Traefik as a Reverse Proxy for Docker Containers on CentOS 7


The author selected Girls Who Code to receive a donation as part of the Write for DOnations program.

Introduction

Docker can be an efficient way to run web applications in production, but you may want to run multiple applications on the same Docker host. In this situation, you'll need to set up a reverse proxy, since you only want to expose ports 80 and 443 to the rest of the world.

Traefik is a Docker-aware reverse proxy that includes its own monitoring dashboard. In this tutorial, you'll use Traefik to route requests to two different web application containers: a WordPress container and an Adminer container, each talking to a MySQL database. You'll configure Traefik to serve everything over HTTPS using Let's Encrypt.

Prerequisites

To follow along with this tutorial, you will need the following:

Step 1 — Configuring and Running Traefik

The Traefik project has an official Docker image, so we will use that to run Traefik in a Docker container.

But before we get our Traefik container up and running, we need to create a configuration file and set up an encrypted password so we can access the monitoring dashboard.

We'll use the htpasswd utility to create this encrypted password. First, install the utility, which is included in the httpd-tools package:

      • sudo yum install -y httpd-tools

Then generate the password with htpasswd. Substitute senha_segura with the password you'd like to use for the Traefik admin user:

      • htpasswd -nb admin senha_segura

The output from the program will look like this:

      Output

      admin:$apr1$kEG/8JKj$yEXj8vKO7HDvkUMI/SbOO.

You'll use this output in the Traefik configuration file to set up HTTP Basic Authentication for the Traefik health check and monitoring dashboard. Copy the entire output line so you can paste it later.

To configure the Traefik server, we'll create a new configuration file called traefik.toml using the TOML format. TOML is a configuration language similar to INI files, but standardized. This file lets us configure the Traefik server and the various integrations, or providers, we want to use. In this tutorial, we will use three of Traefik's available providers: api, docker, and acme, which is used to support TLS using Let's Encrypt.

Open your new file in vi or your favorite text editor:

• vi traefik.toml

Enter insert mode by pressing i, then add two named entry points, http and https, that all backends will have access to by default:

      traefik.toml

      defaultEntryPoints = ["http", "https"]
      

We'll configure the http and https entry points later in this file.

Next, configure the api provider, which gives you access to a dashboard interface. This is where you'll paste the output from the htpasswd command:

      traefik.toml

      ...
      [entryPoints]
        [entryPoints.dashboard]
          address = ":8080"
          [entryPoints.dashboard.auth]
            [entryPoints.dashboard.auth.basic]
              users = ["admin:sua_senha_criptografada"]
      
      [api]
      entrypoint="dashboard"
      

The dashboard is a separate web application that will run within the Traefik container. We set the dashboard to run on port 8080.

The entrypoints.dashboard section configures how we'll connect with the api provider, and the entrypoints.dashboard.auth.basic section configures HTTP Basic Authentication for the dashboard. Use the output from the htpasswd command you just ran for the value of the users entry. You could specify additional logins by separating them with commas.
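For example, a hypothetical users entry with a second editor account would look like the following sketch (the second hash is a placeholder for illustration, not real htpasswd output):

users = ["admin:$apr1$kEG/8JKj$yEXj8vKO7HDvkUMI/SbOO.", "editor:$apr1$placeholder$hash"]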

We've defined our first entryPoint, but we'll need to define others for standard HTTP and HTTPS communication that isn't directed towards the api provider. The entryPoints section configures the addresses that Traefik and the proxied containers can listen on. Add these lines to the file underneath the entryPoints heading:

      traefik.toml

      ...
        [entryPoints.http]
          address = ":80"
            [entryPoints.http.redirect]
              entryPoint = "https"
        [entryPoints.https]
          address = ":443"
            [entryPoints.https.tls]
      ...
      

The http entry point handles port 80, while the https entry point uses port 443 for TLS/SSL. We automatically redirect all of the traffic on port 80 to the https entry point to force secure connections for all requests.

Next, add this section to configure Traefik's Let's Encrypt certificate support:

      traefik.toml

      ...
      [acme]
      email = "seu_email@seu_domínio"
      storage = "acme.json"
      entryPoint = "https"
      onHostRule = true
        [acme.httpChallenge]
        entryPoint = "http"
      

This section is called acme because ACME is the name of the protocol used to communicate with Let's Encrypt to manage certificates. The Let's Encrypt service requires registration with a valid email address, so in order for Traefik to generate certificates for our hosts, set the email key to your email address. We then specify that we will store the information we receive from Let's Encrypt in a JSON file called acme.json. The entryPoint key needs to point to the entry point handling port 443, which in our case is the https entry point.

The onHostRule key dictates how Traefik should go about generating certificates. We want to fetch our certificates as soon as our containers with the specified hostnames are created, and that's what the onHostRule setting will do.

The acme.httpChallenge section allows us to specify how Let's Encrypt can verify that the certificate should be generated. We're configuring it to serve a file as part of the challenge through the http entry point.

Finally, let's configure the docker provider by adding these lines to the file:

      traefik.toml

      ...
      [docker]
      domain = "seu_domínio"
      watch = true
      network = "web"
      

The docker provider enables Traefik to act as a proxy in front of Docker containers. We've configured the provider to watch for new containers on the web network (which we'll create soon) and expose them as subdomains of seu_domínio.

At this point, traefik.toml should have the following contents:

      traefik.toml

      defaultEntryPoints = ["http", "https"]
      
      [entryPoints]
        [entryPoints.dashboard]
          address = ":8080"
          [entryPoints.dashboard.auth]
            [entryPoints.dashboard.auth.basic]
              users = ["admin:sua_senha_criptografada"]
        [entryPoints.http]
          address = ":80"
            [entryPoints.http.redirect]
              entryPoint = "https"
        [entryPoints.https]
          address = ":443"
            [entryPoints.https.tls]
      
      [api]
      entrypoint="dashboard"
      
      [acme]
      email = "seu_email@seu_domínio"
      storage = "acme.json"
      entryPoint = "https"
      onHostRule = true
        [acme.httpChallenge]
        entryPoint = "http"
      
      [docker]
      domain = "seu_domínio"
      watch = true
      network = "web"
      

Once you've added the contents, press ESC to leave insert mode. Type :x and then ENTER to save and exit the file. With all this configuration in place, we can fire up Traefik.

Step 2 — Running the Traefik Container

Next, create a Docker network for the proxy to share with containers. The Docker network is necessary so that we can use it with applications that are run using Docker Compose. Let's call this network web.

      • docker network create web

When the Traefik container starts, we will add it to this network. Then we can add additional containers to this network later for Traefik to proxy to.

Next, create an empty file that will hold our Let's Encrypt information. We'll share this into the container so Traefik can use it:

• touch acme.json

Traefik will only be able to use this file if the root user inside the container has unique read and write access to it. To do this, lock down the permissions on acme.json so that only the owner of the file has read and write permission:

• chmod 600 acme.json

Once the file gets passed to Docker, the owner is automatically changed to the root user inside the container.

Finally, create the Traefik container with this command:

• docker run -d \
• -v /var/run/docker.sock:/var/run/docker.sock \
• -v $PWD/traefik.toml:/traefik.toml \
• -v $PWD/acme.json:/acme.json \
• -p 80:80 \
• -p 443:443 \
• -l traefik.frontend.rule=Host:monitor.seu_domínio \
• -l traefik.port=8080 \
• --network web \
• --name traefik \
• traefik:1.7.6-alpine

The command is a little long, so let's break it down.

We use the -d flag to run the container in the background as a daemon. We then share our docker.sock file into the container so that the Traefik process can listen for changes to containers. We also share the traefik.toml configuration file and the acme.json file we created into the container.

Next, we map ports :80 and :443 of our Docker host to the same ports in the Traefik container, so Traefik receives all HTTP and HTTPS traffic to the server.

Then we set up two Docker labels that tell Traefik to direct traffic to the hostname monitor.seu_domínio to port :8080 within the Traefik container, exposing the monitoring dashboard.

We set the network of the container to web, and we name the container traefik.

Finally, we use the traefik:1.7.6-alpine image for this container, because it's small.

A Docker image's ENTRYPOINT is a command that always runs when a container is created from the image. In this case, the command is the traefik binary within the container. You can pass additional arguments to that command when you launch the container, but we've defined all of our settings in the traefik.toml file.
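For instance, to temporarily raise the log verbosity while debugging, you could append an argument after the image name. This is only a sketch, assuming Traefik 1.7's --logLevel flag, and is not needed for this tutorial:

• docker run --rm -v $PWD/traefik.toml:/traefik.toml traefik:1.7.6-alpine --logLevel=DEBUG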

With the container started, you now have a dashboard you can access to see the health of your containers. You can also use this dashboard to visualize the frontends and backends that Traefik has registered. Access the monitoring dashboard by pointing your browser to https://monitor.seu_domínio. You will be prompted for your username and password, which are admin and the password you configured in Step 1.

Once logged in, you'll see an interface similar to this:

      Empty Traefik dashboard

There isn't much to see just yet, but leave this window open, and you will see the contents change as you add containers for Traefik to work with.

We now have our Traefik proxy running, configured to work with Docker, and ready to monitor other Docker containers. Let's start some containers for Traefik to act as a proxy for.

Step 3 — Registering Containers with Traefik

With the Traefik container running, you're ready to run applications behind it. Let's launch the following containers behind Traefik:

1. A blog using the official WordPress image.

2. A database management server using the official Adminer image.

We'll manage both of these applications with Docker Compose using a docker-compose.yml file. Open the docker-compose.yml file in your editor:

• vi docker-compose.yml

Add the following lines to the file to specify the version and the networks we will use:

      docker-compose.yml

      version: "3"
      
      networks:
        web:
          external: true
        internal:
          external: false
      

We use version 3 of Docker Compose because it's the newest major version of the Compose file format.

For Traefik to recognize our applications, they must be part of the same network, and since we created the network manually, we pull it in by specifying the network name web and setting external to true. Then we define another network so that we can connect our exposed containers to a database container that we won't expose through Traefik. We'll call this network internal.

Next, we'll define each of our services, one at a time. Let's start with the blog container, which we'll base on the official WordPress image. Add this configuration to the file:

      docker-compose.yml

      version: "3"
      ...
      
      services:
        blog:
          image: wordpress:4.9.8-apache
          environment:
            WORDPRESS_DB_PASSWORD:
          labels:
            - traefik.backend=blog
            - traefik.frontend.rule=Host:blog.seu_domínio
            - traefik.docker.network=web
            - traefik.port=80
          networks:
            - internal
            - web
          depends_on:
            - mysql
      

The environment key lets you specify environment variables that will be set inside the container. By not setting a value for WORDPRESS_DB_PASSWORD, we're telling Docker Compose to get the value from our shell and pass it through when we create the container. We will define this environment variable in our shell before starting the containers. This way we don't hard-code passwords into the configuration file.

The labels section is where you specify configuration values for Traefik. Docker labels don't do anything by themselves, but Traefik reads them so it knows how to treat containers. Here's what each of these labels does:

• traefik.backend specifies the name of the backend service in Traefik (which points to the actual blog container).
• traefik.frontend.rule=Host:blog.seu_domínio tells Traefik to examine the host requested, and if it matches the pattern of blog.seu_domínio, it should route the traffic to the blog container.
• traefik.docker.network=web specifies which network Traefik should look under to find the internal IP for this container. Since our Traefik container has access to all of the Docker info, it would potentially take the IP for the internal network if we didn't specify this.
• traefik.port specifies the exposed port that Traefik should use to route traffic to this container.

With this configuration, all traffic sent to the Docker host's port 80 will be routed to the blog container.

We assign this container to two different networks so that Traefik can find it via the web network and it can communicate with the database container through the internal network.

Lastly, the depends_on key tells Docker Compose that this container needs to start after its dependencies are running. Since WordPress needs a database to run, we must start our mysql container before starting our blog container.

Next, configure the MySQL service by adding this configuration to your file:

      docker-compose.yml

      services:
      ...
        mysql:
          image: mysql:5.7
          environment:
            MYSQL_ROOT_PASSWORD:
          networks:
            - internal
          labels:
            - traefik.enable=false
      

We're using the official MySQL 5.7 image for this container. You'll notice that we're once again using an environment item without a value. The MYSQL_ROOT_PASSWORD and WORDPRESS_DB_PASSWORD variables will need to be set to the same value to make sure that our WordPress container can communicate with MySQL. We don't want to expose the mysql container to Traefik or the outside world, so we're only assigning this container to the internal network. Since Traefik has access to the Docker socket, the process will still expose a frontend for the mysql container by default, so we'll add the label traefik.enable=false to specify that Traefik should not expose this container.

Finally, add this configuration to define the Adminer container:

      docker-compose.yml

      services:
      ...
        adminer:
          image: adminer:4.6.3-standalone
          labels:
            - traefik.backend=adminer
            - traefik.frontend.rule=Host:db-admin.seu_domínio
            - traefik.docker.network=web
            - traefik.port=8080
          networks:
            - internal
            - web
          depends_on:
            - mysql
      

This container is based on the official Adminer image. The network and depends_on configuration for this container exactly matches what we're using for the blog container.

However, since we're directing all of the traffic to port 80 on our Docker host directly to the blog container, we need to configure this container differently in order for traffic to make it to the adminer container. The line traefik.frontend.rule=Host:db-admin.seu_domínio tells Traefik to examine the host requested. If it matches the pattern of db-admin.seu_domínio, Traefik will route the traffic to the adminer container.

At this point, docker-compose.yml should have the following contents:

      docker-compose.yml

      version: "3"
      
      networks:
        web:
          external: true
        internal:
          external: false
      
      services:
        blog:
          image: wordpress:4.9.8-apache
          environment:
            WORDPRESS_DB_PASSWORD:
          labels:
            - traefik.backend=blog
            - traefik.frontend.rule=Host:blog.seu_domínio
            - traefik.docker.network=web
            - traefik.port=80
          networks:
            - internal
            - web
          depends_on:
            - mysql
        mysql:
          image: mysql:5.7
          environment:
            MYSQL_ROOT_PASSWORD:
          networks:
            - internal
          labels:
            - traefik.enable=false
        adminer:
          image: adminer:4.6.3-standalone
          labels:
            - traefik.backend=adminer
            - traefik.frontend.rule=Host:db-admin.seu_domínio
            - traefik.docker.network=web
            - traefik.port=8080
          networks:
            - internal
            - web
          depends_on:
            - mysql
      

Save the file and exit your text editor.

Next, set values in your shell for the WORDPRESS_DB_PASSWORD and MYSQL_ROOT_PASSWORD variables before you start your containers:

      • export WORDPRESS_DB_PASSWORD=senha_segura_do_banco_de_dados
      • export MYSQL_ROOT_PASSWORD=senha_segura_do_banco_de_dados

Substitute senha_segura_do_banco_de_dados with your desired database password. Remember to use the same password for both WORDPRESS_DB_PASSWORD and MYSQL_ROOT_PASSWORD.

With these variables set, run the containers using docker-compose:

• docker-compose up -d

Now take another look at the Traefik admin dashboard. You'll see that there is now a backend and a frontend for the two exposed servers:

      Populated Traefik dashboard

Navigate to blog.seu_domínio, substituting seu_domínio with your domain. You'll be redirected to a TLS connection and can now complete the WordPress setup:

      WordPress setup screen

Now access Adminer by visiting db-admin.seu_domínio in your browser, again substituting seu_domínio with your domain. The mysql container is not exposed to the outside world, but the adminer container has access to it through the internal Docker network that they share, using the mysql container name as a hostname.

On the Adminer login screen, use root for the username, mysql for the server, and the value you set for MYSQL_ROOT_PASSWORD for the password. Once logged in, you'll see the Adminer user interface:

      Adminer connected to the MySQL database

Both sites are now working, and you can use the dashboard at monitor.seu_domínio to keep an eye on your applications.

Conclusion

In this tutorial, you configured Traefik to proxy requests to other applications in Docker containers.

Traefik's declarative configuration at the application container level makes it easy to configure more services, and there's no need to restart the traefik container when you add new applications to proxy traffic to, since Traefik notices the changes immediately through the Docker socket file it's watching.
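To see this in action, here is a minimal sketch assuming the containous/whoami test image (not part of this tutorial's setup): launch one more labeled container on the web network and it should appear in the dashboard without you touching Traefik at all:

• docker run -d --network web \
• -l traefik.frontend.rule=Host:whoami.seu_domínio \
• -l traefik.port=80 \
• containous/whoami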

To learn more about what you can do with Traefik, head over to the official Traefik documentation. If you'd like to explore Docker containers further, check out How To Set Up a Private Docker Registry on Ubuntu 18.04 or How To Secure a Containerized Node.js Application with Nginx, Let's Encrypt, and Docker Compose. Although those tutorials are written for Ubuntu 18.04, many of the Docker-specific commands can be used for CentOS 7.




      How To Install and Use ClickHouse on CentOS 7


      The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      ClickHouse is an open-source, column-oriented analytics database created by Yandex for OLAP and big data use cases. ClickHouse’s support for real-time query processing makes it suitable for applications that require sub-second analytical results. ClickHouse’s query language is a dialect of SQL that enables powerful declarative querying capabilities while offering familiarity and a smaller learning curve for the end user.

      Column-oriented databases store records in blocks grouped by columns instead of rows. By not loading data for columns absent in the query, column-oriented databases spend less time reading data while completing queries. As a result, these databases can compute and return results much faster than traditional row-based systems for certain workloads, such as OLAP.

      Online Analytics Processing (OLAP) systems allow for organizing large amounts of data and performing complex queries. They are capable of managing petabytes of data and returning query results quickly. In this way, OLAP is useful for work in areas like data science and business analytics.

      In this tutorial, you’ll install the ClickHouse database server and client on your machine. You’ll use the DBMS for typical tasks and optionally enable remote access from another server so that you’ll be able to connect to the database from another machine. Then you’ll test ClickHouse by modeling and querying example website-visit data.

      Prerequisites

• One CentOS 7 server with a sudo-enabled non-root user and a firewall set up. You can follow the initial server setup tutorial to create the user and this tutorial to set up the firewall.
• (Optional) A secondary CentOS 7 server with a sudo-enabled non-root user and a firewall set up. You can follow the initial server setup tutorial and the additional setup tutorial for the firewall.

      Step 1 — Installing ClickHouse

      In this section, you will install the ClickHouse server and client programs using yum.

First, SSH into your server by running:

• ssh sammy@your_server_ip

      Install the base dependencies by executing:

      • sudo yum install -y pygpgme yum-utils

The pygpgme package is used for adding and verifying GPG signatures, while yum-utils allows easy management of source RPMs.

Altinity, a ClickHouse consulting firm, maintains a YUM repository that has the latest version of ClickHouse. You'll add the repository's details in a new file so that yum can securely download validated ClickHouse packages. To check the package contents, you can inspect the sources from which they are built at this GitHub project.

      Create the repository details file by executing:

      • sudo vi /etc/yum.repos.d/altinity_clickhouse.repo

      Next, add the following contents to the file:

      /etc/yum.repos.d/altinity_clickhouse.repo

      [altinity_clickhouse]
      name=altinity_clickhouse
      baseurl=https://packagecloud.io/altinity/clickhouse/el/7/$basearch
      repo_gpgcheck=1
      gpgcheck=0
      enabled=1
      gpgkey=https://packagecloud.io/altinity/clickhouse/gpgkey
      sslverify=1
      sslcacert=/etc/pki/tls/certs/ca-bundle.crt
      metadata_expire=300
      
      [altinity_clickhouse-source]
      name=altinity_clickhouse-source
      baseurl=https://packagecloud.io/altinity/clickhouse/el/7/SRPMS
      repo_gpgcheck=1
      gpgcheck=0
      enabled=1
      gpgkey=https://packagecloud.io/altinity/clickhouse/gpgkey
      sslverify=1
      sslcacert=/etc/pki/tls/certs/ca-bundle.crt
      metadata_expire=300
      

      Now that you've added the repositories, enable them with the following command:

      • sudo yum -q makecache -y --disablerepo='*' --enablerepo='altinity_clickhouse'

The -q flag tells the command to run in quiet mode. The makecache command downloads and caches metadata for the repository specified in the --enablerepo flag, making its packages available to yum.

      On execution, you'll see output similar to the following:

      Output

      Importing GPG key 0x0F6E36F6: Userid : "https://packagecloud.io/altinity/clickhouse (https://packagecloud.io/docs#gpg_signing) <support@packagecloud.io>" Fingerprint: 7001 38a9 6a20 6b22 bf28 3c06 ed26 58f3 0f6e 36f6 From : https://packagecloud.io/altinity/clickhouse/gpgkey

      The output confirms it has successfully verified and added the GPG key.

      The clickhouse-server and clickhouse-client packages will now be available for installation. Install them with:

      • sudo yum install -y clickhouse-server clickhouse-client

      You've installed the ClickHouse server and client successfully. You're now ready to start the database service and ensure that it's running correctly.

      Step 2 — Starting the Service

      The clickhouse-server package that you installed in the previous section creates a systemd service, which performs actions such as starting, stopping, and restarting the database server. systemd is an init system for Linux to initialize and manage services. In this section you'll start the service and verify that it is running successfully.

      Start the clickhouse-server service by running:

      • sudo service clickhouse-server start

      You will see output similar to the following:

      Output

      Start clickhouse-server service: Path to data directory in /etc/clickhouse-server/config.xml: /var/lib/clickhouse/ DONE

      To verify that the service is running successfully, execute:

      • sudo service clickhouse-server status

      It will print an output similar to the following which denotes that the server is running properly:

      Output

      clickhouse-server service is running

      You have successfully started the ClickHouse server and will now be able to use the clickhouse-client CLI program to connect to the server.

      Step 3 — Creating Databases and Tables

In ClickHouse, you can create and delete databases by executing SQL statements directly in the interactive database prompt. Statements consist of commands following a particular syntax that tell the database server to perform a requested operation along with any data required. You create databases by using the CREATE DATABASE database_name syntax. To create a database, first start a client session by running the following command:

      • clickhouse-client --multiline

      This command will log you into the client prompt where you can run ClickHouse SQL statements to perform actions such as:

      • Creating, updating, and deleting databases, tables, indexes, partitions, and views.

      • Executing queries to retrieve data that is optionally filtered and grouped using various conditions.

      The --multiline flag tells the CLI to allow entering queries that span multiple lines.

      In this step, with the ClickHouse client ready for inserting data, you're going to create a database and table. For the purposes of this tutorial, you'll create a database named test, and inside that you'll create a table named visits that tracks website-visit durations.

Now that you're inside the ClickHouse command prompt, create your test database by executing:

• CREATE DATABASE test;

      You'll see the following output that shows that you have created the database:

      Output

      CREATE DATABASE test Ok. 0 rows in set. Elapsed: 0.003 sec.

      A ClickHouse table is similar to tables in other relational databases; it holds a collection of related data in a structured format. You can specify columns along with their types, add rows of data, and execute different kinds of queries on tables.

      The syntax for creating tables in ClickHouse follows this example structure:

      CREATE TABLE table_name
      (
          column_name1 column_type [options],
          column_name2 column_type [options],
          ...
      ) ENGINE = engine
      

      The table_name and column_name values can be any valid ASCII identifiers. ClickHouse supports a wide range of column types; some of the most popular are:

      • UInt64: used for storing integer values in the range 0 to 18446744073709551615.

      • Float64: used for storing floating point numbers such as 2039.23, 10.5, etc.

      • String: used for storing variable length characters. It does not require a max length attribute since it can store arbitrary lengths.

      • Date: used for storing dates that follow the YYYY-MM-DD format.

      • DateTime: used for storing dates coupled with time and follows the YYYY-MM-DD HH:MM:SS format.

      After the column definitions, you specify the engine used for the table. In ClickHouse, Engines determine the physical structure of the underlying data, the table's querying capabilities, its concurrent access modes, and support for indexes. Different engine types are suitable for different application requirements. The most commonly used and widely applicable engine type is MergeTree.

Now that you have an overview of table creation, you'll create a table. Start by confirming the database you'll be modifying:

• USE test;

      You will see the following output showing that you have switched to the test database from the default database:

      Output

      USE test Ok. 0 rows in set. Elapsed: 0.001 sec.

      The remainder of this guide will assume that you are executing statements within this database's context.

      Create your visits table by running this command:

      • CREATE TABLE visits (
      • id UInt64,
      • duration Float64,
      • url String,
      • created DateTime
      • ) ENGINE = MergeTree()
      • PRIMARY KEY id
      • ORDER BY id;

      Here's a breakdown of what the command does. You create a table named visits that has four columns:

• id: The primary key column. Similar to other RDBMSs, a primary key column in ClickHouse uniquely identifies a row; each row should have a unique value for this column.

      • duration: A float column used to store the duration of each visit in seconds. float columns can store decimal values such as 12.50.

      • url: A string column that stores the URL visited, such as http://example.com.

      • created: A date and time column that tracks when the visit occurred.

      After the column definitions, you specify MergeTree as the storage engine for the table. The MergeTree family of engines is recommended for production databases due to its optimized support for large real-time inserts, overall robustness, and query support. Additionally, MergeTree engines support sorting of rows by primary key, partitioning of rows, and replicating and sampling data.

      If you intend to use ClickHouse for archiving data that is not queried often or for storing temporary data, you can use the Log family of engines to optimize for that use-case.
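For example, a minimal sketch of a table using TinyLog, one of the Log family engines, might look like this (the scratch_data table here is hypothetical and only for illustration):

CREATE TABLE scratch_data
(
    id UInt64,
    payload String
) ENGINE = TinyLog;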

      After the column definitions, you'll define other table-level options. The PRIMARY KEY clause sets id as the primary key column and the ORDER BY clause will store values sorted by the id column. A primary key uniquely identifies a row and is used for efficiently accessing a single row and efficient colocation of rows.

      On executing the create statement, you will see the following output:

      Output

      CREATE TABLE visits ( id UInt64, duration Float64, url String, created DateTime ) ENGINE = MergeTree() PRIMARY KEY id ORDER BY id Ok. 0 rows in set. Elapsed: 0.010 sec.

      In this section, you've created a database and a table to track website-visits data. In the next step, you'll insert data into the table, update existing data, and delete that data.

      Step 4 — Inserting, Updating, and Deleting Data and Columns

      In this step, you'll use your visits table to insert, update, and delete data. The following command is an example of the syntax for inserting rows into a ClickHouse table:

      INSERT INTO table_name VALUES (column_1_value, column_2_value, ....);
      

      Now, insert a few rows of example website-visit data into your visits table by running each of the following statements:

      • INSERT INTO visits VALUES (1, 10.5, 'http://example.com', '2019-01-01 00:01:01');
      • INSERT INTO visits VALUES (2, 40.2, 'http://example1.com', '2019-01-03 10:01:01');
      • INSERT INTO visits VALUES (3, 13, 'http://example2.com', '2019-01-03 12:01:01');
      • INSERT INTO visits VALUES (4, 2, 'http://example3.com', '2019-01-04 02:01:01');

      You'll see the following output repeated for each insert statement:

      Output

      INSERT INTO visits VALUES Ok. 1 rows in set. Elapsed: 0.004 sec.

      The output for each row shows that you've inserted it successfully into the visits table.

      Now you'll add an additional column to the visits table. When adding or deleting columns from existing tables, ClickHouse supports the ALTER syntax.

      For example, the basic syntax for adding a column to a table is as follows:

      ALTER TABLE table_name ADD COLUMN column_name column_type;
      

      Add a column named location that will store the location of the visits to a website by running the following statement:

      • ALTER TABLE visits ADD COLUMN location String;

      You'll see output similar to the following:

      Output

      ALTER TABLE visits ADD COLUMN location String Ok. 0 rows in set. Elapsed: 0.014 sec.

      The output shows that you have added the location column successfully.

      As of version 19.4.3, ClickHouse doesn't support updating and deleting individual rows of data due to implementation constraints. ClickHouse has support for bulk updates and deletes, however, and has a distinct SQL syntax for these operations to highlight their non-standard usage.

      The following syntax is an example for bulk updating rows:

ALTER TABLE table_name UPDATE column_1 = value_1, column_2 = value_2 ... WHERE filter_conditions;
      

      You'll run the following statement to update the url column of all rows that have a duration of less than 15. Enter it into the database prompt to execute:

      • ALTER TABLE visits UPDATE url = 'http://example2.com' WHERE duration < 15;

      The output of the bulk update statement will be as follows:

      Output

      ALTER TABLE visits UPDATE url = 'http://example2.com' WHERE duration < 15 Ok. 0 rows in set. Elapsed: 0.003 sec.

      The output shows that your update query completed successfully. The 0 rows in set in the output denotes that the query did not return any rows; this will be the case for any update and delete queries.

      The example syntax for bulk deleting rows is similar to updating rows and has the following structure:

      ALTER TABLE table_name DELETE WHERE filter_conditions;
      

      To test deleting data, run the following statement to remove all rows that have a duration of less than 5:

      • ALTER TABLE visits DELETE WHERE duration < 5;

      The output of the bulk delete statement will be similar to:

      Output

      ALTER TABLE visits DELETE WHERE duration < 5 Ok. 0 rows in set. Elapsed: 0.003 sec.

      The output confirms that you have deleted the rows with a duration of less than five seconds.

      To delete columns from your table, the syntax would follow this example structure:

      ALTER TABLE table_name DROP COLUMN column_name;
      

      Delete the location column you added previously by running the following:

      • ALTER TABLE visits DROP COLUMN location;

      The DROP COLUMN output confirming that you have deleted the column will be as follows:

      Output

ALTER TABLE visits DROP COLUMN location Ok. 0 rows in set. Elapsed: 0.010 sec.

      Now that you've successfully inserted, updated, and deleted rows and columns in your visits table, you'll move on to query data in the next step.

      Step 5 — Querying Data

      ClickHouse's query language is a custom dialect of SQL with extensions and functions suited for analytics workloads. In this step, you'll run selection and aggregation queries to retrieve data and results from your visits table.

      Selection queries allow you to retrieve rows and columns of data filtered by conditions that you specify, along with options such as the number of rows to return. You can select rows and columns of data using the SELECT syntax. The basic syntax for SELECT queries is:

      SELECT func_1(column_1), func_2(column_2) FROM table_name WHERE filter_conditions row_options;
      

Execute the following statement to retrieve the url and duration values for rows where the url is http://example2.com:

      • SELECT url, duration FROM visits WHERE url = 'http://example2.com' LIMIT 2;

      You will see the following output:

      Output

      SELECT url, duration FROM visits WHERE url = 'http://example2.com' LIMIT 2 ┌─url─────────────────┬─duration─┐ │ http://example2.com │ 10.5 │ └─────────────────────┴──────────┘ ┌─url─────────────────┬─duration─┐ │ http://example2.com │ 13 │ └─────────────────────┴──────────┘ 2 rows in set. Elapsed: 0.013 sec.

      The output has returned two rows that match the conditions you specified. Now that you've selected values, you can move to executing aggregation queries.

      Aggregation queries are queries that operate on a set of values and return single output values. In analytics databases, these queries are run frequently and are well optimized by the database. Some aggregate functions supported by ClickHouse are:

      • count: returns the count of rows matching the conditions specified.

      • sum: returns the sum of selected column values.

      • avg: returns the average of selected column values.

      Some ClickHouse-specific aggregate functions include:

      • uniq: returns an approximate number of distinct rows matched.

      • topK: returns an array of the most frequent values of a specific column using an approximation algorithm.
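For instance, to get an approximate count of distinct URLs in the visits table you created earlier, you could run:

• SELECT uniq(url) FROM visits;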

      To demonstrate the execution of aggregation queries, you'll calculate the total duration of visits by running the sum query:

      • SELECT SUM(duration) FROM visits;

      You will see output similar to the following:

      Output

      SELECT SUM(duration) FROM visits ┌─SUM(duration)─┐ │ 63.7 │ └───────────────┘ 1 rows in set. Elapsed: 0.010 sec.

      Now, calculate the top two URLs by executing:

      • SELECT topK(2)(url) FROM visits;

      You will see output similar to the following:

      Output

      SELECT topK(2)(url) FROM visits ┌─topK(2)(url)──────────────────────────────────┐ │ ['http://example2.com','http://example1.com'] │ └───────────────────────────────────────────────┘ 1 rows in set. Elapsed: 0.010 sec.

      Now that you have successfully queried your visits table, you'll delete tables and databases in the next step.

      Step 6 — Deleting Tables and Databases

      In this section, you'll delete your visits table and test database.

      The syntax for deleting tables follows this example:

      DROP TABLE table_name;
      

To delete the visits table, run the following statement:

• DROP TABLE visits;

      You will see the following output declaring that you've deleted the table successfully:

      Output

      DROP TABLE visits Ok. 0 rows in set. Elapsed: 0.005 sec.

You can delete databases using the DROP DATABASE database_name syntax. To delete the test database, execute the following statement:

• DROP DATABASE test;

      The resulting output shows that you've deleted the database successfully:

      Output

      DROP DATABASE test Ok. 0 rows in set. Elapsed: 0.003 sec.

      You've deleted tables and databases in this step. Now that you've created, updated, and deleted databases, tables, and data in your ClickHouse instance, you'll enable remote access to your database server in the next section.

      Step 7 — Setting Up Firewall Rules (Optional)

      If you intend to only use ClickHouse locally with applications running on the same server, or do not have a firewall enabled on your server, you don't need to complete this section. If instead, you'll be connecting to the ClickHouse database server remotely, you should follow this step.

Currently your server has a firewall enabled that prevents your public IP address from accessing any ports. You'll complete the following two steps to allow remote access:

• Configure ClickHouse to listen for connections on all interfaces instead of localhost only.

• Add firewall rules allowing incoming connections to port 8123, which is the HTTP port that the ClickHouse server runs on, and port 9000, which is the port clickhouse-client uses.

      If you are inside the database prompt, exit it by typing CTRL+D.

      Edit the configuration file by executing:

      • sudo vi /etc/clickhouse-server/config.xml

Then uncomment the line containing <!-- <listen_host>0.0.0.0</listen_host> --> so that the file looks like the following:

      /etc/clickhouse-server/config.xml

      
      ...
       <interserver_http_host>example.yandex.ru</interserver_http_host>
          -->
      
          <!-- Listen specified host. use :: (wildcard IPv6 address), if you want to accept connections both with IPv4 and IPv6 from everywhere. -->
          <!-- <listen_host>::</listen_host> -->
          <!-- Same for hosts with disabled ipv6: -->
          <listen_host>0.0.0.0</listen_host>
      
          <!-- Default values - try listen localhost on ipv4 and ipv6: -->
          <!--
          <listen_host>::1</listen_host>
          <listen_host>127.0.0.1</listen_host>
          -->
      ...
      
      

Save the file and exit vi. For the new configuration to apply, restart the service by running:

      • sudo service clickhouse-server restart

      You will see the following output from this command:

      Output

      Stop clickhouse-server service: DONE Start clickhouse-server service: Path to data directory in /etc/clickhouse-server/config.xml: /var/lib/clickhouse/ DONE

Add the remote server's IP to the zone called public:

      • sudo firewall-cmd --permanent --zone=public --add-source=second_server_ip/32

      ClickHouse's server listens on port 8123 for HTTP connections and port 9000 for connections from clickhouse-client. Allow access to both ports for your second server's IP address with the following command:

      • sudo firewall-cmd --permanent --zone=public --add-port=8123/tcp
      • sudo firewall-cmd --permanent --zone=public --add-port=9000/tcp

      You will see the following output for both commands that shows that you've enabled access to both ports:

      Output

      success

      Now that you have added the rules, reload the firewall for the changes to take effect:

      • sudo firewall-cmd --reload

      This command will output a success message as well. ClickHouse will now be accessible from the IP that you added. Feel free to add additional IPs such as your local machine's address if required.
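As a quick sanity check from one of the allowed machines, you can also probe the HTTP port directly; a running ClickHouse server normally answers a plain GET request on port 8123 with Ok.:

• curl http://your_server_ip:8123

If the firewall rules took effect, you should see a single Ok. line in response.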

      To verify that you can connect to the ClickHouse server from the remote machine, first follow the steps in Step 1 of this tutorial on the second server and ensure that you have the clickhouse-client installed on it.

      Now that you have logged into the second server, start a client session by executing:

      • clickhouse-client --host your_server_ip --multiline

      You will see the following output that shows that you have connected successfully to the server:

      Output

      ClickHouse client version 19.4.3. Connecting to your_server_ip:9000 as user default. Connected to ClickHouse server version 19.4.3 revision 54416. hostname 🙂

      In this step, you've enabled remote access to your ClickHouse database server by adjusting your firewall rules.

      Conclusion

You have successfully set up a ClickHouse database instance on your server and created a database and table, added data, performed queries, and deleted the database. Within ClickHouse's documentation you can read about their benchmarks against other open-source and commercial analytics databases and general reference documents. Further features ClickHouse offers include distributed query processing across multiple servers to improve performance and to protect against data loss by storing data over different shards.




      How To Gather Infrastructure Metrics with Metricbeat on CentOS 7


      The author selected the Computer History Museum to receive a donation as part of the Write for DOnations program.

      Introduction

Metricbeat, one of several Beats that send various types of server data to an Elastic Stack server, is a lightweight data shipper. Once installed on your servers, it periodically collects system-wide and per-process CPU and memory statistics and sends the data directly to your Elasticsearch deployment. This shipper replaced the earlier Topbeat in version 5.0 of the Elastic Stack.

      Other Beats currently available from Elastic are:

      • Filebeat: collects and ships log files.
      • Packetbeat: collects and analyzes network data.
      • Winlogbeat: collects Windows event logs.
      • Auditbeat: collects Linux audit framework data and monitors file integrity.
      • Heartbeat: monitors services for their availability with active probing.

      In this tutorial, you will use Metricbeat to forward local system metrics like CPU/memory/disk usage and network utilization from a CentOS 7 server to another server of the same kind with the Elastic Stack installed. With this shipper, you will gather the basic metrics that you need to get the current state of your server.

      Prerequisites

      To follow this tutorial, you will need:

      Note: When installing the Elastic Stack, you must use the same version across the entire stack. In this tutorial, you will use the latest versions of the entire stack, which are, at the time of this writing, Elasticsearch 6.7.0, Kibana 6.7.0, Logstash 6.7.0, and Metricbeat 6.7.0.

      Step 1 — Configuring Elasticsearch to Listen for Traffic on an External IP

      The tutorial How To Install Elasticsearch, Logstash, and Kibana (Elastic Stack) on CentOS 7 restricted Elasticsearch access to the localhost only. In practice, this is rare, since you will often need to monitor many hosts. In this step, you will configure the Elastic Stack components to interact with the external IP address.

      Log in to your Elastic Stack server as your non-root user:

      • ssh sammy@Elastic_Stack_server_ip

      Use your preferred text editor to edit Elasticsearch’s main configuration file, elasticsearch.yml. This tutorial will use vi:

      • sudo vi /etc/elasticsearch/elasticsearch.yml

      Find the following section and modify it so that Elasticsearch listens on all interfaces. Enter insert mode by pressing i, then add the following highlighted item:

      /etc/elasticsearch/elasticsearch.yml

      ...
      network.host: 0.0.0.0
      ...
      

      The address 0.0.0.0 is assigned specific meanings in a number of contexts. In this case, 0.0.0.0 means “any IPv4 address at all.”

      When you’re finished, press ESC to leave insert mode, then :wq and ENTER to save and exit the file. To learn more about the text editor vi and its successor Vim, check out our Installing and Using the Vim Text Editor on a Cloud Server tutorial. After you have saved and exited the file, restart the Elasticsearch service with systemctl to apply the new settings:

      • sudo systemctl restart elasticsearch

      Now, allow access to the Elasticsearch port from your second CentOS server. In order to configure access coming from specific IP addresses or subnets, use the rich rule functionality of firewalld:

      • sudo firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="second_centos_server_ip/32" port protocol="tcp" port="9200" accept'

      Rich rules allow you to create more complex and customizable firewalld rules to gain greater control over your firewall. In this command, you are adding a rule that accepts ipv4 traffic from the source, which you have set as the IP address of the second CentOS server, to port 9200 of your Elastic Stack server.

      Next, reload firewalld to activate the new rule:

      • sudo firewall-cmd --reload

      Repeat these commands for each of your servers if you have more than two. If your servers are on the same network, you can allow access using one rule for all hosts on the network. To do this, you need to replace the /32 after the IP address with a lower value, for example /24.
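For example, the same rich rule written for a hypothetical 203.0.113.0/24 network range instead of a single host would look like this:

• sudo firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="203.0.113.0/24" port protocol="tcp" port="9200" accept'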

      Next, test the connection. Log in to your second CentOS server as your non-root user:

      • ssh sammy@second_centos_server_ip

      Use the curl command to test the connection to the Elastic Stack server:

      • curl Elastic_Stack_server_ip:9200

      You’ll receive output similar to the following:

      Output

      { "name" : "tl5Is5f", "cluster_name" : "elasticsearch", "cluster_uuid" : "W9AcSNWHQ3mYs2uE8odklA", "version" : { "number" : "6.7.0", "build_flavor" : "default", "build_type" : "rpm", "build_hash" : "3bd3e59", "build_date" : "2019-03-06T15:16:26.864148Z", "build_snapshot" : false, "lucene_version" : "7.6.0", "minimum_wire_compatibility_version" : "5.6.0", "minimum_index_compatibility_version" : "5.0.0" }, "tagline" : "You Know, for Search" }

      Now that you know the connection works, you are ready to send metrics to your Elastic Stack server.

      Step 2 — Installing and Configuring Metricbeat on the Elastic Stack Server

      In the next two steps, you will first install Metricbeat on the Elastic Stack server and import all the needed data, then install and configure the client on the second CentOS server.

      Log into your Elastic Stack server as your non-root user:

      • ssh sammy@Elastic_Stack_server_ip

      Since you previously set up the Elasticsearch repositories in the prerequisite, you only need to install Metricbeat:

      • sudo yum install metricbeat

Once the installation is complete, load the index template into Elasticsearch. An Elasticsearch index is a collection of documents that have similar characteristics. Each index is identified by a specific name, which Elasticsearch uses to refer to the index when performing various operations. Your Elasticsearch server will automatically apply the index template when you create a new index.

      To load the template, use the following command:

      • sudo metricbeat setup --template -E 'output.elasticsearch.hosts=["localhost:9200"]'

      You will see the following output:

      Output

      Loaded index template

      Metricbeat comes packaged with example Kibana dashboards, visualizations, and searches for visualizing Metricbeat data in Kibana. Before you can use the dashboards, you need to create the index pattern and load the dashboards into Kibana.

To load the dashboards, use the following command:

      • sudo metricbeat setup -e -E output.elasticsearch.hosts=['localhost:9200'] -E setup.kibana.host=localhost:5601

      You will see output that looks like this:

      Output

      ... 2019-03-20T09:51:32.096Z INFO instance/beat.go:281 Setup Beat: metricbeat; Version: 6.7.0 2019-03-20T09:51:32.136Z INFO add_cloud_metadata/add_cloud_metadata.go:323 add_cloud_metadata: hosting provider type detected as digitalocean, metadata={"instance_id":"133130541","provider":"digitalocean","region":"fra1"} 2019-03-20T09:51:32.137Z INFO elasticsearch/client.go:165 Elasticsearch url: http://localhost:9200 2019-03-20T09:51:32.137Z INFO [publisher] pipeline/module.go:110 Beat name: elastic 2019-03-20T09:51:32.138Z INFO elasticsearch/client.go:165 Elasticsearch url: http://localhost:9200 2019-03-20T09:51:32.140Z INFO elasticsearch/client.go:721 Connected to Elasticsearch version 6.7.0 2019-03-20T09:51:32.148Z INFO template/load.go:130 Template already exists and will not be overwritten. 2019-03-20T09:51:32.148Z INFO instance/beat.go:894 Template successfully loaded. Loaded index template Loading dashboards (Kibana must be running and reachable) 2019-03-20T09:51:32.149Z INFO elasticsearch/client.go:165 Elasticsearch url: http://localhost:9200 2019-03-20T09:51:32.150Z INFO elasticsearch/client.go:721 Connected to Elasticsearch version 6.7.0 2019-03-20T09:51:32.151Z INFO kibana/client.go:118 Kibana url: http://localhost:5601 2019-03-20T09:51:56.209Z INFO instance/beat.go:741 Kibana dashboards successfully loaded. Loaded dashboards

      Now you can start Metricbeat:

      • sudo systemctl start metricbeat

      To make Metricbeat start automatically at boot from now on, use the enable command:

      • sudo systemctl enable metricbeat

      Metricbeat will begin shipping your system stats into Elasticsearch.

      To verify that Elasticsearch is indeed receiving this data, query the Metricbeat index with this command:

      • curl -XGET 'http://localhost:9200/metricbeat-*/_search?pretty'

      You will see an output that looks similar to this:

      Output

      ... { "took" : 3, "timed_out" : false, "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 108, "max_score" : 1.0, "hits" : [ { "_index" : "metricbeat-6.7.0-2019.03.20", "_type" : "doc", "_id" : "A4mU8GgBKrpxEYMLjJZt", "_score" : 1.0, "_source" : { "@timestamp" : "2019-03-20T09:54:52.481Z", "metricset" : { "name" : "network", "module" : "system", "rtt" : 125 }, "event" : { "dataset" : "system.network", "duration" : 125260 }, "system" : { "network" : { "in" : { "packets" : 59728, "errors" : 0, "dropped" : 0, "bytes" : 736491211 }, "out" : { "dropped" : 0, "packets" : 31630, "bytes" : 8283069, "errors" : 0 }, "name" : "eth0" } }, "beat" : { "version" : "6.7.0", "name" : "elastic", "hostname" : "elastic" }, ...

      The line "total" : 108, indicates that Metricbeat has found 108 search results for this specific metric. Any number of search results indicates that Metricbeat is working; if your output shows 0 total hits, you will need to review your setup for errors. If you received the expected output, continue to the next step, in which you will install Metricbeat on the second CentOS server.
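As an aside, if you just want the raw hit count without the document bodies, Elasticsearch's _count API gives a terser check (the count you see will differ from the 108 shown above):

• curl -XGET 'http://localhost:9200/metricbeat-*/_count?pretty'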

      Step 3 — Installing and Configuring Metricbeat on the Second CentOS Server

      Perform this step on all CentOS servers from which you want to send metrics to your Elastic Stack server. If you also have Ubuntu servers, you can install Metricbeat by following Step 3 of How To Gather Infrastructure Metrics with Metricbeat on Ubuntu 18.04.

      Log into your second CentOS server as your non-root user:

      • ssh sammy@second_centos_server_ip

      The Elastic Stack components are not available through the yum package manager by default, but you can install them by adding Elastic’s package repository.

      All of the Elastic Stack’s packages are signed with the Elasticsearch signing key in order to protect your system from package spoofing. Your package manager will trust packages that have been authenticated using the key. In this step, you will import the Elasticsearch public GPG key and add the Elastic package source list in order to install Metricbeat.

      To begin, run the following command to download and install the Elasticsearch public signing key:

      • sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

      Next, add the Elastic repository. Use your preferred text editor to create the file elasticsearch.repo in the /etc/yum.repos.d/ directory:

      • sudo vi /etc/yum.repos.d/elasticsearch.repo

      To provide yum with the information it needs to download and install the components of the Elastic Stack, enter insert mode by pressing i and add the following lines to the file:

      /etc/yum.repos.d/elasticsearch.repo

      [elasticsearch-6.x]
      name=Elasticsearch repository for 6.x packages
      baseurl=https://artifacts.elastic.co/packages/6.x/yum
      gpgcheck=1
      gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
      enabled=1
      autorefresh=1
      type=rpm-md
      

      When you’re finished, save and close the file.

      Next, install Metricbeat with this command:

      • sudo yum install metricbeat

      Once Metricbeat is finished installing, configure it to connect to Elasticsearch. Open its configuration file, metricbeat.yml:

      • sudo vi /etc/metricbeat/metricbeat.yml

      Note: Metricbeat’s configuration file is in YAML format, which means that indentation is very important! Be sure that you do not add any extra spaces as you edit this file.

      Metricbeat supports numerous outputs, but you’ll usually only send events directly to Elasticsearch or to Logstash for additional processing. Find the following section and update the IP address:

      /etc/metricbeat/metricbeat.yml

      #-------------------------- Elasticsearch output ------------------------------
      output.elasticsearch:
        # Array of hosts to connect to.
        hosts: ["Elastic_Stack_server_ip:9200"]
      
      ...
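If you instead wanted to route events through Logstash for additional processing, the corresponding section of the same file would look roughly like this sketch (only one output may be active at a time, so output.elasticsearch would then have to be commented out; this tutorial keeps the Elasticsearch output):

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["Elastic_Stack_server_ip:5044"]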
      

      Save and close the file.

      You can extend the functionality of Metricbeat with modules. In this tutorial, you will use the system module, which allows you to monitor your server’s stats like CPU/memory/disk usage and network utilization.

      In this case, the system module is enabled by default. You can see a list of enabled and disabled modules by running:

      • sudo metricbeat modules list

      You will see a list similar to the following:

      Output

      Enabled: system Disabled: aerospike apache ceph couchbase docker dropwizard elasticsearch envoyproxy etcd golang graphite haproxy http jolokia kafka kibana kubernetes kvm logstash memcached mongodb munin mysql nginx php_fpm postgresql prometheus rabbitmq redis traefik uwsgi vsphere windows zookeeper

You can see the parameters of the module in the /etc/metricbeat/modules.d/system.yml configuration file. For this tutorial, you do not need to change anything in the configuration. The default metricsets are cpu, load, memory, network, process, and process_summary. Each module has one or more metricsets. A metricset is the part of the module that fetches and structures the data. Rather than collecting each metric as a separate event, metricsets retrieve a list of multiple related metrics in a single request to the remote system.
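For reference, the relevant portion of the default system module configuration looks roughly like the following; the exact contents may differ slightly between Metricbeat versions:

/etc/metricbeat/modules.d/system.yml

- module: system
  period: 10s
  metricsets:
    - cpu
    - load
    - memory
    - network
    - process
    - process_summary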

      Now you can start and enable Metricbeat:

      • sudo systemctl start metricbeat
      • sudo systemctl enable metricbeat

      Repeat this step on all servers where you want to collect metrics. After that, you can proceed to the next step in which you will see how to navigate through some of Kibana’s dashboards.

      Step 4 — Exploring Kibana Dashboards

      In this step, you will take a look at Kibana, the web interface that you installed in the Prerequisites section.

      In a web browser, go to the FQDN or public IP address of your Elastic Stack server. After entering the login credentials you defined in Step 2 of the Elastic Stack tutorial, you will see the Kibana homepage:

      Kibana Homepage

Click the Discover link in the left-hand navigation bar. On the Discover page, select the predefined metricbeat-* index pattern to see Metricbeat data. By default, this will show you all of the log data over the last 15 minutes. You will find a histogram and some metric details:

      Discover page

      Here, you can search and browse through your metrics and also customize your dashboard. At this point, though, there won’t be much in there because you are only gathering system stats from your servers.

      Use the left-hand panel to navigate to the Dashboard page and search for the Metricbeat System dashboard. Once there, you can search for the sample dashboards that come with Metricbeat’s system module.

      For example, you can view brief information about all your hosts:

Metricbeat System dashboard

      You can also click on the host name and view more detailed information:

Metricbeat host detail dashboard

      Kibana has many other features, such as graphing and filtering, so feel free to explore.

      Conclusion

      In this tutorial, you’ve installed Metricbeat and configured the Elastic Stack to collect and analyze system metrics. Metricbeat comes with internal modules that collect metrics from services like Apache, Nginx, Docker, MySQL, PostgreSQL, and more. Now you can collect and analyze the metrics of your applications by simply turning on the modules you need.

      If you want to understand more about server monitoring, check out An Introduction to Metrics, Monitoring, and Alerting and Putting Monitoring and Alerting into Practice.


