One place for hosting & domains

      Elasticsearch

      How to Set Up an Elasticsearch, Fluentd and Kibana (EFK) Logging Stack on Kubernetes


      Introduction

      When running multiple services and applications on a Kubernetes cluster, a centralized, cluster-level logging stack can help you quickly sort through and analyze the heavy volume of log data produced by your Pods. One popular centralized logging solution is the Elasticsearch, Fluentd, and Kibana (EFK) stack.

      Elasticsearch is a real-time, distributed, and scalable search engine which allows for full-text and structured search, as well as analytics. It is commonly used to index and search through large volumes of log data, but can also be used to search many different kinds of documents.

      Elasticsearch is commonly deployed alongside Kibana, a powerful data visualization frontend and dashboard for Elasticsearch. Kibana allows you to explore your Elasticsearch log data through a web interface, and build dashboards and queries to quickly answer questions and gain insight into your Kubernetes applications.

      In this tutorial we’ll use Fluentd to collect, transform, and ship log data to the Elasticsearch backend. Fluentd is a popular open-source data collector that we’ll set up on our Kubernetes nodes to tail container log files, filter and transform the log data, and deliver it to the Elasticsearch cluster, where it will be indexed and stored.

      We’ll begin by configuring and launching a scalable Elasticsearch cluster, and then create the Kibana Kubernetes Service and Deployment. To conclude, we’ll set up Fluentd as a DaemonSet so it runs on every Kubernetes worker node.

      Prerequisites

      Before you begin with this guide, ensure you have the following available to you:

      • A Kubernetes 1.10+ cluster with role-based access control (RBAC) enabled

        • Ensure your cluster has enough resources available to roll out the EFK stack, and if not scale your cluster by adding worker nodes. We’ll be deploying a 3-Pod Elasticsearch cluster (you can scale this down to 1 if necessary), as well as a single Kibana Pod. Every worker node will also run a Fluentd Pod. The cluster in this guide consists of 3 worker nodes and a managed control plane.
      • The kubectl command-line tool installed on your local machine, configured to connect to your cluster. You can read more about installing kubectl in the official documentation.

      Once you have these components set up, you’re ready to begin with this guide.

      Step 1 — Creating a Namespace

      Before we roll out an Elasticsearch cluster, we’ll first create a Namespace into which we’ll install all of our logging instrumentation. Kubernetes lets you separate objects running in your cluster using a “virtual cluster” abstraction called Namespaces. In this guide, we’ll create a kube-logging namespace into which we’ll install the EFK stack components. This Namespace will also allow us to quickly clean up and remove the logging stack without any loss of function to the Kubernetes cluster.

      To begin, first investigate the existing Namespaces in your cluster using kubectl:

      kubectl get namespaces
      

      You should see the following three initial Namespaces, which come preinstalled with your Kubernetes cluster:

      Output

      • NAME STATUS AGE
      • default Active 5m
      • kube-system Active 5m
      • kube-public Active 5m

      The default Namespace houses objects that are created without specifying a Namespace. The kube-system Namespace contains objects created and used by the Kubernetes system, like kube-dns, kube-proxy, and kubernetes-dashboard. It’s good practice to keep this Namespace clean and not pollute it with your application and instrumentation workloads.

      The kube-public Namespace is another automatically created Namespace that can be used to store objects you’d like to be readable and accessible throughout the whole cluster, even to unauthenticated users.

      To create the kube-logging Namespace, first open and edit a file called kube-logging.yaml using your favorite editor, such as nano:

      Inside your editor, paste the following Namespace object YAML:

      kube-logging.yaml

      kind: Namespace
      apiVersion: v1
      metadata:
        name: kube-logging
      

      Then, save and close the file.

      Here, we specify the Kubernetes object's kind as a Namespace object. To learn more about Namespace objects, consult the Namespaces Walkthrough in the official Kubernetes documentation. We also specify the Kubernetes API version used to create the object (v1), and give it a name, kube-logging.

      Once you've created the kube-logging.yaml Namespace object file, create the Namespace using kubectl create with the -f filename flag:

      • kubectl create -f kube-logging.yaml

      You should see the following output:

      Output

      namespace/kube-logging created

      You can then confirm that the Namespace was successfully created:

      At this point, you should see the new kube-logging Namespace:

      Output

      NAME STATUS AGE default Active 23m kube-logging Active 1m kube-public Active 23m kube-system Active 23m

      We can now deploy an Elasticsearch cluster into this isolated logging Namespace.

      Step 2 — Creating the Elasticsearch StatefulSet

      Now that we've created a Namespace to house our logging stack, we can begin rolling out its various components. We'll first begin by deploying a 3-node Elasticsearch cluster.

      In this guide, we use 3 Elasticsearch Pods to avoid the "split-brain" issue that occurs in highly-available, multi-node clusters. At a high-level, "split-brain" is what arises when one or more nodes can't communicate with the others, and several "split" masters get elected. To learn more, consult “Avoiding split brain.”

      One key takeaway is that you should set the discover.zen.minimum_master_nodes Elasticsearch parameter to N/2 + 1 (rounding down in the case of fractional numbers), where N is the number of master-eligible nodes in your Elasticsearch cluster. For our 3-node cluster, this means that we'll set this value to 2. That way, if one node gets disconnected from the cluster temporarily, the other two nodes can elect a new master and the cluster can continue functioning while the last node attempts to rejoin. It's important to keep this parameter in mind when scaling your Elasticsearch cluster.

      Creating the Headless Service

      To start, we'll create a headless Kubernetes service called elasticsearch that will define a DNS domain for the 3 Pods. A headless service does not perform load balancing or have a static IP; to learn more about headless services, consult the official Kubernetes documentation.

      Open a file called elasticsearch_svc.yaml using your favorite editor:

      • nano elasticsearch_svc.yaml

      Paste in the following Kubernetes service YAML:

      elasticsearch_svc.yaml

      kind: Service
      apiVersion: v1
      metadata:
        name: elasticsearch
        namespace: kube-logging
        labels:
          app: elasticsearch
      spec:
        selector:
          app: elasticsearch
        clusterIP: None
        ports:
          - port: 9200
            name: rest
          - port: 9300
            name: inter-node
      

      Then, save and close the file.

      We define a Service called elasticsearch in the kube-logging Namespace, and give it the app: elasticsearch label. We then set the .spec.selector to app: elasticsearch so that the Service selects Pods with the app: elasticsearch label. When we associate our Elasticsearch StatefulSet with this Service, the Service will return DNS A records that point to Elasticsearch Pods with the app: elasticsearch label.

      We then set clusterIP: None, which renders the service headless. Finally, we define ports 9200 and 9300 which are used to interact with the REST API, and for inter-node communication, respectively.

      Create the service using kubectl:

      • kubectl create -f elasticsearch_svc.yaml

      You should see the following output:

      Output

      service/elasticsearch created

      Finally, double-check that the service was successfully created using kubectl get:

      kubectl get services --namespace=kube-logging
      

      You should see the following:

      Output

      NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE elasticsearch ClusterIP None <none> 9200/TCP,9300/TCP 26s

      Now that we've set up our headless service and a stable .elasticsearch.kube-logging.svc.cluster.local domain for our Pods, we can go ahead and create the StatefulSet.

      Creating the StatefulSet

      A Kubernetes StatefulSet allows you to assign a stable identity to Pods and grant them stable, persistent storage. Elasticsearch requires stable storage to persist data across Pod rescheduling and restarts. To learn more about the StatefulSet workload, consult the Statefulsets page from the Kubernetes docs.

      Open a file called elasticsearch_statefulset.yaml in your favorite editor:

      • nano elasticsearch_statefulset.yaml

      We will move through the StatefulSet object definition section by section, pasting blocks into this file.

      Begin by pasting in the following block:

      elasticsearch_statefulset.yaml

      apiVersion: apps/v1
      kind: StatefulSet
      metadata:
        name: es-cluster
        namespace: kube-logging
      spec:
        serviceName: elasticsearch
        replicas: 3
        selector:
          matchLabels:
            app: elasticsearch
        template:
          metadata:
            labels:
              app: elasticsearch
      

      In this block, we define a StatefulSet called es-cluster in the kube-logging namespace. We then associate it with our previously created elasticsearch Service using the serviceName field. This ensures that each Pod in the StatefulSet will be accessible using the following DNS address: es-cluster-[0,1,2].elasticsearch.kube-logging.svc.cluster.local, where [0,1,2] corresponds to the Pod's assigned integer ordinal.

      We specify 3 replicas (Pods) and set the matchLabels selector to app: elasticseach, which we then mirror in the .spec.template.metadata section. The .spec.selector.matchLabels and .spec.template.metadata.labels fields must match.

      We can now move on to the object spec. Paste in the following block of YAML immediately below the preceding block:

      elasticsearch_statefulset.yaml

      . . .
          spec:
            containers:
            - name: elasticsearch
              image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.4.3
              resources:
                  limits:
                    cpu: 1000m
                  requests:
                    cpu: 100m
              ports:
              - containerPort: 9200
                name: rest
                protocol: TCP
              - containerPort: 9300
                name: inter-node
                protocol: TCP
              volumeMounts:
              - name: data
                mountPath: /usr/share/elasticsearch/data
              env:
                - name: cluster.name
                  value: k8s-logs
                - name: node.name
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
                - name: discovery.zen.ping.unicast.hosts
                  value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
                - name: discovery.zen.minimum_master_nodes
                  value: "2"
                - name: ES_JAVA_OPTS
                  value: "-Xms512m -Xmx512m"
      

      Here we define the Pods in the StatefulSet. We name the containers elasticsearch and choose the docker.elastic.co/elasticsearch/elasticsearch-oss:6.4.3 Docker image. At this point, you may modify this image tag to correspond to your own internal Elasticsearch image, or a different version. Note that for the purposes of this guide, only Elasticsearch 6.4.3 has been tested.

      The -oss suffix ensures that we use the open-source version of Elasticsearch. If you'd like to use the default version containing X-Pack (which includes a free license), omit the -oss suffix. Note that you will have to modify the steps in this guide slightly to account for the added authentication provided by X-Pack.

      We then use the resources field to specify that the container needs at least 0.1 vCPU guaranteed to it, and can burst up to 1 vCPU (which limits the Pod's resource usage when performing an initial large ingest or dealing with a load spike). You should modify these values depending on your anticipated load and available resources. To learn more about resource requests and limits, consult the official Kubernetes Documentation.

      We then open and name ports 9200 and 9300 for REST API and inter-node communication, respectively. We specify a volumeMount called data that will mount the PersistentVolume named data to the container at the path /usr/share/elasticsearch/data. We will define the VolumeClaims for this StatefulSet in a later YAML block.

      Finally, we set some environment variables in the container:

      • cluster.name: The Elasticsearch cluster's name, which in this guide is k8s-logs.
      • node.name: The node's name, which we set to the .metadata.name field using valueFrom. This will resolve to es-cluster-[0,1,2], depending on the node's assigned ordinal.
      • discovery.zen.ping.unicast.hosts: This field sets the discovery method used to connect nodes to each other within an Elasticsearch cluster. We use unicast discovery, which specifies a static list of hosts for our cluster. In this guide, thanks to the headless service we configured earlier, our Pods have domains of the form es-cluster-[0,1,2].elasticsearch.kube-logging.svc.cluster.local, so we set this variable accordingly. Using local namespace Kubernetes DNS resolution, we can shorten this to es-cluster-[0,1,2].elasticsearch. To learn more about Elasticsearch discovery, consult the official Elasticsearch documentation.
      • discovery.zen.minimum_master_nodes: We set this to (N/2) + 1, where N is the number of master-eligible nodes in our cluster. In this guide we have 3 Elasticsearch nodes, so we set this value to 2 (rounding down to the nearest integer). To learn more about this parameter, consult the official Elasticsearch documenation.
      • ES_JAVA_OPTS: Here we set this to -Xms512m -Xmx512m which tells the JVM to use a minimum and maximum heap size of 512 MB. You should tune these parameters depending on your cluster's resource availability and needs. To learn more, consult Setting the heap size.

      The next block we'll paste in looks as follows:

      elasticsearch_statefulset.yaml

      . . .
            initContainers:
            - name: fix-permissions
              image: busybox
              command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
              securityContext:
                privileged: true
              volumeMounts:
              - name: data
                mountPath: /usr/share/elasticsearch/data
            - name: increase-vm-max-map
              image: busybox
              command: ["sysctl", "-w", "vm.max_map_count=262144"]
              securityContext:
                privileged: true
            - name: increase-fd-ulimit
              image: busybox
              command: ["sh", "-c", "ulimit -n 65536"]
              securityContext:
                privileged: true
      

      In this block, we define several Init Containers that run before the main elasticsearch app container. These Init Containers each run to completion in the order they are defined. To learn more about Init Containers, consult the official Kubernetes Documentation.

      The first, named fix-permissions, runs a chown command to change the owner and group of the Elasticsearch data directory to 1000:1000, the Elasticsearch user's UID. By default Kubernetes mounts the data directory as root, which renders it inaccessible to Elasticsearch. To learn more about this step, consult Elasticsearch's “Notes for production use and defaults.”

      The second, named increase-vm-max-map, runs a command to increase the operating system's limits on mmap counts, which by default may be too low, resulting in out of memory errors. To learn more about this step, consult the official Elasticsearch documentation.

      The next Init Container to run is increase-fd-ulimit, which runs the ulimit command to increase the maximum number of open file descriptors. To learn more about this step, consult the “Notes for Production Use and Defaults” from the official Elasticsearch documentation.

      Note: The Elasticsearch Notes for Production Use also mentions disabling swapping for performance reasons. Depending on your Kubernetes installation or provider, swapping may already be disabled. To check this, exec into a running container and run cat /proc/swaps to list active swap devices. If you see nothing there, swap is disabled.

      Now that we've defined our main app container and the Init Containers that run before it to tune the container OS, we can add the final piece to our StatefulSet object definition file: the volumeClaimTemplates.

      Paste in the following volumeClaimTemplate block:

      elasticsearch_statefulset.yaml

      . . .
        volumeClaimTemplates:
        - metadata:
            name: data
            labels:
              app: elasticsearch
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: do-block-storage
            resources:
              requests:
                storage: 100Gi
      

      In this block, we define the StatefulSet's volumeClaimTemplates. Kubernetes will use this to create PersistentVolumes for the Pods. In the block above, we name it data (which is the name we refer to in the volumeMounts defined previously), and give it the same app: elasticsearch label as our StatefulSet.

      We then specify its access mode as ReadWriteOnce, which means that it can only be mounted as read-write by a single node. We define the storage class as do-block-storage in this guide since we use a DigitalOcean Kubernetes cluster for demonstration purposes. You should change this value depending on where you are running your Kubernetes cluster. To learn more, consult the Persistent Volume documentation.

      Finally, we specify that we'd like each PersistentVolume to be 100GiB in size. You should adjust this value depending on your production needs.

      The complete StatefulSet spec should look something like this:

      elasticsearch_statefulset.yaml

      apiVersion: apps/v1
      kind: StatefulSet
      metadata:
        name: es-cluster
        namespace: kube-logging
      spec:
        serviceName: elasticsearch
        replicas: 3
        selector:
          matchLabels:
            app: elasticsearch
        template:
          metadata:
            labels:
              app: elasticsearch
          spec:
            containers:
            - name: elasticsearch
              image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.4.3
              resources:
                  limits:
                    cpu: 1000m
                  requests:
                    cpu: 100m
              ports:
              - containerPort: 9200
                name: rest
                protocol: TCP
              - containerPort: 9300
                name: inter-node
                protocol: TCP
              volumeMounts:
              - name: data
                mountPath: /usr/share/elasticsearch/data
              env:
                - name: cluster.name
                  value: k8s-logs
                - name: node.name
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
                - name: discovery.zen.ping.unicast.hosts
                  value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
                - name: discovery.zen.minimum_master_nodes
                  value: "2"
                - name: ES_JAVA_OPTS
                  value: "-Xms512m -Xmx512m"
            initContainers:
            - name: fix-permissions
              image: busybox
              command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
              securityContext:
                privileged: true
              volumeMounts:
              - name: data
                mountPath: /usr/share/elasticsearch/data
            - name: increase-vm-max-map
              image: busybox
              command: ["sysctl", "-w", "vm.max_map_count=262144"]
              securityContext:
                privileged: true
            - name: increase-fd-ulimit
              image: busybox
              command: ["sh", "-c", "ulimit -n 65536"]
              securityContext:
                privileged: true
        volumeClaimTemplates:
        - metadata:
            name: data
            labels:
              app: elasticsearch
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: do-block-storage
            resources:
              requests:
                storage: 100Gi
      

      Once you're satisfied with your Elasticsearch configuration, save and close the file.

      Now, deploy the StatefulSet using kubectl:

      • kubectl create -f elasticsearch_statefulset.yaml

      You should see the following output:

      Output

      statefulset.apps/es-cluster created

      You can monitor the StatefulSet as it is rolled out using kubectl rollout status:

      • kubectl rollout status sts/es-cluster --namespace=kube-logging

      You should see the following output as the cluster is rolled out:

      Output

      Waiting for 3 pods to be ready... Waiting for 2 pods to be ready... Waiting for 1 pods to be ready... partitioned roll out complete: 3 new pods have been updated...

      Once all the Pods have been deployed, you can check that your Elasticsearch cluster is functioning correctly by performing a request against the REST API.

      To do so, first forward the local port 9200 to the port 9200 on one of the Elasticsearch nodes (es-cluster-0) using kubectl port-forward:

      • kubectl port-forward es-cluster-0 9200:9200 --namespace=kube-logging

      Then, in a separate terminal window, perform a curl request against the REST API:

      • curl http://localhost:9200/_cluster/state?pretty

      You shoulds see the following output:

      Output

      { "cluster_name" : "k8s-logs", "compressed_size_in_bytes" : 348, "cluster_uuid" : "QD06dK7CQgids-GQZooNVw", "version" : 3, "state_uuid" : "mjNIWXAzQVuxNNOQ7xR-qg", "master_node" : "IdM5B7cUQWqFgIHXBp0JDg", "blocks" : { }, "nodes" : { "u7DoTpMmSCixOoictzHItA" : { "name" : "es-cluster-1", "ephemeral_id" : "ZlBflnXKRMC4RvEACHIVdg", "transport_address" : "10.244.8.2:9300", "attributes" : { } }, "IdM5B7cUQWqFgIHXBp0JDg" : { "name" : "es-cluster-0", "ephemeral_id" : "JTk1FDdFQuWbSFAtBxdxAQ", "transport_address" : "10.244.44.3:9300", "attributes" : { } }, "R8E7xcSUSbGbgrhAdyAKmQ" : { "name" : "es-cluster-2", "ephemeral_id" : "9wv6ke71Qqy9vk2LgJTqaA", "transport_address" : "10.244.40.4:9300", "attributes" : { } } }, ...

      This indicates that our Elasticsearch cluster k8s-logs has successfully been created with 3 nodes: es-cluster-0, es-cluster-1, and es-cluster-2. The current master node is es-cluster-0.

      Now that your Elasticsearch cluster is up and running, you can move on to setting up a Kibana frontend for it.

      Step 3 — Creating the Kibana Deployment and Service

      To launch Kibana on Kubernetes, we'll create a Service called kibana, and a Deployment consisting of one Pod replica. You can scale the number of replicas depending on your production needs, and optionally specify a LoadBalancer type for the Service to load balance requests across the Deployment pods.

      This time, we'll create the Service and Deployment in the same file. Open up a file called kibana.yaml in your favorite editor:

      Paste in the following service spec:

      kibana.yaml

      apiVersion: v1
      kind: Service
      metadata:
        name: kibana
        namespace: kube-logging
        labels:
          app: kibana
      spec:
        ports:
        - port: 5601
        selector:
          app: kibana
      ---
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: kibana
        namespace: kube-logging
        labels:
          app: kibana
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: kibana
        template:
          metadata:
            labels:
              app: kibana
          spec:
            containers:
            - name: kibana
              image: docker.elastic.co/kibana/kibana-oss:6.4.3
              resources:
                limits:
                  cpu: 1000m
                requests:
                  cpu: 100m
              env:
                - name: ELASTICSEARCH_URL
                  value: http://elasticsearch:9200
              ports:
              - containerPort: 5601
      

      Then, save and close the file.

      In this spec we've defined a service called kibana in the kube-logging namespace, and gave it the app: kibana label.

      We've also specified that it should be accessible on port 5601 and use the app: kibana label to select the Service's target Pods.

      In the Deployment spec, we define a Deployment called kibana and specify that we'd like 1 Pod replica.

      We use the docker.elastic.co/kibana/kibana-oss:6.4.3 image. At this point you may substitute your own private or public Kibana image to use. We once again use the -oss suffix to specify that we'd like the open-source version.

      We specify that we'd like at the very least 0.1 vCPU guaranteed to the Pod, bursting up to a limit of 1 vCPU. You may change these parameters depending on your anticipated load and available resources.

      Next, we use the ELASTICSEARCH_URL environment variable to set the endpoint and port for the Elasticsearch cluster. Using Kubernetes DNS, this endpoint corresponds to its Service name elasticsearch. This domain will resolve to a list of IP addresses for the 3 Elasticsearch Pods. To learn more about Kubernetes DNS, consult DNS for Services and Pods.

      Finally, we set Kibana's container port to 5601, to which the kibana Service will forward requests.

      Once you're satisfied with your Kibana configuration, you can roll out the Service and Deployment using kubectl:

      • kubectl create -f kibana.yaml

      You should see the following output:

      Output

      service/kibana created deployment.apps/kibana created

      You can check that the rollout succeeded by running the following command:

      • kubectl rollout status deployment/kibana --namespace=kube-logging

      You should see the following output:

      Output

      deployment "kibana" successfully rolled out

      To access the Kibana interface, we'll once again forward a local port to the Kubernetes node running Kibana. Grab the Kibana Pod details using kubectl get:

      • kubectl get pods --namespace=kube-logging

      Output

      NAME READY STATUS RESTARTS AGE es-cluster-0 1/1 Running 0 55m es-cluster-1 1/1 Running 0 54m es-cluster-2 1/1 Running 0 54m kibana-6c9fb4b5b7-plbg2 1/1 Running 0 4m27s

      Here we observe that our Kibana Pod is called kibana-6c9fb4b5b7-plbg2.

      Forward the local port 5601 to port 5601 on this Pod:

      • kubectl port-forward kibana-6c9fb4b5b7-plbg2 5601:5601 --namespace=kube-logging

      You should see the following output:

      Output

      Forwarding from 127.0.0.1:5601 -> 5601 Forwarding from [::1]:5601 -> 5601

      Now, in your web browser, visit the following URL:

      http://localhost:5601
      

      If you see the following Kibana welcome page, you've successfully deployed Kibana into your Kubernetes cluster:

      Kibana Welcome Screen

      You can now move on to rolling out the final component of the EFK stack: the log collector, Fluentd.

      Step 4 — Creating the Fluentd DaemonSet

      In this guide, we'll set up Fluentd as a DaemonSet, which is a Kubernetes workload type that runs a copy of a given Pod on each Node in the Kubernetes cluster. Using this DaemonSet controller, we'll roll out a Fluentd logging agent Pod on every node in our cluster. To learn more about this logging architecture, consult “Using a node logging agent” from the official Kubernetes docs.

      In Kubernetes, containerized applications that log to stdout and stderr have their log streams captured and redirected to JSON files on the nodes. The Fluentd Pod will tail these log files, filter log events, transform the log data, and ship it off to the Elasticsearch logging backend we deployed in Step 2.

      In addition to container logs, the Fluentd agent will tail Kubernetes system component logs like kubelet, kube-proxy, and Docker logs. To see a full list of sources tailed by the Fluentd logging agent, consult the kubernetes.conf file used to configure the logging agent. To learn more about logging in Kubernetes clusters, consult “Logging at the node level” from the official Kubernetes documentation.

      Begin by opening a file called fluentd.yaml in your favorite text editor:

      Once again, we'll paste in the Kubernetes object definitions block by block, providing context as we go along. In this guide, we use the Fluentd DaemonSet spec provided by the Fluentd maintainers. Another helpful resource provided by the Fluentd maintainers is Kubernetes Logging with Fluentd.

      First, paste in the following ServiceAccount definition:

      fluentd.yaml

      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: fluentd
        namespace: kube-logging
        labels:
          app: fluentd
      

      Here, we create a Service Account called fluentd that the Fluentd Pods will use to access the Kubernetes API. We create it in the kube-logging Namespace and once again give it the label app: fluentd. To learn more about Service Accounts in Kubernetes, consult Configure Service Accounts for Pods in the official Kubernetes docs.

      Next, paste in the following ClusterRole block:

      fluentd.yaml

      . . .
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: fluentd
        labels:
          app: fluentd
      rules:
      - apiGroups:
        - ""
        resources:
        - pods
        - namespaces
        verbs:
        - get
        - list
        - watch
      

      Here we define a ClusterRole called fluentd to which we grant the get, list, and watch permissions on the pods and namespaces objects. ClusterRoles allow you to grant access to cluster-scoped Kubernetes resources like Nodes. To learn more about Role-Based Access Control and Cluster Roles, consult Using RBAC Authorization from the official Kubernetes documentation.

      Now, paste in the following ClusterRoleBinding block:

      fluentd.yaml

      . . .
      ---
      kind: ClusterRoleBinding
      apiVersion: rbac.authorization.k8s.io/v1
      metadata:
        name: fluentd
      roleRef:
        kind: ClusterRole
        name: fluentd
        apiGroup: rbac.authorization.k8s.io
      subjects:
      - kind: ServiceAccount
        name: fluentd
        namespace: kube-logging
      

      In this block, we define a ClusterRoleBinding called fluentd which binds the fluentd ClusterRole to the fluentd Service Account. This grants the fluentd ServiceAccount the permissions listed in the fluentd Cluster Role.

      At this point we can begin pasting in the actual DaemonSet spec:

      fluentd.yaml

      . . .
      ---
      apiVersion: apps/v1
      kind: DaemonSet
      metadata:
        name: fluentd
        namespace: kube-logging
        labels:
          app: fluentd
      

      Here, we define a DaemonSet called fluentd in the kube-logging Namespace and give it the app: fluentd label.

      Next, paste in the following section:

      fluentd.yaml

      . . .
      spec:
        selector:
          matchLabels:
            app: fluentd
        template:
          metadata:
            labels:
              app: fluentd
          spec:
            serviceAccount: fluentd
            serviceAccountName: fluentd
            tolerations:
            - key: node-role.kubernetes.io/master
              effect: NoSchedule
            containers:
            - name: fluentd
              image: fluent/fluentd-kubernetes-daemonset:v0.12-debian-elasticsearch
              env:
                - name:  FLUENT_ELASTICSEARCH_HOST
                  value: "elasticsearch.kube-logging.svc.cluster.local"
                - name:  FLUENT_ELASTICSEARCH_PORT
                  value: "9200"
                - name: FLUENT_ELASTICSEARCH_SCHEME
                  value: "http"
                - name: FLUENT_UID
                  value: "0"
      

      Here, we match the app: fluentd label defined in .metadata.labels and then assign the DaemonSet the fluentd Service Account. We also select the app: fluentd as the Pods managed by this DaemonSet.

      Next, we define a NoSchedule toleration to match the equivalent taint on Kubernetes master nodes. This will ensure that the DaemonSet also gets rolled out to the Kubernetes masters. If you don't want to run a Fluentd Pod on your master nodes, remove this toleration. To learn more about Kubernetes taints and tolerations, consult “Taints and Tolerations” from the official Kubernetes docs.

      Next, we begin defining the Pod container, which we call fluentd.

      We use the official v0.12 Debian image provided by the Fluentd maintainers. If you'd like to use your own private or public Fluentd image, or use a different image version, modify the image tag in the container spec. The Dockerfile and contents of this image are available in Fluentd's fluentd-kubernetes-daemonset Github repo.

      Next, we configure Fluentd using some environment variables:

      • FLUENT_ELASTICSEARCH_HOST: We set this to the Elasticsearch headless Service address defined earlier: elasticsearch.kube-logging.svc.cluster.local. This will resolve to a list of IP addresses for the 3 Elasticsearch Pods. The actual Elasticsearch host will most likely be the first IP address returned in this list. To distribute logs across the cluster, you will need to modify the configuration for Fluentd’s Elasticsearch Output plugin. To learn more about this plugin, consult Elasticsearch Output Plugin.
      • FLUENT_ELASTICSEARCH_PORT: We set this to the Elasticsearch port we configured earlier, 9200.
      • FLUENT_ELASTICSEARCH_SCHEME: We set this to http.
      • FLUENT_UID: We set this to 0 (superuser) so that Fluentd can access the files in /var/log.

      Finally, paste in the following section:

      fluentd.yaml

      . . .
              resources:
                limits:
                  memory: 512Mi
                requests:
                  cpu: 100m
                  memory: 200Mi
              volumeMounts:
              - name: varlog
                mountPath: /var/log
              - name: varlibdockercontainers
                mountPath: /var/lib/docker/containers
                readOnly: true
            terminationGracePeriodSeconds: 30
            volumes:
            - name: varlog
              hostPath:
                path: /var/log
            - name: varlibdockercontainers
              hostPath:
                path: /var/lib/docker/containers
      

      Here we specify a 512 MiB memory limit on the FluentD Pod, and guarantee it 0.1vCPU and 200MiB of memory. You can tune these resource limits and requests depending on your anticipated log volume and available resources.

      Next, we mount the /var/log and /var/lib/docker/containers host paths into the container using the varlog and varlibdockercontainers volumeMounts. These volumes are defined at the end of the block.

      The final parameter we define in this block is terminationGracePeriodSeconds, which gives Fluentd 30 seconds to shut down gracefully upon receiving a SIGTERM signal. After 30 seconds, the containers are sent a SIGKILL signal. The default value for terminationGracePeriodSeconds is 30s, so in most cases this parameter can be omitted. To learn more about gracefully terminating Kubernetes workloads, consult Google's “Kubernetes best practices: terminating with grace.”

      The entire Fluentd spec should look something like this:

      fluentd.yaml

      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: fluentd
        namespace: kube-logging
        labels:
          app: fluentd
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: fluentd
        labels:
          app: fluentd
      rules:
      - apiGroups:
        - ""
        resources:
        - pods
        - namespaces
        verbs:
        - get
        - list
        - watch
      ---
      kind: ClusterRoleBinding
      apiVersion: rbac.authorization.k8s.io/v1
      metadata:
        name: fluentd
      roleRef:
        kind: ClusterRole
        name: fluentd
        apiGroup: rbac.authorization.k8s.io
      subjects:
      - kind: ServiceAccount
        name: fluentd
        namespace: kube-logging
      ---
      apiVersion: apps/v1
      kind: DaemonSet
      metadata:
        name: fluentd
        namespace: kube-logging
        labels:
          app: fluentd
      spec:
        selector:
          matchLabels:
            app: fluentd
        template:
          metadata:
            labels:
              app: fluentd
          spec:
            serviceAccount: fluentd
            serviceAccountName: fluentd
            tolerations:
            - key: node-role.kubernetes.io/master
              effect: NoSchedule
            containers:
            - name: fluentd
              image: fluent/fluentd-kubernetes-daemonset:v0.12-debian-elasticsearch
              env:
                - name:  FLUENT_ELASTICSEARCH_HOST
                  value: "elasticsearch.kube-logging.svc.cluster.local"
                - name:  FLUENT_ELASTICSEARCH_PORT
                  value: "9200"
                - name: FLUENT_ELASTICSEARCH_SCHEME
                  value: "http"
                - name: FLUENT_UID
                  value: "0"
              resources:
                limits:
                  memory: 512Mi
                requests:
                  cpu: 100m
                  memory: 200Mi
              volumeMounts:
              - name: varlog
                mountPath: /var/log
              - name: varlibdockercontainers
                mountPath: /var/lib/docker/containers
                readOnly: true
            terminationGracePeriodSeconds: 30
            volumes:
            - name: varlog
              hostPath:
                path: /var/log
            - name: varlibdockercontainers
              hostPath:
                path: /var/lib/docker/containers
      

      Once you've finished configuring the Fluentd DaemonSet, save and close the file.

      Now, roll out the DaemonSet using kubectl:

      • kubectl create -f fluentd.yaml

      You should see the following output:

      Output

      serviceaccount/fluentd created clusterrole.rbac.authorization.k8s.io/fluentd created clusterrolebinding.rbac.authorization.k8s.io/fluentd created daemonset.extensions/fluentd created

      Verify that your DaemonSet rolled out successfully using kubectl:

      • kubectl get ds --namespace=kube-logging

      You should see the following status output:

      Output

      NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE fluentd 3 3 3 3 3 <none> 58s

      This indicates that there are 3 fluentd Pods running, which corresponds to the number of nodes in our Kubernetes cluster.

      We can now check Kibana to verify that log data is being properly collected and shipped to Elasticsearch.

      With the kubectl port-forward still open, navigate to http://localhost:5601.

      Click on Discover in the left-hand navigation menu.

      You should see the following configuration window:

      Kibana Index Pattern Configuration

      This allows you to define the Elasticsearch indices you'd like to explore in Kibana. To learn more, consult Defining your index patterns in the official Kibana docs. For now, we'll just use the logstash-* wildcard pattern to capture all the log data in our Elasticsearch cluster. Enter logstash-* in the text box and click on Next step.

      You'll then be brought to the following page:

      Kibana Index Pattern Settings

      This allows you to configure which field Kibana will use to filter log data by time. In the dropdown, select the @timestamp field, and hit Create index pattern.

      Now, hit Discover in the left hand navigation menu.

      You should see a histogram graph and some recent log entries:

      Kibana Incoming Logs

      At this point you've successfully configured and rolled out the EFK stack on your Kubernetes cluster. To learn how to use Kibana to analyze your log data, consult the Kibana User Guide.

      In the next optional section, we'll deploy a simple counter Pod that prints numbers to stdout, and find its logs in Kibana.

      Step 5 (Optional) — Testing Container Logging

      To demonstrate a basic Kibana use case of exploring the latest logs for a given Pod, we'll deploy a minimal counter Pod that prints sequential numbers to stdout.

      Let's begin by creating the Pod. Open up a file called counter.yaml in your favorite editor:

      Then, paste in the following Pod spec:

      counter.yaml

      apiVersion: v1
      kind: Pod
      metadata:
        name: counter
      spec:
        containers:
        - name: count
          image: busybox
          args: [/bin/sh, -c,
                  'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']
      

      Save and close the file.

      This is a minimal Pod called counter that runs a while loop, printing numbers sequentially.

      Deploy the counter Pod using kubectl:

      • kubectl create -f counter.yaml

      Once the Pod has been created and is running, navigate back to your Kibana dashboard.

      From the Discover page, in the search bar enter kubernetes.pod_name:counter. This filters the log data for Pods named counter.

      You should then see a list of log entries for the counter Pod:

      Counter Logs in Kibana

      You can click into any of the log entries to see additional metadata like the container name, Kubernetes node, Namespace, and more.

      Conclusion

      In this guide we've demonstrated how to set up and configure Elasticsearch, Fluentd, and Kibana on a Kubernetes cluster. We've used a minimal logging architecture that consists of a single logging agent Pod running on each Kubernetes worker node.

      Before deploying this logging stack into your production Kubernetes cluster, it’s best to tune the resource requirements and limits as indicated throughout this guide. You may also want to use the X-Pack enabled image with built-in monitoring and security.

      The logging architecture we’ve used here consists of 3 Elasticsearch Pods, a single Kibana Pod (not load-balanced), and a set of Fluentd Pods rolled out as a DaemonSet. You may wish to scale this setup depending on your production use case. To learn more about scaling your Elasticsearch and Kibana stack, consult Scaling Elasticsearch.

      Kubernetes also allows for more complex logging agent architectures that may better suit your use case. To learn more, consult Logging Architecture from the Kubernetes docs.



      Source link

      How To Install Elasticsearch, Logstash, and Kibana (Elastic Stack) on Ubuntu 18.04


      The author selected the Internet Archive to receive a donation as part of the Write for DOnations program.

      Introduction

      The Elastic Stack — formerly known as the ELK Stack — is a collection of open-source software produced by Elastic which allows you to search, analyze, and visualize logs generated from any source in any format, a practice known as centralized logging. Centralized logging can be very useful when attempting to identify problems with your servers or applications, as it allows you to search through all of your logs in a single place. It’s also useful because it allows you to identify issues that span multiple servers by correlating their logs during a specific time frame.

      The Elastic Stack has four main components:

      • Elasticsearch: a distributed RESTful search engine which stores all of the collected data.
      • Logstash: the data processing component of the Elastic Stack which sends incoming data to Elasticsearch.
      • Kibana: a web interface for searching and visualizing logs.
      • Beats: lightweight, single-purpose data shippers that can send data from hundreds or thousands of machines to either Logstash or Elasticsearch.

      In this tutorial, you will install the Elastic Stack on an Ubuntu 18.04 server. You will learn how to install all of the components of the Elastic Stack — including Filebeat, a Beat used for forwarding and centralizing logs and files — and configure them to gather and visualize system logs. Additionally, because Kibana is normally only available on the localhost, we will use Nginx to proxy it so it will be accessible over a web browser. We will install all of these components on a single server, which we will refer to as our Elastic Stack server.

      Note: When installing the Elastic Stack, you must use the same version across the entire stack. In this tutorial we will install the latest versions of the entire stack which are, at the time of this writing, Elasticsearch 6.4.3, Kibana 6.4.3, Logstash 6.4.3, and Filebeat 6.4.3.

      Prerequisites

      To complete this tutorial, you will need the following:

      • An Ubuntu 18.04 server set up by following our Initial Server Setup Guide for Ubuntu 18.04, including a non-root user with sudo privileges and a firewall configured with ufw. The amount of CPU, RAM, and storage that your Elastic Stack server will require depends on the volume of logs that you intend to gather. For this tutorial, we will be using a VPS with the following specifications for our Elastic Stack server:

        • OS: Ubuntu 18.04
        • RAM: 4GB
        • CPU: 2
      • Java 8 — which is required by Elasticsearch and Logstash — installed on your server. Note that Java 9 is not supported. To install this, follow the “Installing the Oracle JDK” section of our guide on how to install Java 8 on Ubuntu 18.04.

      • Nginx installed on your server, which we will configure later in this guide as a reverse proxy for Kibana. Follow our guide on How to Install Nginx on Ubuntu 18.04 to set this up.

      Additionally, because the Elastic Stack is used to access valuable information about your server that you would not want unauthorized users to access, it’s important that you keep your server secure by installing a TLS/SSL certificate. This is optional but strongly encouraged.

      However, because you will ultimately make changes to your Nginx server block over the course of this guide, it would likely make more sense for you to complete the Let’s Encrypt on Ubuntu 18.04 guide at the end of this tutorial’s second step. With that in mind, if you plan to configure Let’s Encrypt on your server, you will need the following in place before doing so:

      Step 1 — Installing and Configuring Elasticsearch

      The Elastic Stack components are not available in Ubuntu’s default package repositories. They can, however, be installed with APT after adding Elastic’s package source list.

      All of the Elastic Stack’s packages are signed with the Elasticsearch signing key in order to protect your system from package spoofing. Packages which have been authenticated using the key will be considered trusted by your package manager. In this step, you will import the Elasticsearch public GPG key and add the Elastic package source list in order to install Elasticsearch.

      To begin, run the following command to import the Elasticsearch public GPG key into APT:

      • wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

      Next, add the Elastic source list to the sources.list.d directory, where APT will look for new sources:

      • echo "deb https://artifacts.elastic.co/packages/6.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list

      Next, update your package lists so APT will read the new Elastic source:

      Then install Elasticsearch with this command:

      • sudo apt install elasticsearch

      Once Elasticsearch is finished installing, use your preferred text editor to edit Elasticsearch's main configuration file, elasticsearch.yml. Here, we'll use nano:

      • sudo nano /etc/elasticsearch/elasticsearch.yml

      Note: Elasticsearch's configuration file is in YAML format, which means that indentation is very important! Be sure that you do not add any extra spaces as you edit this file.

      Elasticsearch listens for traffic from everywhere on port 9200. You will want to restrict outside access to your Elasticsearch instance to prevent outsiders from reading your data or shutting down your Elasticsearch cluster through the REST API. Find the line that specifies network.host, uncomment it, and replace its value with localhost so it looks like this:

      /etc/elasticsearch/elasticsearch.yml

      . . .
      network.host: localhost
      . . .
      

      Save and close elasticsearch.yml by pressing CTRL+X, followed by Y and then ENTER if you're using nano. Then, start the Elasticsearch service with systemctl:

      • sudo systemctl start elasticsearch

      Next, run the following command to enable Elasticsearch to start up every time your server boots:

      • sudo systemctl enable elasticsearch

      You can test whether your Elasticsearch service is running by sending an HTTP request:

      • curl -X GET "localhost:9200"

      You will see a response showing some basic information about your local node, similar to this:

      Output

      { "name" : "ZlJ0k2h", "cluster_name" : "elasticsearch", "cluster_uuid" : "beJf9oPSTbecP7_i8pRVCw", "version" : { "number" : "6.4.2", "build_flavor" : "default", "build_type" : "deb", "build_hash" : "04711c2", "build_date" : "2018-09-26T13:34:09.098244Z", "build_snapshot" : false, "lucene_version" : "7.4.0", "minimum_wire_compatibility_version" : "5.6.0", "minimum_index_compatibility_version" : "5.0.0" }, "tagline" : "You Know, for Search" }

      Now that Elasticsearch is up and running, let's install Kibana, the next component of the Elastic Stack.

      Step 2 — Installing and Configuring the Kibana Dashboard

      According to the official documentation, you should install Kibana only after installing Elasticsearch. Installing in this order ensures that the components each product depends on are correctly in place.

      Because you've already added the Elastic package source in the previous step, you can just install the remaining components of the Elastic Stack using apt:

      Then enable and start the Kibana service:

      • sudo systemctl enable kibana
      • sudo systemctl start kibana

      Because Kibana is configured to only listen on localhost, we must set up a reverse proxy to allow external access to it. We will use Nginx for this purpose, which should already be installed on your server.

      First, use the openssl command to create an administrative Kibana user which you'll use to access the Kibana web interface. As an example we will name this account kibanaadmin, but to ensure greater security we recommend that you choose a non-standard name for your user that would be difficult to guess.

      The following command will create the administrative Kibana user and password, and store them in the htpasswd.users file. You will configure Nginx to require this username and password and read this file momentarily:

      • echo "kibanaadmin:`openssl passwd -apr1`" | sudo tee -a /etc/nginx/htpasswd.users

      Enter and confirm a password at the prompt. Remember or take note of this login, as you will need it to access the Kibana web interface.

      Next, we will create an Nginx server block file. As an example, we will refer to this file as example.com, although you may find it helpful to give yours a more descriptive name. For instance, if you have a FQDN and DNS records set up for this server, you could name this file after your FQDN:

      • sudo nano /etc/nginx/sites-available/example.com

      Add the following code block into the file, being sure to update example.com to match your server's FQDN or public IP address. This code configures Nginx to direct your server's HTTP traffic to the Kibana application, which is listening on localhost:5601. Additionally, it configures Nginx to read the htpasswd.users file and require basic authentication.

      Note that if you followed the prerequisite Nginx tutorial through to the end, you may have already created this file and populated it with some content. In that case, delete all the existing content in the file before adding the following:

      example.com'>/etc/nginx/sites-available/example.com

      server {
          listen 80;
      
          server_name example.com;
      
          auth_basic "Restricted Access";
          auth_basic_user_file /etc/nginx/htpasswd.users;
      
          location / {
              proxy_pass http://localhost:5601;
              proxy_http_version 1.1;
              proxy_set_header Upgrade $http_upgrade;
              proxy_set_header Connection 'upgrade';
              proxy_set_header Host $host;
              proxy_cache_bypass $http_upgrade;
          }
      }
      

      When you're finished, save and close the file.

      Next, enable the new configuration by creating a symbolic link to the sites-enabled directory. If you already created a server block file with the same name in the Nginx prerequisite, you do not need to run this command:

      • sudo ln -s /etc/nginx/sites-available/example.com /etc/nginx/sites-enabled/example.com

      Then check the configuration for syntax errors:

      If any errors are reported in your output, go back and double check that the content you placed in your configuration file was added correctly. Once you see syntax is ok in the output, go ahead and restart the Nginx service:

      • sudo systemctl restart nginx

      If you followed the initial server setup guide, you should have a UFW firewall enabled. To allow connections to Nginx, we can adjust the rules by typing:

      • sudo ufw allow 'Nginx Full'

      Note: If you followed the prerequisite Nginx tutorial, you may have created a UFW rule allowing the Nginx HTTP profile through the firewall. Because the Nginx Full profile allows both HTTP and HTTPS traffic through the firewall, you can safely delete the rule you created in the prerequisite tutorial. Do so with the following command:

      • sudo ufw delete allow 'Nginx HTTP'

      Kibana is now accessible via your FQDN or the public IP address of your Elastic Stack server. You can check the Kibana server's status page by navigating to the following address and entering your login credentials when prompted:

      http://your_server_ip/status
      

      This status page displays information about the server’s resource usage and lists the installed plugins.

      |Kibana status page

      Note: As mentioned in the Prerequisites section, it is recommended that you enable SSL/TLS on your server. You can follow this tutorial now to obtain a free SSL certificate for Nginx on Ubuntu 18.04. After obtaining your SSL/TLS certificates, you can come back and complete this tutorial.

      Now that the Kibana dashboard is configured, let's install the next component: Logstash.

      Step 3 — Installing and Configuring Logstash

      Although it's possible for Beats to send data directly to the Elasticsearch database, we recommend using Logstash to process the data. This will allow you to collect data from different sources, transform it into a common format, and export it to another database.

      Install Logstash with this command:

      • sudo apt install logstash

      After installing Logstash, you can move on to configuring it. Logstash's configuration files are written in the JSON format and reside in the /etc/logstash/conf.d directory. As you configure it, it's helpful to think of Logstash as a pipeline which takes in data at one end, processes it in one way or another, and sends it out to its destination (in this case, the destination being Elasticsearch). A Logstash pipeline has two required elements, input and output, and one optional element, filter. The input plugins consume data from a source, the filter plugins process the data, and the output plugins write the data to a destination.

      Logstash pipeline

      Create a configuration file called 02-beats-input.conf where you will set up your Filebeat input:

      • sudo nano /etc/logstash/conf.d/02-beats-input.conf

      Insert the following input configuration. This specifies a beats input that will listen on TCP port 5044.

      /etc/logstash/conf.d/02-beats-input.conf

      input {
        beats {
          port => 5044
        }
      }
      

      Save and close the file. Next, create a configuration file called 10-syslog-filter.conf, where we will add a filter for system logs, also known as syslogs:

      • sudo nano /etc/logstash/conf.d/10-syslog-filter.conf

      Insert the following syslog filter configuration. This example system logs configuration was taken from official Elastic documentation. This filter is used to parse incoming system logs to make them structured and usable by the predefined Kibana dashboards:

      /etc/logstash/conf.d/10-syslog-filter.conf

      filter {
        if [fileset][module] == "system" {
          if [fileset][name] == "auth" {
            grok {
              match => { "message" => ["%{SYSLOGTIMESTAMP:[system][auth][timestamp]} %{SYSLOGHOST:[system][auth][hostname]} sshd(?:[%{POSINT:[system][auth][pid]}])?: %{DATA:[system][auth][ssh][event]} %{DATA:[system][auth][ssh][method]} for (invalid user )?%{DATA:[system][auth][user]} from %{IPORHOST:[system][auth][ssh][ip]} port %{NUMBER:[system][auth][ssh][port]} ssh2(: %{GREEDYDATA:[system][auth][ssh][signature]})?",
                        "%{SYSLOGTIMESTAMP:[system][auth][timestamp]} %{SYSLOGHOST:[system][auth][hostname]} sshd(?:[%{POSINT:[system][auth][pid]}])?: %{DATA:[system][auth][ssh][event]} user %{DATA:[system][auth][user]} from %{IPORHOST:[system][auth][ssh][ip]}",
                        "%{SYSLOGTIMESTAMP:[system][auth][timestamp]} %{SYSLOGHOST:[system][auth][hostname]} sshd(?:[%{POSINT:[system][auth][pid]}])?: Did not receive identification string from %{IPORHOST:[system][auth][ssh][dropped_ip]}",
                        "%{SYSLOGTIMESTAMP:[system][auth][timestamp]} %{SYSLOGHOST:[system][auth][hostname]} sudo(?:[%{POSINT:[system][auth][pid]}])?: s*%{DATA:[system][auth][user]} 🙁 %{DATA:[system][auth][sudo][error]} ;)? TTY=%{DATA:[system][auth][sudo][tty]} ; PWD=%{DATA:[system][auth][sudo][pwd]} ; USER=%{DATA:[system][auth][sudo][user]} ; COMMAND=%{GREEDYDATA:[system][auth][sudo][command]}",
                        "%{SYSLOGTIMESTAMP:[system][auth][timestamp]} %{SYSLOGHOST:[system][auth][hostname]} groupadd(?:[%{POSINT:[system][auth][pid]}])?: new group: name=%{DATA:system.auth.groupadd.name}, GID=%{NUMBER:system.auth.groupadd.gid}",
                        "%{SYSLOGTIMESTAMP:[system][auth][timestamp]} %{SYSLOGHOST:[system][auth][hostname]} useradd(?:[%{POSINT:[system][auth][pid]}])?: new user: name=%{DATA:[system][auth][user][add][name]}, UID=%{NUMBER:[system][auth][user][add][uid]}, GID=%{NUMBER:[system][auth][user][add][gid]}, home=%{DATA:[system][auth][user][add][home]}, shell=%{DATA:[system][auth][user][add][shell]}$",
                        "%{SYSLOGTIMESTAMP:[system][auth][timestamp]} %{SYSLOGHOST:[system][auth][hostname]} %{DATA:[system][auth][program]}(?:[%{POSINT:[system][auth][pid]}])?: %{GREEDYMULTILINE:[system][auth][message]}"] }
              pattern_definitions => {
                "GREEDYMULTILINE"=> "(.|n)*"
              }
              remove_field => "message"
            }
            date {
              match => [ "[system][auth][timestamp]", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
            }
            geoip {
              source => "[system][auth][ssh][ip]"
              target => "[system][auth][ssh][geoip]"
            }
          }
          else if [fileset][name] == "syslog" {
            grok {
              match => { "message" => ["%{SYSLOGTIMESTAMP:[system][syslog][timestamp]} %{SYSLOGHOST:[system][syslog][hostname]} %{DATA:[system][syslog][program]}(?:[%{POSINT:[system][syslog][pid]}])?: %{GREEDYMULTILINE:[system][syslog][message]}"] }
              pattern_definitions => { "GREEDYMULTILINE" => "(.|n)*" }
              remove_field => "message"
            }
            date {
              match => [ "[system][syslog][timestamp]", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
            }
          }
        }
      }
      

      Save and close the file when finished.

      Lastly, create a configuration file called 30-elasticsearch-output.conf:

      • sudo nano /etc/logstash/conf.d/30-elasticsearch-output.conf

      Insert the following output configuration. Essentially, this output configures Logstash to store the Beats data in Elasticsearch, which is running at localhost:9200, in an index named after the Beat used. The Beat used in this tutorial is Filebeat:

      /etc/logstash/conf.d/30-elasticsearch-output.conf

      output {
        elasticsearch {
          hosts => ["localhost:9200"]
          manage_template => false
          index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
        }
      }
      

      Save and close the file.

      If you want to add filters for other applications that use the Filebeat input, be sure to name the files so they're sorted between the input and the output configuration, meaning that the file names should begin with a two-digit number between 02 and 30.

      Test your Logstash configuration with this command:

      • sudo -u logstash /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t

      If there are no syntax errors, your output will display Configruation OK after a few seconds. If you don't see this in your output, check for any errors that appear in your output and update your configuration to correct them.

      If your configuration test is successful, start and enable Logstash to put the configuration changes into effect:

      • sudo systemctl start logstash
      • sudo systemctl enable logstash

      Now that Logstash is running correctly and is fully configured, let's install Filebeat.

      Step 4 — Installing and Configuring Filebeat

      The Elastic Stack uses several lightweight data shippers called Beats to collect data from various sources and transport them to Logstash or Elasticsearch. Here are the Beats that are currently available from Elastic:

      • Filebeat: collects and ships log files.
      • Metricbeat: collects metrics from your systems and services.
      • Packetbeat: collects and analyzes network data.
      • Winlogbeat: collects Windows event logs.
      • Auditbeat: collects Linux audit framework data and monitors file integrity.
      • Heartbeat: monitors services for their availability with active probing.

      In this tutorial we will use Filebeat to forward local logs to our Elastic Stack.

      Install Filebeat using apt:

      • sudo apt install filebeat

      Next, configure Filebeat to connect to Logstash. Here, we will modify the example configuration file that comes with Filebeat.

      Open the Filebeat configuration file:

      • sudo nano /etc/filebeat/filebeat.yml

      Note: As with Elasticsearch, Filebeat's configuration file is in YAML format. This means that proper indentation is crucial, so be sure to use the same number of spaces that are indicated in these instructions.

      Filebeat supports numerous outputs, but you’ll usually only send events directly to Elasticsearch or to Logstash for additional processing. In this tutorial, we'll use Logstash to perform additional processing on the data collected by Filebeat. Filebeat will not need to send any data directly to Elasticsearch, so let's disable that output. To do so, find the output.elasticsearch section and comment out the following lines by preceding them with a #:

      /etc/filebeat/filebeat.yml

      ...
      #output.elasticsearch:
        # Array of hosts to connect to.
        #hosts: ["localhost:9200"]
      ...
      

      Then, configure the output.logstash section. Uncomment the lines output.logstash: and hosts: ["localhost:5044"] by removing the #. This will configure Filebeat to connect to Logstash on your Elastic Stack server at port 5044, the port for which we specified a Logstash input earlier:

      /etc/filebeat/filebeat.yml

      output.logstash:
        # The Logstash hosts
        hosts: ["localhost:5044"]
      

      Save and close the file.

      The functionality of Filebeat can be extended with Filebeat modules. In this tutorial we will use the system module, which collects and parses logs created by the system logging service of common Linux distributions.

      Let's enable it:

      • sudo filebeat modules enable system

      You can see a list of enabled and disabled modules by running:

      • sudo filebeat modules list

      You will see a list similar to the following:

      Output

      Enabled: system Disabled: apache2 auditd elasticsearch icinga iis kafka kibana logstash mongodb mysql nginx osquery postgresql redis traefik

      By default, Filebeat is configured to use default paths for the syslog and authorization logs. In the case of this tutorial, you do not need to change anything in the configuration. You can see the parameters of the module in the /etc/filebeat/modules.d/system.yml configuration file.

      Next, load the index template into Elasticsearch. An Elasticsearch index is a collection of documents that have similar characteristics. Indexes are identified with a name, which is used to refer to the index when performing various operations within it. The index template will be automatically applied when a new index is created.

      To load the template, use the following command:

      • sudo filebeat setup --template -E output.logstash.enabled=false -E 'output.elasticsearch.hosts=["localhost:9200"]'

      Output

      Loaded index template

      Filebeat comes packaged with sample Kibana dashboards that allow you to visualize Filebeat data in Kibana. Before you can use the dashboards, you need to create the index pattern and load the dashboards into Kibana.

      As the dashboards load, Filebeat connects to Elasticsearch to check version information. To load dashboards when Logstash is enabled, you need to disable the Logstash output and enable Elasticsearch output:

      • sudo filebeat setup -e -E output.logstash.enabled=false -E output.elasticsearch.hosts=['localhost:9200'] -E setup.kibana.host=localhost:5601

      You will see output that looks like this:

      Output

      2018-09-10T08:39:15.844Z INFO instance/beat.go:273 Setup Beat: filebeat; Version: 6.4.2 2018-09-10T08:39:15.845Z INFO elasticsearch/client.go:163 Elasticsearch url: http://localhost:9200 2018-09-10T08:39:15.845Z INFO pipeline/module.go:98 Beat name: elk 2018-09-10T08:39:15.845Z INFO elasticsearch/client.go:163 Elasticsearch url: http://localhost:9200 2018-09-10T08:39:15.849Z INFO elasticsearch/client.go:708 Connected to Elasticsearch version 6.4.2 2018-09-10T08:39:15.856Z INFO template/load.go:129 Template already exists and will not be overwritten. Loaded index template Loading dashboards (Kibana must be running and reachable) 2018-09-10T08:39:15.857Z INFO elasticsearch/client.go:163 Elasticsearch url: http://localhost:9200 2018-09-10T08:39:15.865Z INFO elasticsearch/client.go:708 Connected to Elasticsearch version 6.4.2 2018-09-10T08:39:15.865Z INFO kibana/client.go:113 Kibana url: http://localhost:5601 2018-09-10T08:39:45.357Z INFO instance/beat.go:659 Kibana dashboards successfully loaded. Loaded dashboards 2018-09-10T08:39:45.358Z INFO elasticsearch/client.go:163 Elasticsearch url: http://localhost:9200 2018-09-10T08:39:45.361Z INFO elasticsearch/client.go:708 Connected to Elasticsearch version 6.4.2 2018-09-10T08:39:45.361Z INFO kibana/client.go:113 Kibana url: http://localhost:5601 2018-09-10T08:39:45.455Z WARN fileset/modules.go:388 X-Pack Machine Learning is not enabled Loaded machine learning job configurations

      Now you can start and enable Filebeat:

      • sudo systemctl start filebeat
      • sudo systemctl enable filebeat

      If you've set up your Elastic Stack correctly, Filebeat will begin shipping your syslog and authorization logs to Logstash, which will then load that data into Elasticsearch.

      To verify that Elasticsearch is indeed receiving this data, query the Filebeat index with this command:

      • curl -XGET 'http://localhost:9200/filebeat-*/_search?pretty'

      You will see an output that looks similar to this:

      Output

      ... { "took" : 32, "timed_out" : false, "_shards" : { "total" : 3, "successful" : 3, "skipped" : 0, "failed" : 0 }, "hits" : { "total" : 1641, "max_score" : 1.0, "hits" : [ { "_index" : "filebeat-6.4.2-2018.10.10", "_type" : "doc", "_id" : "H_bZ62UBB4D0uxFRu_h3", "_score" : 1.0, "_source" : { "@version" : "1", "message" : "Oct 10 06:22:36 elk systemd[1]: Reached target Local File Systems (Pre).", "@timestamp" : "2018-10-10T08:43:56.969Z", "host" : { "name" : "elk" }, "source" : "/var/log/syslog", "input" : { "type" : "log" }, "tags" : [ "beats_input_codec_plain_applied" ], "offset" : 296, "prospector" : { "type" : "log" }, "beat" : { "version" : "6.4.2", "hostname" : "elk", "name" : "elk" } } }, ...

      If your output shows 0 total hits, Elasticsearch is not loading any logs under the index you searched for, and you will need to review your setup for errors. If you received the expected output, continue to the next step, in which we will see how to navigate through some of Kibana's dashboards.

      Step 5 — Exploring Kibana Dashboards

      Let's look at Kibana, the web interface that we installed earlier.

      In a web browser, go to the FQDN or public IP address of your Elastic Stack server. After entering the login credentials you defined in Step 2, you will see the Kibana homepage:

      Kibana Homepage

      Click the Discover link in the left-hand navigation bar. On the Discover page, select the predefined filebeat-* index pattern to see Filebeat data. By default, this will show you all of the log data over the last 15 minutes. You will see a histogram with log events, and some log messages below:

      Discover page

      Here, you can search and browse through your logs and also customize your dashboard. At this point, though, there won't be much in there because you are only gathering syslogs from your Elastic Stack server.

      Use the left-hand panel to navigate to the Dashboard page and search for the Filebeat System dashboards. Once there, you can search for the sample dashboards that come with Filebeat's system module.

      For example, you can view detailed stats based on your syslog messages:

      Syslog Dashboard

      You can also view which users have used the sudo command and when:

      Sudo Dashboard

      Kibana has many other features, such as graphing and filtering, so feel free to explore.

      Conclusion

      In this tutorial, you've learned how to install and configure the Elastic Stack to collect and analyze system logs. Remember that you can send just about any type of log or indexed data to Logstash using Beats, but the data becomes even more useful if it is parsed and structured with a Logstash filter, as this transforms the data into a consistent format that can be read easily by Elasticsearch.



      Source link