
      How To Set Up Jupyter Notebook with Python 3 on Ubuntu 18.04


      Introduction

      An open-source web application, Jupyter Notebook lets you create and share interactive code, visualizations, and more. This tool can be used with several programming languages, including Python, Julia, R, Haskell, and Ruby. It is often used for working with data, statistical modeling, and machine learning.

      This tutorial will walk you through setting up Jupyter Notebook to run from an Ubuntu 18.04 server, as well as teach you how to connect to and use the notebook. Jupyter Notebooks (or simply Notebooks) are documents produced by the Jupyter Notebook app which contain both computer code and rich text elements (paragraphs, equations, figures, links, etc.) that aid in presenting and sharing reproducible research.

      By the end of this guide, you will be able to run Python 3 code using Jupyter Notebook running on a remote server.

      Prerequisites

      In order to complete this guide, you should have a fresh Ubuntu 18.04 server instance with a basic firewall and a non-root user with sudo privileges configured. You can learn how to set this up by running through our initial server setup tutorial.

      Step 1 — Set Up Python

      To begin the process, we’ll install the dependencies we need for our Python programming environment from the Ubuntu repositories. Ubuntu 18.04 comes preinstalled with Python 3.6. We will use the Python package manager pip to install additional components a bit later.

      We first need to update the local apt package index and then download and install the packages:
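
      • sudo apt update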

      Next, install pip and the Python header files, which are used by some of Jupyter’s dependencies:

      • sudo apt install python3-pip python3-dev

      We can now move on to setting up a Python virtual environment into which we’ll install Jupyter.

      Step 2 — Create a Python Virtual Environment for Jupyter

      Now that we have Python 3, its header files, and pip ready to go, we can create a Python virtual environment to manage our projects. We will install Jupyter into this virtual environment.

      To do this, we first need access to the virtualenv command which we can install with pip.

      Upgrade pip and install the package by typing:

      • sudo -H pip3 install --upgrade pip
      • sudo -H pip3 install virtualenv

      The -H flag ensures that the security policy sets the home environment variable to the home directory of the target user.

      With virtualenv installed, we can start forming our environment. Create and move into a directory where we can keep our project files. We’ll call this my_project_dir, but you should use a name that is meaningful for you and what you’re working on.

      • mkdir ~/my_project_dir
      • cd ~/my_project_dir

      Within the project directory, we’ll create a Python virtual environment. For the purpose of this tutorial, we’ll call it my_project_env but you should call it something that is relevant to your project.

      • virtualenv my_project_env

      This will create a directory called my_project_env within your my_project_dir directory. Inside, it will install a local version of Python and a local version of pip. We can use this to install and configure an isolated Python environment for Jupyter.

      Before we install Jupyter, we need to activate the virtual environment. You can do that by typing:

      • source my_project_env/bin/activate

      Your prompt should change to indicate that you are now operating within a Python virtual environment. It will look something like this: (my_project_env)user@host:~/my_project_dir$.

      You’re now ready to install Jupyter into this virtual environment.

      Step 3 — Install Jupyter

      With your virtual environment active, install Jupyter with the local instance of pip.

      Note: When the virtual environment is activated (when your prompt has (my_project_env) preceding it), use pip instead of pip3, even if you are using Python 3. The virtual environment's copy of the tool is always named pip, regardless of the Python version.
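
      With that in mind, install Jupyter by typing:

      • pip install jupyter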

      At this point, you’ve successfully installed all the software needed to run Jupyter. We can now start the Notebook server.

      Step 4 — Run Jupyter Notebook

      You now have everything you need to run Jupyter Notebook! To run it, execute the following command:
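
      • jupyter notebook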

      A log of the activities of the Jupyter Notebook will be printed to the terminal. When you run Jupyter Notebook, it runs on a specific port number. The first Notebook you run will usually use port 8888. To check the specific port number Jupyter Notebook is running on, refer to the output of the command used to start it:

      Output

      [I 21:23:21.198 NotebookApp] Writing notebook server cookie secret to /run/user/1001/jupyter/notebook_cookie_secret
      [I 21:23:21.361 NotebookApp] Serving notebooks from local directory: /home/sammy/my_project_dir
      [I 21:23:21.361 NotebookApp] The Jupyter Notebook is running at:
      [I 21:23:21.361 NotebookApp] http://localhost:8888/?token=1fefa6ab49a498a3f37c959404f7baf16b9a2eda3eaa6d72
      [I 21:23:21.361 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
      [W 21:23:21.361 NotebookApp] No web browser found: could not locate runnable browser.
      [C 21:23:21.361 NotebookApp] Copy/paste this URL into your browser when you connect for the first time, to login with a token: http://localhost:8888/?token=1fefa6ab49a498a3f37c959404f7baf16b9a2eda3eaa6d72

      If you are running Jupyter Notebook on a local computer (not on a server), you can navigate to the displayed URL to connect to Jupyter Notebook. If you are running Jupyter Notebook on a server, you will need to connect to the server using SSH tunneling as outlined in the next section.

      At this point, you can keep the SSH connection open and leave Jupyter Notebook running, or you can exit the app and re-run it once you set up SSH tunneling. For now, stop the Jupyter Notebook process; we will run it again once SSH tunneling is in place. To stop it, press CTRL+C, type Y, and then press ENTER to confirm. The following output will be displayed:

      Output

      [C 21:28:28.512 NotebookApp] Shutdown confirmed
      [I 21:28:28.512 NotebookApp] Shutting down 0 kernels

      We’ll now set up an SSH tunnel so that we can access the Notebook.

      Step 5 — Connect to the Server Using SSH Tunneling

      In this section we will learn how to connect to the Jupyter Notebook web interface using SSH tunneling. Since Jupyter Notebook will run on a specific port on the server (such as :8888, :8889 etc.), SSH tunneling enables you to connect to the server’s port securely.

      The next two subsections describe how to create an SSH tunnel from 1) a Mac or Linux, and 2) Windows. Please refer to the subsection for your local computer.

      SSH Tunneling with a Mac or Linux

      If you are using a Mac or Linux, the steps for creating an SSH tunnel are similar to using SSH to log in to your remote server, except that there are additional parameters in the ssh command. This subsection will outline the additional parameters needed in the ssh command to tunnel successfully.

      SSH tunneling can be done by running the following SSH command in a new local terminal window:

      • ssh -L 8888:localhost:8888 your_server_username@your_server_ip

      The ssh command opens an SSH connection, and the -L flag specifies that the given port on the local (client) host should be forwarded to the given host and port on the remote side (the server). This means that whatever is running on the second port number (e.g. 8888) on the server will appear on the first port number (e.g. 8888) on your local computer.

      Optionally change port 8888 to one of your choosing to avoid using a port already in use by another process.

      your_server_username is the username (e.g. sammy) you created on the server, and your_server_ip is the IP address of your server.

      For example, for the username sammy and the server address 203.0.113.0, the command would be:

      • ssh -L 8888:localhost:8888 sammy@203.0.113.0

      If no error shows up after running the ssh -L command, you can move into your programming environment and run Jupyter Notebook:
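
      • cd ~/my_project_dir
      • source my_project_env/bin/activate
      • jupyter notebook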

      You’ll receive output with a URL. From a web browser on your local machine, open the Jupyter Notebook web interface with the URL that starts with http://localhost:8888. Ensure that the token number is included, or enter the token number string when prompted at http://localhost:8888.

      SSH Tunneling with Windows and PuTTY

      If you are using Windows, you can create an SSH tunnel using PuTTY.

      First, enter the server URL or IP address as the hostname as shown:

      Set Hostname for SSH Tunnel

      Next, click SSH on the bottom of the left pane to expand the menu, and then click Tunnels. Enter the local port number you want to use to access Jupyter on your local machine. Choose 8000 or greater to avoid ports used by other services, and set the destination as localhost:8888 where :8888 is the number of the port that Jupyter Notebook is running on.

      Now click the Add button, and the ports should appear in the Forwarded ports list:

      Forwarded ports list

      Finally, click the Open button to connect to the server via SSH and tunnel the desired ports. Navigate to http://localhost:8000 (or whatever port you chose) in a web browser to connect to Jupyter Notebook running on the server. Ensure that the token number is included, or enter the token number string when prompted at http://localhost:8000.

      Step 6 — Using Jupyter Notebook

      This section goes over the basics of using Jupyter Notebook. If you don’t currently have Jupyter Notebook running, start it with the jupyter notebook command.

      You should now be connected to it using a web browser. Jupyter Notebook is a very powerful tool with many features. This section will outline a few of the basic features to get you started using the Notebook. Jupyter Notebook will show all of the files and folders in the directory it is run from, so when you’re working on a project make sure to start it from the project directory.

      To create a new Notebook file, select New > Python 3 from the top right pull-down menu:

      Create a new Python 3 notebook

      This will open a Notebook. We can now run Python code in the cell or change the cell to Markdown. For example, change the first cell to accept Markdown by clicking Cell > Cell Type > Markdown from the top navigation bar. We can then write notes using Markdown and even include equations written in LaTeX by placing them between $$ symbols. Type the following into the cell after changing it to Markdown:

      # First Equation
      
      Let us now implement the following equation:
      $$ y = x^2$$
      
      where $x = 2$
      

      To turn the Markdown into rich text, press CTRL+ENTER; the following should be the result:

      results of markdown

      You can use the markdown cells to make notes and document your code. Let's implement that equation and print the result. Click on the top cell, then press ALT+ENTER to add a cell below it. Enter the following code in the new cell.

      x = 2
      y = x**2
      print(y)
      

      To run the code, press CTRL+ENTER. You’ll receive the following results:

      first equation results

      You now have the ability to import modules and use the Notebook as you would with any other Python development environment!

      Conclusion

      Congratulations! You should now be able to write reproducible Python code and notes in Markdown using Jupyter Notebook. To get a quick tour of Jupyter Notebook from within the interface, select Help > User Interface Tour from the top navigation menu.

      From here, you can begin a data analysis and visualization project by reading Data Analysis and Visualization with pandas and Jupyter Notebook in Python 3.

      If you’re interested in digging in more, you can read our series on Time Series Visualization and Forecasting.




      How To Set Up a Remote Database to Optimize Site Performance with MySQL on Ubuntu 18.04


      Introduction

      As your application or website grows, there may come a point where you’ve outgrown your current server setup. If you are hosting your web server and database backend on the same machine, it may be a good idea to separate these two functions so that each can operate on its own hardware and share the load of responding to your visitors’ requests.

      In this guide, we’ll go over how to configure a remote MySQL database server that your web application can connect to. We will use WordPress as an example in order to have something to work with, but the technique is widely applicable to any application backed by MySQL.

      Prerequisites

      Before beginning this tutorial, you will need:

      • Two Ubuntu 18.04 servers. Each should have a non-root user with sudo privileges and a UFW firewall enabled, as described in our Initial Server Setup with Ubuntu 18.04 tutorial. One of these servers will host your MySQL backend, and throughout this guide we will refer to it as the database server. The other will connect to your database server remotely and act as your web server; likewise, we will refer to it as the web server over the course of this guide.
      • Nginx and PHP installed on your web server. Our tutorial How To Install Linux, Nginx, MySQL, PHP (LEMP stack) in Ubuntu 18.04 will guide you through the process, but note that you should skip Step 2 of this tutorial, which focuses on installing MySQL, as you will install MySQL on your database server.
      • MySQL installed on your database server. Follow “How To Install MySQL on Ubuntu 18.04” to set this up.
      • Optionally (but strongly recommended), TLS/SSL certificates from Let’s Encrypt installed on your web server. You’ll need to purchase a domain name and have DNS records set up for your server, but the certificates themselves are free. Our guide How To Secure Nginx with Let’s Encrypt on Ubuntu 18.04 will show you how to obtain these certificates.

      Step 1 — Configuring MySQL to Listen for Remote Connections

      Having one’s data stored on a separate server is a good way to expand gracefully after hitting the performance ceiling of a one-machine configuration. It also provides the basic structure necessary to load balance and expand your infrastructure even more at a later time. After installing MySQL by following the prerequisite tutorial, you’ll need to change some configuration values to allow connections from other computers.

      Most of the MySQL server’s configuration changes can be made in the mysqld.cnf file, which is stored in the /etc/mysql/mysql.conf.d/ directory by default. Open up this file with root privileges in your preferred editor. Here, we’ll use nano:

      • sudo nano /etc/mysql/mysql.conf.d/mysqld.cnf

      This file is divided into sections denoted by labels in square brackets ([ and ]). Find the section labeled mysqld:

      /etc/mysql/mysql.conf.d/mysqld.cnf

      . . .
      [mysqld]
      . . .
      

      Within this section, look for a parameter called bind-address. This tells the database software which network address to listen for connections on.

      By default, this is set to 127.0.0.1, meaning that MySQL is configured to only look for local connections. You need to change this to reference an external IP address where your server can be reached.

      If both of your servers are in a datacenter with private networking capabilities, use your database server’s private network IP. Otherwise, you can use its public IP address:

      /etc/mysql/mysql.conf.d/mysqld.cnf

      [mysqld]
      . . .
      bind-address = db_server_ip
      

      Because you’ll connect to your database over the internet, it’s recommended that you require encrypted connections to keep your data secure. If you don’t encrypt your MySQL connection, anybody on the network could sniff sensitive information between your web and database servers. To encrypt MySQL connections, add the following line after the bind-address line you just updated:

      /etc/mysql/mysql.conf.d/mysqld.cnf

      [mysqld]
      . . .
      require_secure_transport = on
      . . .
      

      Save and close the file when you are finished. If you’re using nano, do this by pressing CTRL+X, Y, and then ENTER.

      For SSL connections to work, you will need to create some keys and certificates. MySQL comes with a command that will automatically set these up. Run the following command, which creates the necessary files. It also makes them readable by the MySQL server by specifying the UID of the mysql user:

      • sudo mysql_ssl_rsa_setup --uid=mysql

      To force MySQL to update its configuration and read the new SSL information, restart the database:

      • sudo systemctl restart mysql

      To confirm that the server is now listening on the external interface, run the following netstat command:

      • sudo netstat -plunt | grep mysqld

      Output

      tcp 0 0 db_server_ip:3306 0.0.0.0:* LISTEN 27328/mysqld

      netstat prints statistics about your server’s networking system. This output shows us that a process called mysqld is attached to the db_server_ip at port 3306, the standard MySQL port, confirming that the server is listening on the appropriate interface.

      Next, open up that port on the firewall to allow traffic through:
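
      • sudo ufw allow 3306

      This rule opens MySQL's default port, 3306. If you prefer, you can tighten it to allow traffic on that port only from your web server's IP address.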

      Those are all the configuration changes you need to make to MySQL. Next, we will go over how to set up a database and some user profiles, one of which you will use to access the server remotely.

      Step 2 — Setting Up a WordPress Database and Remote Credentials

      Even though MySQL itself is now listening on an external IP address, there are currently no remote-enabled users or databases configured. Let's create a database for WordPress, and a pair of users that can access it.

      Begin by connecting to MySQL as the root MySQL user:
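
      • sudo mysql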

      Note: If you have password authentication enabled, as described in Step 3 of the prerequisite MySQL tutorial, you will instead need to use the following command to access the MySQL shell:
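
      • mysql -u root -p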

      After running this command, you will be asked for your MySQL root password and, after entering it, you'll be given a new mysql> prompt.

      From the MySQL prompt, create a database that WordPress will use. It may be helpful to give this database a recognizable name so that you can easily identify it later on. Here, we will name it wordpress:

      • CREATE DATABASE wordpress;

      Now that you've created your database, you next need to create a pair of users. We will create a local-only user as well as a remote user tied to the web server’s IP address.

      First, create your local user, wordpressuser, and make this account only match local connection attempts by using localhost in the declaration:

      • CREATE USER 'wordpressuser'@'localhost' IDENTIFIED BY 'password';

      Then grant this account full access to the wordpress database:

      • GRANT ALL PRIVILEGES ON wordpress.* TO 'wordpressuser'@'localhost';

      This user can now do any operation on the database for WordPress, but this account cannot be used remotely, as it only matches connections from the local machine. With this in mind, create a companion account that will match connections exclusively from your web server. For this, you'll need your web server's IP address.

      Please note that you must use an IP address that utilizes the same network that you configured in your mysqld.cnf file. This means that if you specified a private networking IP in the mysqld.cnf file, you'll need to include the private IP of your web server in the following two commands. If you configured MySQL to use the public internet, you should match that with the web server's public IP address.

      • CREATE USER 'wordpressuser'@'web_server_ip' IDENTIFIED BY 'password';

      After creating your remote account, give it the same privileges as your local user:

      • GRANT ALL PRIVILEGES ON wordpress.* TO 'wordpressuser'@'web_server_ip';

      Lastly, flush the privileges so MySQL knows to begin using them:
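
      • FLUSH PRIVILEGES;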

      Then exit the MySQL prompt by typing:
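
      • exit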

      Now that you've set up a new database and a remote-enabled user, you can move on to testing whether you're able to connect to the database from your web server.

      Step 3 — Testing Remote and Local Connections

      Before continuing, it's best to verify that you can connect to your database from both the local machine — your database server — and from your web server with each of the wordpressuser accounts.

      First, test the local connection from your database server by attempting to log in with your new account:

      • mysql -u wordpressuser -p

      When prompted, enter the password that you set up for this account.

      If you are given a MySQL prompt, then the local connection was successful. You can exit out again by typing:
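
      • exit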

      Next, log into your web server to test remote connections:
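
      • ssh your_username@web_server_ip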

      You'll need to install some client tools for MySQL on your web server in order to access the remote database. First, update your local package cache if you haven't done so recently:
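
      • sudo apt update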

      Then install the MySQL client utilities:

      • sudo apt install mysql-client

      Following this, connect to your database server using the following syntax:

      • mysql -u wordpressuser -h db_server_ip -p

      Again, you must make sure that you are using the correct IP address for the database server. If you configured MySQL to listen on the private network, enter your database's private network IP. Otherwise, enter your database server's public IP address.

      You will be asked for the password for your wordpressuser account. After entering it, and if everything is working as expected, you will see the MySQL prompt. Verify that the connection is using SSL with the following command:
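
      • status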

      If the connection is indeed using SSL, the SSL: line will indicate this, as shown here:

      Output

      --------------
      mysql  Ver 14.14 Distrib 5.7.18, for Linux (x86_64) using  EditLine wrapper

      Connection id:          52
      Current database:
      Current user:           wordpressuser@203.0.113.111
      SSL:                    Cipher in use is DHE-RSA-AES256-SHA
      Current pager:          stdout
      Using outfile:          ''
      Using delimiter:        ;
      Server version:         5.7.18-0ubuntu0.16.04.1 (Ubuntu)
      Protocol version:       10
      Connection:             203.0.113.111 via TCP/IP
      Server characterset:    latin1
      Db     characterset:    latin1
      Client characterset:    utf8
      Conn.  characterset:    utf8
      TCP port:               3306
      Uptime:                 3 hours 43 min 40 sec

      Threads: 1  Questions: 1858  Slow queries: 0  Opens: 276  Flush tables: 1  Open tables: 184  Queries per second avg: 0.138
      --------------

      After verifying that you can connect remotely, go ahead and exit the prompt:
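
      • exit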

      With that, you've verified local access and access from the web server, but you have not verified that other connections will be refused. For an additional check, try doing the same thing from a third server for which you did not configure a specific user account in order to make sure that this other server is not granted access.

      Note that before running the following command to attempt the connection, you may have to install the MySQL client utilities as you did above:

      • mysql -u wordpressuser -h db_server_ip -p

      This should not complete successfully, and should throw back an error that looks similar to this:

      Output

      ERROR 1130 (HY000): Host '203.0.113.12' is not allowed to connect to this MySQL server

      This is expected, since you haven't created a MySQL user that's allowed to connect from this server, and also desired, since you want to be sure that your database server will deny unauthorized users access to your MySQL server.

      After successfully testing your remote connection, you can proceed to installing WordPress on your web server.

      Step 4 — Installing WordPress

      To demonstrate the capabilities of your new remote-capable MySQL server, we will go through the process of installing and configuring WordPress — the popular content management system — on your web server. This will require you to download and extract the software, configure your connection information, and then run through WordPress's web-based installation.

      On your web server, download the latest release of WordPress to your home directory:

      • cd ~
      • curl -O https://wordpress.org/latest.tar.gz

      Extract the files, which will create a directory called wordpress in your home directory:
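
      • tar xzvf latest.tar.gz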

      WordPress includes a sample configuration file which we'll use as a starting point. Make a copy of this file, removing -sample from the filename so it will be loaded by WordPress:

      • cp ~/wordpress/wp-config-sample.php ~/wordpress/wp-config.php

      When you open the file, your first order of business will be to adjust some secret keys to provide more security to your installation. WordPress provides a secure generator for these values so that you do not have to try to come up with good values on your own. These are only used internally, so it won't hurt usability to have complex, secure values here.

      To grab secure values from the WordPress secret key generator, type:

      • curl -s https://api.wordpress.org/secret-key/1.1/salt/

      This will print some keys to your output. You will add these to your wp-config.php file momentarily:

      Warning! It is important that you request your own unique values each time. Do not copy the values shown here!

      Output

      define('AUTH_KEY',         'L4|2Yh(giOtMLHg3#] DO NOT COPY THESE VALUES %G00o|te^5YG@)');
      define('SECURE_AUTH_KEY',  'DCs-k+MwB90/-E(=!/ DO NOT COPY THESE VALUES +WBzDq:7U[#Wn9');
      define('LOGGED_IN_KEY',    '*0kP!|VS.K=;#fPMlO DO NOT COPY THESE VALUES +&[%8xF*,18c @');
      define('NONCE_KEY',        'fmFPF?UJi&(j-{8=$- DO NOT COPY THESE VALUES CCZ?Q+_~1ZU~;G');
      define('AUTH_SALT',        '@qA7f}2utTEFNdnbEa DO NOT COPY THESE VALUES t}Vw+8=K%20s=a');
      define('SECURE_AUTH_SALT', '%BW6s+d:7K?-`C%zw4 DO NOT COPY THESE VALUES 70U}PO1ejW+7|8');
      define('LOGGED_IN_SALT',   '-l>F:-dbcWof%4kKmj DO NOT COPY THESE VALUES 8Ypslin3~d|wLD');
      define('NONCE_SALT',       '4J(<`4&&F (WiK9K#] DO NOT COPY THESE VALUES ^ZikS`es#Fo:V6');

      Copy the output you received to your clipboard, then open the configuration file in your text editor:

      • nano ~/wordpress/wp-config.php

      Find the section that contains the dummy values for those settings. It will look something like this:

      /wordpress/wp-config.php

      . . .
      define('AUTH_KEY',         'put your unique phrase here');
      define('SECURE_AUTH_KEY',  'put your unique phrase here');
      define('LOGGED_IN_KEY',    'put your unique phrase here');
      define('NONCE_KEY',        'put your unique phrase here');
      define('AUTH_SALT',        'put your unique phrase here');
      define('SECURE_AUTH_SALT', 'put your unique phrase here');
      define('LOGGED_IN_SALT',   'put your unique phrase here');
      define('NONCE_SALT',       'put your unique phrase here');
      . . .
      

      Delete those lines and paste in the values you copied from the command line.

      Next, enter the connection information for your remote database. These configuration lines are at the top of the file, just above where you pasted in your keys. Remember to use the same IP address you used in your remote database test earlier:

      /wordpress/wp-config.php

      . . .
      /** The name of the database for WordPress */
      define('DB_NAME', 'wordpress');
      
      /** MySQL database username */
      define('DB_USER', 'wordpressuser');
      
      /** MySQL database password */
      define('DB_PASSWORD', 'password');
      
      /** MySQL hostname */
      define('DB_HOST', 'db_server_ip');
      . . .
      

      And finally, anywhere in the file, add the following line which tells WordPress to use an SSL connection to our MySQL database:

      /wordpress/wp-config.php

      define('MYSQL_CLIENT_FLAGS', MYSQLI_CLIENT_SSL);
      

      Save and close the file.

      Next, copy the files and directories found in your ~/wordpress directory to Nginx's document root. Note that this command includes the -a flag to make sure all the existing permissions are carried over:

      • sudo cp -a ~/wordpress/* /var/www/html

      After this, the only thing left to do is modify the file ownership. Change the ownership of all the files in the document root over to www-data, Ubuntu's default web server user:

      • sudo chown -R www-data:www-data /var/www/html

      With that, WordPress is installed and you're ready to run through its web-based setup routine.

      Step 5 — Setting Up WordPress Through the Web Interface

      WordPress has a web-based setup process. As you go through it, it will ask a few questions and install all the tables it needs in your database. Here, we will go over the initial steps of setting up WordPress, which you can use as a starting point for building your own custom website that uses a remote database backend.

      Navigate to the domain name (or public IP address) associated with your web server:

      http://example.com
      

      You will see a language selection screen for the WordPress installer. Select the appropriate language and click through to the main installation screen:

      WordPress install screen

      On the main installation screen, you will be asked to provide a site title, an administrator username and password, and an email address. Once you have submitted this information, you will need to log into the WordPress admin interface using the account you just created. You will then be taken to a dashboard where you can customize your new WordPress site.

      Conclusion

      By following this tutorial, you've set up a MySQL database to accept SSL-protected connections from a remote WordPress installation. The commands and techniques used in this guide are applicable to any web application written in any programming language, but the specific implementation details will differ. Refer to your application or language's database documentation for more information.




      How to Set Up an Elasticsearch, Fluentd and Kibana (EFK) Logging Stack on Kubernetes


      Introduction

      When running multiple services and applications on a Kubernetes cluster, a centralized, cluster-level logging stack can help you quickly sort through and analyze the heavy volume of log data produced by your Pods. One popular centralized logging solution is the Elasticsearch, Fluentd, and Kibana (EFK) stack.

      Elasticsearch is a real-time, distributed, and scalable search engine which allows for full-text and structured search, as well as analytics. It is commonly used to index and search through large volumes of log data, but can also be used to search many different kinds of documents.

      Elasticsearch is commonly deployed alongside Kibana, a powerful data visualization frontend and dashboard for Elasticsearch. Kibana allows you to explore your Elasticsearch log data through a web interface, and build dashboards and queries to quickly answer questions and gain insight into your Kubernetes applications.

      In this tutorial we’ll use Fluentd to collect, transform, and ship log data to the Elasticsearch backend. Fluentd is a popular open-source data collector that we’ll set up on our Kubernetes nodes to tail container log files, filter and transform the log data, and deliver it to the Elasticsearch cluster, where it will be indexed and stored.

      We’ll begin by configuring and launching a scalable Elasticsearch cluster, and then create the Kibana Kubernetes Service and Deployment. To conclude, we’ll set up Fluentd as a DaemonSet so it runs on every Kubernetes worker node.

      Prerequisites

      Before you begin with this guide, ensure you have the following available to you:

      • A Kubernetes 1.10+ cluster with role-based access control (RBAC) enabled

        • Ensure your cluster has enough resources available to roll out the EFK stack, and if not scale your cluster by adding worker nodes. We’ll be deploying a 3-Pod Elasticsearch cluster (you can scale this down to 1 if necessary), as well as a single Kibana Pod. Every worker node will also run a Fluentd Pod. The cluster in this guide consists of 3 worker nodes and a managed control plane.
      • The kubectl command-line tool installed on your local machine, configured to connect to your cluster. You can read more about installing kubectl in the official documentation.

      Once you have these components set up, you’re ready to begin with this guide.

      Step 1 — Creating a Namespace

      Before we roll out an Elasticsearch cluster, we’ll first create a Namespace into which we’ll install all of our logging instrumentation. Kubernetes lets you separate objects running in your cluster using a “virtual cluster” abstraction called Namespaces. In this guide, we’ll create a kube-logging namespace into which we’ll install the EFK stack components. This Namespace will also allow us to quickly clean up and remove the logging stack without any loss of function to the Kubernetes cluster.

      To begin, first investigate the existing Namespaces in your cluster using kubectl:

      kubectl get namespaces
      

      You should see the following three initial Namespaces, which come preinstalled with your Kubernetes cluster:

      Output

      NAME          STATUS    AGE
      default       Active    5m
      kube-system   Active    5m
      kube-public   Active    5m

      The default Namespace houses objects that are created without specifying a Namespace. The kube-system Namespace contains objects created and used by the Kubernetes system, like kube-dns, kube-proxy, and kubernetes-dashboard. It’s good practice to keep this Namespace clean and not pollute it with your application and instrumentation workloads.

      The kube-public Namespace is another automatically created Namespace that can be used to store objects you’d like to be readable and accessible throughout the whole cluster, even to unauthenticated users.

      To create the kube-logging Namespace, first open and edit a file called kube-logging.yaml using your favorite editor, such as nano:
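
      • nano kube-logging.yaml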

      Inside your editor, paste the following Namespace object YAML:

      kube-logging.yaml

      kind: Namespace
      apiVersion: v1
      metadata:
        name: kube-logging
      

      Then, save and close the file.

      Here, we specify the Kubernetes object's kind as a Namespace object. To learn more about Namespace objects, consult the Namespaces Walkthrough in the official Kubernetes documentation. We also specify the Kubernetes API version used to create the object (v1), and give it a name, kube-logging.

      Once you've created the kube-logging.yaml Namespace object file, create the Namespace using kubectl create with the -f filename flag:

      • kubectl create -f kube-logging.yaml

      You should see the following output:

      Output

      namespace/kube-logging created

      You can then confirm that the Namespace was successfully created:
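
      • kubectl get namespaces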

      At this point, you should see the new kube-logging Namespace:

      Output

      NAME           STATUS    AGE
      default        Active    23m
      kube-logging   Active    1m
      kube-public    Active    23m
      kube-system    Active    23m

      We can now deploy an Elasticsearch cluster into this isolated logging Namespace.

      Step 2 — Creating the Elasticsearch StatefulSet

      Now that we've created a Namespace to house our logging stack, we can begin rolling out its various components. We'll first begin by deploying a 3-node Elasticsearch cluster.

      In this guide, we use 3 Elasticsearch Pods to avoid the "split-brain" issue that occurs in highly-available, multi-node clusters. At a high level, "split-brain" arises when one or more nodes can't communicate with the others and several "split" masters get elected. To learn more, consult “Avoiding split brain.”

      One key takeaway is that you should set the discovery.zen.minimum_master_nodes Elasticsearch parameter to N/2 + 1 (rounding down in the case of fractional numbers), where N is the number of master-eligible nodes in your Elasticsearch cluster. For our 3-node cluster, this means that we'll set this value to 2. That way, if one node gets disconnected from the cluster temporarily, the other two nodes can elect a new master and the cluster can continue functioning while the last node attempts to rejoin. It's important to keep this parameter in mind when scaling your Elasticsearch cluster.

      Creating the Headless Service

      To start, we'll create a headless Kubernetes service called elasticsearch that will define a DNS domain for the 3 Pods. A headless service does not perform load balancing or have a static IP; to learn more about headless services, consult the official Kubernetes documentation.

      Open a file called elasticsearch_svc.yaml using your favorite editor:

      • nano elasticsearch_svc.yaml

      Paste in the following Kubernetes service YAML:

      elasticsearch_svc.yaml

      kind: Service
      apiVersion: v1
      metadata:
        name: elasticsearch
        namespace: kube-logging
        labels:
          app: elasticsearch
      spec:
        selector:
          app: elasticsearch
        clusterIP: None
        ports:
          - port: 9200
            name: rest
          - port: 9300
            name: inter-node
      

      Then, save and close the file.

      We define a Service called elasticsearch in the kube-logging Namespace, and give it the app: elasticsearch label. We then set the .spec.selector to app: elasticsearch so that the Service selects Pods with the app: elasticsearch label. When we associate our Elasticsearch StatefulSet with this Service, the Service will return DNS A records that point to Elasticsearch Pods with the app: elasticsearch label.

      We then set clusterIP: None, which renders the service headless. Finally, we define ports 9200 and 9300 which are used to interact with the REST API, and for inter-node communication, respectively.

      Create the service using kubectl:

      • kubectl create -f elasticsearch_svc.yaml

      You should see the following output:

      Output

      service/elasticsearch created

      Finally, double-check that the service was successfully created using kubectl get:

      kubectl get services --namespace=kube-logging
      

      You should see the following:

      Output

      NAME            TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)             AGE
      elasticsearch   ClusterIP   None         <none>        9200/TCP,9300/TCP   26s

      Now that we've set up our headless service and a stable .elasticsearch.kube-logging.svc.cluster.local domain for our Pods, we can go ahead and create the StatefulSet.

      Creating the StatefulSet

      A Kubernetes StatefulSet allows you to assign a stable identity to Pods and grant them stable, persistent storage. Elasticsearch requires stable storage to persist data across Pod rescheduling and restarts. To learn more about the StatefulSet workload, consult the Statefulsets page from the Kubernetes docs.

      Open a file called elasticsearch_statefulset.yaml in your favorite editor:

      • nano elasticsearch_statefulset.yaml

      We will move through the StatefulSet object definition section by section, pasting blocks into this file.

      Begin by pasting in the following block:

      elasticsearch_statefulset.yaml

      apiVersion: apps/v1
      kind: StatefulSet
      metadata:
        name: es-cluster
        namespace: kube-logging
      spec:
        serviceName: elasticsearch
        replicas: 3
        selector:
          matchLabels:
            app: elasticsearch
        template:
          metadata:
            labels:
              app: elasticsearch
      

      In this block, we define a StatefulSet called es-cluster in the kube-logging namespace. We then associate it with our previously created elasticsearch Service using the serviceName field. This ensures that each Pod in the StatefulSet will be accessible using the following DNS address: es-cluster-[0,1,2].elasticsearch.kube-logging.svc.cluster.local, where [0,1,2] corresponds to the Pod's assigned integer ordinal.

      We specify 3 replicas (Pods) and set the matchLabels selector to app: elasticsearch, which we then mirror in the .spec.template.metadata section. The .spec.selector.matchLabels and .spec.template.metadata.labels fields must match.

      We can now move on to the object spec. Paste in the following block of YAML immediately below the preceding block:

      elasticsearch_statefulset.yaml

      . . .
          spec:
            containers:
            - name: elasticsearch
              image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.4.3
              resources:
                  limits:
                    cpu: 1000m
                  requests:
                    cpu: 100m
              ports:
              - containerPort: 9200
                name: rest
                protocol: TCP
              - containerPort: 9300
                name: inter-node
                protocol: TCP
              volumeMounts:
              - name: data
                mountPath: /usr/share/elasticsearch/data
              env:
                - name: cluster.name
                  value: k8s-logs
                - name: node.name
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
                - name: discovery.zen.ping.unicast.hosts
                  value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
                - name: discovery.zen.minimum_master_nodes
                  value: "2"
                - name: ES_JAVA_OPTS
                  value: "-Xms512m -Xmx512m"
      

      Here we define the Pods in the StatefulSet. We name the containers elasticsearch and choose the docker.elastic.co/elasticsearch/elasticsearch-oss:6.4.3 Docker image. At this point, you may modify this image tag to correspond to your own internal Elasticsearch image, or a different version. Note that for the purposes of this guide, only Elasticsearch 6.4.3 has been tested.

      The -oss suffix ensures that we use the open-source version of Elasticsearch. If you'd like to use the default version containing X-Pack (which includes a free license), omit the -oss suffix. Note that you will have to modify the steps in this guide slightly to account for the added authentication provided by X-Pack.

      We then use the resources field to specify that the container needs at least 0.1 vCPU guaranteed to it, and can burst up to 1 vCPU (which limits the Pod's resource usage when performing an initial large ingest or dealing with a load spike). You should modify these values depending on your anticipated load and available resources. To learn more about resource requests and limits, consult the official Kubernetes Documentation.

      We then open and name ports 9200 and 9300 for REST API and inter-node communication, respectively. We specify a volumeMount called data that will mount the PersistentVolume named data to the container at the path /usr/share/elasticsearch/data. We will define the VolumeClaims for this StatefulSet in a later YAML block.

      Finally, we set some environment variables in the container:

      • cluster.name: The Elasticsearch cluster's name, which in this guide is k8s-logs.
      • node.name: The node's name, which we set to the .metadata.name field using valueFrom. This will resolve to es-cluster-[0,1,2], depending on the node's assigned ordinal.
      • discovery.zen.ping.unicast.hosts: This field sets the discovery method used to connect nodes to each other within an Elasticsearch cluster. We use unicast discovery, which specifies a static list of hosts for our cluster. In this guide, thanks to the headless service we configured earlier, our Pods have domains of the form es-cluster-[0,1,2].elasticsearch.kube-logging.svc.cluster.local, so we set this variable accordingly. Using local namespace Kubernetes DNS resolution, we can shorten this to es-cluster-[0,1,2].elasticsearch. To learn more about Elasticsearch discovery, consult the official Elasticsearch documentation.
      • discovery.zen.minimum_master_nodes: We set this to (N/2) + 1, where N is the number of master-eligible nodes in our cluster. In this guide we have 3 Elasticsearch nodes, so we set this value to 2 (rounding down to the nearest integer). To learn more about this parameter, consult the official Elasticsearch documentation.
      • ES_JAVA_OPTS: Here we set this to -Xms512m -Xmx512m which tells the JVM to use a minimum and maximum heap size of 512 MB. You should tune these parameters depending on your cluster's resource availability and needs. To learn more, consult Setting the heap size.

      The next block we'll paste in looks as follows:

      elasticsearch_statefulset.yaml

      . . .
            initContainers:
            - name: fix-permissions
              image: busybox
              command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
              securityContext:
                privileged: true
              volumeMounts:
              - name: data
                mountPath: /usr/share/elasticsearch/data
            - name: increase-vm-max-map
              image: busybox
              command: ["sysctl", "-w", "vm.max_map_count=262144"]
              securityContext:
                privileged: true
            - name: increase-fd-ulimit
              image: busybox
              command: ["sh", "-c", "ulimit -n 65536"]
              securityContext:
                privileged: true
      

      In this block, we define several Init Containers that run before the main elasticsearch app container. These Init Containers each run to completion in the order they are defined. To learn more about Init Containers, consult the official Kubernetes Documentation.

      The first, named fix-permissions, runs a chown command to change the owner and group of the Elasticsearch data directory to 1000:1000, the Elasticsearch user's UID. By default Kubernetes mounts the data directory as root, which renders it inaccessible to Elasticsearch. To learn more about this step, consult Elasticsearch's “Notes for production use and defaults.”

      The second, named increase-vm-max-map, runs a command to increase the operating system's limits on mmap counts, which by default may be too low, resulting in out of memory errors. To learn more about this step, consult the official Elasticsearch documentation.

      The next Init Container to run is increase-fd-ulimit, which runs the ulimit command to increase the maximum number of open file descriptors. To learn more about this step, consult the “Notes for Production Use and Defaults” from the official Elasticsearch documentation.

      Note: The Elasticsearch Notes for Production Use also mentions disabling swapping for performance reasons. Depending on your Kubernetes installation or provider, swapping may already be disabled. To check this, exec into a running container and run cat /proc/swaps to list active swap devices. If you see nothing there, swap is disabled.

      Now that we've defined our main app container and the Init Containers that run before it to tune the container OS, we can add the final piece to our StatefulSet object definition file: the volumeClaimTemplates.

      Paste in the following volumeClaimTemplate block:

      elasticsearch_statefulset.yaml

      . . .
        volumeClaimTemplates:
        - metadata:
            name: data
            labels:
              app: elasticsearch
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: do-block-storage
            resources:
              requests:
                storage: 100Gi
      

      In this block, we define the StatefulSet's volumeClaimTemplates. Kubernetes will use this to create PersistentVolumes for the Pods. In the block above, we name it data (which is the name we refer to in the volumeMounts defined previously), and give it the same app: elasticsearch label as our StatefulSet.

      We then specify its access mode as ReadWriteOnce, which means that it can only be mounted as read-write by a single node. We define the storage class as do-block-storage in this guide since we use a DigitalOcean Kubernetes cluster for demonstration purposes. You should change this value depending on where you are running your Kubernetes cluster. To learn more, consult the Persistent Volume documentation.

      Finally, we specify that we'd like each PersistentVolume to be 100GiB in size. You should adjust this value depending on your production needs.

      The complete StatefulSet spec should look something like this:

      elasticsearch_statefulset.yaml

      apiVersion: apps/v1
      kind: StatefulSet
      metadata:
        name: es-cluster
        namespace: kube-logging
      spec:
        serviceName: elasticsearch
        replicas: 3
        selector:
          matchLabels:
            app: elasticsearch
        template:
          metadata:
            labels:
              app: elasticsearch
          spec:
            containers:
            - name: elasticsearch
              image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.4.3
              resources:
                  limits:
                    cpu: 1000m
                  requests:
                    cpu: 100m
              ports:
              - containerPort: 9200
                name: rest
                protocol: TCP
              - containerPort: 9300
                name: inter-node
                protocol: TCP
              volumeMounts:
              - name: data
                mountPath: /usr/share/elasticsearch/data
              env:
                - name: cluster.name
                  value: k8s-logs
                - name: node.name
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name
                - name: discovery.zen.ping.unicast.hosts
                  value: "es-cluster-0.elasticsearch,es-cluster-1.elasticsearch,es-cluster-2.elasticsearch"
                - name: discovery.zen.minimum_master_nodes
                  value: "2"
                - name: ES_JAVA_OPTS
                  value: "-Xms512m -Xmx512m"
            initContainers:
            - name: fix-permissions
              image: busybox
              command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
              securityContext:
                privileged: true
              volumeMounts:
              - name: data
                mountPath: /usr/share/elasticsearch/data
            - name: increase-vm-max-map
              image: busybox
              command: ["sysctl", "-w", "vm.max_map_count=262144"]
              securityContext:
                privileged: true
            - name: increase-fd-ulimit
              image: busybox
              command: ["sh", "-c", "ulimit -n 65536"]
              securityContext:
                privileged: true
        volumeClaimTemplates:
        - metadata:
            name: data
            labels:
              app: elasticsearch
          spec:
            accessModes: [ "ReadWriteOnce" ]
            storageClassName: do-block-storage
            resources:
              requests:
                storage: 100Gi
      

      Once you're satisfied with your Elasticsearch configuration, save and close the file.

      Now, deploy the StatefulSet using kubectl:

      • kubectl create -f elasticsearch_statefulset.yaml

      You should see the following output:

      Output

      statefulset.apps/es-cluster created

      You can monitor the StatefulSet as it is rolled out using kubectl rollout status:

      • kubectl rollout status sts/es-cluster --namespace=kube-logging

      You should see the following output as the cluster is rolled out:

      Output

      Waiting for 3 pods to be ready...
      Waiting for 2 pods to be ready...
      Waiting for 1 pods to be ready...
      partitioned roll out complete: 3 new pods have been updated...

      Once all the Pods have been deployed, you can check that your Elasticsearch cluster is functioning correctly by performing a request against the REST API.

      To do so, first forward the local port 9200 to the port 9200 on one of the Elasticsearch nodes (es-cluster-0) using kubectl port-forward:

      • kubectl port-forward es-cluster-0 9200:9200 --namespace=kube-logging

      Then, in a separate terminal window, perform a curl request against the REST API:

      • curl http://localhost:9200/_cluster/state?pretty

      You should see the following output:

      Output

      {
        "cluster_name" : "k8s-logs",
        "compressed_size_in_bytes" : 348,
        "cluster_uuid" : "QD06dK7CQgids-GQZooNVw",
        "version" : 3,
        "state_uuid" : "mjNIWXAzQVuxNNOQ7xR-qg",
        "master_node" : "IdM5B7cUQWqFgIHXBp0JDg",
        "blocks" : { },
        "nodes" : {
          "u7DoTpMmSCixOoictzHItA" : {
            "name" : "es-cluster-1",
            "ephemeral_id" : "ZlBflnXKRMC4RvEACHIVdg",
            "transport_address" : "10.244.8.2:9300",
            "attributes" : { }
          },
          "IdM5B7cUQWqFgIHXBp0JDg" : {
            "name" : "es-cluster-0",
            "ephemeral_id" : "JTk1FDdFQuWbSFAtBxdxAQ",
            "transport_address" : "10.244.44.3:9300",
            "attributes" : { }
          },
          "R8E7xcSUSbGbgrhAdyAKmQ" : {
            "name" : "es-cluster-2",
            "ephemeral_id" : "9wv6ke71Qqy9vk2LgJTqaA",
            "transport_address" : "10.244.40.4:9300",
            "attributes" : { }
          }
        },
        ...

      This indicates that our Elasticsearch cluster k8s-logs has successfully been created with 3 nodes: es-cluster-0, es-cluster-1, and es-cluster-2. The current master node is es-cluster-0.

      Now that your Elasticsearch cluster is up and running, you can move on to setting up a Kibana frontend for it.

      Step 3 — Creating the Kibana Deployment and Service

      To launch Kibana on Kubernetes, we'll create a Service called kibana, and a Deployment consisting of one Pod replica. You can scale the number of replicas depending on your production needs, and optionally specify a LoadBalancer type for the Service to load balance requests across the Deployment pods.

      This time, we'll create the Service and Deployment in the same file. Open up a file called kibana.yaml in your favorite editor:
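
      • nano kibana.yaml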

      Paste in the following Service and Deployment spec:

      kibana.yaml

      apiVersion: v1
      kind: Service
      metadata:
        name: kibana
        namespace: kube-logging
        labels:
          app: kibana
      spec:
        ports:
        - port: 5601
        selector:
          app: kibana
      ---
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: kibana
        namespace: kube-logging
        labels:
          app: kibana
      spec:
        replicas: 1
        selector:
          matchLabels:
            app: kibana
        template:
          metadata:
            labels:
              app: kibana
          spec:
            containers:
            - name: kibana
              image: docker.elastic.co/kibana/kibana-oss:6.4.3
              resources:
                limits:
                  cpu: 1000m
                requests:
                  cpu: 100m
              env:
                - name: ELASTICSEARCH_URL
                  value: http://elasticsearch:9200
              ports:
              - containerPort: 5601
      

      Then, save and close the file.

      In this spec we've defined a Service called kibana in the kube-logging namespace, and given it the app: kibana label.

      We've also specified that it should be accessible on port 5601 and use the app: kibana label to select the Service's target Pods.

      In the Deployment spec, we define a Deployment called kibana and specify that we'd like 1 Pod replica.

      We use the docker.elastic.co/kibana/kibana-oss:6.4.3 image. At this point you may substitute your own private or public Kibana image to use. We once again use the -oss suffix to specify that we'd like the open-source version.

      We specify that we'd like at the very least 0.1 vCPU guaranteed to the Pod, bursting up to a limit of 1 vCPU. You may change these parameters depending on your anticipated load and available resources.

      Next, we use the ELASTICSEARCH_URL environment variable to set the endpoint and port for the Elasticsearch cluster. Using Kubernetes DNS, this endpoint corresponds to its Service name elasticsearch. This domain will resolve to a list of IP addresses for the 3 Elasticsearch Pods. To learn more about Kubernetes DNS, consult DNS for Services and Pods.

      Finally, we set Kibana's container port to 5601, to which the kibana Service will forward requests.

      Once you're satisfied with your Kibana configuration, you can roll out the Service and Deployment using kubectl:

      • kubectl create -f kibana.yaml

      You should see the following output:

      Output

service/kibana created
      deployment.apps/kibana created

      You can check that the rollout succeeded by running the following command:

      • kubectl rollout status deployment/kibana --namespace=kube-logging

      You should see the following output:

      Output

      deployment "kibana" successfully rolled out

      To access the Kibana interface, we'll once again forward a local port to the Kubernetes node running Kibana. Grab the Kibana Pod details using kubectl get:

      • kubectl get pods --namespace=kube-logging

      Output

NAME                      READY     STATUS    RESTARTS   AGE
      es-cluster-0              1/1       Running   0          55m
      es-cluster-1              1/1       Running   0          54m
      es-cluster-2              1/1       Running   0          54m
      kibana-6c9fb4b5b7-plbg2   1/1       Running   0          4m27s

      Here we observe that our Kibana Pod is called kibana-6c9fb4b5b7-plbg2.

      Forward the local port 5601 to port 5601 on this Pod:

      • kubectl port-forward kibana-6c9fb4b5b7-plbg2 5601:5601 --namespace=kube-logging

      You should see the following output:

      Output

Forwarding from 127.0.0.1:5601 -> 5601
      Forwarding from [::1]:5601 -> 5601
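
      Before opening a browser, you can optionally confirm from a second terminal window that Kibana is responding on the forwarded port by querying its status API (this assumes the port-forward above is still running):

      • curl http://localhost:5601/api/status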

      Now, in your web browser, visit the following URL:

      http://localhost:5601
      

      If you see the following Kibana welcome page, you've successfully deployed Kibana into your Kubernetes cluster:

      Kibana Welcome Screen

      You can now move on to rolling out the final component of the EFK stack: the log collector, Fluentd.

      Step 4 — Creating the Fluentd DaemonSet

      In this guide, we'll set up Fluentd as a DaemonSet, which is a Kubernetes workload type that runs a copy of a given Pod on each Node in the Kubernetes cluster. Using this DaemonSet controller, we'll roll out a Fluentd logging agent Pod on every node in our cluster. To learn more about this logging architecture, consult “Using a node logging agent” from the official Kubernetes docs.

      In Kubernetes, containerized applications that log to stdout and stderr have their log streams captured and redirected to JSON files on the nodes. The Fluentd Pod will tail these log files, filter log events, transform the log data, and ship it off to the Elasticsearch logging backend we deployed in Step 2.

In addition to container logs, the Fluentd agent will tail Kubernetes system component logs such as the kubelet, kube-proxy, and Docker daemon logs. To see a full list of sources tailed by the Fluentd logging agent, consult the kubernetes.conf file used to configure the logging agent. To learn more about logging in Kubernetes clusters, consult “Logging at the node level” from the official Kubernetes documentation.

      Begin by opening a file called fluentd.yaml in your favorite text editor:
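      • nano fluentd.yaml

      As before, nano is just an example; any editor will work.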

      Once again, we'll paste in the Kubernetes object definitions block by block, providing context as we go along. In this guide, we use the Fluentd DaemonSet spec provided by the Fluentd maintainers. Another helpful resource provided by the Fluentd maintainers is Kubernetes Logging with Fluentd.

      First, paste in the following ServiceAccount definition:

      fluentd.yaml

      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: fluentd
        namespace: kube-logging
        labels:
          app: fluentd
      

      Here, we create a Service Account called fluentd that the Fluentd Pods will use to access the Kubernetes API. We create it in the kube-logging Namespace and once again give it the label app: fluentd. To learn more about Service Accounts in Kubernetes, consult Configure Service Accounts for Pods in the official Kubernetes docs.

      Next, paste in the following ClusterRole block:

      fluentd.yaml

      . . .
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: fluentd
        labels:
          app: fluentd
      rules:
      - apiGroups:
        - ""
        resources:
        - pods
        - namespaces
        verbs:
        - get
        - list
        - watch
      

      Here we define a ClusterRole called fluentd to which we grant the get, list, and watch permissions on the pods and namespaces objects. ClusterRoles allow you to grant access to cluster-scoped Kubernetes resources like Nodes. To learn more about Role-Based Access Control and Cluster Roles, consult Using RBAC Authorization from the official Kubernetes documentation.

      Now, paste in the following ClusterRoleBinding block:

      fluentd.yaml

      . . .
      ---
      kind: ClusterRoleBinding
      apiVersion: rbac.authorization.k8s.io/v1
      metadata:
        name: fluentd
      roleRef:
        kind: ClusterRole
        name: fluentd
        apiGroup: rbac.authorization.k8s.io
      subjects:
      - kind: ServiceAccount
        name: fluentd
        namespace: kube-logging
      

      In this block, we define a ClusterRoleBinding called fluentd which binds the fluentd ClusterRole to the fluentd Service Account. This grants the fluentd ServiceAccount the permissions listed in the fluentd Cluster Role.
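      After you roll out this file later in the step, you can optionally confirm that the binding took effect with kubectl auth can-i, which should print yes for a permission granted by the ClusterRole:

      • kubectl auth can-i list pods --as=system:serviceaccount:kube-logging:fluentd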

      At this point we can begin pasting in the actual DaemonSet spec:

      fluentd.yaml

      . . .
      ---
      apiVersion: apps/v1
      kind: DaemonSet
      metadata:
        name: fluentd
        namespace: kube-logging
        labels:
          app: fluentd
      

      Here, we define a DaemonSet called fluentd in the kube-logging Namespace and give it the app: fluentd label.

      Next, paste in the following section:

      fluentd.yaml

      . . .
      spec:
        selector:
          matchLabels:
            app: fluentd
        template:
          metadata:
            labels:
              app: fluentd
          spec:
            serviceAccount: fluentd
            serviceAccountName: fluentd
            tolerations:
            - key: node-role.kubernetes.io/master
              effect: NoSchedule
            containers:
            - name: fluentd
              image: fluent/fluentd-kubernetes-daemonset:v0.12-debian-elasticsearch
              env:
                - name:  FLUENT_ELASTICSEARCH_HOST
                  value: "elasticsearch.kube-logging.svc.cluster.local"
                - name:  FLUENT_ELASTICSEARCH_PORT
                  value: "9200"
                - name: FLUENT_ELASTICSEARCH_SCHEME
                  value: "http"
                - name: FLUENT_UID
                  value: "0"
      

Here, we match the app: fluentd label defined in .metadata.labels and assign the DaemonSet the fluentd Service Account. We also use the app: fluentd label to select the Pods managed by this DaemonSet.

      Next, we define a NoSchedule toleration to match the equivalent taint on Kubernetes master nodes. This will ensure that the DaemonSet also gets rolled out to the Kubernetes masters. If you don't want to run a Fluentd Pod on your master nodes, remove this toleration. To learn more about Kubernetes taints and tolerations, consult “Taints and Tolerations” from the official Kubernetes docs.
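      If you're unsure whether your nodes carry this taint (some managed clusters don't expose master nodes at all), you can inspect the Taints field in the node descriptions:

      • kubectl describe nodes | grep Taints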

      Next, we begin defining the Pod container, which we call fluentd.

We use the official v0.12 Debian image provided by the Fluentd maintainers. If you'd like to use your own private or public Fluentd image, or use a different image version, modify the image tag in the container spec. The Dockerfile and contents of this image are available in Fluentd's fluentd-kubernetes-daemonset GitHub repo.

      Next, we configure Fluentd using some environment variables:

      • FLUENT_ELASTICSEARCH_HOST: We set this to the Elasticsearch headless Service address defined earlier: elasticsearch.kube-logging.svc.cluster.local. This will resolve to a list of IP addresses for the 3 Elasticsearch Pods. The actual Elasticsearch host will most likely be the first IP address returned in this list. To distribute logs across the cluster, you will need to modify the configuration for Fluentd’s Elasticsearch Output plugin. To learn more about this plugin, consult Elasticsearch Output Plugin.
      • FLUENT_ELASTICSEARCH_PORT: We set this to the Elasticsearch port we configured earlier, 9200.
      • FLUENT_ELASTICSEARCH_SCHEME: We set this to http.
      • FLUENT_UID: We set this to 0 (superuser) so that Fluentd can access the files in /var/log.

      Finally, paste in the following section:

      fluentd.yaml

      . . .
              resources:
                limits:
                  memory: 512Mi
                requests:
                  cpu: 100m
                  memory: 200Mi
              volumeMounts:
              - name: varlog
                mountPath: /var/log
              - name: varlibdockercontainers
                mountPath: /var/lib/docker/containers
                readOnly: true
            terminationGracePeriodSeconds: 30
            volumes:
            - name: varlog
              hostPath:
                path: /var/log
            - name: varlibdockercontainers
              hostPath:
                path: /var/lib/docker/containers
      

Here we specify a 512 MiB memory limit on the Fluentd Pod, and guarantee it 0.1 vCPU and 200 MiB of memory. You can tune these resource limits and requests depending on your anticipated log volume and available resources.

      Next, we mount the /var/log and /var/lib/docker/containers host paths into the container using the varlog and varlibdockercontainers volumeMounts. These volumes are defined at the end of the block.

      The final parameter we define in this block is terminationGracePeriodSeconds, which gives Fluentd 30 seconds to shut down gracefully upon receiving a SIGTERM signal. After 30 seconds, the containers are sent a SIGKILL signal. The default value for terminationGracePeriodSeconds is 30s, so in most cases this parameter can be omitted. To learn more about gracefully terminating Kubernetes workloads, consult Google's “Kubernetes best practices: terminating with grace.”

      The entire Fluentd spec should look something like this:

      fluentd.yaml

      apiVersion: v1
      kind: ServiceAccount
      metadata:
        name: fluentd
        namespace: kube-logging
        labels:
          app: fluentd
      ---
      apiVersion: rbac.authorization.k8s.io/v1
      kind: ClusterRole
      metadata:
        name: fluentd
        labels:
          app: fluentd
      rules:
      - apiGroups:
        - ""
        resources:
        - pods
        - namespaces
        verbs:
        - get
        - list
        - watch
      ---
      kind: ClusterRoleBinding
      apiVersion: rbac.authorization.k8s.io/v1
      metadata:
        name: fluentd
      roleRef:
        kind: ClusterRole
        name: fluentd
        apiGroup: rbac.authorization.k8s.io
      subjects:
      - kind: ServiceAccount
        name: fluentd
        namespace: kube-logging
      ---
      apiVersion: apps/v1
      kind: DaemonSet
      metadata:
        name: fluentd
        namespace: kube-logging
        labels:
          app: fluentd
      spec:
        selector:
          matchLabels:
            app: fluentd
        template:
          metadata:
            labels:
              app: fluentd
          spec:
            serviceAccount: fluentd
            serviceAccountName: fluentd
            tolerations:
            - key: node-role.kubernetes.io/master
              effect: NoSchedule
            containers:
            - name: fluentd
              image: fluent/fluentd-kubernetes-daemonset:v0.12-debian-elasticsearch
              env:
                - name:  FLUENT_ELASTICSEARCH_HOST
                  value: "elasticsearch.kube-logging.svc.cluster.local"
                - name:  FLUENT_ELASTICSEARCH_PORT
                  value: "9200"
                - name: FLUENT_ELASTICSEARCH_SCHEME
                  value: "http"
                - name: FLUENT_UID
                  value: "0"
              resources:
                limits:
                  memory: 512Mi
                requests:
                  cpu: 100m
                  memory: 200Mi
              volumeMounts:
              - name: varlog
                mountPath: /var/log
              - name: varlibdockercontainers
                mountPath: /var/lib/docker/containers
                readOnly: true
            terminationGracePeriodSeconds: 30
            volumes:
            - name: varlog
              hostPath:
                path: /var/log
            - name: varlibdockercontainers
              hostPath:
                path: /var/lib/docker/containers
      

      Once you've finished configuring the Fluentd DaemonSet, save and close the file.

      Now, roll out the DaemonSet using kubectl:

      • kubectl create -f fluentd.yaml

      You should see the following output:

      Output

serviceaccount/fluentd created
      clusterrole.rbac.authorization.k8s.io/fluentd created
      clusterrolebinding.rbac.authorization.k8s.io/fluentd created
      daemonset.extensions/fluentd created

      Verify that your DaemonSet rolled out successfully using kubectl:

      • kubectl get ds --namespace=kube-logging

      You should see the following status output:

      Output

NAME      DESIRED   CURRENT   READY     UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
      fluentd   3         3         3         3            3           <none>          58s

      This indicates that there are 3 fluentd Pods running, which corresponds to the number of nodes in our Kubernetes cluster.
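      You can also spot-check the agent logs; a healthy rollout should show Fluentd starting up without repeated Elasticsearch connection failures:

      • kubectl logs -l app=fluentd --namespace=kube-logging --tail=20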

      We can now check Kibana to verify that log data is being properly collected and shipped to Elasticsearch.
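      One direct way to confirm that data is arriving, assuming the port-forward to Elasticsearch on local port 9200 from the earlier step is still open, is to list the indices Fluentd has created; you should see one or more index names beginning with logstash-:

      • curl http://localhost:9200/_cat/indices?v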

      With the kubectl port-forward still open, navigate to http://localhost:5601.

      Click on Discover in the left-hand navigation menu.

      You should see the following configuration window:

      Kibana Index Pattern Configuration

      This allows you to define the Elasticsearch indices you'd like to explore in Kibana. To learn more, consult Defining your index patterns in the official Kibana docs. For now, we'll just use the logstash-* wildcard pattern to capture all the log data in our Elasticsearch cluster. Enter logstash-* in the text box and click on Next step.

      You'll then be brought to the following page:

      Kibana Index Pattern Settings

      This allows you to configure which field Kibana will use to filter log data by time. In the dropdown, select the @timestamp field, and hit Create index pattern.

      Now, hit Discover in the left hand navigation menu.

      You should see a histogram graph and some recent log entries:

      Kibana Incoming Logs

      At this point you've successfully configured and rolled out the EFK stack on your Kubernetes cluster. To learn how to use Kibana to analyze your log data, consult the Kibana User Guide.

      In the next optional section, we'll deploy a simple counter Pod that prints numbers to stdout, and find its logs in Kibana.

      Step 5 (Optional) — Testing Container Logging

      To demonstrate a basic Kibana use case of exploring the latest logs for a given Pod, we'll deploy a minimal counter Pod that prints sequential numbers to stdout.

      Let's begin by creating the Pod. Open up a file called counter.yaml in your favorite editor:
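      • nano counter.yaml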

      Then, paste in the following Pod spec:

      counter.yaml

      apiVersion: v1
      kind: Pod
      metadata:
        name: counter
      spec:
        containers:
        - name: count
          image: busybox
          args: [/bin/sh, -c,
                  'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']
      

      Save and close the file.

      This is a minimal Pod called counter that runs a while loop, printing numbers sequentially.

      Deploy the counter Pod using kubectl:

      • kubectl create -f counter.yaml

      Once the Pod has been created and is running, navigate back to your Kibana dashboard.

From the Discover page, enter kubernetes.pod_name:counter in the search bar. This filters the log data for Pods named counter.

      You should then see a list of log entries for the counter Pod:

      Counter Logs in Kibana

      You can click into any of the log entries to see additional metadata like the container name, Kubernetes node, Namespace, and more.
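      As a quick cross-check, you can compare what Kibana shows against the raw container logs fetched directly with kubectl:

      • kubectl logs counter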

      Conclusion

      In this guide we've demonstrated how to set up and configure Elasticsearch, Fluentd, and Kibana on a Kubernetes cluster. We've used a minimal logging architecture that consists of a single logging agent Pod running on each Kubernetes worker node.

      Before deploying this logging stack into your production Kubernetes cluster, it’s best to tune the resource requirements and limits as indicated throughout this guide. You may also want to use the X-Pack enabled image with built-in monitoring and security.

      The logging architecture we’ve used here consists of 3 Elasticsearch Pods, a single Kibana Pod (not load-balanced), and a set of Fluentd Pods rolled out as a DaemonSet. You may wish to scale this setup depending on your production use case. To learn more about scaling your Elasticsearch and Kibana stack, consult Scaling Elasticsearch.

      Kubernetes also allows for more complex logging agent architectures that may better suit your use case. To learn more, consult Logging Architecture from the Kubernetes docs.


