One place for hosting & domains

      Migrate

      How To Migrate a MySQL Database to PostgreSQL Using pgLoader


      Introduction

      PostgreSQL, also known as “Postgres,” is an open-source relational database management system (RDBMS). It has seen a drastic growth in popularity in recent years, with many developers and companies migrating their data to Postgres from other database solutions.

      The prospect of migrating a database can be intimidating, especially when migrating from one database management system to another. pgLoader is an open-source database migration tool that aims to simplify the process of migrating to PostgreSQL. It supports migrations from several file types and RBDMSs — including MySQL and SQLite — to PostgreSQL.

      This tutorial provides instructions on how to install pgLoader and use it to migrate a remote MySQL database to PostgreSQL over an SSL connection. Near the end of the tutorial, we will also briefly touch on a few different migration scenarios where pgLoader may be useful.

      Prerequisites

      To complete this tutorial, you’ll need the following:

      • Access to two servers, each running Ubuntu 18.04. Both servers should have a firewall and a non-root user with sudo privileges configured. To set these up, you can follow our Initial Server Setup guide for Ubuntu 18.04.
      • MySQL installed on one of the servers. To set this up, follow Steps 1, 2, and 3 of our guide on How To Install MySQL on Ubuntu 18.04. Please note that in order to complete all the prerequisite tutorials linked here, you will need to configure your root MySQL user to authenticate with a password, as described in Step 3 of the MySQL installation guide.
      • PostgreSQL installed on the other server. To set this up, complete Step 1 of our guide How To Install and Use PostgreSQL on Ubuntu 18.04.
      • Your MySQL server should also be configured to accept encrypted connections. To set this up, complete every step of our tutorial on How To Configure SSL/TLS for MySQL on Ubuntu 18.04, including the optional Step 6. As you follow this guide, be sure to use your PostgreSQL server as the MySQL client machine, as you will need to be able to connect to your MySQL server from your Postgres machine in order to migrate the data with pgLoader.

      Please note that throughout this guide, the server on which you installed MySQL will be referred to as the “MySQL server” and any commands that should be run on this machine will be shown with a blue background, like this:

      Similarly, this guide will refer to the other server as the "PostgreSQL" or "Postgres" server and any commands that must be run on that machine will be shown with a red background:

      Please keep these in mind as you follow this tutorial so as to avoid any confusion.

      Step 1 — (Optional) Creating a Sample Database and Table in MySQL

      This step describes the process of creating a test database and populating it with dummy data. We encourage you to practice using pgLoader with this test case, but if you already have a database you want to migrate, you can move on to the next step.

      Start by opening up the MySQL prompt on your MySQL server:

      After entering your root MySQL user's password, you will see the MySQL prompt.

      From there, create a new database by running the following command. You can name your database whatever you'd like, but in this guide we will name it source_db:

      • CREATE DATABASE source_db;

      Then switch to this database with the USE command:

      Output

      Database changed

      Within this database, use the following command to create a sample table. Here, we will name this table sample_table but feel free to give it another name:

      • CREATE TABLE sample_table (
      • employee_id INT PRIMARY KEY,
      • first_name VARCHAR(50),
      • last_name VARCHAR(50),
      • start_date DATE,
      • salary VARCHAR(50)
      • );

      Then populate this table with some sample employee data using the following command:

      • INSERT INTO sample_table (employee_id, first_name, last_name, start_date, salary)
      • VALUES (1, 'Elizabeth', 'Cotten', '2007-11-11', '$105433.18'),
      • (2, 'Yanka', 'Dyagileva', '2017-10-30', '$107540.67'),
      • (3, 'Lee', 'Dorsey', '2013-06-04', '$118024.04'),
      • (4, 'Kasey', 'Chambers', '2010-08-18', '$116456.98'),
      • (5, 'Bram', 'Tchaikovsky', '2018-09-16', '$61989.50');

      Following this, you can close the MySQL prompt:

      Now that you have a sample database loaded with dummy data, you can move on to the next step in which you will install pgLoader on your PostgreSQL server.

      Step 2 — Installing pgLoader

      pgLoader is a program that can load data into a PostgreSQL database from a variety of different sources. It uses PostgreSQL's COPY command to copy data from a source database or file — such as a comma-separated values (CSV) file — into a target PostgreSQL database.

      pgLoader is available from the default Ubuntu APT repositories and you can install it using the apt command. However, in this guide we will take advantage of pgLoader's useSSL option, a feature that allows for migrations from MySQL over an SSL connection. This feature is only available in the latest version of pgLoader which, as of this writing, can only be installed using the source code from its GitHub repository.

      Before installing pgLoader, you will need to install its dependencies. If you haven't done so recently, update your Postgres server's package index:

      Then install the following packages:

      • sbcl: A Common Lisp compiler
      • unzip: A de-archiver for .zip files
      • libsqlite3-dev: A collection of development files for SQLite 3
      • gawk: Short for "GNU awk", a pattern scanning and processing language
      • curl: A command line tool for transferring data from a URL
      • make: A utility for managing package compilation
      • freetds-dev: A client library for MS SQL and Sybase databases
      • libzip-dev: A library for reading, creating, and modifying zip archives

      Use the following command to install these dependencies:

      • sudo apt install sbcl unzip libsqlite3-dev gawk curl make freetds-dev libzip-dev

      When prompted, confirm that you want to install these packages by pressing ENTER.

      Next, navigate to the pgLoader GitHub project's Releases page and find the latest release. For this guide, we will use the latest release at the time of this writing: version 3.6.1. Scroll down to its Assets menu and copy the link for the tar.gz file labeled Source code. Then paste the link into the following wget command. This will download the tarball to your server:

      • wget https://github.com/dimitri/pgloader/archive/v3.6.1.tar.gz

      Extract the tarball:

      This will create a number of new directories and files on your server. Navigate into the new pgLoader parent directory:

      Then use the make utility to compile the pgloader binary:

      This command will take some time to build the pgloader binary.

      Move the binary file into the /usr/local/bin directory, the location where Ubuntu looks for executable files:

      • sudo mv ./build/bin/pgloader /usr/local/bin/

      You can test that pgLoader was installed correctly by checking its version, like so:

      Output

      pgloader version "3.6.1" compiled with SBCL 1.4.5.debian

      pgLoader is now installed, but before you can begin your migration you'll need to make some configuration changes to both your PostgreSQL and MySQL instances. We'll focus on the PostgreSQL server first.

      Step 3 — Creating a PostgreSQL Role and Database

      The pgloader command works by copying source data, either from a file or directly from a database, and inserting it into a PostgreSQL database. For this reason, you must either run pgLoader as a Linux user who has access to your Postgres database or you must specify a PostgreSQL role with the appropriate permissions in your load command.

      PostgreSQL manages database access through the use of roles. Depending on how the role is configured, it can be thought of as either a database user or a group of database users. In most RDBMSs, you create a user with the CREATE USER SQL command. Postgres, however, comes installed with a handy script called createuser. This script serves as a wrapper for the CREATE USER SQL command that you can run directly from the command line.

      Note: In PostgreSQL, you authenticate as a database user using the Identification Protocol, or ident, authentication method by default, rather than with a password. This involves PostgreSQL taking the client's Ubuntu username and using it as the allowed Postgres database username. This allows for greater security in many cases, but it can also cause issues in instances where you'd like an outside program to connect to one of your databases.

      pgLoader can load data into a Postgres database through a role that authenticates with the ident method as long as that role shares the same name as the Linux user profile issuing the pgloader command. However, to keep this process as clear as possible, this tutorial describes setting up a different PostgreSQL role that authenticates with a password rather than with the ident method.

      Run the following command on your Postgres server to create a new role. Note the -P flag, which tells createuser to prompt you to enter a password for the new role:

      • sudo -u postgres createuser --interactive -P

      You may first be prompted for your sudo password. The script will then prompt you to enter a name for the new role. In this guide, we'll call this role pgloader_pg:

      Output

      Enter name of role to add: pgloader_pg

      Following that, createuser will prompt you to enter and confirm a password for this role. Be sure to take note of this password, as you'll need it to perform the migration in Step 5:

      Output

      Enter password for new role: Enter it again:

      Lastly, the script will ask you if the new role should be classified as a superuser. In PostgreSQL, connecting to the database with a superuser role allows you to circumvent all of the database's permissions checks, except for the right to log in. Because of this, the superuser privilege should not be used lightly, and the PostgreSQL documentation recommends that you do most of your database work as a non-superuser role. However, because pgLoader needs broad privileges to access and load data into tables, you can safely grant this new role superuser privileges. Do so by typing y and then pressing ENTER:

      Output

      . . . Shall the new role be a superuser? (y/n) y

      PostgreSQL comes with another useful script that allows you to create a database from the command line. Since pgLoader also needs a target database into which it can load the source data, run the following command to create one. We'll name this database new_db but feel free to modify that if you like:

      • sudo -u postgres createdb new_db

      If there aren't any errors, this command will complete without any output.

      Now that you have a dedicated PostgreSQL user and an empty database into which you can load your MySQL data, there are just a few more changes you'll need to make before performing a migration. You'll need to create a dedicated MySQL user with access to your source database and add your client-side certificates to Ubuntu's trusted certificate store.

      Step 4 — Creating a Dedicated User in MySQL and Managing Certificates

      Protecting data from snoopers is one of the most important parts of any database administrator's job. Migrating data from one machine to another opens up an opportunity for malicious actors to sniff the packets traveling over the network connection if it isn't encrypted. In this step, you will create a dedicated MySQL user which pgLoader will use to perform the migration over an SSL connection.

      Begin by opening up your MySQL prompt:

      From the MySQL prompt, use the following CREATE USER command to create a new MySQL user. We will name this user pgloader_my. Because this user will only access MySQL from your PostgreSQL server, be sure to replace your_postgres_server_ip with the public IP address of your PostgreSQL server. Additionally, replace password with a secure password or passphrase:

      • CREATE USER 'pgloader_my'@'your_postgres_server_ip' IDENTIFIED BY 'password' REQUIRE SSL;

      Note the REQUIRE SSL clause at the end of this command. This will restrict the pgloader_my user to only access the database through a secure SSL connection.

      Next, grant the pgloader_my user access to the target database and all of its tables. Here, we'll specify the database we created in the optional Step 1, but if you have your own database you'd like to migrate, use its name in place of source_db:

      • GRANT ALL ON source_db.* TO 'pgloader_my'@'your_postgresql_server_ip';

      Then run the FLUSH PRIVILEGES command to reload the grant tables, enabling the privilege changes:

      After this, you can close the MySQL prompt:

      Now go back to your Postgres server terminal and attempt to log in to the MySQL server as the new pgloader_my user. If you followed the prerequisite guide on configuring SSL/TLS for MySQL then you will already have mysql-client installed on your PostgreSQL server and you should be able to connect with the following command:

      • mysql -u pgloader_my -p -h your_mysql_server_ip

      If the command is successful, you will see the MySQL prompt:

      After confirming that your pgloader_my user can successfully connect, go ahead and close the prompt:

      At this point, you have a dedicated MySQL user that can access the source database from your Postgres machine. However, if you were to try to migrate your MySQL database using SSL the attempt would fail.

      The reason for this is that pgLoader isn't able to read MySQL's configuration files, and thus doesn't know where to look for the CA certificate or client certificate that you copied to your PostgreSQL server in the prerequisite SSL/TLS configuration guide. Rather than ignoring SSL requirements, though, pgLoader requires the use of trusted certificates in cases where SSL is needed to connect to MySQL. Accordingly, you can resolve this issue by adding the ca.pem and client-cert.pem files to Ubuntu's trusted certificate store.

      To do this, copy over the ca.pem and client-cert.pem files to the /usr/local/share/ca-certificates/ directory. Note that you must also rename these files so they have the .crt file extension. If you don't rename them, your system will not be able to recognize that you've added these new certificates:

      • sudo cp ~/client-ssl/ca.pem /usr/local/share/ca-certificates/ca.pem.crt
      • sudo cp ~/client-ssl/client-cert.pem /usr/local/share/ca-certificates/client-cert.pem.crt

      Following this, run the update-ca-certificates command. This program looks for certificates within /usr/local/share/ca-certificates, adds any new ones to the /etc/ssl/certs/ directory, and generates a list of trusted SSL certificates — ca-certificates.crt — based on the contents of the /etc/ssl/certs/ directory:

      • sudo update-ca-certificates

      Output

      Updating certificates in /etc/ssl/certs... 2 added, 0 removed; done. Running hooks in /etc/ca-certificates/update.d... done.

      With that, you're all set to migrate your MySQL database to PostgreSQL.

      Step 5 — Migrating the Data

      Now that you've configured remote access from your PostgreSQL server to your MySQL server, you're ready to begin the migration.

      Note: It's important to back up your database before taking any action that could impact the integrity of your data. However, this isn't necessary when performing a migration with pgLoader, since it doesn't delete or transform data; it only copies it.

      That said, if you're feeling cautious and would like to back up your data before migrating it, you can do so with the mysqldump utility. See the official MySQL documentation for details.

      pgLoader allows users to migrate an entire database with a single command. For a migration from a MySQL database to a PostgreSQL database on a separate server, the command would have the following syntax:

      • pgloader mysql://mysql_username:password@mysql_server_ip_/source_database_name?option_1=value&option_n=value postgresql://postgresql_role_name:password@postgresql_server_ip/target_database_name?option_1=value&option_n=value

      This includes the pgloader command and two connection strings, the first for the source database and the second for the target database. Both of these connection strings begin by declaring what type of DBMS the connection string points to, followed by the username and password that have access to the database (separated by a colon), the host address of the server where the database is installed, the name of the database pgLoader should target, and various options that affect pgLoader's behavior.

      Using the parameters defined earlier in this tutorial, you can migrate your MySQL database using a command with the following structure. Be sure to replace any highlighted values to align with your own setup:

      • pgloader mysql://pgloader_my:mysql_password@mysql_server_ip/source_db?useSSL=true postgresql://pgloader_pg:postgresql_password@localhost/new_db

      Note that this command includes the useSSL option in the MySQL connection string. By setting this option to true, pgLoader will connect to MySQL over SSL. This is necessary, as you've configured your MySQL server to only accept secure connections.

      If this command is successful, you will see an output table describing how the migration went:

      Output

      table name errors rows bytes total time ----------------------- --------- --------- --------- -------------- fetch meta data 0 2 0.111s Create Schemas 0 0 0.001s Create SQL Types 0 0 0.005s Create tables 0 2 0.017s Set Table OIDs 0 1 0.010s ----------------------- --------- --------- --------- -------------- source_db.sample_table 0 5 0.2 kB 0.048s ----------------------- --------- --------- --------- -------------- COPY Threads Completion 0 4 0.052s Index Build Completion 0 1 0.011s Create Indexes 0 1 0.006s Reset Sequences 0 0 0.014s Primary Keys 0 1 0.001s Create Foreign Keys 0 0 0.000s Create Triggers 0 0 0.000s Install Comments 0 0 0.000s ----------------------- --------- --------- --------- -------------- Total import time ✓ 5 0.2 kB 0.084s

      To check that the data was migrated correctly, open up the PostgreSQL prompt:

      From there, connect to the database into which you loaded the data:

      Then run the following query to test whether the migrated data is stored in your PostgreSQL database:

      • SELECT * FROM source_db.sample_table;

      Note: Notice the FROM clause in this query specifying the sample_table held within the source_db schema:

      • . . . FROM source_db.sample_table;

      This is called a qualified name. You could go further and specify the fully qualified name by including the database's name as well as those of the schema and table:

      • . . . FROM new_db.source_db.sample_table;

      When you run queries in a PostgreSQL database, you don't need to be this specific if the table is held within the default public schema. The reason you must do so here is that when pgLoader loads data into Postgres, it creates and targets a new schema named after the original database — in this case, source_db. This is pgLoader's default behavior for MySQL to PostgreSQL migrations. However, you can use a load file to instruct pgLoader to change the table's schema topubliconce it's done loading data. See the next step for an example of how to do this.

      If the data was indeed loaded correctly, you will see the following table in the query's output:

      Output

      employee_id | first_name | last_name | start_date | salary -------------+------------+-------------+------------+------------ 1 | Elizabeth | Cotten | 2007-11-11 | $105433.18 2 | Yanka | Dyagileva | 2017-10-30 | $107540.67 3 | Lee | Dorsey | 2013-06-04 | $118024.04 4 | Kasey | Chambers | 2010-08-18 | $116456.98 5 | Bram | Tchaikovsky | 2018-09-16 | $61989.50 (5 rows)

      To close the Postgres prompt, run the following command:

      Now that we've gone over how to migrate a MySQL database over a network and load it into a PostgreSQL database, we will go over a few other common migration scenarios in which pgLoader can be useful.

      Step 6 — Exploring Other Migration Options

      pgLoader is a highly flexible tool that can be useful in a wide variety of situations. Here, we'll take a quick look at a few other ways you can use pgLoader to migrate a MySQL database to PostgreSQL.

      Migrating with a pgLoader Load File

      In the context of pgLoader, a load file, or command file, is a file that tells pgLoader how to perform a migration. This file can include commands and options that affect pgLoader's behavior, giving you much finer control over how your data is loaded into PostgreSQL and allowing you to perform complex migrations.

      pgLoader's documentation provides comprehensive instructions on how to use and extend these files to support a number of migration types, so here we will work through a comparatively rudimentary example. We will perform the same migration we ran in Step 5, but will also include an ALTER SCHEMA command to change the new_db database's schema from source_db to public.

      To begin, create a new load file on the Postgres server using your preferred text editor:

      Then add the following content, making sure to update the highlighted values to align with your own configuration:

      pgload_test.load

      LOAD DATABASE
           FROM      mysql://pgloader_my:mysql_password@mysql_server_ip/source_db?useSSL=true
           INTO pgsql://pgloader_pg:postgresql_password@localhost/new_db
      
       WITH include drop, create tables
      
      ALTER SCHEMA 'source_db' RENAME TO 'public'
      ;
      

      Here is what each of these clauses do:

      • LOAD DATABASE: This line instructs pgLoader to load data from a separate database, rather than a file or data archive.
      • FROM: This clause specifies the source database. In this case, it points to the connection string for the MySQL database we created in Step 1.
      • INTO: Likewise, this line specifies the PostgreSQL database in to which pgLoader should load the data.
      • WITH: This clause allows you to define specific behaviors for pgLoader. You can find the full list of WITH options that are compatible with MySQL migrations here. In this example we only include two options:
        • include drop: When this option is used, pgLoader will drop any tables in the target PostgreSQL database that also appear in the source MySQL database. If you use this option when migrating data to an existing PostgreSQL database, you should back up the entire database to avoid losing any data.
        • create tables: This option tells pgLoader to create new tables in the target PostgreSQL database based on the metadata held in the MySQL database. If the opposite option, create no tables, is used, then the target tables must already exist in the target Postgres database prior to the migration.
      • ALTER SCHEMA: Following the WITH clause, you can add specific SQL commands like this to instruct pgLoader to perform additional actions. Here, we instruct pgLoader to change the new Postgres database's schema from source_db to public, but only after it has created the schema. Note that you can also nest such commands within other clauses — such as BEFORE LOAD DO — to instruct pgLoader to execute those commands at specific points in the migration process.

      This is a demonstrative example of what you can include in a load file to modify pgLoader's behavior. The complete list of clauses that one can add to a load file and what they do can be found in the official pgLoader documentation.

      Save and close the load file after you've finished adding this content. To use it, include the name of the file as an argument to the pgloader command:

      • pgloader pgload_test.load

      To test that the migration was successful, open up the Postgres prompt:

      Then connect to the database:

      And run the following query:

      • SELECT * FROM sample_table;

      Output

      employee_id | first_name | last_name | start_date | salary -------------+------------+-------------+------------+------------ 1 | Elizabeth | Cotten | 2007-11-11 | $105433.18 2 | Yanka | Dyagileva | 2017-10-30 | $107540.67 3 | Lee | Dorsey | 2013-06-04 | $118024.04 4 | Kasey | Chambers | 2010-08-18 | $116456.98 5 | Bram | Tchaikovsky | 2018-09-16 | $61989.50 (5 rows)

      This output confirms that pgLoader migrated the data successfully, and also that the ALTER SCHEMA command we added to the load file worked as expected, since we didn't need to specify the source_db schema in the query to view the data.

      Note that if you plan to use a load file to migrate data held on one database to another located on a separate machine, you will still need to adjust any relevant networking and firewall rules in order for the migration to be successful.

      Migrating a MySQL Database to PostgreSQL Locally

      You can use pgLoader to migrate a MySQL database to a PostgreSQL database housed on the same machine. All you need is to run the migration command from a Linux user profile with access to the root MySQL user:

      • pgloader mysql://root@localhost/source_db pgsql://sammy:postgresql_password@localhost/target_db

      Performing a local migration like this means you don't have to make any changes to MySQL's default networking configuration or your system's firewall rules.

      Migrating from a CSV file

      You can also load a PostgreSQL database with data from a CSV file.

      Assuming you have a CSV file of data named load.csv, the command to load it into a Postgres database might look like this:

      • pgloader load.csv pgsql://sammy:password@localhost/target_db

      Because the CSV format is not fully standardized, there's a chance that you will run into issues when loading data directly from a CSV file in this manner. Fortunately, you can correct for irregularities by including various options with pgLoader's command line options or by specifying them in a load file. See the pgLoader documentation on the subject for more details.

      Migrating to a Managed PostgreSQL Database

      It's also possible to perform a migration from a self-managed database to a managed PostgreSQL database. To illustrate how this kind of migration could look, we will use the MySQL server and a DigitalOcean Managed PostgreSQL Database. We'll also use the sample database we created in Step 1, but if you skipped that step and have your own database you'd like to migrate, you can point to that one instead.

      Note: For instructions on how to set up a DigitalOcean Managed Database, please refer to our Managed Database Quickstart guide.

      For this migration, we won't need pgLoader’s useSSL option since it only works with remote MySQL databases and we will run this migration from a local MySQL database. However, we will use the sslmode=require option when we load and connect to the DigitalOcean Managed PostgreSQL database, which will ensure your data stays protected.

      Because we're not using the useSSL this time around, you can use apt to install pgLoader along with the postgresql-client package, which will allow you to access the Managed PostgreSQL Database from your MySQL server:

      • sudo apt install pgloader postgresql-client

      Following that, you can run the pgloader command to migrate the database. To do this, you'll need the connection string for the Managed Database.

      For DigitalOcean Managed Databases, you can copy the connection string from the Cloud Control Panel. First, click Databases in the left-hand sidebar menu and select the database to which you want to migrate the data. Then scroll down to the Connection Details section. Click on the drop down menu and select Connection string. Then, click the Copy button to copy the string to your clipboard and paste it into the following migration command, replacing the example PostgreSQL connection string shown here. This will migrate your MySQL database into the defaultdb PostgreSQL database as the doadmin PostgreSQL role:

      • pgloader mysql://root:password@localhost/source_db postgres://doadmin:password@db_host/defaultdb?sslmode=require

      Following this, you can use the same connection string as an argument to psql to connect to the managed PostgreSQL database andhttps://www.digitalocean.com/community/tutorials/how-to-migrate-mysql-database-to-postgres-using-pgloader#step-1-%E2%80%94-(optional)-creating-a-sample-database-and-table-in-mysql confirm that the migration was successful:

      • psql postgres://doadmin:password@db_host/defaultdb?sslmode=require

      Then, run the following query to check that pgLoader correctly migrated the data:

      • SELECT * FROM source_db.sample_table;

      Output

      employee_id | first_name | last_name | start_date | salary -------------+------------+-------------+------------+------------ 1 | Elizabeth | Cotten | 2007-11-11 | $105433.18 2 | Yanka | Dyagileva | 2017-10-30 | $107540.67 3 | Lee | Dorsey | 2013-06-04 | $118024.04 4 | Kasey | Chambers | 2010-08-18 | $116456.98 5 | Bram | Tchaikovsky | 2018-09-16 | $61989.50 (5 rows)

      This confirms that pgLoader successfully migrated your MySQL database to your managed PostgreSQL instance.

      Conclusion

      pgLoader is a flexible tool that can perform a database migration in a single command. With a few configuration tweaks, it can migrate an entire database from one physical machine to another using a secure SSL/TLS connection. Our hope is that by following this tutorial, you will have gained a clearer understanding of pgLoader's capabilities and potential use cases.

      After migrating your data over to PostgreSQL, you may find the following tutorials to be of interest:



      Source link

      How To Migrate a Docker Compose Workflow to Kubernetes


      Introduction

      When building modern, stateless applications, containerizing your application’s components is the first step in deploying and scaling on distributed platforms. If you have used Docker Compose in development, you will have modernized and containerized your application by:

      • Extracting necessary configuration information from your code.
      • Offloading your application’s state.
      • Packaging your application for repeated use.

      You will also have written service definitions that specify how your container images should run.

      To run your services on a distributed platform like Kubernetes, you will need to translate your Compose service definitions to Kubernetes objects. This will allow you to scale your application with resiliency. One tool that can speed up the translation process to Kubernetes is kompose, a conversion tool that helps developers move Compose workflows to container orchestrators like Kubernetes or OpenShift.

      In this tutorial, you will translate Compose services to Kubernetes objects using kompose. You will use the object definitions that kompose provides as a starting point and make adjustments to ensure that your setup will use Secrets, Services, and PersistentVolumeClaims in the way that Kubernetes expects. By the end of the tutorial, you will have a single-instance Node.js application with a MongoDB database running on a Kubernetes cluster. This setup will mirror the functionality of the code described in Containerizing a Node.js Application with Docker Compose and will be a good starting point to build out a production-ready solution that will scale with your needs.

      Prerequisites

      Step 1 — Installing kompose

      To begin using kompose, navigate to the project’s GitHub Releases page, and copy the link to the current release (version 1.18.0 as of this writing). Paste this link into the following curl command to download the latest version of kompose:

      • curl -L https://github.com/kubernetes/kompose/releases/download/v1.18.0/kompose-linux-amd64 -o kompose

      For details about installing on non-Linux systems, please refer to the installation instructions.

      Make the binary executable:

      Move it to your PATH:

      • sudo mv ./kompose /usr/local/bin/kompose

      To verify that it has been installed properly, you can do a version check:

      If the installation was successful, you will see output like the following:

      Output

      1.18.0 (06a2e56)

      With kompose installed and ready to use, you can now clone the Node.js project code that you will be translating to Kubernetes.

      Step 2 — Cloning and Packaging the Application

      To use our application with Kubernetes, we will need to clone the project code and package the application so that the kubelet service can pull the image.

      Our first step will be to clone the node-mongo-docker-dev repository from the DigitalOcean Community GitHub account. This repository includes the code from the setup described in Containerizing a Node.js Application for Development With Docker Compose, which uses a demo Node.js application to demonstrate how to set up a development environment using Docker Compose. You can find more information about the application itself in the series From Containers to Kubernetes with Node.js.

      Clone the repository into a directory called node_project:

      • git clone https://github.com/do-community/node-mongo-docker-dev.git node_project

      Navigate to the node_project directory:

      The node_project directory contains files and directories for a shark information application that works with user input. It has been modernized to work with containers: sensitive and specific configuration information has been removed from the application code and refactored to be injected at runtime, and the application's state has been offloaded to a MongoDB database.

      For more information about designing modern, stateless applications, please see Architecting Applications for Kubernetes and Modernizing Applications for Kubernetes.

      The project directory includes a Dockerfile with instructions for building the application image. Let's build the image now so that you can push it to your Docker Hub account and use it in your Kubernetes setup.

      Using the docker build command, build the image with the -t flag, which allows you to tag it with a memorable name. In this case, tag the image with your Docker Hub username and name it node-kubernetes or a name of your own choosing:

      • docker build -t your_dockerhub_username/node-kubernetes .

      The . in the command specifies that the build context is the current directory.

      It will take a minute or two to build the image. Once it is complete, check your images:

      You will see the following output:

      Output

      REPOSITORY TAG IMAGE ID CREATED SIZE your_dockerhub_username/node-kubernetes latest 9c6f897e1fbc 3 seconds ago 90MB node 10-alpine 94f3c8956482 12 days ago 71MB

      Next, log in to the Docker Hub account you created in the prerequisites:

      • docker login -u your_dockerhub_username

      When prompted, enter your Docker Hub account password. Logging in this way will create a ~/.docker/config.json file in your user's home directory with your Docker Hub credentials.

      Push the application image to Docker Hub with the docker push command. Remember to replace your_dockerhub_username with your own Docker Hub username:

      • docker push your_dockerhub_username/node-kubernetes

      You now have an application image that you can pull to run your application with Kubernetes. The next step will be to translate your application service definitions to Kubernetes objects.

      Step 3 — Translating Compose Services to Kubernetes Objects with kompose

      Our Docker Compose file, here called docker-compose.yaml, lays out the definitions that will run our services with Compose. A service in Compose is a running container, and service definitions contain information about how each container image will run. In this step, we will translate these definitions to Kubernetes objects by using kompose to create yaml files. These files will contain specs for the Kubernetes objects that describe their desired state.

      We will use these files to create different types of objects: Services, which will ensure that the Pods running our containers remain accessible; Deployments, which will contain information about the desired state of our Pods; a PersistentVolumeClaim to provision storage for our database data; a ConfigMap for environment variables injected at runtime; and a Secret for our application's database user and password. Some of these definitions will be in the files kompose will create for us, and others we will need to create ourselves.

      First, we will need to modify some of the definitions in our docker-compose.yaml file to work with Kubernetes. We will include a reference to our newly-built application image in our nodejs service definition and remove the bind mounts, volumes, and additional commands that we used to run the application container in development with Compose. Additionally, we'll redefine both containers' restart policies to be in line with the behavior Kubernetes expects.

      Open the file with nano or your favorite editor:

      The current definition for the nodejs application service looks like this:

      ~/node_project/docker-compose.yaml

      ...
      services:
        nodejs:
          build:
            context: .
            dockerfile: Dockerfile
          image: nodejs
          container_name: nodejs
          restart: unless-stopped
          env_file: .env
          environment:
            - MONGO_USERNAME=$MONGO_USERNAME
            - MONGO_PASSWORD=$MONGO_PASSWORD
            - MONGO_HOSTNAME=db
            - MONGO_PORT=$MONGO_PORT
            - MONGO_DB=$MONGO_DB 
          ports:
            - "80:8080"
          volumes:
            - .:/home/node/app
            - node_modules:/home/node/app/node_modules
          networks:
            - app-network
          command: ./wait-for.sh db:27017 -- /home/node/app/node_modules/.bin/nodemon app.js
      ...
      

      Make the following edits to your service definition:

      • Use your node-kubernetes image instead of the local Dockerfile.
      • Change the container restart policy from unless-stopped to always.
      • Remove the volumes list and the command instruction.

      The finished service definition will now look like this:

      ~/node_project/docker-compose.yaml

      ...
      services:
        nodejs:
          image: your_dockerhub_username/node-kubernetes
          container_name: nodejs
          restart: always
          env_file: .env
          environment:
            - MONGO_USERNAME=$MONGO_USERNAME
            - MONGO_PASSWORD=$MONGO_PASSWORD
            - MONGO_HOSTNAME=db
            - MONGO_PORT=$MONGO_PORT
            - MONGO_DB=$MONGO_DB 
          ports:
            - "80:8080"
          networks:
            - app-network
      ...
      

      Next, scroll down to the db service definition. Here, change the restart policy for the service to always and remove the .env file. Instead of using values from the .env file, we will pass the values for our MONGO_INITDB_ROOT_USERNAME and MONGO_INITDB_ROOT_PASSWORD to the database container using the Secret we will create in Step 4.

      The db service definition will now look like this:

      ~/node_project/docker-compose.yaml

      ...
        db:
          image: mongo:4.1.8-xenial
          container_name: db
          restart: always
          environment:
            - MONGO_INITDB_ROOT_USERNAME=$MONGO_USERNAME
            - MONGO_INITDB_ROOT_PASSWORD=$MONGO_PASSWORD
          volumes:  
            - dbdata:/data/db   
          networks:
            - app-network
      ...  
      

      Finally, at the bottom of the file, remove the node_modules volumes from the top-level volumes key. The key will now look like this:

      ~/node_project/docker-compose.yaml

      ...
      volumes:
        dbdata:
      

      Save and close the file when you are finished editing.

      Before translating our service definitions, we will need to write the .env file that kompose will use to create the ConfigMap with our non-sensitive information. Please see Step 2 of Containerizing a Node.js Application for Development With Docker Compose for a longer explanation of this file.

      In that tutorial, we added .env to our .gitignore file to ensure that it would not copy to version control. This means that it did not copy over when we cloned the node-mongo-docker-dev repository in Step 2 of this tutorial. We will therefore need to recreate it now.

      Create the file:

      kompose will use this file to create a ConfigMap for our application. However, instead of assigning all of the variables from the nodejs service definition in our Compose file, we will add only the MONGO_DB database name and the MONGO_PORT. We will assign the database username and password separately when we manually create a Secret object in Step 4.

      Add the following port and database name information to the .env file. Feel free to rename your database if you would like:

      ~/node_project/.env

      MONGO_PORT=27017
      MONGO_DB=sharkinfo
      

      Save and close the file when you are finished editing.

      You are now ready to create the files with your object specs. kompose offers multiple options for translating your resources. You can:

      • Create yaml files based on the service definitions in your docker-compose.yaml file with kompose convert.
      • Create Kubernetes objects directly with kompose up.
      • Create a Helm chart with kompose convert -c.

      For now, we will convert our service definitions to yaml files and then add to and revise the files kompose creates.

      Convert your service definitions to yaml files with the following command:

      You can also name specific or multiple Compose files using the -f flag.

      After you run this command, kompose will output information about the files it has created:

      Output

      INFO Kubernetes file "nodejs-service.yaml" created INFO Kubernetes file "db-deployment.yaml" created INFO Kubernetes file "dbdata-persistentvolumeclaim.yaml" created INFO Kubernetes file "nodejs-deployment.yaml" created INFO Kubernetes file "nodejs-env-configmap.yaml" created

      These include yaml files with specs for the Node application Service, Deployment, and ConfigMap, as well as for the dbdata PersistentVolumeClaim and MongoDB database Deployment.

      These files are a good starting point, but in order for our application's functionality to match the setup described in Containerizing a Node.js Application for Development With Docker Compose we will need to make a few additions and changes to the files kompose has generated.

      Step 4 — Creating Kubernetes Secrets

      In order for our application to function in the way we expect, we will need to make a few modifications to the files that kompose has created. The first of these changes will be generating a Secret for our database user and password and adding it to our application and database Deployments. Kubernetes offers two ways of working with environment variables: ConfigMaps and Secrets. kompose has already created a ConfigMap with the non-confidential information we included in our .env file, so we will now create a Secret with our confidential information: our database username and password.

      The first step in manually creating a Secret will be to convert your username and password to base64, an encoding scheme that allows you to uniformly transmit data, including binary data.

      Convert your database username:

      • echo -n 'your_database_username' | base64

      Note down the value you see in the output.

      Next, convert your password:

      • echo -n 'your_database_password' | base64

      Take note of the value in the output here as well.

      Open a file for the Secret:

      Note: Kubernetes objects are typically defined using YAML, which strictly forbids tabs and requires two spaces for indentation. If you would like to check the formatting of any of your yaml files, you can use a linter or test the validity of your syntax using kubectl create with the --dry-run and --validate flags:

      • kubectl create -f your_yaml_file.yaml --dry-run --validate=true

      In general, it is a good idea to validate your syntax before creating resources with kubectl.

      Add the following code to the file to create a Secret that will define your MONGO_USERNAME and MONGO_PASSWORD using the encoded values you just created. Be sure to replace the dummy values here with your encoded username and password:

      ~/node_project/secret.yaml

      apiVersion: v1
      kind: Secret
      metadata:
        name: mongo-secret
      data:
        MONGO_USERNAME: your_encoded_username
        MONGO_PASSWORD: your_encoded_password
      

      We have named the Secret object mongo-secret, but you are free to name it anything you would like.

      Save and close this file when you are finished editing. As you did with your .env file, be sure to add secret.yaml to your .gitignore file to keep it out of version control.

      With secret.yaml written, our next step will be to ensure that our application and database Pods both use the values we added to the file. Let's start by adding references to the Secret to our application Deployment.

      Open the file called nodejs-deployment.yaml:

      • nano nodejs-deployment.yaml

      The file's container specifications include the following environment variables defined under the env key:

      ~/node_project/nodejs-deployment.yaml

      apiVersion: extensions/v1beta1
      kind: Deployment
      ...
          spec:
            containers:
            - env:
              - name: MONGO_DB
                valueFrom:
                  configMapKeyRef:
                    key: MONGO_DB
                    name: nodejs-env
              - name: MONGO_HOSTNAME
                value: db
              - name: MONGO_PASSWORD
              - name: MONGO_PORT
                valueFrom:
                  configMapKeyRef:
                    key: MONGO_PORT
                    name: nodejs-env
              - name: MONGO_USERNAME
      

      We will need to add references to our Secret to the MONGO_USERNAME and MONGO_PASSWORD variables listed here, so that our application will have access to those values. Instead of including a configMapKeyRef key to point to our nodejs-env ConfigMap, as is the case with the values for MONGO_DB and MONGO_PORT, we'll include a secretKeyRef key to point to the values in our mongo-secret secret.

      Add the following Secret references to the MONGO_USERNAME and MONGO_PASSWORD variables:

      ~/node_project/nodejs-deployment.yaml

      apiVersion: extensions/v1beta1
      kind: Deployment
      ...
          spec:
            containers:
            - env:
              - name: MONGO_DB
                valueFrom:
                  configMapKeyRef:
                    key: MONGO_DB
                    name: nodejs-env
              - name: MONGO_HOSTNAME
                value: db
              - name: MONGO_PASSWORD
                valueFrom:
                  secretKeyRef:
                    name: mongo-secret
                    key: MONGO_PASSWORD
              - name: MONGO_PORT
                valueFrom:
                  configMapKeyRef:
                    key: MONGO_PORT
                    name: nodejs-env
              - name: MONGO_USERNAME
                valueFrom:
                  secretKeyRef:
                    name: mongo-secret
                    key: MONGO_USERNAME
      

      Save and close the file when you are finished editing.

      Next, we'll add the same values to the db-deployment.yaml file.

      Open the file for editing:

      In this file, we will add references to our Secret for following variable keys: MONGO_INITDB_ROOT_USERNAME and MONGO_INITDB_ROOT_PASSWORD. The mongo image makes these variables available so that you can modify the initialization of your database instance. MONGO_INITDB_ROOT_USERNAME and MONGO_INITDB_ROOT_PASSWORD together create a root user in the admin authentication database and ensure that authentication is enabled when the database container starts.

      Using the values we set in our Secret ensures that we will have an application user with root privileges on the database instance, with access to all of the administrative and operational privileges of that role. When working in production, you will want to create a dedicated application user with appropriately scoped privileges.

      Under the MONGO_INITDB_ROOT_USERNAME and MONGO_INITDB_ROOT_PASSWORD variables, add references to the Secret values:

      ~/node_project/db-deployment.yaml

      apiVersion: extensions/v1beta1
      kind: Deployment
      ...
          spec:
            containers:
            - env:
              - name: MONGO_INITDB_ROOT_PASSWORD
                valueFrom:
                  secretKeyRef:
                    name: mongo-secret
                    key: MONGO_PASSWORD        
              - name: MONGO_INITDB_ROOT_USERNAME
                valueFrom:
                  secretKeyRef:
                    name: mongo-secret
                    key: MONGO_USERNAME
              image: mongo:4.1.8-xenial
      ...
      

      Save and close the file when you are finished editing.

      With your Secret in place, you can move on to creating your database Service and ensuring that your application container only attempts to connect to the database once it is fully set up and initialized.

      Step 5 — Creating the Database Service and an Application Init Container

      Now that we have our Secret, we can move on to creating our database Service and an Init Container that will poll this Service to ensure that our application only attempts to connect to the database once the database startup tasks, including creating the MONGO_INITDB user and password, are complete.

      For a discussion of how to implement this functionality in Compose, please see Step 4 of Containerizing a Node.js Application for Development with Docker Compose.

      Open a file to define the specs for the database Service:

      Add the following code to the file to define the Service:

      ~/node_project/db-service.yaml

      apiVersion: v1
      kind: Service
      metadata:
        annotations: 
          kompose.cmd: kompose convert
          kompose.version: 1.18.0 (06a2e56)
        creationTimestamp: null
        labels:
          io.kompose.service: db
        name: db
      spec:
        ports:
        - port: 27017
          targetPort: 27017
        selector:
          io.kompose.service: db
      status:
        loadBalancer: {}
      

      The selector that we have included here will match this Service object with our database Pods, which have been defined with the label io.kompose.service: db by kompose in the db-deployment.yaml file. We've also named this service db.

      Save and close the file when you are finished editing.

      Next, let's add an Init Container field to the containers array in nodejs-deployment.yaml. This will create an Init Container that we can use to delay our application container from starting until the db Service has been created with a Pod that is reachable. This is one of the possible uses for Init Containers; to learn more about other use cases, please see the official documentation.

      Open the nodejs-deployment.yaml file:

      • nano nodejs-deployment.yaml

      Within the Pod spec and alongside the containers array, we are going to add an initContainers field with a container that will poll the db Service.

      Add the following code below the ports and resources fields and above the restartPolicy in the nodejs containers array:

      ~/node_project/nodejs-deployment.yaml

      apiVersion: extensions/v1beta1
      kind: Deployment
      ...
          spec:
            containers:
            ...
              name: nodejs
              ports:
              - containerPort: 8080
              resources: {}
            initContainers:
            - name: init-db
              image: busybox
              command: ['sh', '-c', 'until nc -z db:27017; do echo waiting for db; sleep 2; done;']
            restartPolicy: Always
      ...               
      

      This Init Container uses the BusyBox image, a lightweight image that includes many UNIX utilities. In this case, we'll use the netcat utility to poll whether or not the Pod associated with the db Service is accepting TCP connections on port 27017.

      This container command replicates the functionality of the wait-for script that we removed from our docker-compose.yaml file in Step 3. For a longer discussion of how and why our application used the wait-for script when working with Compose, please see Step 4 of Containerizing a Node.js Application for Development with Docker Compose.

      Init Containers run to completion; in our case, this means that our Node application container will not start until the database container is running and accepting connections on port 27017. The db Service definition allows us to guarantee this functionality regardless of the exact location of the database container, which is mutable.

      Save and close the file when you are finished editing.

      With your database Service created and your Init Container in place to control the startup order of your containers, you can move on to checking the storage requirements in your PersistentVolumeClaim and exposing your application service using a LoadBalancer.

      Step 6 — Modifying the PersistentVolumeClaim and Exposing the Application Frontend

      Before running our application, we will make two final changes to ensure that our database storage will be provisioned properly and that we can expose our application frontend using a LoadBalancer.

      First, let's modify the storage resource defined in the PersistentVolumeClaim that kompose created for us. This Claim allows us to dynamically provision storage to manage our application's state.

      To work with PersistentVolumeClaims, you must have a StorageClass created and configured to provision storage resources. In our case, because we are working with DigitalOcean Kubernetes, our default StorageClass provisioner is set to dobs.csi.digitalocean.com — DigitalOcean Block Storage.

      We can check this by typing:

      If you are working with a DigitalOcean cluster, you will see the following output:

      Output

      NAME PROVISIONER AGE do-block-storage (default) dobs.csi.digitalocean.com 76m

      If you are not working with a DigitalOcean cluster, you will need to create a StorageClass and configure a provisioner of your choice. For details about how to do this, please see the official documentation.

      When kompose created dbdata-persistentvolumeclaim.yaml, it set the storage resource to a size that does not meet the minimum size requirements of our provisioner. We will therefore need to modify our PersistentVolumeClaim to use the minimum viable DigitalOcean Block Storage unit: 1GB. Please feel free to modify this to meet your storage requirements.

      Open dbdata-persistentvolumeclaim.yaml:

      • nano dbdata-persistentvolumeclaim.yaml

      Replace the storage value with 1Gi:

      ~/node_project/dbdata-persistentvolumeclaim.yaml

      apiVersion: v1
      kind: PersistentVolumeClaim
      metadata:
        creationTimestamp: null
        labels:
          io.kompose.service: dbdata
        name: dbdata
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 1Gi
      status: {}
      

      Also note the accessMode: ReadWriteOnce means that the volume provisioned as a result of this Claim will be read-write only by a single node. Please see the documentation for more information about different access modes.

      Save and close the file when you are finished.

      Next, open nodejs-service.yaml:

      We are going to expose this Service externally using a DigitalOcean Load Balancer. If you are not using a DigitalOcean cluster, please consult the relevant documentation from your cloud provider for information about their load balancers. Alternatively, you can follow the official Kubernetes documentation on setting up a highly available cluster with kubeadm, but in this case you will not be able to use PersistentVolumeClaims to provision storage.

      Within the Service spec, specify LoadBalancer as the Service type:

      ~/node_project/nodejs-service.yaml

      apiVersion: v1
      kind: Service
      ...
      spec:
        type: LoadBalancer
        ports:
      ...
      

      When we create the nodejs Service, a load balancer will be automatically created, providing us with an external IP where we can access our application.

      Save and close the file when you are finished editing.

      With all of our files in place, we are ready to start and test the application.

      Step 7 — Starting and Accessing the Application

      It's time to create our Kubernetes objects and test that our application is working as expected.

      To create the objects we've defined, we'll use kubectl create with the -f flag, which will allow us to specify the files that kompose created for us, along with the files we wrote. Run the following command to create the Node application and MongoDB database Services and Deployments, along with your Secret, ConfigMap, and PersistentVolumeClaim:

      • kubectl create -f nodejs-service.yaml,nodejs-deployment.yaml,nodejs-env-configmap.yaml,db-service.yaml,db-deployment.yaml,dbdata-persistentvolumeclaim.yaml,secret.yaml

      You will see the following output indicating that the objects have been created:

      Output

      service/nodejs created deployment.extensions/nodejs created configmap/nodejs-env created service/db created deployment.extensions/db created persistentvolumeclaim/dbdata created secret/mongo-secret created

      To check that your Pods are running, type:

      You don't need to specify a Namespace here, since we have created our objects in the default Namespace. If you are working with multiple Namespaces, be sure to include the -n flag when running this command, along with the name of your Namespace.

      You will see the following output while your db container is starting and your application Init Container is running:

      Output

      NAME READY STATUS RESTARTS AGE db-679d658576-kfpsl 0/1 ContainerCreating 0 10s nodejs-6b9585dc8b-pnsws 0/1 Init:0/1 0 10s

      Once that container has run and your application and database containers have started, you will see this output:

      Output

      NAME READY STATUS RESTARTS AGE db-679d658576-kfpsl 1/1 Running 0 54s nodejs-6b9585dc8b-pnsws 1/1 Running 0 54s

      The Running STATUS indicates that your Pods are bound to nodes and that the containers associated with those Pods are running. READY indicates how many containers in a Pod are running. For more information, please consult the documentation on Pod lifecycles.

      Note:
      If you see unexpected phases in the STATUS column, remember that you can troubleshoot your Pods with the following commands:

      • kubectl describe pods your_pod
      • kubectl logs your_pod

      With your containers running, you can now access the application. To get the IP for the LoadBalancer, type:

      You will see the following output:

      Output

      NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE db ClusterIP 10.245.189.250 <none> 27017/TCP 93s kubernetes ClusterIP 10.245.0.1 <none> 443/TCP 25m12s nodejs LoadBalancer 10.245.15.56 your_lb_ip 80:30729/TCP 93s

      The EXTERNAL_IP associated with the nodejs service is the IP address where you can access the application. If you see a <pending> status in the EXTERNAL_IP column, this means that your load balancer is still being created.

      Once you see an IP in that column, navigate to it in your browser: http://your_lb_ip.

      You should see the following landing page:

      Application Landing Page

      Click on the Get Shark Info button. You will see a page with an entry form where you can enter a shark name and a description of that shark's general character:

      Shark Info Form

      In the form, add a shark of your choosing. To demonstrate, we will add Megalodon Shark to the Shark Name field, and Ancient to the Shark Character field:

      Filled Shark Form

      Click on the Submit button. You will see a page with this shark information displayed back to you:

      Shark Output

      You now have a single instance setup of a Node.js application with a MongoDB database running on a Kubernetes cluster.

      Conclusion

      The files you have created in this tutorial are a good starting point to build from as you move toward production. As you develop your application, you can work on implementing the following:



      Source link

      How To Back Up, Import, and Migrate Your Apache Kafka Data on Debian 9


      The author selected Tech Education Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      Backing up your Apache Kafka data is an important practice that will help you recover from unintended data loss or bad data added to the cluster due to user error. Data dumps of cluster and topic data are an efficient way to perform backups and restorations.

      Importing and migrating your backed up data to a separate server is helpful in situations where your Kafka instance becomes unusable due to server hardware or networking failures and you need to create a new Kafka instance with your old data. Importing and migrating backed up data is also useful when you are moving the Kafka instance to an upgraded or downgraded server due to a change in resource usage.

      In this tutorial, you will back up, import, and migrate your Kafka data on a single Debian 9 installation as well as on multiple Debian 9 installations on separate servers. ZooKeeper is a critical component of Kafka’s operation. It stores information about cluster state such as consumer data, partition data, and the state of other brokers in the cluster. As such, you will also back up ZooKeeper’s data in this tutorial.

      Prerequisites

      To follow along, you will need:

      • A Debian 9 server with at least 4GB of RAM and a non-root sudo user set up by following the tutorial.
      • A Debian 9 server with Apache Kafka installed, to act as the source of the backup. Follow the How To Install Apache Kafka on Debian 9 guide to set up your Kafka installation, if Kafka isn’t already installed on the source server.
      • OpenJDK 8 installed on the server. To install this version, follow these instructions on installing specific versions of OpenJDK.
      • Optional for Step 7 — Another Debian 9 server with Apache Kafka installed, to act as the destination of the backup. Follow the article link in the previous prerequisite to install Kafka on the destination server. This prerequisite is required only if you are moving your Kafka data from one server to another. If you want to back up and import your Kafka data to a single server, you can skip this prerequisite.

      Step 1 — Creating a Test Topic and Adding Messages

      A Kafka message is the most basic unit of data storage in Kafka and is the entity that you will publish to and subscribe from Kafka. A Kafka topic is like a container for a group of related messages. When you subscribe to a particular topic, you will receive only messages that were published to that particular topic. In this section you will log in to the server that you would like to back up (the source server) and add a Kafka topic and a message so that you have some data populated for the backup.

      This tutorial assumes you have installed Kafka in the home directory of the kafka user (/home/kafka/kafka). If your installation is in a different directory, modify the ~/kafka part in the following commands with your Kafka installation’s path, and for the commands throughout the rest of this tutorial.

      SSH into the source server by executing:

      • ssh sammy@source_server_ip

      Run the following command to log in as the kafka user:

      Create a topic named BackupTopic using the kafka-topics.sh shell utility file in your Kafka installation's bin directory, by typing:

      • ~/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic BackupTopic

      Publish the string "Test Message 1" to the BackupTopic topic by using the ~/kafka/bin/kafka-console-producer.sh shell utility script.

      If you would like to add additional messages here, you can do so now.

      • echo "Test Message 1" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic BackupTopic > /dev/null

      The ~/kafka/bin/kafka-console-producer.sh file allows you to publish messages directly from the command line. Typically, you would publish messages using a Kafka client library from within your program, but since that involves different setups for different programming languages, you can use the shell script as a language-independent way of publishing messages during testing or while performing administrative tasks. The --topic flag specifies the topic that you will publish the message to.

      Next, verify that the kafka-console-producer.sh script has published the message(s) by running the following command:

      • ~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic BackupTopic --from-beginning

      The ~/kafka/bin/kafka-console-consumer.sh shell script starts the consumer. Once started, it will subscribe to messages from the topic that you published in the "Test Message 1" message in the previous command. The --from-beginning flag in the command allows consuming messages that were published before the consumer was started. Without the flag enabled, only messages published after the consumer was started will appear. On running the command, you will see the following output in the terminal:

      Output

      Test Message 1

      Press CTRL+C to stop the consumer.

      You've created some test data and verified that it's persisted. Now you can back up the state data in the next section.

      Step 2 — Backing Up the ZooKeeper State Data

      Before backing up the actual Kafka data, you need to back up the cluster state stored in ZooKeeper.

      ZooKeeper stores its data in the directory specified by the dataDir field in the ~/kafka/config/zookeeper.properties configuration file. You need to read the value of this field to determine the directory to back up. By default, dataDir points to the /tmp/zookeeper directory. If the value is different in your installation, replace /tmp/zookeeper with that value in the following commands.

      Here is an example output of the ~/kafka/config/zookeeper.properties file:

      ~/kafka/config/zookeeper.properties

      ...
      ...
      ...
      # the directory where the snapshot is stored.
      dataDir=/tmp/zookeeper
      # the port at which the clients will connect
      clientPort=2181
      # disable the per-ip limit on the number of connections since this is a non-production config
      maxClientCnxns=0
      ...
      ...
      ...
      

      Now that you have the path to the directory, you can create a compressed archive file of its contents. Compressed archive files are a better option over regular archive files to save disk space. Run the following command:

      • tar -czf /home/kafka/zookeeper-backup.tar.gz /tmp/zookeeper/*

      The command's output tar: Removing leading / from member names you can safely ignore.

      The -c and -z flags tell tar to create an archive and apply gzip compression to the archive. The -f flag specifies the name of the output compressed archive file, which is zookeeper-backup.tar.gz in this case.

      You can run ls in your current directory to see zookeeper-backup.tar.gz as part of your output.

      You have now successfully backed up the ZooKeeper data. In the next section, you will back up the actual Kafka data.

      Step 3 — Backing Up the Kafka Topics and Messages

      In this section, you will back up Kafka's data directory into a compressed tar file like you did for ZooKeeper in the previous step.

      Kafka stores topics, messages, and internal files in the directory that the log.dirs field specifies in the ~/kafka/config/server.properties configuration file. You need to read the value of this field to determine the directory to back up. By default and in your current installation, log.dirs points to the /tmp/kafka-logs directory. If the value is different in your installation, replace /tmp/kafka-logs in the following commands with the correct value.

      Here is an example output of the ~/kafka/config/server.properties file:

      ~/kafka/config/server.properties

      ...
      ...
      ...
      ############################# Log Basics #############################
      
      # A comma separated list of directories under which to store log files
      log.dirs=/tmp/kafka-logs
      
      # The default number of log partitions per topic. More partitions allow greater
      # parallelism for consumption, but this will also result in more files across
      # the brokers.
      num.partitions=1
      
      # The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
      # This value is recommended to be increased for installations with data dirs located in RAID array.
      num.recovery.threads.per.data.dir=1
      ...
      ...
      ...
      

      First, stop the Kafka service so that the data in the log.dirs directory is in a consistent state when creating the archive with tar. To do this, return to your server's non-root user by typing exit and then run the following command:

      • sudo systemctl stop kafka

      After stopping the Kafka service, log back in as your kafka user with:

      It is necessary to stop/start the Kafka and ZooKeeper services as your non-root sudo user because in the Apache Kafka installation prerequisite you restricted the kafka user as a security precaution. This step in the prerequisite disables sudo access for the kafka user, which leads to commands failing to execute.

      Now, create a compressed archive file of the directory's contents by running the following command:

      • tar -czf /home/kafka/kafka-backup.tar.gz /tmp/kafka-logs/*

      Once again, you can safely ignore the command's output (tar: Removing leading / from member names).

      You can run ls in the current directory to see kafka-backup.tar.gz as part of the output.

      You can start the Kafka service again — if you do not want to restore the data immediately — by typing exit, to switch to your non-root sudo user, and then running:

      • sudo systemctl start kafka

      Log back in as your kafka user:

      You have successfully backed up the Kafka data. You can now proceed to the next section, where you will be restoring the cluster state data stored in ZooKeeper.

      Step 4 — Restoring the ZooKeeper Data

      In this section you will restore the cluster state data that Kafka creates and manages internally when the user performs operations such as creating a topic, adding/removing additional nodes, and adding and consuming messages. You will restore the data to your existing source installation by deleting the ZooKeeper data directory and restoring the contents of the zookeeper-backup.tar.gz file. If you want to restore data to a different server, see Step 7.

      You need to stop the Kafka and ZooKeeper services as a precaution against the data directories receiving invalid data during the restoration process.

      First, stop the Kafka service by typing exit, to switch to your non-root sudo user, and then running:

      • sudo systemctl stop kafka

      Next, stop the ZooKeeper service:

      • sudo systemctl stop zookeeper

      Log back in as your kafka user:

      You can then safely delete the existing cluster data directory with the following command:

      Now restore the data you backed up in Step 2:

      • tar -C /tmp/zookeeper -xzf /home/kafka/zookeeper-backup.tar.gz --strip-components 2

      The -C flag tells tar to change to the directory /tmp/zookeeper before extracting the data. You specify the --strip 2 flag to make tar extract the archive's contents in /tmp/zookeeper/ itself and not in another directory (such as /tmp/zookeeper/tmp/zookeeper/) inside of it.

      You have restored the cluster state data successfully. Now, you can proceed to the Kafka data restoration process in the next section.

      Step 5 — Restoring the Kafka Data

      In this section you will restore the backed up Kafka data to your existing source installation (or the destination server if you have followed the optional Step 7) by deleting the Kafka data directory and restoring the compressed archive file. This will allow you to verify that restoration works successfully.

      You can safely delete the existing Kafka data directory with the following command:

      Now that you have deleted the data, your Kafka installation resembles a fresh installation with no topics or messages present in it. To restore your backed up data, extract the files by running:

      • tar -C /tmp/kafka-logs -xzf /home/kafka/kafka-backup.tar.gz --strip-components 2

      The -C flag tells tar to change to the directory /tmp/kafka-logs before extracting the data. You specify the --strip 2 flag to ensure that the archive's contents are extracted in /tmp/kafka-logs/ itself and not in another directory (such as /tmp/kafka-logs/kafka-logs/) inside of it.

      Now that you have extracted the data successfully, you can start the Kafka and ZooKeeper services again by typing exit, to switch to your non-root sudo user, and then executing:

      • sudo systemctl start kafka

      Start the ZooKeeper service with:

      • sudo systemctl start zookeeper

      Log back in as your kafka user:

      You have restored the kafka data, you can move on to verifying that the restoration is successful in the next section.

      Step 6 — Verifying the Restoration

      To test the restoration of the Kafka data, you will consume messages from the topic you created in Step 1.

      Wait a few minutes for Kafka to start up and then execute the following command to read messages from the BackupTopic:

      • ~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic BackupTopic --from-beginning

      If you get a warning like the following, you need to wait for Kafka to start fully:

      Output

      [2018-09-13 15:52:45,234] WARN [Consumer clientId=consumer-1, groupId=console-consumer-87747] Connection to node -1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

      Retry the previous command in another few minutes or run sudo systemctl restart kafka as your non-root sudo user. If there are no issues in the restoration, you will see the following output:

      Output

      Test Message 1

      If you do not see this message, you can check if you missed out any commands in the previous section and execute them.

      Now that you have verified the restored Kafka data, this means you have successfully backed up and restored your data in a single Kafka installation. You can continue to Step 7 to see how to migrate the cluster and topics data to an installation in another server.

      Step 7 — Migrating and Restoring the Backup to Another Kafka Server (Optional)

      In this section, you will migrate the backed up data from the source Kafka server to the destination Kafka server. To do so, you will first use the scp command to download the compressed tar.gz files to your local system. You will then use scp again to push the files to the destination server. Once the files are present in the destination server, you can follow the steps used previously to restore the backup and verify that the migration is successful.

      You are downloading the backup files locally and then uploading them to the destination server, instead of copying it directly from your source to destination server, because the destination server will not have your source server's SSH key in its /home/sammy/.ssh/authorized_keys file and cannot connect to and from the source server. Your local machine can connect to both servers however, saving you an additional step of setting up SSH access from the source to destination server.

      Download the zookeeper-backup.tar.gz and kafka-backup.tar.gz files to your local machine by executing:

      • scp sammy@source_server_ip:/home/kafka/zookeeper-backup.tar.gz .

      You will see output similar to:

      Output

      zookeeper-backup.tar.gz 100% 68KB 128.0KB/s 00:00

      Now run the following command to download the kafka-backup.tar.gz file to your local machine:

      • scp sammy@source_server_ip:/home/kafka/kafka-backup.tar.gz .

      You will see the following output:

      Output

      kafka-backup.tar.gz 100% 1031KB 488.3KB/s 00:02

      Run ls in the current directory of your local machine, you will see both of the files:

      Output

      kafka-backup.tar.gz zookeeper.tar.gz

      Run the following command to transfer the zookeeper-backup.tar.gz file to /home/kafka/ of the destination server:

      • scp zookeeper-backup.tar.gz sammy@destination_server_ip:/home/sammy/zookeeper-backup.tar.gz

      Now run the following command to transfer the kafka-backup.tar.gz file to /home/kafka/ of the destination server:

      • scp kafka-backup.tar.gz sammy@destination_server_ip:/home/sammy/kafka-backup.tar.gz

      You have uploaded the backup files to the destination server successfully. Since the files are in the /home/sammy/ directory and do not have the correct permissions for access by the kafka user, you can move the files to the /home/kafka/ directory and change their permissions.

      SSH into the destination server by executing:

      • ssh sammy@destination_server_ip

      Now move zookeeper-backup.tar.gz to /home/kafka/ by executing:

      • sudo mv zookeeper-backup.tar.gz /home/sammy/zookeeper-backup.tar.gz

      Similarly, run the following command to copy kafka-backup.tar.gz to /home/kafka/:

      • sudo mv kafka-backup.tar.gz /home/kafka/kafka-backup.tar.gz

      Change the owner of the backup files by running the following command:

      • sudo chown kafka /home/kafka/zookeeper-backup.tar.gz /home/kafka/kafka-backup.tar.gz

      The previous mv and chown commands will not display any output.

      Now that the backup files are present in the destination server at the correct directory, follow the commands listed in Steps 4 to 6 of this tutorial to restore and verify the data for your destination server.

      Conclusion

      In this tutorial, you backed up, imported, and migrated your Kafka topics and messages from both the same installation and installations on separate servers. If you would like to learn more about other useful administrative tasks in Kafka, you can consult the operations section of Kafka's official documentation.

      To store backed up files such as zookeeper-backup.tar.gz and kafka-backup.tar.gz remotely, you can explore DigitalOcean Spaces. If Kafka is the only service running on your server, you can also explore other backup methods such as full instance backups.



      Source link