One place for hosting & domains

      Import

      How To Import Existing DigitalOcean Assets into Terraform


      The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      Terraform is an infrastructure as code tool created by HashiCorp that helps developers with deploying, updating, and removing different assets of their infrastructure in an efficient and more scalable way.

      Developers can use Terraform to organize different environments, track changes through version control, and automate repetitive work to limit human error. It also provides a way for teams to collaborate on improving their infrastructure through shared configurations.

      In this tutorial you’ll import existing DigitalOcean infrastructure into Terraform. By the end of this tutorial you’ll be able to use Terraform for all of your existing infrastructure in addition to creating new assets.

      Prerequisites

      Step 1 — Installing Terraform Locally

      In this first step you’ll install Terraform on your local machine. This step details the installation of the Linux binary. If you use Windows or Mac, you can check the Download Terraform page on the Terraform website.

      Move to the folder you want to download Terraform to on your local machine, then use the wget tool to download the Terraform 0.12.12 binary:

      • cd /tmp
      • wget https://releases.hashicorp.com/terraform/0.12.12/terraform_0.12.12_linux_amd64.zip

      To check if the sha256 checksum is the same value provided on the Terraform website, you’ll download the checksum file with the following command:

      • wget -q https://releases.hashicorp.com/terraform/0.12.12/terraform_0.12.12_SHA256SUMS

      Then run the following command to verify the checksums:

      • sha256sum -c --ignore-missing terraform_0.12.12_SHA256SUMS

      The SHA256SUMS file you downloaded lists the filenames and their hashes. This command will look for the same file terraform_0.12.12_SHA256SUMS locally and then check that the hashes match by using the -c flag. Since this file has more than one filename and its platform listed, you use the --ignore-missing flag to avoid errors in your output because you don’t have a copy of the other files.

      You will see output like the following:

      Output

      terraform_0.12.12_linux_amd64.zip: OK

      Use unzip to extract the binary:

      • sudo unzip terraform_0.12.12_linux_amd64.zip -d /usr/local/bin/

      Now check if Terraform is installed properly by checking the version:

      You’ll see output similar to the following:

      Output

      Terraform v0.12.12

      You’ve installed Terraform to your local machine, you’ll now prepare the configuration files.

      Step 2 — Preparing Terraform Configuration Files

      In this step you’ll import your existing assets into Terraform by creating a project directory and writing configuration files. Since Terraform doesn’t support generating configs from the import command at this time, you need to create those configurations manually.

      Run the following command to create your project directory:

      • mkdir -p do_terraform_import

      Then move into that directory with:

      Within this step you’ll create three additional files that will contain the required configurations. Your directory structure for this project will look like the following:

      ├── digitalocean_droplet.tf
      ├── digitalocean_firewall.tf
      └── provider.tf
      

      To begin you’ll create the file provider.tf to define your DigitalOcean Access Token as an environment variable instead of hardcoding it into your configuration.

      Warning: Your access token gives access to your complete infrastructure with unrestricted access, so treat it as such. Be sure that you’re the only one who has access to the machine where that token is stored.

      Besides your access token, you’ll also specify which provider you want to use. In this tutorial that’s digitalocean. For a full list of available Data Sources and Resources for DigitalOcean with Terraform, visit the Providers page on their website.

      Create and edit provider.tf with the following command:

      Add the following content into the provider.tf file:

      provider.tf

      variable "do_token" {}
      
      provider "digitalocean" {
          token   = "${var.do_token}"
          version = "1.9.1"
          }
      

      In this file you add your DigitalOcean Access Token as a variable, which Terraform will use as identification for the DigitalOcean API. You also specify the version of the DigitalOcean provider plugin. Terraform recommends that you specify which version of the provider you’re using so that future updates don’t potentially break your current setup.

      Now you’ll create the digitalocean_droplet.tf file. Here you’ll specify the resource that you’re going to use, in this case: droplet.

      Create the file with the following command:

      • nano digitalocean_droplet.tf

      Add the following configuration:

      digitalocean_droplet.tf

      resource "digitalocean_droplet" "do_droplet" {
          name   = "testing-terraform"
          region = "fra1"
          tags   = ["terraform-testing"]
          count  = "1"
      }
      

      Here you specify four parameters:

      • name: The Droplet name.

      • region: The region that the Droplet is located in.

      • tags: A list of the tags that are applied to this Droplet.

      • count: The number of resources needed for this configuration.

      Next you’ll create a configuration file for your firewall. Create the file digitalocean_firewall.tf with the following command:

      • nano digitalocean_firewall.tf

      Add the following content to the file:

      digitalocean_firewall.tf

      resource "digitalocean_firewall" "do_firewall" {
        name  = "testing-terraform-firewall"
        tags  = ["terraform-testing"]
        count = "1"
      }
      

      Here you specify the name of the firewall you wish to import and the tags of the Droplets to which the firewall rules apply. Finally the count value of 1 defines the required number of the particular resource.

      Note: You can include firewall resources in the digitalocean_droplet.tf file as well, however if you have multiple environments where multiple Droplets share the same firewall, it’s a good idea to separate it in case you only want to remove a single Droplet. This will then leave the firewall unaffected.

      Now it’s time to initialize those changes so Terraform can download the required dependencies. You will use the terraform init command for this, which will allow you to initialize a working directory containing Terraform configuration files.

      Run this command from your project directory:

      You’ll see the following output:

      Output

      Terraform has been successfully initialized!

      Terraform has successfully prepared the working directory by downloading plugins, searching for modules, and so on. Next you’ll begin importing your assets to Terraform.

      Step 3 — Importing Your Assets to Terraform

      In this step, you’ll import your DigitalOcean assets to Terraform. You’ll use doctl to find the ID numbers of your Droplets before importing your assets. You’ll then check the import configuration with the terraform show and terraform plan commands.

      To begin, you’ll export your DigitalOcean Access Token as an environment variable, which you’ll then inject into Terraform during runtime.

      Export it as an environment variable into your current shell session with the following command:

      • export DO_TOKEN="YOUR_TOKEN"

      In order to import your existing Droplet and firewall you’ll need their ID numbers. You can use doctl, the command line interface for the DigitalOcean API. Run the following command to list your Droplets and access their IDs:

      • doctl compute droplet list

      You’ll see output similar to the following:

      Output

      ID Name Public IPv4 Private IPv4 Public IPv6 Memory VCPUs Disk Region Image Status Tags Features Volumes DROPLET-ID DROPLET-NAME DROPLET-IPv4 1024 1 25 fra1 Ubuntu 18.04.3 (LTS) x64 active DROPLET-ID DROPLET-NAME DROPLET-IPv4 2048 1 50 fra1 Ubuntu 18.04.3 (LTS) x64 active DROPLET-ID DROPLET-NAME DROPLET-IPv4 1024 1 25 fra1 Ubuntu 18.04.3 (LTS) x64

      Now you’ll import your existing Droplet and firewall into Terraform:

      • terraform import -var "do_token=${DO_TOKEN}" digitalocean_droplet.do_droplet DROPLET_ID

      You use the -var flag to specify your DigitalOcean Access Token value that you previously exported to your shell session. This is needed so the DigitalOcean API can verify who you are and apply changes to your infrastructure.

      Now run the same command for your firewall:

      • terraform import -var "do_token=${DO_TOKEN}" digitalocean_firewall.do_firewall FIREWALL_ID

      You’ll check that the import was successful by using the terraform show command. This command provides human-readable output of your infrastructure state. It can be used to inspect a plan to ensure that wanted changes are going to be executed, or to inspect the current state as Terraform sees it.

      In this context state refers to the mapping of your DigitalOcean assets to the Terraform configuration that you’ve written and the tracking of metadata. This allows you to confirm that there’s no difference between existing DigitalOcean assets that you want to import and assets that Terraform is keeping track of:

      You’ll see output similar to this:

      Output

      . . . # digitalocean_droplet.do_droplet: resource "digitalocean_droplet" "do_droplet" { backups = false created_at = "2020-02-03T16:12:02Z" disk = 25 id = "DROPLET-ID" image = "DROPLET-IMAGE" ipv4_address = "DROPLET-IP" ipv6 = false locked = false memory = 1024 monitoring = false name = "testing-terraform-0" price_hourly = 0.00744 price_monthly = 5 private_networking = false region = "fra1" resize_disk = true size = "s-1vcpu-1gb" status = "active" tags = [ "terraform-testing", ] urn = "DROPLET-URN" vcpus = 1 volume_ids = [] . . . }

      You’ll see two resources in the output along with their attributes.

      After you import your Droplet and firewall into Terraform state, you need to make sure that configurations represent the current state of the imported assets. To do this, you’ll specify your Droplet’s image and its size. You can find these two values in the output of terraform show for digitalocean_droplet.do_droplet resource.

      Open the digitalocean_droplet.tf file:

      • nano digitalocean_droplet.tf

      In this tutorial:

      • The operating system image used for our existing Droplet is ubuntu-16-04-x64.
      • The region your Droplet is located in is fra1.
      • The Droplet tag for your existing Droplet is terraform-testing.

      The Droplet you imported using the configuration in digitalocean_droplet.tf will look like this:

      digitalocean_droplet.tf

      resource "digitalocean_droplet" "do_droplet" {
          image   = "ubuntu-16-04-x64"
          name    = "testing-terraform"
          region  = "fra1"
          size    = "s-1vcpu-1gb"
          tags    = ["terraform-testing"]
      }
      

      Next you’ll add in the firewall rules. In our example, open ports for inbound traffic are 22, 80, and 443. All ports are opened for outbound traffic. You can adjust this configuration accordingly to your open ports.

      Open digitalocean_firewall.tf:

      • nano digitalocean_firewall.tf

      Add the following configuration:

      digitalocean_firewall.tf

      resource "digitalocean_firewall" "do_firewall" {
        name  = "testing-terraform-firewall"
        tags  = ["terraform-testing"]
        count = "1"
      
        inbound_rule {
            protocol                = "tcp"
            port_range              = "22"
            source_addresses        = ["0.0.0.0/0", "::/0"]
          }
        inbound_rule {
            protocol                = "tcp"
            port_range              = "80"
            source_addresses        = ["0.0.0.0/0", "::/0"]
          }
        inbound_rule {
            protocol                = "tcp"
            port_range              = "443"
            source_addresses        = ["0.0.0.0/0", "::/0"]
          }
      
        outbound_rule {
            protocol                = "tcp"
            port_range              = "all"
            destination_addresses   = ["0.0.0.0/0", "::/0"]
          }
        outbound_rule {
            protocol                = "udp"
            port_range              = "all"
            destination_addresses   = ["0.0.0.0/0", "::/0"]
          }
        outbound_rule {
            protocol                = "icmp"
            destination_addresses   = ["0.0.0.0/0", "::/0"]
          }
      }
      

      These rules replicate the state of the existing example firewall. If you’d like to limit traffic to different IP addresses, different ports, or different protocol, you can adjust the file to replicate your existing firewall.

      After you’ve updated your Terraform files, you’ll use the plan command to see if changes you made replicate state of existing assets on DigitalOcean.

      The terraform plan command is used as a dry run. With this command you can check if changes Terraform is going to make are the changes you want to make. It is a good idea to always run this command for confirmation before applying changes.

      Run terraform plan with the following:

      • terraform plan -var "do_token=$DO_TOKEN"

      You’ll see output similar to the following output:

      Output

      No changes. Infrastructure is up-to-date.

      You’ve successfully imported existing DigitalOcean assets in Terraform, and now you can make changes to your infrastructure through Terraform without the risk of accidentally deleting or modifying existing assets.

      Step 4 — Creating New Assets via Terraform

      In this step you’ll add two additional Droplets to your existing infrastructure. Adding assets in this way to your existing infrastructure can be useful, for example, if you have a live website and don’t want to make any potentially breaking changes to that website while working on it. Instead you can add one more Droplet to use as a development environment and work on your project in the same environment as the production Droplet, without any of the potential risk.

      Now open digitalocean_droplet.tf to add the rules for your new Droplets:

      • nano digitalocean_droplet.tf

      Add the following lines to your file:

      digitalocean_droplet.tf

      resource "digitalocean_droplet" "do_droplet" {
          image   = "ubuntu-16-04-x64"
          name    = "testing-terraform"
          region  = "fra1"
          size    = "s-1vcpu-1gb"
          tags    = ["terraform-testing"]
          count   = "1"
      }
      
      resource "digitalocean_droplet" "do_droplet_new" {
          image   = "ubuntu-18-04-x64"
          name    = "testing-terraform-${count.index}"
          region  = "fra1"
          size    = "s-1vcpu-1gb"
          tags    = ["terraform-testing"]
          count   = "2"
      }
      

      You use the count meta-argument to tell Terraform how many Droplets with the same specifications you want. These new Droplets will also be added to your existing firewall as you specify the same tag as per your firewall.

      Apply these rules to check the changes you’re specifying in digitalocean_droplet.tf:

      • terraform plan -var "do_token=$DO_TOKEN"

      Verify that the changes you want to make are replicated in the output of this command.

      You’ll see output similar to the following:

      Output

      . . . # digitalocean_droplet.do_droplet_new[1] will be created + resource "digitalocean_droplet" "do_droplet_new" { + backups = false + created_at = (known after apply) + disk = (known after apply) + id = (known after apply) + image = "ubuntu-18-04-x64" + ipv4_address = (known after apply) + ipv4_address_private = (known after apply) + ipv6 = false + ipv6_address = (known after apply) + ipv6_address_private = (known after apply) + locked = (known after apply) + memory = (known after apply) + monitoring = false + name = "testing-terraform-1" + price_hourly = (known after apply) + price_monthly = (known after apply) + private_networking = true + region = "fra1" + resize_disk = true + size = "s-1vcpu-1gb" + status = (known after apply) + tags = [ + "terraform-testing", ] + urn = (known after apply) + vcpus = (known after apply) + volume_ids = (known after apply) } Plan: 2 to add, 1 to change, 0 to destroy.

      Once you’re satisfied with the output, use the terraform apply command to apply the changes you’ve specified to the state of the configuration:

      • terraform apply -var "do_token=$DO_TOKEN"

      Confirm the changes by entering yes on the command line. After successful execution, you’ll see output similar to the following:

      Output

      . . . digitalocean_droplet.do_droplet_new[1]: Creating... digitalocean_droplet.do_droplet_new[0]: Creating... digitalocean_firewall.do_firewall[0]: Modifying... [id=FIREWALL-ID] digitalocean_firewall.do_firewall[0]: Modifications complete after 1s [id=FIREWALL-ID] digitalocean_droplet.do_droplet_new[0]: Still creating... [10s elapsed] digitalocean_droplet.do_droplet_new[1]: Still creating... [10s elapsed] digitalocean_droplet.do_droplet_new[0]: Creation complete after 16s [id=DROPLET-ID] digitalocean_droplet.do_droplet_new[1]: Still creating... [20s elapsed] digitalocean_droplet.do_droplet_new[1]: Creation complete after 22s [id=DROPLET-ID] Apply complete! Resources: 2 added, 1 changed, 0 destroyed.

      You’ll see two new Droplets in your DigitalOcean web panel:
      New Droplets

      You’ll also see them attached to your existing firewall:
      Existing Firewall

      You’ve created new assets with Terraform using your existing assets. To learn how to destroy these assets you can optionally complete the next step.

      Step 5 — Destroying Imported and Created Assets (Optional)

      In this step, you’ll destroy assets that you’ve imported and created by adjusting the configuration.

      Begin by opening digitalocean_droplet.tf:

      • nano digitalocean_droplet.tf

      In the file, set the count to 0 as per the following:

      digitalocean_droplet.tf

      resource "digitalocean_droplet" "do_droplet" {
          image   = "ubuntu-16-04-x64"
          name    = "testing-terraform"
          region  = "fra1"
          size    = "s-1vcpu-1gb"
          tags    = ["terraform-testing"]
          count   = "0"
      }
      
      resource "digitalocean_droplet" "do_droplet_new" {
          image   = "ubuntu-18-04-x64"
          name    = "testing-terraform-${count.index}"
          region  = "fra1"
          size    = "s-1vcpu-1gb"
          tags    = ["terraform-testing"]
          count   = "0"
      }
      

      Save and exit the file.

      Open your firewall configuration file to alter the count as well:

      • nano digitalocean_firewall.tf

      Set the count to 0 like the following highlighted line:

      digitalocean_firewall.tf

      resource "digitalocean_firewall" "do_firewall" {
        name  = "testing-terraform-firewall"
        tags  = ["terraform-testing"]
        count = "0"
      
        inbound_rule {
            protocol                = "tcp"
            port_range              = "22"
            source_addresses        = ["0.0.0.0/0", "::/0"]
          }
        inbound_rule {
            protocol                = "tcp"
            port_range              = "80"
            source_addresses        = ["0.0.0.0/0", "::/0"]
          }
        inbound_rule {
            protocol                = "tcp"
            port_range              = "443"
            source_addresses        = ["0.0.0.0/0", "::/0"]
          }
      
        outbound_rule {
            protocol                = "tcp"
            port_range              = "all"
            destination_addresses   = ["0.0.0.0/0", "::/0"]
          }
        outbound_rule {
            protocol                = "udp"
            port_range              = "all"
            destination_addresses   = ["0.0.0.0/0", "::/0"]
          }
        outbound_rule {
            protocol                = "icmp"
            destination_addresses   = ["0.0.0.0/0", "::/0"]
          }
      }
      

      Save and exit the file.

      Now apply those changes with the following command:

      • terraform apply -var "do_token=${DO_TOKEN}"

      Terraform will ask you to confirm if you wish to destroy the Droplets and firewall. This will destroy all assets you imported and created via Terraform, so ensure you verify that you wish to proceed before typing yes.

      You’ll see output similar to:

      Output

      . . . digitalocean_droplet.do_droplet[0]: Destroying... [id=YOUR-DROPLET-ID]] digitalocean_droplet.do_droplet_new[0]: Destroying... [id=YOUR-DROPLET-ID] digitalocean_droplet.do_droplet_new[1]: Destroying... [id=YOUR-DROPLET-ID] digitalocean_firewall.do_firewall[0]: Destroying... [id=YOUR-FIREWALL-ID] digitalocean_firewall.do_firewall[0]: Destruction complete after 1s digitalocean_droplet.do_droplet_new[1]: Still destroying... [id=YOUR-DROPLET-ID, 10s elapsed] digitalocean_droplet.do_droplet[0]: Still destroying... [id=YOUR-DROPLET-ID, 10s elapsed] digitalocean_droplet.do_droplet_new[0]: Still destroying... [id=YOUR-DROPLET-ID, 10s elapsed] digitalocean_droplet.do_droplet_new[1]: Still destroying... [id=YOUR-DROPLET-ID, 20s elapsed] digitalocean_droplet.do_droplet_new[0]: Still destroying... [id=YOUR-DROPLET-ID, 20s elapsed] digitalocean_droplet.do_droplet[0]: Still destroying... [id=YOUR-DROPLET-ID, 20s elapsed] digitalocean_droplet.do_droplet_new[1]: Destruction complete after 22s digitalocean_droplet.do_droplet[0]: Destruction complete after 22s digitalocean_droplet.do_droplet_new[0]: Destruction complete after 22s Apply complete! Resources: 0 added, 0 changed, 4 destroyed.

      You’ve deleted all assets managed by Terraform. This is a useful workflow if you no longer need an asset or are scaling down.

      Conclusion

      In this tutorial you installed Terraform, imported existing assets, created new assets, and optionally destroyed those assets. You can scale this workflow to a larger project, such as deploying a production-ready Kubernetes cluster. Using Terraform you could manage all of the nodes, DNS entries, firewalls, storage, and other assets, as well as use version control to track changes and collaborate with a team.

      To explore further features of Terraform read their documentation. You can also read DigitalOcean’s Terraform content for further tutorials and Q&A.



      Source link

      How To Back Up, Import, and Migrate Your Apache Kafka Data on Debian 9


      The author selected Tech Education Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      Backing up your Apache Kafka data is an important practice that will help you recover from unintended data loss or bad data added to the cluster due to user error. Data dumps of cluster and topic data are an efficient way to perform backups and restorations.

      Importing and migrating your backed up data to a separate server is helpful in situations where your Kafka instance becomes unusable due to server hardware or networking failures and you need to create a new Kafka instance with your old data. Importing and migrating backed up data is also useful when you are moving the Kafka instance to an upgraded or downgraded server due to a change in resource usage.

      In this tutorial, you will back up, import, and migrate your Kafka data on a single Debian 9 installation as well as on multiple Debian 9 installations on separate servers. ZooKeeper is a critical component of Kafka’s operation. It stores information about cluster state such as consumer data, partition data, and the state of other brokers in the cluster. As such, you will also back up ZooKeeper’s data in this tutorial.

      Prerequisites

      To follow along, you will need:

      • A Debian 9 server with at least 4GB of RAM and a non-root sudo user set up by following the tutorial.
      • A Debian 9 server with Apache Kafka installed, to act as the source of the backup. Follow the How To Install Apache Kafka on Debian 9 guide to set up your Kafka installation, if Kafka isn’t already installed on the source server.
      • OpenJDK 8 installed on the server. To install this version, follow these instructions on installing specific versions of OpenJDK.
      • Optional for Step 7 — Another Debian 9 server with Apache Kafka installed, to act as the destination of the backup. Follow the article link in the previous prerequisite to install Kafka on the destination server. This prerequisite is required only if you are moving your Kafka data from one server to another. If you want to back up and import your Kafka data to a single server, you can skip this prerequisite.

      Step 1 — Creating a Test Topic and Adding Messages

      A Kafka message is the most basic unit of data storage in Kafka and is the entity that you will publish to and subscribe from Kafka. A Kafka topic is like a container for a group of related messages. When you subscribe to a particular topic, you will receive only messages that were published to that particular topic. In this section you will log in to the server that you would like to back up (the source server) and add a Kafka topic and a message so that you have some data populated for the backup.

      This tutorial assumes you have installed Kafka in the home directory of the kafka user (/home/kafka/kafka). If your installation is in a different directory, modify the ~/kafka part in the following commands with your Kafka installation’s path, and for the commands throughout the rest of this tutorial.

      SSH into the source server by executing:

      • ssh sammy@source_server_ip

      Run the following command to log in as the kafka user:

      Create a topic named BackupTopic using the kafka-topics.sh shell utility file in your Kafka installation's bin directory, by typing:

      • ~/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic BackupTopic

      Publish the string "Test Message 1" to the BackupTopic topic by using the ~/kafka/bin/kafka-console-producer.sh shell utility script.

      If you would like to add additional messages here, you can do so now.

      • echo "Test Message 1" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic BackupTopic > /dev/null

      The ~/kafka/bin/kafka-console-producer.sh file allows you to publish messages directly from the command line. Typically, you would publish messages using a Kafka client library from within your program, but since that involves different setups for different programming languages, you can use the shell script as a language-independent way of publishing messages during testing or while performing administrative tasks. The --topic flag specifies the topic that you will publish the message to.

      Next, verify that the kafka-console-producer.sh script has published the message(s) by running the following command:

      • ~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic BackupTopic --from-beginning

      The ~/kafka/bin/kafka-console-consumer.sh shell script starts the consumer. Once started, it will subscribe to messages from the topic that you published in the "Test Message 1" message in the previous command. The --from-beginning flag in the command allows consuming messages that were published before the consumer was started. Without the flag enabled, only messages published after the consumer was started will appear. On running the command, you will see the following output in the terminal:

      Output

      Test Message 1

      Press CTRL+C to stop the consumer.

      You've created some test data and verified that it's persisted. Now you can back up the state data in the next section.

      Step 2 — Backing Up the ZooKeeper State Data

      Before backing up the actual Kafka data, you need to back up the cluster state stored in ZooKeeper.

      ZooKeeper stores its data in the directory specified by the dataDir field in the ~/kafka/config/zookeeper.properties configuration file. You need to read the value of this field to determine the directory to back up. By default, dataDir points to the /tmp/zookeeper directory. If the value is different in your installation, replace /tmp/zookeeper with that value in the following commands.

      Here is an example output of the ~/kafka/config/zookeeper.properties file:

      ~/kafka/config/zookeeper.properties

      ...
      ...
      ...
      # the directory where the snapshot is stored.
      dataDir=/tmp/zookeeper
      # the port at which the clients will connect
      clientPort=2181
      # disable the per-ip limit on the number of connections since this is a non-production config
      maxClientCnxns=0
      ...
      ...
      ...
      

      Now that you have the path to the directory, you can create a compressed archive file of its contents. Compressed archive files are a better option over regular archive files to save disk space. Run the following command:

      • tar -czf /home/kafka/zookeeper-backup.tar.gz /tmp/zookeeper/*

      The command's output tar: Removing leading / from member names you can safely ignore.

      The -c and -z flags tell tar to create an archive and apply gzip compression to the archive. The -f flag specifies the name of the output compressed archive file, which is zookeeper-backup.tar.gz in this case.

      You can run ls in your current directory to see zookeeper-backup.tar.gz as part of your output.

      You have now successfully backed up the ZooKeeper data. In the next section, you will back up the actual Kafka data.

      Step 3 — Backing Up the Kafka Topics and Messages

      In this section, you will back up Kafka's data directory into a compressed tar file like you did for ZooKeeper in the previous step.

      Kafka stores topics, messages, and internal files in the directory that the log.dirs field specifies in the ~/kafka/config/server.properties configuration file. You need to read the value of this field to determine the directory to back up. By default and in your current installation, log.dirs points to the /tmp/kafka-logs directory. If the value is different in your installation, replace /tmp/kafka-logs in the following commands with the correct value.

      Here is an example output of the ~/kafka/config/server.properties file:

      ~/kafka/config/server.properties

      ...
      ...
      ...
      ############################# Log Basics #############################
      
      # A comma separated list of directories under which to store log files
      log.dirs=/tmp/kafka-logs
      
      # The default number of log partitions per topic. More partitions allow greater
      # parallelism for consumption, but this will also result in more files across
      # the brokers.
      num.partitions=1
      
      # The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
      # This value is recommended to be increased for installations with data dirs located in RAID array.
      num.recovery.threads.per.data.dir=1
      ...
      ...
      ...
      

      First, stop the Kafka service so that the data in the log.dirs directory is in a consistent state when creating the archive with tar. To do this, return to your server's non-root user by typing exit and then run the following command:

      • sudo systemctl stop kafka

      After stopping the Kafka service, log back in as your kafka user with:

      It is necessary to stop/start the Kafka and ZooKeeper services as your non-root sudo user because in the Apache Kafka installation prerequisite you restricted the kafka user as a security precaution. This step in the prerequisite disables sudo access for the kafka user, which leads to commands failing to execute.

      Now, create a compressed archive file of the directory's contents by running the following command:

      • tar -czf /home/kafka/kafka-backup.tar.gz /tmp/kafka-logs/*

      Once again, you can safely ignore the command's output (tar: Removing leading / from member names).

      You can run ls in the current directory to see kafka-backup.tar.gz as part of the output.

      You can start the Kafka service again — if you do not want to restore the data immediately — by typing exit, to switch to your non-root sudo user, and then running:

      • sudo systemctl start kafka

      Log back in as your kafka user:

      You have successfully backed up the Kafka data. You can now proceed to the next section, where you will be restoring the cluster state data stored in ZooKeeper.

      Step 4 — Restoring the ZooKeeper Data

      In this section you will restore the cluster state data that Kafka creates and manages internally when the user performs operations such as creating a topic, adding/removing additional nodes, and adding and consuming messages. You will restore the data to your existing source installation by deleting the ZooKeeper data directory and restoring the contents of the zookeeper-backup.tar.gz file. If you want to restore data to a different server, see Step 7.

      You need to stop the Kafka and ZooKeeper services as a precaution against the data directories receiving invalid data during the restoration process.

      First, stop the Kafka service by typing exit, to switch to your non-root sudo user, and then running:

      • sudo systemctl stop kafka

      Next, stop the ZooKeeper service:

      • sudo systemctl stop zookeeper

      Log back in as your kafka user:

      You can then safely delete the existing cluster data directory with the following command:

      Now restore the data you backed up in Step 2:

      • tar -C /tmp/zookeeper -xzf /home/kafka/zookeeper-backup.tar.gz --strip-components 2

      The -C flag tells tar to change to the directory /tmp/zookeeper before extracting the data. You specify the --strip 2 flag to make tar extract the archive's contents in /tmp/zookeeper/ itself and not in another directory (such as /tmp/zookeeper/tmp/zookeeper/) inside of it.

      You have restored the cluster state data successfully. Now, you can proceed to the Kafka data restoration process in the next section.

      Step 5 — Restoring the Kafka Data

      In this section you will restore the backed up Kafka data to your existing source installation (or the destination server if you have followed the optional Step 7) by deleting the Kafka data directory and restoring the compressed archive file. This will allow you to verify that restoration works successfully.

      You can safely delete the existing Kafka data directory with the following command:

      Now that you have deleted the data, your Kafka installation resembles a fresh installation with no topics or messages present in it. To restore your backed up data, extract the files by running:

      • tar -C /tmp/kafka-logs -xzf /home/kafka/kafka-backup.tar.gz --strip-components 2

      The -C flag tells tar to change to the directory /tmp/kafka-logs before extracting the data. You specify the --strip 2 flag to ensure that the archive's contents are extracted in /tmp/kafka-logs/ itself and not in another directory (such as /tmp/kafka-logs/kafka-logs/) inside of it.

      Now that you have extracted the data successfully, you can start the Kafka and ZooKeeper services again by typing exit, to switch to your non-root sudo user, and then executing:

      • sudo systemctl start kafka

      Start the ZooKeeper service with:

      • sudo systemctl start zookeeper

      Log back in as your kafka user:

      You have restored the kafka data, you can move on to verifying that the restoration is successful in the next section.

      Step 6 — Verifying the Restoration

      To test the restoration of the Kafka data, you will consume messages from the topic you created in Step 1.

      Wait a few minutes for Kafka to start up and then execute the following command to read messages from the BackupTopic:

      • ~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic BackupTopic --from-beginning

      If you get a warning like the following, you need to wait for Kafka to start fully:

      Output

      [2018-09-13 15:52:45,234] WARN [Consumer clientId=consumer-1, groupId=console-consumer-87747] Connection to node -1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

      Retry the previous command in another few minutes or run sudo systemctl restart kafka as your non-root sudo user. If there are no issues in the restoration, you will see the following output:

      Output

      Test Message 1

      If you do not see this message, you can check if you missed out any commands in the previous section and execute them.

      Now that you have verified the restored Kafka data, this means you have successfully backed up and restored your data in a single Kafka installation. You can continue to Step 7 to see how to migrate the cluster and topics data to an installation in another server.

      Step 7 — Migrating and Restoring the Backup to Another Kafka Server (Optional)

      In this section, you will migrate the backed up data from the source Kafka server to the destination Kafka server. To do so, you will first use the scp command to download the compressed tar.gz files to your local system. You will then use scp again to push the files to the destination server. Once the files are present in the destination server, you can follow the steps used previously to restore the backup and verify that the migration is successful.

      You are downloading the backup files locally and then uploading them to the destination server, instead of copying it directly from your source to destination server, because the destination server will not have your source server's SSH key in its /home/sammy/.ssh/authorized_keys file and cannot connect to and from the source server. Your local machine can connect to both servers however, saving you an additional step of setting up SSH access from the source to destination server.

      Download the zookeeper-backup.tar.gz and kafka-backup.tar.gz files to your local machine by executing:

      • scp sammy@source_server_ip:/home/kafka/zookeeper-backup.tar.gz .

      You will see output similar to:

      Output

      zookeeper-backup.tar.gz 100% 68KB 128.0KB/s 00:00

      Now run the following command to download the kafka-backup.tar.gz file to your local machine:

      • scp sammy@source_server_ip:/home/kafka/kafka-backup.tar.gz .

      You will see the following output:

      Output

      kafka-backup.tar.gz 100% 1031KB 488.3KB/s 00:02

      Run ls in the current directory of your local machine, you will see both of the files:

      Output

      kafka-backup.tar.gz zookeeper.tar.gz

      Run the following command to transfer the zookeeper-backup.tar.gz file to /home/kafka/ of the destination server:

      • scp zookeeper-backup.tar.gz sammy@destination_server_ip:/home/sammy/zookeeper-backup.tar.gz

      Now run the following command to transfer the kafka-backup.tar.gz file to /home/kafka/ of the destination server:

      • scp kafka-backup.tar.gz sammy@destination_server_ip:/home/sammy/kafka-backup.tar.gz

      You have uploaded the backup files to the destination server successfully. Since the files are in the /home/sammy/ directory and do not have the correct permissions for access by the kafka user, you can move the files to the /home/kafka/ directory and change their permissions.

      SSH into the destination server by executing:

      • ssh sammy@destination_server_ip

      Now move zookeeper-backup.tar.gz to /home/kafka/ by executing:

      • sudo mv zookeeper-backup.tar.gz /home/sammy/zookeeper-backup.tar.gz

      Similarly, run the following command to copy kafka-backup.tar.gz to /home/kafka/:

      • sudo mv kafka-backup.tar.gz /home/kafka/kafka-backup.tar.gz

      Change the owner of the backup files by running the following command:

      • sudo chown kafka /home/kafka/zookeeper-backup.tar.gz /home/kafka/kafka-backup.tar.gz

      The previous mv and chown commands will not display any output.

      Now that the backup files are present in the destination server at the correct directory, follow the commands listed in Steps 4 to 6 of this tutorial to restore and verify the data for your destination server.

      Conclusion

      In this tutorial, you backed up, imported, and migrated your Kafka topics and messages from both the same installation and installations on separate servers. If you would like to learn more about other useful administrative tasks in Kafka, you can consult the operations section of Kafka's official documentation.

      To store backed up files such as zookeeper-backup.tar.gz and kafka-backup.tar.gz remotely, you can explore DigitalOcean Spaces. If Kafka is the only service running on your server, you can also explore other backup methods such as full instance backups.



      Source link

      How To Back Up, Import, and Migrate Your Apache Kafka Data on CentOS 7


      The author selected the Tech Education Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      Backing up your Apache Kafka data is an important practice that will help you recover from unintended data loss or bad data added to the cluster due to user error. Data dumps of cluster and topic data are an efficient way to perform backups and restorations.

      Importing and migrating your backed up data to a separate server is helpful in situations where your Kafka instance becomes unusable due to server hardware or networking failures and you need to create a new Kafka instance with your old data. Importing and migrating backed up data is also useful when you are moving the Kafka instance to an upgraded or downgraded server due to a change in resource usage.

      In this tutorial, you will back up, import, and migrate your Kafka data on a single CentOS 7 installation as well as on multiple CentOS 7 installations on separate servers. ZooKeeper is a critical component of Kafka’s operation. It stores information about cluster state such as consumer data, partition data, and the state of other brokers in the cluster. As such, you will also back up ZooKeeper’s data in this tutorial.

      Prerequisites

      To follow along, you will need:

      • A CentOS 7 server with at least 4GB of RAM and a non-root sudo user set up by following this tutorial.
      • A CentOS 7 server with Apache Kafka installed, to act as the source of the backup. Follow the How To Install Apache Kafka on CentOS 7 guide to set up your Kafka installation, if Kafka isn’t already installed on the source server.
      • OpenJDK 8 installed on the server. To install this version, follow these instructions on installing specific versions of OpenJDK.
      • Optional for Step 7 — Another CentOS 7 server with Apache Kafka installed, to act as the destination of the backup. Follow the article link in the previous prerequisite to install Kafka on the destination server. This prerequisite is required only if you are moving your Kafka data from one server to another. If you want to back up and import your Kafka data to a single server, you can skip this prerequisite.

      Step 1 — Creating a Test Topic and Adding Messages

      A Kafka message is the most basic unit of data storage in Kafka and is the entity that you will publish to and subscribe from Kafka. A Kafka topic is like a container for a group of related messages. When you subscribe to a particular topic, you will receive only messages that were published to that particular topic. In this section you will log in to the server that you would like to back up (the source server) and add a Kafka topic and a message so that you have some data populated for the backup.

      This tutorial assumes you have installed Kafka in the home directory of the kafka user (/home/kafka/kafka). If your installation is in a different directory, modify the ~/kafka part in the following commands with your Kafka installation’s path, and for the commands throughout the rest of this tutorial.

      SSH into the source server by executing:

      • ssh sammy@source_server_ip

      Run the following command to log in as the kafka user:

      Create a topic named BackupTopic using the kafka-topics.sh shell utility file in your Kafka installation's bin directory, by typing:

      • ~/kafka/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic BackupTopic

      Publish the string "Test Message 1" to the BackupTopic topic by using the ~/kafka/bin/kafka-console-producer.sh shell utility script.

      If you would like to add additional messages here, you can do so now.

      • echo "Test Message 1" | ~/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic BackupTopic > /dev/null

      The ~/kafka/bin/kafka-console-producer.sh file allows you to publish messages directly from the command line. Typically, you would publish messages using a Kafka client library from within your program, but since that involves different setups for different programming languages, you can use the shell script as a language-independent way of publishing messages during testing or while performing administrative tasks. The --topic flag specifies the topic that you will publish the message to.

      Next, verify that the kafka-console-producer.sh script has published the message(s) by running the following command:

      • ~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic BackupTopic --from-beginning

      The ~/kafka/bin/kafka-console-consumer.sh shell script starts the consumer. Once started, it will subscribe to messages from the topic that you published in the "Test Message 1" message in the previous command. The --from-beginning flag in the command allows consuming messages that were published before the consumer was started. Without the flag enabled, only messages published after the consumer was started will appear. On running the command, you will see the following output in the terminal:

      Output

      Test Message 1

      Press CTRL+C to stop the consumer.

      You've created some test data and verified that it's persisted. Now you can back up the state data in the next section.

      Step 2 — Backing Up the ZooKeeper State Data

      Before backing up the actual Kafka data, you need to back up the cluster state stored in ZooKeeper.

      ZooKeeper stores its data in the directory specified by the dataDir field in the ~/kafka/config/zookeeper.properties configuration file. You need to read the value of this field to determine the directory to back up. By default, dataDir points to the /tmp/zookeeper directory. If the value is different in your installation, replace /tmp/zookeeper with that value in the following commands.

      Here is an example output of the ~/kafka/config/zookeeper.properties file:

      ~/kafka/config/zookeeper.properties

      ...
      ...
      ...
      # the directory where the snapshot is stored.
      dataDir=/tmp/zookeeper
      # the port at which the clients will connect
      clientPort=2181
      # disable the per-ip limit on the number of connections since this is a non-production config
      maxClientCnxns=0
      ...
      ...
      ...
      

      Now that you have the path to the directory, you can create a compressed archive file of its contents. Compressed archive files are a better option over regular archive files to save disk space. Run the following command:

      • tar -czf /home/kafka/zookeeper-backup.tar.gz /tmp/zookeeper/*

      The command's output tar: Removing leading / from member names you can safely ignore.

      The -c and -z flags tell tar to create an archive and apply gzip compression to the archive. The -f flag specifies the name of the output compressed archive file, which is zookeeper-backup.tar.gz in this case.

      You can run ls in your current directory to see zookeeper-backup.tar.gz as part of your output.

      You have now successfully backed up the ZooKeeper data. In the next section, you will back up the actual Kafka data.

      Step 3 — Backing Up the Kafka Topics and Messages

      In this section, you will back up Kafka's data directory into a compressed tar file like you did for ZooKeeper in the previous step.

      Kafka stores topics, messages, and internal files in the directory that the log.dirs field specifies in the ~/kafka/config/server.properties configuration file. You need to read the value of this field to determine the directory to back up. By default and in your current installation, log.dirs points to the /tmp/kafka-logs directory. If the value is different in your installation, replace /tmp/kafka-logs in the following commands with the correct value.

      Here is an example output of the ~/kafka/config/server.properties file:

      ~/kafka/config/server.properties

      ...
      ...
      ...
      ############################# Log Basics #############################
      
      # A comma separated list of directories under which to store log files
      log.dirs=/tmp/kafka-logs
      
      # The default number of log partitions per topic. More partitions allow greater
      # parallelism for consumption, but this will also result in more files across
      # the brokers.
      num.partitions=1
      
      # The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
      # This value is recommended to be increased for installations with data dirs located in RAID array.
      num.recovery.threads.per.data.dir=1
      ...
      ...
      ...
      

      First, stop the Kafka service so that the data in the log.dirs directory is in a consistent state when creating the archive with tar. To do this, return to your server's non-root user by typing exit and then run the following command:

      • sudo systemctl stop kafka

      After stopping the Kafka service, log back in as your kafka user with:

      It is necessary to stop/start the Kafka and ZooKeeper services as your non-root sudo user because in the Apache Kafka installation prerequisite you restricted the kafka user as a security precaution. This step in the prerequisite disables sudo access for the kafka user, which leads to commands failing to execute.

      Now, create a compressed archive file of the directory's contents by running the following command:

      • tar -czf /home/kafka/kafka-backup.tar.gz /tmp/kafka-logs/*

      Once again, you can safely ignore the command's output (tar: Removing leading / from member names).

      You can run ls in the current directory to see kafka-backup.tar.gz as part of the output.

      You can start the Kafka service again — if you do not want to restore the data immediately — by typing exit, to switch to your non-root sudo user, and then running:

      • sudo systemctl start kafka

      Log back in as your kafka user:

      You have successfully backed up the Kafka data. You can now proceed to the next section, where you will be restoring the cluster state data stored in ZooKeeper.

      Step 4 — Restoring the ZooKeeper Data

      In this section you will restore the cluster state data that Kafka creates and manages internally when the user performs operations such as creating a topic, adding/removing additional nodes, and adding and consuming messages. You will restore the data to your existing source installation by deleting the ZooKeeper data directory and restoring the contents of the zookeeper-backup.tar.gz file. If you want to restore data to a different server, see Step 7.

      You need to stop the Kafka and ZooKeeper services as a precaution against the data directories receiving invalid data during the restoration process.

      First, stop the Kafka service by typing exit, to switch to your non-root sudo user, and then running:

      • sudo systemctl stop kafka

      Next, stop the ZooKeeper service:

      • sudo systemctl stop zookeeper

      Log back in as your kafka user:

      You can then safely delete the existing cluster data directory with the following command:

      Now restore the data you backed up in Step 2:

      • tar -C /tmp/zookeeper -xzf /home/kafka/zookeeper-backup.tar.gz --strip-components 2

      The -C flag tells tar to change to the directory /tmp/zookeeper before extracting the data. You specify the --strip 2 flag to make tar extract the archive's contents in /tmp/zookeeper/ itself and not in another directory (such as /tmp/zookeeper/tmp/zookeeper/) inside of it.

      You have restored the cluster state data successfully. Now, you can proceed to the Kafka data restoration process in the next section.

      Step 5 — Restoring the Kafka Data

      In this section you will restore the backed up Kafka data to your existing source installation (or the destination server if you have followed the optional Step 7) by deleting the Kafka data directory and restoring the compressed archive file. This will allow you to verify that restoration works successfully.

      You can safely delete the existing Kafka data directory with the following command:

      Now that you have deleted the data, your Kafka installation resembles a fresh installation with no topics or messages present in it. To restore your backed up data, extract the files by running:

      • tar -C /tmp/kafka-logs -xzf /home/kafka/kafka-backup.tar.gz --strip-components 2

      The -C flag tells tar to change to the directory /tmp/kafka-logs before extracting the data. You specify the --strip 2 flag to ensure that the archive's contents are extracted in /tmp/kafka-logs/ itself and not in another directory (such as /tmp/kafka-logs/kafka-logs/) inside of it.

      Now that you have extracted the data successfully, you can start the Kafka and ZooKeeper services again by typing exit, to switch to your non-root sudo user, and then executing:

      • sudo systemctl start kafka

      Start the ZooKeeper service with:

      • sudo systemctl start zookeeper

      Log back in as your kafka user:

      You have restored the kafka data, you can move on to verifying that the restoration is successful in the next section.

      Step 6 — Verifying the Restoration

      To test the restoration of the Kafka data, you will consume messages from the topic you created in Step 1.

      Wait a few minutes for Kafka to start up and then execute the following command to read messages from the BackupTopic:

      • ~/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic BackupTopic --from-beginning

      If you get a warning like the following, you need to wait for Kafka to start fully:

      Output

      [2018-09-13 15:52:45,234] WARN [Consumer clientId=consumer-1, groupId=console-consumer-87747] Connection to node -1 could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)

      Retry the previous command in another few minutes or run sudo systemctl restart kafka as your non-root sudo user. If there are no issues in the restoration, you will see the following output:

      Output

      Test Message 1

      If you do not see this message, you can check if you missed out any commands in the previous section and execute them.

      Now that you have verified the restored Kafka data, this means you have successfully backed up and restored your data in a single Kafka installation. You can continue to Step 7 to see how to migrate the cluster and topics data to an installation in another server.

      Step 7 — Migrating and Restoring the Backup to Another Kafka Server (Optional)

      In this section, you will migrate the backed up data from the source Kafka server to the destination Kafka server. To do so, you will first use the scp command to download the compressed tar.gz files to your local system. You will then use scp again to push the files to the destination server. Once the files are present in the destination server, you can follow the steps used previously to restore the backup and verify that the migration is successful.

      You are downloading the backup files locally and then uploading them to the destination server, instead of copying it directly from your source to destination server, because the destination server will not have your source server's SSH key in its /home/sammy/.ssh/authorized_keys file and cannot connect to and from the source server. Your local machine can connect to both servers however, saving you an additional step of setting up SSH access from the source to destination server.

      Download the zookeeper-backup.tar.gz and kafka-backup.tar.gz files to your local machine by executing:

      • scp sammy@source_server_ip:/home/kafka/zookeeper-backup.tar.gz .

      You will see output similar to:

      Output

      zookeeper-backup.tar.gz 100% 68KB 128.0KB/s 00:00

      Now run the following command to download the kafka-backup.tar.gz file to your local machine:

      • scp sammy@source_server_ip:/home/kafka/kafka-backup.tar.gz .

      You will see the following output:

      Output

      kafka-backup.tar.gz 100% 1031KB 488.3KB/s 00:02

      Run ls in the current directory of your local machine, you will see both of the files:

      Output

      kafka-backup.tar.gz zookeeper.tar.gz

      Run the following command to transfer the zookeeper-backup.tar.gz file to /home/kafka/ of the destination server:

      • scp zookeeper-backup.tar.gz sammy@destination_server_ip:/home/sammy/zookeeper-backup.tar.gz

      Now run the following command to transfer the kafka-backup.tar.gz file to /home/kafka/ of the destination server:

      • scp kafka-backup.tar.gz sammy@destination_server_ip:/home/sammy/kafka-backup.tar.gz

      You have uploaded the backup files to the destination server successfully. Since the files are in the /home/sammy/ directory and do not have the correct permissions for access by the kafka user, you can move the files to the /home/kafka/ directory and change their permissions.

      SSH into the destination server by executing:

      • ssh sammy@destination_server_ip

      Now move zookeeper-backup.tar.gz to /home/kafka/ by executing:

      • sudo mv zookeeper-backup.tar.gz /home/sammy/zookeeper-backup.tar.gz

      Similarly, run the following command to copy kafka-backup.tar.gz to /home/kafka/:

      • sudo mv kafka-backup.tar.gz /home/kafka/kafka-backup.tar.gz

      Change the owner of the backup files by running the following command:

      • sudo chown kafka /home/kafka/zookeeper-backup.tar.gz /home/kafka/kafka-backup.tar.gz

      The previous mv and chown commands will not display any output.

      Now that the backup files are present in the destination server at the correct directory, follow the commands listed in Steps 4 to 6 of this tutorial to restore and verify the data for your destination server.

      Conclusion

      In this tutorial, you backed up, imported, and migrated your Kafka topics and messages from both the same installation and installations on separate servers. If you would like to learn more about other useful administrative tasks in Kafka, you can consult the operations section of Kafka's official documentation.

      To store backed up files such as zookeeper-backup.tar.gz and kafka-backup.tar.gz remotely, you can explore Digital Ocean Spaces. If Kafka is the only service running on your server, you can also explore other backup methods such as full instance backups.



      Source link