One place for hosting & domains

      Statistics

      How To Analyze Managed Redis Database Statistics Using the Elastic Stack on Ubuntu 18.04


      The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      Database monitoring is the continuous process of systematically tracking various metrics that show how the database is performing. By observing performance data, you can gain valuable insights and identify possible bottlenecks, as well as find additional ways of improving database performance. Such systems often implement alerting that notifies administrators when things go wrong. Gathered statistics can be used to not only improve the configuration and workflow of the database, but also those of client applications.

      The benefit of using the Elastic Stack (ELK stack) for monitoring your managed database is its excellent support for searching and the ability to ingest new data very quickly. It does not excel at updating the data, but this trade-off is acceptable for monitoring and logging purposes, where past data is almost never changed. Elasticsearch offers a powerful means of querying the data, which you can use through Kibana to get a better understanding of how the database fares through different time periods. This will allow you to correlate database load with real-life events to gain insight into how the database is being used.

      In this tutorial, you’ll import database metrics, generated by the Redis INFO command, into Elasticsearch via Logstash. This entails configuring Logstash to periodically run the command, parse its output and send it to Elasticsearch for indexing immediately afterward. The imported data can later be analyzed and visualized in Kibana. By the end of the tutorial, you’ll have an automated system pulling in Redis statistics for later analysis.

      Prerequisites

      Step 1 — Installing and Configuring Logstash

      In this section, you will install Logstash and configure it to pull statistics from your Redis database cluster, then parse them to send to Elasticsearch for indexing.

      Start off by installing Logstash with the following command:

      • sudo apt install logstash -y

      Once Logstash is installed, enable the service to automatically start on boot:

      • sudo systemctl enable logstash

      Before configuring Logstash to pull the statistics, let’s see what the data itself looks like. To connect to your Redis database, head over to your Managed Database Control Panel, and under the Connection details panel, select Flags from the dropdown:

      Managed Database Control Panel

      You’ll be shown a preconfigured command for the Redli client, which you’ll use to connect to your database. Click Copy and run the following command on your server, replacing redli_flags_command with the command you have just copied:

      Since the output from this command is long, we’ll explain this broken down into its different sections:

      In the output of the Redis info command, sections are marked with #, which signifies a comment. The values are populated in the form of key:value, which makes them relatively easy to parse.

      Output

      # Server redis_version:5.0.4 redis_git_sha1:ab60b2b1 redis_git_dirty:1 redis_build_id:7909f4de3561dc50 redis_mode:standalone os:Linux 5.2.14-200.fc30.x86_64 x86_64 arch_bits:64 multiplexing_api:epoll atomicvar_api:atomic-builtin gcc_version:9.1.1 process_id:72 run_id:ddb7b96c93bbd0c369c6d06ce1c02c78902e13cc tcp_port:25060 uptime_in_seconds:1733 uptime_in_days:0 hz:10 configured_hz:10 lru_clock:8687593 executable:/usr/bin/redis-server config_file:/etc/redis.conf # Clients connected_clients:3 client_recent_max_input_buffer:2 client_recent_max_output_buffer:0 blocked_clients:0 . . .

      The Server section contains technical information about the Redis build, such as its version and the Git commit it’s based on. While the Clients section provides the number of currently opened connections.

      Output

      . . . # Memory used_memory:941560 used_memory_human:919.49K used_memory_rss:4931584 used_memory_rss_human:4.70M used_memory_peak:941560 used_memory_peak_human:919.49K used_memory_peak_perc:100.00% used_memory_overhead:912190 used_memory_startup:795880 used_memory_dataset:29370 used_memory_dataset_perc:20.16% allocator_allocated:949568 allocator_active:1269760 allocator_resident:3592192 total_system_memory:1030356992 total_system_memory_human:982.62M used_memory_lua:37888 used_memory_lua_human:37.00K used_memory_scripts:0 used_memory_scripts_human:0B number_of_cached_scripts:0 maxmemory:463470592 maxmemory_human:442.00M maxmemory_policy:allkeys-lru allocator_frag_ratio:1.34 allocator_frag_bytes:320192 allocator_rss_ratio:2.83 allocator_rss_bytes:2322432 rss_overhead_ratio:1.37 rss_overhead_bytes:1339392 mem_fragmentation_ratio:5.89 mem_fragmentation_bytes:4093872 mem_not_counted_for_evict:0 mem_replication_backlog:0 mem_clients_slaves:0 mem_clients_normal:116310 mem_aof_buffer:0 mem_allocator:jemalloc-5.1.0 active_defrag_running:0 lazyfree_pending_objects:0 . . .

      Here Memory confirms how much RAM Redis has allocated for itself, as well as the maximum amount of memory it can possibly use. If it starts running out of memory, it will free up keys using the strategy you specified in the Control Panel (shown in the maxmemory_policy field in this output).

      Output

      . . . # Persistence loading:0 rdb_changes_since_last_save:0 rdb_bgsave_in_progress:0 rdb_last_save_time:1568966978 rdb_last_bgsave_status:ok rdb_last_bgsave_time_sec:0 rdb_current_bgsave_time_sec:-1 rdb_last_cow_size:217088 aof_enabled:0 aof_rewrite_in_progress:0 aof_rewrite_scheduled:0 aof_last_rewrite_time_sec:-1 aof_current_rewrite_time_sec:-1 aof_last_bgrewrite_status:ok aof_last_write_status:ok aof_last_cow_size:0 # Stats total_connections_received:213 total_commands_processed:2340 instantaneous_ops_per_sec:1 total_net_input_bytes:39205 total_net_output_bytes:776988 instantaneous_input_kbps:0.02 instantaneous_output_kbps:2.01 rejected_connections:0 sync_full:0 sync_partial_ok:0 sync_partial_err:0 expired_keys:0 expired_stale_perc:0.00 expired_time_cap_reached_count:0 evicted_keys:0 keyspace_hits:0 keyspace_misses:0 pubsub_channels:0 pubsub_patterns:0 latest_fork_usec:353 migrate_cached_sockets:0 slave_expires_tracked_keys:0 active_defrag_hits:0 active_defrag_misses:0 active_defrag_key_hits:0 active_defrag_key_misses:0 . . .

      In the Persistence section, you can see the last time Redis saved the keys it stores to disk, and if it was successful. The Stats section provides numbers related to client and in-cluster connections, the number of times the requested key was (or wasn’t) found, and so on.

      Output

      . . . # Replication role:master connected_slaves:0 master_replid:9c1d345a46d29d08537981c4fc44e312a21a160b master_replid2:0000000000000000000000000000000000000000 master_repl_offset:0 second_repl_offset:-1 repl_backlog_active:0 repl_backlog_size:46137344 repl_backlog_first_byte_offset:0 repl_backlog_histlen:0 . . .

      Note: The Redis project uses the terms “master” and “slave” in its documentation and in various commands. DigitalOcean generally prefers the alternative terms “primary” and “replica.”
      This guide will default to the terms “primary” and “replica” whenever possible, but note that there are a few instances where the terms “master” and “slave” unavoidably come up.

      By looking at the role under Replication, you’ll know if you’re connected to a primary or replica node. The rest of the section provides the number of currently connected replicas and the amount of data that the replica is lacking in regards to the primary. There may be additional fields if the instance you are connected to is a replica.

      Output

      . . . # CPU used_cpu_sys:1.972003 used_cpu_user:1.765318 used_cpu_sys_children:0.000000 used_cpu_user_children:0.001707 # Cluster cluster_enabled:0 # Keyspace

      Under CPU, you’ll see the amount of system (used_cpu_sys) and user (used_cpu_user) CPU Redis is consuming at the moment. The Cluster section contains only one unique field, cluster_enabled, which serves to indicate that the Redis cluster is running.

      Logstash will be tasked to periodically run the info command on your Redis database (similar to how you just did), parse the results, and send them to Elasticsearch. You’ll then be able to access them later from Kibana.

      You’ll store the configuration for indexing Redis statistics in Elasticsearch in a file named redis.conf under the /etc/logstash/conf.d directory, where Logstash stores configuration files. When started as a service, it will automatically run them in the background.

      Create redis.conf using your favorite editor (for example, nano):

      • sudo nano /etc/logstash/conf.d/redis.conf

      Add the following lines:

      /etc/logstash/conf.d/redis.conf

      input {
          exec {
              command => "redis_flags_command info"
              interval => 10
              type => "redis_info"
          }
      }
      
      filter {
          kv {
              value_split => ":"
              field_split => "rn"
              remove_field => [ "command", "message" ]
          }
      
          ruby {
              code =>
              "
              event.to_hash.keys.each { |k|
                  if event.get(k).to_i.to_s == event.get(k) # is integer?
                      event.set(k, event.get(k).to_i) # convert to integer
                  end
                  if event.get(k).to_f.to_s == event.get(k) # is float?
                      event.set(k, event.get(k).to_f) # convert to float
                  end
              }
              puts 'Ruby filter finished'
              "
          }
      }
      
      output {
          elasticsearch {
              hosts => "http://localhost:9200"
              index => "%{type}"
          }
      }
      

      Remember to replace redis_flags_command with the command shown in the control panel that you used earlier in the step.

      You define an input, which is a set of filters that will run on the collected data, and an output that will send the filtered data to Elasticsearch. The input consists of the exec command, which will run a command on the server periodically, after a set time interval (expressed in seconds). It also specifies a type parameter that defines the document type when indexed in Elasticsearch. The exec block passes down an object containing two fields, command and message string. The command field will contain the command that was run, and the message will contain its output.

      There are two filters that will run sequentially on the data collected from the input. The kv filter stands for key-value filter, and is built-in to Logstash. It is used for parsing data in the general form of keyvalue_separatorvalue and provides parameters for specifying what are considered a value and field separators. The field separator pertains to strings that separate the data formatted in the general form from each other. In the case of the output of the Redis INFO command, the field separator (field_split) is a new line, and the value separator (value_split) is :. Lines that do not follow the defined form will be discarded, including comments.

      To configure the kv filter, you pass : to thevalue_split parameter, and rn (signifying a new line) to the field_split parameter. You also order it to remove the command and message fields from the current data object by passing them to remove_field as elements of an array, because they contain data that are now useless.

      The kv filter represents the value it parsed as a string (text) type by design. This raises an issue because Kibana can’t easily process string types, even if it’s actually a number. To solve this, you’ll use custom Ruby code to convert the number-only strings to numbers, where possible. The second filter is a ruby block that provides a code parameter accepting a string containing the code to be run.

      event is a variable that Logstash provides to your code, and contains the current data in the filter pipeline. As was noted before, filters run one after another, meaning that the Ruby filter will receive the parsed data from the kv filter. The Ruby code itself converts the event to a Hash and traverses through the keys, then checks if the value associated with the key could be represented as an integer or as a float (a number with decimals). If it can, the string value is replaced with the parsed number. When the loop finishes, it prints out a message (Ruby filter finished) to report progress.

      The output sends the processed data to Elasticsearch for indexing. The resulting document will be stored in the redis_info index, defined in the input and passed in as a parameter to the output block.

      Save and close the file.

      You’ve installed Logstash using apt and configured it to periodically request statistics from Redis, process them, and send them to your Elasticsearch instance.

      Step 2 — Testing the Logstash Configuration

      Now you’ll test the configuration by running Logstash to verify it will properly pull the data.

      Logstash supports running a specific configuration by passing its file path to the -f parameter. Run the following command to test your new configuration from the last step:

      • sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/redis.conf

      It may take some time to show the output, but you’ll soon see something similar to the following:

      Output

      WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console [WARN ] 2019-09-20 11:59:53.440 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified [INFO ] 2019-09-20 11:59:53.459 [LogStash::Runner] runner - Starting Logstash {"logstash.version"=>"6.8.3"} [INFO ] 2019-09-20 12:00:02.543 [Converge PipelineAction::Create<main>] pipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>2, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50} [INFO ] 2019-09-20 12:00:03.331 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}} [WARN ] 2019-09-20 12:00:03.727 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://localhost:9200/"} [INFO ] 2019-09-20 12:00:04.015 [[main]-pipeline-manager] elasticsearch - ES Output version determined {:es_version=>6} [WARN ] 2019-09-20 12:00:04.020 [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6} [INFO ] 2019-09-20 12:00:04.071 [[main]-pipeline-manager] elasticsearch - New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]} [INFO ] 2019-09-20 12:00:04.100 [Ruby-0-Thread-5: :1] elasticsearch - Using default mapping template [INFO ] 2019-09-20 12:00:04.146 [Ruby-0-Thread-5: :1] elasticsearch - Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}} [INFO ] 2019-09-20 12:00:04.295 [[main]-pipeline-manager] exec - Registering Exec Input {:type=>"redis_info", :command=>"...", :interval=>10, :schedule=>nil} [INFO ] 2019-09-20 12:00:04.315 [Converge PipelineAction::Create<main>] pipeline - Pipeline started successfully {:pipeline_id=>"main", :thread=>"#<Thread:0x73adceba run>"} [INFO ] 2019-09-20 12:00:04.483 [Ruby-0-Thread-1: /usr/share/logstash/lib/bootstrap/environment.rb:6] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]} [INFO ] 2019-09-20 12:00:05.318 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600} Ruby filter finished Ruby filter finished Ruby filter finished ...

      You’ll see the Ruby filter finished message being printed at regular intervals (set to 10 seconds in the previous step), which means that the statistics are being shipped to Elasticsearch.

      You can exit Logstash by clicking CTRL + C on your keyboard. As previously mentioned, Logstash will automatically run all config files found under /etc/logstash/conf.d in the background when started as a service. Run the following command to start it:

      • sudo systemctl start logstash

      You’ve run Logstash to check if it can connect to your Redis cluster and gather data. Next, you’ll explore some of the statistical data in Kibana.

      Step 3 — Exploring Imported Data in Kibana

      In this section, you’ll explore and visualize the statistical data describing your database’s performance in Kibana.

      In your web browser, navigate to your domain where you exposed Kibana as a part of the prerequisite. You’ll see the default welcome page:

      Kibana - Welcome Page

      Before exploring the data Logstash is sending to Elasticsearch, you’ll first need to add the redis_info index to Kibana. To do so, click on Management from the left-hand vertical sidebar, and then on Index Patterns under the Kibana section.

      Kibana - Index Pattern Creation

      You’ll see a form for creating a new Index Pattern. Index Patterns in Kibana provide a way to pull in data from multiple Elasticsearch indexes at once, and can be used to explore only one index.

      Beneath the Index pattern text field, you’ll see the redis_info index listed. Type it in the text field and then click on the Next step button.

      You’ll then be asked to choose a timestamp field, so you’ll later be able to narrow your searches by a time range. Logstash automatically adds one, called @timestamp. Select it from the dropdown and click on Create index pattern to finish adding the index to Kibana.

      Kibana - Index Pattern Timestamp Selection

      To create and see existing visualizations, click on the Visualize item in the left-hand vertical menu. You’ll see the following page:

      Kibana - Visualizations

      To create a new visualization, click on the Create a visualization button, then select Line from the list of types that will pop up. Then, select the redis_info* index pattern you have just created as the data source. You’ll see an empty visualization:

      Kibana - Empty Visualization

      The left-side panel provides a form for editing parameters that Kibana will use to draw the visualization, which will be shown on the central part of the screen. On the upper-right hand side of the screen is the date range picker. If the @timestamp field is being used in the visualization, Kibana will only show the data belonging to the time interval specified in the range picker.

      You’ll now visualize the average Redis memory usage during a specified time interval. Click on Y-Axis under Metrics in the panel on the left to unfold it, then select Average as the Aggregation and select used_memory as the Field. This will populate the Y axis of the plot with the average values.

      Next, click on X-Axis under Buckets. For the Aggregation, choose Date Histogram. @timestamp should be automatically selected as the Field. Then, show the visualization by clicking on the blue play button on the top of the panel. If your database is brand new and not used you won’t see a very long line. In all cases, however, you will see an accurate portrayal of average memory usage. Here is how the resulting visualization may look after little to no usage:

      Kibana - Redis Memory Usage Visualization

      In this step, you have visualized memory usage of your managed Redis database, using Kibana. You can also use other plot types Kibana offers, such as the Visual Builder, to create more complicated graphs that portray more than one field at the same time. This will allow you to gain a better understanding of how your database is being used, which will help you optimize client applications, as well as your database itself.

      Conclusion

      You now have the Elastic stack installed on your server and configured to pull statistics data from your managed Redis database on a regular basis. You can analyze and visualize the data using Kibana, or some other suitable software, which will help you gather valuable insights and real-world correlations into how your database is performing.

      For more information about what you can do with your Redis Managed Database, visit the product docs. If you’d like to present the database statistics using another visualization type, check out the Kibana docs for further instructions.



      Source link

      How To Analyze Managed PostgreSQL Database Statistics Using the Elastic Stack on Ubuntu 18.04


      The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      Database monitoring is the continuous process of systematically tracking various metrics that show how the database is performing. By observing the performance data, you can gain valuable insights and identify possible bottlenecks, as well as find additional ways of improving database performance. Such systems often implement alerting, which notifies administrators when things go wrong. Gathered statistics can be used to not only improve the configuration and workflow of the database, but also those of client applications.

      The benefit of using the Elastic Stack (ELK stack) for monitoring your managed database is its excellent support for searching and the ability to ingest new data very quickly. It does not excel at updating the data, but this trade off is acceptable for monitoring and logging purposes, where past data is almost never changed. Elasticsearch offers powerful means of querying the data, which you can use through Kibana to get a better understanding of how the database fares through different time periods. This will allow you to correlate database load with real-life events to gain insight into how the database is being used.

      In this tutorial, you’ll import database metrics, generated by the PostgreSQL statistics collector, into Elasticsearch via Logstash. This entails configuring Logstash to pull data from the database using the PostgreSQL JDBC connector to send it to Elasticsearch for indexing immediately afterward. The imported data can later be analyzed and visualized in Kibana. Then, if your database is brand new, you’ll use pgbench, a PostgreSQL benchmarking tool, to create more interesting visualizations. In the end, you’ll have an automated system pulling in PostgreSQL statistics for later analysis.

      Prerequisites

      Step 1 — Setting up Logstash and the PostgreSQL JDBC Driver

      In this section, you will install Logstash and download the PostgreSQL JDBC driver so that Logstash will be able to connect to your managed database.

      Start off by installing Logstash with the following command:

      • sudo apt install logstash -y

      Once Logstash is installed, enable the service to automatically start on boot:

      • sudo systemctl enable logstash

      Logstash is written in Java, so in order to connect to PostgreSQL it requires the PostgreSQL JDBC (Java Database Connectivity) library to be available on the system it is running on. Because of an internal limitation, Logstash will properly load the library only if it is found under the /usr/share/logstash/logstash-core/lib/jars directory, where it stores third-party libraries it uses.

      Head over to the download page of the JDBC library and copy the link to latest version. Then, download it using curl by running the following command:

      • sudo curl https://jdbc.postgresql.org/download/postgresql-42.2.6.jar -o /usr/share/logstash/logstash-core/lib/jars/postgresql-jdbc.jar

      At the time of writing, the latest version of the library was 42.2.6, with Java 8 as the supported runtime version. Ensure you download the latest version; pairing it with the correct Java version that both JDBC and Logstash support.

      Logstash stores its configuration files under /etc/logstash/conf.d, and is itself stored under /usr/share/logstash/bin. Before you create a configuration that will pull statistics from your database, you’ll need to enable the JDBC plugin in Logstash by running the following command:

      • sudo /usr/share/logstash/bin/logstash-plugin install logstash-input-jdbc

      You’ve installed Logstash using apt and downloaded the PostgreSQL JDBC library so that Logstash can use it to connect to your managed database. In the next step, you will configure Logstash to pull statistical data from it.

      Step 2 — Configuring Logstash To Pull Statistics

      In this section, you will configure Logstash to pull metrics from your managed PostgreSQL database.

      You’ll configure Logstash to watch over three system databases in PostgreSQL, namely:

      • pg_stat_database: provides statistics about each database, including its name, number of connections, transactions, rollbacks, rows returned by querying the database, deadlocks, and so on. It has a stats_reset field, which specifies when the statistics were last reset.
      • pg_stat_user_tables: provides statistics about each table created by the user, such as the number of inserted, deleted, and updated rows.
      • pg_stat_user_indexes: collects data about all indexes in user-created tables, such as the number of times a particular index has been scanned.

      You’ll store the configuration for indexing PostgreSQL statistics in Elasticsearch in a file named postgresql.conf under the /etc/logstash/conf.d directory, where Logstash stores configuration files. When started as a service, it will automatically run them in the background.

      Create postgresql.conf using your favorite editor (for example, nano):

      • sudo nano /etc/logstash/conf.d/postgresql.conf

      Add the following lines:

      /etc/logstash/conf.d/postgresql.conf

      input {
              # pg_stat_database
              jdbc {
                      jdbc_driver_library => ""
                      jdbc_driver_class => "org.postgresql.Driver"
                      jdbc_connection_string => "jdbc:postgresql://host:port/defaultdb"
                      jdbc_user => "username"
                      jdbc_password => "password"
                      statement => "SELECT * FROM pg_stat_database"
                      schedule => "* * * * *"
                      type => "pg_stat_database"
              }
      
              # pg_stat_user_tables
              jdbc {
                      jdbc_driver_library => ""
                      jdbc_driver_class => "org.postgresql.Driver"
                      jdbc_connection_string => "jdbc:postgresql://host:port/defaultdb"
                      jdbc_user => "username"
                      jdbc_password => "password"
                      statement => "SELECT * FROM pg_stat_user_tables"
                      schedule => "* * * * *"
                      type => "pg_stat_user_tables"
              }
      
              # pg_stat_user_indexes
              jdbc {
                      jdbc_driver_library => ""
                      jdbc_driver_class => "org.postgresql.Driver"
                      jdbc_connection_string => "jdbc:postgresql://host:port/defaultdb"
                      jdbc_user => "username"
                      jdbc_password => "password"
                      statement => "SELECT * FROM pg_stat_user_indexes"
                      schedule => "* * * * *"
                      type => "pg_stat_user_indexes"
              }
      }
      
      output {
              elasticsearch {
                      hosts => "http://localhost:9200"
                      index => "%{type}"
              }
      }
      

      Remember to replace host with your host address, port with the port to which you can connect to your database, username with the database user username, and password with its password. All these values can be found in the Control Panel of your managed database.

      In this configuration, you define three JDBC inputs and one Elasticsearch output. The three inputs pull data from the pg_stat_database, pg_stat_user_tables, and pg_stat_user_indexes databases, respectively. They all set the jdbc_driver_library parameter to an empty string, because the PostgreSQL JDBC library is in a folder that Logstash automatically loads.

      Then, they set the jdbc_driver_class, whose value is specific to the JDBC library, and provide a jdbc_connection_string, which details how to connect to the database. The jdbc: part signifies that it is a JDBC connection, while postgres:// indicates that the target database is PostgreSQL. Next come the host and port of the database, and after the forward slash you also specify a database to connect to; this is because PostgreSQL requires you to be connected to a database to be able to issue any queries. Here, it is set to the default database that always exists and can not be deleted, aptly named defaultdb.

      Next, they set a username and password of the user through which the database will be accessed. The statement parameter contains a SQL query that should return the data you wish to process—in this configuration, it selects all rows from the appropriate database.

      The schedule parameter accepts a string in cron syntax that defines when Logstash should run this input; omitting it completely will make Logstash run it only once. Specifying * * * * *, as you have done so here, will tell Logstash to run it every minute. You can specify your own cron string if you want to collect data at different intervals.

      There is only one output, which accepts data from three inputs. They all send data to Elasticsearch, which is running locally and is reachable at http://localhost:9200. The index parameter defines to which Elasticsearch index it will send the data, and its value is passed in from the type field of the input.

      When you are done with editing, save and close the file.

      You’ve configured Logstash to gather data from various PostgreSQL statistical tables and send them to Elasticsearch for storage and indexing. Next, you’ll run Logstash to test the configuration.

      Step 3 — Testing the Logstash Configuration

      In this section, you will test the configuration by running Logstash to verify it will properly pull the data. Then, you will make this configuration run in the background by configuring it as a Logstash pipeline.

      Logstash supports running a specific configuration by passing its file path to the -f parameter. Run the following command to test your new configuration from the last step:

      • sudo /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/postgresql.conf

      It may take some time before it shows any output, which will look similar to this:

      Output

      Thread.exclusive is deprecated, use Thread::Mutex WARNING: Could not find logstash.yml which is typically located in $LS_HOME/config or /etc/logstash. You can specify the path using --path.settings. Continuing using the defaults Could not find log4j2 configuration at path /usr/share/logstash/config/log4j2.properties. Using default config which logs errors to the console [WARN ] 2019-08-02 18:29:15.123 [LogStash::Runner] multilocal - Ignoring the 'pipelines.yml' file because modules or command line options are specified [INFO ] 2019-08-02 18:29:15.154 [LogStash::Runner] runner - Starting Logstash {"logstash.version"=>"7.3.0"} [INFO ] 2019-08-02 18:29:18.209 [Converge PipelineAction::Create<main>] Reflections - Reflections took 77 ms to scan 1 urls, producing 19 keys and 39 values [INFO ] 2019-08-02 18:29:20.195 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}} [WARN ] 2019-08-02 18:29:20.667 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://localhost:9200/"} [INFO ] 2019-08-02 18:29:21.221 [[main]-pipeline-manager] elasticsearch - ES Output version determined {:es_version=>7} [WARN ] 2019-08-02 18:29:21.230 [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7} [INFO ] 2019-08-02 18:29:21.274 [[main]-pipeline-manager] elasticsearch - New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]} [INFO ] 2019-08-02 18:29:21.337 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}} [WARN ] 2019-08-02 18:29:21.369 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://localhost:9200/"} [INFO ] 2019-08-02 18:29:21.386 [[main]-pipeline-manager] elasticsearch - ES Output version determined {:es_version=>7} [WARN ] 2019-08-02 18:29:21.386 [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7} [INFO ] 2019-08-02 18:29:21.409 [[main]-pipeline-manager] elasticsearch - New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]} [INFO ] 2019-08-02 18:29:21.430 [[main]-pipeline-manager] elasticsearch - Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}} [WARN ] 2019-08-02 18:29:21.444 [[main]-pipeline-manager] elasticsearch - Restored connection to ES instance {:url=>"http://localhost:9200/"} [INFO ] 2019-08-02 18:29:21.465 [[main]-pipeline-manager] elasticsearch - ES Output version determined {:es_version=>7} [WARN ] 2019-08-02 18:29:21.466 [[main]-pipeline-manager] elasticsearch - Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7} [INFO ] 2019-08-02 18:29:21.468 [Ruby-0-Thread-7: :1] elasticsearch - Using default mapping template [INFO ] 2019-08-02 18:29:21.538 [Ruby-0-Thread-5: :1] elasticsearch - Using default mapping template [INFO ] 2019-08-02 18:29:21.545 [[main]-pipeline-manager] elasticsearch - New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["http://localhost:9200"]} [INFO ] 2019-08-02 18:29:21.589 [Ruby-0-Thread-9: :1] elasticsearch - Using default mapping template [INFO ] 2019-08-02 18:29:21.696 [Ruby-0-Thread-5: :1] elasticsearch - Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}} [INFO ] 2019-08-02 18:29:21.769 [Ruby-0-Thread-7: :1] elasticsearch - Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}} [INFO ] 2019-08-02 18:29:21.771 [Ruby-0-Thread-9: :1] elasticsearch - Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}} [WARN ] 2019-08-02 18:29:21.871 [[main]-pipeline-manager] LazyDelegatingGauge - A gauge metric of an unknown type (org.jruby.specialized.RubyArrayOneObject) has been create for key: cluster_uuids. This may result in invalid serialization. It is recommended to log an issue to the responsible developer/development team. [INFO ] 2019-08-02 18:29:21.878 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>1, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>125, :thread=>"#<Thread:0x470bf1ca run>"} [INFO ] 2019-08-02 18:29:22.351 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"} [INFO ] 2019-08-02 18:29:22.721 [Ruby-0-Thread-1: /usr/share/logstash/lib/bootstrap/environment.rb:6] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]} [INFO ] 2019-08-02 18:29:23.798 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600} /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77: warning: constant ::Fixnum is deprecated /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77: warning: constant ::Fixnum is deprecated /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/cronline.rb:77: warning: constant ::Fixnum is deprecated [INFO ] 2019-08-02 18:30:02.333 [Ruby-0-Thread-22: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:284] jdbc - (0.042932s) SELECT * FROM pg_stat_user_indexes [INFO ] 2019-08-02 18:30:02.340 [Ruby-0-Thread-23: /usr/share/logstash/vendor/bundle/jruby/2.5.0/gems/rufus-scheduler-3.0.9/lib/rufus/scheduler/jobs.rb:331] jdbc - (0.043178s) SELECT * FROM pg_stat_user_tables [INFO ] 2019-08-02 18:30:02.340 [Ruby-0-Thread-24: :1] jdbc - (0.036469s) SELECT * FROM pg_stat_database ...

      If Logstash does not show any errors and logs that it has successfully SELECTed rows from the three databases, your database metrics will be shipped to Elasticsearch. If you get an error, double check all the values in the configuration file to ensure that the machine you’re running Logstash on can connect to the managed database.

      Logstash will continue importing the data at specified times. You can safely stop it by pressing CTRL+C.

      As previously mentioned, when started as a service, Logstash automatically runs all configuration files it finds under /etc/logstash/conf.d in the background. Run the following command to start it as a service:

      • sudo systemctl start logstash

      In this step, you ran Logstash to check if it can connect to your database and gather data. Next, you’ll visualize and explore some of the statistical data in Kibana.

      Step 4 — Exploring Imported Data in Kibana

      In this section, you’ll see how you can explore the statistical data describing your database’s performance in Kibana.

      In your browser, navigate to the Kibana installation you set up as a prerequisite. You’ll see the default welcome page.

      Kibana - Default Welcome Page

      To interact with Elasticsearch indexes in Kibana, you’ll need to create an index pattern. Index patterns specify on which indexes Kibana should operate. To create one, press on the last icon (wrench) from the left-hand vertical sidebar to open the Management page. Then, from the left menu, press on Index Patterns under Kibana. You’ll see a dialog box for creating an index pattern.

      Kibana - Add Index Pattern

      Listed are the three indexes where Logstash has been sending statistics. Type in pg_stat_database in the Index Pattern input box and then press Next step. You’ll be asked to select a field that stores time, so you’ll be able to later narrow your data by a time range. From the dropdown, select @timestamp.

      Kibana - Index Pattern Timestamp Field

      Press on Create index pattern to finish creating the index pattern. You’ll now be able to explore it using Kibana. To create a visualization, press on the second icon in the sidebar, and then on Create new visualization. Select the Line visualization when the form pops up, and then choose the index pattern you have just created (pg_stat_database). You’ll see an empty visualization.

      Kibana - Empty Visualisation

      On the central part of the screen is the resulting plot—the left-side panel governs its generation from which you can set the data for X and Y axis. In the upper right-hand side of the screen is the date range picker. Unless you specifically choose another range when configuring the data, that range will be shown on the plot.

      You’ll now visualize the average number of data tuples INSERTed on minutes in the given interval. Press on Y-Axis under Metrics in the panel on the left to unfold it. Select Average as the Aggregation and select tup_inserted as the Field. This will populate the Y axis of the plot with the average values.

      Next, press on X-Axis under Buckets. For the Aggregation, choose Date Histogram. @timestamp should be automatically selected as the Field. Then, press on the blue play button on the top of the panel to generate your graph. If your database is brand new and not used, you won’t see anything yet. In all cases, however, you will see an accurate portrayal of database usage.

      Kibana supports many other visualization forms—you can explore other forms in the Kibana documentation. You can also add the two remaining indexes, mentioned in Step 2, into Kibana to be able to visualize them as well.

      In this step, you have learned how to visualize some of the PostgreSQL statistical data, using Kibana.

      Step 5 — (Optional) Benchmarking Using pgbench

      If you haven’t yet worked in your database outside of this tutorial, you can complete this step to create more interesting visualizations by using pgbench to benchmark your database. pgbench will run the same SQL commands over and over, simulating real-world database use by an actual client.

      You’ll first need to install pgbench by running the following command:

      • sudo apt install postgresql-contrib -y

      Because pgbench will insert and update test data, you’ll need to create a separate database for it. To do so, head over to the Users & Databases tab in the Control Panel of your managed database, and scroll down to the Databases section. Type in pgbench as the name of the new database, and then press on Save. You’ll pass this name, as well as the host, port, and username information to pgbench.

      Accessing Databases section in DO control panel

      Before actually running pgbench, you’ll need to run it with the -i flag to initialize its database:

      • pgbench -h host -p port -U username -i pgbench

      You’ll need to replace host with your host address, port with the port to which you can connect to your database, and username with the database user username. You can find all these values in the Control Panel of your managed database.

      Notice that pgbench does not have a password argument; instead, you’ll be asked for it every time you run it.

      The output will look like the following:

      Output

      NOTICE: table "pgbench_history" does not exist, skipping NOTICE: table "pgbench_tellers" does not exist, skipping NOTICE: table "pgbench_accounts" does not exist, skipping NOTICE: table "pgbench_branches" does not exist, skipping creating tables... 100000 of 100000 tuples (100%) done (elapsed 0.16 s, remaining 0.00 s) vacuum... set primary keys... done.

      pgbench created four tables, which it will use for benchmarking, and populated them with some example rows. You’ll now be able to run benchmarks.

      The two most important arguments that limit for how long the benchmark will run are -t, which specifies the number of transactions to complete, and -T, which defines for how many seconds the benchmark should run. These two options are mutually exclusive. At the end of each benchmark, you’ll receive statistics, such as the number of transactions per second (tps).

      Now, start a benchmark that will last for 30 seconds by running the following command:

      • pgbench -h host -p port -U username pgbench -T 30

      The output will look like:

      Output

      starting vacuum...end. transaction type: <builtin: TPC-B (sort of)> scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 duration: 30 s number of transactions actually processed: 7602 latency average = 3.947 ms tps = 253.382298 (including connections establishing) tps = 253.535257 (excluding connections establishing)

      In this output, you see the general info about the benchmark, such as the total number of transactions executed. The effect of these benchmarks is that the statistics Logstash ships to Elasticsearch will reflect that number, which will in turn make visualizations in Kibana more interesting and closer to real-world graphs. You can run the preceding command a few more times, and possibly alter the duration.

      When you are done, head over to Kibana and press on Refresh in the upper right corner. You’ll now see a different line than before, which shows the number of INSERTs. Feel free to change the time range of the data shown by changing the values in the picker positioned above the refresh button. Here is how the graph may look after multiple benchmarks of varying duration:

      Kibana - Visualization After Benchmarks

      You’ve used pgbench to benchmark your database, and evaluated the resulting graphs in Kibana.

      Conclusion

      You now have the Elastic stack installed on your server and configured to pull statistics data from your managed PostgreSQL database on a regular basis. You can analyze and visualize the data using Kibana, or some other suitable software, which will help you gather valuable insights and real-world correlations into how your database is performing.

      For more information about what you can do with your PostgreSQL Managed Database, visit the product docs.



      Source link