      Managed Databases Connection Pools and PostgreSQL Benchmarking Using pgbench


      Introduction

      DigitalOcean Managed Databases allows you to scale your PostgreSQL database using several methods. One such method is a built-in connection pooler that allows you to efficiently handle large numbers of client connections and reduce the CPU and memory footprint of these open connections. By using a connection pool and sharing a fixed set of recyclable connections, you can handle significantly more concurrent client connections, and squeeze extra performance out of your PostgreSQL database.

      In this tutorial we’ll use pgbench, PostgreSQL’s built-in benchmarking tool, to run load tests on a DigitalOcean Managed PostgreSQL Database. We’ll dive in to connection pools, describe how they work, and show how to create one using the Cloud Control panel. Finally, using results from the pgbench tests, we’ll demonstrate how using a connection pool can be an inexpensive method of increasing database throughput.

      Prerequisites

      To complete this tutorial, you’ll need:

      • A DigitalOcean Managed PostgreSQL Database cluster. To learn how to provision and configure a DigitalOcean PostgreSQL cluster, consult the Managed Database product documentation.
      • A client machine with PostgreSQL installed. By default, your PostgreSQL installation will contain the pgbench benchmarking utility and the psql client, both of which we’ll use in this guide. Consult How To Install and Use PostgreSQL on Ubuntu 18.04 to learn how to install PostgreSQL (example installation commands follow this list). If you’re not running Ubuntu on your client machine, you can use the version finder to find the appropriate tutorial.
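
      If you need to install these tools on an Ubuntu 18.04 client, the commands below are a minimal sketch that mirrors the linked installation guide; they install the PostgreSQL server and client along with the contrib package, which typically provides pgbench on this release. Package names may differ on other distributions.

      • sudo apt update
      • sudo apt install postgresql postgresql-contrib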

      Once you have a DigitalOcean PostgreSQL cluster up and running and a client machine with pgbench installed, you’re ready to begin with this guide.

      Step 1 — Creating and Initializing the benchmark Database

      Before we create a connection pool for our database, we’ll first create the benchmark database on our PostgreSQL cluster and populate it with some dummy data on which pgbench will run its tests. The pgbench utility repeatedly runs a series of five SQL commands (consisting of SELECT, UPDATE, and INSERT queries) in a transaction, using multiple threads and clients, and calculates a useful performance metric called Transactions per Second (TPS). TPS is a measure of database throughput, counting the number of atomic transactions processed by the database in one second. To learn more about the specific commands executed by pgbench, consult What is the “Transaction” Actually Performed in pgbench? from the official pgbench documentation.
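
      For reference, the default transaction that pgbench runs is approximately the following (a sketch based on the pgbench documentation; :aid, :bid, :tid, and :delta stand for values that pgbench picks at random for each transaction):

        BEGIN;
        UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
        SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
        UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
        UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
        INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
        END;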

      Let’s begin by connecting to our PostgreSQL cluster and creating the benchmark database.

      First, retrieve your cluster’s Connection Details by navigating to Databases and locating your PostgreSQL cluster. Click into your cluster. You should see a cluster overview page containing the following Connection Details box:

      PostgreSQL Cluster Connection Details

      From this, we can parse the following config variables:

      • Admin user: doadmin
      • Admin password: your_password
      • Cluster endpoint: dbaas-test-do-user-3587522-0.db.ondigitalocean.com
      • Connection port: 25060
      • Database to connect to: defaultdb
      • SSL Mode: require (use an SSL-encrypted connection for increased security)

      Take note of these parameters, as you’ll need them when using both the psql client and pgbench tool.

      Click on the dropdown above this box and select Connection String. We’ll copy this string and pass it in to psql to connect to this PostgreSQL node.

      Connect to your cluster using psql and the connection string you just copied:

      • psql postgresql://doadmin:your_password@your_cluster_endpoint:25060/defaultdb?sslmode=require

      You should see the following PostgreSQL client prompt, indicating that you’ve connected to your PostgreSQL cluster successfully:

      Output

      psql (10.6 (Ubuntu 10.6-0ubuntu0.18.04.1))
      SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
      Type "help" for help.

      defaultdb=>

      From here, create the benchmark database:

      • CREATE DATABASE benchmark;

      You should see the following output:

      Output

      CREATE DATABASE

      Now, disconnect from the cluster:

      • \q

      Before we run the pgbench tests, we need to populate this benchmark database with some tables and dummy data required to run the tests.

      To do this, we’ll run pgbench with the following flags:

      • -h: The PostgreSQL cluster endpoint
      • -p: The PostgreSQL cluster connection port
      • -U: The database username
      • -i: Indicates that we'd like to initialize the benchmark database with benchmarking tables and their dummy data.
      • -s: Sets a scale factor of 150, which will multiply table sizes by 150. The default scale factor of 1 results in tables of the following sizes:

        table                   # of rows
        ---------------------------------
        pgbench_branches        1
        pgbench_tellers         10
        pgbench_accounts        100000
        pgbench_history         0
        

        Using a scale factor of 150, the pgbench_accounts table will contain 15,000,000 rows.

        Note: To avoid excessive blocked transactions, be sure to set the scale factor to a value at least as large as the number of concurrent clients you intend to test with. In this tutorial we'll test with 150 clients at most, so we set -s to 150 here. To learn more, consult these recommended practices from the official pgbench documentation.

      Run the complete pgbench command:

      • pgbench -h your_cluster_endpoint -p 25060 -U doadmin -i -s 150 benchmark

      After running this command, you will be prompted to enter the password for the database user you specified. Enter the password, and hit ENTER.

      You should see the following output:

      Output

      dropping old tables...
      NOTICE: table "pgbench_accounts" does not exist, skipping
      NOTICE: table "pgbench_branches" does not exist, skipping
      NOTICE: table "pgbench_history" does not exist, skipping
      NOTICE: table "pgbench_tellers" does not exist, skipping
      creating tables...
      generating data...
      100000 of 15000000 tuples (0%) done (elapsed 0.19 s, remaining 27.93 s)
      200000 of 15000000 tuples (1%) done (elapsed 0.85 s, remaining 62.62 s)
      300000 of 15000000 tuples (2%) done (elapsed 1.21 s, remaining 59.23 s)
      400000 of 15000000 tuples (2%) done (elapsed 1.63 s, remaining 59.44 s)
      500000 of 15000000 tuples (3%) done (elapsed 2.05 s, remaining 59.51 s)
      . . .
      14700000 of 15000000 tuples (98%) done (elapsed 70.87 s, remaining 1.45 s)
      14800000 of 15000000 tuples (98%) done (elapsed 71.39 s, remaining 0.96 s)
      14900000 of 15000000 tuples (99%) done (elapsed 71.91 s, remaining 0.48 s)
      15000000 of 15000000 tuples (100%) done (elapsed 72.42 s, remaining 0.00 s)
      vacuuming...
      creating primary keys...
      done.

      At this point, we've created a benchmarking database, populated with the tables and data required to run the pgbench tests. We can now move on to running a baseline test which we'll use to compare performance before and after connection pooling is enabled.

      Step 2 — Running a Baseline pgbench Test

      Before we run our first benchmark, it's worth diving into what we're trying to optimize with connection pools.

      Typically when a client connects to a PostgreSQL database, the main PostgreSQL OS process forks itself into a child process corresponding to this new connection. When there are only a few connections, this rarely presents an issue. However, as clients and connections scale, the CPU and memory overhead of creating and maintaining these connections begins to add up, especially if the application in question does not efficiently use database connections. In addition, the max_connections PostgreSQL setting may limit the number of client connections allowed, resulting in additional connections being refused or dropped.

      A connection pool keeps open a fixed number of database connections, the pool size, which it then uses to distribute and execute client requests. This means that you can accommodate far more simultaneous connections, efficiently deal with idle or stagnant clients, as well as queue up client requests during traffic spikes instead of rejecting them. By recycling connections, you can more efficiently use your machine's resources in an environment where there is a heavy connection volume, and squeeze extra performance out of your database.

      A connection pool can be implemented either on the application side or as middleware between the database and your application. The Managed Databases connection pooler is built on top of pgBouncer, a lightweight, open-source middleware connection pooler for PostgreSQL. Its interface is available via the Cloud Control Panel UI.

      Navigate to Databases in the Control Panel, and then click into your PostgreSQL cluster. From here, click into Connection Pools. Then, click on Create a Connection Pool. You should see the following configuration window:

      Connection Pools Config Window

      Here, you can configure the following fields:

      • Pool Name: A unique name for your connection pool
      • Database: The database for which you'd like to pool connections
      • User: The PostgreSQL user the connection pool will authenticate as
      • Mode: One of Session, Transaction, or Statement. This option controls how long the pool assigns a backend connection to a client.
        • Session: The client holds on to the connection until it explicitly disconnects.
        • Transaction: The client obtains the connection until it completes a transaction, after which the connection is returned to the pool.
        • Statement: The pool aggressively recycles connections after each client statement. In statement mode, multi-statement transactions are not allowed. To learn more, consult the Connection Pools product documentation.
      • Pool Size: The number of connections the connection pool will keep open between itself and the database.

      Before we create a connection pool, we'll run a baseline test against which we can compare database performance once connection pooling is enabled.

      In this tutorial, we'll use a 4 GB RAM, 2 vCPU, 80 GB Disk, primary node only Managed Database setup. You can scale the benchmark test parameters in this section according to your PostgreSQL cluster specs.

      DigitalOcean Managed Database clusters have the PostgreSQL max_connections parameter preset to 25 connections per 1 GB RAM. A 4 GB RAM PostgreSQL node therefore has max_connections set to 100. In addition, for all clusters, 3 connections are reserved for maintenance. So for this 4 GB RAM PostgreSQL cluster, 97 connections are available for connection pooling.
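
      If you'd like to confirm these values for your own cluster, you can query them directly with psql using the same connection string as before (the second query counts currently open connections, so your result will vary):

      • psql postgresql://doadmin:your_password@your_cluster_endpoint:25060/defaultdb?sslmode=require -c "SHOW max_connections;" -c "SELECT count(*) FROM pg_stat_activity;"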

      With this in mind, let's run our first baseline pgbench test.

      Log in to your client machine. We’ll run pgbench, specifying the database endpoint, port and user as usual. In addition, we’ll provide the following flags:

      • -c: The number of concurrent clients or database sessions to simulate. We set this to 50 so as to simulate a number of concurrent connections smaller than the max_connections parameter for our PostgreSQL cluster.
      • -j: The number of worker threads pgbench will use to run the benchmark. If you're using a multi-CPU machine, you can tune this upwards to distribute clients across threads. On a two-core machine, we set this to 2.
      • -P: Display progress and metrics every 60 seconds.
      • -T: Run the benchmark for 600 seconds (10 minutes). To produce consistent, reproducible results, it's important that you run the benchmark for several minutes, or through one checkpoint cycle.

      We’ll also specify that we'd like to run the benchmark against the benchmark database we created and populated earlier.

      Run the following complete pgbench command:

      • pgbench -h your_db_endpoint -p 25060 -U doadmin -c 50 -j 2 -P 60 -T 600 benchmark

      Hit ENTER and then type in the password for the doadmin user to begin running the test. You should see output similar to the following (results will depend on the specs of your PostgreSQL cluster):

      Output

      starting vacuum...end.
      progress: 60.0 s, 157.4 tps, lat 282.988 ms stddev 40.261
      progress: 120.0 s, 176.2 tps, lat 283.726 ms stddev 38.722
      progress: 180.0 s, 167.4 tps, lat 298.663 ms stddev 238.124
      progress: 240.0 s, 178.9 tps, lat 279.564 ms stddev 43.619
      progress: 300.0 s, 178.5 tps, lat 280.016 ms stddev 43.235
      progress: 360.0 s, 178.8 tps, lat 279.737 ms stddev 43.307
      progress: 420.0 s, 179.3 tps, lat 278.837 ms stddev 43.783
      progress: 480.0 s, 178.5 tps, lat 280.203 ms stddev 43.921
      progress: 540.0 s, 180.0 tps, lat 277.816 ms stddev 43.742
      progress: 600.0 s, 178.5 tps, lat 280.044 ms stddev 43.705
      transaction type: <builtin: TPC-B (sort of)>
      scaling factor: 150
      query mode: simple
      number of clients: 50
      number of threads: 2
      duration: 600 s
      number of transactions actually processed: 105256
      latency average = 282.039 ms
      latency stddev = 84.244 ms
      tps = 175.329321 (including connections establishing)
      tps = 175.404174 (excluding connections establishing)

      Here, we observed that over a 10 minute run with 50 concurrent sessions, we processed 105,256 transactions with a throughput of roughly 175 transactions per second.
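
      As a quick sanity check, 105,256 transactions over a 600 second run works out to 105,256 / 600 ≈ 175.4 TPS, which matches the tps figures reported in the summary above.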

      Now, let's run the same test, this time using 150 concurrent clients, a value that is higher than max_connections for this database, to synthetically simulate a mass influx of client connections:

      • pgbench -h your_db_endpoint -p 25060 -U doadmin -c 150 -j 2 -P 60 -T 600 benchmark

      You should see output similar to the following:

      Output

      starting vacuum...end.
      connection to database "pgbench" failed: FATAL: remaining connection slots are reserved for non-replication superuser connections
      progress: 60.0 s, 182.6 tps, lat 280.069 ms stddev 42.009
      progress: 120.0 s, 253.8 tps, lat 295.612 ms stddev 237.448
      progress: 180.0 s, 271.3 tps, lat 276.411 ms stddev 40.643
      progress: 240.0 s, 273.0 tps, lat 274.653 ms stddev 40.942
      progress: 300.0 s, 272.8 tps, lat 274.977 ms stddev 41.660
      progress: 360.0 s, 250.0 tps, lat 300.033 ms stddev 282.712
      progress: 420.0 s, 272.1 tps, lat 275.614 ms stddev 42.901
      progress: 480.0 s, 261.1 tps, lat 287.226 ms stddev 112.499
      progress: 540.0 s, 272.5 tps, lat 275.309 ms stddev 41.740
      progress: 600.0 s, 271.2 tps, lat 276.585 ms stddev 41.221
      transaction type: <builtin: TPC-B (sort of)>
      scaling factor: 150
      query mode: simple
      number of clients: 150
      number of threads: 2
      duration: 600 s
      number of transactions actually processed: 154892
      latency average = 281.421 ms
      latency stddev = 125.929 ms
      tps = 257.994941 (including connections establishing)
      tps = 258.049251 (excluding connections establishing)

      Note the FATAL error, indicating that pgbench hit the 100-connection limit set by max_connections, resulting in a refused connection. The test was still able to complete, with a TPS of roughly 257.

      At this point we can investigate how a connection pool could potentially improve our database's throughput.

      Step 3 — Creating and Testing a Connection Pool

      In this step we'll create a connection pool and rerun the previous pgbench test to see if we can improve our database's throughput.

      In general, the max_connections setting and connection pool parameters are tuned in tandem to max out the database's load. However, because max_connections is abstracted away from the user in DigitalOcean Managed Databases, our main levers here are the connection pool Mode and Size settings.

      To begin, let's create a connection pool in Transaction mode that keeps open all the available backend connections.

      Navigate to Databases in the Control Panel, and then click into your PostgreSQL cluster. From here, click into Connection Pools. Then, click on Create a Connection Pool.

      In the configuration window that appears, fill in the following values:

      Connection Pool Configuration Values

      Here we name our connection pool test-pool, and use it with the benchmark database. Our database user is doadmin and we set the connection pool to Transaction mode. Recall from earlier that for a managed database cluster with 4GB of RAM, there are 97 available database connections. Accordingly, configure the pool to keep open 97 database connections.

      When you're done, hit Create Pool.

      You should now see this pool in the Control Panel:

      Connection Pool in Control Panel

      Grab its URI by clicking Connection Details. It should look something like the following:

      postgres://doadmin:password@pool_endpoint:pool_port/test-pool?sslmode=require
      

      You should notice a different port here, and potentially a different endpoint and database name, corresponding to the pool name test-pool.
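
      Before benchmarking, you can optionally confirm that the pool accepts connections by running a trivial query through it with psql, substituting the values from your own pool's connection details:

      • psql postgres://doadmin:your_password@pool_endpoint:pool_port/test-pool?sslmode=require -c "SELECT 1;"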

      Now that we've created the test-pool connection pool, we can rerun the pgbench test we ran above.

      Rerun pgbench

      From your client machine, run the following pgbench command (with 150 concurrent clients), making sure to substitute the highlighted values with those in your connection pool URI:

      • pgbench -h pool_endpoint -p pool_port -U doadmin -c 150 -j 2 -P 60 -T 600 test-pool

      Here we once again use 150 concurrent clients, run the test across 2 threads, print progress every 60 seconds, and run the test for 600 seconds. We set the database name to test-pool, the name of the connection pool.

      Once the test completes, you should see output similar to the following (note that these results will vary depending on the specs of your database node):

      Output

      starting vacuum...end.
      progress: 60.0 s, 240.0 tps, lat 425.251 ms stddev 59.773
      progress: 120.0 s, 350.0 tps, lat 428.647 ms stddev 57.084
      progress: 180.0 s, 340.3 tps, lat 440.680 ms stddev 313.631
      progress: 240.0 s, 364.9 tps, lat 411.083 ms stddev 61.106
      progress: 300.0 s, 366.5 tps, lat 409.367 ms stddev 60.165
      progress: 360.0 s, 362.5 tps, lat 413.750 ms stddev 59.005
      progress: 420.0 s, 359.5 tps, lat 417.292 ms stddev 60.395
      progress: 480.0 s, 363.8 tps, lat 412.130 ms stddev 60.361
      progress: 540.0 s, 351.6 tps, lat 426.661 ms stddev 62.960
      progress: 600.0 s, 344.5 tps, lat 435.516 ms stddev 65.182
      transaction type: <builtin: TPC-B (sort of)>
      scaling factor: 150
      query mode: simple
      number of clients: 150
      number of threads: 2
      duration: 600 s
      number of transactions actually processed: 206768
      latency average = 421.719 ms
      latency stddev = 114.676 ms
      tps = 344.240797 (including connections establishing)
      tps = 344.385646 (excluding connections establishing)

      Notice here that we were able to increase our database's throughput from 257 TPS to 344 TPS with 150 concurrent connections (an increase of 33%), and did not run up against the max_connections limit we previously hit without a connection pool. By placing a connection pool in front of the database, we can avoid dropped connections and significantly increase database throughput in an environment with a large number of simultaneous connections.
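
      To put a number on that improvement: (344.24 - 257.99) / 257.99 ≈ 0.33, or roughly a 33% increase in throughput at the same level of concurrency.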

      If you run this same test, but with a -c value of 50 (specifying a smaller number of clients), the gains from using a connection pool become much less evident:

      Output

      starting vacuum...end.
      progress: 60.0 s, 154.0 tps, lat 290.592 ms stddev 35.530
      progress: 120.0 s, 162.7 tps, lat 307.168 ms stddev 241.003
      progress: 180.0 s, 172.0 tps, lat 290.678 ms stddev 36.225
      progress: 240.0 s, 172.4 tps, lat 290.169 ms stddev 37.603
      progress: 300.0 s, 177.8 tps, lat 281.214 ms stddev 35.365
      progress: 360.0 s, 177.7 tps, lat 281.402 ms stddev 35.227
      progress: 420.0 s, 174.5 tps, lat 286.404 ms stddev 34.797
      progress: 480.0 s, 176.1 tps, lat 284.107 ms stddev 36.540
      progress: 540.0 s, 173.1 tps, lat 288.771 ms stddev 38.059
      progress: 600.0 s, 174.5 tps, lat 286.508 ms stddev 59.941
      transaction type: <builtin: TPC-B (sort of)>
      scaling factor: 150
      query mode: simple
      number of clients: 50
      number of threads: 2
      duration: 600 s
      number of transactions actually processed: 102938
      latency average = 288.509 ms
      latency stddev = 83.503 ms
      tps = 171.482966 (including connections establishing)
      tps = 171.553434 (excluding connections establishing)

      Here we see that we were not able to increase throughput by using a connection pool; in fact, throughput dropped slightly, from 175 TPS to 171 TPS.

      Although in this guide we use pgbench with its built-in benchmark data set, the best test for determining whether or not to use a connection pool is a benchmark load that accurately represents production load on your database, against production data. Creating custom benchmarking scripts and data is beyond the scope of this guide, but to learn more, consult the official pgbench documentation.

      Note: The pool size setting is highly workload-specific. In this guide, we configured the connection pool to use all the available backend database connections. This was because throughout our benchmark, the database rarely reached full utilization (you can monitor database load from the Metrics tab in the Cloud Control Panel). Depending on your database's load, this may not be the optimal setting. If you notice that your database is constantly fully saturated, shrinking the connection pool may increase throughput and improve performance by queuing additional requests instead of trying to execute them all at the same time on an already loaded server.

      Conclusion

      DigitalOcean Managed Databases connection pooling is a powerful feature that can help you quickly squeeze extra performance out of your database. Along with other techniques like replication, caching, and sharding, connection pooling can help you scale your database layer to process an even greater volume of requests.

      In this guide we focused on a simplistic and synthetic testing scenario using PostgreSQL's built-in pgbench benchmarking tool and its default benchmark test. In any production scenario, you should run benchmarks against actual production data while simulating production load. This will allow you to tune your database for your particular usage pattern.

      Along with pgbench, other tools exist to benchmark and load your database. One such tool developed by Percona is sysbench-tpcc. Another is Apache's JMeter, which can load test databases as well as web applications.

      To learn more about DigitalOcean Managed Databases, consult the Managed Databases product documentation. To learn more about sharding, another useful scaling technique, consult Understanding Database Sharding.


      Understanding Managed Databases


      Introduction

      Secure, reliable data storage is a must for nearly every modern application. However, the infrastructure needed for a self-managed, on-premises database can be prohibitively expensive for many teams. Similarly, employees who have the skills and experience needed to maintain a production database effectively can be difficult to come by.

      The spread of cloud computing services has lowered the barriers to entry associated with provisioning a database, but many developers still lack the time or expertise needed to manage and tune a database to suit their needs. For this reason, many businesses are turning to managed database services to help them build and scale their databases in line with their growth.

      In this conceptual article, we will go over what managed databases are and how they can be beneficial to many organizations. We will also cover some practical considerations one should make before building their next application on top of a managed database solution.

      Managed Databases in a Nutshell

      A managed database is a cloud computing service in which the end user pays a cloud service provider for access to a database. Unlike a typical database, users don’t have to set up or maintain a managed database on their own; rather, it’s the provider’s responsibility to oversee the database’s infrastructure. This allows the user to focus on building their application instead of spending time configuring their database and keeping it up to date.

      The process of provisioning a managed database varies by provider, but in general it’s similar to that of any other cloud-based service. After registering an account and logging in to the dashboard, the user reviews the available database options — such as the database engine and cluster size — and then chooses the setup that’s right for them. Once the managed database is provisioned, they can connect to it through a GUI or client and begin loading data and integrating the database with their application.

      Managed database solutions simplify the process of provisioning and maintaining a database. Instead of running commands from a terminal to install and set one up, you can deploy a production-ready database with just a few clicks in your browser. By simplifying and automating database management, cloud providers make it easier for anyone, even novice database users, to build data-driven applications and websites. This was the result of a decades-long trend towards simplifying, automating, and abstracting various database management tasks, which was itself a response to pain points long felt by database administrators.

      Pain Points of On-Premises and Self-Managed Databases

      Prior to the rise of the cloud computing model, any organization in need of a data center had to supply all the time, space, and resources that went into setting one up. Once their database was up and running, they also had to maintain the hardware, keep its software updated, hire a team to manage the database, and train their employees on how to use it.

      As cloud computing services grew in popularity in the 2000s, it became easier and more affordable to provision server infrastructure, since the hardware and the space required for it no longer had to be owned or managed by those using it. Likewise, setting up a database entirely within the cloud became far less difficult; a business or developer would just have to requisition a server, install and configure their chosen database management system, and begin storing data.

      While cloud computing did make the process of setting up a traditional database easier, it didn’t address all of its problems. For instance, in the cloud it can still be difficult to pinpoint the ideal size of a database’s infrastructure footprint before it begins collecting data. This is important because cloud consumers are charged based on the resources they consume, and they risk paying for more than what they require if the server they provision is larger than necessary. Additionally, as with traditional on-premises databases, managing one’s database in the cloud can be a costly endeavor. Depending on your needs, you may still need to hire an experienced database administrator or spend a significant amount of time and money training your existing staff to manage your database effectively.

      Many of these issues are compounded for smaller organizations and independent developers. While a large business can usually afford to hire employees with a deep knowledge of databases, smaller teams usually have fewer resources available, leaving them with only their existing institutional knowledge. This makes tasks like replication, migrations, and backups all the more difficult and time consuming, as they can require a great deal of on-the-job learning as well as trial and error.

      Managed databases help to resolve these pain points with a host of benefits to businesses and developers. Let’s walk through some of these benefits and how they can impact development teams.

      Benefits of Managed Databases

      Managed database services can help to reduce many of the headaches associated with provisioning and managing a database. For one thing, developers build applications on top of managed database services to drastically speed up the process of provisioning a database server. With a self-managed solution, you must obtain a server (either on-premises or in the cloud), connect to it from a client or terminal, configure and secure it, and then install and set up the database management software before you can begin storing data. With a managed database, you only have to decide on the initial size of the database server, configure any additional provider-specific options, and you’ll have a new database ready to integrate with your app or website. This can usually be done in just a few minutes through the provider’s user interface.

      Another appeal of managed databases is automation. Self-managed databases can consume a large amount of an organization’s resources because its employees have to perform every administrative task — from scaling to performing updates, running migrations, and creating backups — manually. With a managed database, however, these and other tasks are done either automatically or on-demand, which markedly reduces the risk of human error.

      This relates to the fact that managed database services help to streamline the process of database scaling. Scaling a self-managed database can be very time- and resource-intensive. Whether you choose sharding, replication, load balancing, or something else as your scaling strategy, if you manage the infrastructure yourself then you’re responsible for ensuring that no data is lost in the process and that the application will continue to work properly. If you integrate your application with a managed database service, however, you can scale the database cluster on demand. Rather than having to work out the optimal server size or CPU usage beforehand, you can quickly provision more resources on-the-fly. This helps you avoid using unnecessary resources, meaning you also won’t pay for what you don’t need.

      Managed solutions tend to have built-in high-availability. In the context of cloud computing, a service is said to be highly available if it is stable and likely to run without failure for long periods of time. Most reputable cloud providers’ products come with a service level agreement (SLA), a commitment between the provider and its customers that guarantees the availability and reliability of their services. A typical SLA will specify how much downtime the customer should expect, and many also define the compensation for customers if these service levels are not met. This provides assurance for the customer that their database won’t crash and, if it does, they can at least expect some kind of reparation from the provider.

      In general, managed databases simplify the tasks associated with provisioning and maintaining a database. Depending on the provider, you or your team will still likely need some level of experience working with databases in order to provision a database and interact with it as you build and scale your application. Ultimately, though, the database-specific experience needed to administer a managed database will be much less than with a self-managed solution.

      Of course, managed databases aren’t able to solve every problem, and may prove to be a less-than-ideal choice for some. Next, we’ll go over a few of the potential drawbacks one should consider before provisioning a managed database.

      Practical Considerations

      A managed database service can ease the stress of deploying and maintaining a database, but there are still a few things to keep in mind before committing to one. Recall that a principal draw of managed databases is that they abstract away most of the more tedious aspects of database administration. To this end, a managed database provider aims to deliver a rudimentary database that will satisfy the most common use cases. Accordingly, their database offerings won’t feature tons of customization options or the unique features included in more specialized database software. Because of this, you won’t have as much freedom to tailor your database and you’ll be limited to what the cloud provider has to offer.

      A managed database is almost always more expensive than a self-managed one. This makes sense, since you’re paying for the cloud provider to support you in managing the database, but it can be a cause for concern for teams with limited resources. Moreover, pricing for managed databases is usually based on how much storage and RAM the database uses, how many reads it handles, and how many backups of the database the user creates. Consequently, an application that handles large amounts of data or traffic will be more expensive to run on a managed database service than on a self-managed cloud database.

      One should also reflect on the impact switching to a managed database will have on their internal workflows and whether or not they’ll be able to adjust to those changes. Every provider differs, and depending on their SLA they may shoulder responsibility for only some administration tasks, which would be problematic for developers looking for a full-service solution. On the other hand, some providers could have a prohibitively restrictive SLA or make the customer entirely dependent on the provider in question, a situation known as vendor lock-in.

      Lastly, and perhaps most importantly, one should carefully consider whether or not any managed database service they’re considering using will meet their security needs. All databases, including on-premises databases, are prone to certain security threats, like SQL injection attacks or data leaks. However, the security dynamic is far different for databases hosted in the cloud. Managed database users can’t control the physical location of their data or who has access to it, nor can they ensure compliance with specific security standards. This can be especially problematic if your client has heightened security needs.

      To illustrate, imagine that you’re hired by a bank to build an application where its clients can access financial records and make payments. The bank may stipulate that the app must have data at rest encryption and appropriately scoped user permissions, and that it must be compliant with certain regulatory standards like PCI DSS. Not all managed database providers adhere to the same regulatory standards or maintain the same security practices, and they’re unlikely to adopt new standards or practices for just one of their customers. For this reason, it’s critical that you ensure any managed database provider you rely on for such an application is able to meet your security needs as well as the needs of your clients.

      Conclusion

      Managed databases have many features that appeal to a wide variety of businesses and developers, but a managed database may not solve every problem or suit everyone’s needs. Some may find that a managed database’s limited feature set and configuration options, increased cost, and reduced flexibility outweigh any of its potential advantages. However, compelling benefits like ease of use, scalability, automated backups and upgrades, and high availability have led to increased adoption of managed database solutions in a variety of industries.

      If you’re interested in learning more about DigitalOcean Managed Databases, we encourage you to check out our Managed Databases product documentation.




      Troubleshooting Web Servers, Databases, and Other Services


      Updated by Linode. Written by Linode.

      This guide presents troubleshooting strategies for when you can’t connect to your web server, database, or other services running on your Linode. This guide assumes that you have access to SSH. If you can’t log in with SSH, review Troubleshooting SSH and then return to this guide.

      Where to go for help outside this guide

      This guide explains how to use different troubleshooting commands on your Linode. These commands can produce diagnostic information and logs that may expose the root of your connection issues. For some specific examples of diagnostic information, this guide also explains the corresponding cause of the issue and presents solutions for it.

      If the information and logs you gather do not match a solution outlined here, consider searching the Linode Community Site for posts that match your system’s symptoms. Or, post a new question in the Community Site and include your commands’ output.

      Linode is not responsible for the configuration or installation of software on your Linode. Refer to Linode’s Scope of Support for a description of which issues Linode Support can help with.

      General Troubleshooting Strategies

      This section highlights troubleshooting strategies that apply to every service.

      Check if the Service is Running

      The service may not be running. Check the status of the service:

      Distribution                                                        Command
      systemd systems (Arch, Ubuntu 16.04+, Debian 8+, CentOS 7+, etc)    sudo systemctl status <service name> -l
      sysvinit systems (CentOS 6, Ubuntu 14.04, Debian 7, etc)            sudo service <service name> status

      Restart the Service

      If the service isn’t running, try restarting it:

      Distribution        Command
      systemd systems     sudo systemctl restart <service name>
      sysvinit systems    sudo service <service name> restart

      Enable the Service

      If your system was recently rebooted, and the service didn’t start automatically at boot, then it may not be enabled. Enable the service to prevent this from happening in the future:

      Distribution        Command
      systemd systems     sudo systemctl enable <service name>
      sysvinit systems    sudo chkconfig <service name> on

      Check your Service’s Bound IP Address and Ports

      Your service may be listening on an unexpected port, or it may not be bound to your public IP address (or whatever address is desirable). To view which address and ports a service is bound on, run the ss command with these options:

      sudo ss -atpu
      

      Review the application’s documentation for help determining the address and port your service should bind to.
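
      For example, to list only the sockets listening on a specific port (port 80 is used here as a hypothetical web server port) and see which address they are bound to and which process owns them, you can filter the listening TCP sockets:

      sudo ss -ltnp | grep ':80 '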

      Note

      One notable example is if a service is only bound to a public IPv4 address and not to an IPv6 address. If a user connects to your Linode over IPv6, they will not be able to access the service.

      Analyze Service Logs

      If your service doesn’t start normally, review your system logs for the service. Your system logs may be in the following locations:

      Distribution               System Logs
      systemd systems            Run journalctl
      Ubuntu 14.04, Debian 7     /var/log/syslog
      CentOS 6                   /var/log/messages

      Your service’s log location will vary by application, but logs are often stored in /var/log. The less command is a useful tool for browsing through your logs.
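
      For example, on a systemd system you might review recent log entries for a service with journalctl, or page through a log file directly with less (the nginx unit name and log path below are placeholders; substitute your own service):

      sudo journalctl -u nginx --since "1 hour ago"
      sudo less /var/log/nginx/error.log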

      Try pasting your log messages into a search engine or searching for your messages in the Linode Community Site to see if anyone else has run into similar issues. If you don’t find any results, you can try asking about your issues in a new post on the Linode Community Site. If it becomes difficult to find a solution, you may need to rebuild your Linode.

      Review Firewall Rules

      If your service is running but your connections still fail, your firewall (which is likely implemented by the iptables software) may be blocking the connections. To review your current firewall ruleset, run:

      sudo iptables -L # displays IPv4 rules
      sudo ip6tables -L # displays IPv6 rules
      

      Note

      Your deployment may be running FirewallD or UFW, which are frontends used to more easily manage your iptables rules. Run these commands to find out if you are running either package:

      sudo ufw status
      sudo firewall-cmd --state
      

      Review How to Configure a Firewall with UFW and Introduction to FirewallD on CentOS to learn how to manage and inspect your firewall rules with those packages.

      Firewall rulesets can vary widely. Review the Control Network Traffic with iptables guide to analyze your rules and determine if they are blocking connections. For example, a rule which allows incoming HTTP traffic could look like this:

        
      -A INPUT -p tcp -m tcp --dport 80 -m conntrack --ctstate NEW -j ACCEPT
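
      If you need to add a comparable rule yourself, you could append it as shown below (a sketch only; adapt it to your existing ruleset and persist it with your distribution's usual tooling, such as UFW, FirewallD, or an iptables persistence package):

      sudo iptables -A INPUT -p tcp -m tcp --dport 80 -m conntrack --ctstate NEW -j ACCEPT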
      
      

      Disable Firewall Rules

      In addition to analyzing your firewall ruleset, you can also temporarily disable your firewall to test if it is interfering with your connections. Leaving your firewall disabled increases your security risk, so we recommend re-enabling it afterward with a modified ruleset that will accept your connections. Review Control Network Traffic with iptables for help with this subject.

      1. Create a temporary backup of your current iptables:

        sudo iptables-save > ~/iptables.txt
        
      2. Set the INPUT, FORWARD and OUTPUT packet policies to ACCEPT:

        sudo iptables -P INPUT ACCEPT
        sudo iptables -P FORWARD ACCEPT
        sudo iptables -P OUTPUT ACCEPT
        
      3. Flush the nat table that is consulted when a packet that creates a new connection is encountered:

        sudo iptables -t nat -F
        
      4. Flush the mangle table that is used for specialized packet alteration:

        sudo iptables -t mangle -F
        
      5. Flush all the chains in the table:

        sudo iptables -F
        
      6. Delete every non-built-in chain in the table:

        sudo iptables -X
        
      7. Repeat these steps with the ip6tables command to flush your IPv6 rules. Be sure to assign a different name to the IPv6 rules file (e.g. ~/ip6tables.txt).
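
      When you've finished testing, you can restore the ruleset you backed up in step 1 (and the IPv6 backup, if you created one in step 7):

      sudo iptables-restore < ~/iptables.txt
      sudo ip6tables-restore < ~/ip6tables.txt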

      Troubleshoot Web Servers

      If your web server is not running or if connections are timing out, review the general troubleshooting strategies.

      Note

      If your web server is responding with an error code, your troubleshooting will vary by what code is returned. For more detailed information about each request that’s failing, read your web server’s logs. Here are some commands that can help you find your web server’s logs:

      • Apache:

        grep ErrorLog -r /etc/apache2  # On Ubuntu, Debian
        grep ErrorLog -r /etc/httpd    # On CentOS, Fedora, RHEL
        
      • NGINX:

        grep error_log -r /etc/nginx
        

      Frequent Error Codes

      • HTTP 401 Unauthorized, HTTP 403 Forbidden

        The requesting user did not have sufficient permission or access to the requested URL. Review your web server authorization and access control configuration.

      • HTTP 404 Not Found

        The URL that a user requested could not be found by the web server. Review your web server configuration and make sure your website files are stored in the right location on your filesystem.

      • HTTP 500, 502, 503, 504

        The web server requested a resource from a process it depends on, but the process did not respond as expected. For example, if a database query needs to be performed for a web request, but the database isn’t running, then a 50X code will be returned. To troubleshoot these issues, investigate the service that the web server depends on.

      Troubleshoot Databases

      Is your Disk Full?

      One common reason that a database may not start is if your disk is full. To check how much disk space you are using, run:

      df -h
      

      Note

      This reported disk usage is not the same as the reported storage usage in the Linode Manager. The storage usage in the Linode Manager refers to how much of the disk space you pay for is allocated to your Linode’s disks. The output of df -h shows how full those disks are.
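
      If the disk is nearly full, a quick way to see which top-level directories are consuming space is to summarize usage per directory with standard tools (adjust the starting path as needed):

      sudo du -xh --max-depth=1 / | sort -h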

      You have several options for resolving disk space issues:

      • Free up space on your disk by locating and removing files you don’t need, using a tool like ncdu.

      • If you have any unallocated space on your Linode (storage that you pay for already but which isn’t assigned to your disk), resize your disk to take advantage of the space.

      • Upgrade your Linode to a higher-tier resource plan and then resize your disk to use the newly available space. If your Linode has a pending free upgrade for your storage space, you can choose to take this free upgrade to solve the issue.

      Database Performance Troubleshooting

      If your database is running but returning slowly, research how to optimize the database software for the resources your Linode has. If you run MySQL or MariaDB, read How to Optimize MySQL Performance Using MySQLTuner.


      This guide is published under a CC BY-ND 4.0 license.


