      How To Manage Redis Databases and Keys


      Introduction

      Redis is an open-source, in-memory key-value data store. A key-value data store is a type of NoSQL database in which keys serve as unique identifiers for their associated values. Any given Redis instance includes a number of databases, each of which can hold many different keys of a variety of data types. In this tutorial, we will go over how to select a database, move keys between databases, and manage and delete keys.

      How To Use This Guide
      This guide is written as a cheat sheet with self-contained examples. We encourage you to jump to any section that is relevant to the task you’re trying to complete.

      The commands and outputs shown in this guide were tested on an Ubuntu 18.04 server running Redis version 4.0.9. To obtain a similar setup, you can follow Step 1 of our guide on How To Install and Secure Redis on Ubuntu 18.04. We will demonstrate how these commands behave by running them with redis-cli, the Redis command line interface. Note that if you’re using a different Redis interface — Redli, for example — the exact outputs of certain commands may differ.

      Alternatively, you could provision a managed Redis database instance to test these commands, but note that depending on the level of control allowed by your database provider, some commands in this guide may not work as described. To provision a DigitalOcean Managed Database, follow our Managed Databases product documentation. Then, you must either install Redli or set up a TLS tunnel in order to connect to the Managed Database over TLS.

      Managing Databases

      Out of the box, a Redis instance supports 16 logical databases. These databases are effectively siloed off from one another, and when you run a command in one database it doesn’t affect any of the data stored in other databases in your Redis instance.

      Redis databases are numbered from 0 to 15 and, by default, you connect to database 0 when you connect to your Redis instance. However, you can change the database you’re using with the select command after you connect:
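
      • select 10

      Output

      OK

      The database index used here is arbitrary; on a default configuration, any index from 0 to 15 will work.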

      If you’ve selected a database other than 0, it will be reflected in the redis-cli prompt:
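
      127.0.0.1:6379[10]>

      This example prompt assumes a local instance listening on the default port with database 10 selected.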

      To swap all the data held in one database with the data held in another, use the swapdb command. The following example will swap the data held in database 6 with that in database 8, and any clients connected to either database will be able to see changes immediately:
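
      • swapdb 6 8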

      swapdb will return OK if the swap is successful.

      If you want to move a key to a different Redis instance, you can run migrate. This command ensures the key exists on the target instance before deleting it from the source instance. When you run migrate, the command must include the following elements in this order:

      • The hostname or IP address of the destination database
      • The target database’s port number
      • The name of the key you want to migrate
      • The database number where you want to store the key on the destination instance
      • A timeout, in milliseconds, which defines the maximum amount of idle communication time between the two machines. Note that this isn’t a time limit for the operation, just that the operation should always make some level of progress within the defined length of time

      To illustrate:

      • migrate 203.0.113.0 6379 key_1 7 8000

      Additionally, migrate allows the following options which you can add after the timeout argument:

      • COPY: Specifies that the key should not be deleted from the source instance
      • REPLACE: Specifies that if the key already exists on the destination, the migrate operation should delete and replace it
      • KEYS: Instead of providing a single key to migrate, you can enter an empty string ("") in place of the key name and then list every key you want to migrate after the KEYS argument, as in the example below. For more information on working with keys, see our tutorial on How To Troubleshoot Issues in Redis.
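
      For example, the following hypothetical command would copy key_1 and key_2 to database 7 on the destination instance without removing them from the source, replacing them there if they already exist:

      • migrate 203.0.113.0 6379 "" 7 8000 COPY REPLACE KEYS key_1 key_2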

      Managing Keys

      There are a number of Redis commands that are useful for managing keys regardless of what type of data they hold. We’ll go over a few of these in this section.

      rename will rename the specified key. If it’s successful, it will return OK:
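
      • rename key_1 key_2

      Output

      OK

      Here, key_1 and key_2 are placeholder names for the old and new key, respectively.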

      You can use randomkey to return a random key from the currently selected database:
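
      • randomkey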

      Output

      "any_key"

      Use type to determine what type of data the given key holds. This command’s output can be either string, list, hash, set, zset, or stream:
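
      • type key_1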

      Output

      "string"

      If the specified key doesn’t exist, type will return none instead.

      You can move an individual key to another database in your Redis instance with the move command. move takes the name of a key and the database where you want to move the key as arguments. For example, to move the key key_1 to database 8, you would run the following:
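
      • move key_1 8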

      move will return OK if moving the key was successful.

      Deleting Keys

      To delete one or more keys of any data type, use the del command followed by one or more keys that you want to delete:
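
      • del key_1

      Output

      (integer) 1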

      If this command deletes the key(s) successfully, it will return the number of keys that were removed, such as (integer) 1 for a single key. If none of the specified keys exist, it will return (integer) 0.

      The unlink command performs a similar function as del, with the difference being that del blocks the client as the server reclaims the memory taken up by the key. If the key being deleted is associated with a small object, the amount of time it takes for del to reclaim the memory is very small and the blocking time may not even be noticeable.

      However, it can become inconvenient if, for example, the key you’re deleting is associated with many objects, such as a hash with thousands or millions of fields. Deleting such a key can take a noticeably long time, and you’ll be blocked from performing any other operations until it’s fully removed from the server’s memory.

      unlink, however, first determines the cost of deallocating the memory taken up by the key. If the cost is small, unlink behaves just like del, deleting the key immediately and blocking the client. If the cost is high, though, unlink deletes the key asynchronously: it creates another thread that incrementally reclaims the memory in the background without blocking the client:
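
      • unlink key_1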

      Since it runs in the background, it’s generally recommended that you use unlink to remove keys from your server to reduce errors on your clients, though del will also suffice in many cases.

      Warning: The following two commands are considered dangerous. The flushdb and flushall commands will irreversibly delete all the keys in a single database and all the keys in every database on the Redis server, respectively. We recommend that you only run these commands if you are absolutely certain that you want to delete all the keys in your database or server.

      It may be in your interest to rename these commands to something with a lower likelihood of being run accidentally.

      To delete all the keys in the selected database, use the flushdb command:
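
      • flushdb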

      To delete all the keys in every database on a Redis server (including the currently selected database), run flushall:
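
      • flushall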

      Both flushdb and flushall accept the async option, which deletes all the keys in a single database or on the entire server asynchronously. This makes them behave like the unlink command: a separate thread incrementally frees up the memory in the background.
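
      For example, to asynchronously remove every key in the currently selected database, you could run:

      • flushdb async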

      Backing Up Your Database

      To create a backup of the currently selected database, you can use the save command:
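
      • save

      Output

      OK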

      This will export a snapshot of the current dataset as an .rdb file, which is a database dump file that holds the data in an internal, compressed serialization format.

      save runs synchronously and will block any other clients connected to the database. Hence, the save command documentation recommends that this command should almost never be run in a production environment. Instead, it suggests using the bgsave command. This tells Redis to fork: the parent process continues to serve clients while a child process saves the database to disk and then exits:
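
      • bgsave

      Output

      Background saving started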

      Note that if clients add or modify data while the bgsave operation is occurring, these changes won’t be captured in the snapshot.

      You can also edit the Redis configuration file to have Redis save a snapshot automatically (known as snapshotting or RDB mode) after a certain amount of time if a minimum number of changes were made to the database. This is known as a save point. The following save point settings are enabled by default in the redis.conf file:

      /etc/redis/redis.conf

      . . .
      save 900 1
      save 300 10
      save 60 10000
      . . .
      dbfilename "nextfile.rdb"
      . . .
      

      With these settings, Redis will export a snapshot of the database to the file defined by the dbfilename parameter every 900 seconds if at least 1 key is changed, every 300 seconds if at least 10 keys are changed, and every 60 seconds if at least 10000 keys are changed.

      You can use the shutdown command to back up your Redis data and then close your connection. This command will block every client connected to the database and then perform a save operation if at least one save point is configured, meaning that it will export the database in its current state to an .rdb file while preventing clients from making any changes.

      Additionally, the shutdown command will flush changes to Redis’s append-only file before quitting if append-only mode is enabled. The append-only file mode (AOF) involves creating a log of every write operation on the server in a file ending in .aof after every snapshot. AOF and RDB modes can be enabled on the same server, and using both persistence methods is an effective way to back up your data.

      In short, the shutdown command is essentially a blocking save command that also flushes all recent changes to the append-only file and closes the connection to the Redis instance:
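
      • shutdown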

      Warning: The shutdown command is considered dangerous. By blocking your Redis server’s clients, you can make your data unavailable to users and applications that depend on it. We recommend that you only run this command if you are testing out Redis’s behavior or you are absolutely certain that you want to block all your Redis server’s clients.

      In fact, it may be in your interest to rename this command to something with a lower likelihood of being run accidentally.

      If you’ve not configured any save points but still want Redis to perform a save operation, append the save option to the shutdown command:
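
      • shutdown save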

      If you have configured at least one save point but you want to shut down the Redis server without performing a save, you can add the nosave argument to the command:
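
      • shutdown nosave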

      Note that the append-only file can grow to be very long over time, but you can configure Redis to rewrite the file based on certain variables by editing the redis.conf file. You can also instruct Redis to rewrite the append-only file by running the bgrewriteaof command:
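
      • bgrewriteaof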

      bgrewriteaof will create the shortest set of commands needed to bring the database back to its current state. As this command’s name implies, it will run in the background. However, if another persistence command is running in a background process already, that command must finish before Redis will execute bgrewriteaof.

      Conclusion

      This guide details a number of commands used to manage databases and keys. If there are other related commands, arguments, or procedures you’d like to see in this guide, please ask or make suggestions in the comments below.

      For more information on Redis commands, see our tutorial series on How to Manage a Redis Database.




      How to Set Up a Scalable Django App with DigitalOcean Managed Databases and Spaces


      Introduction

      Django is a powerful web framework that can help you get your Python application or website off the ground quickly. It includes several convenient features like an object-relational mapper, a Python API, and a customizable administrative interface for your application. It also includes a caching framework and encourages clean app design through its URL Dispatcher and Template system.

      Out of the box, Django includes a minimal web server for testing and local development, but it should be paired with a more robust serving infrastructure for production use cases. Django is often rolled out with an Nginx web server to handle static file requests and HTTPS redirection, and a Gunicorn WSGI server to serve the app.

      In this guide, we will augment this setup by offloading static files like JavaScript and CSS stylesheets to DigitalOcean Spaces, and optionally delivering them using a Content Delivery Network, or CDN, which stores these files closer to end users to reduce transfer times. We’ll also use a DigitalOcean Managed PostgreSQL database as our data store to simplify the data layer and avoid having to manually configure a scalable PostgreSQL database.

      Prerequisites

      Before you begin with this guide, you should have the following available to you:

      Step 1 — Installing Packages from the Ubuntu Repositories

      To begin, we’ll download and install all of the items we need from the Ubuntu repositories. We’ll use the Python package manager pip to install additional components a bit later.

      We need to first update the local apt package index and then download and install the packages.

      In this guide, we’ll use Django with Python 3. To install the necessary libraries, log in to your server and type:

      • sudo apt update
      • sudo apt install python3-pip python3-dev libpq-dev curl postgresql-client

      This will install pip, the Python development files needed to build Gunicorn, the libpq header files needed to build the Psycopg PostgreSQL Python adapter, and the PostgreSQL command-line client.

      Hit Y and then ENTER when prompted to begin downloading and installing the packages.

      Next, we’ll configure the database to work with our Django app.

      Step 2 — Creating the PostgreSQL Database and User

      We’ll now create a database and database user for our Django application.

      To begin, grab the Connection Parameters for your cluster by navigating to Databases from the Cloud Control Panel, and clicking into your database. You should see a Connection Details box containing some parameters for your cluster. Note these down.

      Back on the command line, log in to your cluster using these credentials and the psql PostgreSQL client we just installed:

      • psql -U doadmin -h host -p port -d database

      When prompted, enter the password displayed alongside the doadmin Postgres username, and hit ENTER.

      You will be given a PostgreSQL prompt from which you can manage the database.

      First, create a database for your project called polls:
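
      • CREATE DATABASE polls;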

      Note: Every Postgres statement must end with a semicolon, so make sure that your command ends with one if you are experiencing issues.

      We can now switch to the polls database:
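
      • \c polls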

      Next, create a database user for the project. Make sure to select a secure password:

      • CREATE USER myprojectuser WITH PASSWORD 'password';

      We'll now modify a few of the connection parameters for the user we just created. This will speed up database operations so that the correct values do not have to be queried and set each time a connection is established.

      We are setting the default encoding to UTF-8, which Django expects. We are also setting the default transaction isolation scheme to "read committed", which blocks reads from uncommitted transactions. Lastly, we are setting the timezone. By default, our Django projects will be set to use UTC. These are all recommendations from the Django project itself.

      Enter the following commands at the PostgreSQL prompt:

      • ALTER ROLE myprojectuser SET client_encoding TO 'utf8';
      • ALTER ROLE myprojectuser SET default_transaction_isolation TO 'read committed';
      • ALTER ROLE myprojectuser SET timezone TO 'UTC';

      Now we can give our new user access to administer our new database:

      • GRANT ALL PRIVILEGES ON DATABASE polls TO myprojectuser;

      When you are finished, exit out of the PostgreSQL prompt by typing:
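
      • \q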

      Your Django app is now ready to connect to and manage this database.

      In the next step, we'll install virtualenv and create a Python virtual environment for our Django project.

      Step 3 — Creating a Python Virtual Environment for your Project

      Now that we've set up our database to work with our application, we'll create a Python virtual environment that will isolate this project's dependencies from the system's global Python installation.

      To do this, we first need access to the virtualenv command. We can install this with pip.

      Upgrade pip and install the package by typing:

      • sudo -H pip3 install --upgrade pip
      • sudo -H pip3 install virtualenv

      With virtualenv installed, we can create a directory to store our Python virtual environments and make one to use with the Django polls app.

      Create a directory called envs and navigate into it:
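
      • mkdir envs
      • cd envs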

      Within this directory, create a Python virtual environment called polls by typing:
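
      • virtualenv polls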

      This will create a directory called polls within the envs directory. Inside, it will install a local version of Python and a local version of pip. We can use this to install and configure an isolated Python environment for our project.

      Before we install our project's Python requirements, we need to activate the virtual environment. You can do that by typing:

      • source polls/bin/activate

      Your prompt should change to indicate that you are now operating within a Python virtual environment. It will look something like this: (polls)user@host:~/envs$.

      With your virtual environment active, install Django, Gunicorn, and the psycopg2 PostgreSQL adaptor with the local instance of pip:

      Note: When the virtual environment is activated (when your prompt has (polls) preceding it), use pip instead of pip3, even if you are using Python 3. The virtual environment's copy of the tool is always named pip, regardless of the Python version.

      • pip install django gunicorn psycopg2-binary

      You should now have all of the software you need to run the Django polls app. In the next step, we'll create a Django project and install this app.

      Step 4 — Creating the Polls Django Application

      We can now set up our sample application. In this tutorial, we'll use the Polls demo application from the Django documentation. It consists of a public site that allows users to view polls and vote in them, and an administrative control panel that allows the admin to modify, create, and delete polls.

      In this guide, we'll skip through the tutorial steps, and simply clone the final application from the DigitalOcean Community django-polls repo.

      If you'd like to complete the steps manually, create a directory called django-polls in your home directory and navigate into it:

      • cd
      • mkdir django-polls
      • cd django-polls

      From there, you can follow the Writing your first Django app tutorial from the official Django documentation. When you're done, skip to Step 5.

      If you just want to clone the finished app, navigate to your home directory and use git to clone the django-polls repo:

      • cd
      • git clone https://github.com/do-community/django-polls.git

      cd into it, and list the directory contents:

      You should see the following objects:

      Output

      LICENSE README.md manage.py mysite polls templates

      manage.py is the main command-line utility used to manipulate the app. polls contains the polls app code, and mysite contains project-scope code and settings. templates contains custom template files for the administrative interface. To learn more about the project structure and files, consult Creating a Project from the official Django documentation.

      Before running the app, we need to adjust its default settings and connect it to our database.

      Step 5 — Adjusting the App Settings

      In this step, we'll modify the Django project's default configuration to increase security, connect Django to our database, and collect static files into a local directory.

      Begin by opening the settings file in your text editor:

      • nano ~/django-polls/mysite/settings.py

      Start by locating the ALLOWED_HOSTS directive. This defines a list of the addresses or domain names that you want to use to connect to the Django instance. An incoming request with a Host header not in this list will raise an exception. Django requires that you set this to prevent a certain class of security vulnerability.

      In the square brackets, list the IP addresses or domain names associated with your Django server. Each item should be listed in quotations with entries separated by a comma. Your list will also include localhost, since you will be proxying connections through a local Nginx instance. If you wish to include requests for an entire domain and any subdomains, prepend a period to the beginning of the entry.

      In the snippet below, there are a few commented out examples that demonstrate what these entries should look like:

      ~/django-polls/mysite/settings.py

      . . .
      
      # The simplest case: just add the domain name(s) and IP addresses of your Django server
      # ALLOWED_HOSTS = [ 'example.com', '203.0.113.5']
      # To respond to 'example.com' and any subdomains, start the domain with a dot
      # ALLOWED_HOSTS = ['.example.com', '203.0.113.5']
      ALLOWED_HOSTS = ['your_server_domain_or_IP', 'second_domain_or_IP', . . ., 'localhost']
      
      . . . 
      

      Next, find the section of the file that configures database access. It will start with DATABASES. The configuration in the file is for a SQLite database. We already created a PostgreSQL database for our project, so we need to adjust these settings.

      We will tell Django to use the psycopg2 database adaptor we installed with pip, instead of the default SQLite engine. We’ll also reuse the Connection Parameters referenced in Step 2. You can always find this information from the Managed Databases section of the DigitalOcean Cloud Control Panel.

      Update the file with your database settings: the database name (polls), the database username, the database user's password, and the database host and port. Be sure to replace the database-specific values with your own information:

      ~/django-polls/mysite/settings.py

      . . .
      
      DATABASES = {
          'default': {
              'ENGINE': 'django.db.backends.postgresql_psycopg2',
              'NAME': 'polls',
              'USER': 'myprojectuser',
              'PASSWORD': 'password',
              'HOST': 'managed_db_host',
              'PORT': 'managed_db_port',
          }
      }
      
      . . .
      

      Next, move down to the bottom of the file and add a setting indicating where the static files should be placed. This is necessary so that Nginx can handle requests for these items. The following line tells Django to place them in a directory called static in the base project directory:

      ~/django-polls/mysite/settings.py

      . . .
      
      STATIC_URL = '/static/'
      STATIC_ROOT = os.path.join(BASE_DIR, 'static/')
      

      Save and close the file when you are finished.

      At this point, you've configured the Django project's database, security, and static files settings. If you followed the polls tutorial from the start and did not clone the GitHub repo, you can move on to Step 6. If you cloned the GitHub repo, there remains one additional step.

      The Django settings file contains a SECRET_KEY variable that is used to create hashes for various Django objects. It's important that it is set to a unique, unpredictable value. The SECRET_KEY variable has been scrubbed from the GitHub repository, so we'll create a new one using a function built into the django Python package called get_random_secret_key(). From the command line, open up a Python interpreter:
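
      • python3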

      You should see the following output and prompt:

      Output

      Python 3.6.7 (default, Oct 22 2018, 11:32:17)
      [GCC 8.2.0] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>>

      Import the get_random_secret_key function from the Django package, then call the function:

      • from django.core.management.utils import get_random_secret_key
      • get_random_secret_key()

      Copy the resulting key to your clipboard.

      Exit the Python interpreter by pressing CTRL+D.

      Next, open up the settings file in your text editor once again:

      nano ~/django-polls/mysite/settings.py
      

      Locate the SECRET_KEY variable and paste in the key you just generated:

      ~/django-polls/mysite/settings.py

      . . .
      
      # SECURITY WARNING: keep the secret key used in production secret!
      SECRET_KEY = 'your_secret_key_here'
      
      . . .
      

      Save and close the file.

      We'll now test the app locally using the Django development server to ensure that everything's been correctly configured.

      Step 6 — Testing the App

      Before we run the Django development server, we need to use the manage.py utility to create the database schema and collect static files into the STATIC_ROOT directory.

      Navigate into the project's base directory, and create the initial database schema in our PostgreSQL database using the makemigrations and migrate commands:

      • cd django-polls
      • ./manage.py makemigrations
      • ./manage.py migrate

      makemigrations will create the migrations, or database schema changes, based on the changes made to Django models. migrate will apply these migrations to the database schema. To learn more about migrations in Django, consult Migrations from the official Django documentation.

      Create an administrative user for the project by typing:

      • ./manage.py createsuperuser

      You will have to select a username, provide an email address, and choose and confirm a password.

      We can collect all of the static content into the directory location we configured by typing:

      • ./manage.py collectstatic

      The static files will then be placed in a directory called static within your project directory.

      If you followed the initial server setup guide, you should have a UFW firewall protecting your server. In order to test the development server, we'll have to allow access to the port we'll be using.

      Create an exception for port 8000 by typing:
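
      • sudo ufw allow 8000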

      Testing the App Using the Django Development Server

      Finally, you can test your project by starting the Django development server with this command:

      • ./manage.py runserver 0.0.0.0:8000

      In your web browser, visit your server's domain name or IP address followed by :8000 and the polls path:

      • http://server_domain_or_IP:8000/polls

      You should see the Polls app interface:

      Polls App Interface

      To check out the admin interface, visit your server's domain name or IP address followed by :8000 and the administrative interface's path:

      • http://server_domain_or_IP:8000/admin

      You should see the Polls app admin authentication window:

      Polls Admin Auth Page

      Enter the administrative username and password you created with the createsuperuser command.

      After authenticating, you can access the Polls app's administrative interface:

      Polls Admin Main Interface

      When you are finished exploring, hit CTRL-C in the terminal window to shut down the development server.

      Testing the App Using Gunicorn

      The last thing we want to do before offloading static files is test Gunicorn to make sure that it can serve the application. We can do this by entering our project directory and using gunicorn to load the project's WSGI module:

      • gunicorn --bind 0.0.0.0:8000 mysite.wsgi

      This will start Gunicorn on the same interface that the Django development server was running on. You can go back and test the app again.

      Note: The admin interface will not have any of the styling applied since Gunicorn does not know how to find the static CSS content responsible for this.

      We passed Gunicorn a module by specifying the relative directory path to Django's wsgi.py file, which is the entry point to our application. This file defines a function called application, which Gunicorn uses to communicate with the application. To learn more about the WSGI specification, click here.

      When you are finished testing, hit CTRL-C in the terminal window to stop Gunicorn.

      We'll now offload the application’s static files to DigitalOcean Spaces.

      Step 7 — Offloading Static Files to DigitalOcean Spaces

      At this point, Gunicorn can serve our Django application but not its static files. Usually we'd configure Nginx to serve these files, but in this tutorial we'll offload them to DigitalOcean Spaces using the django-storages plugin. This allows you to easily scale Django by centralizing its static content and freeing up server resources. In addition, you can deliver this static content using the DigitalOcean Spaces CDN.

      For a full guide on offloading Django static files to Object storage, consult How to Set Up Object Storage with Django.

      Installing and Configuring django-storages

      We'll begin by installing the django-storages Python package. The django-storages package provides Django with the S3Boto3Storage storage backend that uses the boto3 library to upload files to any S3-compatible object storage service.

      To start, install the django-storages and boto3 Python packages using pip:

      • pip install django-storages boto3

      Next, open your app's Django settings file again:

      • nano ~/django-polls/mysite/settings.py

      Navigate down to the INSTALLED_APPS section of the file, and append storages to the list of installed apps:

      ~/django-polls/mysite/settings.py

      . . .
      
      INSTALLED_APPS = [
          . . .
          'django.contrib.staticfiles',
          'storages',
      ]
      
      . . .
      

      Scroll further down the file to the STATIC_URL we previously modified. We'll now overwrite these values and append new S3Boto3Storage backend parameters. Delete the code you entered earlier, and add the following blocks, which include access and location information for your Space. Remember to replace the highlighted values here with your own information:

      ~/django-polls/mysite/settings.py

      . . .
      
      # Static files (CSS, JavaScript, Images)
      # https://docs.djangoproject.com/en/2.1/howto/static-files/
      
      AWS_ACCESS_KEY_ID = 'your_spaces_access_key'
      AWS_SECRET_ACCESS_KEY = 'your_spaces_secret_key'
      
      AWS_STORAGE_BUCKET_NAME = 'your_space_name'
      AWS_S3_ENDPOINT_URL = 'spaces_endpoint_URL'
      AWS_S3_OBJECT_PARAMETERS = {
          'CacheControl': 'max-age=86400',
      }
      AWS_LOCATION = 'static'
      AWS_DEFAULT_ACL = 'public-read'
      
      STATICFILES_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
      
      STATIC_URL = '{}/{}/'.format(AWS_S3_ENDPOINT_URL, AWS_LOCATION)
      STATIC_ROOT = 'static/'
      

      We define the following configuration items:

      • AWS_ACCESS_KEY_ID: The Access Key ID for the Space, which you created in the tutorial prerequisites. If you didn’t create a set of Access Keys, consult Sharing Access to Spaces with Access Keys.
      • AWS_SECRET_ACCESS_KEY: The secret key for the DigitalOcean Space.
      • AWS_STORAGE_BUCKET_NAME: Your DigitalOcean Space name.
      • AWS_S3_ENDPOINT_URL: The endpoint URL used to access the object storage service. For DigitalOcean, this will be something like https://nyc3.digitaloceanspaces.com, depending on the Space region.
      • AWS_S3_OBJECT_PARAMETERS: Sets the cache control headers on static files.
      • AWS_LOCATION: Defines a directory within the object storage bucket where all static files will be placed.
      • AWS_DEFAULT_ACL: Defines the access control list (ACL) for the static files. Setting it to public-read ensures that the files are publicly accessible to end users.
      • STATICFILES_STORAGE: Sets the storage backend Django will use to offload static files. This backend should work with any S3-compatible backend, including DigitalOcean Spaces.
      • STATIC_URL: Specifies the base URL that Django should use when generating URLs for static files. Here, we combine the endpoint URL and the static files subdirectory to construct a base URL for static files.
      • STATIC_ROOT: Specifies where to collect static files locally before copying them to object storage.

      From now on, when you run collectstatic, Django will upload your app's static files to the Space. When you start Django, it'll begin serving static assets like CSS and JavaScript from this Space.

      Before we test that this is all functioning correctly, we need to configure Cross-Origin Resource Sharing (CORS) headers for our Spaces files or access to certain static assets may be denied by your web browser.

      CORS headers tell the web browser that an application running at one domain can access scripts or resources located at another. In this case, we need to allow cross-origin resource sharing for our Django server's domain so that requests for static files in the Space are not denied by the web browser.

      To begin, navigate to the Settings page of your Space using the Cloud Control Panel:

      Screenshot of the Settings tab

      In the CORS Configurations section, click Add.

      CORS advanced settings

      Here, under Origin, enter the wildcard origin, *

      Warning: When you deploy your app into production, be sure to change this value to your exact origin domain (including the http:// or https:// protocol). Leaving this as the wildcard origin is insecure, and we do this here only for testing purposes since setting the origin to http://example.com:8000 (using a nonstandard port) is currently not supported.

      Under Allowed Methods, select GET.

      Click on Add Header, and in the text box that appears, enter Access-Control-Allow-Origin.

      Set Access Control Max Age to 600 so that the header we just created expires every 10 minutes.

      Click Save Options.

      From now on, objects in your Space will contain the appropriate Access-Control-Allow-Origin response headers, allowing modern secure web browsers to fetch these files across domains.

      At this point, you can optionally enable the CDN for your Space, which will serve these static files from a distributed network of edge servers. To learn more about CDNs, consult Using a CDN to Speed Up Static Content Delivery. This can significantly improve web performance. If you don't want to enable the CDN for your Space, skip ahead to the next section, Testing Spaces Static File Delivery.

      Enabling CDN (Optional)

      To activate static file delivery via the DigitalOcean Spaces CDN, begin by enabling the CDN for your DigitalOcean Space. To learn how to do this, consult How to Enable the Spaces CDN from the DigitalOcean product documentation.

      Once you've enabled the CDN for your Space, navigate to it using the Cloud Control Panel. You should see a new Endpoints link under your Space name:

      List of Space Endpoints

      These endpoints should contain your Space name.

      Notice the addition of a new Edge endpoint. This endpoint routes requests for Spaces objects through the CDN, serving them from the edge cache as much as possible. Note down this Edge endpoint, as we'll use it to configure the django-storages plugin.

      Next, edit your app's Django settings file once again:

      • nano ~/django-polls/mysite/settings.py

      Navigate down to the Static Files section we recently modified. Add the AWS_S3_CUSTOM_DOMAIN parameter to configure the django-storages plugin CDN endpoint and update the STATIC_URL parameter to use this new CDN endpoint:

      ~/django-polls/mysite/settings.py

      . . .
      
      # Static files (CSS, JavaScript, Images)
      # https://docs.djangoproject.com/en/2.1/howto/static-files/
      
      # Moving static assets to DigitalOcean Spaces as per:
      # https://www.digitalocean.com/community/tutorials/how-to-set-up-object-storage-with-django
      AWS_ACCESS_KEY_ID = 'your_spaces_access_key'
      AWS_SECRET_ACCESS_KEY = 'your_spaces_secret_key'
      
      AWS_STORAGE_BUCKET_NAME = 'your_space_name'
      AWS_S3_ENDPOINT_URL = 'spaces_endpoint_URL'
      AWS_S3_CUSTOM_DOMAIN = 'spaces_edge_endpoint_URL'
      AWS_S3_OBJECT_PARAMETERS = {
          'CacheControl': 'max-age=86400',
      }
      AWS_LOCATION = 'static'
      AWS_DEFAULT_ACL = 'public-read'
      
      STATICFILES_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
      
      STATIC_URL = '{}/{}/'.format(AWS_S3_CUSTOM_DOMAIN, AWS_LOCATION)
      STATIC_ROOT = 'static/'
      

      Here, replace the spaces_edge_endpoint_URL with the Edge endpoint you just noted down, truncating the https:// prefix. For example, if the Edge endpoint URL is https://example.sfo2.cdn.digitaloceanspaces.com, AWS_S3_CUSTOM_DOMAIN should be set to example.sfo2.cdn.digitaloceanspaces.com.

      When you're done, save and close the file.

      When you start Django, it will now serve static content using the CDN for your DigitalOcean Space.

      Testing Spaces Static File Delivery

      We'll now test that Django is correctly serving static files from our DigitalOcean Space.

      Navigate to your Django app directory:
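
      • cd ~/django-polls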

      From here, run collectstatic to collect and upload static files to your DigitalOcean Space:

      • python manage.py collectstatic

      You should see the following output:

      Output

      You have requested to collect static files at the destination
      location as specified in your settings.
      
      This will overwrite existing files!
      Are you sure you want to do this?
      
      Type 'yes' to continue, or 'no' to cancel:

      Type yes and hit ENTER to confirm.

      You should then see output like the following:

      Output

      121 static files copied.

      This confirms that Django successfully uploaded the polls app static files to your Space. You can navigate to your Space using the Cloud Control Panel, and inspect the files in the static directory.

      Next, we'll verify that Django is rewriting the appropriate URLs.

      Start the Gunicorn server:

      • gunicorn --bind 0.0.0.0:8000 mysite.wsgi

      In your web browser, visit your server's domain name or IP address followed by :8000 and /admin:

      http://server_domain_or_IP:8000/admin
      

      You should once again see the Polls app admin authentication window, this time with correct styling.

      Now, use your browser's developer tools to inspect the page contents and reveal the source file storage locations.

      To do this using Google Chrome, right-click the page, and select Inspect.

      You should see the following window:

      Chrome Dev Tools Window

      From here, click on Sources in the toolbar. In the list of source files in the left-hand pane, you should see /admin/login under your Django server's domain, and static/admin under your Space's CDN endpoint. Within static/admin, you should see both the css and fonts directories.

      This confirms that CSS stylesheets and fonts are correctly being served from your Space's CDN.

      When you are finished testing, hit CTRL-C in the terminal window to stop Gunicorn.

      You can disable your active Python virtual environment by entering deactivate:
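
      • deactivate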

      Your prompt should return to normal.

      At this point you've successfully offloaded static files from your Django server, and are serving them from object storage. We can now move on to configuring Gunicorn to start automatically as a system service.

      Step 8 — Creating systemd Socket and Service Files for Gunicorn

      In Step 6 we tested that Gunicorn can interact with our Django application, but we should implement a more robust way of starting and stopping the application server. To accomplish this, we'll make systemd service and socket files.

      The Gunicorn socket will be created at boot and will listen for connections. When a connection occurs, systemd will automatically start the Gunicorn process to handle the connection.

      Start by creating and opening a systemd socket file for Gunicorn with sudo privileges:

      • sudo nano /etc/systemd/system/gunicorn.socket

      Inside, we will create a [Unit] section to describe the socket, a [Socket] section to define the socket location, and an [Install] section to make sure the socket is created at the right time. Add the following code to the file:

      /etc/systemd/system/gunicorn.socket

      [Unit]
      Description=gunicorn socket
      
      [Socket]
      ListenStream=/run/gunicorn.sock
      
      [Install]
      WantedBy=sockets.target
      

      Save and close the file when you are finished.

      Next, create and open a systemd service file for Gunicorn with sudo privileges in your text editor. The service filename should match the socket filename with the exception of the extension:

      • sudo nano /etc/systemd/system/gunicorn.service

      Start with the [Unit] section, which specifies metadata and dependencies. We'll put a description of our service here and tell the init system to only start this after the networking target has been reached. Because our service relies on the socket from the socket file, we need to include a Requires directive to indicate that relationship:

      /etc/systemd/system/gunicorn.service

      [Unit]
      Description=gunicorn daemon
      Requires=gunicorn.socket
      After=network.target
      

      Next, we'll open up the [Service] section. We'll specify the user and group that we want the process to run under. We will give our regular user account ownership of the process since it owns all of the relevant files. We'll give group ownership to the www-data group so that Nginx can communicate easily with Gunicorn.

      We'll then map out the working directory and specify the command to use to start the service. In this case, we'll have to specify the full path to the Gunicorn executable, which is installed within our virtual environment. We will bind the process to the Unix socket we created within the /run directory so that the process can communicate with Nginx. We log all data to standard output so that the journald process can collect the Gunicorn logs. We can also specify any optional Gunicorn tweaks here, like the number of worker processes. Here, we run Gunicorn with 3 worker processes.

      Add the following Service section to the file. Be sure to replace the username listed here with your own username:

      /etc/systemd/system/gunicorn.service

      [Unit]
      Description=gunicorn daemon
      Requires=gunicorn.socket
      After=network.target
      
      [Service]
      User=sammy
      Group=www-data
      WorkingDirectory=/home/sammy/django-polls
      ExecStart=/home/sammy/envs/polls/bin/gunicorn \
                --access-logfile - \
                --workers 3 \
                --bind unix:/run/gunicorn.sock \
                mysite.wsgi:application
      

      Finally, we'll add an [Install] section. This will tell systemd what to link this service to if we enable it to start at boot. We want this service to start when the regular multi-user system is up and running:

      /etc/systemd/system/gunicorn.service

      [Unit]
      Description=gunicorn daemon
      Requires=gunicorn.socket
      After=network.target
      
      [Service]
      User=sammy
      Group=www-data
      WorkingDirectory=/home/sammy/django-polls
      ExecStart=/home/sammy/envs/polls/bin/gunicorn \
                --access-logfile - \
                --workers 3 \
                --bind unix:/run/gunicorn.sock \
                mysite.wsgi:application
      
      [Install]
      WantedBy=multi-user.target
      

      With that, our systemd service file is complete. Save and close it now.

      We can now start and enable the Gunicorn socket. This will create the socket file at /run/gunicorn.sock now and at boot. When a connection is made to that socket, systemd will automatically start the gunicorn.service to handle it:

      • sudo systemctl start gunicorn.socket
      • sudo systemctl enable gunicorn.socket

      We can confirm that the operation was successful by checking for the socket file.

      Checking for the Gunicorn Socket File

      Check the status of the process to find out whether it started successfully:

      • sudo systemctl status gunicorn.socket

      You should see the following output:

      Output

      Failed to dump process list, ignoring: No such file or directory
      ● gunicorn.socket - gunicorn socket
         Loaded: loaded (/etc/systemd/system/gunicorn.socket; enabled; vendor preset: enabled)
         Active: active (running) since Tue 2019-03-05 19:19:16 UTC; 1h 22min ago
         Listen: /run/gunicorn.sock (Stream)
         CGroup: /system.slice/gunicorn.socket
      
      Mar 05 19:19:16 django systemd[1]: Listening on gunicorn socket.

      Next, check for the existence of the gunicorn.sock file within the /run directory:
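
      • file /run/gunicorn.sock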

      Output

      /run/gunicorn.sock: socket

      If the systemctl status command indicated that an error occurred, or if you do not find the gunicorn.sock file in the directory, it's an indication that the Gunicorn socket was not created correctly. Check the Gunicorn socket's logs by typing:

      • sudo journalctl -u gunicorn.socket

      Take another look at your /etc/systemd/system/gunicorn.socket file to fix any problems before continuing.

      Testing Socket Activation

      Currently, if you've only started the gunicorn.socket unit, the gunicorn.service will not be active, since the socket has not yet received any connections. You can check this by typing:

      • sudo systemctl status gunicorn

      Output

      ● gunicorn.service - gunicorn daemon
         Loaded: loaded (/etc/systemd/system/gunicorn.service; disabled; vendor preset: enabled)
         Active: inactive (dead)

      To test the socket activation mechanism, we can send a connection to the socket through curl by typing:

      • curl --unix-socket /run/gunicorn.sock localhost

      You should see the HTML output from your application in the terminal. This indicates that Gunicorn has started and is able to serve your Django application. You can verify that the Gunicorn service is running by typing:

      • sudo systemctl status gunicorn

      Output

      ● gunicorn.service - gunicorn daemon
         Loaded: loaded (/etc/systemd/system/gunicorn.service; disabled; vendor preset: enabled)
         Active: active (running) since Tue 2019-03-05 20:43:56 UTC; 1s ago
       Main PID: 19074 (gunicorn)
          Tasks: 4 (limit: 4915)
         CGroup: /system.slice/gunicorn.service
                 ├─19074 /home/sammy/envs/polls/bin/python3 /home/sammy/envs/polls/bin/gunicorn --access-logfile - --workers 3 --bind unix:/run/gunicorn.sock mysite.wsgi:application
                 ├─19098 /home/sammy/envs/polls/bin/python3 /home/sammy/envs/polls/bin/gunicorn
                 . . .
      
      Mar 05 20:43:56 django systemd[1]: Started gunicorn daemon.
      Mar 05 20:43:56 django gunicorn[19074]: [2019-03-05 20:43:56 +0000] [19074] [INFO] Starting gunicorn 19.9.0
      . . .
      Mar 05 20:44:15 django gunicorn[19074]:  - - [05/Mar/2019:20:44:15 +0000] "GET / HTTP/1.1" 301 0 "-" "curl/7.58.0"

      If the output from curl or the output of systemctl status indicates that a problem occurred, check the logs for additional details:

      • sudo journalctl -u gunicorn

      You can also check your /etc/systemd/system/gunicorn.service file for problems. If you make changes to this file, be sure to reload the daemon to reread the service definition and restart the Gunicorn process:

      • sudo systemctl daemon-reload
      • sudo systemctl restart gunicorn

      Make sure you troubleshoot any issues before continuing on to configuring the Nginx server.

      Step 9 — Configuring Nginx HTTPS and Gunicorn Proxy Passing

      Now that Gunicorn is set up in a more robust fashion, we need to configure Nginx to encrypt connections and hand off traffic to the Gunicorn process.

      If you followed the prerequisites and set up Nginx with Let's Encrypt, you should already have a server block file corresponding to your domain available to you in Nginx's sites-available directory. If not, follow How To Secure Nginx with Let's Encrypt on Ubuntu 18.04 and return to this step.

      Before we edit this example.com server block file, we’ll first remove the default server block file that gets rolled out by default after installing Nginx:

      • sudo rm /etc/nginx/sites-enabled/default

      We'll now modify the example.com server block file to pass traffic to Gunicorn instead of the default index.html page configured in the prerequisite step.

      Open the server block file corresponding to your domain in your editor:

      • sudo nano /etc/nginx/sites-available/example.com

      You should see something like the following:

      /etc/nginx/sites-available/example.com

      server {
      
              root /var/www/example.com/html;
              index index.html index.htm index.nginx-debian.html;
      
              server_name example.com www.example.com;
      
              location / {
                      try_files $uri $uri/ =404;
              }
      
          listen [::]:443 ssl ipv6only=on; # managed by Certbot
          listen 443 ssl; # managed by Certbot
          ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem; # managed by Certbot
          ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem; # managed by Certbot
          include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
          ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
      
      }
      
      server {
          if ($host = example.com) {
              return 301 https://$host$request_uri;
          } # managed by Certbot
      
      
              listen 80;
              listen [::]:80;
      
              server_name example.com www.example.com;
          return 404; # managed by Certbot
      
      
      }
      

      This is a combination of the default server block file created in How to Install Nginx on Ubuntu 18.04 as well as additions appended automatically by Let's Encrypt. We are going to delete the contents of this file and write a new configuration that redirects HTTP traffic to HTTPS, and forwards incoming requests to the Gunicorn socket we created in the previous step.

      If you'd like, you can make a backup of this file using cp. Quit your text editor and create a backup called example.com.old:

      • sudo cp /etc/nginx/sites-available/example.com /etc/nginx/sites-available/example.com.old

      Now, reopen the file and delete its contents. We'll build the new configuration block by block.

      Begin by pasting in the following block, which redirects HTTP requests at port 80 to HTTPS:

      /etc/nginx/sites-available/example.com

      server {
          listen 80 default_server;
          listen [::]:80 default_server;
          server_name _;
          return 301 https://example.com$request_uri;
      }
      

      Here we listen for HTTP IPv4 and IPv6 requests on port 80 and send a 301 response header to redirect the request to HTTPS port 443 using the example.com domain. This will also redirect direct HTTP requests to the server’s IP address.

      After this block, append the following block of config code that handles HTTPS requests for the example.com domain:

      /etc/nginx/sites-available/example.com

      . . . 
      server {
          listen [::]:443 ssl ipv6only=on;
          listen 443 ssl;
          server_name example.com www.example.com;
      
          # Let's Encrypt parameters
          ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
          ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
          include /etc/letsencrypt/options-ssl-nginx.conf;
          ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
      
          location = /favicon.ico { access_log off; log_not_found off; }
      
          location / {
              proxy_pass         http://unix:/run/gunicorn.sock;
              proxy_redirect     off;
      
              proxy_set_header   Host              $http_host;
              proxy_set_header   X-Real-IP         $remote_addr;
              proxy_set_header   X-Forwarded-For   $proxy_add_x_forwarded_for;
              proxy_set_header   X-Forwarded-Proto https;
          }
      }
      

      Here, we first listen on port 443 for requests hitting the example.com and www.example.com domains.

      Next, we provide the same Let's Encrypt configuration included in the default server block file, which specifies the location of the SSL certificate and private key, as well as some additional security parameters.

      The location = /favicon.ico line instructs Nginx to ignore any problems with finding a favicon.

      The last location / block instructs Nginx to hand off requests to the Gunicorn socket configured in Step 8. In addition, it adds headers to inform the upstream Django server that a request has been forwarded and to provide it with various request properties.

      After you've pasted in those two configuration blocks, the final file should look something like this:

      /etc/nginx/sites-available/example.com

      server {
          listen 80 default_server;
          listen [::]:80 default_server;
          server_name _;
          return 301 https://example.com$request_uri;
      }
      server {
              listen [::]:443 ssl ipv6only=on;
              listen 443 ssl;
              server_name example.com www.example.com;
      
              # Let's Encrypt parameters
              ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
              ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
              include /etc/letsencrypt/options-ssl-nginx.conf;
              ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
      
              location = /favicon.ico { access_log off; log_not_found off; }
      
              location / {
                proxy_pass         http://unix:/run/gunicorn.sock;
                proxy_redirect     off;
      
                proxy_set_header   Host              $http_host;
                proxy_set_header   X-Real-IP         $remote_addr;
                proxy_set_header   X-Forwarded-For   $proxy_add_x_forwarded_for;
                proxy_set_header   X-Forwarded-Proto https;
              }
      }
      

      Save and close the file when you are finished.

      Test your Nginx configuration for syntax errors by typing:
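
      • sudo nginx -t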

      If your configuration is error-free, restart Nginx by typing:

      • sudo systemctl restart nginx

      You should now be able to visit your server's domain or IP address to view your application. Your browser should be using a secure HTTPS connection to connect to the Django backend.

      To completely secure our Django project, we need to add a couple of security parameters to its settings.py file. Reopen this file in your editor:

      • nano ~/django-polls/mysite/settings.py

      Scroll to the bottom of the file, and add the following parameters:

      ~/django-polls/mysite/settings.py

      . . .
      
      SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')
      SESSION_COOKIE_SECURE = True
      CSRF_COOKIE_SECURE = True
      SECURE_SSL_REDIRECT = True
      

      These settings tell Django that you have enabled HTTPS on your server, and instruct it to use "secure" cookies. To learn more about these settings, consult the SSL/HTTPS section of Security in Django.

      When you're done, save and close the file.

      Finally, restart Gunicorn:

      • sudo systemctl restart gunicorn

      At this point, you have configured Nginx to redirect HTTP requests and hand off these requests to Gunicorn. HTTPS should now be fully enabled for your Django project and app. If you're running into errors, this discussion on troubleshooting Nginx and Gunicorn may help.

      Warning: As stated in Configuring CORS Headers, be sure to change the Origin from the wildcard * domain to your domain name (https://example.com in this guide) before making your app accessible to end users.

      Conclusion

In this guide, you set up and configured a scalable Django application running on an Ubuntu 18.04 server. This setup can be replicated across multiple servers to create a highly available architecture. Furthermore, this app and its config can be containerized using Docker or another container runtime to ease deployment and scaling. These containers can then be deployed into a container cluster like Kubernetes. In an upcoming tutorial series, we will explore how to containerize and modernize this Django polls app so that it can run in a Kubernetes cluster.

      In addition to static files, you may also wish to offload your Django Media files to object storage. To learn how to do this, consult Using Amazon S3 to Store your Django Site's Static and Media Files. You might also consider compressing static files to further optimize their delivery to end users. To do this, you can use a Django plugin like Django compressor.




      Managed Databases Connection Pools and PostgreSQL Benchmarking Using pgbench


      Introduction

      DigitalOcean Managed Databases allows you to scale your PostgreSQL database using several methods. One such method is a built-in connection pooler that allows you to efficiently handle large numbers of client connections and reduce the CPU and memory footprint of these open connections. By using a connection pool and sharing a fixed set of recyclable connections, you can handle significantly more concurrent client connections, and squeeze extra performance out of your PostgreSQL database.

      In this tutorial we’ll use pgbench, PostgreSQL’s built-in benchmarking tool, to run load tests on a DigitalOcean Managed PostgreSQL Database. We’ll dive in to connection pools, describe how they work, and show how to create one using the Cloud Control panel. Finally, using results from the pgbench tests, we’ll demonstrate how using a connection pool can be an inexpensive method of increasing database throughput.

      Prerequisites

      To complete this tutorial, you’ll need:

      • A DigitalOcean Managed PostgreSQL Database cluster. To learn how to provision and configure a DigitalOcean PostgreSQL cluster, consult the Managed Database product documentation.
      • A client machine with PostgreSQL installed. By default, your PostgreSQL installation will contain the pgbench benchmarking utility and the psql client, both of which we’ll use in this guide. Consult How To Install and Use PostgreSQL on Ubuntu 18.04 to learn how to Install PostgreSQL. If you’re not running Ubuntu on your client machine, you can use the version finder to find the appropriate tutorial.

      Once you have a DigitalOcean PostgreSQL cluster up and running and a client machine with pgbench installed, you’re ready to begin with this guide.

Step 1 — Creating and Initializing the benchmark Database

      Before we create a connection pool for our database, we’ll first create the benchmark database on our PostgreSQL cluster and populate it with some dummy data on which pgbench will run its tests. The pgbench utility repeatedly runs a series of five SQL commands (consisting of SELECT, UPDATE, and INSERT queries) in a transaction, using multiple threads and clients, and calculates a useful performance metric called Transactions per Second (TPS). TPS is a measure of database throughput, counting the number of atomic transactions processed by the database in one second. To learn more about the specific commands executed by pgbench, consult What is the “Transaction” Actually Performed in pgbench? from the official pgbench documentation.
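For reference, the TPC-B-like transaction that pgbench executes by default looks roughly like the following, per the pgbench documentation (:aid, :bid, :tid, and :delta are script variables that pgbench randomizes for each transaction):

BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;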

      Let’s begin by connecting to our PostgreSQL cluster and creating the benchmark database.

      First, retrieve your cluster’s Connection Details by navigating to Databases and locating your PostgreSQL cluster. Click into your cluster. You should see a cluster overview page containing the following Connection Details box:

      PostgreSQL Cluster Connection Details

      From this, we can parse the following config variables:

      • Admin user: doadmin
      • Admin password: your_password
      • Cluster endpoint: dbaas-test-do-user-3587522-0.db.ondigitalocean.com
      • Connection port: 25060
      • Database to connect to: defaultdb
      • SSL Mode: require (use an SSL-encrypted connection for increased security)

      Take note of these parameters, as you’ll need them when using both the psql client and pgbench tool.

      Click on the dropdown above this box and select Connection String. We’ll copy this string and pass it in to psql to connect to this PostgreSQL node.

      Connect to your cluster using psql and the connection string you just copied:

      • psql postgresql://doadmin:your_password@your_cluster_endpoint:25060/defaultdb?sslmode=require

      You should see the following PostgreSQL client prompt, indicating that you’ve connected to your PostgreSQL cluster successfully:

      Output

psql (10.6 (Ubuntu 10.6-0ubuntu0.18.04.1))
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.

defaultdb=>

      From here, create the benchmark database:

      • CREATE DATABASE benchmark;

      You should see the following output:

      Output

      CREATE DATABASE
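If you'd like to confirm that the new database exists before moving on, you can list all databases with psql's \l meta-command:

• \l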

Now, disconnect from the cluster:

• \q

      Before we run the pgbench tests, we need to populate this benchmark database with some tables and dummy data required to run the tests.

      To do this, we’ll run pgbench with the following flags:

      • -h: The PostgreSQL cluster endpoint
      • -p: The PostgreSQL cluster connection port
      • -U: The database username
      • -i: Indicates that we'd like to initialize the benchmark database with benchmarking tables and their dummy data.
• -s: Set a scale factor of 150, which will multiply table sizes by 150. The default scale factor of 1 results in tables of the following sizes:

        table                   # of rows
        ---------------------------------
        pgbench_branches        1
        pgbench_tellers         10
        pgbench_accounts        100000
        pgbench_history         0
        

        Using a scale factor of 150, the pgbench_accounts table will contain 15,000,000 rows.

        Note: To avoid excessive blocked transactions, be sure to set the scale factor to a value at least as large as the number of concurrent clients you intend to test with. In this tutorial we'll test with 150 clients at most, so we set -s to 150 here. To learn more, consult these recommended practices from the official pgbench documentation.
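To recap the math: with -s set to 150, each default table size above is multiplied by 150, so pgbench_branches will hold 150 rows, pgbench_tellers 1,500 rows, and pgbench_accounts 15,000,000 rows.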

      Run the complete pgbench command:

      • pgbench -h your_cluster_endpoint -p 25060 -U doadmin -i -s 150 benchmark

      After running this command, you will be prompted to enter the password for the database user you specified. Enter the password, and hit ENTER.

      You should see the following output:

      Output

dropping old tables...
NOTICE: table "pgbench_accounts" does not exist, skipping
NOTICE: table "pgbench_branches" does not exist, skipping
NOTICE: table "pgbench_history" does not exist, skipping
NOTICE: table "pgbench_tellers" does not exist, skipping
creating tables...
generating data...
100000 of 15000000 tuples (0%) done (elapsed 0.19 s, remaining 27.93 s)
200000 of 15000000 tuples (1%) done (elapsed 0.85 s, remaining 62.62 s)
300000 of 15000000 tuples (2%) done (elapsed 1.21 s, remaining 59.23 s)
400000 of 15000000 tuples (2%) done (elapsed 1.63 s, remaining 59.44 s)
500000 of 15000000 tuples (3%) done (elapsed 2.05 s, remaining 59.51 s)
. . .
14700000 of 15000000 tuples (98%) done (elapsed 70.87 s, remaining 1.45 s)
14800000 of 15000000 tuples (98%) done (elapsed 71.39 s, remaining 0.96 s)
14900000 of 15000000 tuples (99%) done (elapsed 71.91 s, remaining 0.48 s)
15000000 of 15000000 tuples (100%) done (elapsed 72.42 s, remaining 0.00 s)
vacuuming...
creating primary keys...
done.

      At this point, we've created a benchmarking database, populated with the tables and data required to run the pgbench tests. We can now move on to running a baseline test which we'll use to compare performance before and after connection pooling is enabled.

      Step 2 — Running a Baseline pgbench Test

      Before we run our first benchmark, it's worth diving into what we're trying to optimize with connection pools.

      Typically when a client connects to a PostgreSQL database, the main PostgreSQL OS process forks itself into a child process corresponding to this new connection. When there are only a few connections, this rarely presents an issue. However, as clients and connections scale, the CPU and memory overhead of creating and maintaining these connections begins to add up, especially if the application in question does not efficiently use database connections. In addition, the max_connections PostgreSQL setting may limit the number of client connections allowed, resulting in additional connections being refused or dropped.

      A connection pool keeps open a fixed number of database connections, the pool size, which it then uses to distribute and execute client requests. This means that you can accommodate far more simultaneous connections, efficiently deal with idle or stagnant clients, as well as queue up client requests during traffic spikes instead of rejecting them. By recycling connections, you can more efficiently use your machine's resources in an environment where there is a heavy connection volume, and squeeze extra performance out of your database.

      A connection pool can be implemented either on the application side or as middleware between the database and your application. The Managed Databases connection pooler is built on top of pgBouncer, a lightweight, open-source middleware connection pooler for PostgreSQL. Its interface is available via the Cloud Control Panel UI.

      Navigate to Databases in the Control Panel, and then click into your PostgreSQL cluster. From here, click into Connection Pools. Then, click on Create a Connection Pool. You should see the following configuration window:

      Connection Pools Config Window

      Here, you can configure the following fields:

      • Pool Name: A unique name for your connection pool
      • Database: The database for which you'd like to pool connections
      • User: The PostgreSQL user the connection pool will authenticate as
      • Mode: One of Session, Transaction, or Statement. This option controls how long the pool assigns a backend connection to a client.
        • Session: The client holds on to the connection until it explicitly disconnects.
        • Transaction: The client obtains the connection until it completes a transaction, after which the connection is returned to the pool.
        • Statement: The pool aggressively recycles connections after each client statement. In statement mode, multi-statement transactions are not allowed. To learn more, consult the Connection Pools product documentation.
      • Pool Size: The number of connections the connection pool will keep open between itself and the database.

      Before we create a connection pool, we'll run a baseline test to which we can compare database performance with connection pooling.

      In this tutorial, we'll use a 4 GB RAM, 2 vCPU, 80 GB Disk, primary node only Managed Database setup. You can scale the benchmark test parameters in this section according to your PostgreSQL cluster specs.

      DigitalOcean Managed Database clusters have the PostgreSQL max_connections parameter preset to 25 connections per 1 GB RAM. A 4 GB RAM PostgreSQL node therefore has max_connections set to 100. In addition, for all clusters, 3 connections are reserved for maintenance. So for this 4 GB RAM PostgreSQL cluster, 97 connections are available for connection pooling.
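If you'd like to verify this limit yourself, you can connect to the defaultdb database with psql (as in Step 1) and run the following, which should report the configured value (100 on a 4 GB node):

• SHOW max_connections;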

      With this in mind, let's run our first baseline pgbench test.

      Log in to your client machine. We’ll run pgbench, specifying the database endpoint, port and user as usual. In addition, we’ll provide the following flags:

      • -c: The number of concurrent clients or database sessions to simulate. We set this to 50 so as to simulate a number of concurrent connections smaller than the max_connections parameter for our PostgreSQL cluster.
      • -j: The number of worker threads pgbench will use to run the benchmark. If you're using a multi-CPU machine, you can tune this upwards to distribute clients across threads. On a two-core machine, we set this to 2.
      • -P: Display progress and metrics every 60 seconds.
      • -T: Run the benchmark for 600 seconds (10 minutes). To produce consistent, reproducible results, it's important that you run the benchmark for several minutes, or through one checkpoint cycle.

      We’ll also specify that we'd like to run the benchmark against the benchmark database we created and populated earlier.

      Run the following complete pgbench command:

      • pgbench -h your_db_endpoint -p 25060 -U doadmin -c 50 -j 2 -P 60 -T 600 benchmark

      Hit ENTER and then type in the password for the doadmin user to begin running the test. You should see output similar to the following (results will depend on the specs of your PostgreSQL cluster):

      Output

starting vacuum...end.
progress: 60.0 s, 157.4 tps, lat 282.988 ms stddev 40.261
progress: 120.0 s, 176.2 tps, lat 283.726 ms stddev 38.722
progress: 180.0 s, 167.4 tps, lat 298.663 ms stddev 238.124
progress: 240.0 s, 178.9 tps, lat 279.564 ms stddev 43.619
progress: 300.0 s, 178.5 tps, lat 280.016 ms stddev 43.235
progress: 360.0 s, 178.8 tps, lat 279.737 ms stddev 43.307
progress: 420.0 s, 179.3 tps, lat 278.837 ms stddev 43.783
progress: 480.0 s, 178.5 tps, lat 280.203 ms stddev 43.921
progress: 540.0 s, 180.0 tps, lat 277.816 ms stddev 43.742
progress: 600.0 s, 178.5 tps, lat 280.044 ms stddev 43.705
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 150
query mode: simple
number of clients: 50
number of threads: 2
duration: 600 s
number of transactions actually processed: 105256
latency average = 282.039 ms
latency stddev = 84.244 ms
tps = 175.329321 (including connections establishing)
tps = 175.404174 (excluding connections establishing)

      Here, we observed that over a 10 minute run with 50 concurrent sessions, we processed 105,256 transactions with a throughput of roughly 175 transactions per second.
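This throughput figure follows directly from the totals in the summary: 105,256 transactions divided by the 600-second duration gives roughly 175 TPS.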

      Now, let's run the same test, this time using 150 concurrent clients, a value that is higher than max_connections for this database, to synthetically simulate a mass influx of client connections:

      • pgbench -h your_db_endpoint -p 25060 -U doadmin -c 150 -j 2 -P 60 -T 600 benchmark

      You should see output similar to the following:

      Output

starting vacuum...end.
connection to database "pgbench" failed:
FATAL: remaining connection slots are reserved for non-replication superuser connections
progress: 60.0 s, 182.6 tps, lat 280.069 ms stddev 42.009
progress: 120.0 s, 253.8 tps, lat 295.612 ms stddev 237.448
progress: 180.0 s, 271.3 tps, lat 276.411 ms stddev 40.643
progress: 240.0 s, 273.0 tps, lat 274.653 ms stddev 40.942
progress: 300.0 s, 272.8 tps, lat 274.977 ms stddev 41.660
progress: 360.0 s, 250.0 tps, lat 300.033 ms stddev 282.712
progress: 420.0 s, 272.1 tps, lat 275.614 ms stddev 42.901
progress: 480.0 s, 261.1 tps, lat 287.226 ms stddev 112.499
progress: 540.0 s, 272.5 tps, lat 275.309 ms stddev 41.740
progress: 600.0 s, 271.2 tps, lat 276.585 ms stddev 41.221
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 150
query mode: simple
number of clients: 150
number of threads: 2
duration: 600 s
number of transactions actually processed: 154892
latency average = 281.421 ms
latency stddev = 125.929 ms
tps = 257.994941 (including connections establishing)
tps = 258.049251 (excluding connections establishing)

Note the FATAL error, indicating that pgbench ran into the 100-connection limit set by max_connections, resulting in a refused connection. The test was still able to complete, with a TPS of roughly 257.

      At this point we can investigate how a connection pool could potentially improve our database's throughput.

      Step 3 — Creating and Testing a Connection Pool

      In this step we'll create a connection pool and rerun the previous pgbench test to see if we can improve our database's throughput.

      In general, the max_connections setting and connection pool parameters are tuned in tandem to max out the database's load. However, because max_connections is abstracted away from the user in DigitalOcean Managed Databases, our main levers here are the connection pool Mode and Size settings.

      To begin, let's create a connection pool in Transaction mode that keeps open all the available backend connections.

      Navigate to Databases in the Control Panel, and then click into your PostgreSQL cluster. From here, click into Connection Pools. Then, click on Create a Connection Pool.

      In the configuration window that appears, fill in the following values:

      Connection Pool Configuration Values

      Here we name our connection pool test-pool, and use it with the benchmark database. Our database user is doadmin and we set the connection pool to Transaction mode. Recall from earlier that for a managed database cluster with 4GB of RAM, there are 97 available database connections. Accordingly, configure the pool to keep open 97 database connections.

      When you're done, hit Create Pool.

      You should now see this pool in the Control Panel:

      Connection Pool in Control Panel

Grab its URI by clicking Connection Details. It should look something like the following:

      postgres://doadmin:password@pool_endpoint:pool_port/test-pool?sslmode=require
      

      You should notice a different port here, and potentially a different endpoint and database name, corresponding to the pool name test-pool.
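Optionally, before rerunning the benchmark, you can confirm that the pool accepts connections by passing this URI to psql, just as you did with the cluster's connection string in Step 1 (substitute your own values):

• psql postgres://doadmin:password@pool_endpoint:pool_port/test-pool?sslmode=require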

      Now that we've created the test-pool connection pool, we can rerun the pgbench test we ran above.

      Rerun pgbench

      From your client machine, run the following pgbench command (with 150 concurrent clients), making sure to substitute the highlighted values with those in your connection pool URI:

      • pgbench -h pool_endpoint -p pool_port -U doadmin -c 150 -j 2 -P 60 -T 600 test-pool

      Here we once again use 150 concurrent clients, run the test across 2 threads, print progress every 60 seconds, and run the test for 600 seconds. We set the database name to test-pool, the name of the connection pool.

      Once the test completes, you should see output similar to the following (note that these results will vary depending on the specs of your database node):

      Output

starting vacuum...end.
progress: 60.0 s, 240.0 tps, lat 425.251 ms stddev 59.773
progress: 120.0 s, 350.0 tps, lat 428.647 ms stddev 57.084
progress: 180.0 s, 340.3 tps, lat 440.680 ms stddev 313.631
progress: 240.0 s, 364.9 tps, lat 411.083 ms stddev 61.106
progress: 300.0 s, 366.5 tps, lat 409.367 ms stddev 60.165
progress: 360.0 s, 362.5 tps, lat 413.750 ms stddev 59.005
progress: 420.0 s, 359.5 tps, lat 417.292 ms stddev 60.395
progress: 480.0 s, 363.8 tps, lat 412.130 ms stddev 60.361
progress: 540.0 s, 351.6 tps, lat 426.661 ms stddev 62.960
progress: 600.0 s, 344.5 tps, lat 435.516 ms stddev 65.182
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 150
query mode: simple
number of clients: 150
number of threads: 2
duration: 600 s
number of transactions actually processed: 206768
latency average = 421.719 ms
latency stddev = 114.676 ms
tps = 344.240797 (including connections establishing)
tps = 344.385646 (excluding connections establishing)

Notice here that we were able to increase our database's throughput from 257 TPS to 344 TPS with 150 concurrent connections (an increase of roughly 34%), and did not run up against the max_connections limit we previously hit without a connection pool. By placing a connection pool in front of the database, we can avoid dropped connections and significantly increase database throughput in an environment with a large number of simultaneous connections.

      If you run this same test, but with a -c value of 50 (specifying a smaller number of clients), the gains from using a connection pool become much less evident:

      Output

starting vacuum...end.
progress: 60.0 s, 154.0 tps, lat 290.592 ms stddev 35.530
progress: 120.0 s, 162.7 tps, lat 307.168 ms stddev 241.003
progress: 180.0 s, 172.0 tps, lat 290.678 ms stddev 36.225
progress: 240.0 s, 172.4 tps, lat 290.169 ms stddev 37.603
progress: 300.0 s, 177.8 tps, lat 281.214 ms stddev 35.365
progress: 360.0 s, 177.7 tps, lat 281.402 ms stddev 35.227
progress: 420.0 s, 174.5 tps, lat 286.404 ms stddev 34.797
progress: 480.0 s, 176.1 tps, lat 284.107 ms stddev 36.540
progress: 540.0 s, 173.1 tps, lat 288.771 ms stddev 38.059
progress: 600.0 s, 174.5 tps, lat 286.508 ms stddev 59.941
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 150
query mode: simple
number of clients: 50
number of threads: 2
duration: 600 s
number of transactions actually processed: 102938
latency average = 288.509 ms
latency stddev = 83.503 ms
tps = 171.482966 (including connections establishing)
tps = 171.553434 (excluding connections establishing)

Here we see that with only 50 concurrent clients, the connection pool did not increase throughput; in fact, it dropped slightly, from 175 TPS to 171 TPS.

      Although in this guide we use pgbench with its built-in benchmark data set, the best test for determining whether or not to use a connection pool is a benchmark load that accurately represents production load on your database, against production data. Creating custom benchmarking scripts and data is beyond the scope of this guide, but to learn more, consult the official pgbench documentation.

      Note: The pool size setting is highly workload-specific. In this guide, we configured the connection pool to use all the available backend database connections. This was because throughout our benchmark, the database rarely reached full utilization (you can monitor database load from the Metrics tab in the Cloud Control Panel). Depending on your database's load, this may not be the optimal setting. If you notice that your database is constantly fully saturated, shrinking the connection pool may increase throughput and improve performance by queuing additional requests instead of trying to execute them all at the same time on an already loaded server.

      Conclusion

      DigitalOcean Managed Databases connection pooling is a powerful feature that can help you quickly squeeze extra performance out of your database. Along with other techniques like replication, caching, and sharding, connection pooling can help you scale your database layer to process an even greater volume of requests.

      In this guide we focused on a simplistic and synthetic testing scenario using PostgreSQL's built-in pgbench benchmarking tool and its default benchmark test. In any production scenario, you should run benchmarks against actual production data while simulating production load. This will allow you to tune your database for your particular usage pattern.

Along with pgbench, other tools exist for benchmarking and load testing your database. One such tool, developed by Percona, is sysbench-tpcc. Another is Apache's JMeter, which can load test databases as well as web applications.

      To learn more about DigitalOcean Managed Databases, consult the Managed Databases product documentation. To learn more about sharding, another useful scaling technique, consult Understanding Database Sharding.
