      How to Set Up a Scalable Django App with DigitalOcean Managed Databases and Spaces


      Introduction

      Django is a powerful web framework that can help you get your Python application or website off the ground quickly. It includes several convenient features like an object-relational mapper, a Python API, and a customizable administrative interface for your application. It also includes a caching framework and encourages clean app design through its URL Dispatcher and Template system.

      Out of the box, Django includes a minimal web server for testing and local development, but it should be paired with a more robust serving infrastructure for production use cases. Django is often rolled out with an Nginx web server to handle static file requests and HTTPS redirection, and a Gunicorn WSGI server to serve the app.

      In this guide, we will augment this setup by offloading static files like JavaScript and CSS stylesheets to DigitalOcean Spaces, and optionally delivering them using a Content Delivery Network, or CDN, which stores these files closer to end users to reduce transfer times. We’ll also use a DigitalOcean Managed PostgreSQL database as our data store to simplify the data layer and avoid having to manually configure a scalable PostgreSQL database.

      Prerequisites

      Before you begin with this guide, you should have the following available to you:

      • An Ubuntu 18.04 server set up according to the initial server setup guide, including a sudo non-root user and a UFW firewall
      • Nginx installed and secured with Let's Encrypt, with a server block configured for your domain
      • A DigitalOcean Managed PostgreSQL database cluster, along with its Connection Parameters
      • A DigitalOcean Space and a set of Access Keys for that Space

      Step 1 — Installing Packages from the Ubuntu Repositories

      To begin, we’ll download and install all of the items we need from the Ubuntu repositories. We’ll use the Python package manager pip to install additional components a bit later.

      We need to first update the local apt package index and then download and install the packages.

      In this guide, we’ll use Django with Python 3. To install the necessary libraries, log in to your server and type:

      • sudo apt update
      • sudo apt install python3-pip python3-dev libpq-dev curl postgresql-client

      This will install pip, the Python development files needed to build Gunicorn, the libpq header files needed to build the Psycopg PostgreSQL Python adapter, and the PostgreSQL command-line client.

      Hit Y and then ENTER when prompted to begin downloading and installing the packages.

      Next, we’ll configure the database to work with our Django app.

      Step 2 — Creating the PostgreSQL Database and User

      We’ll now create a database and database user for our Django application.

      To begin, grab the Connection Parameters for your cluster by navigating to Databases from the Cloud Control Panel, and clicking into your database. You should see a Connection Details box containing some parameters for your cluster. Note these down.

      Back on the command line, log in to your cluster using these credentials and the psql PostgreSQL client we just installed:

      • psql -U doadmin -h host -p port -d database

      When prompted, enter the password displayed alongside the doadmin Postgres username, and hit ENTER.

      You will be given a PostgreSQL prompt from which you can manage the database.

      First, create a database for your project called polls:

      • CREATE DATABASE polls;

      Note: Every Postgres statement must end with a semicolon, so make sure that your command ends with one if you are experiencing issues.

      We can now switch to the polls database:

      • \c polls

      Next, create a database user for the project. Make sure to select a secure password:

      • CREATE USER myprojectuser WITH PASSWORD 'password';

      We'll now modify a few of the connection parameters for the user we just created. This will speed up database operations so that the correct values do not have to be queried and set each time a connection is established.

      We are setting the default encoding to UTF-8, which Django expects. We are also setting the default transaction isolation scheme to "read committed", which blocks reads from uncommitted transactions. Lastly, we are setting the timezone. By default, our Django projects will be set to use UTC. These are all recommendations from the Django project itself.

      Enter the following commands at the PostgreSQL prompt:

      • ALTER ROLE myprojectuser SET client_encoding TO 'utf8';
      • ALTER ROLE myprojectuser SET default_transaction_isolation TO 'read committed';
      • ALTER ROLE myprojectuser SET timezone TO 'UTC';

      Now we can give our new user access to administer our new database:

      • GRANT ALL PRIVILEGES ON DATABASE polls TO myprojectuser;

      When you are finished, exit out of the PostgreSQL prompt by typing:

      • \q

      Your Django app is now ready to connect to and manage this database.

      In the next step, we'll install virtualenv and create a Python virtual environment for our Django project.

      Step 3 — Creating a Python Virtual Environment for your Project

      Now that we've set up our database to work with our application, we'll create a Python virtual environment that will isolate this project's dependencies from the system's global Python installation.

      To do this, we first need access to the virtualenv command. We can install this with pip.

      Upgrade pip and install the package by typing:

      • sudo -H pip3 install --upgrade pip
      • sudo -H pip3 install virtualenv

      With virtualenv installed, we can create a directory to store our Python virtual environments and make one to use with the Django polls app.

      Create a directory called envs and navigate into it:

      • mkdir envs
      • cd envs

      Within this directory, create a Python virtual environment called polls by typing:

      • virtualenv polls

      This will create a directory called polls within the envs directory. Inside, it will install a local version of Python and a local version of pip. We can use this to install and configure an isolated Python environment for our project.

      Before we install our project's Python requirements, we need to activate the virtual environment. You can do that by typing:

      • source polls/bin/activate

      Your prompt should change to indicate that you are now operating within a Python virtual environment. It will look something like this: (polls)user@host:~/envs$.

      With your virtual environment active, install Django, Gunicorn, and the psycopg2 PostgreSQL adaptor with the local instance of pip:

      Note: When the virtual environment is activated (when your prompt has (polls) preceding it), use pip instead of pip3, even if you are using Python 3. The virtual environment's copy of the tool is always named pip, regardless of the Python version.

      • pip install django gunicorn psycopg2-binary

      You should now have all of the software you need to run the Django polls app. In the next step, we'll create a Django project and install this app.

      Step 4 — Creating the Polls Django Application

      We can now set up our sample application. In this tutorial, we'll use the Polls demo application from the Django documentation. It consists of a public site that allows users to view polls and vote in them, and an administrative control panel that allows the admin to modify, create, and delete polls.

      In this guide, we'll skip through the tutorial steps, and simply clone the final application from the DigitalOcean Community django-polls repo.

      If you'd like to complete the steps manually, create a directory called django-polls in your home directory and navigate into it:

      • cd
      • mkdir django-polls
      • cd django-polls

      From there, you can follow the Writing your first Django app tutorial from the official Django documentation. When you're done, skip to Step 5.

      If you just want to clone the finished app, navigate to your home directory and use git to clone the django-polls repo:

      • cd
      • git clone https://github.com/do-community/django-polls.git

      cd into it, and list the directory contents:

      • cd django-polls
      • ls

      You should see the following objects:

      Output

      LICENSE README.md manage.py mysite polls templates

      manage.py is the main command-line utility used to manipulate the app. polls contains the polls app code, and mysite contains project-scope code and settings. templates contains custom template files for the administrative interface. To learn more about the project structure and files, consult Creating a Project from the official Django documentation.

      Before running the app, we need to adjust its default settings and connect it to our database.

      Step 5 — Adjusting the App Settings

      In this step, we'll modify the Django project's default configuration to increase security, connect Django to our database, and collect static files into a local directory.

      Begin by opening the settings file in your text editor:

      • nano ~/django-polls/mysite/settings.py

      Start by locating the ALLOWED_HOSTS directive. This defines a list of the addresses or domain names that you want to use to connect to the Django instance. An incoming request with a Host header not in this list will raise an exception. Django requires that you set this to prevent a certain class of security vulnerability.

      In the square brackets, list the IP addresses or domain names associated with your Django server. Each item should be listed in quotations with entries separated by a comma. Your list will also include localhost, since you will be proxying connections through a local Nginx instance. If you wish to include requests for an entire domain and any subdomains, prepend a period to the beginning of the entry.

      In the snippet below, there are a few commented out examples that demonstrate what these entries should look like:

      ~/django-polls/mysite/settings.py

      . . .
      
      # The simplest case: just add the domain name(s) and IP addresses of your Django server
      # ALLOWED_HOSTS = [ 'example.com', '203.0.113.5']
      # To respond to 'example.com' and any subdomains, start the domain with a dot
      # ALLOWED_HOSTS = ['.example.com', '203.0.113.5']
      ALLOWED_HOSTS = ['your_server_domain_or_IP', 'second_domain_or_IP', . . ., 'localhost']
      
      . . . 
      

      Next, find the section of the file that configures database access. It will start with DATABASES. The configuration in the file is for a SQLite database. We already created a PostgreSQL database for our project, so we need to adjust these settings.

      We will tell Django to use the psycopg2 database adaptor we installed with pip, instead of the default SQLite engine. We’ll also reuse the Connection Parameters referenced in Step 2. You can always find this information from the Managed Databases section of the DigitalOcean Cloud Control Panel.

      Update the file with your database settings: the database name (polls), the database username, the database user's password, and the database host and port. Be sure to replace the database-specific values with your own information:

      ~/django-polls/mysite/settings.py

      . . .
      
      DATABASES = {
          'default': {
              'ENGINE': 'django.db.backends.postgresql_psycopg2',
              'NAME': 'polls',
              'USER': 'myprojectuser',
              'PASSWORD': 'password',
              'HOST': 'managed_db_host',
              'PORT': 'managed_db_port',
          }
      }
      
      . . .
      

      Next, move down to the bottom of the file and add a setting indicating where the static files should be placed. This is necessary so that Nginx can handle requests for these items. The following line tells Django to place them in a directory called static in the base project directory:

      ~/django-polls/mysite/settings.py

      . . .
      
      STATIC_URL = '/static/'
      STATIC_ROOT = os.path.join(BASE_DIR, 'static/')
      

      Save and close the file when you are finished.

      At this point, you've configured the Django project's database, security, and static files settings. If you followed the polls tutorial from the start and did not clone the GitHub repo, you can move on to Step 6. If you cloned the GitHub repo, there remains one additional step.

      The Django settings file contains a SECRET_KEY variable that is used to create hashes for various Django objects. It's important that it is set to a unique, unpredictable value. The SECRET_KEY variable has been scrubbed from the GitHub repository, so we'll create a new one using a function built into the django Python package called get_random_secret_key(). From the command line, open up a Python interpreter:

      • python

      You should see the following output and prompt:

      Output

      Python 3.6.7 (default, Oct 22 2018, 11:32:17)
      [GCC 8.2.0] on linux
      Type "help", "copyright", "credits" or "license" for more information.
      >>>

      Import the get_random_secret_key function from the Django package, then call the function:

      • from django.core.management.utils import get_random_secret_key
      • get_random_secret_key()

      Copy the resulting key to your clipboard.

      Exit the Python interpreter by pressing CTRL+D.

      Next, open up the settings file in your text editor once again:

      • nano ~/django-polls/mysite/settings.py
      

      Locate the SECRET_KEY variable and paste in the key you just generated:

      ~/django-polls/mysite/settings.py

      . . .
      
      # SECURITY WARNING: keep the secret key used in production secret!
      SECRET_KEY = 'your_secret_key_here'
      
      . . .
      

      Save and close the file.
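
      If you'd rather not hardcode the key in the settings file at all, a common alternative is to read it from an environment variable. Below is a minimal sketch of that pattern; the variable name DJANGO_SECRET_KEY is our choice for illustration and not part of this tutorial's setup:

      ~/django-polls/mysite/settings.py (sketch)

      import os

      # Read the key from the environment; raising a KeyError on a missing
      # variable is safer than silently falling back to a predictable default.
      # DJANGO_SECRET_KEY is a hypothetical variable name chosen for this example.
      SECRET_KEY = os.environ['DJANGO_SECRET_KEY']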

      We'll now test the app locally using the Django development server to ensure that everything's been correctly configured.

      Step 6 — Testing the App

      Before we run the Django development server, we need to use the manage.py utility to create the database schema and collect static files into the STATIC_ROOT directory.

      Navigate into the project's base directory, and create the initial database schema in our PostgreSQL database using the makemigrations and migrate commands:

      • cd django-polls
      • ./manage.py makemigrations
      • ./manage.py migrate

      makemigrations will create the migrations, or database schema changes, based on the changes made to Django models. migrate will apply these migrations to the database schema. To learn more about migrations in Django, consult Migrations from the official Django documentation.
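
      For context, these migrations are generated from the model definitions in the polls app. They look roughly like the following sketch, adapted from the Django tutorial this repo implements (the cloned repo's models may differ in detail):

      polls/models.py (sketch)

      from django.db import models

      class Question(models.Model):
          # The poll question and the date it was published
          question_text = models.CharField(max_length=200)
          pub_date = models.DateTimeField('date published')

      class Choice(models.Model):
          # Each choice belongs to one question and tracks its vote count
          question = models.ForeignKey(Question, on_delete=models.CASCADE)
          choice_text = models.CharField(max_length=200)
          votes = models.IntegerField(default=0)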

      Create an administrative user for the project by typing:

      • ./manage.py createsuperuser

      You will have to select a username, provide an email address, and choose and confirm a password.

      We can collect all of the static content into the directory location we configured by typing:

      • ./manage.py collectstatic

      The static files will then be placed in a directory called static within your project directory.

      If you followed the initial server setup guide, you should have a UFW firewall protecting your server. In order to test the development server, we'll have to allow access to the port we'll be using.

      Create an exception for port 8000 by typing:

      • sudo ufw allow 8000

      Testing the App Using the Django Development Server

      Finally, you can test your project by starting the Django development server with this command:

      • ./manage.py runserver 0.0.0.0:8000

      In your web browser, visit your server's domain name or IP address followed by :8000 and the polls path:

      • http://server_domain_or_IP:8000/polls

      You should see the Polls app interface:

      Polls App Interface

      To check out the admin interface, visit your server's domain name or IP address followed by :8000 and the administrative interface's path:

      • http://server_domain_or_IP:8000/admin

      You should see the Polls app admin authentication window:

      Polls Admin Auth Page

      Enter the administrative username and password you created with the createsuperuser command.

      After authenticating, you can access the Polls app's administrative interface:

      Polls Admin Main Interface

      When you are finished exploring, hit CTRL-C in the terminal window to shut down the development server.

      Testing the App Using Gunicorn

      The last thing we want to do before offloading static files is test Gunicorn to make sure that it can serve the application. We can do this by entering our project directory and using gunicorn to load the project's WSGI module:

      • gunicorn --bind 0.0.0.0:8000 mysite.wsgi

      This will start Gunicorn on the same interface that the Django development server was running on. You can go back and test the app again.

      Note: The admin interface will not have any of the styling applied since Gunicorn does not know how to find the static CSS content responsible for this.

      We passed Gunicorn a module by specifying the relative directory path to Django's wsgi.py file, which is the entry point to our application. This file defines a function called application, which Gunicorn uses to communicate with the application. To learn more, consult the WSGI specification.
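
      For context, the wsgi.py module Django generates for a new project boils down to the following; this is a simplified sketch of the generated file rather than a copy of the repo's exact contents:

      mysite/wsgi.py (simplified sketch)

      import os

      from django.core.wsgi import get_wsgi_application

      os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'mysite.settings')

      # Gunicorn imports this module and invokes `application` for each request.
      application = get_wsgi_application()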

      When you are finished testing, hit CTRL-C in the terminal window to stop Gunicorn.

      We'll now offload the application’s static files to DigitalOcean Spaces.

      Step 7 — Offloading Static Files to DigitalOcean Spaces

      At this point, Gunicorn can serve our Django application but not its static files. Usually we'd configure Nginx to serve these files, but in this tutorial we'll offload them to DigitalOcean Spaces using the django-storages plugin. This allows you to easily scale Django by centralizing its static content and freeing up server resources. In addition, you can deliver this static content using the DigitalOcean Spaces CDN.

      For a full guide on offloading Django static files to Object storage, consult How to Set Up Object Storage with Django.

      Installing and Configuring django-storages

      We'll begin by installing the django-storages Python package. The django-storages package provides Django with the S3Boto3Storage storage backend that uses the boto3 library to upload files to any S3-compatible object storage service.

      To start, install the django-storages and boto3 Python packages using pip:

      • pip install django-storages boto3
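
      Optionally, you can sanity-check your Spaces credentials with a short boto3 script before wiring them into Django. This is a sketch; the endpoint, keys, and filename below are placeholders, not values from this tutorial:

      check_space.py (optional sketch)

      import boto3

      session = boto3.session.Session()
      client = session.client(
          's3',
          region_name='nyc3',  # your Space's region
          endpoint_url='https://nyc3.digitaloceanspaces.com',
          aws_access_key_id='your_spaces_access_key',
          aws_secret_access_key='your_spaces_secret_key',
      )

      # List the Spaces (buckets) visible to these credentials.
      for bucket in client.list_buckets()['Buckets']:
          print(bucket['Name'])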

      Next, open your app's Django settings file again:

      • nano ~/django-polls/mysite/settings.py

      Navigate down to the INSTALLED_APPS section of the file, and append storages to the list of installed apps:

      ~/django-polls/mysite/settings.py

      . . .
      
      INSTALLED_APPS = [
          . . .
          'django.contrib.staticfiles',
          'storages',
      ]
      
      . . .
      

      Scroll further down the file to the STATIC_URL we previously modified. We'll now overwrite these values and append new S3Boto3Storage backend parameters. Delete the code you entered earlier, and add the following blocks, which include access and location information for your Space. Remember to replace the highlighted values here with your own information:

      ~/django-polls/mysite/settings.py

      . . .
      
      # Static files (CSS, JavaScript, Images)
      # https://docs.djangoproject.com/en/2.1/howto/static-files/
      
      AWS_ACCESS_KEY_ID = 'your_spaces_access_key'
      AWS_SECRET_ACCESS_KEY = 'your_spaces_secret_key'
      
      AWS_STORAGE_BUCKET_NAME = 'your_space_name'
      AWS_S3_ENDPOINT_URL = 'spaces_endpoint_URL'
      AWS_S3_OBJECT_PARAMETERS = {
          'CacheControl': 'max-age=86400',
      }
      AWS_LOCATION = 'static'
      AWS_DEFAULT_ACL = 'public-read'
      
      STATICFILES_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
      
      STATIC_URL = '{}/{}/'.format(AWS_S3_ENDPOINT_URL, AWS_LOCATION)
      STATIC_ROOT = 'static/'
      

      We define the following configuration items:

      • AWS_ACCESS_KEY_ID: The Access Key ID for the Space, which you created in the tutorial prerequisites. If you didn’t create a set of Access Keys, consult Sharing Access to Spaces with Access Keys.
      • AWS_SECRET_ACCESS_KEY: The secret key for the DigitalOcean Space.
      • AWS_STORAGE_BUCKET_NAME: Your DigitalOcean Space name.
      • AWS_S3_ENDPOINT_URL: The endpoint URL used to access the object storage service. For DigitalOcean, this will be something like https://nyc3.digitaloceanspaces.com depending on the Space region.
      • AWS_S3_OBJECT_PARAMETERS: Sets the cache control headers on static files.
      • AWS_LOCATION: Defines a directory within the object storage bucket where all static files will be placed.
      • AWS_DEFAULT_ACL: Defines the access control list (ACL) for the static files. Setting it to public-read ensures that the files are publicly accessible to end users.
      • STATICFILES_STORAGE: Sets the storage backend Django will use to offload static files. This backend should work with any S3-compatible backend, including DigitalOcean Spaces.
      • STATIC_URL: Specifies the base URL that Django should use when generating URLs for static files. Here, we combine the endpoint URL and the static files subdirectory to construct a base URL for static files.
      • STATIC_ROOT: Specifies where to collect static files locally before copying them to object storage.

      From now on, when you run collectstatic, Django will upload your app's static files to the Space. When you start Django, it'll begin serving static assets like CSS and JavaScript from this Space.

      Before we test that this is all functioning correctly, we need to configure Cross-Origin Resource Sharing (CORS) headers for our Spaces files or access to certain static assets may be denied by your web browser.

      CORS headers tell the web browser that an application running at one domain can access scripts or resources located at another. In this case, we need to allow cross-origin resource sharing for our Django server's domain so that requests for static files in the Space are not denied by the web browser.

      To begin, navigate to the Settings page of your Space using the Cloud Control Panel:

      Screenshot of the Settings tab

      In the CORS Configurations section, click Add.

      CORS advanced settings

      Here, under Origin, enter the wildcard origin, *.

      Warning: When you deploy your app into production, be sure to change this value to your exact origin domain (including the http:// or https:// protocol). Leaving this as the wildcard origin is insecure, and we do this here only for testing purposes since setting the origin to http://example.com:8000 (using a nonstandard port) is currently not supported.

      Under Allowed Methods, select GET.

      Click on Add Header, and in the text box that appears, enter Access-Control-Allow-Origin.

      Set Access Control Max Age to 600 so that the header we just created expires every 10 minutes.

      Click Save Options.

      From now on, objects in your Space will contain the appropriate Access-Control-Allow-Origin response headers, allowing modern secure web browsers to fetch these files across domains.
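
      Once you've uploaded static files to the Space (which we'll do with collectstatic below), you can spot-check the CORS behavior with a short script. This sketch uses the Python requests library; the object URL is a placeholder for a real file in your Space:

      cors_check.py (sketch)

      import requests

      url = 'https://your_space_name.nyc3.digitaloceanspaces.com/static/admin/css/base.css'
      resp = requests.get(url, headers={'Origin': 'http://example.com:8000'})

      # With the wildcard origin configured above, this should print '*'.
      print(resp.status_code, resp.headers.get('Access-Control-Allow-Origin'))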

      At this point, you can optionally enable the CDN for your Space, which will serve these static files from a distributed network of edge servers. To learn more about CDNs, consult Using a CDN to Speed Up Static Content Delivery. This can significantly improve web performance. If you don't want to enable the CDN for your Space, skip ahead to the next section, Testing Spaces Static File Delivery.

      Enabling CDN (Optional)

      To activate static file delivery via the DigitalOcean Spaces CDN, begin by enabling the CDN for your DigitalOcean Space. To learn how to do this, consult How to Enable the Spaces CDN from the DigitalOcean product documentation.

      Once you've enabled the CDN for your Space, navigate to it using the Cloud Control Panel. You should see a new Endpoints link under your Space name:

      List of Space Endpoints

      These endpoints should contain your Space name.

      Notice the addition of a new Edge endpoint. This endpoint routes requests for Spaces objects through the CDN, serving them from the edge cache as much as possible. Note down this Edge endpoint, as we'll use it to configure the django-storages plugin.

      Next, edit your app's Django settings file once again:

      • nano ~/django-polls/mysite/settings.py

      Navigate down to the Static Files section we recently modified. Add the AWS_S3_CUSTOM_DOMAIN parameter to configure the django-storages plugin CDN endpoint and update the STATIC_URL parameter to use this new CDN endpoint:

      ~/django-polls/mysite/settings.py

      . . .
      
      # Static files (CSS, JavaScript, Images)
      # https://docs.djangoproject.com/en/2.1/howto/static-files/
      
      # Moving static assets to DigitalOcean Spaces as per:
      # https://www.digitalocean.com/community/tutorials/how-to-set-up-object-storage-with-django
      AWS_ACCESS_KEY_ID = 'your_spaces_access_key'
      AWS_SECRET_ACCESS_KEY = 'your_spaces_secret_key'
      
      AWS_STORAGE_BUCKET_NAME = 'your_space_name'
      AWS_S3_ENDPOINT_URL = 'spaces_endpoint_URL'
      AWS_S3_CUSTOM_DOMAIN = 'spaces_edge_endpoint_URL'
      AWS_S3_OBJECT_PARAMETERS = {
          'CacheControl': 'max-age=86400',
      }
      AWS_LOCATION = 'static'
      AWS_DEFAULT_ACL = 'public-read'
      
      STATICFILES_STORAGE = 'storages.backends.s3boto3.S3Boto3Storage'
      
      STATIC_URL = '{}/{}/'.format(AWS_S3_CUSTOM_DOMAIN, AWS_LOCATION)
      STATIC_ROOT = 'static/'
      

      Here, replace the spaces_edge_endpoint_URL with the Edge endpoint you just noted down, truncating the https:// prefix. For example, if the Edge endpoint URL is https://example.sfo2.cdn.digitaloceanspaces.com, AWS_S3_CUSTOM_DOMAIN should be set to example.sfo2.cdn.digitaloceanspaces.com.

      When you're done, save and close the file.

      When you start Django, it will now serve static content using the CDN for your DigitalOcean Space.

      Testing Spaces Static File Delivery

      We'll now test that Django is correctly serving static files from our DigitalOcean Space.

      Navigate to your Django app directory:

      • cd ~/django-polls

      From here, run collectstatic to collect and upload static files to your DigitalOcean Space:

      • python manage.py collectstatic

      You should see the following output:

      Output

      You have requested to collect static files at the destination
      location as specified in your settings.

      This will overwrite existing files!
      Are you sure you want to do this?

      Type 'yes' to continue, or 'no' to cancel:

      Type yes and hit ENTER to confirm.

      You should then see output like the following:

      Output

      121 static files copied.

      This confirms that Django successfully uploaded the polls app static files to your Space. You can navigate to your Space using the Cloud Control Panel, and inspect the files in the static directory.

      Next, we'll verify that Django is rewriting the appropriate URLs.

      Start the Gunicorn server:

      • gunicorn --bind 0.0.0.0:8000 mysite.wsgi

      In your web browser, visit your server's domain name or IP address followed by :8000 and /admin:

      • http://server_domain_or_IP:8000/admin
      

      You should once again see the Polls app admin authentication window, this time with correct styling.

      Now, use your browser's developer tools to inspect the page contents and reveal the source file storage locations.

      To do this using Google Chrome, right-click the page, and select Inspect.

      You should see the following window:

      Chrome Dev Tools Window

      From here, click on Sources in the toolbar. In the list of source files in the left-hand pane, you should see /admin/login under your Django server's domain, and static/admin under your Space's CDN endpoint. Within static/admin, you should see both the css and fonts directories.

      This confirms that CSS stylesheets and fonts are correctly being served from your Space's CDN.

      When you are finished testing, hit CTRL-C in the terminal window to stop Gunicorn.
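
      As one more optional check, you can confirm the URL Django now generates for a static asset from the Django shell (run python manage.py shell from ~/django-polls with the virtual environment active). The asset path here is just an example:

      from django.contrib.staticfiles.storage import staticfiles_storage

      # Should print a URL under your Space or CDN endpoint rather than /static/.
      print(staticfiles_storage.url('admin/css/base.css'))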

      You can disable your active Python virtual environment by entering deactivate:

      • deactivate

      Your prompt should return to normal.

      At this point you've successfully offloaded static files from your Django server, and are serving them from object storage. We can now move on to configuring Gunicorn to start automatically as a system service.

      Step 8 — Creating systemd Socket and Service Files for Gunicorn

      In Step 6 we tested that Gunicorn can interact with our Django application, but we should implement a more robust way of starting and stopping the application server. To accomplish this, we'll make systemd service and socket files.

      The Gunicorn socket will be created at boot and will listen for connections. When a connection occurs, systemd will automatically start the Gunicorn process to handle the connection.

      Start by creating and opening a systemd socket file for Gunicorn with sudo privileges:

      • sudo nano /etc/systemd/system/gunicorn.socket

      Inside, we will create a [Unit] section to describe the socket, a [Socket] section to define the socket location, and an [Install] section to make sure the socket is created at the right time. Add the following code to the file:

      /etc/systemd/system/gunicorn.socket

      [Unit]
      Description=gunicorn socket
      
      [Socket]
      ListenStream=/run/gunicorn.sock
      
      [Install]
      WantedBy=sockets.target
      

      Save and close the file when you are finished.

      Next, create and open a systemd service file for Gunicorn with sudo privileges in your text editor. The service filename should match the socket filename with the exception of the extension:

      • sudo nano /etc/systemd/system/gunicorn.service

      Start with the [Unit] section, which specifies metadata and dependencies. We'll put a description of our service here and tell the init system to only start this after the networking target has been reached. Because our service relies on the socket from the socket file, we need to include a Requires directive to indicate that relationship:

      /etc/systemd/system/gunicorn.service

      [Unit]
      Description=gunicorn daemon
      Requires=gunicorn.socket
      After=network.target
      

      Next, we'll open up the [Service] section. We'll specify the user and group that we want the process to run under. We will give our regular user account ownership of the process since it owns all of the relevant files. We'll give group ownership to the www-data group so that Nginx can communicate easily with Gunicorn.

      We'll then map out the working directory and specify the command to use to start the service. In this case, we'll have to specify the full path to the Gunicorn executable, which is installed within our virtual environment. We will bind the process to the Unix socket we created within the /run directory so that the process can communicate with Nginx. We log all data to standard output so that the journald process can collect the Gunicorn logs. We can also specify any optional Gunicorn tweaks here, like the number of worker processes. Here, we run Gunicorn with 3 worker processes.

      Add the following Service section to the file. Be sure to replace the username listed here with your own username:

      /etc/systemd/system/gunicorn.service

      [Unit]
      Description=gunicorn daemon
      Requires=gunicorn.socket
      After=network.target
      
      [Service]
      User=sammy
      Group=www-data
      WorkingDirectory=/home/sammy/django-polls
      ExecStart=/home/sammy/envs/polls/bin/gunicorn \
                --access-logfile - \
                --workers 3 \
                --bind unix:/run/gunicorn.sock \
                mysite.wsgi:application
      

      Finally, we'll add an [Install] section. This will tell systemd what to link this service to if we enable it to start at boot. We want this service to start when the regular multi-user system is up and running:

      /etc/systemd/system/gunicorn.service

      [Unit]
      Description=gunicorn daemon
      Requires=gunicorn.socket
      After=network.target
      
      [Service]
      User=sammy
      Group=www-data
      WorkingDirectory=/home/sammy/django-polls
      ExecStart=/home/sammy/envs/polls/bin/gunicorn \
                --access-logfile - \
                --workers 3 \
                --bind unix:/run/gunicorn.sock \
                mysite.wsgi:application
      
      [Install]
      WantedBy=multi-user.target
      

      With that, our systemd service file is complete. Save and close it now.

      We can now start and enable the Gunicorn socket. This will create the socket file at /run/gunicorn.sock now and at boot. When a connection is made to that socket, systemd will automatically start the gunicorn.service to handle it:

      • sudo systemctl start gunicorn.socket
      • sudo systemctl enable gunicorn.socket

      We can confirm that the operation was successful by checking for the socket file.

      Checking for the Gunicorn Socket File

      Check the status of the process to find out whether it started successfully:

      • sudo systemctl status gunicorn.socket

      You should see the following output:

      Output

      Failed to dump process list, ignoring: No such file or directory
      ● gunicorn.socket - gunicorn socket
         Loaded: loaded (/etc/systemd/system/gunicorn.socket; enabled; vendor preset: enabled)
         Active: active (running) since Tue 2019-03-05 19:19:16 UTC; 1h 22min ago
         Listen: /run/gunicorn.sock (Stream)
         CGroup: /system.slice/gunicorn.socket

      Mar 05 19:19:16 django systemd[1]: Listening on gunicorn socket.

      Next, check for the existence of the gunicorn.sock file within the /run directory:

      • file /run/gunicorn.sock

      Output

      /run/gunicorn.sock: socket

      If the systemctl status command indicated that an error occurred, or if you do not find the gunicorn.sock file in the directory, it's an indication that the Gunicorn socket was not created correctly. Check the Gunicorn socket's logs by typing:

      • sudo journalctl -u gunicorn.socket

      Take another look at your /etc/systemd/system/gunicorn.socket file to fix any problems before continuing.

      Testing Socket Activation

      Currently, if you've only started the gunicorn.socket unit, the gunicorn.service will not be active, since the socket has not yet received any connections. You can check this by typing:

      • sudo systemctl status gunicorn

      Output

      ● gunicorn.service - gunicorn daemon
         Loaded: loaded (/etc/systemd/system/gunicorn.service; disabled; vendor preset: enabled)
         Active: inactive (dead)

      To test the socket activation mechanism, we can send a connection to the socket through curl by typing:

      • curl --unix-socket /run/gunicorn.sock localhost

      You should see the HTML output from your application in the terminal. This indicates that Gunicorn has started and is able to serve your Django application. You can verify that the Gunicorn service is running by typing:

      • sudo systemctl status gunicorn

      Output

      ● gunicorn.service - gunicorn daemon
         Loaded: loaded (/etc/systemd/system/gunicorn.service; disabled; vendor preset: enabled)
         Active: active (running) since Tue 2019-03-05 20:43:56 UTC; 1s ago
       Main PID: 19074 (gunicorn)
          Tasks: 4 (limit: 4915)
         CGroup: /system.slice/gunicorn.service
                 ├─19074 /home/sammy/envs/polls/bin/python3 /home/sammy/envs/polls/bin/gunicorn --access-logfile - --workers 3 --bind unix:/run/gunicorn.sock mysite.wsgi:application
                 ├─19098 /home/sammy/envs/polls/bin/python3 /home/sammy/envs/polls/bin/gunicorn
                 . . .

      Mar 05 20:43:56 django systemd[1]: Started gunicorn daemon.
      Mar 05 20:43:56 django gunicorn[19074]: [2019-03-05 20:43:56 +0000] [19074] [INFO] Starting gunicorn 19.9.0
      . . .
      Mar 05 20:44:15 django gunicorn[19074]: - - [05/Mar/2019:20:44:15 +0000] "GET / HTTP/1.1" 301 0 "-" "curl/7.58.0"

      If the output from curl or the output of systemctl status indicates that a problem occurred, check the logs for additional details:

      • sudo journalctl -u gunicorn

      You can also check your /etc/systemd/system/gunicorn.service file for problems. If you make changes to this file, be sure to reload the daemon to reread the service definition and restart the Gunicorn process:

      • sudo systemctl daemon-reload
      • sudo systemctl restart gunicorn

      Make sure you troubleshoot any issues before continuing on to configuring the Nginx server.

      Step 9 — Configuring Nginx HTTPS and Gunicorn Proxy Passing

      Now that Gunicorn is set up in a more robust fashion, we need to configure Nginx to encrypt connections and hand off traffic to the Gunicorn process.

      If you followed the prerequisites and set up Nginx with Let's Encrypt, you should already have a server block file corresponding to your domain available to you in Nginx's sites-available directory. If not, follow How To Secure Nginx with Let's Encrypt on Ubuntu 18.04 and return to this step.

      Before we edit this example.com server block file, we’ll first remove the default server block file that ships with the Nginx installation:

      • sudo rm /etc/nginx/sites-enabled/default

      We'll now modify the example.com server block file to pass traffic to Gunicorn instead of the default index.html page configured in the prerequisite step.

      Open the server block file corresponding to your domain in your editor:

      • sudo nano /etc/nginx/sites-available/example.com

      You should see something like the following:

      /etc/nginx/sites-available/example.com

      server {
      
              root /var/www/example.com/html;
              index index.html index.htm index.nginx-debian.html;
      
              server_name example.com www.example.com;
      
              location / {
                      try_files $uri $uri/ =404;
              }
      
          listen [::]:443 ssl ipv6only=on; # managed by Certbot
          listen 443 ssl; # managed by Certbot
          ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem; # managed by Certbot
          ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem; # managed by Certbot
          include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot
          ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem; # managed by Certbot
      
      }
      
      server {
          if ($host = example.com) {
              return 301 https://$host$request_uri;
          } # managed by Certbot
      
      
              listen 80;
              listen [::]:80;
      
              server_name example.com www.example.com;
          return 404; # managed by Certbot
      
      
      }
      

      This is a combination of the default server block file created in How to Install Nginx on Ubuntu 18.04 as well as additions appended automatically by Let's Encrypt. We are going to delete the contents of this file and write a new configuration that redirects HTTP traffic to HTTPS, and forwards incoming requests to the Gunicorn socket we created in the previous step.

      If you'd like, you can make a backup of this file using cp. Quit your text editor and create a backup called example.com.old:

      • sudo cp /etc/nginx/sites-available/example.com /etc/nginx/sites-available/example.com.old

      Now, reopen the file and delete its contents. We'll build the new configuration block by block.

      Begin by pasting in the following block, which redirects HTTP requests at port 80 to HTTPS:

      /etc/nginx/sites-available/example.com

      server {
          listen 80 default_server;
          listen [::]:80 default_server;
          server_name _;
          return 301 https://example.com$request_uri;
      }
      

      Here we listen for HTTP IPv4 and IPv6 requests on port 80 and send a 301 response header to redirect the request to HTTPS port 443 using the example.com domain. This will also redirect direct HTTP requests to the server’s IP address.

      After this block, append the following block of config code that handles HTTPS requests for the example.com domain:

      /etc/nginx/sites-available/example.com

      . . . 
      server {
          listen [::]:443 ssl ipv6only=on;
          listen 443 ssl;
          server_name example.com www.example.com;
      
          # Let's Encrypt parameters
          ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
          ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
          include /etc/letsencrypt/options-ssl-nginx.conf;
          ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
      
          location = /favicon.ico { access_log off; log_not_found off; }
      
          location / {
              proxy_pass         http://unix:/run/gunicorn.sock;
              proxy_redirect     off;
      
              proxy_set_header   Host              $http_host;
              proxy_set_header   X-Real-IP         $remote_addr;
              proxy_set_header   X-Forwarded-For   $proxy_add_x_forwarded_for;
              proxy_set_header   X-Forwarded-Proto https;
          }
      }
      

      Here, we first listen on port 443 for requests hitting the example.com and www.example.com domains.

      Next, we provide the same Let's Encrypt configuration included in the default server block file, which specifies the location of the SSL certificate and private key, as well as some additional security parameters.

      The location = /favicon.ico line instructs Nginx to ignore any problems with finding a favicon.

      The last location / block instructs Nginx to hand off requests to the Gunicorn socket configured in Step 8. In addition, it adds headers to inform the upstream Django server that a request has been forwarded and to provide it with various request properties.

      After you've pasted in those two configuration blocks, the final file should look something like this:

      /etc/nginx/sites-available/example.com

      server {
          listen 80 default_server;
          listen [::]:80 default_server;
          server_name _;
          return 301 https://example.com$request_uri;
      }
      server {
              listen [::]:443 ssl ipv6only=on;
              listen 443 ssl;
              server_name example.com www.example.com;
      
              # Let's Encrypt parameters
              ssl_certificate /etc/letsencrypt/live/example.com/fullchain.pem;
              ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;
              include /etc/letsencrypt/options-ssl-nginx.conf;
              ssl_dhparam /etc/letsencrypt/ssl-dhparams.pem;
      
              location = /favicon.ico { access_log off; log_not_found off; }
      
              location / {
                proxy_pass         http://unix:/run/gunicorn.sock;
                proxy_redirect     off;
      
                proxy_set_header   Host              $http_host;
                proxy_set_header   X-Real-IP         $remote_addr;
                proxy_set_header   X-Forwarded-For   $proxy_add_x_forwarded_for;
                proxy_set_header   X-Forwarded-Proto https;
              }
      }
      

      Save and close the file when you are finished.

      Test your Nginx configuration for syntax errors by typing:

      • sudo nginx -t

      If your configuration is error-free, restart Nginx by typing:

      • sudo systemctl restart nginx

      You should now be able to visit your server's domain or IP address to view your application. Your browser should be using a secure HTTPS connection to connect to the Django backend.

      To completely secure our Django project, we need to add a couple of security parameters to its settings.py file. Reopen this file in your editor:

      • nano ~/django-polls/mysite/settings.py

      Scroll to the bottom of the file, and add the following parameters:

      ~/django-polls/mysite/settings.py

      . . .
      
      SECURE_PROXY_SSL_HEADER = ('HTTP_X_FORWARDED_PROTO', 'https')
      SESSION_COOKIE_SECURE = True
      CSRF_COOKIE_SECURE = True
      SECURE_SSL_REDIRECT = True
      

      These settings tell Django that you have enabled HTTPS on your server, and instruct it to use "secure" cookies. To learn more about these settings, consult the SSL/HTTPS section of Security in Django.

      When you're done, save and close the file.

      Finally, restart Gunicorn:

      • sudo systemctl restart gunicorn

      At this point, you have configured Nginx to redirect HTTP requests and hand off these requests to Gunicorn. HTTPS should now be fully enabled for your Django project and app. If you're running into errors, this discussion on troubleshooting Nginx and Gunicorn may help.

      Warning: As stated in Configuring CORS Headers, be sure to change the Origin from the wildcard * domain to your domain name (https://example.com in this guide) before making your app accessible to end users.

      Conclusion

      In this guide, you set up and configured a scalable Django application running on an Ubuntu 18.04 server. This setup can be replicated across multiple servers to create a highly available architecture. Furthermore, this app and its config can be containerized using Docker or another container runtime to ease deployment and scaling. These containers can then be deployed into a container cluster like Kubernetes. In an upcoming tutorial series, we will explore how to containerize and modernize this Django polls app so that it can run in a Kubernetes cluster.

      In addition to static files, you may also wish to offload your Django Media files to object storage. To learn how to do this, consult Using Amazon S3 to Store your Django Site's Static and Media Files. You might also consider compressing static files to further optimize their delivery to end users. To do this, you can use a Django plugin like Django compressor.




      Managed Databases Connection Pools and PostgreSQL Benchmarking Using pgbench


      Introduction

      DigitalOcean Managed Databases allows you to scale your PostgreSQL database using several methods. One such method is a built-in connection pooler that allows you to efficiently handle large numbers of client connections and reduce the CPU and memory footprint of these open connections. By using a connection pool and sharing a fixed set of recyclable connections, you can handle significantly more concurrent client connections, and squeeze extra performance out of your PostgreSQL database.

      In this tutorial we’ll use pgbench, PostgreSQL’s built-in benchmarking tool, to run load tests on a DigitalOcean Managed PostgreSQL Database. We’ll dive into connection pools, describe how they work, and show how to create one using the Cloud Control Panel. Finally, using results from the pgbench tests, we’ll demonstrate how using a connection pool can be an inexpensive method of increasing database throughput.

      Prerequisites

      To complete this tutorial, you’ll need:

      • A DigitalOcean Managed PostgreSQL Database cluster. To learn how to provision and configure a DigitalOcean PostgreSQL cluster, consult the Managed Database product documentation.
      • A client machine with PostgreSQL installed. By default, your PostgreSQL installation will contain the pgbench benchmarking utility and the psql client, both of which we’ll use in this guide. Consult How To Install and Use PostgreSQL on Ubuntu 18.04 to learn how to install PostgreSQL. If you’re not running Ubuntu on your client machine, you can use the version finder to find the appropriate tutorial.

      Once you have a DigitalOcean PostgreSQL cluster up and running and a client machine with pgbench installed, you’re ready to begin with this guide.

      Step 1 — Creating and Initializing the benchmark Database

      Before we create a connection pool for our database, we’ll first create the benchmark database on our PostgreSQL cluster and populate it with some dummy data on which pgbench will run its tests. The pgbench utility repeatedly runs a series of five SQL commands (consisting of SELECT, UPDATE, and INSERT queries) in a transaction, using multiple threads and clients, and calculates a useful performance metric called Transactions per Second (TPS). TPS is a measure of database throughput, counting the number of atomic transactions processed by the database in one second. To learn more about the specific commands executed by pgbench, consult What is the “Transaction” Actually Performed in pgbench? from the official pgbench documentation.
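
      To make that workload concrete, here is a rough Python sketch of one such transaction using psycopg2, with a simple TPS calculation. It assumes the pgbench_* tables created later in this step exist (at scale factor 150), and is an illustration of the workload's shape rather than a substitute for pgbench:

      tps_sketch.py (rough sketch)

      import random
      import time

      import psycopg2

      conn = psycopg2.connect(
          host='your_cluster_endpoint', port=25060, user='doadmin',
          password='your_password', dbname='benchmark', sslmode='require')
      cur = conn.cursor()

      def one_transaction():
          # Same shape as pgbench's TPC-B-like script: three updates,
          # one select, and one insert per transaction.
          aid = random.randint(1, 15000000)   # accounts at scale factor 150
          tid = random.randint(1, 1500)       # tellers
          bid = random.randint(1, 150)        # branches
          delta = random.randint(-5000, 5000)
          cur.execute("UPDATE pgbench_accounts SET abalance = abalance + %s WHERE aid = %s", (delta, aid))
          cur.execute("SELECT abalance FROM pgbench_accounts WHERE aid = %s", (aid,))
          cur.execute("UPDATE pgbench_tellers SET tbalance = tbalance + %s WHERE tid = %s", (delta, tid))
          cur.execute("UPDATE pgbench_branches SET bbalance = bbalance + %s WHERE bid = %s", (delta, bid))
          cur.execute("INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) "
                      "VALUES (%s, %s, %s, %s, CURRENT_TIMESTAMP)", (tid, bid, aid, delta))
          conn.commit()

      start, count = time.time(), 0
      while time.time() - start < 10:  # run for roughly ten seconds
          one_transaction()
          count += 1
      print('approximate TPS: %.1f' % (count / (time.time() - start)))
      conn.close()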

      Let’s begin by connecting to our PostgreSQL cluster and creating the benchmark database.

      First, retrieve your cluster’s Connection Details by navigating to Databases and locating your PostgreSQL cluster. Click into your cluster. You should see a cluster overview page containing the following Connection Details box:

      PostgreSQL Cluster Connection Details

      From this, we can parse the following config variables:

      • Admin user: doadmin
      • Admin password: your_password
      • Cluster endpoint: dbaas-test-do-user-3587522-0.db.ondigitalocean.com
      • Connection port: 25060
      • Database to connect to: defaultdb
      • SSL Mode: require (use an SSL-encrypted connection for increased security)

      Take note of these parameters, as you’ll need them when using both the psql client and pgbench tool.

      Click on the dropdown above this box and select Connection String. We’ll copy this string and pass it in to psql to connect to this PostgreSQL node.

      Connect to your cluster using psql and the connection string you just copied:

      • psql postgresql://doadmin:your_password@your_cluster_endpoint:25060/defaultdb?sslmode=require

      You should see the following PostgreSQL client prompt, indicating that you’ve connected to your PostgreSQL cluster successfully:

      Output

      psql (10.6 (Ubuntu 10.6-0ubuntu0.18.04.1))
      SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
      Type "help" for help.

      defaultdb=>

      From here, create the benchmark database:

      • CREATE DATABASE benchmark;

      You should see the following output:

      Output

      CREATE DATABASE

      Now, disconnect from the cluster:

      • \q

      Before we run the pgbench tests, we need to populate this benchmark database with some tables and dummy data required to run the tests.

      To do this, we’ll run pgbench with the following flags:

      • -h: The PostgreSQL cluster endpoint
      • -p: The PostgreSQL cluster connection port
      • -U: The database username
      • -i: Indicates that we'd like to initialize the benchmark database with benchmarking tables and their dummy data.
      • -s: Sets a scale factor of 150, which will multiply table sizes by 150. The default scale factor of 1 results in tables of the following sizes:

        table                   # of rows
        ---------------------------------
        pgbench_branches        1
        pgbench_tellers         10
        pgbench_accounts        100000
        pgbench_history         0
        

        Using a scale factor of 150, the pgbench_accounts table will contain 15,000,000 rows.

        Note: To avoid excessive blocked transactions, be sure to set the scale factor to a value at least as large as the number of concurrent clients you intend to test with. In this tutorial we'll test with 150 clients at most, so we set -s to 150 here. To learn more, consult these recommended practices from the official pgbench documentation.

      Run the complete pgbench command:

      • pgbench -h your_cluster_endpoint -p 25060 -U doadmin -i -s 150 benchmark

      After running this command, you will be prompted to enter the password for the database user you specified. Enter the password, and hit ENTER.

      You should see the following output:

      Output

      dropping old tables...
      NOTICE:  table "pgbench_accounts" does not exist, skipping
      NOTICE:  table "pgbench_branches" does not exist, skipping
      NOTICE:  table "pgbench_history" does not exist, skipping
      NOTICE:  table "pgbench_tellers" does not exist, skipping
      creating tables...
      generating data...
      100000 of 15000000 tuples (0%) done (elapsed 0.19 s, remaining 27.93 s)
      200000 of 15000000 tuples (1%) done (elapsed 0.85 s, remaining 62.62 s)
      300000 of 15000000 tuples (2%) done (elapsed 1.21 s, remaining 59.23 s)
      400000 of 15000000 tuples (2%) done (elapsed 1.63 s, remaining 59.44 s)
      500000 of 15000000 tuples (3%) done (elapsed 2.05 s, remaining 59.51 s)
      . . .
      14700000 of 15000000 tuples (98%) done (elapsed 70.87 s, remaining 1.45 s)
      14800000 of 15000000 tuples (98%) done (elapsed 71.39 s, remaining 0.96 s)
      14900000 of 15000000 tuples (99%) done (elapsed 71.91 s, remaining 0.48 s)
      15000000 of 15000000 tuples (100%) done (elapsed 72.42 s, remaining 0.00 s)
      vacuuming...
      creating primary keys...
      done.

      At this point, we've created a benchmarking database, populated with the tables and data required to run the pgbench tests. We can now move on to running a baseline test which we'll use to compare performance before and after connection pooling is enabled.

      Step 2 — Running a Baseline pgbench Test

      Before we run our first benchmark, it's worth diving into what we're trying to optimize with connection pools.

      Typically when a client connects to a PostgreSQL database, the main PostgreSQL OS process forks itself into a child process corresponding to this new connection. When there are only a few connections, this rarely presents an issue. However, as clients and connections scale, the CPU and memory overhead of creating and maintaining these connections begins to add up, especially if the application in question does not efficiently use database connections. In addition, the max_connections PostgreSQL setting may limit the number of client connections allowed, resulting in additional connections being refused or dropped.
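
      You can inspect both figures on your own cluster with two quick queries. Here is a small psycopg2 sketch (the same statements work directly in psql; connection values are placeholders):

      conn_info.py (sketch)

      import psycopg2

      conn = psycopg2.connect(
          host='your_cluster_endpoint', port=25060, user='doadmin',
          password='your_password', dbname='defaultdb', sslmode='require')
      cur = conn.cursor()

      cur.execute("SHOW max_connections")  # the configured connection ceiling
      print('max_connections:', cur.fetchone()[0])

      cur.execute("SELECT count(*) FROM pg_stat_activity")  # connections open right now
      print('open connections:', cur.fetchone()[0])

      conn.close()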

      A connection pool keeps open a fixed number of database connections, the pool size, which it then uses to distribute and execute client requests. This means that you can accommodate far more simultaneous connections, efficiently deal with idle or stagnant clients, as well as queue up client requests during traffic spikes instead of rejecting them. By recycling connections, you can more efficiently use your machine's resources in an environment where there is a heavy connection volume, and squeeze extra performance out of your database.

      A connection pool can be implemented either on the application side or as middleware between the database and your application. The Managed Databases connection pooler is built on top of pgBouncer, a lightweight, open-source middleware connection pooler for PostgreSQL. Its interface is available via the Cloud Control Panel UI.
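
      For contrast with the managed pgBouncer middleware, application-side pooling looks like the following minimal psycopg2 sketch; the pool bounds and connection values are placeholders:

      app_pool.py (sketch of application-side pooling)

      from psycopg2.pool import SimpleConnectionPool

      # Keep between 1 and 20 connections open and hand them out on demand.
      pool = SimpleConnectionPool(
          1, 20,
          host='your_cluster_endpoint', port=25060, user='doadmin',
          password='your_password', dbname='benchmark', sslmode='require')

      conn = pool.getconn()      # borrow a connection from the pool
      try:
          with conn.cursor() as cur:
              cur.execute('SELECT 1')
              print(cur.fetchone())
      finally:
          pool.putconn(conn)     # return it to the pool instead of closing it

      pool.closeall()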

      Navigate to Databases in the Control Panel, and then click into your PostgreSQL cluster. From here, click into Connection Pools. Then, click on Create a Connection Pool. You should see the following configuration window:

      Connection Pools Config Window

      Here, you can configure the following fields:

      • Pool Name: A unique name for your connection pool
      • Database: The database for which you'd like to pool connections
      • User: The PostgreSQL user the connection pool will authenticate as
      • Mode: One of Session, Transaction, or Statement. This option controls how long the pool assigns a backend connection to a client.
        • Session: The client holds on to the connection until it explicitly disconnects.
        • Transaction: The client obtains the connection until it completes a transaction, after which the connection is returned to the pool.
        • Statement: The pool aggressively recycles connections after each client statement. In statement mode, multi-statement transactions are not allowed. To learn more, consult the Connection Pools product documentation.
      • Pool Size: The number of connections the connection pool will keep open between itself and the database.

      Before we create a connection pool, we'll run a baseline test to which we can compare database performance with connection pooling.

      In this tutorial, we'll use a 4 GB RAM, 2 vCPU, 80 GB Disk, primary node only Managed Database setup. You can scale the benchmark test parameters in this section according to your PostgreSQL cluster specs.

      DigitalOcean Managed Database clusters have the PostgreSQL max_connections parameter preset to 25 connections per 1 GB RAM. A 4 GB RAM PostgreSQL node therefore has max_connections set to 100. In addition, for all clusters, 3 connections are reserved for maintenance. So for this 4 GB RAM PostgreSQL cluster, 97 connections are available for connection pooling.

      With this in mind, let's run our first baseline pgbench test.

      Log in to your client machine. We’ll run pgbench, specifying the database endpoint, port and user as usual. In addition, we’ll provide the following flags:

      • -c: The number of concurrent clients or database sessions to simulate. We set this to 50 so as to simulate a number of concurrent connections smaller than the max_connections parameter for our PostgreSQL cluster.
      • -j: The number of worker threads pgbench will use to run the benchmark. If you're using a multi-CPU machine, you can tune this upwards to distribute clients across threads. On a two-core machine, we set this to 2.
      • -P: Display progress and metrics every 60 seconds.
      • -T: Run the benchmark for 600 seconds (10 minutes). To produce consistent, reproducible results, it's important that you run the benchmark for several minutes, or through one checkpoint cycle.

      We’ll also specify that we'd like to run the benchmark against the benchmark database we created and populated earlier.

      Run the following complete pgbench command:

      • pgbench -h your_db_endpoint -p 25060 -U doadmin -c 50 -j 2 -P 60 -T 600 benchmark

      Hit ENTER and then type in the password for the doadmin user to begin running the test. You should see output similar to the following (results will depend on the specs of your PostgreSQL cluster):

      Output

starting vacuum...end.
progress: 60.0 s, 157.4 tps, lat 282.988 ms stddev 40.261
progress: 120.0 s, 176.2 tps, lat 283.726 ms stddev 38.722
progress: 180.0 s, 167.4 tps, lat 298.663 ms stddev 238.124
progress: 240.0 s, 178.9 tps, lat 279.564 ms stddev 43.619
progress: 300.0 s, 178.5 tps, lat 280.016 ms stddev 43.235
progress: 360.0 s, 178.8 tps, lat 279.737 ms stddev 43.307
progress: 420.0 s, 179.3 tps, lat 278.837 ms stddev 43.783
progress: 480.0 s, 178.5 tps, lat 280.203 ms stddev 43.921
progress: 540.0 s, 180.0 tps, lat 277.816 ms stddev 43.742
progress: 600.0 s, 178.5 tps, lat 280.044 ms stddev 43.705
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 150
query mode: simple
number of clients: 50
number of threads: 2
duration: 600 s
number of transactions actually processed: 105256
latency average = 282.039 ms
latency stddev = 84.244 ms
tps = 175.329321 (including connections establishing)
tps = 175.404174 (excluding connections establishing)

Here, we observed that over a 10-minute run with 50 concurrent sessions, we processed 105,256 transactions with a throughput of roughly 175 transactions per second.

      Now, let's run the same test, this time using 150 concurrent clients, a value that is higher than max_connections for this database, to synthetically simulate a mass influx of client connections:

      • pgbench -h your_db_endpoint -p 25060 -U doadmin -c 150 -j 2 -P 60 -T 600 benchmark

      You should see output similar to the following:

      Output

starting vacuum...end.
connection to database "pgbench" failed:
FATAL:  remaining connection slots are reserved for non-replication superuser connections
progress: 60.0 s, 182.6 tps, lat 280.069 ms stddev 42.009
progress: 120.0 s, 253.8 tps, lat 295.612 ms stddev 237.448
progress: 180.0 s, 271.3 tps, lat 276.411 ms stddev 40.643
progress: 240.0 s, 273.0 tps, lat 274.653 ms stddev 40.942
progress: 300.0 s, 272.8 tps, lat 274.977 ms stddev 41.660
progress: 360.0 s, 250.0 tps, lat 300.033 ms stddev 282.712
progress: 420.0 s, 272.1 tps, lat 275.614 ms stddev 42.901
progress: 480.0 s, 261.1 tps, lat 287.226 ms stddev 112.499
progress: 540.0 s, 272.5 tps, lat 275.309 ms stddev 41.740
progress: 600.0 s, 271.2 tps, lat 276.585 ms stddev 41.221
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 150
query mode: simple
number of clients: 150
number of threads: 2
duration: 600 s
number of transactions actually processed: 154892
latency average = 281.421 ms
latency stddev = 125.929 ms
tps = 257.994941 (including connections establishing)
tps = 258.049251 (excluding connections establishing)

Note the FATAL error, indicating that pgbench hit the 100-connection limit set by max_connections, which resulted in a refused connection. The test was still able to complete, with a throughput of roughly 257 TPS.

At this point we can investigate how a connection pool could improve our database's throughput.

      Step 3 — Creating and Testing a Connection Pool

      In this step we'll create a connection pool and rerun the previous pgbench test to see if we can improve our database's throughput.

      In general, the max_connections setting and connection pool parameters are tuned in tandem to max out the database's load. However, because max_connections is abstracted away from the user in DigitalOcean Managed Databases, our main levers here are the connection pool Mode and Size settings.

      To begin, let's create a connection pool in Transaction mode that keeps open all the available backend connections.

      Navigate to Databases in the Control Panel, and then click into your PostgreSQL cluster. From here, click into Connection Pools. Then, click on Create a Connection Pool.

      In the configuration window that appears, fill in the following values:

      Connection Pool Configuration Values

Here we name our connection pool test-pool, and use it with the benchmark database. Our database user is doadmin and we set the connection pool to Transaction mode. Recall from earlier that for a managed database cluster with 4 GB of RAM, there are 97 available database connections. Accordingly, configure the pool to keep open 97 database connections.

      When you're done, hit Create Pool.

      You should now see this pool in the Control Panel:

      Connection Pool in Control Panel

Grab its URI by clicking Connection Details. It should look something like the following:

      postgres://doadmin:password@pool_endpoint:pool_port/test-pool?sslmode=require
      

      You should notice a different port here, and potentially a different endpoint and database name, corresponding to the pool name test-pool.
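
Before benchmarking the pool, you can optionally confirm that it accepts connections by passing this URI directly to psql (quoted so your shell ignores the question mark; substitute your own values):

• psql "postgres://doadmin:password@pool_endpoint:pool_port/test-pool?sslmode=require"

Any client that speaks the PostgreSQL protocol can connect through the pool the same way; only the endpoint, port, and database name change.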

      Now that we've created the test-pool connection pool, we can rerun the pgbench test we ran above.

      Rerun pgbench

      From your client machine, run the following pgbench command (with 150 concurrent clients), making sure to substitute the highlighted values with those in your connection pool URI:

      • pgbench -h pool_endpoint -p pool_port -U doadmin -c 150 -j 2 -P 60 -T 600 test-pool

      Here we once again use 150 concurrent clients, run the test across 2 threads, print progress every 60 seconds, and run the test for 600 seconds. We set the database name to test-pool, the name of the connection pool.

      Once the test completes, you should see output similar to the following (note that these results will vary depending on the specs of your database node):

      Output

starting vacuum...end.
progress: 60.0 s, 240.0 tps, lat 425.251 ms stddev 59.773
progress: 120.0 s, 350.0 tps, lat 428.647 ms stddev 57.084
progress: 180.0 s, 340.3 tps, lat 440.680 ms stddev 313.631
progress: 240.0 s, 364.9 tps, lat 411.083 ms stddev 61.106
progress: 300.0 s, 366.5 tps, lat 409.367 ms stddev 60.165
progress: 360.0 s, 362.5 tps, lat 413.750 ms stddev 59.005
progress: 420.0 s, 359.5 tps, lat 417.292 ms stddev 60.395
progress: 480.0 s, 363.8 tps, lat 412.130 ms stddev 60.361
progress: 540.0 s, 351.6 tps, lat 426.661 ms stddev 62.960
progress: 600.0 s, 344.5 tps, lat 435.516 ms stddev 65.182
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 150
query mode: simple
number of clients: 150
number of threads: 2
duration: 600 s
number of transactions actually processed: 206768
latency average = 421.719 ms
latency stddev = 114.676 ms
tps = 344.240797 (including connections establishing)
tps = 344.385646 (excluding connections establishing)

      Notice here that we were able to increase our database's throughput from 257 TPS to 344 TPS with 150 concurrent connections (an increase of 33%), and did not run up against the max_connections limit we previously hit without a connection pool. By placing a connection pool in front of the database, we can avoid dropped connections and significantly increase database throughput in an environment with a large number of simultaneous connections.

      If you run this same test, but with a -c value of 50 (specifying a smaller number of clients), the gains from using a connection pool become much less evident:

      Output

starting vacuum...end.
progress: 60.0 s, 154.0 tps, lat 290.592 ms stddev 35.530
progress: 120.0 s, 162.7 tps, lat 307.168 ms stddev 241.003
progress: 180.0 s, 172.0 tps, lat 290.678 ms stddev 36.225
progress: 240.0 s, 172.4 tps, lat 290.169 ms stddev 37.603
progress: 300.0 s, 177.8 tps, lat 281.214 ms stddev 35.365
progress: 360.0 s, 177.7 tps, lat 281.402 ms stddev 35.227
progress: 420.0 s, 174.5 tps, lat 286.404 ms stddev 34.797
progress: 480.0 s, 176.1 tps, lat 284.107 ms stddev 36.540
progress: 540.0 s, 173.1 tps, lat 288.771 ms stddev 38.059
progress: 600.0 s, 174.5 tps, lat 286.508 ms stddev 59.941
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 150
query mode: simple
number of clients: 50
number of threads: 2
duration: 600 s
number of transactions actually processed: 102938
latency average = 288.509 ms
latency stddev = 83.503 ms
tps = 171.482966 (including connections establishing)
tps = 171.553434 (excluding connections establishing)

Here we see that the connection pool did not increase throughput; in fact, it dipped slightly, from 175 TPS to 171 TPS. With only 50 clients, the database stays well under its connection limit, so connection churn is not the bottleneck, and the pool's middleware layer adds a small amount of overhead without providing any benefit.

      Although in this guide we use pgbench with its built-in benchmark data set, the best test for determining whether or not to use a connection pool is a benchmark load that accurately represents production load on your database, against production data. Creating custom benchmarking scripts and data is beyond the scope of this guide, but to learn more, consult the official pgbench documentation.

      Note: The pool size setting is highly workload-specific. In this guide, we configured the connection pool to use all the available backend database connections. This was because throughout our benchmark, the database rarely reached full utilization (you can monitor database load from the Metrics tab in the Cloud Control Panel). Depending on your database's load, this may not be the optimal setting. If you notice that your database is constantly fully saturated, shrinking the connection pool may increase throughput and improve performance by queuing additional requests instead of trying to execute them all at the same time on an already loaded server.

      Conclusion

      DigitalOcean Managed Databases connection pooling is a powerful feature that can help you quickly squeeze extra performance out of your database. Along with other techniques like replication, caching, and sharding, connection pooling can help you scale your database layer to process an even greater volume of requests.

      In this guide we focused on a simplistic and synthetic testing scenario using PostgreSQL's built-in pgbench benchmarking tool and its default benchmark test. In any production scenario, you should run benchmarks against actual production data while simulating production load. This will allow you to tune your database for your particular usage pattern.

      Along with pgbench, other tools exist to benchmark and load your database. One such tool developed by Percona is sysbench-tpcc. Another is Apache's JMeter, which can load test databases as well as web applications.

      To learn more about DigitalOcean Managed Databases, consult the Managed Databases product documentation. To learn more about sharding, another useful scaling technique, consult Understanding Database Sharding.


      Understanding Managed Databases


      Introduction

      Secure, reliable data storage is a must for nearly every modern application. However, the infrastructure needed for a self-managed, on-premises database can be prohibitively expensive for many teams. Similarly, employees who have the skills and experience needed to maintain a production database effectively can be difficult to come by.

      The spread of cloud computing services has lowered the barriers to entry associated with provisioning a database, but many developers still lack the time or expertise needed to manage and tune a database to suit their needs. For this reason, many businesses are turning to managed database services to help them build and scale their databases in line with their growth.

      In this conceptual article, we will go over what managed databases are and how they can be beneficial to many organizations. We will also cover some practical considerations one should make before building their next application on top of a managed database solution.

      Managed Databases in a Nutshell

      A managed database is a cloud computing service in which the end user pays a cloud service provider for access to a database. Unlike a typical database, users don’t have to set up or maintain a managed database on their own; rather, it’s the provider’s responsibility to oversee the database’s infrastructure. This allows the user to focus on building their application instead of spending time configuring their database and keeping it up to date.

The process of provisioning a managed database varies by provider, but in general it's similar to that of any other cloud-based service. After registering an account and logging in to the dashboard, the user reviews the available database options, such as the database engine and cluster size, and then chooses the setup that's right for them. After provisioning the managed database, they can connect to it through a GUI or client and begin loading data and integrating the database with their application.

Managed database solutions simplify the process of provisioning and maintaining a database. Instead of running commands from a terminal to install and set one up, you can deploy a production-ready database with just a few clicks in your browser. By simplifying and automating database management, cloud providers make it easier for anyone, even novice database users, to build data-driven applications and websites. This ease of use is the result of a decades-long trend towards simplifying, automating, and abstracting various database management tasks, itself a response to pain points long felt by database administrators.

      Pain Points of On-Premises and Self-Managed Databases

      Prior to the rise of the cloud computing model, any organization in need of a data center had to supply all the time, space, and resources that went into setting one up. Once their database was up and running, they also had to maintain the hardware, keep its software updated, hire a team to manage the database, and train their employees on how to use it.

      As cloud computing services grew in popularity in the 2000s, it became easier and more affordable to provision server infrastructure, since the hardware and the space required for it no longer had to be owned or managed by those using it. Likewise, setting up a database entirely within the cloud became far less difficult; a business or developer would just have to requisition a server, install and configure their chosen database management system, and begin storing data.

      While cloud computing did make the process of setting up a traditional database easier, it didn’t address all of its problems. For instance, in the cloud it can still be difficult to pinpoint the ideal size of a database’s infrastructure footprint before it begins collecting data. This is important because cloud consumers are charged based on the resources they consume, and they risk paying for more than what they require if the server they provision is larger than necessary. Additionally, as with traditional on-premises databases, managing one’s database in the cloud can be a costly endeavor. Depending on your needs, you may still need to hire an experienced database administrator or spend a significant amount of time and money training your existing staff to manage your database effectively.

      Many of these issues are compounded for smaller organizations and independent developers. While a large business can usually afford to hire employees with a deep knowledge of databases, smaller teams usually have fewer resources available, leaving them with only their existing institutional knowledge. This makes tasks like replication, migrations, and backups all the more difficult and time consuming, as they can require a great deal of on-the-job learning as well as trial and error.

      Managed databases help to resolve these pain points with a host of benefits to businesses and developers. Let’s walk through some of these benefits and how they can impact development teams.

      Benefits of Managed Databases

Managed database services can help to reduce many of the headaches associated with provisioning and managing a database. For one thing, they drastically speed up the process of provisioning a database server. With a self-managed solution, you must obtain a server (either on-premises or in the cloud), connect to it from a client or terminal, configure and secure it, and then install and set up the database management software before you can begin storing data. With a managed database, you only have to decide on the initial size of the database server and configure any additional provider-specific options, and you'll have a new database ready to integrate with your app or website. This can usually be done in just a few minutes through the provider's user interface.

      Another appeal of managed databases is automation. Self-managed databases can consume a large amount of an organization’s resources because its employees have to perform every administrative task — from scaling to performing updates, running migrations, and creating backups — manually. With a managed database, however, these and other tasks are done either automatically or on-demand, which markedly reduces the risk of human error.

Relatedly, managed database services help to streamline the process of database scaling. Scaling a self-managed database can be very time- and resource-intensive. Whether you choose sharding, replication, load balancing, or something else as your scaling strategy, if you manage the infrastructure yourself then you're responsible for ensuring that no data is lost in the process and that the application continues to work properly. If you integrate your application with a managed database service, however, you can scale the database cluster on demand. Rather than having to work out the optimal server size or CPU usage beforehand, you can quickly provision more resources on the fly. This helps you avoid using unnecessary resources, meaning you also won't pay for what you don't need.

Managed solutions tend to have built-in high availability. In the context of cloud computing, a service is said to be highly available if it is stable and likely to run without failure for long periods of time. Most reputable cloud providers' products come with a service level agreement (SLA), a commitment between the provider and its customers that guarantees the availability and reliability of their services. A typical SLA will specify how much downtime the customer should expect, and many also define the compensation for customers if these service levels are not met. This gives customers confidence that extended downtime is unlikely and that, if it does occur, they can at least expect some kind of compensation from the provider.

In general, managed databases simplify the tasks associated with provisioning and maintaining a database. Depending on the provider, you or your team will still likely need some level of experience working with databases in order to provision one and interact with it as you build and scale your application. Ultimately, though, the database-specific experience needed to administer a managed database is much less than with a self-managed solution.

      Of course, managed databases aren’t able to solve every problem, and may prove to be a less-than-ideal choice for some. Next, we’ll go over a few of the potential drawbacks one should consider before provisioning a managed database.

      Practical Considerations

A managed database service can ease the stress of deploying and maintaining a database, but there are still a few things to keep in mind before committing to one. Recall that a principal draw of managed databases is that they abstract away most of the more tedious aspects of database administration. To this end, a managed database provider aims to deliver a rudimentary database that will satisfy the most common use cases. Accordingly, their database offerings won't feature many customization options or the unique features included in more specialized database software. Because of this, you won't have as much freedom to tailor your database and you'll be limited to what the cloud provider has to offer.

A managed database is almost always more expensive than a self-managed one. This makes sense, since you're paying for the cloud provider to support you in managing the database, but it can be a cause for concern for teams with limited resources. Moreover, pricing for managed databases is usually based on how much storage and RAM the database uses, how many reads it handles, and how many backups of the database the user creates. Consequently, any application that handles large amounts of data or traffic will be more expensive to run on a managed database service than on a self-managed cloud database.

      One should also reflect on the impact switching to a managed database will have on their internal workflows and whether or not they’ll be able to adjust to those changes. Every provider differs, and depending on their SLA they may shoulder responsibility for only some administration tasks, which would be problematic for developers looking for a full-service solution. On the other hand, some providers could have a prohibitively restrictive SLA or make the customer entirely dependent on the provider in question, a situation known as vendor lock-in.

      Lastly, and perhaps most importantly, one should carefully consider whether or not any managed database service they’re considering using will meet their security needs. All databases, including on-premises databases, are prone to certain security threats, like SQL injection attacks or data leaks. However, the security dynamic is far different for databases hosted in the cloud. Managed database users can’t control the physical location of their data or who has access to it, nor can they ensure compliance with specific security standards. This can be especially problematic if your client has heightened security needs.

To illustrate, imagine that you're hired by a bank to build an application where its clients can access financial records and make payments. The bank may stipulate that the app must encrypt data at rest, use appropriately scoped user permissions, and comply with regulatory standards like PCI DSS. Not all managed database providers adhere to the same regulatory standards or maintain the same security practices, and they're unlikely to adopt new standards or practices for just one of their customers. For this reason, it's critical that you ensure any managed database provider you rely on for such an application is able to meet your security needs as well as the needs of your clients.

      Conclusion

      Managed databases have many features that appeal to a wide variety of businesses and developers, but a managed database may not solve every problem or suit everyone’s needs. Some may find that a managed database’s limited feature set and configuration options, increased cost, and reduced flexibility outweigh any of its potential advantages. However, compelling benefits like ease of use, scalability, automated backups and upgrades, and high availability have led to increased adoption of managed database solutions in a variety of industries.

      If you’re interested in learning more about DigitalOcean Managed Databases, we encourage you to check out our Managed Databases product documentation.


