One place for hosting & domains

      PostgreSQL

      An Introduction to Queries in PostgreSQL


      Introduction

      Databases are a key component of many websites and applications, and are at the core of how data is stored and exchanged across the internet. One of the most important aspects of database management is the practice of retrieving data from a database, whether it’s on an ad hoc basis or part of a process that’s been coded into an application. There are several ways to retrieve information from a database, but one of the most commonly-used methods is performed through submitting queries through the command line.

      In relational database management systems, a query is any command used to retrieve data from a table. In Structured Query Language (SQL), queries are almost always made using the SELECT statement.

      In this guide, we will discuss the basic syntax of SQL queries as well as some of the more commonly-employed functions and operators. We will also practice making SQL queries using some sample data in a PostgreSQL database.

      PostgreSQL, often shortened to “Postgres,” is a relational database management system with an object-oriented approach, meaning that information can be represented as objects or classes in PostgreSQL schemas. PostgreSQL aligns closely with standard SQL, although it also includes some features not found in other relational database systems.

      Prerequisites

      In general, the commands and concepts presented in this guide can be used on any Linux-based operating system running any SQL database software. However, it was written specifically with an Ubuntu 18.04 server running PostgreSQL in mind. To set this up, you will need the following:

      With this setup in place, we can begin the tutorial.

      Creating a Sample Database

      Before we can begin making queries in SQL, we will first create a database and a couple tables, then populate these tables with some sample data. This will allow you to gain some hands-on experience when you begin making queries later on.

      For the sample database we’ll use throughout this guide, imagine the following scenario:

      You and several of your friends all celebrate your birthdays with one another. On each occasion, the members of the group head to the local bowling alley, participate in a friendly tournament, and then everyone heads to your place where you prepare the birthday-person’s favorite meal.

      Now that this tradition has been going on for a while, you’ve decided to begin tracking the records from these tournaments. Also, to make planning dinners easier, you decide to create a record of your friends’ birthdays and their favorite entrees, sides, and desserts. Rather than keep this information in a physical ledger, you decide to exercise your database skills by recording it in a PostgreSQL database.

      To begin, open up a PostgreSQL prompt as your postgres superuser:

      Note: If you followed all the steps of the prerequisite tutorial on Installing PostgreSQL on Ubuntu 18.04, you may have configured a new role for your PostgreSQL installation. In this case, you can connect to the Postgres prompt with the following command, substituting sammy with your own username:

      Next, create the database by running:

      • CREATE DATABASE birthdays;

      Then select this database by typing:

      Next, create two tables within this database. We'll use the first table to track your friends' records at the bowling alley. The following command will create a table called tourneys with columns for the name of each of your friends, the number of tournaments they've won (wins), their all-time best score, and what size bowling shoe they wear (size):

      • CREATE TABLE tourneys (
      • name varchar(30),
      • wins real,
      • best real,
      • size real
      • );

      Once you run the CREATE TABLE command and populate it with column headings, you’ll receive the following output:

      Output

      CREATE TABLE

      Populate the tourneys table with some sample data:

      • INSERT INTO tourneys (name, wins, best, size)
      • VALUES ('Dolly', '7', '245', '8.5'),
      • ('Etta', '4', '283', '9'),
      • ('Irma', '9', '266', '7'),
      • ('Barbara', '2', '197', '7.5'),
      • ('Gladys', '13', '273', '8');

      You’ll receive the following output:

      Output

      INSERT 0 5

      Following this, create another table within the same database which we'll use to store information about your friends' favorite birthday meals. The following command creates a table named dinners with columns for the name of each of your friends, their birthdate, their favorite entree, their preferred side dish, and their favorite dessert:

      • CREATE TABLE dinners (
      • name varchar(30),
      • birthdate date,
      • entree varchar(30),
      • side varchar(30),
      • dessert varchar(30)
      • );

      Similarly for this table, you’ll receive feedback verifying that the table was created:

      Output

      CREATE TABLE

      Populate this table with some sample data as well:

      • INSERT INTO dinners (name, birthdate, entree, side, dessert)
      • VALUES ('Dolly', '1946-01-19', 'steak', 'salad', 'cake'),
      • ('Etta', '1938-01-25', 'chicken', 'fries', 'ice cream'),
      • ('Irma', '1941-02-18', 'tofu', 'fries', 'cake'),
      • ('Barbara', '1948-12-25', 'tofu', 'salad', 'ice cream'),
      • ('Gladys', '1944-05-28', 'steak', 'fries', 'ice cream');

      Output

      INSERT 0 5

      Once that command completes successfully, you're done setting up your database. Next, we'll go over the basic command structure of SELECT queries.

      Understanding SELECT Statements

      As mentioned in the introduction, SQL queries almost always begin with the SELECT statement. SELECT is used in queries to specify which columns from a table should be returned in the result-set. Queries also almost always include FROM, which is used to specify which table the statement will query.

      Generally, SQL queries follow this syntax:

      • SELECT column_to_select FROM table_to_select WHERE certain_conditions_apply;

      By way of example, the following statement will return the entire name column from the dinners table:

      • SELECT name FROM dinners;

      Output

      name --------- Dolly Etta Irma Barbara Gladys (5 rows)

      You can select multiple columns from the same table by separating their names with a comma, like this:

      • SELECT name, birthdate FROM dinners;

      Output

      name | birthdate ---------+------------ Dolly | 1946-01-19 Etta | 1938-01-25 Irma | 1941-02-18 Barbara | 1948-12-25 Gladys | 1944-05-28 (5 rows)

      Instead of naming a specific column or set of columns, you can follow the SELECT operator with an asterisk (*) which serves as a placeholder representing all the columns in a table. The following command returns every column from the tourneys table:

      Output

      name | wins | best | size ---------+------+------+------ Dolly | 7 | 245 | 8.5 Etta | 4 | 283 | 9 Irma | 9 | 266 | 7 Barbara | 2 | 197 | 7.5 Gladys | 13 | 273 | 8 (5 rows)

      WHERE is used in queries to filter records that meet a specified condition, and any rows that do not meet that condition are eliminated from the result. A WHERE clause typically follows this syntax:

      • . . . WHERE column_name comparison_operator value

      The comparison operator in a WHERE clause defines how the specified column should be compared against the value. Here are some common SQL comparison operators:

      Operator What it does
      = tests for equality
      != tests for inequality
      < tests for less-than
      > tests for greater-than
      <= tests for less-than or equal-to
      >= tests for greater-than or equal-to
      BETWEEN tests whether a value lies within a given range
      IN tests whether a row's value is contained in a set of specified values
      EXISTS tests whether rows exist, given the specified conditions
      LIKE tests whether a value matches a specified string
      IS NULL tests for NULL values
      IS NOT NULL tests for all values other than NULL

      For example, if you wanted to find Irma's shoe size, you could use the following query:

      • SELECT size FROM tourneys WHERE name = 'Irma';

      Output

      size ------ 7 (1 row)

      SQL allows the use of wildcard characters, and these are especially handy when used in WHERE clauses. Percentage signs (%) represent zero or more unknown characters, and underscores (_) represent a single unknown character. These are useful if you're trying to find a specific entry in a table, but aren't sure of what that entry is exactly. To illustrate, let's say that you've forgotten the favorite entree of a few of your friends, but you're certain this particular entree starts with a "t." You could find its name by running the following query:

      • SELECT entree FROM dinners WHERE entree LIKE 't%';

      Output

      entree ------- tofu tofu (2 rows)

      Based on the output above, we see that the entree we have forgotten is tofu.

      There may be times when you're working with databases that have columns or tables with relatively long or difficult-to-read names. In these cases, you can make these names more readable by creating an alias with the AS keyword. Aliases created with AS are temporary, and only exist for the duration of the query for which they're created:

      • SELECT name AS n, birthdate AS b, dessert AS d FROM dinners;

      Output

      n | b | d ---------+------------+----------- Dolly | 1946-01-19 | cake Etta | 1938-01-25 | ice cream Irma | 1941-02-18 | cake Barbara | 1948-12-25 | ice cream Gladys | 1944-05-28 | ice cream (5 rows)

      Here, we have told SQL to display the name column as n, the birthdate column as b, and the dessert column as d.

      The examples we've gone through up to this point include some of the more frequently-used keywords and clauses in SQL queries. These are useful for basic queries, but they aren't helpful if you're trying to perform a calculation or derive a scalar value (a single value, as opposed to a set of multiple different values) based on your data. This is where aggregate functions come into play.

      Aggregate Functions

      Oftentimes, when working with data, you don't necessarily want to see the data itself. Rather, you want information about the data. The SQL syntax includes a number of functions that allow you to interpret or run calculations on your data just by issuing a SELECT query. These are known as aggregate functions.

      The COUNT function counts and returns the number of rows that match a certain criteria. For example, if you'd like to know how many of your friends prefer tofu for their birthday entree, you could issue this query:

      • SELECT COUNT(entree) FROM dinners WHERE entree = 'tofu';

      Output

      count ------- 2 (1 row)

      The AVG function returns the average (mean) value of a column. Using our example table, you could find the average best score amongst your friends with this query:

      • SELECT AVG(best) FROM tourneys;

      Output

      avg ------- 252.8 (1 row)

      SUM is used to find the total sum of a given column. For instance, if you'd like to see how many games you and your friends have bowled over the years, you could run this query:

      • SELECT SUM(wins) FROM tourneys;

      Output

      sum ----- 35 (1 row)

      Note that the AVG and SUM functions will only work correctly when used with numeric data. If you try to use them on non-numerical data, it will result in either an error or just 0, depending on which RDBMS you're using:

      • SELECT SUM(entree) FROM dinners;

      Output

      ERROR: function sum(character varying) does not exist LINE 1: select sum(entree) from dinners; ^ HINT: No function matches the given name and argument types. You might need to add explicit type casts.

      MIN is used to find the smallest value within a specified column. You could use this query to see what the worst overall bowling record is so far (in terms of number of wins):

      • SELECT MIN(wins) FROM tourneys;

      Output

      min ----- 2 (1 row)

      Similarly, MAX is used to find the largest numeric value in a given column. The following query will show the best overall bowling record:

      • SELECT MAX(wins) FROM tourneys;

      Output

      max ----- 13 (1 row)

      Unlike SUM and AVG, the MIN and MAX functions can be used for both numeric and alphabetic data types. When run on a column containing string values, the MIN function will show the first value alphabetically:

      • SELECT MIN(name) FROM dinners;

      Output

      min --------- Barbara (1 row)

      Likewise, when run on a column containing string values, the MAX function will show the last value alphabetically:

      • SELECT MAX(name) FROM dinners;

      Output

      max ------ Irma (1 row)

      Aggregate functions have many uses beyond what was described in this section. They're particularly useful when used with the GROUP BY clause, which is covered in the next section along with several other query clauses that affect how result-sets are sorted.

      Manipulating Query Outputs

      In addition to the FROM and WHERE clauses, there are several other clauses which are used to manipulate the results of a SELECT query. In this section, we will explain and provide examples for some of the more commonly-used query clauses.

      One of the most frequently-used query clauses, aside from FROM and WHERE, is the GROUP BY clause. It's typically used when you're performing an aggregate function on one column, but in relation to matching values in another.

      For example, let's say you wanted to know how many of your friends prefer each of the three entrees you make. You could find this info with the following query:

      • SELECT COUNT(name), entree FROM dinners GROUP BY entree;

      Output

      count | entree -------+--------- 1 | chicken 2 | steak 2 | tofu (3 rows)

      The ORDER BY clause is used to sort query results. By default, numeric values are sorted in ascending order, and text values are sorted in alphabetical order. To illustrate, the following query lists the name and birthdate columns, but sorts the results by birthdate:

      • SELECT name, birthdate FROM dinners ORDER BY birthdate;

      Output

      name | birthdate ---------+------------ Etta | 1938-01-25 Irma | 1941-02-18 Gladys | 1944-05-28 Dolly | 1946-01-19 Barbara | 1948-12-25 (5 rows)

      Notice that the default behavior of ORDER BY is to sort the result-set in ascending order. To reverse this and have the result-set sorted in descending order, close the query with DESC:

      • SELECT name, birthdate FROM dinners ORDER BY birthdate DESC;

      Output

      name | birthdate ---------+------------ Barbara | 1948-12-25 Dolly | 1946-01-19 Gladys | 1944-05-28 Irma | 1941-02-18 Etta | 1938-01-25 (5 rows)

      As mentioned previously, the WHERE clause is used to filter results based on specific conditions. However, if you use the WHERE clause with an aggregate function, it will return an error, as is the case with the following attempt to find which sides are the favorite of at least three of your friends:

      • SELECT COUNT(name), side FROM dinners WHERE COUNT(name) >= 3;

      Output

      ERROR: aggregate functions are not allowed in WHERE LINE 1: SELECT COUNT(name), side FROM dinners WHERE COUNT(name) >= 3...

      The HAVING clause was added to SQL to provide functionality similar to that of the WHERE clause while also being compatible with aggregate functions. It's helpful to think of the difference between these two clauses as being that WHERE applies to individual records, while HAVING applies to group records. To this end, any time you issue a HAVING clause, the GROUP BY clause must also be present.

      The following example is another attempt to find which side dishes are the favorite of at least three of your friends, although this one will return a result without error:

      • SELECT COUNT(name), side FROM dinners GROUP BY side HAVING COUNT(name) >= 3;

      Output

      count | side -------+------- 3 | fries (1 row)

      Aggregate functions are useful for summarizing the results of a particular column in a given table. However, there are many cases where it's necessary to query the contents of more than one table. We'll go over a few ways you can do this in the next section.

      Querying Multiple Tables

      More often than not, a database contains multiple tables, each holding different sets of data. SQL provides a few different ways to run a single query on multiple tables.

      The JOIN clause can be used to combine rows from two or more tables in a query result. It does this by finding a related column between the tables and sorts the results appropriately in the output.

      SELECT statements that include a JOIN clause generally follow this syntax:

      • SELECT table1.column1, table2.column2
      • FROM table1
      • JOIN table2 ON table1.related_column=table2.related_column;

      Note that because JOIN clauses compare the contents of more than one table, the previous example specifies which table to select each column from by preceding the name of the column with the name of the table and a period. You can specify which table a column should be selected from like this for any query, although it's not necessary when selecting from a single table, as we've done in the previous sections. Let's walk through an example using our sample data.

      Imagine that you wanted to buy each of your friends a pair of bowling shoes as a birthday gift. Because the information about your friends' birthdates and shoe sizes are held in separate tables, you could query both tables separately then compare the results from each. With a JOIN clause, though, you can find all the information you want with a single query:

      • SELECT tourneys.name, tourneys.size, dinners.birthdate
      • FROM tourneys
      • JOIN dinners ON tourneys.name=dinners.name;

      Output

      name | size | birthdate ---------+------+------------ Dolly | 8.5 | 1946-01-19 Etta | 9 | 1938-01-25 Irma | 7 | 1941-02-18 Barbara | 7.5 | 1948-12-25 Gladys | 8 | 1944-05-28 (5 rows)

      The JOIN clause used in this example, without any other arguments, is an inner JOIN clause. This means that it selects all the records that have matching values in both tables and prints them to the results set, while any records that aren't matched are excluded. To illustrate this idea, let's add a new row to each table that doesn't have a corresponding entry in the other:

      • INSERT INTO tourneys (name, wins, best, size)
      • VALUES ('Bettye', '0', '193', '9');
      • INSERT INTO dinners (name, birthdate, entree, side, dessert)
      • VALUES ('Lesley', '1946-05-02', 'steak', 'salad', 'ice cream');

      Then, re-run the previous SELECT statement with the JOIN clause:

      • SELECT tourneys.name, tourneys.size, dinners.birthdate
      • FROM tourneys
      • JOIN dinners ON tourneys.name=dinners.name;

      Output

      name | size | birthdate ---------+------+------------ Dolly | 8.5 | 1946-01-19 Etta | 9 | 1938-01-25 Irma | 7 | 1941-02-18 Barbara | 7.5 | 1948-12-25 Gladys | 8 | 1944-05-28 (5 rows)

      Notice that, because the tourneys table has no entry for Lesley and the dinners table has no entry for Bettye, those records are absent from this output.

      It is possible, though, to return all the records from one of the tables using an outer JOIN clause. Outer JOIN clauses are written as either LEFT JOIN, RIGHT JOIN, or FULL JOIN.

      A LEFT JOIN clause returns all the records from the “left” table and only the matching records from the right table. In the context of outer joins, the left table is the one referenced by the FROM clause, and the right table is any other table referenced after the JOIN statement.

      Run the previous query again, but this time use a LEFT JOIN clause:

      • SELECT tourneys.name, tourneys.size, dinners.birthdate
      • FROM tourneys
      • LEFT JOIN dinners ON tourneys.name=dinners.name;

      This command will return every record from the left table (in this case, tourneys) even if it doesn't have a corresponding record in the right table. Any time there isn't a matching record from the right table, it's returned as a blank value or NULL, depending on your RDBMS:

      Output

      name | size | birthdate ---------+------+------------ Dolly | 8.5 | 1946-01-19 Etta | 9 | 1938-01-25 Irma | 7 | 1941-02-18 Barbara | 7.5 | 1948-12-25 Gladys | 8 | 1944-05-28 Bettye | 9 | (6 rows)

      Now run the query again, this time with a RIGHT JOIN clause:

      • SELECT tourneys.name, tourneys.size, dinners.birthdate
      • FROM tourneys
      • RIGHT JOIN dinners ON tourneys.name=dinners.name;

      This will return all the records from the right table (dinners). Because Lesley's birthdate is recorded in the right table, but there is no corresponding row for her in the left table, the name and size columns will return as blank values in that row:

      Output

      name | size | birthdate ---------+------+------------ Dolly | 8.5 | 1946-01-19 Etta | 9 | 1938-01-25 Irma | 7 | 1941-02-18 Barbara | 7.5 | 1948-12-25 Gladys | 8 | 1944-05-28 | | 1946-05-02 (6 rows)

      Note that left and right joins can be written as LEFT OUTER JOIN or RIGHT OUTER JOIN, although the OUTER part of the clause is implied. Likewise, specifying INNER JOIN will produce the same result as just writing JOIN.

      There is a fourth join clause called FULL JOIN available for some RDBMS distributions, including PostgreSQL. A FULL JOIN will return all the records from each table, including any null values:

      • SELECT tourneys.name, tourneys.size, dinners.birthdate
      • FROM tourneys
      • FULL JOIN dinners ON tourneys.name=dinners.name;

      Output

      name | size | birthdate ---------+------+------------ Dolly | 8.5 | 1946-01-19 Etta | 9 | 1938-01-25 Irma | 7 | 1941-02-18 Barbara | 7.5 | 1948-12-25 Gladys | 8 | 1944-05-28 Bettye | 9 | | | 1946-05-02 (7 rows)

      Note: As of this writing, the FULL JOIN clause is not supported by either MySQL or MariaDB.

      As an alternative to using FULL JOIN to query all the records from multiple tables, you can use the UNION clause.

      The UNION operator works slightly differently than a JOIN clause: instead of printing results from multiple tables as unique columns using a single SELECT statement, UNION combines the results of two SELECT statements into a single column.

      To illustrate, run the following query:

      • SELECT name FROM tourneys UNION SELECT name FROM dinners;

      This query will remove any duplicate entries, which is the default behavior of the UNION operator:

      Output

      name --------- Irma Etta Bettye Gladys Barbara Lesley Dolly (7 rows)

      To return all entries (including duplicates) use the UNION ALL operator:

      • SELECT name FROM tourneys UNION ALL SELECT name FROM dinners;

      Output

      name --------- Dolly Etta Irma Barbara Gladys Bettye Dolly Etta Irma Barbara Gladys Lesley (12 rows)

      The names and number of the columns in the results table reflect the name and number of columns queried by the first SELECT statement. Note that when using UNION to query multiple columns from more than one table, each SELECT statement must query the same number of columns, the respective columns must have similar data types, and the columns in each SELECT statement must be in the same order. The following example shows what might result if you use a UNION clause on two SELECT statements that query a different number of columns:

      • SELECT name FROM dinners UNION SELECT name, wins FROM tourneys;

      Output

      ERROR: each UNION query must have the same number of columns LINE 1: SELECT name FROM dinners UNION SELECT name, wins FROM tourne...

      Another way to query multiple tables is through the use of subqueries. Subqueries (also known as inner or nested queries) are queries enclosed within another query. These are useful in cases where you're trying to filter the results of a query against the result of a separate aggregate function.

      To illustrate this idea, say you want to know which of your friends have won more matches than Barbara. Rather than querying how many matches Barbara has won then running another query to see who has won more games than that, you can calculate both with a single query:

      • SELECT name, wins FROM tourneys
      • WHERE wins > (
      • SELECT wins FROM tourneys WHERE name = 'Barbara'
      • );

      Output

      name | wins --------+------ Dolly | 7 Etta | 4 Irma | 9 Gladys | 13 (4 rows)

      The subquery in this statement was run only once; it only needed to find the value from the wins column in the same row as Barbara in the name column, and the data returned by the subquery and outer query are independent of one another. There are cases, though, where the outer query must first read every row in a table and compare those values against the data returned by the subquery in order to return the desired data. In this case, the subquery is referred to as a correlated subquery.

      The following statement is an example of a correlated subquery. This query seeks to find which of your friends have won more games than is the average for those with the same shoe size:

      • SELECT name, size FROM tourneys AS t
      • WHERE wins > (
      • SELECT AVG(wins) FROM tourneys WHERE size = t.size
      • );

      In order for the query to complete, it must first collect the name and size columns from the outer query. Then, it compares each row from that result set against the results of the inner query, which determines the average number of wins for individuals with identical shoe sizes. Because you only have two friends that have the same shoe size, there can only be one row in the result-set:

      Output

      name | size ------+------ Etta | 9 (1 row)

      As mentioned earlier, subqueries can be used to query results from multiple tables. To illustrate this with one final example, say you wanted to throw a surprise dinner for the group's all-time best bowler. You could find which of your friends has the best bowling record and return their favorite meal with the following query:

      • SELECT name, entree, side, dessert
      • FROM dinners
      • WHERE name = (SELECT name FROM tourneys
      • WHERE wins = (SELECT MAX(wins) FROM tourneys));

      Output

      name | entree | side | dessert --------+--------+-------+----------- Gladys | steak | fries | ice cream (1 row)

      Notice that this statement not only includes a subquery, but also contains a subquery within that subquery.

      Conclusion

      Issuing queries is one of the most commonly-performed tasks within the realm of database management. There are a number of database administration tools, such as phpMyAdmin or pgAdmin, that allow you to perform queries and visualize the results, but issuing SELECT statements from the command line is still a widely-practiced workflow that can also provide you with greater control.

      If you're new to working with SQL, we encourage you to use our SQL Cheat Sheet as a reference and to review the official PostgreSQL documenation. Additionally, if you'd like to learn more about SQL and relational databases, the following tutorials may be of interest to you:



      Source link

      How To Move a PostgreSQL Data Directory to a New Location on Ubuntu 18.04


      Introduction

      Databases grow over time, sometimes outgrowing the space on their original file system. When they’re located on the same partition as the rest of the operating system, this can also potentially lead to I/O contention.

      RAID, network block storage, and other devices can offer redundancy and improve scalability, along with other desirable features. Whether you’re adding more space, evaluating ways to optimize performance, or looking to take advantage of other storage features, this tutorial will guide you through relocating PostgreSQL’s data directory.

      Prerequisites

      To complete this guide, you will need:

      In this example, we’re moving the data to a block storage device mounted at /mnt/volume_nyc1_01. If you are using Block Storage on DigitalOcean, this guide can help you mount your volume before continuing with this tutorial.

      Regardless of what underlying storage you use, though, the following steps can help you move the data directory to a new location.

      Step 1 — Moving the PostgreSQL Data Directory

      Before we get started with moving PostgreSQL’s data directory, let’s verify the current location by starting an interactive PostgreSQL session. In the following command, psql is the command to enter the interactive monitor and -u postgres tells sudo to execute psql as the system’s postgres user:

      Once you have the PostgreSQL prompt opened up, use the following command to show the current data directory:

      Output

      data_directory ------------------------------ /var/lib/postgresql/10/main (1 row)

      This output confirms that PostgreSQL is configured to use the default data directory, /var/lib/postgresql/10/main, so that’s the directory we need to move. Once you've confirmed the directory on your system, type q and press ENTER to close the PostgreSQL prompt.

      To ensure the integrity of the data, stop PostgreSQL before you actually make changes to the data directory:

      • sudo systemctl stop postgresql

      systemctl doesn't display the outcome of all service management commands. To verify that you’ve successfully stopped the service, use the following command:

      • sudo systemctl status postgresql

      The final line of the output should tell you that PostgreSQL has been stopped:

      Output

      . . . Jul 12 15:22:44 ubuntu-512mb-nyc1-01 systemd[1]: Stopped PostgreSQL RDBMS.

      Now that the PostgreSQL server is shut down, we’ll copy the existing database directory to the new location with rsync. Using the -a flag preserves the permissions and other directory properties, while -v provides verbose output so you can follow the progress. We’re going to start the rsync from the postgresql directory in order to mimic the original directory structure in the new location. By creating that postgresql directory within the mount-point directory and retaining ownership by the PostgreSQL user, we can avoid permissions problems for future upgrades.

      Note: Be sure there is no trailing slash on the directory, which may be added if you use tab completion. If you do include a trailing slash, rsync will dump the contents of the directory into the mount point instead of copying over the directory itself.

      The version directory, 10, isn’t strictly necessary since we’ve defined the location explicitly in the postgresql.conf file, but following the project convention certainly won’t hurt, especially if there’s a need in the future to run multiple versions of PostgreSQL:

      • sudo rsync -av /var/lib/postgresql /mnt/volume_nyc1_01

      Once the copy is complete, we'll rename the current folder with a .bak extension and keep it until we’ve confirmed that the move was successful. This will help to avoid confusion that could arise from having similarly-named directories in both the new and the old location:

      • sudo mv /var/lib/postgresql/10/main /var/lib/postgresql/10/main.bak

      Now we’re ready to configure PostgreSQL to access the data directory in its new location.

      Step 2 — Pointing to the New Data Location

      By default, the data_directory is set to /var/lib/postgresql/10/main in the /etc/postgresql/10/main/postgresql.conf file. Edit this file to reflect the new data directory:

      • sudo nano /etc/postgresql/10/main/postgresql.conf

      Find the line that begins with data_directory and change the path which follows to reflect the new location. In the context of this tutorial, the updated directive will look like this:

      /etc/postgresql/10/main/postgresql.conf

      . . .
      data_directory = '/mnt/volume_nyc1_01/postgresql/10/main'
      . . .
      

      Save and close the file by pressing CTRL + X, Y, then ENTER. This is all you need to do to configure PostgreSQL to use the new data directory location. All that’s left at this point is to start the PostgreSQL service again and check that it is indeed pointing to the correct data directory.

      Step 3 — Restarting PostgreSQL

      After changing the data-directory directive in the postgresql.conf file, go ahead and start the PostgreSQL server using systemctl:

      • sudo systemctl start postgresql

      To confirm that the PostgreSQL server started successfully, check its status by again using systemctl:

      • sudo systemctl status postgresql

      If the service started correctly, you will see the following line at the end of this command’s output:

      Output

      . . . Jul 12 15:45:01 ubuntu-512mb-nyc1-01[1]: Started PostgreSQL RDBMS. . . .

      Lastly, to make sure that the new data directory is indeed in use, open the PostgreSQL command prompt.

      Check the value for the data directory again:

      Output

      data_directory ----------------------------------------- /mnt/volume_nyc1_01/postgresql/10/main (1 row)

      This confirms that PostgreSQL is using the new data directory location. Following this, take a moment to ensure that you’re able to access your database as well as interact with the data within. Once you’ve verified the integrity of any existing data, you can remove the backup data directory:

      • sudo rm -Rf /var/lib/postgresql/10/main.bak

      With that, you have successfully moved your PostgreSQL data directory to a new location.

      Conclusion:

      If you’ve followed along, your database should be running with its data directory in the new location and you’ve completed an important step toward being able to scale your storage. You might also want to take a look at 5 Common Server Setups For Your Web Application for ideas on how to create a server infrastructure to help you scale and optimize web applications.



      Source link

      How To Set Up Logical Replication with PostgreSQL 10 on Ubuntu 18.04


      Introduction

      When setting up an application for production, it’s often useful to have multiple copies of your database in place. The process of keeping database copies in sync is called replication. Replication can provide high-availability horizontal scaling for high volumes of simultaneous read operations, along with reduced read latencies. It also allows for peer-to-peer replication between geographically distributed database servers.

      PostgreSQL is an open-source object-relational database system that is highly extensible and compliant with ACID (Atomicity, Consistency, Isolation, Durability) and the SQL standard. Version 10.0 of PostgreSQL introduced support for logical replication, in addition to physical replication. In a logical replication scheme, high-level write operations are streamed from a master database server into one or more replica database servers. In a physical replication scheme, binary write operations are instead streamed from master to replica, producing a byte-for-byte exact copy of the original content. In cases where you would like to target a particular subset of data, such as off-load reporting, patching, or upgrading, logical replication can offer speed and flexibility.

      In this tutorial, you will configure logical replication with PostgreSQL 10 on two Ubuntu 18.04 servers, with one server acting as the master and the other as the replica. By the end of the tutorial you will be able to replicate data from the master server to the replica using logical replication.

      Prerequisites

      To follow this tutorial, you will need:

      Step 1 — Configuring PostgreSQL for Logical Replication

      There are several configuration settings you will need to modify to enable logical replication between your servers. First, you’ll configure Postgres to listen on the private network interface instead of the public one, as exposing data over the public network is a security risk. Then you’ll configure the appropriate settings to allow replication to db-replica.

      On db-master, open /etc/postgresql/10/main/postgresql.conf, the main server configuration file:

      • sudo nano /etc/postgresql/10/main/postgresql.conf

      Find the following line:

      /etc/postgresql/10/main/postgresql.conf

      ...
      #listen_addresses = 'localhost'         # what IP address(es) to listen on;
      ...
      

      Uncomment it by removing the #, and add your db_master_private_ip_address to enable connections on the private network:

      Note: In this step and the steps that follow, make sure to use the private IP addresses of your servers, and not their public IPs. Exposing a database server to the public internet is a considerable security risk.

      /etc/postgresql/10/main/postgresql.conf

      ...
      listen_addresses = 'localhost, db_master_private_ip_address'
      ...
      

      This makes db-master listen for incoming connections on the private network in addition to the loopback interface.

      Next, find the following line:

      /etc/postgresql/10/main/postgresql.conf

      ...
      #wal_level = replica                    # minimal, replica, or logical
      ...
      

      Uncomment it, and change it to set the PostgreSQL Write Ahead Log (WAL) level to logical. This increases the volume of entries in the log, adding the necessary information for extracting discrepancies or changes to particular data sets:

      /etc/postgresql/10/main/postgresql.conf

      ...
      wal_level = logical
      ...
      

      The entries on this log will be consumed by the replica server, allowing for the replication of the high-level write operations from the master.

      Save the file and close it.

      Next, let’s edit /etc/postgresql/10/main/pg_hba.conf, the file that controls allowed hosts, authentication, and access to databases:

      • sudo nano /etc/postgresql/10/main/pg_hba.conf

      After the last line, let’s add a line to allow incoming network connections from db-replica. We’ll use db-replica‘s private IP address, and specify that connections are allowed from all users and databases:

      /etc/postgresql/10/main/pg_hba.conf

      ...
      # TYPE      DATABASE        USER            ADDRESS                               METHOD
      ...
      host         all            all             db_replica_private_ip_address/32      md5
      

      Incoming network connections will now be allowed from db-replica, authenticated by a password hash (md5).

      Save the file and close it.

      Next, let’s set our firewall rules to allow traffic from db-replica to port 5432 on db-master:

      • sudo ufw allow from db_replica_private_ip_address to any port 5432

      Finally, restart the PostgreSQL server for the changes to take effect:

      • sudo systemctl restart postgresql

      With your configuration set to allow logical replication, you can now move on to creating a database, user role, and table.

      Step 2 — Setting Up a Database, User Role, and Table

      To test the functionality of your replication settings, let’s create a database, table, and user role. You will create an example database with a sample table, which you can then use to test logical replication between your servers. You will also create a dedicated user and assign them privileges over both the database and the table.

      First, open the psql prompt as the postgres user with the following command on both db-master and db-replica:

      Create a new database called example on both hosts:

      Note: The final ; in these commands is required. On interactive sessions, PostgreSQL will not execute SQL commands until you terminate them with a semicolon. Meta-commands (those starting with a backslash, like q and c) directly control the psql client itself, and are therefore exempt from this rule. For more on meta-commands and the psql client, please refer to the PostgreSQL documentation.

      Using the connect meta-command, connect to the databases you just created on each host:

      Create a new table called widgets with arbitrary fields on both hosts:

      • CREATE TABLE widgets
      • (
      • id SERIAL,
      • name TEXT,
      • price DECIMAL,
      • CONSTRAINT widgets_pkey PRIMARY KEY (id)
      • );
      • CREATE TABLE widgets
      • (
      • id SERIAL,
      • name TEXT,
      • price DECIMAL,
      • CONSTRAINT widgets_pkey PRIMARY KEY (id)
      • );

      The table on db-replica does not need to be identical to its db-master counterpart. However, it must contain every single column present on the table at db-master. Additional columns must not have NOT NULL or other constraints. If they do, replication will fail.

      On db-master, let's create a new user role with the REPLICATION option and a login password. The REPLICATION attribute must be assigned to any role used for replication. We will call our user sammy, but you can replace this with your own username. Make sure to also replace my_password with your own secure password:

      • CREATE ROLE sammy WITH REPLICATION LOGIN PASSWORD 'my_password';

      Make a note of your password, as you will use it later on db-replica to set up replication.

      Still on db-master, grant full privileges on the example database to the user role you just created:

      • GRANT ALL PRIVILEGES ON DATABASE example TO sammy;

      Next, grant privileges on all of the tables contained in the database to your user:

      • GRANT ALL PRIVILEGES ON ALL TABLES IN SCHEMA public TO sammy;

      The public schema is a default schema in each database into which tables are automatically placed.

      With these privileges set, you can now move on to making the tables in your example database available for replication.

      Step 3 — Setting Up a Publication

      Publications are the mechanism that PostgreSQL uses to make tables available for replication. The database server will keep track internally of the connection and replication status of any replica servers associated with a given publication. On db-master, you will create a publication, my_publication, that will function as a master copy of the data that will be sent to your subscribers — in our case, db-replica.

      On db-master, create a publication called my_publication:

      • CREATE PUBLICATION my_publication;

      Add the widgets table you created previously to it:

      • ALTER PUBLICATION my_publication ADD TABLE widgets;

      With your publication in place, you can now add a subscriber that will pull data from it.

      Step 4 — Creating a Subscription

      Subscriptions are used by PostgreSQL to connect to existing publications. A publication can have many subscriptions across different replica servers, and replica servers can also have their own publications with subscribers. To access the data from the table you created on db-master, you will need to create a subscription to the publication you created in the previous step, my_publication.

      On db-replica, let's create a subscription called my_subscription. The CREATE SUBSCRIPTION command will name the subscription, while the CONNECTION parameter will define the connection string to the publisher. This string will include the master server's connection details and login credentials, including the username and password you defined earlier, along with the name of the example database. Once again, remember to use db-master's private IP address, and replace my_password with your own password:

      • CREATE SUBSCRIPTION my_subscription CONNECTION 'host=db_master_private_ip_address port=5432 password=my_password user=sammy dbname=example' PUBLICATION my_publication;

      You will see the following output confirming the subscription:

      Output

      NOTICE: created replication slot "my_subscription" on publisher CREATE SUBSCRIPTION

      Upon creating a subscription, PostgreSQL will automatically sync any pre-existing data from the master to the replica. In our case there is no data to sync since the widgets table is empty, but this is a useful feature when adding new subscriptions to an existing database.

      With a subscription in place, let's test the setup by adding some demo data to the widgets table.

      Step 5 — Testing and Troubleshooting

      To test replication between our master and replica, let's add some data to the widgets table and verify that it replicates correctly.

      On db-master, insert the following data on the widgets table:

      • INSERT INTO widgets (name, price) VALUES ('Hammer', 4.50), ('Coffee Mug', 6.20), ('Cupholder', 3.80);

      On db-replica, run the following query to fetch all the entries on this table:

      You should now see:

      Output

      id | name | price ----+------------+------- 1 | Hammer | 4.50 2 | Coffee Mug | 6.20 3 | Cupholder | 3.80 (3 rows)

      Success! The entries have been successfully replicated from db-master to db-replica. From now on, all INSERT, UPDATE, and DELETE queries will be replicated across servers unidirectionally.

      One thing to note about write queries on replica servers is that they are not replicated back to the master server. PostgreSQL currently has limited support for resolving conflicts when the data between servers diverges. If there is a conflict, the replication will stop and PostgreSQL will wait until the issue is manually fixed by the database administrator. For that reason, most applications will direct all write operations to the master server, and distribute reads among available replica servers.

      You can now exit the psql prompt on both servers:

      Now that you have finished testing your setup, you can add and replicate data on your own.

      Troubleshooting

      If replication doesn't seem to be working, a good first step is checking the PostgreSQL log on db-replica for any possible errors:

      • tail /var/log/postgresql/postgresql-10-main.log

      Here are some common problems that can prevent replication from working:

      • Private networking is not enabled on both servers, or the servers are on different networks;
      • db-master is not configured to listen for connections on the correct private network IP;
      • The Write Ahead Log level on db-master is incorrectly configured (it must be set to logical);
      • db-master is not configured to accept incoming connections from the correct db-replica private IP address;
      • A firewall like UFW is blocking incoming PostgreSQL connections on port 5432;
      • There are mismatched table names or fields between db-master and db-replica;
      • The sammy database role is missing the required permissions to access the example database on db-master;
      • The sammy database role is missing the REPLICATION option on db-master;
      • The sammy database role is missing the required permissions to access the widgets table on db-master;
      • The table wasn't added to the publication on db-master.

      After resolving the existing problem(s), replication should take place automatically. If it doesn't, use following command to remove the existing subscription before recreating it:

      • DROP SUBSCRIPTION my_subscription;

      Conclusion

      In this tutorial you've successfully installed PostgreSQL 10 on two Ubuntu 18.04 servers and configured logical replication between them.

      You now have the required knowledge to experiment with horizontal read scaling, high availability, and the geographical distribution of your PostgreSQL database by adding additional replica servers.

      To learn more about logical replication in PostgreSQL 10, you can read the chapter on the topic on the official PostgreSQL documentation, as well as the manual entries on the CREATE PUBLICATION and CREATE SUBSCRIPTION commands.



      Source link