One place for hosting & domains

      How To Use Joins in SQL


      Introduction

      Many database designs separate information into different tables based on the relationships between certain data points. Even in cases like this, it’s likely that there will be times when someone will want to retrieve information from more than one table at a time.

      A common way of accessing data from multiple tables in a single Structured Query Language (SQL) operation is to combine the tables with a JOIN clause. Based on join operations in relational algebra, a JOIN clause combines separate tables by matching up rows in each table that relate to one another. Usually, this relationship is based on a pair of columns — one from each table — that share common values, such as one table’s foreign key and the primary key of another table that the foreign key references.

      This guide outlines how to construct a variety of SQL queries that include a JOIN clause. It also highlights different types of JOIN clauses, how they combine data from multiple tables, and how to alias column names to make writing JOIN operations less tedious.

      Prerequisites

      In order to follow this guide, you will need a computer running some type of relational database management system (RDBMS) that uses SQL. The instructions and examples in this guide were validated using the following environment:

      Note: Please note that many RDBMSs use their own unique implementations of SQL. Although the commands outlined in this tutorial will work on most RDBMSs, the exact syntax or output may differ if you test them on a system other than MySQL.

      • You’ll also need a database with some tables loaded with sample data which you can use to practice using JOIN operations. We encourage you to go through the following Connecting to MySQL and Setting up a Sample Database section for details on how to connect to a MySQL server and create the testing database used in examples throughout this guide.

      Connecting to and Setting up a Sample Database

      If your SQL database system runs on a remote server, SSH into your server from your local machine:

      Then open up the MySQL server prompt, replacing sammy with the name of your MySQL user account:

      Create a database named joinsDB:

      If the database was created successfully, you’ll receive output like this:

      Output

      Query OK, 1 row affected (0.01 sec)

      To select the joinsDB database, run the following USE statement:

      Output

      Database changed

      After selecting joinsDB, create a few tables within it. For the examples used in this guide, imagine that you run a factory and have decided to begin tracking information about your product line, employees on your sales team, and your company’s sales in an SQL database. You plan to start off with three tables, the first of which will store information about your products. You decide this first table needs three columns:

      • productID: each product’s identification number, expressed with the int data type. This column will serve as the table’s primary key, meaning that each value will function as a unique identifier for its respective row. Because every value in a primary key must be unique, this column will also have a UNIQUE constraint applied to it
      • productName: each product’s name, expressed using the varchar data type with a maximum of 20 characters
      • price: the price of each product, expressed using the decimal data type. This statement specifies that any values in this column are limited to a maximum of four digits in length with two of those digits to the right of the decimal point. Thus, the range of values allowed in this column goes from -99.99 to 99.99

      Create a table named products that has these three columns:

      • CREATE TABLE products (
      • productID int UNIQUE,
      • productName varchar(20),
      • price decimal (4,2),
      • PRIMARY KEY (productID)
      • );

      The second table will store information about the employees on your company’s sales team. You decide this table also needs three columns:

      • empID: similar to the productID column, this column will hold a unique identification number for each employee on the sales team expressed with the int data type. Likewise, this column will have a UNIQUE constraint applied to it and will serve as the primary key for the team table
      • empName: the name of each salesperson, expressed using the varchar data type with a maximum of 20 characters
      • productSpecialty: each member of your sales team has been assigned a product as their specialty; they can sell any product your company makes, but their overall focus will be on whatever product they specialize in. To indicate this in the table, you create this column which holds the productID value of whatever product each employee specializes in

      To ensure that the productSpecialty column only holds values that represent valid product ID numbers, you decide to apply a foreign key constraint to the column that references the products table’s productID column. A foreign key constraint is a way to express a relationship between two tables by requiring that values in the column on which it applies must already exist in the column that it references. In the following CREATE TABLE statement, the FOREIGN KEY constraint requires that any value added to the productSpecialty column in the team table must already exist in the products table’s productID column.

      Create a table named team with these three columns:

      • CREATE TABLE team (
      • empID int UNIQUE,
      • empName varchar(20),
      • productSpecialty int,
      • PRIMARY KEY (empID),
      • FOREIGN KEY (productSpecialty) REFERENCES products (productID)
      • );

      The last table you create will hold records of the company’s sales. This table will have four columns:

      • saleID: similar to the productID and empIDcolumns, this column will hold a unique identification number for each sale expressed with the int data type. This column will also have a UNIQUE constraint so it can serve as the primary key for the sales table
      • quantity: the number of units of each product sold, expressed with the int data type
      • productID: the identification number of the product sold, expressed as an int
      • salesperson: the identification number of the employee who made the sale

      Like the productSpecialty column from the team table, you decide to apply FOREIGN KEY constraints to both the productID and salesperson columns. This will ensure that these columns only contain values that already exist in the products table’s productID column and the team table’s empID columns, respectively.

      Create a table named sales with these four columns:

      • CREATE TABLE sales (
      • saleID int UNIQUE,
      • quantity int,
      • productID int,
      • salesperson int,
      • PRIMARY KEY (saleID),
      • FOREIGN KEY (productID) REFERENCES products (productID),
      • FOREIGN KEY (salesperson) REFERENCES team (empID)
      • );

      Following that, load the products table with some sample data by running the following INSERT INTO operation:

      • INSERT INTO products
      • VALUES
      • (1, 'widget', 18.99),
      • (2, 'gizmo', 14.49),
      • (3, 'thingamajig', 39.99),
      • (4, 'doodad', 11.50),
      • (5, 'whatzit', 29.99);

      Then load the team table with some sample data:

      • INSERT INTO team
      • VALUES
      • (1, 'Florence', 1),
      • (2, 'Mary', 4),
      • (3, 'Diana', 3),
      • (4, 'Betty', 2);

      Load the sales table with some sample data as well:

      • INSERT INTO sales
      • VALUES
      • (1, 7, 1, 1),
      • (2, 10, 5, 4),
      • (3, 8, 2, 4),
      • (4, 1, 3, 3),
      • (5, 5, 1, 3);

      Lastly, imagine that your company makes a few sales without the involvement of anyone on your sales team. To record these sales, run the following operation to add three rows to the sales table that don’t include a value for the salesperson column:

      • INSERT INTO sales (saleID, quantity, productID)
      • VALUES
      • (6, 1, 5),
      • (7, 3, 1),
      • (8, 4, 5);

      With that, you’re ready to follow the rest of the guide and begin learning about how to join tables together in SQL.

      Understanding the Syntax of JOIN Operations

      JOIN clauses can be used in a variety of SQL statements, including UPDATE and DELETE operations. For illustration purposes, though, the examples in this guide use SELECT queries to demonstrate how JOIN clauses work.

      The following example shows the general syntax of a SELECT statement that includes a JOIN clause:

      • SELECT table1.column1, table2.column2
      • FROM table1 JOIN table2
      • ON search_condition;

      This syntax begins with a SELECT statement that will return two columns from two separate tables. Note that because JOIN clauses compare the contents of more than one table, this example syntax specifies which table to select each column from by preceding the name of the column with the name of the table and a period. This is known as a fully qualified column reference.

      You can use fully qualified column references like these in any operation, but doing so is technically only necessary in operations where two columns from different tables share the same name. It’s good practice to use them when working with multiple tables, though, as they can help make JOIN operations easier to read and understand.

      After the SELECT clause comes the FROM clause. In any query, the FROM clause is where you define the data set that should be searched in order to return the desired data. The only difference here is that the FROM clause includes two tables separated by the JOIN keyword. A helpful way to think of writing queries is to remember that you SELECT which columns to return FROM which table you’d like to query.

      Following that is an ON clause, which describes how the query should join the two tables together by defining a search condition. A search condition is a set of one or more predicates, or expressions that can evaluate whether a certain condition is “true,” “false,” or “unknown.” It can be helpful to think of a JOIN operation as combining every row from both tables, and then returning any rows for which the search condition in the ON clause evaluates to “true”.

      In an ON clause, it usually makes sense to include a search condition that tests whether two related columns — like one table’s foreign key and the primary key of another table that the foreign key references — have values that are equal. This is sometimes referred to as an equi join.

      As an example of how equi joins match data from multiple tables, run the following query using the sample data you added previously. This statement will join the products and team tables with a search condition that tests for matching values in their respective productID and productSpecialty columns. It will then return the names of every member of the sales team, the name of each product they specialize in, and the price of those products:

      • SELECT team.empName, products.productName, products.price
      • FROM products JOIN team
      • ON products.productID = team.productSpecialty;

      Here is this query’s result set:

      Output

      +----------+-------------+-------+ | empName | productName | price | +----------+-------------+-------+ | Florence | widget | 18.99 | | Mary | doodad | 11.50 | | Diana | thingamajig | 39.99 | | Betty | gizmo | 14.49 | +----------+-------------+-------+ 4 rows in set (0.00 sec)

      To illustrate how SQL combines these tables to form this result set, let’s take a closer look at this process. To be clear, the following isn’t exactly what happens when a database management system joins two tables together, but it can be helpful to think of JOIN operations as following a procedure like this.

      First, the query prints every row and column in the first table in the FROM clause, products:

      JOIN Process Example

      +-----------+-------------+-------+ | productID | productName | price | +-----------+-------------+-------+ | 1 | widget | 18.99 | | 2 | gizmo | 14.49 | | 3 | thingamajig | 39.99 | | 4 | doodad | 11.50 | | 5 | whatzit | 29.99 | +-----------+-------------+-------+

      Then, it looks at each of these rows and matches any row from the team table whose productSpecialty is equal to the productID value in that row:

      JOIN Process Example

      +-----------+-------------+-------+-------+----------+------------------+ | productID | productName | price | empID | empName | productSpecialty | +-----------+-------------+-------+-------+----------+------------------+ | 1 | widget | 18.99 | 1 | Florence | 1 | | 2 | gizmo | 14.49 | 4 | Betty | 2 | | 3 | thingamajig | 39.99 | 3 | Diana | 3 | | 4 | doodad | 11.50 | 2 | Mary | 4 | | 5 | whatzit | 29.99 | | | | +-----------+-------------+-------+-------+----------+------------------+

      Then, it cuts any rows that don’t have a match and rearranges the columns based on their order in the SELECT clause, drops any columns that weren’t specified, resorts the rows, and returns the final result set:

      JOIN Process Example

      +----------+-------------+-------+ | empName | productName | price | +----------+-------------+-------+ | Florence | widget | 18.99 | | Mary | doodad | 11.50 | | Diana | thingamajig | 39.99 | | Betty | gizmo | 14.49 | +----------+-------------+-------+ 4 rows in set (0.00 sec)

      Using equi joins is the most common way to join tables, but it’s possible to use other SQL operators such as <, >, LIKE, NOT LIKE, or even BETWEEN in ON clause search conditions. Be aware, though, that using more complicated search conditions can make it difficult to predict what data will appear in the result set.

      In most implementations, you can join tables with any set of columns that have what the SQL standard refers to as “JOIN eligible” data type. This means that, in general, it’s possible to join a column that holds numeric data with any other column that holds numeric data, regardless of their respective data types. Likewise, it’s usually possible to join any column that holds character values with any other column holding character data. As stated previously, though, the columns you match to join two tables will typically be ones that already signify a relationship between the tables, like a foreign key and the primary key of another table that it references.

      Many SQL implementations also allow you to join columns that have the same name with the USING keyword instead of ON. This is how the syntax for such an operation might look:

      • SELECT table1.column1, table2.column2
      • FROM table1 JOIN table2
      • USING (related_column);

      In this example syntax, the USING clause is equivalent to ON table1.related_column = table2.related_column;.

      Because sales and products each have a column named productID, you can join them by matching these columns with the USING keyword. The following command does this, and returns the saleID of each sale, the quantity of units sold, the name of each product sold, and its price. Also, it sorts the result set in ascending order based on the saleID value:

      • SELECT sales.saleID, sales.quantity, products.productName, products.price
      • FROM sales JOIN products
      • USING (productID)
      • ORDER BY saleID;

      Output

      +--------+----------+-------------+-------+ | saleID | quantity | productName | price | +--------+----------+-------------+-------+ | 1 | 7 | widget | 18.99 | | 2 | 10 | whatzit | 29.99 | | 3 | 8 | gizmo | 14.49 | | 4 | 1 | thingamajig | 39.99 | | 5 | 5 | widget | 18.99 | | 6 | 1 | whatzit | 29.99 | | 7 | 3 | widget | 18.99 | | 8 | 4 | whatzit | 29.99 | +--------+----------+-------------+-------+ 8 rows in set (0.00 sec)

      When joining tables, the database system will sometimes rearrange rows in ways that aren’t easy to predict. Including an ORDER BY clause like this can help make result sets more coherent and readable.

      Joining More than Two Tables

      There may be times when you need to combine data from more than just two tables. You can join any number of tables together by embedding JOIN clauses within other JOIN clauses. The following syntax is an example of how this can look when joining three tables:

      • SELECT table1.column1, table2.column2, table3.column3
      • FROM table1 JOIN table2
      • ON table1.related_column = table2.related_column
      • JOIN table3
      • ON table3.related_column = table1_or_2.related_column;

      This example syntax’s FROM clause starts by joining table1 with table2. After this joining’s ON clause, it starts a second JOIN that combines the initial set of joined tables with table3. Note that the third table can be joined to a column in either the first or second table.

      To illustrate, imagine that you want to know how much revenue your employee’s sales have brought in, but you only care about sales records that involve an employee selling the product they specialize in.

      To get this information, you could run the following query. This query starts by joining the products and sales tables together by matching their respective productID columns. It then joins the team table to the first two by matching each row in the initial JOIN to its productSpecialty column. The query then filters the results with a WHERE clause to only return rows where the matched employee was also the person who made the sale. This query also includes an ORDER BY clause that sorts the final results in ascending order based on the value in the saleID column:

      • SELECT sales.saleID,
      • team.empName,
      • products.productName,
      • (sales.quantity * products.price)
      • FROM products JOIN sales
      • USING (productID)
      • JOIN team
      • ON team.productSpecialty = sales.productID
      • WHERE team.empID = sales.salesperson
      • ORDER BY sales.saleID;

      Note that among the columns listed in this query’s SELECT clause is an expression that multiplies the values in the sales table’s quantity column by the products table’s price values. It returns the products of these values in the matched rows:

      Output

      +--------+----------+-------------+-----------------------------------+ | saleID | empName | productName | (sales.quantity * products.price) | +--------+----------+-------------+-----------------------------------+ | 1 | Florence | widget | 132.93 | | 3 | Betty | gizmo | 115.92 | | 4 | Diana | thingamajig | 39.99 | +--------+----------+-------------+-----------------------------------+ 3 rows in set (0.00 sec)

      All the examples so far have featured the same type of JOIN clause: the INNER JOIN. For an overview of INNER joins, OUTER joins, and how they differ, continue reading the next section.

      Inner vs. Outer JOIN Operations

      There are two main types of JOIN clauses: INNER joins and OUTER joins. The difference between these two types of joins has to do with what data they return. INNER join operations return only matching rows from each joined table, while OUTER joins return both matching and non-matching rows.

      The example syntaxes and queries from the previous sections all used INNER JOIN clauses even though none of them include the INNER keyword. Most SQL implementations treat any JOIN clause as an INNER join unless explicitly stated otherwise.

      Queries that specify an OUTER JOIN combine multiple tables and return any rows that match as well as rows that do not match. This can be useful for finding rows with missing values, or in cases where partial matches are acceptable.

      OUTER join operations can be further divided into three types: LEFT OUTER joins, RIGHT OUTER joins, and FULL OUTER joins. LEFT OUTER joins, or just LEFT joins, return every matching row from the two joined tables, as well as every non-matching row from the “left” table. In the context of JOIN operations, the “left” table is always the first table specified immediately after the FROM keyword and to the left of the JOIN keyword. Likewise, the “right” table is the second table, or the one immediately following JOIN, and RIGHT OUTER joins return every matching row from the joined tables along with every non-matching row from the “right” table. A FULL OUTER JOIN returns every row from both tables, including any rows from either table that don’t have matches.

      To illustrate how these different types of JOIN clauses return data, run the following example queries on the tables created in the previous Connecting to and Setting up a Sample Database subsection. These queries are identical except that each specifies a different type of JOIN clause.

      This first example uses an INNER JOIN to combine the sales and team tables together by matching their respective salesperson and empID columns. Again, the INNER keyword is implied even though it’s not explicitly included:

      • SELECT sales.saleID, sales.quantity, sales.salesperson, team.empName
      • FROM sales JOIN team
      • ON sales.salesperson = team.empID;

      Because this query uses an INNER JOIN clause, it only returns matching rows from both tables:

      Output

      +--------+----------+-------------+----------+ | saleID | quantity | salesperson | empName | +--------+----------+-------------+----------+ | 1 | 7 | 1 | Florence | | 4 | 1 | 3 | Diana | | 5 | 5 | 3 | Diana | | 2 | 10 | 4 | Betty | | 3 | 8 | 4 | Betty | +--------+----------+-------------+----------+ 5 rows in set (0.00 sec)

      This version of the query uses a LEFT OUTER JOIN clause instead:

      • SELECT sales.saleID, sales.quantity, sales.salesperson, team.empName
      • FROM sales LEFT OUTER JOIN team
      • ON sales.salesperson = team.empID;

      Like the previous query, this one also returns every matching value from both tables. However, it also returns any values from the “left” table (in this case, sales) that don’t have matches in the “right” table (team). Because these rows in the left table don’t have matches in the right, the unmatched values are returned as NULL:

      Output

      +--------+----------+-------------+----------+ | saleID | quantity | salesperson | empName | +--------+----------+-------------+----------+ | 1 | 7 | 1 | Florence | | 2 | 10 | 4 | Betty | | 3 | 8 | 4 | Betty | | 4 | 1 | 3 | Diana | | 5 | 5 | 3 | Diana | | 6 | 1 | NULL | NULL | | 7 | 3 | NULL | NULL | | 8 | 4 | NULL | NULL | +--------+----------+-------------+----------+ 8 rows in set (0.00 sec)

      This next version of the query instead uses a RIGHT JOIN clause:

      • SELECT sales.saleID, sales.quantity, sales.salesperson, team.empName
      • FROM sales RIGHT JOIN team
      • ON sales.salesperson = team.empID;

      Notice that this query’s JOIN clause reads RIGHT JOIN instead of RIGHT OUTER JOIN. Similarly to how the INNER keyword isn’t required to specify an INNER JOIN clause, OUTER is implied any time you write LEFT JOIN or RIGHT JOIN.

      This query’s result is the opposite of the previous one in that it returns every row from both tables, but only the unmatched rows from the “right” table:

      Output

      +--------+----------+-------------+----------+ | saleID | quantity | salesperson | empName | +--------+----------+-------------+----------+ | 1 | 7 | 1 | Florence | | NULL | NULL | NULL | Mary | | 4 | 1 | 3 | Diana | | 5 | 5 | 3 | Diana | | 2 | 10 | 4 | Betty | | 3 | 8 | 4 | Betty | +--------+----------+-------------+----------+ 6 rows in set (0.00 sec)

      Note: Be aware that MySQL doesn’t support FULL OUTER JOIN clauses. To illustrate what data this query would return if it used a FULL OUTER JOIN clause, here’s what the result set would look like on a PostgreSQL database:

      • SELECT sales.saleID, sales.quantity, sales.salesperson, team.empName
      • FROM sales FULL OUTER JOIN team
      • ON sales.salesperson = team.empID;

      Output

      saleid | quantity | salesperson | empname --------+----------+-------------+---------- 1 | 7 | 1 | Florence 2 | 10 | 4 | Betty 3 | 8 | 4 | Betty 4 | 1 | 3 | Diana 5 | 5 | 3 | Diana 6 | 1 | | 7 | 3 | | 8 | 4 | | | | | Mary (9 rows)

      As this output indicates, the FULL JOIN returns every row in both tables including the unmatched ones.

      Aliasing Table and Column Names in JOIN Clauses

      When joining tables with long or highly descriptive names, having to write multiple fully qualified column references can become tedious. To avoid this, users sometimes find it helpful to provide table or column names with shorter aliases.

      You can do this in SQL by following any table definition in the FROM clause with the AS keyword, and then following that with an alias of your choice:

      • SELECT t1.column1, t2.column2
      • FROM table1 AS t1 JOIN table2 AS t2
      • ON t1.related_column = t2.related_column;

      This example syntax uses aliases in the SELECT clause even though they aren’t defined until the FROM clause. This is possible because, in SQL queries, the order of execution begins with the FROM clause. This can be confusing, but it’s helpful to remember this and think of your aliases before you begin writing the query.

      As an example, run the following query which joins the sales and products tables and provides them with the aliases S and P, respectively:

      • SELECT S.saleID, S.quantity,
      • P.productName,
      • (P.price * S.quantity) AS revenue
      • FROM sales AS S JOIN products AS P
      • USING (productID);

      Notice that this example also creates a third alias, revenue, for the product of the values in the sales table’s quantity column and their matching values from the products table’s price column. This is only apparent in the column name in the result set, but providing aliases like this can be helpful for conveying the meaning or purpose behind query results:

      Output

      +--------+----------+-------------+---------+ | saleID | quantity | productName | revenue | +--------+----------+-------------+---------+ | 1 | 7 | widget | 132.93 | | 2 | 10 | whatzit | 299.90 | | 3 | 8 | gizmo | 115.92 | | 4 | 1 | thingamajig | 39.99 | | 5 | 5 | widget | 94.95 | | 6 | 1 | whatzit | 29.99 | | 7 | 3 | widget | 56.97 | | 8 | 4 | whatzit | 119.96 | +--------+----------+-------------+---------+ 8 rows in set (0.00 sec)

      Note that when defining an alias the AS keyword is technically optional. The previous example could also be written like this:

      • SELECT S.saleID, S.quantity, P.productName, (P.price * S.quantity) revenue
      • FROM sales S JOIN products P
      • USING (productID);

      Even though the AS keyword isn’t needed to define an alias, it’s considered a good practice to include it. Doing so can help keep the query’s purpose clear and improve its readability.

      Conclusion

      By reading this guide, you learned how to use JOIN operations to combine separate tables into a single query result set. While the commands shown here should work on most relational databases, be aware that every SQL database uses its own unique implementation of the language. You should consult your DBMS’s official documentation for a more complete description of each command and their full sets of options.

      If you’d like to learn more about working with SQL, we encourage you to check out the other tutorials in this series on How To Use SQL.



      Source link