
      How to Create a DigitalOcean Droplet from an Ubuntu ISO Format Image


      DigitalOcean’s Custom Images feature allows you to bring your virtual disk images from an on-premises environment or another cloud platform to DigitalOcean and use them to start DigitalOcean Droplets.

      As described in the Custom Images documentation, the following image types are supported natively by the Custom Images upload tool:

      ISO is another popular image format which you may want to use with Custom Images. ISO images are frequently provided by Linux distributions as a convenient method for installing Linux. Unfortunately, ISO images aren’t currently supported by the upload tool, although support is planned for the end of 2018.

      In this tutorial, we’ll demonstrate how to use the free and open-source VirtualBox virtualization tool to create a DigitalOcean-compatible VDI image (VirtualBox Disk Image) from an Ubuntu 18.04 ISO. The steps in this guide can be adapted to work with your preferred distribution’s ISO images.


      Before you begin, you’ll need the following available to you:

      If you’re adapting these steps for another distribution’s ISO and your image does not have cloud-init installed and configured, you must install and configure it manually after installing the OS.

      Once you have these prerequisites available to you, you’re ready to begin with this guide.

      Step 1 — Installing VirtualBox and Creating a Virtual Machine

      The tool we’ll use to convert the ISO-format image in this guide is VirtualBox, a free and open-source virtualizer for x86 hardware. By default, VirtualBox uses a GUI, which we’ll use to create the VDI image in this guide.

      To begin, download and install VirtualBox from the downloads page. Follow the appropriate link in the VirtualBox 5.2.20 platform packages section for your host operating system. In this guide, we’ll be using a macOS system, so we’ll download and install VirtualBox using the provided DMG.

      Once you’ve installed VirtualBox, open the application.

      You should see the following welcome screen:

      VirtualBox Welcome Screen

      Click on New to begin creating your Ubuntu virtual machine.

      The following window should pop up, allowing you to name your virtual machine (VM) and select its OS:

      Name Virtual Machine Window

      In this tutorial, we’ll name our VM Ubuntu 18.04, but feel free to give the VM a more descriptive name.

      For Type, select Linux, and for Version, select Ubuntu (64-bit). Then, hit Continue.

      The following screen should appear, allowing you to specify how much memory to allocate to your virtual machine:

      Allocate Memory Window

      Unless you have a more complex use case, 1024 MB should be enough memory for your virtual machine. If you need to adjust memory size, enter the amount of memory to be allocated to the VM, then hit Continue.

      You should see the following screen:

      Create Hard Disk Window

      This window allows you to create a virtual hard disk for your VM. This virtual hard disk is the image that you’ll upload to DigitalOcean in a later step: the Ubuntu operating system will be installed onto it from the ISO you downloaded. Make sure Create a virtual hard disk now is selected, and hit Create.

      The following Hard disk file type window should appear, allowing you to select the format you’d like to use for your image:

      Select Hard Disk Type Window

      All three types are supported by DigitalOcean Custom Images, so unless you have a strong preference, select VDI (VirtualBox Disk Image). Hit Continue.

      You should then see the following window:

      Hard Disk Options

      This window allows you to choose between a Dynamically allocated or Fixed size hard disk file. We’ll use the default Dynamically allocated option and allow the file to grow as we install the Ubuntu OS and packages. Hit Continue.

      The next window allows you to name your hard disk file (as well as choose the path to which it will be saved), and specify its maximum size:

      Hard Disk Size

      Be sure to give yourself enough disk space to install the operating system as well as additional packages you may need. The default 10 GB should be fine for most purposes, but if you anticipate installing a large number of packages or storing a lot of data in the image, you should bump this up to your anticipated disk usage.

      Once you’ve selected the size of the virtual hard disk, hit Create.

      At this point, you’ll be returned to the initial welcome screen, where you’ll see the virtual machine you just created:

      VM Welcome Screen

      We can now begin installing Ubuntu onto the virtual machine.

      Step 2 — Installing Ubuntu 18.04 onto the Virtual Machine

      In this step we’ll install and configure the Ubuntu operating system onto our virtual machine.

      To begin, from the VirtualBox welcome screen, select your virtual machine, and hit the Start button in the toolbar.

      You should see the following virtual machine window, prompting you to select the ISO file from which you’ll boot the system:

      Select ISO

      Select the Ubuntu 18.04 Server ISO you downloaded, and hit Start.

      In the VM, the Ubuntu installer will begin booting from the ISO, and you should be brought to the following menu:

      Ubuntu Select Language

      Choose your preferred language using the arrow keys, and hit ENTER to continue.

      You should then see the following Keyboard configuration screen:

      Ubuntu Keyboard Config

      Choose your preferred keyboard configuration, select Done, and hit ENTER.

      Next, you’ll be brought to the following installer selection screen:

      Ubuntu Installer Selection

      Select Install Ubuntu, and hit ENTER.

      The following Network connections screen should appear:

      Ubuntu Network connections

      This screen allows you to configure the network interfaces for your Ubuntu server. Since we’re performing the installation on a virtual machine, we’ll just use the default option, as the configured interface will be overwritten when we launch the image on the DigitalOcean platform.

      Select Done and hit ENTER.

      You’ll then be brought to the following Configure proxy screen:

      Ubuntu Configure Proxy

      If you require a proxy, enter it here. Then, select Done, and hit ENTER.

      The next screen will allow you to choose an Ubuntu archive mirror:

      Ubuntu Archive Mirror

      Unless you require a specific mirror, the default should be fine here. Select Done and hit ENTER.

      Next, you’ll be prompted to partition your virtual disk:

      Ubuntu Partition Disk

      Unless you’d like to set up Logical Volume Manager (LVM) or manually partition the virtual disk, select Use An Entire Disk to use the entire attached virtual disk, and hit ENTER.

      The following screen allows you to select the virtual disk that will be partitioned:

      Ubuntu Filesystem setup

      As described in the prompt text, the installer will create a partition for the bootloader, and use the remaining virtual disk space to create an ext4 partition to which the Ubuntu OS will be installed.

      Select the attached virtual disk and hit ENTER.

      The following screen displays a summary of the filesystem installer options before partitioning:

      Ubuntu Filesystem Summary

      The ext4 partition will be mounted to /, and a second partition (1 MB) will be created for the GRUB bootloader. Once you’ve gone over and confirmed the partitioning scheme for your virtual disk, select Done and hit ENTER.

      In the confirmation screen that appears, select Continue and hit ENTER.

      The next screen will allow you to configure the system hostname, as well as an Ubuntu user:

      Ubuntu Create User

      Note that as you fill out this screen, the installer will continue copying files to the virtual disk in the background.

      In this tutorial, we’ll create a user named sammy and call our server ubuntu. The server name will likely be overwritten when this image is run on the DigitalOcean platform, so feel free to give it a temporary name here.

      You can upload your SSH keys to DigitalOcean and automatically embed them into created Droplets, so for now we won’t Import SSH identity. To learn how to upload your SSH keys to DigitalOcean, consult the Droplet Product Documentation.

      Once you’ve filled in all the required fields, the prompt should look something like this:

      Ubuntu Profile Complete

      Select Done and hit ENTER.

      The next screen will prompt you to select popular snaps for your Ubuntu server. Snaps are prepackaged bundles of software that contain an application, its dependencies, and configuration. To learn more about snaps, consult the Snap Documentation.

      Ubuntu Select Snaps

      In this guide we won’t install any snaps and will manually install packages in a later step. If you’d like to install a snap, select or deselect it using SPACE and scroll down to Done. Then, hit ENTER.

      Regardless of your selection in the snap screen, you’ll then be brought to an installation progress and summary screen:

      Ubuntu Install Progress

      Once the installation completes, select Reboot Now and hit ENTER.

      The installer will shut down and prompt you to remove the installation medium (in this case, the ISO image we selected earlier). In most cases, the ISO will be detached automatically upon reboot, so you can simply hit ENTER.

      To double-check, in the VirtualBox GUI menu, navigate to Devices, and then Optical Drives. If the Remove disk from virtual drive option is available, click it to detach the ISO from the virtual machine. Then, back in the virtual machine window, hit ENTER.

      The system will reboot in the virtual machine, this time from the virtual disk to which we installed Ubuntu.

      Since cloud-init is installed by default on Ubuntu 18.04 Server, the first time Ubuntu boots, cloud-init will run and configure itself. In the virtual machine window, you should see some cloud-init log items and have a prompt available to you. Hit ENTER.

      You can then log in to your Ubuntu server using the user you created in the installer.

      Enter your username and hit ENTER, then enter your password and hit ENTER.

      You should now have access to a command prompt, indicating that you’ve successfully completed the Ubuntu 18.04 installation, and are now logged in as the user you created previously.

      In the next step of this guide, we’ll reconfigure cloud-init and set it up to run when the Ubuntu image is launched as a Droplet on the DigitalOcean platform.

      Step 3 — Reconfiguring cloud-init

      Now that we’ve installed Ubuntu 18.04 to a virtual disk and have the system up and running, we need to reconfigure cloud-init to use the appropriate datasource for the DigitalOcean platform. A cloud-init datasource is a source of config data for cloud-init that typically consists of userdata (like shell scripts) or server metadata, like hostname, instance-id, etc. To learn more about cloud-init datasources, consult the official cloud-init docs.

      By default, on Ubuntu 18.04, cloud-init configures itself to use the DataSourceNoCloud datasource. This will cause problems when running the image on DigitalOcean, so we need to reconfigure cloud-init to use the ConfigDrive datasource and ensure that cloud-init reruns when the image is launched on DigitalOcean.

      To begin, ensure that you’ve started your Ubuntu 18.04 virtual machine and have logged in as the user you created earlier.

      From the command line, navigate to the /etc/cloud/cloud.cfg.d directory:

      • cd /etc/cloud/cloud.cfg.d

      Use the ls command to list the cloud-init config files present in the directory:

      • ls

      05_logging.cfg 50-curtin-networking.cfg 90_dpkg.cfg curtin-preserve-sources.cfg README

      First, delete the 50-curtin-networking.cfg file, which configures networking interfaces for your Ubuntu server. When the image is launched on DigitalOcean, cloud-init will run and reconfigure these interfaces automatically. If this file is not deleted, the DigitalOcean Droplet created from this Ubuntu image will have its interfaces misconfigured and won't be accessible from the internet.

      • sudo rm 50-curtin-networking.cfg

      Next, we'll run dpkg-reconfigure cloud-init to remove the NoCloud datasource, ensuring that cloud-init searches for and finds the ConfigDrive datasource used on DigitalOcean:

      • sudo dpkg-reconfigure cloud-init

      You should see the following graphical menu:

      Cloud Init dpkg Menu

      The NoCloud datasource is initially highlighted. Press SPACE to unselect it, then hit ENTER.
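      For reference, dpkg-reconfigure records this choice as a datasource_list entry in /etc/cloud/cloud.cfg.d/90_dpkg.cfg (one of the files listed earlier). The Python sketch below illustrates the effect of deselecting NoCloud as a plain text transformation on a one-line, flow-style datasource_list. The function name and the sample file contents are illustrative assumptions. On a real system, keep using dpkg-reconfigure rather than editing the file by hand.

```python
import re

# Matches a one-line, flow-style entry such as "datasource_list: [ A, B ]".
DATASOURCE_LINE = re.compile(r"^(datasource_list:[ \t]*)\[(.*)\][ \t]*$", re.MULTILINE)


def drop_datasource(cfg_text, name):
    """Return cfg_text with `name` removed from its datasource_list line."""
    def rewrite(match):
        items = [item.strip() for item in match.group(2).split(",") if item.strip()]
        kept = [item for item in items if item != name]
        return "{}[ {} ]".format(match.group(1), ", ".join(kept))

    # Lines that are not a flow-style datasource_list are left untouched.
    return DATASOURCE_LINE.sub(rewrite, cfg_text)


sample = (
    "# hypothetical 90_dpkg.cfg contents\n"
    "datasource_list: [ NoCloud, ConfigDrive, None ]\n"
)
print(drop_datasource(sample, "NoCloud"))
```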

      Next, navigate to /etc/netplan:

      • cd /etc/netplan

      Remove the 50-cloud-init.yaml file (this was generated from the cloud-init networking file we removed earlier):

      • sudo rm 50-cloud-init.yaml

      The final step is ensuring that we clean up configuration from the initial cloud-init run so that it reruns when the image is launched on DigitalOcean.

      To do this, run cloud-init clean:

      • sudo cloud-init clean
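      Before shutting down, you may want to double-check this step's changes. The hypothetical Python helper below is a sketch, not a DigitalOcean or cloud-init tool. It looks for the leftovers this step is meant to remove: the two stale networking files, NoCloud still listed in 90_dpkg.cfg, and per-instance state under /var/lib/cloud/instance, which cloud-init clean is assumed to delete. It takes a root directory so you can try it against a scratch tree, and it uses only the standard library.

```python
import sys
from pathlib import Path
from typing import List

# Paths (relative to the filesystem root) that should be gone after this step.
STALE_FILES = (
    "etc/cloud/cloud.cfg.d/50-curtin-networking.cfg",
    "etc/netplan/50-cloud-init.yaml",
)


def image_problems(root: str = "/") -> List[str]:
    """Return leftovers that could stop cloud-init from reconfiguring the Droplet."""
    base = Path(root)
    problems = []  # type: List[str]

    for rel in STALE_FILES:
        if (base / rel).exists():
            problems.append("remove /" + rel)

    # NoCloud should no longer appear in the dpkg-managed datasource list.
    dpkg_cfg = base / "etc/cloud/cloud.cfg.d/90_dpkg.cfg"
    if dpkg_cfg.exists() and "NoCloud" in dpkg_cfg.read_text():
        problems.append("deselect NoCloud with: sudo dpkg-reconfigure cloud-init")

    # Assumption: cloud-init clean removes this per-instance link.
    instance = base / "var/lib/cloud/instance"
    if instance.exists() or instance.is_symlink():
        problems.append("run: sudo cloud-init clean")

    return problems


if __name__ == "__main__":
    found = image_problems(sys.argv[1] if len(sys.argv) > 1 else "/")
    print("\n".join(found) if found else "No leftovers found.")
```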

      At this point, your image is ready to be launched on the DigitalOcean platform. If you like, you can install additional packages and software into your image now. Once you're done, shut down your virtual machine:

      • sudo shutdown -h now

      We can now move on to uploading and launching this custom image on the DigitalOcean platform.

      Step 4 — Uploading Custom Image and Creating Droplet

      Now that we've created an Ubuntu 18.04 VDI image and configured it for use on DigitalOcean, we can upload it using the Custom Images upload tool.

      On macOS, the Ubuntu virtual disk image we created and configured will be located by default at ~/VirtualBox VMs/your_VM_name/your_virtual_disk_name.vdi. This path may vary slightly depending on the OS you're using with VirtualBox.

      Before we upload the image, we'll compress it to speed up the file transfer to DigitalOcean.

      On your host OS (not inside the virtual machine), navigate to the directory containing your VDI image file:

      • cd "$HOME/VirtualBox VMs/Ubuntu 18.04/"

      Now, use gzip to compress the file:

      • gzip < "Ubuntu 18.04.vdi" > "Ubuntu 18.04.gz"

      In this command, we redirect the source Ubuntu 18.04.vdi file into gzip's standard input and write the compressed output to the Ubuntu 18.04.gz file. The quotes are required because the file names contain spaces.
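      If you'd rather script the compression step, Python's standard gzip and shutil modules can perform the same streaming compression without loading the whole image into memory. This is a sketch. The paths mirror this tutorial's example VM and the default macOS location, so adjust them to match your own VM and disk names.

```python
import gzip
import shutil
from pathlib import Path


def compress_image(src: Path, dest: Path) -> Path:
    """Gzip-compress src into dest, streaming 1 MiB at a time."""
    with src.open("rb") as fin, gzip.open(str(dest), "wb") as fout:
        shutil.copyfileobj(fin, fout, 1024 * 1024)
    return dest


if __name__ == "__main__":
    # Default VirtualBox location on macOS; Path handles the spaces without quoting.
    vm_dir = Path.home() / "VirtualBox VMs" / "Ubuntu 18.04"
    compress_image(vm_dir / "Ubuntu 18.04.vdi", vm_dir / "Ubuntu 18.04.gz")
```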

      Once gzip finishes compressing your file, upload the .gz file to DigitalOcean, following instructions in the Custom Images Quickstart.

      You should now be able to create and use Droplets from your custom Ubuntu 18.04 Server image.


      In this tutorial, we learned how to create a custom VDI image from a vanilla Ubuntu 18.04 ISO using the VirtualBox virtualization tool. We adjusted cloud-init so it can properly configure Droplet networking on DigitalOcean, and finally compressed and uploaded the image using the Custom Images upload tool.

      You can adjust the steps in this tutorial to work with your preferred Linux distribution’s ISO images. Ensure that you have an SSH server installed and configured to start on boot, and that cloud-init has been installed and properly configured to use the ConfigDrive datasource. Finally, ensure that any stale networking configuration files have been purged.

      You may also wish to use a tool like Packer to automate the creation of your machine images.

      To learn more about DigitalOcean Custom Images, consult the Custom Images product docs and launch blog post.


      How to Use Block Storage with Your Linode

      Updated by Linode

      Written by Linode


      Linode’s Block Storage service allows you to attach additional storage volumes to your Linode. A single volume can range from 10 GiB to 10,000 GiB in size and costs $0.10/GiB per month. Volumes can be partitioned however you like and can accommodate any filesystem type you choose. Up to eight volumes can be attached to a single Linode, whether it is new or already existing, so you do not need to recreate your server to add a Block Storage Volume.
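      The limits in this paragraph can be expressed as a small planning helper. This Python sketch assumes the figures quoted above: 10 to 10,000 GiB per volume, $0.10 per GiB per month, and at most eight volumes per Linode. Check Linode's current pricing before relying on it. The function name is hypothetical.

```python
from decimal import Decimal

PRICE_PER_GIB = Decimal("0.10")  # USD per GiB per month, as quoted above
MIN_GIB, MAX_GIB = 10, 10_000    # allowed size range for one volume
MAX_VOLUMES_PER_LINODE = 8


def monthly_cost(sizes_gib: list) -> Decimal:
    """Validate the volumes planned for one Linode and return their monthly cost."""
    if len(sizes_gib) > MAX_VOLUMES_PER_LINODE:
        raise ValueError(f"at most {MAX_VOLUMES_PER_LINODE} volumes per Linode")
    for size in sizes_gib:
        if not MIN_GIB <= size <= MAX_GIB:
            raise ValueError(f"{size} GiB is outside {MIN_GIB}-{MAX_GIB} GiB")
    return PRICE_PER_GIB * sum(sizes_gib)


print(monthly_cost([20, 100]))  # 12.00
```

      Decimal is used instead of float so that per-GiB prices add up exactly.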

      The Block Storage service is currently available in the Dallas, Fremont, Frankfurt, London, Newark, and Singapore data centers.


      • Linode’s backup services do not cover Block Storage Volumes. You must execute your own backups for this data.

      • Your Linode must be running in Paravirtualization mode. Block storage currently does not support Full-virtualization.

      How to Add a Block Storage Volume to a Linode

      This guide assumes a Linode with the root disk mounted as /dev/sda and swap space on /dev/sdb. In this scenario, the Block Storage Volume will be available to the operating system as /dev/disk/by-id/scsi-0Linode_Volume_EXAMPLE, where EXAMPLE is a label you assign the volume in the Linode Manager. Storage volumes can be added while your Linode is already running, and will appear immediately in /dev/disk/by-id/.
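      Because the device path is derived from the volume's label, you can compute it ahead of time. The sketch below builds /dev/disk/by-id/scsi-0Linode_Volume_<label> and rejects labels that break this guide's naming rules: up to 32 characters drawn from letters, digits, period, hyphen, and underscore. The guide lists only a-z for letters, but its own example label, BlockStorage1, uses capitals, so the sketch accepts both cases. This illustrates the guide's description and is not an official Linode tool.

```python
import re

# Label rule as described in this guide (capitals allowed, see lead-in).
LABEL_RE = re.compile(r"[A-Za-z0-9._-]{1,32}")
BY_ID_PREFIX = "/dev/disk/by-id/scsi-0Linode_Volume_"


def volume_path(label: str) -> str:
    """Return the by-id device path the OS exposes for a volume label."""
    if not LABEL_RE.fullmatch(label):
        raise ValueError(f"invalid volume label: {label!r}")
    return BY_ID_PREFIX + label


print(volume_path("BlockStorage1"))  # /dev/disk/by-id/scsi-0Linode_Volume_BlockStorage1
```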

      Add a Volume from the Linode Dashboard

      1. Go to the page of the Linode to which you want to attach a Block Storage Volume.

        Select a Linode from the Manager

      2. Click on the Volumes tab, then click Add a Volume:

        Click Add a Volume

      3. Assign the Block Storage Volume a label and size. The label can be up to 32 characters long and may contain only the ASCII characters a-z, 0-9, period (.), hyphen (-), and underscore (_). The maximum volume size is 10,000 GiB. When finished, click Submit:

        Create a Volume with a label.


        There is currently a soft limit of 100 TB of total Block Storage Volume capacity per account.

      4. Once you add a volume it will appear under Attached Volumes with the new volume’s label, size, and file system path.

        A Volume has been created

      5. You’ll need to create a filesystem in your new volume. If your Linode is not already running, boot it, then SSH into your Linode and run the following command, where FILE_SYSTEM_PATH is your volume’s file system path:

        mkfs.ext4 FILE_SYSTEM_PATH
      6. Once the volume has a filesystem, you can create a mountpoint for it:

        mkdir /mnt/BlockStorage1
      7. You can then mount the new volume:

        mount FILE_SYSTEM_PATH /mnt/BlockStorage1
      8. If you want to mount the new volume automatically every time your Linode boots, you’ll want to add the following line to your /etc/fstab file:

        FILE_SYSTEM_PATH /mnt/BlockStorage1 ext4 defaults 0 2
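      The fstab line in step 8 has six whitespace-separated fields: device, mount point, filesystem type, mount options, dump flag, and fsck pass order. A hypothetical Python helper that assembles the line and avoids adding a duplicate entry might look like the following. It works on text in memory, so you would still write the result to /etc/fstab yourself, as root and after taking a backup. The device path is an assumed example that follows the /dev/disk/by-id/scsi-0Linode_Volume_<label> pattern described above, using the label BlockStorage1.

```python
def fstab_entry(device, mountpoint, fstype="ext4", options="defaults", dump=0, passno=2):
    """Build one /etc/fstab line: device, mount point, type, options, dump, fsck pass."""
    return "{} {} {} {} {} {}".format(device, mountpoint, fstype, options, dump, passno)


def add_to_fstab(fstab_text, entry):
    """Append entry unless an uncommented line already uses the same mount point."""
    mountpoint = entry.split()[1]
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and not fields[0].startswith("#") and fields[1] == mountpoint:
            return fstab_text  # already configured; leave the text unchanged
    if fstab_text and not fstab_text.endswith("\n"):
        fstab_text += "\n"
    return fstab_text + entry + "\n"


entry = fstab_entry("/dev/disk/by-id/scsi-0Linode_Volume_BlockStorage1", "/mnt/BlockStorage1")
print(entry)
```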

      Attach a Volume from Your Account’s Volume List

      1. Click on the Volumes page of the Linode Manager to see your account’s volume list:

        View your available Volumes

      2. Click the menu option (three dots) for the volume you want to attach to a Linode and select Attach:

        Open volume menu.

      3. Select the label of the Linode you want to attach the volume to from the dropdown menu, then click Save:

        Attach a Volume to a Linode


        The Linodes available in this dropdown menu all share the same region as your volume.

      4. You’ll need to create a filesystem in your new volume. If your Linode is not already running, boot it, then SSH into your Linode and run the following command, where FILE_SYSTEM_PATH is your volume’s file system path:

        mkfs.ext4 FILE_SYSTEM_PATH
      5. Once the volume has a filesystem, you can create a mountpoint for it:

        mkdir /mnt/BlockStorage1
      6. You can then mount the new volume, where FILE_SYSTEM_PATH is your volume’s file system path:

        mount FILE_SYSTEM_PATH /mnt/BlockStorage1
      7. If you want to mount the new volume automatically every time your Linode boots, you’ll want to add the following line to your /etc/fstab file:

        FILE_SYSTEM_PATH /mnt/BlockStorage1 ext4 defaults 0 2

      How to Detach a Block Storage Volume from a Linode

      1. Go to the page of the Linode to which the volume is attached, and shut down the Linode.

      2. When the Linode is powered off, click on the Volumes tab, then click Detach under the volume’s menu (three dots):

        Detach a Volume from a Linode from the Volume menu.

      3. A confirmation screen appears and explains that the volume will be detached from the Linode. Click Detach to confirm:

        Linode Manager detach volume confirmation

        The Linode’s dashboard no longer shows the volume:

        The Linode's Volumes tab shows no attached volumes.

        The volume still exists on your account and you can see it if you view the Volumes page:

        Volume not attached, but still exists.

      How to Delete a Block Storage Volume


      The removal process is irreversible, and the data will be permanently deleted.

      1. Shut down the attached Linode.

      2. Detach the volume as described above.

      3. Click the volume’s Delete option on the Volumes page.

        Delete a Volume

      How to Resize a Block Storage Volume

      Storage volumes cannot be sized down, only up. Keep this in mind when sizing your volumes.

      1. Shut down your Linode.

      2. Click the Resize option for the volume you want to resize.

        Select Resize from the Volume menu.

      3. Enter the new volume size. The minimum size is 10 GiB and maximum is 10,000 GiB. Then click Submit.

        Resize Volume menu.

      4. You’ll be returned to the volume list and the notification bell in the top right of the page will notify you when the resizing is complete.

        Notification bell shows the Volume has been resized.

      5. Reboot your Linode.

      6. Once your Linode has restarted, make sure the volume is unmounted for safety:

        umount /dev/disk/by-id/scsi-0Linode_Volume_BlockStorage1
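      7. Assuming the volume holds an ext2, ext3, or ext4 filesystem, check it and then resize it to fill the new volume size. resize2fs typically refuses to resize an unmounted filesystem that hasn't been checked with e2fsck -f since it was last mounted:

        e2fsck -f /dev/disk/by-id/scsi-0Linode_Volume_BlockStorage1
        resize2fs /dev/disk/by-id/scsi-0Linode_Volume_BlockStorage1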
      8. Mount it back onto the filesystem:

        mount /dev/disk/by-id/scsi-0Linode_Volume_BlockStorage1 /mnt/BlockStorage1
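      The sizing rules in this section are easy to check before you click Submit: volumes can only grow, and the new size must stay between 10 GiB and 10,000 GiB. A minimal sketch with a hypothetical function name:

```python
MIN_GIB, MAX_GIB = 10, 10_000  # size limits stated in this guide


def check_resize(current_gib: int, new_gib: int) -> int:
    """Return how many GiB a resize adds, or raise ValueError if it breaks the rules."""
    if not MIN_GIB <= new_gib <= MAX_GIB:
        raise ValueError(f"new size must be between {MIN_GIB} and {MAX_GIB} GiB")
    if new_gib <= current_gib:
        raise ValueError("volumes can only grow; choose a size larger than the current one")
    return new_gib - current_gib


print(check_resize(20, 50))  # 30
```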

      Where to Go From Here?

      Need ideas for what to do with the extra space? We have several guides that walk you through installing software that pairs well with large storage volumes:

      Install Seafile with NGINX on Ubuntu 16.04

      Install Plex Media Server on Ubuntu 16.04

      Big Data in the Linode Cloud: Streaming Data Processing with Apache Storm

      Using Subsonic to Stream Media From Your Linode

      Install GitLab on Ubuntu 14.04


      This guide is published under a CC BY-ND 4.0 license.


      An Introduction to Queries in PostgreSQL


      Databases are a key component of many websites and applications, and are at the core of how data is stored and exchanged across the internet. One of the most important aspects of database management is the practice of retrieving data from a database, whether it’s on an ad hoc basis or part of a process that’s been coded into an application. There are several ways to retrieve information from a database, but one of the most common is to submit queries through the command line.

      In relational database management systems, a query is any command used to retrieve data from a table. In Structured Query Language (SQL), queries are almost always made using the SELECT statement.

      In this guide, we will discuss the basic syntax of SQL queries as well as some of the more commonly employed functions and operators. We will also practice making SQL queries using some sample data in a PostgreSQL database.

      PostgreSQL, often shortened to “Postgres,” is a relational database management system with an object-oriented approach, meaning that information can be represented as objects or classes in PostgreSQL schemas. PostgreSQL aligns closely with standard SQL, although it also includes some features not found in other relational database systems.


      In general, the commands and concepts presented in this guide can be used on any Linux-based operating system running any SQL database software. However, it was written specifically with an Ubuntu 18.04 server running PostgreSQL in mind. To set this up, you will need the following:

      With this setup in place, we can begin the tutorial.

      Creating a Sample Database

      Before we can begin making queries in SQL, we will first create a database and a couple of tables, then populate these tables with some sample data. This will allow you to gain some hands-on experience when you begin making queries later on.

      For the sample database we’ll use throughout this guide, imagine the following scenario:

      You and several of your friends all celebrate your birthdays with one another. On each occasion, the members of the group head to the local bowling alley, participate in a friendly tournament, and then everyone heads to your place where you prepare the birthday-person’s favorite meal.

      Now that this tradition has been going on for a while, you’ve decided to begin tracking the records from these tournaments. Also, to make planning dinners easier, you decide to create a record of your friends’ birthdays and their favorite entrees, sides, and desserts. Rather than keep this information in a physical ledger, you decide to exercise your database skills by recording it in a PostgreSQL database.

      To begin, open up a PostgreSQL prompt as your postgres superuser:

      • sudo -u postgres psql

      Note: If you followed all the steps of the prerequisite tutorial on Installing PostgreSQL on Ubuntu 18.04, you may have configured a new role for your PostgreSQL installation. In this case, you can connect to the Postgres prompt with the following command, substituting sammy with your own username:

      • sudo -u sammy psql

      Next, create the database by running:

      • CREATE DATABASE birthdays;

      Then select this database by typing:

      • \c birthdays

      Next, create two tables within this database. We'll use the first table to track your friends' records at the bowling alley. The following command will create a table called tourneys with columns for the name of each of your friends, the number of tournaments they've won (wins), their all-time best score, and what size bowling shoe they wear (size):

      • CREATE TABLE tourneys (
      • name varchar(30),
      • wins real,
      • best real,
      • size real
      • );

      After you run the CREATE TABLE command, you’ll receive the following output:

      CREATE TABLE

      Populate the tourneys table with some sample data:

      • INSERT INTO tourneys (name, wins, best, size)
      • VALUES ('Dolly', '7', '245', '8.5'),
      • ('Etta', '4', '283', '9'),
      • ('Irma', '9', '266', '7'),
      • ('Barbara', '2', '197', '7.5'),
      • ('Gladys', '13', '273', '8');

      You’ll receive the following output:


      INSERT 0 5

      Following this, create another table within the same database which we'll use to store information about your friends' favorite birthday meals. The following command creates a table named dinners with columns for the name of each of your friends, their birthdate, their favorite entree, their preferred side dish, and their favorite dessert:

      • CREATE TABLE dinners (
      • name varchar(30),
      • birthdate date,
      • entree varchar(30),
      • side varchar(30),
      • dessert varchar(30)
      • );

      Similarly for this table, you’ll receive feedback verifying that the table was created:

      CREATE TABLE

      Populate this table with some sample data as well:

      • INSERT INTO dinners (name, birthdate, entree, side, dessert)
      • VALUES ('Dolly', '1946-01-19', 'steak', 'salad', 'cake'),
      • ('Etta', '1938-01-25', 'chicken', 'fries', 'ice cream'),
      • ('Irma', '1941-02-18', 'tofu', 'fries', 'cake'),
      • ('Barbara', '1948-12-25', 'tofu', 'salad', 'ice cream'),
      • ('Gladys', '1944-05-28', 'steak', 'fries', 'ice cream');


      INSERT 0 5

      Once that command completes successfully, you're done setting up your database. Next, we'll go over the basic command structure of SELECT queries.
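      If you'd like to experiment without a PostgreSQL server, Python's built-in sqlite3 module can hold the same sample tables in memory. This is a stand-in, not PostgreSQL. SQLite accepts these CREATE TABLE and INSERT statements, but its typing is looser (quoted numbers like '7' are converted to 7.0 in a real column), so treat it as a scratchpad rather than a substitute for the psql session above. The helper name make_birthdays_db is hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE tourneys (name varchar(30), wins real, best real, size real);
CREATE TABLE dinners (name varchar(30), birthdate date, entree varchar(30),
                      side varchar(30), dessert varchar(30));
INSERT INTO tourneys (name, wins, best, size)
VALUES ('Dolly', '7', '245', '8.5'), ('Etta', '4', '283', '9'),
       ('Irma', '9', '266', '7'), ('Barbara', '2', '197', '7.5'),
       ('Gladys', '13', '273', '8');
INSERT INTO dinners (name, birthdate, entree, side, dessert)
VALUES ('Dolly', '1946-01-19', 'steak', 'salad', 'cake'),
       ('Etta', '1938-01-25', 'chicken', 'fries', 'ice cream'),
       ('Irma', '1941-02-18', 'tofu', 'fries', 'cake'),
       ('Barbara', '1948-12-25', 'tofu', 'salad', 'ice cream'),
       ('Gladys', '1944-05-28', 'steak', 'fries', 'ice cream');
"""


def make_birthdays_db() -> sqlite3.Connection:
    """Create an in-memory copy of the birthdays sample database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


conn = make_birthdays_db()
for row in conn.execute("SELECT name, wins FROM tourneys"):
    print(row)
```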

      Understanding SELECT Statements

      As mentioned in the introduction, SQL queries almost always begin with the SELECT statement. SELECT is used in queries to specify which columns from a table should be returned in the result-set. Queries also almost always include FROM, which is used to specify which table the statement will query.

      Generally, SQL queries follow this syntax:

      • SELECT column_to_select FROM table_to_select WHERE certain_conditions_apply;

      By way of example, the following statement will return the entire name column from the dinners table:

      • SELECT name FROM dinners;


        name
      ---------
       Dolly
       Etta
       Irma
       Barbara
       Gladys
      (5 rows)

      You can select multiple columns from the same table by separating their names with a comma, like this:

      • SELECT name, birthdate FROM dinners;


        name   | birthdate
      ---------+------------
       Dolly   | 1946-01-19
       Etta    | 1938-01-25
       Irma    | 1941-02-18
       Barbara | 1948-12-25
       Gladys  | 1944-05-28
      (5 rows)

      Instead of naming a specific column or set of columns, you can follow the SELECT operator with an asterisk (*), which serves as a placeholder representing all the columns in a table. The following command returns every column from the tourneys table:

      • SELECT * FROM tourneys;

        name   | wins | best | size
      ---------+------+------+------
       Dolly   |    7 |  245 |  8.5
       Etta    |    4 |  283 |    9
       Irma    |    9 |  266 |    7
       Barbara |    2 |  197 |  7.5
       Gladys  |   13 |  273 |    8
      (5 rows)

      WHERE is used in queries to filter records that meet a specified condition, and any rows that do not meet that condition are eliminated from the result. A WHERE clause typically follows this syntax:

      • . . . WHERE column_name comparison_operator value

      The comparison operator in a WHERE clause defines how the specified column should be compared against the value. Here are some common SQL comparison operators:

      Operator      What it does
      =             tests for equality
      !=            tests for inequality
      <             tests for less-than
      >             tests for greater-than
      <=            tests for less-than or equal-to
      >=            tests for greater-than or equal-to
      BETWEEN       tests whether a value lies within a given range
      IN            tests whether a row's value is contained in a set of specified values
      EXISTS        tests whether rows exist, given the specified conditions
      LIKE          tests whether a value matches a specified string
      IS NULL       tests for NULL values
      IS NOT NULL   tests for all values other than NULL

      For example, if you wanted to find Irma's shoe size, you could use the following query:

      • SELECT size FROM tourneys WHERE name = 'Irma';


       size
      ------
          7
      (1 row)
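      When a query like this is issued from application code rather than typed at the prompt, the value in the WHERE clause is usually passed as a bound parameter instead of being pasted into the SQL string. This avoids quoting bugs and SQL injection. The sketch below uses Python's sqlite3 module as a stand-in for PostgreSQL, with an in-memory copy of the tourneys table. Its placeholder is ?, whereas PostgreSQL drivers such as psycopg2 use %s. The function name shoe_size is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tourneys (name varchar(30), wins real, best real, size real)")
conn.executemany(
    "INSERT INTO tourneys VALUES (?, ?, ?, ?)",
    [("Dolly", 7, 245, 8.5), ("Etta", 4, 283, 9), ("Irma", 9, 266, 7),
     ("Barbara", 2, 197, 7.5), ("Gladys", 13, 273, 8)],
)


def shoe_size(friend):
    """Return a friend's shoe size, or None if the name isn't in the table."""
    # The driver binds `friend` as a value, so it is never parsed as SQL.
    row = conn.execute("SELECT size FROM tourneys WHERE name = ?", (friend,)).fetchone()
    return row[0] if row else None


print(shoe_size("Irma"))  # 7.0
```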

      SQL allows the use of wildcard characters, and these are especially handy when used in WHERE clauses. Percent signs (%) represent zero or more unknown characters, and underscores (_) represent a single unknown character. These are useful if you're trying to find a specific entry in a table but aren't sure exactly what that entry is. To illustrate, let's say that you've forgotten the favorite entree of a few of your friends, but you're certain this particular entree starts with a "t." You could find its name by running the following query:

      • SELECT entree FROM dinners WHERE entree LIKE 't%';


       entree
      -------
       tofu
       tofu
      (2 rows)

      Based on the output above, we see that the entree we have forgotten is tofu.
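      To see how the two wildcards differ, the following sketch runs a few patterns against the entree column. It uses Python's sqlite3 module as a stand-in for PostgreSQL. One caveat: SQLite's LIKE is case-insensitive for ASCII letters by default, while PostgreSQL's LIKE is case-sensitive (PostgreSQL provides ILIKE for case-insensitive matching), so results can differ on mixed-case data. The helper name entrees_like is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dinners (name varchar(30), entree varchar(30))")
conn.executemany(
    "INSERT INTO dinners VALUES (?, ?)",
    [("Dolly", "steak"), ("Etta", "chicken"), ("Irma", "tofu"),
     ("Barbara", "tofu"), ("Gladys", "steak")],
)


def entrees_like(pattern):
    """Return matching entrees; % matches any run of characters, _ exactly one."""
    rows = conn.execute(
        "SELECT entree FROM dinners WHERE entree LIKE ? ORDER BY entree", (pattern,)
    )
    return [row[0] for row in rows]


for pattern in ("t%", "%k", "_ofu", "t_"):
    print(pattern, entrees_like(pattern))
```

      The last pattern, t_, matches nothing, because _ stands for exactly one character and every entree starting with "t" is longer than two characters.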

      There may be times when you're working with databases that have columns or tables with relatively long or difficult-to-read names. In these cases, you can make these names more readable by creating an alias with the AS keyword. Aliases created with AS are temporary, and only exist for the duration of the query for which they're created:

      • SELECT name AS n, birthdate AS b, dessert AS d FROM dinners;


          n    |     b      |     d
      ---------+------------+-----------
       Dolly   | 1946-01-19 | cake
       Etta    | 1938-01-25 | ice cream
       Irma    | 1941-02-18 | cake
       Barbara | 1948-12-25 | ice cream
       Gladys  | 1944-05-28 | ice cream
      (5 rows)

      Here, we have told SQL to display the name column as n, the birthdate column as b, and the dessert column as d.
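      Because an alias only relabels columns in the result set, it's what any client program reading the results will see. As a rough illustration, the sketch below again uses Python's sqlite3 module as a stand-in for PostgreSQL, with assumed sample rows. It reads the column labels from cursor.description, first for the aliased query and then for a plain one, to show that the table itself is untouched:

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dinners (name TEXT, birthdate TEXT, entree TEXT, side TEXT, dessert TEXT)"
)
# Assumed sample rows, reconstructed from the outputs shown in this tutorial.
conn.executemany("INSERT INTO dinners VALUES (?, ?, ?, ?, ?)", [
    ("Dolly", "1946-01-19", "steak", "salad", "cake"),
    ("Etta", "1938-01-25", "chicken", "fries", "ice cream"),
    ("Irma", "1941-02-18", "tofu", "fries", "cake"),
    ("Barbara", "1948-12-25", "tofu", "salad", "ice cream"),
    ("Gladys", "1944-05-28", "steak", "fries", "ice cream"),
])

cur = conn.execute("SELECT name AS n, birthdate AS b, dessert AS d FROM dinners")
rows = cur.fetchall()
# cursor.description has one entry per result column; item 0 is its label.
aliased = [column[0] for column in cur.description]

# The aliases lasted only for that query: the table's columns are unchanged.
cur = conn.execute("SELECT name, birthdate, dessert FROM dinners")
original = [column[0] for column in cur.description]
print(aliased)   # ['n', 'b', 'd']
print(original)  # ['name', 'birthdate', 'dessert']
```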

      The examples we've gone through up to this point include some of the more frequently used keywords and clauses in SQL queries. These are useful for basic queries, but they aren't helpful if you're trying to perform a calculation or derive a scalar value (a single value, as opposed to a set of multiple different values) based on your data. This is where aggregate functions come into play.

      Aggregate Functions

      Oftentimes, when working with data, you don't necessarily want to see the data itself. Rather, you want information about the data. The SQL syntax includes a number of functions that allow you to interpret or run calculations on your data just by issuing a SELECT query. These are known as aggregate functions.

      The COUNT function counts and returns the number of rows that meet certain criteria. For example, if you'd like to know how many of your friends prefer tofu for their birthday entree, you could issue this query:

      • SELECT COUNT(entree) FROM dinners WHERE entree = 'tofu';


       count
      -------
           2
      (1 row)

      The AVG function returns the average (mean) value of a column. Using our example table, you could find the average best score amongst your friends with this query:

      • SELECT AVG(best) FROM tourneys;


        avg
      -------
       252.8
      (1 row)

      SUM is used to find the total sum of a given column. For instance, if you'd like to see how many games you and your friends have won over the years, you could run this query:

      • SELECT SUM(wins) FROM tourneys;


       sum
      -----
         35
      (1 row)

      Note that the AVG and SUM functions will only work correctly when used with numeric data. If you try to use them on non-numeric data, the result will be either an error or just 0, depending on which RDBMS you're using:

      • SELECT SUM(entree) FROM dinners;


      ERROR:  function sum(character varying) does not exist
      LINE 1: select sum(entree) from dinners;
                     ^
      HINT:  No function matches the given name and argument types. You might need to add explicit type casts.
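      SQLite is one of the systems that returns 0 here rather than an error. The minimal sketch below assumes Python's sqlite3 module and a hypothetical, trimmed-down dinners table that holds only the name and entree columns:

```python
import sqlite3

# SQLite stand-in for PostgreSQL, with an assumed, trimmed-down dinners table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dinners (name TEXT, entree TEXT)")
conn.executemany("INSERT INTO dinners VALUES (?, ?)", [
    ("Dolly", "steak"), ("Etta", "chicken"), ("Irma", "tofu"),
    ("Barbara", "tofu"), ("Gladys", "steak"),
])

# PostgreSQL rejects this query. SQLite instead treats each string that
# doesn't look like a number as 0, so the total silently comes out as 0.
(text_sum,) = conn.execute("SELECT SUM(entree) FROM dinners").fetchone()
print(text_sum)
```

      Because nothing flags the problem, a 0 returned by SUM on a text column is easy to mistake for a real total.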

      MIN is used to find the smallest value within a specified column. You could use this query to see what the worst overall bowling record is so far (in terms of number of wins):

      • SELECT MIN(wins) FROM tourneys;


       min
      -----
          2
      (1 row)

      Similarly, MAX is used to find the largest value in a given column. The following query will show the best overall bowling record:

      • SELECT MAX(wins) FROM tourneys;


       max
      -----
         13
      (1 row)

      Unlike SUM and AVG, the MIN and MAX functions can be used with both numeric and text data. When run on a column containing string values, the MIN function will show the first value alphabetically:

      • SELECT MIN(name) FROM dinners;


         min
      ---------
       Barbara
      (1 row)

      Likewise, when run on a column containing string values, the MAX function will show the last value alphabetically:

      • SELECT MAX(name) FROM dinners;


       max
      ------
       Irma
      (1 row)
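      To check these aggregate results for yourself, here is a minimal sketch that runs the same queries through Python's sqlite3 module instead of PostgreSQL. The sample rows are assumptions: names, wins, shoe sizes, and meals are reconstructed from the outputs in this tutorial, but only the average of the best column (252.8) is given, so the individual best scores are illustrative. The scalar helper is hypothetical. Text comparisons here follow SQLite's default byte-wise ordering, while PostgreSQL uses the database's collation; for these names both give the same answer.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tourneys (name TEXT, wins INTEGER, best INTEGER, size REAL)")
conn.execute(
    "CREATE TABLE dinners (name TEXT, birthdate TEXT, entree TEXT, side TEXT, dessert TEXT)"
)
# Assumed sample rows. Only the average of "best" (252.8) appears in the text,
# so the individual "best" values here are illustrative.
conn.executemany("INSERT INTO tourneys VALUES (?, ?, ?, ?)", [
    ("Dolly", 7, 245, 8.5), ("Etta", 4, 283, 9), ("Irma", 9, 266, 7),
    ("Barbara", 2, 197, 7.5), ("Gladys", 13, 273, 8),
])
conn.executemany("INSERT INTO dinners VALUES (?, ?, ?, ?, ?)", [
    ("Dolly", "1946-01-19", "steak", "salad", "cake"),
    ("Etta", "1938-01-25", "chicken", "fries", "ice cream"),
    ("Irma", "1941-02-18", "tofu", "fries", "cake"),
    ("Barbara", "1948-12-25", "tofu", "salad", "ice cream"),
    ("Gladys", "1944-05-28", "steak", "fries", "ice cream"),
])


def scalar(sql):
    """Run a query that yields one row with one column and return that value."""
    return conn.execute(sql).fetchone()[0]


tofu_fans = scalar("SELECT COUNT(entree) FROM dinners WHERE entree = 'tofu'")
avg_best = scalar("SELECT AVG(best) FROM tourneys")
total_wins = scalar("SELECT SUM(wins) FROM tourneys")
fewest_wins = scalar("SELECT MIN(wins) FROM tourneys")
most_wins = scalar("SELECT MAX(wins) FROM tourneys")
first_name = scalar("SELECT MIN(name) FROM dinners")  # first alphabetically
last_name = scalar("SELECT MAX(name) FROM dinners")   # last alphabetically
print(tofu_fans, avg_best, total_wins, fewest_wins, most_wins, first_name, last_name)
# 2 252.8 35 2 13 Barbara Irma
```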

      Aggregate functions have many uses beyond what was described in this section. They're particularly useful when used with the GROUP BY clause, which is covered in the next section along with several other query clauses that affect how result sets are sorted and filtered.

      Manipulating Query Outputs

      In addition to the FROM and WHERE clauses, there are several other clauses which are used to manipulate the results of a SELECT query. In this section, we will explain and provide examples for some of the more commonly used query clauses.

      One of the most frequently used query clauses, aside from FROM and WHERE, is the GROUP BY clause. It collects rows that share the same value in one column into groups, and it's typically used when you want to run an aggregate function separately for each of those groups.

      For example, let's say you wanted to know how many of your friends prefer each of the three entrees you make. You could find this info with the following query:

      • SELECT COUNT(name), entree FROM dinners GROUP BY entree;


       count | entree
      -------+---------
           1 | chicken
           2 | steak
           2 | tofu
      (3 rows)
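      The same grouping can be tried in SQLite through Python's sqlite3 module, as a stand-in for PostgreSQL with assumed sample rows. This sketch adds an ORDER BY clause, because without one SQL doesn't guarantee the order in which groups are returned:

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dinners (name TEXT, birthdate TEXT, entree TEXT, side TEXT, dessert TEXT)"
)
# Assumed sample rows, reconstructed from the outputs shown in this tutorial.
conn.executemany("INSERT INTO dinners VALUES (?, ?, ?, ?, ?)", [
    ("Dolly", "1946-01-19", "steak", "salad", "cake"),
    ("Etta", "1938-01-25", "chicken", "fries", "ice cream"),
    ("Irma", "1941-02-18", "tofu", "fries", "cake"),
    ("Barbara", "1948-12-25", "tofu", "salad", "ice cream"),
    ("Gladys", "1944-05-28", "steak", "fries", "ice cream"),
])

# One output row per distinct entree; COUNT(name) runs once per group.
per_entree = conn.execute(
    "SELECT COUNT(name), entree FROM dinners GROUP BY entree ORDER BY entree"
).fetchall()
print(per_entree)  # [(1, 'chicken'), (2, 'steak'), (2, 'tofu')]
```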

      The ORDER BY clause is used to sort query results. By default, numeric values are sorted in ascending order, and text values are sorted in alphabetical order. To illustrate, the following query lists the name and birthdate columns, but sorts the results by birthdate:

      • SELECT name, birthdate FROM dinners ORDER BY birthdate;


        name   | birthdate
      ---------+------------
       Etta    | 1938-01-25
       Irma    | 1941-02-18
       Gladys  | 1944-05-28
       Dolly   | 1946-01-19
       Barbara | 1948-12-25
      (5 rows)

      Notice that the default behavior of ORDER BY is to sort the result set in ascending order. To reverse this and have the result set sorted in descending order, add the DESC keyword after the column name in the ORDER BY clause:

      • SELECT name, birthdate FROM dinners ORDER BY birthdate DESC;


        name   | birthdate
      ---------+------------
       Barbara | 1948-12-25
       Dolly   | 1946-01-19
       Gladys  | 1944-05-28
       Irma    | 1941-02-18
       Etta    | 1938-01-25
      (5 rows)
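      The sketch below reproduces both orderings with Python's sqlite3 module as a stand-in for PostgreSQL, using assumed sample rows. Since SQLite has no dedicated date type, the birthdates are stored as ISO 8601 text (YYYY-MM-DD), which sorts chronologically as plain text. It also shows that DESC applies only to the column it follows, which matters once you sort on more than one column. The names helper is hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
# Birthdates are stored as ISO 8601 text, which sorts chronologically.
conn.execute(
    "CREATE TABLE dinners (name TEXT, birthdate TEXT, entree TEXT, side TEXT, dessert TEXT)"
)
# Assumed sample rows, reconstructed from the outputs shown in this tutorial.
conn.executemany("INSERT INTO dinners VALUES (?, ?, ?, ?, ?)", [
    ("Dolly", "1946-01-19", "steak", "salad", "cake"),
    ("Etta", "1938-01-25", "chicken", "fries", "ice cream"),
    ("Irma", "1941-02-18", "tofu", "fries", "cake"),
    ("Barbara", "1948-12-25", "tofu", "salad", "ice cream"),
    ("Gladys", "1944-05-28", "steak", "fries", "ice cream"),
])


def names(sql):
    """Return the first column of every row, in the order the query produced."""
    return [name for (name,) in conn.execute(sql)]


oldest_first = names("SELECT name FROM dinners ORDER BY birthdate")
youngest_first = names("SELECT name FROM dinners ORDER BY birthdate DESC")
# DESC affects only the column it follows: desserts stay in ascending
# order, and birthdates run newest-first within each dessert.
mixed = names("SELECT name FROM dinners ORDER BY dessert, birthdate DESC")
print(oldest_first)    # ['Etta', 'Irma', 'Gladys', 'Dolly', 'Barbara']
print(youngest_first)  # ['Barbara', 'Dolly', 'Gladys', 'Irma', 'Etta']
print(mixed)           # ['Dolly', 'Irma', 'Barbara', 'Gladys', 'Etta']
```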

      As mentioned previously, the WHERE clause is used to filter results based on specific conditions. However, if you use the WHERE clause with an aggregate function, it will return an error, as is the case with the following attempt to find which sides are the favorite of at least three of your friends:

      • SELECT COUNT(name), side FROM dinners WHERE COUNT(name) >= 3;


      ERROR:  aggregate functions are not allowed in WHERE
      LINE 1: SELECT COUNT(name), side FROM dinners WHERE COUNT(name) >= 3...

      The HAVING clause was added to SQL to provide functionality similar to that of the WHERE clause while also being compatible with aggregate functions. It's helpful to think of the difference between these two clauses as being that WHERE applies to individual records, while HAVING applies to groups of records. For this reason, HAVING is almost always paired with a GROUP BY clause; if GROUP BY is omitted, the entire result set is treated as a single group.

      The following example is another attempt to find which side dishes are the favorite of at least three of your friends, although this one will return a result without error:

      • SELECT COUNT(name), side FROM dinners GROUP BY side HAVING COUNT(name) >= 3;


       count | side
      -------+-------
           3 | fries
      (1 row)
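      Here is a minimal sketch of the same contrast using Python's sqlite3 module as a stand-in for PostgreSQL, with assumed sample rows. SQLite also refuses an aggregate in WHERE, though its error message differs from PostgreSQL's. The last query combines both clauses: WHERE first narrows the rows to ice-cream fans, and HAVING then keeps only the side dishes chosen by at least two of them.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dinners (name TEXT, birthdate TEXT, entree TEXT, side TEXT, dessert TEXT)"
)
# Assumed sample rows, reconstructed from the outputs shown in this tutorial.
conn.executemany("INSERT INTO dinners VALUES (?, ?, ?, ?, ?)", [
    ("Dolly", "1946-01-19", "steak", "salad", "cake"),
    ("Etta", "1938-01-25", "chicken", "fries", "ice cream"),
    ("Irma", "1941-02-18", "tofu", "fries", "cake"),
    ("Barbara", "1948-12-25", "tofu", "salad", "ice cream"),
    ("Gladys", "1944-05-28", "steak", "fries", "ice cream"),
])

# An aggregate in WHERE is rejected before any rows are read.
try:
    conn.execute("SELECT COUNT(name), side FROM dinners WHERE COUNT(name) >= 3")
    where_error = None
except sqlite3.OperationalError as exc:
    where_error = str(exc)

# HAVING filters the groups that GROUP BY has already formed.
having_rows = conn.execute(
    "SELECT COUNT(name), side FROM dinners GROUP BY side HAVING COUNT(name) >= 3"
).fetchall()

# Both together: WHERE drops rows before grouping, HAVING drops groups afterward.
both = conn.execute(
    "SELECT COUNT(name), side FROM dinners WHERE dessert = 'ice cream' "
    "GROUP BY side HAVING COUNT(name) >= 2"
).fetchall()
print(where_error)
print(having_rows)  # [(3, 'fries')]
print(both)         # [(2, 'fries')]
```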

      Aggregate functions are useful for summarizing the results of a particular column in a given table. However, there are many cases where it's necessary to query the contents of more than one table. We'll go over a few ways you can do this in the next section.

      Querying Multiple Tables

      More often than not, a database contains multiple tables, each holding different sets of data. SQL provides a few different ways to run a single query on multiple tables.

      The JOIN clause can be used to combine rows from two or more tables in a query result. It does this by matching rows from each table on a related column, as specified in the ON condition that follows the JOIN keyword.

      SELECT statements that include a JOIN clause generally follow this syntax:

      • SELECT table1.column1, table2.column2
      • FROM table1
      • JOIN table2 ON table1.related_column=table2.related_column;

      Note that because JOIN clauses compare the contents of more than one table, the previous example specifies which table to select each column from by preceding the name of the column with the name of the table and a period. You can specify which table a column should be selected from like this for any query, although it's not necessary when selecting from a single table, as we've done in the previous sections. Let's walk through an example using our sample data.

      Imagine that you wanted to buy each of your friends a pair of bowling shoes as a birthday gift. Because the information about your friends' birthdates and shoe sizes is held in separate tables, you could query both tables separately then compare the results from each. With a JOIN clause, though, you can find all the information you want with a single query:

      • SELECT tourneys.name, tourneys.size, dinners.birthdate
      • FROM tourneys
      • JOIN dinners ON tourneys.name=dinners.name;


        name   | size | birthdate
      ---------+------+------------
       Dolly   |  8.5 | 1946-01-19
       Etta    |    9 | 1938-01-25
       Irma    |    7 | 1941-02-18
       Barbara |  7.5 | 1948-12-25
       Gladys  |    8 | 1944-05-28
      (5 rows)

      The JOIN clause used in this example, without any other arguments, is an inner JOIN clause. This means that it selects all the records that have matching values in both tables and includes them in the result set, while any records that aren't matched are excluded. To illustrate this idea, let's add a new row to each table that doesn't have a corresponding entry in the other:

      • INSERT INTO tourneys (name, wins, best, size)
      • VALUES ('Bettye', '0', '193', '9');
      • INSERT INTO dinners (name, birthdate, entree, side, dessert)
      • VALUES ('Lesley', '1946-05-02', 'steak', 'salad', 'ice cream');

      Then, re-run the previous SELECT statement with the JOIN clause:

      • SELECT tourneys.name, tourneys.size, dinners.birthdate
      • FROM tourneys
      • JOIN dinners ON tourneys.name=dinners.name;


        name   | size | birthdate
      ---------+------+------------
       Dolly   |  8.5 | 1946-01-19
       Etta    |    9 | 1938-01-25
       Irma    |    7 | 1941-02-18
       Barbara |  7.5 | 1948-12-25
       Gladys  |    8 | 1944-05-28
      (5 rows)

      Notice that, because the tourneys table has no entry for Lesley and the dinners table has no entry for Bettye, those records are absent from this output.
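      To see the inner join's behavior outside of psql, here is a minimal sketch using Python's sqlite3 module as a stand-in, with assumed sample rows that include the Bettye and Lesley rows inserted above. Only names present in both tables survive the join:

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tourneys (name TEXT, wins INTEGER, best INTEGER, size REAL)")
conn.execute(
    "CREATE TABLE dinners (name TEXT, birthdate TEXT, entree TEXT, side TEXT, dessert TEXT)"
)
# Assumed rows, including Bettye (tourneys only) and Lesley (dinners only).
conn.executemany("INSERT INTO tourneys VALUES (?, ?, ?, ?)", [
    ("Dolly", 7, 245, 8.5), ("Etta", 4, 283, 9), ("Irma", 9, 266, 7),
    ("Barbara", 2, 197, 7.5), ("Gladys", 13, 273, 8), ("Bettye", 0, 193, 9),
])
conn.executemany("INSERT INTO dinners VALUES (?, ?, ?, ?, ?)", [
    ("Dolly", "1946-01-19", "steak", "salad", "cake"),
    ("Etta", "1938-01-25", "chicken", "fries", "ice cream"),
    ("Irma", "1941-02-18", "tofu", "fries", "cake"),
    ("Barbara", "1948-12-25", "tofu", "salad", "ice cream"),
    ("Gladys", "1944-05-28", "steak", "fries", "ice cream"),
    ("Lesley", "1946-05-02", "steak", "salad", "ice cream"),
])

inner = conn.execute(
    "SELECT tourneys.name, tourneys.size, dinners.birthdate "
    "FROM tourneys JOIN dinners ON tourneys.name = dinners.name"
).fetchall()
# Row order is not guaranteed without ORDER BY, so sort the names to compare.
matched = sorted(name for name, _size, _birthdate in inner)
print(matched)  # ['Barbara', 'Dolly', 'Etta', 'Gladys', 'Irma']
```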

      It is possible, though, to return all the records from one of the tables using an outer JOIN clause. Outer JOIN clauses are written as either LEFT JOIN, RIGHT JOIN, or FULL JOIN.

      A LEFT JOIN clause returns all the records from the "left" table and only the matching records from the right table. In the context of outer joins, the left table is the one referenced by the FROM clause, and the right table is any other table referenced after the JOIN keyword.

      Run the previous query again, but this time use a LEFT JOIN clause:

      • SELECT tourneys.name, tourneys.size, dinners.birthdate
      • FROM tourneys
      • LEFT JOIN dinners ON tourneys.name=dinners.name;

      This command will return every record from the left table (in this case, tourneys) even if it doesn't have a corresponding record in the right table. Any time there isn't a matching record from the right table, the right table's columns are filled with NULL, which some clients (including psql, by default) display as a blank value:


        name   | size | birthdate
      ---------+------+------------
       Dolly   |  8.5 | 1946-01-19
       Etta    |    9 | 1938-01-25
       Irma    |    7 | 1941-02-18
       Barbara |  7.5 | 1948-12-25
       Gladys  |    8 | 1944-05-28
       Bettye  |    9 |
      (6 rows)
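      Here is the same LEFT JOIN in a minimal Python sketch, using the sqlite3 module as a stand-in for PostgreSQL and assumed sample rows that include Bettye and Lesley. Through Python's driver, the missing birthdate arrives as None, the Python value for SQL NULL, rather than as a blank:

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tourneys (name TEXT, wins INTEGER, best INTEGER, size REAL)")
conn.execute(
    "CREATE TABLE dinners (name TEXT, birthdate TEXT, entree TEXT, side TEXT, dessert TEXT)"
)
# Assumed rows, including Bettye (tourneys only) and Lesley (dinners only).
conn.executemany("INSERT INTO tourneys VALUES (?, ?, ?, ?)", [
    ("Dolly", 7, 245, 8.5), ("Etta", 4, 283, 9), ("Irma", 9, 266, 7),
    ("Barbara", 2, 197, 7.5), ("Gladys", 13, 273, 8), ("Bettye", 0, 193, 9),
])
conn.executemany("INSERT INTO dinners VALUES (?, ?, ?, ?, ?)", [
    ("Dolly", "1946-01-19", "steak", "salad", "cake"),
    ("Etta", "1938-01-25", "chicken", "fries", "ice cream"),
    ("Irma", "1941-02-18", "tofu", "fries", "cake"),
    ("Barbara", "1948-12-25", "tofu", "salad", "ice cream"),
    ("Gladys", "1944-05-28", "steak", "fries", "ice cream"),
    ("Lesley", "1946-05-02", "steak", "salad", "ice cream"),
])

left = conn.execute(
    "SELECT tourneys.name, tourneys.size, dinners.birthdate "
    "FROM tourneys LEFT JOIN dinners ON tourneys.name = dinners.name"
).fetchall()
# Where dinners has no match, its columns come back as SQL NULL (Python None).
unmatched = [row for row in left if row[2] is None]
print(len(left))  # 6
print(unmatched)  # [('Bettye', 9.0, None)]
```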

      Now run the query again, this time with a RIGHT JOIN clause:

      • SELECT tourneys.name, tourneys.size, dinners.birthdate
      • FROM tourneys
      • RIGHT JOIN dinners ON tourneys.name=dinners.name;

      This will return all the records from the right table (dinners). Because Lesley's birthdate is recorded in the right table, but there is no corresponding row for her in the left table, the name and size columns in that row will be NULL and appear as blank values:


        name   | size | birthdate
      ---------+------+------------
       Dolly   |  8.5 | 1946-01-19
       Etta    |    9 | 1938-01-25
       Irma    |    7 | 1941-02-18
       Barbara |  7.5 | 1948-12-25
       Gladys  |    8 | 1944-05-28
               |      | 1946-05-02
      (6 rows)

      Note that left and right joins can be written as LEFT OUTER JOIN or RIGHT OUTER JOIN, although the OUTER keyword is optional. Likewise, specifying INNER JOIN will produce the same result as just writing JOIN.

      There is a fourth join clause called FULL JOIN available in some RDBMSs, including PostgreSQL. A FULL JOIN will return all the records from both tables, filling in NULL wherever a record has no match in the other table:

      • SELECT tourneys.name, tourneys.size, dinners.birthdate
      • FROM tourneys
      • FULL JOIN dinners ON tourneys.name=dinners.name;


        name   | size | birthdate
      ---------+------+------------
       Dolly   |  8.5 | 1946-01-19
       Etta    |    9 | 1938-01-25
       Irma    |    7 | 1941-02-18
       Barbara |  7.5 | 1948-12-25
       Gladys  |    8 | 1944-05-28
       Bettye  |    9 |
               |      | 1946-05-02
      (7 rows)

      Note: As of this writing, the FULL JOIN clause is not supported by either MySQL or MariaDB.
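      On systems without FULL JOIN, such as MySQL and MariaDB, a common workaround is to take a LEFT JOIN and append, with UNION ALL, a second LEFT JOIN with the tables swapped that keeps only the rows lacking a match. The sketch below tries that pattern through Python's sqlite3 module (SQLite itself only added RIGHT and FULL JOIN in version 3.39.0), with assumed sample rows; it should return the same seven rows as the FULL JOIN above:

```python
import sqlite3

# In-memory SQLite database standing in for MySQL/PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tourneys (name TEXT, wins INTEGER, best INTEGER, size REAL)")
conn.execute(
    "CREATE TABLE dinners (name TEXT, birthdate TEXT, entree TEXT, side TEXT, dessert TEXT)"
)
# Assumed rows, including Bettye (tourneys only) and Lesley (dinners only).
conn.executemany("INSERT INTO tourneys VALUES (?, ?, ?, ?)", [
    ("Dolly", 7, 245, 8.5), ("Etta", 4, 283, 9), ("Irma", 9, 266, 7),
    ("Barbara", 2, 197, 7.5), ("Gladys", 13, 273, 8), ("Bettye", 0, 193, 9),
])
conn.executemany("INSERT INTO dinners VALUES (?, ?, ?, ?, ?)", [
    ("Dolly", "1946-01-19", "steak", "salad", "cake"),
    ("Etta", "1938-01-25", "chicken", "fries", "ice cream"),
    ("Irma", "1941-02-18", "tofu", "fries", "cake"),
    ("Barbara", "1948-12-25", "tofu", "salad", "ice cream"),
    ("Gladys", "1944-05-28", "steak", "fries", "ice cream"),
    ("Lesley", "1946-05-02", "steak", "salad", "ice cream"),
])

full = conn.execute("""
    SELECT t.name, t.size, d.birthdate
    FROM tourneys AS t LEFT JOIN dinners AS d ON t.name = d.name
    UNION ALL
    SELECT t.name, t.size, d.birthdate
    FROM dinners AS d LEFT JOIN tourneys AS t ON t.name = d.name
    WHERE t.name IS NULL  -- keep only dinners rows with no tourneys match
""").fetchall()
print(len(full))  # 7
```

      The IS NULL filter in the second half is what prevents matched rows from appearing twice; using UNION instead of UNION ALL would also remove them, but it would collapse any genuinely duplicate rows as well.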

      As an alternative to using FULL JOIN to query all the records from multiple tables, you can use the UNION clause.

      The UNION operator works differently than a JOIN clause: instead of printing results from multiple tables as separate columns using a single SELECT statement, UNION appends the rows returned by one SELECT statement to the rows returned by another, combining them into a single result set.

      To illustrate, run the following query:

      • SELECT name FROM tourneys UNION SELECT name FROM dinners;

      This query will remove any duplicate entries, which is the default behavior of the UNION operator:


        name
      ---------
       Irma
       Etta
       Bettye
       Gladys
       Barbara
       Lesley
       Dolly
      (7 rows)

      To return all entries (including duplicates) use the UNION ALL operator:

      • SELECT name FROM tourneys UNION ALL SELECT name FROM dinners;


        name
      ---------
       Dolly
       Etta
       Irma
       Barbara
       Gladys
       Bettye
       Dolly
       Etta
       Irma
       Barbara
       Gladys
       Lesley
      (12 rows)
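      A quick way to confirm the difference between UNION and UNION ALL is to compare row counts. This minimal sketch uses Python's sqlite3 module as a stand-in for PostgreSQL, with both tables trimmed to an assumed name column:

```python
import sqlite3

# SQLite stand-in; both tables are trimmed to the name column (assumption),
# with the rows as they stand after Bettye and Lesley were added.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tourneys (name TEXT)")
conn.execute("CREATE TABLE dinners (name TEXT)")
friends = ["Dolly", "Etta", "Irma", "Barbara", "Gladys"]
conn.executemany("INSERT INTO tourneys VALUES (?)", [(n,) for n in friends + ["Bettye"]])
conn.executemany("INSERT INTO dinners VALUES (?)", [(n,) for n in friends + ["Lesley"]])

union = [n for (n,) in conn.execute(
    "SELECT name FROM tourneys UNION SELECT name FROM dinners")]
union_all = [n for (n,) in conn.execute(
    "SELECT name FROM tourneys UNION ALL SELECT name FROM dinners")]
# Row order is not guaranteed without ORDER BY, so compare counts and contents.
print(len(union), len(union_all))  # 7 12
```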

      The names and number of the columns in the result set reflect the names and number of columns queried by the first SELECT statement. Note that when using UNION to query multiple columns from more than one table, each SELECT statement must query the same number of columns, the respective columns must have compatible data types, and the columns in each SELECT statement must be in the same order. The following example shows what might result if you use a UNION clause on two SELECT statements that query a different number of columns:

      • SELECT name FROM dinners UNION SELECT name, wins FROM tourneys;


      ERROR:  each UNION query must have the same number of columns
      LINE 1: SELECT name FROM dinners UNION SELECT name, wins FROM tourne...
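      The sketch below checks these rules with Python's sqlite3 module as a stand-in for PostgreSQL; the union_error helper is hypothetical. SQLite enforces the column-count rule just as PostgreSQL does, but because it is dynamically typed it accepts a UNION that pairs a text column with an integer column, which PostgreSQL rejects. So a query that runs in SQLite is not guaranteed to run in PostgreSQL.

```python
import sqlite3

# SQLite stand-in (assumption). No rows are needed: the column-count check
# happens when the statement is compiled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tourneys (name TEXT, wins INTEGER, best INTEGER, size REAL)")
conn.execute(
    "CREATE TABLE dinners (name TEXT, birthdate TEXT, entree TEXT, side TEXT, dessert TEXT)"
)


def union_error(sql):
    """Return SQLite's error message for sql, or None if it runs cleanly."""
    try:
        conn.execute(sql)
    except sqlite3.OperationalError as exc:
        return str(exc)
    return None


mismatched = union_error("SELECT name FROM dinners UNION SELECT name, wins FROM tourneys")
matched = union_error("SELECT name FROM dinners UNION SELECT name FROM tourneys")
# SQLite is more permissive about types than PostgreSQL, which rejects
# a UNION that pairs a text column with an integer column.
mixed_types = union_error("SELECT name FROM dinners UNION SELECT wins FROM tourneys")
print(mismatched)
print(matched, mixed_types)  # None None
```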

      Another way to query multiple tables is through the use of subqueries. Subqueries (also known as inner or nested queries) are queries enclosed within another query. These are useful in cases where you're trying to filter the results of a query against a value computed by a separate query, such as the result of an aggregate function.

      To illustrate this idea, say you want to know which of your friends have won more matches than Barbara. Rather than querying how many matches Barbara has won then running another query to see who has won more games than that, you can calculate both with a single query:

      • SELECT name, wins FROM tourneys
      • WHERE wins > (
      • SELECT wins FROM tourneys WHERE name = 'Barbara'
      • );


        name  | wins
      --------+------
       Dolly  |    7
       Etta   |    4
       Irma   |    9
       Gladys |   13
      (4 rows)

      The subquery in this statement was run only once: it only needed to find the value from the wins column in the same row as Barbara in the name column, and the data returned by the subquery and outer query are independent of one another. There are cases, though, where the subquery refers to a column of the outer query, so it must conceptually be evaluated again for each row the outer query examines in order to return the desired data. A subquery of this kind is referred to as a correlated subquery.
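      Before moving on to correlated subqueries, the uncorrelated example above can be reproduced with Python's sqlite3 module as a stand-in for PostgreSQL, using assumed tourneys rows that include Bettye. An ORDER BY is added so the output order is predictable. One portability caveat: if the inner query returned more than one row, PostgreSQL would raise an error, whereas SQLite quietly uses the first row.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tourneys (name TEXT, wins INTEGER, best INTEGER, size REAL)")
# Assumed rows, including Bettye from the JOIN section.
conn.executemany("INSERT INTO tourneys VALUES (?, ?, ?, ?)", [
    ("Dolly", 7, 245, 8.5), ("Etta", 4, 283, 9), ("Irma", 9, 266, 7),
    ("Barbara", 2, 197, 7.5), ("Gladys", 13, 273, 8), ("Bettye", 0, 193, 9),
])

# The inner SELECT does not refer to the outer query, so it yields one value
# (Barbara's wins) that every outer row is compared against.
beat_barbara = conn.execute("""
    SELECT name, wins FROM tourneys
    WHERE wins > (SELECT wins FROM tourneys WHERE name = 'Barbara')
    ORDER BY name
""").fetchall()
print(beat_barbara)  # [('Dolly', 7), ('Etta', 4), ('Gladys', 13), ('Irma', 9)]
```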

      The following statement is an example of a correlated subquery. This query seeks to find which of your friends have won more games than is the average for those with the same shoe size:

      • SELECT name, size FROM tourneys AS t
      • WHERE wins > (
      • SELECT AVG(wins) FROM tourneys WHERE size = t.size
      • );

      In order for the query to complete, it must first collect the name and size columns from the outer query. Then, it compares each row from that result set against the results of the inner query, which determines the average number of wins for individuals with identical shoe sizes. A friend whose shoe size nobody else shares is compared against an average made up of only their own wins, so they can never exceed it. Only Etta and Bettye share a shoe size (9), and at most one of two values can be greater than their average, so there can only be one row in the result set:


       name | size
      ------+------
       Etta |    9
      (1 row)
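      The reasoning above can be tested directly. This sketch, again using Python's sqlite3 module as a stand-in for PostgreSQL with assumed tourneys rows, runs the correlated query with Bettye present and then after deleting her. Without a second size-9 bowler, every friend is compared only against their own win count, so the result set becomes empty:

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tourneys (name TEXT, wins INTEGER, best INTEGER, size REAL)")
# Assumed rows, including Bettye from the JOIN section.
conn.executemany("INSERT INTO tourneys VALUES (?, ?, ?, ?)", [
    ("Dolly", 7, 245, 8.5), ("Etta", 4, 283, 9), ("Irma", 9, 266, 7),
    ("Barbara", 2, 197, 7.5), ("Gladys", 13, 273, 8), ("Bettye", 0, 193, 9),
])

# t.size ties the inner query to the outer row currently being examined.
CORRELATED = """
    SELECT name, size FROM tourneys AS t
    WHERE wins > (SELECT AVG(wins) FROM tourneys WHERE size = t.size)
"""
with_bettye = conn.execute(CORRELATED).fetchall()

# Without Bettye, every shoe size is unique, so nobody can beat their own average.
conn.execute("DELETE FROM tourneys WHERE name = 'Bettye'")
without_bettye = conn.execute(CORRELATED).fetchall()
print(with_bettye)     # [('Etta', 9.0)]
print(without_bettye)  # []
```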

      As mentioned earlier, subqueries can be used to query results from multiple tables. To illustrate this with one final example, say you wanted to throw a surprise dinner for the group's all-time best bowler. You could find which of your friends has the best bowling record and return their favorite meal with the following query:

      • SELECT name, entree, side, dessert
      • FROM dinners
      • WHERE name = (SELECT name FROM tourneys
      • WHERE wins = (SELECT MAX(wins) FROM tourneys));


        name  | entree | side  |  dessert
      --------+--------+-------+-----------
       Gladys | steak  | fries | ice cream
      (1 row)

      Notice that this statement not only includes a subquery, but also contains a subquery within that subquery.
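      Conceptually, a nested subquery like this one is resolved from the inside out: each outer query needs the value its inner query returns. The sketch below, using Python's sqlite3 module as a stand-in for PostgreSQL with assumed sample rows, runs the nested statement and then computes the same answer one level at a time:

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tourneys (name TEXT, wins INTEGER, best INTEGER, size REAL)")
conn.execute(
    "CREATE TABLE dinners (name TEXT, birthdate TEXT, entree TEXT, side TEXT, dessert TEXT)"
)
# Assumed rows, including Bettye and Lesley from the JOIN section.
conn.executemany("INSERT INTO tourneys VALUES (?, ?, ?, ?)", [
    ("Dolly", 7, 245, 8.5), ("Etta", 4, 283, 9), ("Irma", 9, 266, 7),
    ("Barbara", 2, 197, 7.5), ("Gladys", 13, 273, 8), ("Bettye", 0, 193, 9),
])
conn.executemany("INSERT INTO dinners VALUES (?, ?, ?, ?, ?)", [
    ("Dolly", "1946-01-19", "steak", "salad", "cake"),
    ("Etta", "1938-01-25", "chicken", "fries", "ice cream"),
    ("Irma", "1941-02-18", "tofu", "fries", "cake"),
    ("Barbara", "1948-12-25", "tofu", "salad", "ice cream"),
    ("Gladys", "1944-05-28", "steak", "fries", "ice cream"),
    ("Lesley", "1946-05-02", "steak", "salad", "ice cream"),
])

favorite_meal = conn.execute("""
    SELECT name, entree, side, dessert FROM dinners
    WHERE name = (SELECT name FROM tourneys
                  WHERE wins = (SELECT MAX(wins) FROM tourneys))
""").fetchall()

# The same answer, computed one level at a time from the innermost query out.
top_wins = conn.execute("SELECT MAX(wins) FROM tourneys").fetchone()[0]
top_bowler = conn.execute(
    "SELECT name FROM tourneys WHERE wins = ?", (top_wins,)
).fetchone()[0]
print(favorite_meal)  # [('Gladys', 'steak', 'fries', 'ice cream')]
print(top_wins, top_bowler)  # 13 Gladys
```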


      Issuing queries is one of the most commonly performed tasks within the realm of database management. There are a number of database administration tools, such as phpMyAdmin or pgAdmin, that allow you to perform queries and visualize the results, but issuing SELECT statements from the command line is still a widely practiced workflow that can also provide you with greater control.

      If you're new to working with SQL, we encourage you to use our SQL Cheat Sheet as a reference and to review the official PostgreSQL documentation. Additionally, if you'd like to learn more about SQL and relational databases, the following tutorials may be of interest to you:
