

      How To Perform CRUD Operations in MongoDB

      The author selected the Open Internet/Free Speech Fund to receive a donation as part of the Write for DOnations program.


      MongoDB is a persistent document-oriented database used to store and process data in the form of documents. As with other database management systems, MongoDB allows you to manage and interact with data through four fundamental types of data operations:

      • Create operations, which involve writing data to the database
      • Read operations, which query a database to retrieve data from it
      • Update operations, which change data that already exists in a database
      • Delete operations, which permanently remove data from a database

      These four operations are jointly referred to as CRUD operations.

      This tutorial outlines how to create new MongoDB documents and later retrieve them to read their data. It also explains how to update the data within documents, as well as how to delete documents when they are no longer needed.


      To follow this tutorial, you will need:

      Note: The linked tutorials on how to configure your server, install MongoDB, and then secure the MongoDB installation refer to Ubuntu 20.04. This tutorial concentrates on MongoDB itself, not the underlying operating system. It will generally work with any MongoDB installation regardless of the operating system, as long as authentication has been enabled.

      Step 1 — Connecting to the MongoDB Server

      This guide involves using the MongoDB shell to interact with MongoDB. In order to follow along and practice CRUD operations in MongoDB, you must first connect to a MongoDB database by opening up the MongoDB shell.

      If your MongoDB instance is running on a remote server, SSH into that server from your local machine:
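      If you followed an initial server setup guide, the command will look similar to the following (here, sammy and your_server_ip are placeholders; substitute your own administrative username and your server's IP address):

```
ssh sammy@your_server_ip
```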

      Then connect to your MongoDB installation by opening up the MongoDB shell. Be sure to connect as a MongoDB user with privileges to write and read data. If you followed the prerequisite MongoDB security tutorial, you can connect as the administrative user you created in Step 1 of that guide:

      • mongo -u AdminSammy -p --authenticationDatabase admin

      After providing the user’s password, your terminal prompt will change to a greater-than sign (>). This means the shell is now ready to accept commands for the MongoDB server it’s connected to.

      Note: On a fresh connection, the MongoDB shell will automatically connect to the test database by default. You can safely use this database to experiment with MongoDB and the MongoDB shell.

      Alternatively, you could also switch to another database to run all of the example commands given in this tutorial. To switch to another database, run the use command followed by the name of your database:
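      For example, to switch to a hypothetical database named fancy_database_name (a placeholder; substitute any name you like), you would run:

```
use fancy_database_name
```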

      Now that you have connected to the MongoDB server using a MongoDB shell, you can move on to creating new documents.

      Step 2 — Creating Documents

      In order to have data that you can practice reading, updating, and deleting in the later steps of this guide, this step focuses on how to create data documents in MongoDB.

      Imagine that you’re using MongoDB to build and manage a directory of famous historical monuments from around the world. This directory will store information like each monument’s name, country, city, and geographical location.

      The documents in this directory will follow a format similar to this example, which represents The Pyramids of Giza:

      The Pyramids of Giza

          {
              "name": "The Pyramids of Giza",
              "city": "Giza",
              "country": "Egypt",
              "gps": {
                  "lat": 29.976480,
                  "lng": 31.131302
              }
          }
      This document, like all MongoDB documents, is written in BSON. BSON is a binary form of JSON, a human-readable data format. All data in BSON or JSON documents is represented as field-and-value pairs that take the form of field: value.

      This document consists of four fields. First is the name of the monument, followed by the city and the country. All three of these fields contain strings. The last field, called gps, is a nested document which details the monument’s GPS location. This location is made up of a pair of latitude and longitude coordinates, represented by the lat and lng fields respectively, each of which holds a floating-point value.

      Note: You can learn more about how MongoDB documents are structured in our conceptual article An Introduction to Document-Oriented Databases.

      Insert this document into a new collection called monuments using the insertOne method. As its name implies, insertOne is used to create individual documents, as opposed to creating multiple documents at once.

      In the MongoDB shell, run the following operation:

      • db.monuments.insertOne(
      • {
      • "name": "The Pyramids of Giza",
      • "city": "Giza",
      • "country": "Egypt",
      • "gps": {
      • "lat": 29.976480,
      • "lng": 31.131302
      • }
      • }
      • )

      Notice that you haven’t explicitly created the monuments collection before executing this insertOne method. MongoDB allows you to run commands on non-existent collections freely; a missing collection only gets created when the first document is inserted into it. Executing this example insertOne() method will therefore not only insert the document but also create the collection automatically.

      MongoDB will execute the insertOne method and insert the requested document representing the Pyramids of Giza. The operation’s output will inform you that it executed successfully and will also provide the ObjectId that it generated automatically for the new document:


      { "acknowledged" : true, "insertedId" : ObjectId("6105752352e6d1ebb7072647") }

      In MongoDB, each document within a collection must have a unique _id field which acts as a primary key. You can include the _id field and provide it with a value of your own choosing, as long as you ensure each document’s _id field will be unique. However, if a new document omits the _id field, MongoDB will automatically generate an object identifier (in the form of an ObjectId object) as the value for the _id field.

      You can verify that the document was inserted by checking the object count in the monuments collection:
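      In the legacy mongo shell used throughout this tutorial, the count method does this (in the newer mongosh shell, db.monuments.countDocuments({}) is the preferred equivalent):

```
db.monuments.count()
```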

      Since you’ve only inserted one document into this collection, the count method will return 1:

      1

      Inserting documents one by one like this would quickly become tedious if you wanted to create multiple documents. MongoDB provides the insertMany method which you can use to insert multiple documents in a single operation.

      Run the following example command, which uses the insertMany method to insert six additional famous monuments into the monuments collection:

      • db.monuments.insertMany([
      • {"name": "The Valley of the Kings", "city": "Luxor", "country": "Egypt", "gps": { "lat": 25.746424, "lng": 32.605309 }},
      • {"name": "Arc de Triomphe", "city": "Paris", "country": "France", "gps": { "lat": 48.873756, "lng": 2.294946 }},
      • {"name": "The Eiffel Tower", "city": "Paris", "country": "France", "gps": { "lat": 48.858093, "lng": 2.294694 }},
      • {"name": "Acropolis", "city": "Athens", "country": "Greece", "gps": { "lat": 37.970833, "lng": 23.726110 }},
      • {"name": "The Great Wall of China", "city": "Huairou", "country": "China", "gps": { "lat": 40.431908, "lng": 116.570374 }},
      • {"name": "The Statue of Liberty", "city": "New York", "country": "USA", "gps": { "lat": 40.689247, "lng": -74.044502 }}
      • ])

      Notice the square brackets ([ and ]) surrounding the six documents. These brackets signify an array of documents. Within square brackets, multiple objects can appear one after another, delimited by commas. In cases where the MongoDB method requires more than one object, you can provide a list of objects in the form of an array like this one.

      MongoDB will respond with several object identifiers, one for each of the newly inserted objects:


      {
            "acknowledged" : true,
            "insertedIds" : [
                  ObjectId("6105770952e6d1ebb7072648"),
                  ObjectId("6105770952e6d1ebb7072649"),
                  ObjectId("6105770952e6d1ebb707264a"),
                  ObjectId("6105770952e6d1ebb707264b"),
                  ObjectId("6105770952e6d1ebb707264c"),
                  ObjectId("6105770952e6d1ebb707264d")
            ]
      }

      You can verify that the documents were inserted by checking the object count in the monuments collection:

      After adding these six new documents, the expected output of this command is 7:

      7

      With that, you have used two separate insertion methods to create a number of documents representing several famous monuments. Next, you will read the data you just inserted with MongoDB’s find() method.

      Step 3 — Reading Documents

      Now that your collection has some documents stored within it, you can query your database to retrieve these documents and read their data. This step first outlines how to query all of the documents in a given collection, and then describes how to use filters to narrow down the list of retrieved documents.

      After completing the previous step, you now have seven documents describing famous monuments inserted into the monuments collection. You can retrieve all seven documents with a single operation using the find() method:
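      Run the find() method on the monuments collection without any arguments:

```
db.monuments.find()
```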

      This method, when used without any arguments, doesn’t apply any filtering and asks MongoDB to return all objects available in the specified collection, monuments. MongoDB will return the following output:


      { "_id" : ObjectId("6105752352e6d1ebb7072647"), "name" : "The Pyramids of Giza", "city" : "Giza", "country" : "Egypt", "gps" : { "lat" : 29.97648, "lng" : 31.131302 } }
      { "_id" : ObjectId("6105770952e6d1ebb7072648"), "name" : "The Valley of the Kings", "city" : "Luxor", "country" : "Egypt", "gps" : { "lat" : 25.746424, "lng" : 32.605309 } }
      { "_id" : ObjectId("6105770952e6d1ebb7072649"), "name" : "Arc de Triomphe", "city" : "Paris", "country" : "France", "gps" : { "lat" : 48.873756, "lng" : 2.294946 } }
      { "_id" : ObjectId("6105770952e6d1ebb707264a"), "name" : "The Eiffel Tower", "city" : "Paris", "country" : "France", "gps" : { "lat" : 48.858093, "lng" : 2.294694 } }
      { "_id" : ObjectId("6105770952e6d1ebb707264b"), "name" : "Acropolis", "city" : "Athens", "country" : "Greece", "gps" : { "lat" : 37.970833, "lng" : 23.72611 } }
      { "_id" : ObjectId("6105770952e6d1ebb707264c"), "name" : "The Great Wall of China", "city" : "Huairou", "country" : "China", "gps" : { "lat" : 40.431908, "lng" : 116.570374 } }
      { "_id" : ObjectId("6105770952e6d1ebb707264d"), "name" : "The Statue of Liberty", "city" : "New York", "country" : "USA", "gps" : { "lat" : 40.689247, "lng" : -74.044502 } }

      The MongoDB shell prints out all seven documents one by one and in full. Notice that each of these objects has an _id property which you didn’t define. As mentioned previously, the _id fields serve as their respective documents’ primary key, and were created automatically when you ran the insertMany method in the previous step.

      The default output from the MongoDB shell is compact, with each document’s fields and values printed on a single line. This can become difficult to read, particularly with objects containing multiple fields or nested documents.

      To make the find() method’s output more readable, you can use its pretty printing feature, like this:

      • db.monuments.find().pretty()

      This time, the MongoDB shell will print the documents on multiple lines, each with indentation:


      {
              "_id" : ObjectId("6105752352e6d1ebb7072647"),
              "name" : "The Pyramids of Giza",
              "city" : "Giza",
              "country" : "Egypt",
              "gps" : {
                      "lat" : 29.97648,
                      "lng" : 31.131302
              }
      }
      {
              "_id" : ObjectId("6105770952e6d1ebb7072648"),
              "name" : "The Valley of the Kings",
              "city" : "Luxor",
              "country" : "Egypt",
              "gps" : {
                      "lat" : 25.746424,
                      "lng" : 32.605309
              }
      }
      . . .

      Notice that in the two previous examples, the find() method was executed without any arguments. In both cases, it returned every object from the collection. You can apply filters to a query to narrow down the results.

      Recall from the previous examples that MongoDB automatically assigned The Valley of the Kings an object identifier with the value of ObjectId("6105770952e6d1ebb7072648"). The object identifier is not just the hexadecimal string inside the ObjectId(""), but the whole ObjectId object — a special datatype used in MongoDB to store object identifiers.

      The following find() method returns a single object by accepting a query filter document as an argument. Query filter documents follow the same structure as the documents you insert into a collection, consisting of fields and values, but they’re instead used to filter query results.

      The query filter document used in this example includes the _id field, with The Valley of the Kings’ object identifier as the value. To run this query on your own database, be sure to replace the highlighted object identifier with that of one of the documents stored in your own monuments collection:

      • db.monuments.find({"_id": ObjectId("6105770952e6d1ebb7072648")}).pretty()

      The query filter document in this example uses the equality condition, meaning the query will return any documents that have a field and value pair matching the one specified in the document. Essentially, this example tells the find() method to only return the documents whose _id value is equal to ObjectId("6105770952e6d1ebb7072648").

      After executing this method, MongoDB will return a single object matching the requested object identifier:


      {
              "_id" : ObjectId("6105770952e6d1ebb7072648"),
              "name" : "The Valley of the Kings",
              "city" : "Luxor",
              "country" : "Egypt",
              "gps" : {
                      "lat" : 25.746424,
                      "lng" : 32.605309
              }
      }

      You can use the equality condition on any other field from the document as well. To illustrate, try searching for monuments in France:

      • db.monuments.find({"country": "France"}).pretty()

      This method will return two monuments:


      {
              "_id" : ObjectId("6105770952e6d1ebb7072649"),
              "name" : "Arc de Triomphe",
              "city" : "Paris",
              "country" : "France",
              "gps" : {
                      "lat" : 48.873756,
                      "lng" : 2.294946
              }
      }
      {
              "_id" : ObjectId("6105770952e6d1ebb707264a"),
              "name" : "The Eiffel Tower",
              "city" : "Paris",
              "country" : "France",
              "gps" : {
                      "lat" : 48.858093,
                      "lng" : 2.294694
              }
      }

      Query filter documents are quite powerful and flexible, and they allow you to apply complex filters to collection documents.

      Step 4 — Updating Documents

      It’s common for documents within a document-oriented database like MongoDB to change over time. Sometimes, their structures must evolve along with the changing requirements of an application, or the data itself might change. This step focuses on how to update existing documents by changing field values in individual documents, as well as by adding a new field to every document in a collection.

      Similar to the insertOne() and insertMany() methods, MongoDB provides methods that allow you to update either a single document or multiple documents at once. An important difference with these update methods is that, when creating new documents, you only need to pass the document data as method arguments. To update an existing document in the collection, you must also pass an argument that specifies which document you want to update.

      To allow users to do this, MongoDB uses the same query filter document mechanism in update methods as the one you used in the previous step to find and retrieve documents. Any query filter document that can be used to retrieve documents can also be used to specify documents to update.

      Try changing the name of Arc de Triomphe to the full name of Arc de Triomphe de l'Étoile. To do so, use the updateOne() method which updates a single document:

      • db.monuments.updateOne(
      • { "name": "Arc de Triomphe" },
      • {
      • $set: { "name": "Arc de Triomphe de l'Étoile" }
      • }
      • )

      The first argument of the updateOne method is the query filter document with a single equality condition, as covered in the previous step. In this example, { "name": "Arc de Triomphe" } finds documents with a name key holding the value of Arc de Triomphe. Any valid query filter document can be used here.

      The second argument is the update document, specifying what changes should be applied during the update. The update document consists of update operators as keys, and parameters for each operator as values. In this example, the update operator used is $set. It is responsible for setting document fields to new values and requires a JSON object with the new field values. Here, $set: { "name": "Arc de Triomphe de l'Étoile" } tells MongoDB to set the value of the name field to Arc de Triomphe de l'Étoile.

      The method will return a result telling you that one object was found by the query filter document, and also one object was successfully updated.


      { "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }

      Note: If the document query filter is not precise enough to select a single document, updateOne() will update only the first document returned from multiple results.

      To check whether the update worked, try retrieving all the monuments related to France:

      • db.monuments.find({"country": "France"}).pretty()

      This time, the method returns Arc de Triomphe but with its full name, which was changed by the update operation:


      {
              "_id" : ObjectId("6105770952e6d1ebb7072649"),
              "name" : "Arc de Triomphe de l'Étoile",
              "city" : "Paris",
              "country" : "France",
              "gps" : {
                      "lat" : 48.873756,
                      "lng" : 2.294946
              }
      }
      . . .

      To modify more than one document, you can instead use the updateMany() method.

      As an example, say you notice there is no information about who created the entry and you’d like to credit the author who added each monument to the database. To do this, you’ll add a new editor field to each document in the monuments collection.

      The following example includes an empty query filter document ({ }), which matches every document in the collection, so the updateMany() method will affect each of them. The update document adds a new editor field to each document and assigns it a value of Sammy:

      • db.monuments.updateMany(
      • { },
      • {
      • $set: { "editor": "Sammy" }
      • }
      • )

      This method will return the following output:


      { "acknowledged" : true, "matchedCount" : 7, "modifiedCount" : 7 }

      This output informs you that seven documents were matched and seven were also modified.

      Confirm that the changes were applied:

      • db.monuments.find().pretty()


      {
              "_id" : ObjectId("6105752352e6d1ebb7072647"),
              "name" : "The Pyramids of Giza",
              "city" : "Giza",
              "country" : "Egypt",
              "gps" : {
                      "lat" : 29.97648,
                      "lng" : 31.131302
              },
              "editor" : "Sammy"
      }
      {
              "_id" : ObjectId("6105770952e6d1ebb7072648"),
              "name" : "The Valley of the Kings",
              "city" : "Luxor",
              "country" : "Egypt",
              "gps" : {
                      "lat" : 25.746424,
                      "lng" : 32.605309
              },
              "editor" : "Sammy"
      }
      . . .

      All the returned documents now have a new field called editor set to Sammy. When you provide a field name that doesn’t yet exist to the $set update operator, the update operation creates that field in all matched documents and sets it to the new value.

      Although you’ll likely use $set most often, many other update operators are available in MongoDB, allowing you to make complex alterations to your documents’ data and structure. You can learn more about these update operators in MongoDB’s official documentation on the subject.

      Step 5 — Deleting Documents

      There are times when data in the database becomes obsolete and needs to be deleted. As with Mongo’s update and insertion operations, there is a deleteOne() method, which removes only the first document matched by the query filter document, and deleteMany(), which deletes multiple objects at once.

      To practice using these methods, begin by trying to remove the Arc de Triomphe de l'Étoile monument you modified previously:

      • db.monuments.deleteOne(
      • { "name": "Arc de Triomphe de l'Étoile" }
      • )

      Notice that this method includes a query filter document like the previous update and retrieval examples. As before, you can use any valid query to specify what documents will be deleted.

      MongoDB will return the following result:


      { "acknowledged" : true, "deletedCount" : 1 }

      Here, the result tells you how many documents were deleted in the process.

      Check whether the document has indeed been removed from the collection by querying for monuments in France:

      • db.monuments.find({"country": "France"}).pretty()

      This time the method returns only a single monument, The Eiffel Tower, since you removed the Arc de Triomphe de l'Étoile:


      {
              "_id" : ObjectId("6105770952e6d1ebb707264a"),
              "name" : "The Eiffel Tower",
              "city" : "Paris",
              "country" : "France",
              "gps" : {
                      "lat" : 48.858093,
                      "lng" : 2.294694
              },
              "editor" : "Sammy"
      }

      To illustrate removing multiple documents at once, remove all the monument documents for which Sammy was the editor. This will empty the collection, as you’ve previously designated Sammy as the editor for every monument:

      • db.monuments.deleteMany(
      • { "editor": "Sammy" }
      • )

      This time, MongoDB lets you know that this method removed six documents:


      { "acknowledged" : true, "deletedCount" : 6 }

      You can verify that the monuments collection is now empty by counting the number of documents within it:
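      As earlier in this tutorial, you can check the count with the legacy count method (or countDocuments({}) in newer shells):

```
db.monuments.count()
```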



      Since you’ve just removed all documents from the collection, this command returns the expected output of 0.


      By reading this article, you became familiar with the concept of CRUD operations — Create, Read, Update and Delete — the four essential components of data management. You can now insert new documents into a MongoDB database, modify existing ones, retrieve documents already present in a collection, and also delete documents as needed.

      Be aware, though, that this tutorial covered only one fundamental way of query filtering. MongoDB offers a robust query system that allows you to precisely select documents of interest using complex criteria. To learn more about creating more complex queries, we encourage you to check out the official MongoDB documentation on the subject.


      How To Perform CRUD Operations in MongoDB Using PyMongo on Ubuntu 20.04

      The author selected the Free and Open Source Fund to receive a donation as part of the Write for DOnations program.


      MongoDB is a general-purpose, document-oriented, NoSQL database program that uses JSON-like documents to store data. Unlike tabular relations used in relational databases, JSON-like documents allow for flexible and dynamic schemas while maintaining simplicity. In general, NoSQL databases have the ability to scale horizontally, making them suitable for big data and real-time applications.

      A database driver or connector is a program that connects an application to a database program. To perform CRUD operations in MongoDB using Python, a driver is required to establish the communication channel. PyMongo is the recommended driver for working with MongoDB from Python.

      In this guide, you will write a Python script that creates, retrieves, updates, and deletes data in a locally installed MongoDB server on Ubuntu 20.04. By the end, you will understand how data moves between MongoDB and a Python application.


      Before you move forward with this guide, you will need the following:

      Step 1 — Setting Up PyMongo

      In this step, you will install PyMongo, the recommended driver for MongoDB from Python. As a collection of tools for working with MongoDB, PyMongo facilitates database requests using syntax and an interface native to Python.

      To install PyMongo, open your Ubuntu terminal and install it from the Python Package Index. It is recommended to install PyMongo within a virtual environment in order to isolate your Python project. Refer to this guide if you missed how to set up a virtual environment in the prerequisites.
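      Assuming pip for Python 3 is available on your system as pip3, the installation command is:

```
pip3 install pymongo
```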

      pip3 refers to the Python3 version of the popular pip package installer for Python. Note that within the Python 3 virtual environment you can use the command pip instead of pip3.

      Now, open the Python interpreter with the command below. The interpreter is an interactive environment where you can execute Python code line by line.
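      Start the interpreter by running:

```
python3
```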

      You are in the interpreter when you get an output similar to what’s below:


      Python 3.8.5 (default, Jan 27 2021, 15:41:15) [GCC 9.3.0] on linux Type "help", "copyright", "credits" or "license" for more information.

      With a successful output, import pymongo in the Python interpreter:
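      In the interpreter, run the import:

```
import pymongo
```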

      Using the import statement, you can access the pymongo module and its code in your terminal. If PyMongo is installed correctly, the import statement will complete without raising an exception.

      On the next line, import getpass.

      • from getpass import getpass

      getpass is a module for managing password inputs. The module prompts you for a password without showing an input, and adds a security mechanism to prevent displaying passwords as plaintext.

      Here, make a connection with MongoClient to access your MongoDB instance. Declare a variable client to hold the MongoClient instance, with host, username, password, and authMechanism as arguments:

      • client = pymongo.MongoClient('localhost', username="username", password=getpass('Password: '), authMechanism='SCRAM-SHA-256')

      To connect to MongoDB with authorization enabled, MongoClient requires four arguments:

      • host - the hostname of the server on which MongoDB is installed. Since Mongo is local in this context, use localhost.
      • username and password - authorization credentials created after enabling authentication in MongoDB.
      • authMechanism - SCRAM-SHA-256 is the default authentication mechanism supported by a cluster configured for authentication with MongoDB 4.0 or later.

      Once you’ve established the client connection, you can now interact with your MongoDB instance.

      Step 2 — Testing Databases and Collections

      In this step, you will get familiar with NoSQL concepts such as collections and documents as applied to MongoDB.

      MongoDB supports managing multiple independent databases within a MongoClient instance. You can access or create a database using attribute style on a MongoClient instance. Declare a variable db and assign the new database as an attribute of client:
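      Since this tutorial's examples track employee records in a database called workplace, the assignment looks like this:

```
db = client.workplace
```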

      In this context, the workplace database keeps track of employee records you will add such as the employee’s name and role.

      Next, create a collection. Like tables in relational databases, collections store a group of documents in MongoDB. In your Python interpreter, create an employees collection as an attribute of db and assign it to a variable of the same name:
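      Using attribute style again, the assignment looks like this:

```
employees = db.employees
```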


      Note: In MongoDB, databases and collections are created lazily. This means that none of the above code actually creates anything on the server until the first document is inserted.

      Now that you’ve reviewed collections, let’s look at how MongoDB represents documents, the basic structure for representing data.

      Step 3 — Performing CRUD Operations

      In this step, you will perform CRUD operations to manipulate data in MongoDB. Create, retrieve, update, and delete (CRUD) are the four basic operations you can perform to manage persistent storage.

      To represent data in Python as JSON-like documents, dictionaries are used. Create a sample employee record with name and role attributes:

      • employee = {
      • "name": "Sammy",
      • "role": "Developer"
      • }

      As you can see, Python dictionaries are very similar in syntax to JSON documents. PyMongo converts Python dictionaries to BSON documents for scalable data storage.
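      To see the similarity for yourself, you can serialize the same record with Python's standard json module. This standalone snippet is illustrative only and does not touch MongoDB:

```python
import json

# The same employee record used in this tutorial, as a plain dictionary.
employee = {
    "name": "Sammy",
    "role": "Developer"
}

# json.dumps() shows how closely the dictionary mirrors a JSON document;
# PyMongo performs a similar conversion (to BSON) when storing documents.
print(json.dumps(employee, indent=4))
# {
#     "name": "Sammy",
#     "role": "Developer"
# }
```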

      At this point, insert the employee record into the employees collection:

      • employees.insert_one(employee)

      Call the insert_one() method on the employees collection, passing the employee record you created earlier. A successful insertion returns an output like the one below:


      <pymongo.results.InsertOneResult object at 0x7f8c5e3ed1c0>

      Now, verify you’ve successfully inserted the employee record into the collection. Make a query to find the employee you just created:

      • employees.find_one({"name": "Sammy"})

      Calling the find_one() method on the employees collection with a name query returns a single matching document. This method is useful when you have only one document, or when you are interested in the first match.

      The output should look similar to this:


      {'_id': ObjectId('606ae5b2358ddf640da46894'), 'name': 'Sammy', 'role': 'Developer'}

      Note: When a document is inserted, a unique key _id is automatically added to the document if it does not already contain an _id key.

      If the need arises to modify existing documents, use the update_one() method. The update_one() method requires two arguments, query and update:

      • query - {"name": "Sammy"} - PyMongo will use this query parameter to find documents with elements that match.
      • update - { "$set": {"role": "Technical Writer"} } - The update parameter implements the $set operator, which replaces the value of a field with the specified value.

      Call the update_one() method on the employees collection:

      • employees.update_one({"name": "Sammy"}, { "$set": {"role": "Technical Writer"} })

      A successful update will return an output similar to this:


      <pymongo.results.UpdateResult object at 0x7f8c5e3eb940>

      To delete a single document, employ the delete_one() method. delete_one() requires a query parameter which specifies the document to delete. Execute the delete_one() method on the employees collection with the name Sammy as the query parameter:

      • employees.delete_one({"name": "Sammy"})

      This will delete the only entry you have in your employees collection.


      <pymongo.results.DeleteResult object at 0x7f8c5e3c8280>

      If you use the find_one() method again, nothing prints to the console, making it apparent that you’ve successfully deleted Sammy’s employee record:

      • employees.find_one({"name": "Sammy"})

      insert_one(), find_one(), update_one(), and delete_one() are great ways of getting started with performing CRUD operations in MongoDB with PyMongo.


      In this guide, you have explored how to set up and configure PyMongo, the database driver, to connect Python code to MongoDB, and how to create, retrieve, update, and delete documents. Although this guide focuses on introductory concepts, PyMongo offers more powerful and flexible ways of working with MongoDB. For instance, you can make bulk inserts, query for more than one document, add indexes to queries, and more.

      To learn more about MongoDB management, see How To Back Up, Restore, and Migrate a MongoDB Database on Ubuntu 20.04 and How To Import and Export a MongoDB Database on Ubuntu 20.04.


      How To Set Up Continuous Archiving and Perform Point-In-Time-Recovery with PostgreSQL 12 on Ubuntu 20.04

      The author selected the Diversity in Tech Fund to receive a donation as part of the Write for DOnations program.


      PostgreSQL is a widely used relational database that supports ACID transactions. The acronym ACID stands for atomicity, consistency, isolation, and durability. These are four key properties of database transactions that PostgreSQL supports to ensure the persistence and validity of data in the database.

      One method PostgreSQL uses to maintain ACID properties is Write-Ahead Logging (WAL). PostgreSQL first records any transaction on the database to the WAL log files before it writes the changes to the database cluster’s data files.

      With continuous archiving, the WAL files are copied to secondary storage, which has a couple of benefits. For example, a secondary database cluster can use the archived WAL files for replication purposes, and you can also use the files to perform point-in-time-recovery (PITR). That is, you can use the files to roll back a database cluster to a desired point in time if an accident happens.

      In this tutorial, you will set up continuous archiving with a PostgreSQL 12 cluster on Ubuntu 20.04 and perform PITR on the cluster.


      To complete this tutorial, you’ll need the following:

      Step 1 — Configuring Continuous Archiving on the Database Cluster

      In this first step, you need to configure your PostgreSQL 12 cluster to archive the cluster’s WAL files in a directory different from the cluster’s data directory. To do this, you must first create a new directory somewhere to archive the WAL files.

      Create a new directory as follows:

      • mkdir database_archive

      You now need to give the default PostgreSQL user, postgres, permission to write to this directory. You can achieve this by changing the ownership of the directory using the chown command:

      • sudo chown postgres:postgres database_archive

      Now that you have a directory set up for the cluster to archive the WAL files into, you must enable archiving in the postgresql.conf configuration file, which you can find in the /etc/postgresql/12/main/ directory by default.

      Open the configuration file with your text editor:

      • sudo nano /etc/postgresql/12/main/postgresql.conf

      Once you have opened the file, uncomment the line containing the archive_mode variable by removing the # from the start of the line, and change the value of archive_mode to on like the following:


      . . .
      archive_mode = on
      . . .

      You’ll also specify the command the cluster uses to archive the files. PostgreSQL provides an archive command that will work for this tutorial, which you can read about in the official PostgreSQL docs. Uncomment the archive_command variable and add the following command:


      . . .
      archive_command = 'test ! -f /path/to/database_archive/%f && cp %p /path/to/database_archive/%f'
      . . .

      The archive command here first checks to see if the WAL file already exists in the archive, and if it doesn’t, it copies the WAL file to the archive.
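      You can sandbox the test ! -f … && cp … pattern with throwaway directories to see this behavior for yourself. This is only a sketch: the paths below are hypothetical stand-ins created with mktemp, and in a real cluster PostgreSQL substitutes %p (the WAL file path) and %f (the WAL file name) at archive time:

```shell
# Stand-ins for the cluster's pg_wal directory and the archive directory.
wal_dir=$(mktemp -d)
archive=$(mktemp -d)

# A fake WAL segment (name mimics PostgreSQL's segment naming).
f="000000010000000000000001"
p="$wal_dir/$f"
echo "wal data" > "$p"

# First run: the file is absent from the archive, so it is copied.
test ! -f "$archive/$f" && cp "$p" "$archive/$f"

# Second run: the file already exists, so 'test' fails and cp never runs,
# which signals PostgreSQL not to overwrite an already-archived segment.
test ! -f "$archive/$f" && cp "$p" "$archive/$f" || echo "already archived, skipping"
```

      The `test ! -f` guard is what makes the command safe to retry: archiving the same segment twice cannot clobber the first copy.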

      Replace /path/to/database_archive with the actual path to the database_archive directory you created earlier. For example, if you created it in your home directory, the path would be ~/database_archive.

      Lastly, you need to configure the wal_level variable. wal_level dictates how much information PostgreSQL writes to the log. For continuous archiving, this needs to be set to at least replica:


      . . .
      #wal_level = replica
      . . .

      This is already the default value in PostgreSQL 12, so you shouldn’t need to change it, but it is worth keeping in mind if you ever change this variable.

      You can now save and exit your file.

      To implement the changes to your database cluster configuration file, you need to restart the cluster as follows:

      • sudo systemctl restart postgresql@12-main

      If PostgreSQL restarts successfully, the cluster will archive every WAL file once it is full. By default, each WAL file is 16MB.

      If you need to archive a transaction immediately, you can force the database cluster to switch to a new WAL file and archive the current one by running the following command on the cluster:

      • sudo -u postgres psql -c "SELECT pg_switch_wal();"

      With the database cluster successfully copying the WAL files to the archive, you can now perform a physical backup of the database cluster’s data files.

      Step 2 — Performing a Physical Backup of the PostgreSQL Cluster

      It is important to take regular backups of your database to help mitigate data loss should the worst happen. PostgreSQL allows you to take both logical and physical backups of the database cluster. However, for PITR, you need to take a physical backup of the database cluster. That is, you need to make a copy of all the database’s files in PostgreSQL’s data directory. By default, the PostgreSQL 12 data directory is /var/lib/postgresql/12/main/.

      Note: You can also find the location of the data directory by running the following command on the cluster:

      • sudo -u postgres psql -c "SHOW data_directory;"

      In the previous step, you made the database_archive directory to store all the archived WAL files. In this step, you will create another directory, called database_backup, to store the physical backup you are about to take.

      Once again, make the directory:

      • mkdir database_backup

      Now ensure that the postgres user has permission to write to the directory by changing the ownership:

      • sudo chown postgres:postgres database_backup

      Now that you have a directory for the backup, you need to perform a physical backup of the database cluster’s data files. Fortunately, PostgreSQL has the built-in pg_basebackup command that performs everything for you. Run the command as the postgres user:

      • sudo -u postgres pg_basebackup -D /path/to/database_backup

      Replace /path/to/ with the path to your directory.

      With this physical backup of the database cluster, you are now able to perform point-in-time-recovery on the cluster.

      Step 3 — Performing Point-In-Time-Recovery on the Database Cluster

      Now that you have at least one physical backup of the database and you’re archiving the WAL files, you can perform PITR if you need to roll back the database to a previous state.

      First, if the database is still running, you’ll need to shut it down. You can do this by running the systemctl stop command:

      • sudo systemctl stop postgresql@12-main

      Once the database is no longer running, you need to remove all the files in PostgreSQL’s data directory. But first, you need to move the pg_wal directory to a different place as this might contain unarchived WAL files that are important for recovery. Use the mv command to move the pg_wal directory as follows:

      • sudo mv /var/lib/postgresql/12/main/pg_wal ~/

      Now, you can remove the /var/lib/postgresql/12/main directory entirely and recreate it as follows:

      • sudo rm -rf /var/lib/postgresql/12/main

      Followed by:

      • sudo mkdir /var/lib/postgresql/12/main

      Now, you need to copy all the files from the physical backup you made in the previous step to the new empty data directory. You can do this with cp:

      • sudo cp -a /path/to/database_backup/. /var/lib/postgresql/12/main/
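      The trailing /. in the source path matters: it tells cp to copy the contents of the backup directory, hidden files included, directly into the data directory rather than nesting the backup directory inside it. A quick sketch with throwaway directories (hypothetical names created with mktemp, not your real backup or data directories) illustrates the difference:

```shell
# Stand-ins for the backup directory and the empty data directory.
src=$(mktemp -d)
dst=$(mktemp -d)
touch "$src/base_file" "$src/.hidden_file"

# 'cp -a "$src/." "$dst/"' copies the directory's *contents*, including
# dotfiles, while -a preserves permissions, ownership, and timestamps.
cp -a "$src/." "$dst/"

ls -A "$dst"
```

      Without the /., the command would create a nested copy of the source directory inside the destination instead of populating the destination itself.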

      You also need to ensure the data directory has the postgres user as the owner and the appropriate permissions. Run the following command to change the owner:

      • sudo chown postgres:postgres /var/lib/postgresql/12/main

      And update the permissions:

      • sudo chmod 700 /var/lib/postgresql/12/main

      The pg_wal directory copied over from the physical backup contains outdated WAL files that are of no use for recovery. You need to replace them with the WAL files from the pg_wal directory you moved aside before emptying out PostgreSQL’s data directory, since some of those files might not have been archived before the server stopped.

      Remove the outdated pg_wal directory from /var/lib/postgresql/12/main as follows:

      • sudo rm -rf /var/lib/postgresql/12/main/pg_wal

      Now copy the files from the pg_wal directory you saved before clearing out the data directory:

      • sudo cp -a ~/pg_wal /var/lib/postgresql/12/main/pg_wal

      With the data directory restored correctly, you need to configure the recovery settings to ensure the database server recovers the archived WAL files correctly. The recovery settings are found in the postgresql.conf configuration file in the /etc/postgresql/12/main/ directory.

      Open the configuration file:

      • sudo nano /etc/postgresql/12/main/postgresql.conf

      Once you have the file open, locate the restore_command variable and remove the # character from the start of the line. Just like you did with archive_command in the first step, you need to specify how PostgreSQL should recover the WAL files. Since the archive command just copies the files to the archive, the restore command will copy the files back. The restore_command variable will be similar to the following:


      . . .
      restore_command = 'cp /path/to/database_archive/%f %p'
      . . .

      Remember to replace /path/to/database_archive/ with the path to your archive directory.

      Next, you have the option to specify a recovery target. This is the point that the database cluster will try to recover to before leaving recovery mode. The recovery target can be a timestamp, transaction ID, log sequence number, the name of a restore point created with the pg_create_restore_point() command, or whenever the database reaches a consistent state. If no recovery target is specified, the database cluster will read through the entire log of WAL files in the archive.

      For a complete list of options for the recovery_target variable, consult the official PostgreSQL documentation.

      Note: The recovery target must be a point in time after the physical backup you are using was taken. If you need to return to an earlier point, then you need to use an earlier backup of the database.
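      For example, to recover to a specific moment rather than replaying the entire archive, you could set recovery_target_time alongside restore_command. The timestamp below is purely illustrative; yours must fall after the physical backup was taken:


      . . .
      restore_command = 'cp /path/to/database_archive/%f %p'
      recovery_target_time = '2022-01-01 12:00:00'
      . . .

      With this setting, PostgreSQL replays archived WAL files up to that time and then leaves recovery mode.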

      Once you have set restore_command and recovery_target, save and exit the file.

      Before restarting the database cluster, you need to inform PostgreSQL that it should start in recovery mode. You can achieve this by creating an empty file in the cluster’s data directory called recovery.signal. To create an empty file in the directory, use the touch command:

      • sudo touch /var/lib/postgresql/12/main/recovery.signal

      Now you can restart the database cluster by running:

      • sudo systemctl start postgresql@12-main

      If the database starts successfully, it will enter recovery mode. Once the database cluster reaches the recovery target, it will remove the recovery.signal file.

      Now that you have successfully recovered your database cluster to the desired state, you can begin your normal database operations. If you want to recover to a different point in time, you can repeat this step.


      In this tutorial, you set up a PostgreSQL 12 database cluster to archive WAL files, and then you used the archived WAL files to perform point-in-time-recovery. You can now roll back a database cluster to a desired state if an accident happens.

      To learn more about continuous archiving and point-in-time-recovery, you can read the docs.

      For more tutorials on PostgreSQL, check out our PostgreSQL topic page.
