
      How To Perform Full-text Search in MongoDB


      The author selected the Open Internet/Free Speech Fund to receive a donation as part of the Write for DOnations program.

      Introduction

      MongoDB queries that filter data by searching for exact matches, using greater-than or less-than comparisons, or by using regular expressions will work well enough in many situations. However, these methods fall short when it comes to filtering against fields containing rich textual data.

      Imagine you typed “coffee recipe” into a web search engine but it only returned pages that contained that exact phrase. In this case, you may not find exactly what you were looking for since most popular websites with coffee recipes may not contain the exact phrase “coffee recipe.” If you were to enter that phrase into a real search engine, though, you might find pages with titles like “Great Coffee Drinks (with Recipes!)” or “Coffee Shop Drinks and Treats You Can Make at Home.” In these examples, the word “coffee” is present but the titles contain another form of the word “recipe” or exclude it entirely.

      This level of flexibility in matching text to a search query is typical for full-text search engines that specialize in searching textual data. There are multiple specialized open-source tools for such applications in use, with Elasticsearch being an especially popular choice. However, for scenarios that don’t require the robust search features found in dedicated search engines, some general-purpose database management systems offer their own full-text search capabilities.

      In this tutorial, you’ll learn by example how to create a text index in MongoDB and use it to search the documents in the database against common full-text search queries and filters.

      Prerequisites

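      To follow this tutorial, you will need a server with MongoDB installed and secured, with authentication enabled and an administrative user created.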

      Note: The linked tutorials on how to configure your server, install MongoDB, and secure the MongoDB installation refer to Ubuntu 20.04. This tutorial concentrates on MongoDB itself, not the underlying operating system. It will generally work with any MongoDB installation regardless of the operating system as long as authentication has been enabled.

      Step 1 — Preparing the Test Data

      To help you learn how to perform full-text searches in MongoDB, this step outlines how to open the MongoDB shell to connect to your locally-installed MongoDB instance. It also explains how to create a sample collection and insert a few sample documents into it. This sample data will be used in commands and examples throughout this guide to help explain how to use MongoDB to search text data.

      To create this sample collection, connect to the MongoDB shell as your administrative user. This tutorial follows the conventions of the prerequisite MongoDB security tutorial and assumes the name of this administrative user is AdminSammy and its authentication database is admin. Be sure to change these details in the following command to reflect your own setup, if different:

      • mongo -u AdminSammy -p --authenticationDatabase admin

      Enter the password you set during installation to gain access to the shell. After providing the password, your prompt will change to a greater-than sign (>).

      Note: On a fresh connection, the MongoDB shell will connect to the test database by default. You can safely use this database to experiment with MongoDB and the MongoDB shell.

      Alternatively, you could switch to another database to run all of the example commands given in this tutorial. To switch to another database, run the use command followed by the name of your database, for example use database_name, where database_name is a placeholder for your own database’s name.

      To understand how full-text search can be applied to documents in MongoDB, you’ll need a collection of documents you can filter against. This guide will use a collection of sample documents that include names and descriptions of several different types of coffee drinks. These documents will have the same format as the following example document describing a Cuban coffee drink:

      Example Cafecito document

      {
          "name": "Cafecito",
          "description": "A sweet and rich Cuban hot coffee made by topping an espresso shot with a thick sugar cream foam."
      }
      

      This document contains two fields: the name of the coffee drink and a longer description which provides some background information about the drink and its ingredients.

      Run the following insertMany() method in the MongoDB shell to create a collection named recipes and, at the same time, insert five sample documents into it:

      • db.recipes.insertMany([
      • {"name": "Cafecito", "description": "A sweet and rich Cuban hot coffee made by topping an espresso shot with a thick sugar cream foam."},
      • {"name": "New Orleans Coffee", "description": "Cafe Noir from New Orleans is a spiced, nutty coffee made with chicory."},
      • {"name": "Affogato", "description": "An Italian sweet dessert coffee made with fresh-brewed espresso and vanilla ice cream."},
      • {"name": "Maple Latte", "description": "A wintertime classic made with espresso and steamed milk and sweetened with some maple syrup."},
      • {"name": "Pumpkin Spice Latte", "description": "It wouldn't be autumn without pumpkin spice lattes made with espresso, steamed milk, cinnamon spices, and pumpkin puree."}
      • ])

      This method will return a list of object identifiers assigned to the newly inserted objects:

      Output

      {
          "acknowledged" : true,
          "insertedIds" : [
              ObjectId("61895d2787f246b334ece911"),
              ObjectId("61895d2787f246b334ece912"),
              ObjectId("61895d2787f246b334ece913"),
              ObjectId("61895d2787f246b334ece914"),
              ObjectId("61895d2787f246b334ece915")
          ]
      }

      You can verify that the documents were properly inserted by running the find() method on the recipes collection with no arguments. This will retrieve every document in the collection:

      • db.recipes.find();

      Output

      { "_id" : ObjectId("61895d2787f246b334ece911"), "name" : "Cafecito", "description" : "A sweet and rich Cuban hot coffee made by topping an espresso shot with a thick sugar cream foam." } . . .

      With the sample data in place, you’re ready to start learning how to use MongoDB’s full-text search features.

      Step 2 — Creating a Text Index

      To start using MongoDB’s full-text search capabilities, you must create a text index on a collection. Indexes are special data structures that store only a small subset of data from each document in a collection separately from the documents themselves. There are several types of indexes users can create in MongoDB, all of which help the database optimize search performance when querying the collection.

      A text index, however, is a special type of index used to further facilitate searching fields containing text data. When a user creates a text index, MongoDB will automatically drop any language-specific stop words from searches. This means that MongoDB will ignore the most common words for the given language (in English, words like “a”, “an”, “the”, or “this”).

      MongoDB will also implement a form of suffix-stemming in searches. This involves MongoDB identifying the root part of the search term and treating other grammar forms of that root (created by adding common suffixes like “-ing”, “-ed”, or perhaps “-er”) as equivalent to the root for the purposes of the search.

      Thanks to these and other features, MongoDB can more flexibly support queries written in natural language and provide better results.
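
      To make the two ideas above more concrete, here is a minimal plain-JavaScript sketch of stop-word removal followed by naive suffix stripping. It is not MongoDB’s actual implementation, which relies on a far more careful stemming algorithm. The stop-word list, the suffix list, and the function names stem and analyze are all assumptions made for illustration.

```javascript
// Illustrative stop-word list (an assumption, not MongoDB's real list).
const STOP_WORDS = new Set(["a", "an", "the", "this", "is", "with", "and"]);

// Illustrative suffixes, checked in order; real stemmers use many more rules.
const SUFFIXES = ["ing", "ed", "es", "er", "s", "e"];

// Strip the first matching suffix, keeping at least three letters of the word.
function stem(word) {
  for (const suffix of SUFFIXES) {
    if (word.endsWith(suffix) && word.length - suffix.length >= 3) {
      return word.slice(0, -suffix.length);
    }
  }
  return word;
}

// Lowercase the text, split it into words, drop stop words, and stem the rest.
function analyze(text) {
  return text
    .toLowerCase()
    .split(/[^a-z]+/)
    .filter((word) => word.length > 0 && !STOP_WORDS.has(word))
    .map(stem);
}

// All three forms reduce to the same root, "spic", so they compare as equal.
console.log(["spice", "spiced", "spices"].map(stem));

// The stop word "a" is dropped before stemming.
console.log(analyze("A spiced coffee"));
```

      Because both the indexed text and the search query would pass through the same kind of processing, different grammatical forms of a word end up comparing as equal.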

      Note: This tutorial focuses on English text, but MongoDB supports multiple languages when using full-text search and text indexes. To learn more about what languages MongoDB supports, refer to the official documentation on supported languages.

      You can only create one text index for any given MongoDB collection, but the index can be created using more than one field. In our example collection, there is useful text stored in both the name and description fields of each document. It could be useful to create a text index for both fields.

      Run the following createIndex() method, which will create a text index for the two fields:

      • db.recipes.createIndex({ "name": "text", "description": "text" });

      For each of the two fields, name and description, the index type is set to text, telling MongoDB to create a text index tailored for full-text search based on these fields. The output will confirm the index creation:

      Output

      { "createdCollectionAutomatically" : false, "numIndexesBefore" : 1, "numIndexesAfter" : 2, "ok" : 1 }
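
      The key document passed to createIndex() simply maps each field name to the string "text". As a small sketch, the following plain-JavaScript helper builds such a document from a list of field names. The helper name textIndexKeys is hypothetical and is not part of MongoDB.

```javascript
// Hypothetical helper: build the key document for a text index
// that covers every field in the given list.
function textIndexKeys(fields) {
  const keys = {};
  for (const field of fields) {
    keys[field] = "text"; // "text" marks the field for full-text indexing
  }
  return keys;
}

const keys = textIndexKeys(["name", "description"]);
console.log(JSON.stringify(keys));

// In the MongoDB shell, the resulting document could be passed along:
// db.recipes.createIndex(keys);
```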

      Now that you’ve created the index, you can use it to issue full-text search queries to the database. In the next step, you’ll learn how to execute queries containing both single and multiple words.

      Step 3 — Searching for One or More Individual Words

      Perhaps the most common search problem is to look up documents containing one or more individual words.

      Typically, users expect the search engine to be flexible in determining where the given search terms should appear. As an example, if you were to use any popular web search engine and type in “coffee sweet spicy”, you likely are not expecting results that will contain those three words in that exact order. It’s more likely that you’d expect a list of web pages containing the words “coffee”, “sweet”, and “spicy” but not necessarily immediately near each other.

      That’s also how MongoDB approaches typical search queries when using text indexes. This step outlines how MongoDB interprets search queries with a few examples.

      To begin, say you want to search for coffee drinks with spices in their recipe, so you search for the word spiced alone using the following command:

      • db.recipes.find({ $text: { $search: "spiced" } });

      Notice that the syntax when using full-text search is slightly different from regular queries. Individual field names — like name or description — don’t appear in the filter document. Instead, the query uses the $text operator, telling MongoDB that this query intends to use the text index you created previously. You don’t need to be any more specific than that because, as you may recall, a collection may only have a single text index. Inside the embedded document for this filter is the $search operator taking the search query as its value. In this example, the query is a single word: spiced.

      After running this command, MongoDB produces the following list of documents:

      Output

      { "_id" : ObjectId("61895d2787f246b334ece915"), "name" : "Pumpkin Spice Latte", "description" : "It wouldn't be autumn without pumpkin spice lattes made with espresso, steamed milk, cinnamon spices, and pumpkin puree." }
      { "_id" : ObjectId("61895d2787f246b334ece912"), "name" : "New Orleans Coffee", "description" : "Cafe Noir from New Orleans is a spiced, nutty coffee made with chicory." }

      There are two documents in the result set, both of which contain words resembling the search query. While the New Orleans Coffee document does have the word spiced in the description, the Pumpkin Spice Latte document doesn’t.

      Regardless, it was still returned by this query thanks to MongoDB’s use of stemming. MongoDB stripped the word spiced down to its root, spice, and matched it against the indexed terms, which were stemmed in the same way. Because of this, the words spice and spices in the Pumpkin Spice Latte document matched the search query successfully, even though you didn’t search for either of those words specifically.

      Now, suppose you’re particularly fond of espresso drinks. Try looking up documents with a two-word query, spiced espresso, to look for a spicy, espresso-based coffee.

      • db.recipes.find({ $text: { $search: "spiced espresso" } });

      The list of results this time is longer than before:

      Output

      { "_id" : ObjectId("61895d2787f246b334ece914"), "name" : "Maple Latte", "description" : "A wintertime classic made with espresso and steamed milk and sweetened with some maple syrup." }
      { "_id" : ObjectId("61895d2787f246b334ece913"), "name" : "Affogato", "description" : "An Italian sweet dessert coffee made with fresh-brewed espresso and vanilla ice cream." }
      { "_id" : ObjectId("61895d2787f246b334ece911"), "name" : "Cafecito", "description" : "A sweet and rich Cuban hot coffee made by topping an espresso shot with a thick sugar cream foam." }
      { "_id" : ObjectId("61895d2787f246b334ece915"), "name" : "Pumpkin Spice Latte", "description" : "It wouldn't be autumn without pumpkin spice lattes made with espresso, steamed milk, cinnamon spices, and pumpkin puree." }
      { "_id" : ObjectId("61895d2787f246b334ece912"), "name" : "New Orleans Coffee", "description" : "Cafe Noir from New Orleans is a spiced, nutty coffee made with chicory." }

      When using multiple words in a search query, MongoDB performs a logical OR operation, so a document only has to match one part of the expression to be included in the result set. The results contain documents containing both spiced and espresso or either term alone. Notice that words do not necessarily need to appear near each other as long as they appear in the document somewhere.
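
      One way to picture the logical OR is as a union over a term-to-documents lookup table. The following plain-JavaScript sketch uses a toy table hand-built from the sample recipes. It is a conceptual model only, not a description of how MongoDB stores its text index, and the names invertedIndex and searchAny are hypothetical.

```javascript
// Toy lookup table: stemmed term -> names of the documents containing it.
// The entries are hand-built from the sample recipes for illustration only.
const invertedIndex = new Map([
  ["spice", ["New Orleans Coffee", "Pumpkin Spice Latte"]],
  ["espresso", ["Cafecito", "Affogato", "Maple Latte", "Pumpkin Spice Latte"]],
]);

// OR semantics: a document matches if it is listed under ANY search term.
function searchAny(stems) {
  const matches = new Set();
  for (const stem of stems) {
    for (const name of invertedIndex.get(stem) || []) {
      matches.add(name); // the Set removes duplicates such as Pumpkin Spice Latte
    }
  }
  return [...matches];
}

console.log(searchAny(["spice"]).length);             // 2
console.log(searchAny(["spice", "espresso"]).length); // 5
```

      Adding a second term can only grow the result set under OR semantics, which is why the two-word query returned more documents than the one-word query.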

      Note: If you try to execute any full-text search query on a collection for which there is no text index defined, MongoDB will return an error message instead:

      Error message

      Error: error: { "ok" : 0, "errmsg" : "text index required for $text query", "code" : 27, "codeName" : "IndexNotFound" }

      In this step, you learned how to use one or multiple words as a text search query, how MongoDB joins multiple words with a logical OR operation, and how MongoDB performs stemming. Next, you’ll use a complete phrase in a text search query and begin using exclusions to narrow down your search results further.

      Step 4 — Searching for Full Phrases and Using Exclusions

      Looking up individual words might return too many results, or the results may not be precise enough. In this step, you’ll use phrase search and exclusions to control search results more precisely.

      Suppose you have a sweet tooth, it’s hot outside, and coffee topped with ice cream sounds like a nice treat. Try finding an ice cream coffee using the basic search query as outlined previously:

      • db.recipes.find({ $text: { $search: "ice cream" } });

      The database will return two coffee recipes:

      Output

      { "_id" : ObjectId("61895d2787f246b334ece913"), "name" : "Affogato", "description" : "An Italian sweet dessert coffee made with fresh-brewed espresso and vanilla ice cream." }
      { "_id" : ObjectId("61895d2787f246b334ece911"), "name" : "Cafecito", "description" : "A sweet and rich Cuban hot coffee made by topping an espresso shot with a thick sugar cream foam." }

      While the Affogato document matches your expectations, Cafecito isn’t made with ice cream. The search engine, using the logical OR operation, accepted the second result just because the word cream appears in the description.

      To tell MongoDB that you are looking for ice cream as a full phrase and not two separate words, use the following query:

      • db.recipes.find({ $text: { $search: "\"ice cream\"" } });

      Notice the backslashes preceding each of the double quotes surrounding the phrase: \"ice cream\". The search query you’re executing is "ice cream", with double quotes denoting a phrase that should be matched exactly. The backslashes (\) escape the inner double quotes so that the shell treats them as part of the $search value rather than as the end of the string.

      This time, MongoDB returns a single result:

      Output

      { "_id" : ObjectId("61895d2787f246b334ece913"), "name" : "Affogato", "description" : "An Italian sweet dessert coffee made with fresh-brewed espresso and vanilla ice cream." }

      This document matches the search term exactly, and neither cream nor ice alone would be enough to count as a match.
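
      Because the MongoDB shell evaluates JavaScript, the escaping applies to the string literal, not to the search query itself. The short sketch below, runnable in any JavaScript environment, shows that an escaped double-quoted literal and a single-quoted literal produce the same value. The variable names are illustrative only.

```javascript
// Both literals produce the same 11-character string,
// which starts and ends with a double-quote character.
const escaped = "\"ice cream\"";
const singleQuoted = '"ice cream"';

console.log(escaped === singleQuoted); // true
console.log(escaped.length);           // 11

// Either form could serve as the $search value in the filter document.
const filter = { $text: { $search: singleQuoted } };
console.log(filter.$text.$search === escaped); // true
```

      Whichever form you choose, the value that reaches MongoDB contains literal double quotes around the phrase.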

      Another useful full-text search feature is the exclusion modifier. To illustrate how this works, first run the following query to get a list of all the coffee drinks in the collection based on espresso:

      • db.recipes.find({ $text: { $search: "espresso" } });

      This query returns four documents:

      Output

      { "_id" : ObjectId("61895d2787f246b334ece914"), "name" : "Maple Latte", "description" : "A wintertime classic made with espresso and steamed milk and sweetened with some maple syrup." }
      { "_id" : ObjectId("61895d2787f246b334ece913"), "name" : "Affogato", "description" : "An Italian sweet dessert coffee made with fresh-brewed espresso and vanilla ice cream." }
      { "_id" : ObjectId("61895d2787f246b334ece915"), "name" : "Pumpkin Spice Latte", "description" : "It wouldn't be autumn without pumpkin spice lattes made with espresso, steamed milk, cinnamon spices, and pumpkin puree." }
      { "_id" : ObjectId("61895d2787f246b334ece911"), "name" : "Cafecito", "description" : "A sweet and rich Cuban hot coffee made by topping an espresso shot with a thick sugar cream foam." }

      Notice that two of these drinks are served with milk, but suppose you want a milk-free drink. This is a case where exclusions can come in handy. In a single query, you can join words that you want to appear in the results with those that you want to be excluded by prepending the word or phrase you want to exclude with a minus sign (-).

      As an example, say you run the following query to look up espresso coffees that do not contain milk:

      • db.recipes.find({ $text: { $search: "espresso -milk" } });

      With this query, two documents will be excluded from the previously returned results:

      Output

      { "_id" : ObjectId("61895d2787f246b334ece913"), "name" : "Affogato", "description" : "An Italian sweet dessert coffee made with fresh-brewed espresso and vanilla ice cream." }
      { "_id" : ObjectId("61895d2787f246b334ece911"), "name" : "Cafecito", "description" : "A sweet and rich Cuban hot coffee made by topping an espresso shot with a thick sugar cream foam." }

      You can also exclude full phrases. To search for coffees without ice cream, you could include -"ice cream" in your search query. Again, you’d need to escape the double quotes with backslashes, like this:

      • db.recipes.find({ $text: { $search: "espresso -\"ice cream\"" } });

      Output

      { "_id" : ObjectId("61d48c31a285f8250c8dd5e6"), "name" : "Maple Latte", "description" : "A wintertime classic made with espresso and steamed milk and sweetened with some maple syrup." }
      { "_id" : ObjectId("61d48c31a285f8250c8dd5e7"), "name" : "Pumpkin Spice Latte", "description" : "It wouldn't be autumn without pumpkin spice lattes made with espresso, steamed milk, cinnamon spices, and pumpkin puree." }
      { "_id" : ObjectId("61d48c31a285f8250c8dd5e3"), "name" : "Cafecito", "description" : "A sweet and rich Cuban hot coffee made by topping an espresso shot with a thick sugar cream foam." }
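
      When search strings are assembled in application code, it can help to build them from parts instead of hand-escaping quotes. The following is a minimal sketch of such a builder in plain JavaScript. The function buildSearch and its option names are hypothetical, not a MongoDB API, and the sketch assumes the terms themselves contain no double quotes.

```javascript
// Hypothetical helper (not a MongoDB API): assemble a $search string from
// words and phrases to include, and words and phrases to exclude.
function buildSearch({ terms = [], phrases = [], notTerms = [], notPhrases = [] }) {
  const quote = (phrase) => `"${phrase}"`; // phrases are wrapped in double quotes
  return [
    ...terms,
    ...phrases.map(quote),
    ...notTerms.map((term) => `-${term}`),             // a leading minus excludes a word
    ...notPhrases.map((phrase) => `-${quote(phrase)}`), // or a whole phrase
  ].join(" ");
}

console.log(buildSearch({ terms: ["espresso"], notTerms: ["milk"] }));
// espresso -milk

console.log(buildSearch({ terms: ["espresso"], notPhrases: ["ice cream"] }));
// espresso -"ice cream"
```

      The returned string could then be used as the $search value, for example { $text: { $search: buildSearch({ terms: ["espresso"], notTerms: ["milk"] }) } }.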

      Now that you’ve learned how to filter documents based on a phrase consisting of multiple words and how to exclude certain words and phrases from search results, you can acquaint yourself with MongoDB’s full-text search scoring.

      Step 5 — Scoring the Results and Sorting By Score

      When a query, especially a complex one, returns multiple results, some documents are likely to be a better match than others. For example, when you look for spiced espresso drinks, those that are both spiced and espresso-based are more fitting than those without spices or not using espresso as the base.

      Full-text search engines typically assign a relevance score to the search results, indicating how well they match the search query. MongoDB also does this, but the search relevance is not visible by default.

      Search once again for spiced espresso, but this time have MongoDB also return each result’s search relevance score. To do this, you could add a projection after the query filter document:

      • db.recipes.find(
      • { $text: { $search: "spiced espresso" } },
      • { score: { $meta: "textScore" } }
      • )

      The projection { score: { $meta: "textScore" } } uses the $meta operator, a special kind of projection that returns specific metadata from returned documents. This example returns the documents’ textScore metadata, a built-in feature of MongoDB’s full-text search engine that contains the search relevance score.

      After executing the query, the returned documents will include a new field named score, as was specified in the projection document:

      Output

      { "_id" : ObjectId("61895d2787f246b334ece913"), "name" : "Affogato", "description" : "An Italian sweet dessert coffee made with fresh-brewed espresso and vanilla ice cream.", "score" : 0.5454545454545454 }
      { "_id" : ObjectId("61895d2787f246b334ece911"), "name" : "Cafecito", "description" : "A sweet and rich Cuban hot coffee made by topping an espresso shot with a thick sugar cream foam.", "score" : 0.5384615384615384 }
      { "_id" : ObjectId("61895d2787f246b334ece914"), "name" : "Maple Latte", "description" : "A wintertime classic made with espresso and steamed milk and sweetened with some maple syrup.", "score" : 0.55 }
      { "_id" : ObjectId("61895d2787f246b334ece912"), "name" : "New Orleans Coffee", "description" : "Cafe Noir from New Orleans is a spiced, nutty coffee made with chicory.", "score" : 0.5454545454545454 }
      { "_id" : ObjectId("61895d2787f246b334ece915"), "name" : "Pumpkin Spice Latte", "description" : "It wouldn't be autumn without pumpkin spice lattes made with espresso, steamed milk, cinnamon spices, and pumpkin puree.", "score" : 2.0705128205128203 }

      Notice how much higher the score is for Pumpkin Spice Latte, the only coffee drink that contains both the words spiced and espresso. According to MongoDB’s relevance score, it’s the most relevant document for that query. However, by default, the results are not returned in order of relevance.

      To change that, you could add a sort() clause to the query, like this:

      • db.recipes.find(
      • { $text: { $search: "spiced espresso" } },
      • { score: { $meta: "textScore" } }
      • ).sort(
      • { score: { $meta: "textScore" } }
      • );

      The syntax for the sorting document is the same as that of the projection. Now, the list of documents is the same, but their order is different:

      Output

      { "_id" : ObjectId("61895d2787f246b334ece915"), "name" : "Pumpkin Spice Latte", "description" : "It wouldn't be autumn without pumpkin spice lattes made with espresso, steamed milk, cinnamon spices, and pumpkin puree.", "score" : 2.0705128205128203 }
      { "_id" : ObjectId("61895d2787f246b334ece914"), "name" : "Maple Latte", "description" : "A wintertime classic made with espresso and steamed milk and sweetened with some maple syrup.", "score" : 0.55 }
      { "_id" : ObjectId("61895d2787f246b334ece913"), "name" : "Affogato", "description" : "An Italian sweet dessert coffee made with fresh-brewed espresso and vanilla ice cream.", "score" : 0.5454545454545454 }
      { "_id" : ObjectId("61895d2787f246b334ece912"), "name" : "New Orleans Coffee", "description" : "Cafe Noir from New Orleans is a spiced, nutty coffee made with chicory.", "score" : 0.5454545454545454 }
      { "_id" : ObjectId("61895d2787f246b334ece911"), "name" : "Cafecito", "description" : "A sweet and rich Cuban hot coffee made by topping an espresso shot with a thick sugar cream foam.", "score" : 0.5384615384615384 }

      The Pumpkin Spice Latte document appears as the first result since it has the highest relevance score.

      Sorting results according to their relevance score can be helpful. This is especially true with queries containing multiple words, where the most fitting documents will usually contain multiple search terms while the less relevant documents might contain only one.
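
      To illustrate what ordering by relevance amounts to, the following plain-JavaScript sketch sorts the scores from the output above on the client side, highest first. In practice the database does this work when you call sort(), so this is only a model of the ordering. The variable names results and ranked are illustrative.

```javascript
// Names and scores copied from the spiced espresso query output above.
const results = [
  { name: "Affogato", score: 0.5454545454545454 },
  { name: "Cafecito", score: 0.5384615384615384 },
  { name: "Maple Latte", score: 0.55 },
  { name: "New Orleans Coffee", score: 0.5454545454545454 },
  { name: "Pumpkin Spice Latte", score: 2.0705128205128203 },
];

// Copy the array, then sort so that the highest score comes first.
const ranked = [...results].sort((a, b) => b.score - a.score);

console.log(ranked[0].name); // Pumpkin Spice Latte
console.log(ranked[1].name); // Maple Latte
```

      Documents with identical scores, such as Affogato and New Orleans Coffee here, have no meaningful order relative to each other based on relevance alone.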

      Conclusion

      By following this tutorial, you’ve acquainted yourself with MongoDB’s full-text search features. You created a text index and wrote text search queries using single and multiple words, full phrases, and exclusions. You’ve also assessed the relevance scores for returned documents and sorted the search results to show the most relevant results first. While MongoDB’s full-text search features may not be as robust as those of some dedicated search engines, they are capable enough for many use cases.

      Note that there are more search query modifiers, such as case and diacritic sensitivity, as well as support for multiple languages within a single text index. These can be used in more robust scenarios to support text search applications. For more information on MongoDB’s full-text search features and how they can be used, we encourage you to check out the official MongoDB documentation.


