      An Introduction to Document-Oriented Databases


      Introduction

      Although they were first invented decades ago, computer-based databases have become ubiquitous on today’s internet. More and more commonly, websites and applications involve collecting, storing, and retrieving data from a database. For many years the database landscape was dominated by relational databases, which organize data in tables made up of rows and columns. To break free from the rigid structure imposed by the relational model, though, a number of different database types have emerged in recent years.

      These new database models are jointly referred to as NoSQL databases, as they usually do not use Structured Query Language — also known as SQL — which relational databases typically employ to manage and query data. NoSQL databases offer a high level of scalability as well as flexibility in terms of data structure. These features make NoSQL databases useful for handling large volumes of data and fast-paced, agile development.

      This conceptual article outlines the key concepts related to document databases as well as the benefits of using them. Examples used in this article reference MongoDB, a widely used document-oriented database, but most of the concepts highlighted here apply to other document databases as well.

      What is a Document Database?

      Breaking free from thinking about databases as consisting of rows and columns, as is the case in a table within a relational database, document databases store data as documents. You might think of a document as a self-contained data entry containing everything needed to understand its meaning, similar to documents used in the real world.

      The following is an example of a document that might appear in a document database like MongoDB. This sample document represents a company contact card, describing an employee called Sammy:

      Sammy’s contact card document

      {
          "_id": "sammyshark",
          "firstName": "Sammy",
          "lastName": "Shark",
          "email": "sammy.shark@digitalocean.com",
          "department": "Finance"
      }
      

      Notice that the document is written as a JSON object. JSON is a human-readable data format that has become quite popular in recent years. While many different formats can be used to represent data within a document database, such as XML or YAML, JSON is one of the most common choices. For example, MongoDB adopted JSON as the primary data format to define and manage data.

      All data in JSON documents are represented as field-and-value pairs that take the form of field: value. In the previous example, the first line shows an _id field with the value sammyshark. The example also includes fields for the employee’s first and last names, their email address, as well as what department they work in.

      Field names allow you to understand what kind of data is held within a document with just a glance. Documents in document databases are self-describing, which means they contain both the data values as well as the information on what kind of data is being stored. When retrieving a document from the database, you always get the whole picture.
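
      As a small illustration of what self-describing means in practice, the following Python sketch parses Sammy’s contact card with the standard library’s json module. It does not talk to MongoDB; the variable names (raw, contact, field_names) are chosen here only for the example.

```python
import json

# Sammy's contact card, copied from the example above as JSON text.
raw = """
{
    "_id": "sammyshark",
    "firstName": "Sammy",
    "lastName": "Shark",
    "email": "sammy.shark@digitalocean.com",
    "department": "Finance"
}
"""

# json.loads turns the JSON text into a dict of field-and-value pairs.
contact = json.loads(raw)

# The field names travel with the values, so the document describes itself.
field_names = list(contact.keys())
for field, value in contact.items():
    print(f"{field}: {value}")
```

      No separate table definition is needed to interpret the result, because each value arrives labeled with its field name.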

      The following is another sample document representing a colleague of Sammy’s named Tom, who works in multiple departments and also uses a middle name:

      Tom’s contact card document

      {
          "_id": "tomjohnson",
          "firstName": "Tom",
          "middleName": "William",
          "lastName": "Johnson",
          "email": "tom.johnson@digitalocean.com",
          "department": ["Finance", "Accounting"]
      }
      

      This second document has a few differences from the first example. For instance, it adds a new field called middleName. Also, this document’s department field stores not a single value, but an array of two values: "Finance" and "Accounting".

      Because these documents hold different fields of data, they can be said to have different schemas. A database’s schema is its formal structure, which outlines what kind of data it can hold. In the case of documents, their schemas are reflected in their field names and what kinds of values those fields represent.

      In a relational database, you’d be unable to store both of these example contact cards in the same table, as they differ in structure. You would have to adapt the database schema both to allow storing multiple departments as well as middle names, and you would have to provide a middle name for Sammy or else fill the column for that row with a NULL value. This is not the case with document databases, which offer you the freedom to save multiple documents with different schemas together with no changes to the database itself.

      In document databases, documents are not only self-describing; their schema is also dynamic, which means that you don’t have to define it before you start saving data. Fields can differ between documents in the same database, and you can modify a document’s structure at will, adding or removing fields as you go. Documents can also be nested, meaning that a field within one document can have a value consisting of another document, which makes it possible to store complex data within a single document entry.
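
      The dynamic schema can be sketched in plain Python, with dictionaries standing in for documents and a list standing in for the place they are stored. This is only an analogy, not MongoDB code, and the office field added at the end is a hypothetical value invented for the example.

```python
# Sammy's and Tom's contact cards from the examples above.
sammy = {
    "_id": "sammyshark",
    "firstName": "Sammy",
    "lastName": "Shark",
    "email": "sammy.shark@digitalocean.com",
    "department": "Finance",
}
tom = {
    "_id": "tomjohnson",
    "firstName": "Tom",
    "middleName": "William",
    "lastName": "Johnson",
    "email": "tom.johnson@digitalocean.com",
    "department": ["Finance", "Accounting"],
}

# Both documents are stored together even though their fields differ.
contacts = [sammy, tom]

# Fields that appear in Tom's document but not in Sammy's.
extra_fields = set(tom) - set(sammy)

# The same field holds a string in one document and an array in the other.
department_types = [type(doc["department"]).__name__ for doc in contacts]

# Adding a field to one document (hypothetical value) leaves the other untouched.
sammy["office"] = "Building A"

print(extra_fields, department_types)
```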

      Let’s imagine the contact card must store information about social media accounts the employee uses and add them as nested objects to the document:

      Tom’s contact card document with social media accounts information attached

      {
          "_id": "tomjohnson",
          "firstName": "Tom",
          "middleName": "William",
          "lastName": "Johnson",
          "email": "tom.johnson@digitalocean.com",
          "department": ["Finance", "Accounting"],
          "socialMediaAccounts": [
              {
                  "type": "facebook",
                  "username": "tom_william_johnson_23"
              },
              {
                  "type": "twitter",
                  "username": "@tomwilliamjohnson23"
              }
          ]
      }
      

      A new field called socialMediaAccounts appears in the document, but instead of a single value, it refers to an array of nested objects describing individual social media accounts. Each of these accounts could be a document on its own, but here they’re stored directly within the contact card. Once again, there is no need to change the database structure to accommodate this requirement. You can immediately save the new document to the database.
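
      To show how an application might read such nested data, here is a minimal Python sketch that treats Tom’s document as a dictionary. It is an illustration only; the third account appended at the end is hypothetical and not part of the original example.

```python
# Tom's contact card with nested social media accounts, as shown above.
tom = {
    "_id": "tomjohnson",
    "firstName": "Tom",
    "middleName": "William",
    "lastName": "Johnson",
    "email": "tom.johnson@digitalocean.com",
    "department": ["Finance", "Accounting"],
    "socialMediaAccounts": [
        {"type": "facebook", "username": "tom_william_johnson_23"},
        {"type": "twitter", "username": "@tomwilliamjohnson23"},
    ],
}

# Walk the nested array and build a lookup from account type to username.
usernames = {
    account["type"]: account["username"]
    for account in tom["socialMediaAccounts"]
}

# Appending another nested object (hypothetical account) needs no schema change.
tom["socialMediaAccounts"].append({"type": "linkedin", "username": "tom-johnson"})

print(usernames["twitter"])
```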

      Note: In MongoDB, it’s customary to name fields and collections using a camelCase notation, with no spaces between words, the first word written entirely in lowercase, and any additional words having their first letters capitalized. That said, you can also use different notations such as snake_case, in which words are all written in lowercase and separated with underscores. Whichever notation you choose, it’s considered best practice to use it consistently across the whole database.

      All these attributes make it intuitive to work with document databases from the developer’s perspective. The database facilitates storing actual objects describing data within the application, encouraging experimentation and allowing great flexibility when reshaping data as the software grows and evolves.

      Benefits of Document Databases

      While document-oriented databases may not be the right choice for every use case, there are many benefits of choosing one over a relational database. A few of the most important benefits are:

      • Flexibility and adaptability: with a high level of control over the data structure, document databases enable experimentation and adaptation to new emerging requirements. New fields can be added right away and existing ones can be changed any time. It’s up to the developer to decide whether old documents must be amended or the change can be implemented only going forward.

      • Ability to manage structured and unstructured data: as mentioned previously, relational databases are well suited for storing data that conforms to a rigid structure. Document databases can be used to handle structured data as well, but they’re also quite useful for storing unstructured data where necessary. You can imagine structured data as the kind of information you would easily represent in a spreadsheet with rows and columns, whereas unstructured data is everything not as straightforward to frame. Examples of unstructured data are rich social media posts with human-generated texts and multimedia, server logs that don’t follow a unified format, or data coming from a multitude of different sensors in smart homes.

      • Scalability by design: relational databases are often write-constrained, and increasing their performance typically requires you to scale vertically (meaning you must migrate their data to more powerful and performant database servers). Conversely, document databases are designed as distributed systems that instead allow you to scale horizontally (meaning that you split a single database up across multiple servers). Because documents are independent units containing both data and schema, it’s relatively trivial to distribute them across server nodes. This makes it possible to store large amounts of data with less operational complexity.

      In real-world applications, both document databases and other NoSQL and relational databases are often used together, each responsible for what it’s best suited for. This paradigm of mixing various types of databases is known as polyglot persistence.

      Grouping Documents Into Collections

      While document databases allow great flexibility in how the documents are structured, having some means of organizing data into categories sharing similar characteristics is crucial for ensuring that a database is healthy and manageable.

      Imagine a database as an individual cabinet in a company archive with many drawers. For example, one drawer might keep records of employment contracts, with another keeping agreements with business partners. While it is technically possible to put both kinds of documents into a single drawer, it would be difficult to browse the documents later on.

      In a document database, such drawers are often called collections, logically similar to tables in relational databases. The role of a collection is to group together documents that share a similar logical function, even if individual documents may differ slightly in their schema. For instance, say you have one employment contract for a fixed term and another that describes a contractor’s additional benefits. Both documents are employment contracts and, as such, it could make sense to group them into a single collection:

      [Figure: Document collection]

      Note: While it’s a popular approach, not all document databases use the concept of collections to organize documents together. Some database systems use tags or tree-like hierarchies, others store documents directly within a database with no further subdivisions. MongoDB is one of the popular document-oriented databases that use collections for document organization.

      Having similar characteristics between documents within a collection also allows you to build indexes in order to allow for more performant retrieval of documents based on queries related to certain fields. Indexes are special data structures that store a portion of a collection’s data in a way that’s faster to traverse and filter.

      As an example, you might have a collection of documents in a database that all share a similar field. Because each document shares the same field, it’s likely you would often use that field when running queries. Without indexes, any query asking the database to retrieve a particular document requires a collection scan — browsing all documents within a collection one by one to find the requested match. By creating an index, however, the database only needs to browse through indexed fields, thereby improving query performance.
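
      The difference between a collection scan and an indexed lookup can be sketched in a few lines of Python. This is a conceptual model only: a dictionary stands in for the index here, which is not how a real database engine stores its indexes, and the third contact is a hypothetical document added for the example.

```python
from collections import defaultdict

# A small stand-in collection; the third entry is hypothetical.
contacts = [
    {"_id": "sammyshark", "lastName": "Shark"},
    {"_id": "tomjohnson", "lastName": "Johnson"},
    {"_id": "alexdoe", "lastName": "Doe"},
]


def collection_scan(docs, field, value):
    """Check every document in turn; return the matches and how many were examined."""
    examined = 0
    matches = []
    for doc in docs:
        examined += 1
        if doc.get(field) == value:
            matches.append(doc)
    return matches, examined


def build_index(docs, field):
    """Map each value of `field` to the positions of the documents that hold it."""
    index = defaultdict(list)
    for position, doc in enumerate(docs):
        if field in doc:
            index[doc[field]].append(position)
    return index


def indexed_lookup(docs, index, value):
    """Jump straight to the matching documents without visiting the others."""
    return [docs[position] for position in index.get(value, [])]


last_name_index = build_index(contacts, "lastName")
scan_matches, examined = collection_scan(contacts, "lastName", "Johnson")
index_matches = indexed_lookup(contacts, last_name_index, "Johnson")
print(examined, index_matches)
```

      Both approaches return the same document, but the scan has to examine every entry in the collection while the lookup goes directly to the matching position. The trade-off is that the index must be built and kept up to date as documents change.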

      Data Types and Schema Validation

      While we mentioned that document-oriented databases can store documents in different formats, such as XML, YAML or JSON, these are often further extended with additional traits that are specific to a given database system, such as additional data types or structure validation features.

      For example, MongoDB internally uses a binary format called BSON (short for Binary JSON) instead of pure JSON. This not only allows for better performance, but it also extends the format with data types that JSON does not support natively. Thanks to this, we can reliably store different kinds of data in MongoDB documents without being restricted to standard JSON types and use filtering, sorting, and aggregation features specific to individual data types.

      The following sample document uses several different data types supported by MongoDB:

      {
          "_id": ObjectId("5a934e000102030405000000"),
          "code": NumberLong(2090845886852),
          "image": BinData(0, "TGVhcm5pbmcgTW9uZ29EQg=="),
          "lastPurchased": ISODate("2021-01-19T06:01:17.171Z"),
          "name": "Document database sticker",
          "price": NumberDecimal("13.23"),
          "quantity": 317,
          "tags": [
              "stickers",
              "accessories"
          ]
      }
      

      Notice that some of these data types are not typical of JSON. Decimal numbers with exact precision and dates, for example, are represented as objects such as NumberDecimal and ISODate. This ensures that these fields will always be interpreted properly and not mistakenly cast to another similar data type, like a decimal number being cast to a regular double.
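
      Why an exact decimal type matters can be shown with Python’s standard decimal module. This is an analogy rather than MongoDB’s own decimal type, and the stock_value calculation is an assumption made for the example, using the price and quantity from the sample document.

```python
from decimal import Decimal

# Binary floating point (a "regular double") cannot represent most
# decimal fractions exactly, so small errors creep into arithmetic.
float_total = 0.1 + 0.2

# An exact decimal type keeps the digits that were written.
decimal_total = Decimal("0.1") + Decimal("0.2")

# Price and quantity taken from the sample document above.
price = Decimal("13.23")
quantity = 317
stock_value = price * quantity

print(float_total == 0.3)
print(decimal_total == Decimal("0.3"))
print(stock_value)
```

      The first comparison fails and the second succeeds, which is the kind of silent discrepancy that storing prices as an exact decimal type is meant to avoid.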

      This variety of supported data types, combined with schema validation features, makes it possible to implement a set of rules and validity requirements that give your document database structure. This allows you not only to model unstructured data, but also to create collections of documents following more rigid and precise requirements.
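
      The idea behind schema validation can be sketched as a small Python function that checks a document against a set of rules before it is accepted. This is not MongoDB’s validation feature, which is configured on the collection itself; the RULES mapping, the validate function, and the rejected document are all hypothetical and exist only to illustrate the concept.

```python
# Hypothetical rules: each required field mapped to the type it must hold.
RULES = {
    "name": str,
    "quantity": int,
    "tags": list,
}


def validate(document, rules=RULES):
    """Return a list of problems found; an empty list means the document passes."""
    problems = []
    for field, expected_type in rules.items():
        if field not in document:
            problems.append(f"missing field: {field}")
        elif not isinstance(document[field], expected_type):
            problems.append(f"{field} should be {expected_type.__name__}")
    return problems


# Fields taken from the sample document above.
accepted = {
    "name": "Document database sticker",
    "quantity": 317,
    "tags": ["stickers", "accessories"],
}

# A hypothetical document that breaks two of the rules.
rejected = {"name": "Document database sticker", "quantity": "many"}

print(validate(accepted))
print(validate(rejected))
```

      Fields that the rules do not mention are simply ignored, so a collection validated this way can stay flexible where it needs to while enforcing the parts of the structure that matter.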

      Conclusion

      Thanks to their flexibility, scalability, and ease of use, document databases are becoming an increasingly popular choice of database for application developers. They are well suited to different applications and work well on their own or as a part of bigger, multi-database ecosystems. The wide array of document-oriented databases has distinct advantages and use cases, making it possible to choose the best database for any given task.

      You can learn more about document-oriented databases and other NoSQL databases from DigitalOcean’s community articles on that topic.

      To learn more about MongoDB in particular, we encourage you to follow this tutorial series covering many topics on using and administering MongoDB and to check the official MongoDB documentation, a vast source of knowledge about MongoDB as well as document databases in general.




      Understanding MongoDB: Advantages of a Document-Oriented NoSQL Database


      Introduction

      Data has become a driving force of technology in recent years, as modern applications and websites need to manage an ever-increasing amount of data. Traditionally, database management systems organize data based on the relational model. As organizations’ data needs have changed, however, a number of new types of databases have been developed.

      These new types of databases often don’t rely on the traditional table structure provided by relational databases, and can thus allow for far more flexibility than the rigid structure imposed by relational databases. Additionally, they typically don’t use Structured Query Language (SQL), which is employed by most relational database systems to allow users to define and interact with data. This has led many of these new non-relational databases to be referred to generally as NoSQL databases.

      First released in 2009, MongoDB — also known as Mongo — is a document-oriented NoSQL database used in many modern web applications. This conceptual article provides a high-level overview of the features that set MongoDB apart from other database management systems and make it a valuable tool across many different use cases.

      A Brief Overview of MongoDB

      As mentioned in the introduction, MongoDB is considered to be a NoSQL database since it doesn’t depend on the relational model. Every database management system is designed around a certain type of data model that defines how the data within the database will be organized. The relational model involves storing data in tables — more formally known as relations — made up of rows and columns.

      MongoDB, on the other hand, stores its data records in structures known as documents. Mongo allows you to group multiple documents into a structure known as a collection, which can be further grouped into separate databases.

      A document is written in BSON, a binary representation of JSON. Like objects in JSON, MongoDB documents begin and end with curly brackets ({ and }), and contain a number of field-and-value pairs which typically take the form of field: value. A field’s value can be any one of the data types used in BSON, or even other structures like documents and arrays.

      Security

      MongoDB comes installed with a number of features that can help to prevent data loss as well as access by unauthorized users. Some of these features can be found on other database management systems. For instance, Mongo, like many modern DBMSs, allows you to encrypt data as it traverses a network (sometimes called data in transit). It does this by supporting, and optionally requiring, connections to the database made with Transport Layer Security (TLS), a cryptographic protocol that serves as a successor to Secure Sockets Layer (SSL).

      Also like other DBMSs, Mongo manages authorization — the practice of setting rules for a given user or group of users to define what actions they can perform and what resources they can access — through a computer security concept known as role-based access control, or RBAC. Whenever you create a MongoDB user, you have the option to provide them with one or more roles.

      A role defines what privileges a user has, including what actions they can perform on a given database, collection, set of collections, or cluster. For example, you can assign a user the readWrite role on any database, meaning that the user can read and modify the data held in any database over which you’ve granted them the readWrite role. Something that distinguishes MongoDB’s RBAC from that of other databases is that, in addition to its built-in roles, Mongo also allows you to define custom roles, giving you even more control over what resources users can access on your system.
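
      The logic of role-based access control can be modeled in a few lines of Python. This sketch is not how MongoDB implements or stores roles: the action sets are deliberately simplified, and the users, databases, and the auditor custom role are hypothetical names made up for the example.

```python
# Simplified mapping of role names to the actions each role allows.
ROLE_ACTIONS = {
    "read": {"find"},
    "readWrite": {"find", "insert", "update", "remove"},
}

# A custom role (hypothetical) is just another entry with its own actions.
ROLE_ACTIONS["auditor"] = {"find"}

# Hypothetical grants: user -> database -> roles held on that database.
GRANTS = {
    "sammy": {"inventory": {"readWrite"}, "reports": {"read"}},
    "tom": {"reports": {"auditor"}},
}


def is_allowed(user, database, action):
    """Allow the action only if one of the user's roles on that database permits it."""
    roles = GRANTS.get(user, {}).get(database, set())
    return any(action in ROLE_ACTIONS.get(role, set()) for role in roles)


print(is_allowed("sammy", "inventory", "insert"))
print(is_allowed("sammy", "reports", "insert"))
```

      Because privileges are attached to roles rather than to individual users, changing what a role permits updates every user who holds it.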

      Since the release of version 4.2, MongoDB supports client-side field level encryption. This involves encrypting certain fields within a document before the data gets written to the database. Any client or application that tries to read it later on must first present the correct encryption keys to be able to decrypt the data in these fields.

      To illustrate, say your database holds a document with the following fields and values:

      {
        "name" : "Sammy",
        "phone" : "555-555-1234",
        "creditcard" : "1234567890123456"
      }
      

      It could be dangerous to store sensitive information like this (namely, a person’s phone and credit card numbers) in a real-world application. Even if you’ve put limits on who can access the database, anyone who has privileges to access it could see and take advantage of your users’ sensitive information. When properly configured, though, these fields would look something like the following if they were written with client-side field level encryption:

      {
        "name" : "Sammy",
        "phone" : BinData(6,"quas+eG4chuolau6ahq=i8ahqui0otaek7phe+Miexoo"),
        "creditcard" : BinData(6,"rau0Teez=iju4As9Eeyiu+h4coht=ukae8ahFah4aRo=")
      }
      

      For a more thorough overview of MongoDB’s security features, along with some general strategies for keeping a Mongo database secure, we encourage you to check out our series on MongoDB Security: Best Practices to Keep Your Data Safe.

      Flexibility

      Another characteristic of MongoDB that has helped drive its adoption is the flexibility it provides when compared with more traditional database management systems. This flexibility is rooted in MongoDB’s document-based design, since collections in Mongo do not enforce a specific structure that every document within them must follow. This contrasts with the rigid structure imposed by tables in a relational database.

      Whenever you create a table in a relational database, you must explicitly define the set of columns the table will hold along with their data types. Following that, every row of data you add must conform to that specific structure. On the other hand, MongoDB documents in the same collection can have different fields, and even if they share a given field it can hold different data types in different documents.

      This rigidity imposed by the relational model isn’t necessarily a bad thing. In fact, it makes relational databases quite useful for storing data that neatly conforms to a predefined structure. But it can become limiting in cases where you need to store unstructured data — data that doesn’t easily fit into predefined data models or isn’t easily searchable by conventional tools.

      Examples of unstructured data include media content, like videos or photos, communications data, or text files. Sometimes, unstructured data is generalized as qualitative data: data that may be human readable but is difficult for computers to adequately parse. MongoDB’s versatile document-oriented design, however, makes it a great choice for storing and analyzing unstructured data as well as structured and semi-structured data.

      Another example of Mongo’s flexibility is how it offers multiple avenues for interacting with one’s data. For example, you can run the mongo shell, a JavaScript-based interface that comes installed with the MongoDB server, which allows you to interact with your data from the command line.

      Mongo also supports a number of official drivers that can help you connect a database to your application. Mongo provides these libraries for a variety of popular programming languages, including PHP, Java, JavaScript, and Python. These drivers also provide support for the data types found in their respective host languages, expanding on the BSON data types available by default.

      High Availability

      Any computer-based database system depends on its underlying hardware to function and serve the needs of an application or client. If the machine on which it’s running fails for any reason, the data held within the database won’t be accessible until the machine is back up and running. If a database management system is able to remain in operation for a higher than normal period of time, it’s said to be highly available.

      One way many databases remain highly available is through a practice known as replication. Replication involves synchronizing data across multiple different databases running on separate machines. This results in multiple copies of the same data and provides redundancy in case one of the database servers fails. This ensures that the synchronized data always remains available to the applications or clients that depend on it.

      In MongoDB, a group of servers that maintain the same data set through replication is referred to as a replica set. Each running instance of MongoDB that’s part of a given replica set is referred to as one of its members. Every replica set must have one primary member and at least one secondary member.

      One advantage that MongoDB’s replica sets have over replication implementations in some other database systems is Mongo’s automatic failover mechanism. In the event that the primary member becomes unavailable, an automated election process happens among the secondary members to choose a new primary.
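
      The general shape of a failover can be illustrated with a heavily simplified Python sketch. It is not MongoDB’s election protocol, which involves voting between members and other factors this example ignores; the member list, the last_applied counter, and the elect_primary function are all hypothetical. The sketch assumes only two rules: an election needs a majority of members to be reachable, and the most up-to-date secondary is preferred.

```python
# Hypothetical three-member replica set. "last_applied" stands in for how far
# each member has replicated; the primary (db1) has just become unreachable.
members = [
    {"host": "db1", "role": "primary", "reachable": False, "last_applied": 120},
    {"host": "db2", "role": "secondary", "reachable": True, "last_applied": 118},
    {"host": "db3", "role": "secondary", "reachable": True, "last_applied": 120},
]


def elect_primary(members):
    """Return the host chosen as new primary, or None if no election can succeed."""
    reachable = [m for m in members if m["reachable"]]
    # Assumption: more than half of the members must be able to take part.
    if len(reachable) * 2 <= len(members):
        return None
    candidates = [m for m in reachable if m["role"] == "secondary"]
    if not candidates:
        return None
    # Simplification: prefer the secondary that has replicated the most data.
    winner = max(candidates, key=lambda m: m["last_applied"])
    return winner["host"]


new_primary = elect_primary(members)
print(new_primary)
```

      In this model db3 takes over because it is reachable and has replicated as much data as the failed primary. If only one of the three members were reachable, no new primary would be chosen.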

      Scalability

      As a core component of modern applications, it’s important for a database to be able to respond to changes in the amount of work it must perform. After all, an application can see sudden surges in its number of users, or perhaps experience periods of particularly heavy workloads.

      Scalability refers to a computer system’s ability to handle an ever-growing amount of work, and the practice of increasing this capacity is called scaling. There are two ways one can scale a computer system:

      • Vertical scaling (also called scaling up) involves adding more computing resources to a given system, typically by increasing its storage capacity or memory
      • Horizontal scaling (also called scaling out) involves splitting the workload across multiple computing nodes which, all together, make up a single logical system

      To vertically scale a MongoDB database, one could back up its data and migrate it to another machine with more computing resources. This is generally the same procedure for vertically scaling any database management system, including relational databases. However, scaling up like this can have drawbacks. The cost of using larger and larger machines over time can become prohibitively expensive and, no matter how great it is, there is always an upper limit to how much data a single machine can store.

      Sharding is a strategy some administrators employ for scaling out a database. If you’d like a thorough explanation of sharding, we encourage you to read our conceptual article on Understanding Database Sharding. For the purposes of this article, though, understand that sharding is the process of breaking up a data set based on a given set of rules, and distributing the resulting pieces of data across multiple separate database nodes. A single node that holds part of a sharded cluster’s data set is known as a shard.

      Database management systems don’t always include sharding capabilities as a built-in feature, so oftentimes sharding is implemented at the application level. MongoDB, however, does include a built-in sharding feature which allows you to shard data at the collection level. As of version 3.6, every MongoDB shard must be deployed as a replica set to ensure that the shard’s data remains highly available.

      To shard data in Mongo, you must select one or more fields in a given collection’s documents to function as the shard key. MongoDB then takes the range of shard key values and divides it into non-overlapping ranges, known as chunks, and each chunk is assigned to a given shard.

      Following that, Mongo reads each document’s shard key value, determines what chunk the document belongs to, and then distributes the document to the appropriate shard. MongoDB actively monitors the number of chunks in each shard, and will attempt to migrate chunks from one shard to another to ensure that each holds a roughly equal number.
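
      The routing step described above (shard key value, then chunk, then shard) can be sketched in Python using the standard bisect module. This is a conceptual model and not MongoDB’s implementation: the split points, shard names, and the two extra documents are hypothetical, and _id is used as the shard key purely for convenience.

```python
import bisect

# Hypothetical split points dividing the shard key range into three chunks:
# keys below "h", keys from "h" up to (but not including) "p", and keys from "p" on.
SPLIT_POINTS = ["h", "p"]

# Hypothetical assignment of each chunk, in order, to a shard.
CHUNK_TO_SHARD = ["shardA", "shardB", "shardC"]


def chunk_for(shard_key_value):
    """Return the number of the chunk whose range contains the shard key value."""
    return bisect.bisect_right(SPLIT_POINTS, shard_key_value)


def shard_for(shard_key_value):
    """Return the shard that holds the chunk for this shard key value."""
    return CHUNK_TO_SHARD[chunk_for(shard_key_value)]


# The first two _id values come from the earlier examples; the rest are hypothetical.
documents = [
    {"_id": "sammyshark"},
    {"_id": "tomjohnson"},
    {"_id": "alexdoe"},
    {"_id": "jamiefox"},
]

# Route every document to a shard based on its shard key value.
placement = {doc["_id"]: shard_for(doc["_id"]) for doc in documents}
print(placement)
```

      Because the chunks do not overlap, every shard key value maps to exactly one chunk, so a query that includes the shard key can be sent to a single shard instead of all of them.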

      The main drawback of sharding is that it adds a degree of operational complexity to a database system. However, once you have a working MongoDB sharded cluster, the process of adding more shards to scale the system horizontally is fairly straightforward, and a properly configured replica set can be added as a shard with a single command. This makes MongoDB an appealing choice for applications that need to scale out quickly.

      Is MongoDB Right for My Application?

      Relational database management systems still see wider use than databases that employ a NoSQL model. With that said, though, MongoDB continues to gain ground thanks to the features described throughout this guide. In particular, it’s become a common choice of database for a number of use cases.

      For example, its scaling capabilities and high availability make it a popular database for e-commerce and gaming applications where the number of users being served can increase quickly and dramatically. Likewise, its flexible schema and ability to handle large amounts of unstructured data make it a great choice for content management applications which need to manage an ever-evolving catalog of assets, ranging from text, to video, images, and audio files. It has also seen strong adoption among mobile application developers, thanks again to its powerful scaling as well as its data analysis capabilities.

      When deciding whether you should use MongoDB in your next application, you should first ask yourself what the application’s specific data needs are. If your application will store data that rigidly adheres to a predefined structure, you may not get much additional value from Mongo’s schemaless design and you might be better off using a relational database.

      Then, weigh how much data you expect your application will need to store and use. MongoDB’s document-oriented design makes it a great choice for applications that need to store large amounts of unstructured data. Similarly, MongoDB’s scalability and high availability make it a perfect fit for applications that serve a large and ever-growing number of clients. However, these features could be excessive in cases that aren’t as data intensive.

      Conclusion

      By reading this article, you’ll have gained a better understanding of the features that set MongoDB apart from other database management systems. Although MongoDB is a powerful, flexible, and secure database management system that can be the right choice of database in certain use cases, it may not always be the best choice. While its document-based and schemaless design may not supplant the relational database model any time soon, Mongo’s rapid growth highlights its value as a tool worth understanding.

      For more information about MongoDB, we encourage you to check out DigitalOcean’s entire library of MongoDB content. Additionally, the official MongoDB documentation serves as a valuable resource of information on working with Mongo.


