MongoDB Introduction - Features and Use Case Examples

Oliver Wolf, find me on twitter
Building microservices for three years.

Executive Summary

MongoDB's primary purpose is to persistently store data for software applications (e.g. for web services). A newly inserted data record doesn't have to conform to a predefined schema, which means the record can contain any attributes, or none at all. The aim of omitting schema enforcement is to increase flexibility when requirements are undefined or changing. This flexibility comes with a trade-off: developers have to add more logic, such as validation, to the application itself.
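As a rough illustration of what "more logic in the application" can mean, the following sketch checks required attributes on the application side. The records, the field names and the `REQUIRED_FIELDS` rule are assumptions made up for this example; no database is involved.

```python
# Hypothetical application-level rule: every blog post needs these attributes.
REQUIRED_FIELDS = {"title", "body"}

def missing_fields(document: dict) -> set:
    """Return the required attributes that are absent from the document."""
    return REQUIRED_FIELDS - set(document)

# Two records with different attributes. A store without schema enforcement
# would accept both, so the application has to decide what is acceptable.
post_a = {"title": "Hello", "body": "First post", "tags": ["intro"]}
post_b = {"title": "Untitled draft"}  # no body, no tags

print(missing_fields(post_a))  # set()
print(missing_fields(post_b))  # {'body'}
```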

Another trade-off that can require additional logic in the application layer is that the query language for retrieving data is not as powerful as the query languages of traditional relational databases. In return, this limitation enables MongoDB to handle more data by distributing the work of processing and storing data across multiple servers.

MongoDB Features

Store, Read, Update and Delete JSON Documents

MongoDB stores data as JSON documents, and those documents are grouped into collections. A blog application built on MongoDB could, for example, have a collection named “posts” and a collection named “authors”. Each document belongs to exactly one collection.

To update or delete existing documents, you have to know the collection to which they belong. You also have to know the attribute values that identify the documents to be updated or deleted.
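The idea of identifying documents by attribute values can be sketched with a small in-memory stand-in for a “posts” collection. This is not MongoDB's API: the helper functions and the sample data are hypothetical and only mimic the behaviour described above. In the PyMongo driver, the methods `update_many` and `delete_many` take a filter document in a similar spirit.

```python
# In-memory stand-in for a "posts" collection (hypothetical data).
posts = [
    {"_id": 1, "author": "oliver", "title": "Intro", "draft": True},
    {"_id": 2, "author": "anna", "title": "Sharding", "draft": True},
    {"_id": 3, "author": "oliver", "title": "GridFS", "draft": False},
]

def matches(document, query):
    """True if every attribute in the query equals the document's value."""
    return all(document.get(key) == value for key, value in query.items())

def update_many(collection, query, changes):
    """Set the given attributes on every matching document; return the count."""
    hits = [doc for doc in collection if matches(doc, query)]
    for doc in hits:
        doc.update(changes)
    return len(hits)

def delete_many(collection, query):
    """Remove every matching document in place; return the count."""
    before = len(collection)
    collection[:] = [doc for doc in collection if not matches(doc, query)]
    return before - len(collection)

# Publish Oliver's drafts, then delete everything written by Anna.
updated = update_many(posts, {"author": "oliver", "draft": True}, {"draft": False})
deleted = delete_many(posts, {"author": "anna"})
print(updated, deleted, [doc["_id"] for doc in posts])  # 1 1 [1, 3]
```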

Store, Find, Delete Large Files

Applications often have to store binary files such as videos, images or PDFs. With MongoDB this binary data can also be stored in a document, but there is a limitation: a single document cannot be larger than 16 MB. Even so, MongoDB is an alternative to the file system for storing files, even when they are bigger than 16 MB.

To store bigger files, MongoDB comes with a subsystem called GridFS. GridFS will divide your big file into chunks which fit into a MongoDB document. When reading the file from GridFS, it will reassemble the file from these chunk documents.
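The chunking idea can be sketched in a few lines of plain Python. This is a simplified model, not the GridFS implementation: the attribute names `files_id`, `n` and `data` follow the layout of GridFS chunk documents, 255 KiB is the GridFS default chunk size, and the file id and the 600 KiB payload are invented for the example.

```python
CHUNK_SIZE = 255 * 1024  # GridFS default chunk size in bytes (255 KiB)

def split_into_chunks(file_id, data, chunk_size=CHUNK_SIZE):
    """Return one small document per chunk, numbered by its position n."""
    return [
        {"files_id": file_id, "n": index, "data": data[start:start + chunk_size]}
        for index, start in enumerate(range(0, len(data), chunk_size))
    ]

def reassemble(chunks):
    """Concatenate the chunk payloads in order of n."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)

video = bytes(600 * 1024)  # 600 KiB of zero bytes stands in for a real file
chunks = split_into_chunks("video-1", video)
print(len(chunks))                  # 3
print(reassemble(chunks) == video)  # True
```

Because each chunk is an ordinary document, it stays far below the document size limit regardless of how large the original file is.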

Aggregate JSON Documents

This feature takes the documents in a collection and performs a sequence of operations on the data before returning the result to the client.

Which operations are applied and in which order they are applied is defined by the client. This feature is also called “aggregation pipeline”. Using the aggregation pipeline, documents can be:

  • transformed by adding or removing attributes
  • filtered, so that non-matching documents are not part of the result
  • grouped into fewer documents, where the attribute values of the documents in each group can be accumulated in different ways (for example summed, averaged or counted)
  • enriched with data from other documents. This functionality is similar to the “left outer join” of traditional relational databases. It is done using the “$lookup” stage, which was introduced with MongoDB 3.2.
  • sorted
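The operations listed above could be combined into one pipeline as sketched below. The stage operators (`$match`, `$group`, `$lookup`, `$project`, `$sort`) are MongoDB's own, while the collection and attribute names (`authors`, `author_id`, `published`, `word_count`) are assumptions for a hypothetical blog. With the PyMongo driver, such a list would typically be passed to `collection.aggregate(pipeline)`; here it is only built and inspected, so no server is needed.

```python
# Each stage is a plain dict; the list order is the execution order.
pipeline = [
    # filter: keep only published posts
    {"$match": {"published": True}},
    # group: one result document per author, with accumulated values
    {"$group": {
        "_id": "$author_id",
        "post_count": {"$sum": 1},
        "avg_words": {"$avg": "$word_count"},
    }},
    # enrich: attach the matching document(s) from the "authors" collection
    {"$lookup": {
        "from": "authors",
        "localField": "_id",
        "foreignField": "_id",
        "as": "author",
    }},
    # transform: choose which attributes appear in the result
    {"$project": {"_id": 0, "author": 1, "post_count": 1, "avg_words": 1}},
    # sort: most active authors first
    {"$sort": {"post_count": -1}},
]

stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)  # ['$match', '$group', '$lookup', '$project', '$sort']
```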

For more sophisticated aggregations, MongoDB also offers the possibility to run MapReduce jobs. MapReduce jobs are usually more difficult to write and understand than aggregation pipelines. They are often also slower to execute than the aggregation pipeline.

MongoDB Use Cases

MongoDB as a Database for Scalable (Web) Applications

MongoDB is a good database choice for (web) applications with the following characteristics:

  • Scalability: The request volume to the application is high and/or is increasing steadily.
  • Scalability: The amount of data to store is high and/or is increasing steadily.
  • Availability: It must be possible to set up standby nodes so that the system still works when one or more servers crash.
  • Eventual consistency: When a write operation is confirmed by MongoDB, it is not required that reading clients get the updated/inserted data immediately. With MongoDB as a database it can take a short time (usually < 1 sec) until the data is up to date.
  • Consistency over availability: In some situations (network partitions and split-brain situations) a distributed system cannot ensure consistency and availability at the same time. In these situations MongoDB gives up availability to ensure consistency.

MongoDB as an Object Store for Scalable Web Applications

A common tactic for making web applications scalable is to use a shared-nothing architecture. This means that no application instance knows something that another instance doesn't know.

Imagine a blog where a user writes a post and uploads an image for this post. Let's say the upload request is handled by application instance one, which saves the image into its local file system. From now on, instance one knows something (the image) that is not known by instance two. When a user now opens the blog post, the request to show this post may be handled by the application instance that doesn't have the image. In this situation the user won't see the image.

Instead of writing the image into the local file system, the application could use MongoDB's feature for storing larger files (GridFS). This allows each application instance to fetch the required files (the image in this example) from the MongoDB server(s).

MongoDB as a Store for Log Events (Operational Intelligence)

Software systems running in production (e.g. web applications, web servers, databases) are usually configured to produce log messages. For example, a web server produces a log message/event for each HTTP request it handles. This message can contain multiple pieces of information. Most web servers, for example, record the time when an HTTP request arrived, the URL to which the request was made, the browser that the user used to show the page, and so on.

Traditionally, log messages are stored in a file on the same machine where the system runs. This makes analysing the data hard, especially when the system is distributed across multiple machines. To stay with the web server example, imagine you have a web server cluster in which each web server produces log messages. To analyse the log messages you must first gather the messages from each server in the cluster.

Instead of writing these log messages into different files, it is possible to store them in MongoDB. The horizontal scalability of MongoDB accommodates a growing number of log messages without running into hardware limitations. The fact that reads from MongoDB may be only eventually consistent doesn't matter, because this use case doesn't require strong consistency.

To handle a huge number of log messages arriving in parallel, MongoDB can be configured to care less about the durability of the data than it does by default. This doesn't mean that the log messages are not saved persistently. It just means that when a client inserts a record, MongoDB won't send an acknowledgement back to the client that the operation was successful. This setting is called the write concern and can be configured differently for each write operation.
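To make the per-operation setting concrete, the sketch below builds command documents shaped like MongoDB's `insert` database command, once with an acknowledged and once with an unacknowledged write concern. The helper `build_insert_command`, the collection name `logs` and the sample event are assumptions for illustration. The code only constructs dictionaries and does not talk to a server. In PyMongo the same choice is usually expressed with a `WriteConcern(w=0)` object rather than a raw command.

```python
def build_insert_command(collection, documents, acknowledged=True):
    """Build an insert command document with a per-operation write concern.

    w=1 asks the server to confirm the write, w=0 asks for no acknowledgement.
    """
    write_concern = {"w": 1} if acknowledged else {"w": 0}
    return {
        "insert": collection,
        "documents": documents,
        "writeConcern": write_concern,
    }

event = {"url": "/posts/1", "status": 200}  # hypothetical log event

# Log events: throughput matters more than a confirmation per message.
fast = build_insert_command("logs", [event], acknowledged=False)
# Another write in the same application can still ask for an acknowledgement.
safe = build_insert_command("logs", [event], acknowledged=True)

print(fast["writeConcern"], safe["writeConcern"])  # {'w': 0} {'w': 1}
```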

The aggregation features can then be used to analyse the log messages effectively.
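One simple analysis is counting requests per URL. The sketch below shows a two-stage pipeline for it, next to a plain-Python emulation that computes the same kind of result on a few invented events. The attribute names `url` and `status` and the function `emulate_hits_per_url` are assumptions; the emulation is not how MongoDB executes the pipeline.

```python
from collections import Counter

# Hypothetical log events as they might be stored, one document per request.
log_events = [
    {"url": "/posts/1", "status": 200},
    {"url": "/posts/2", "status": 404},
    {"url": "/posts/1", "status": 200},
]

# Pipeline sketch: group by URL, count the events, most requested first.
hits_per_url = [
    {"$group": {"_id": "$url", "hits": {"$sum": 1}}},
    {"$sort": {"hits": -1}},
]

def emulate_hits_per_url(events):
    """Plain-Python equivalent of the pipeline above, for illustration only."""
    counts = Counter(event["url"] for event in events)
    return [{"_id": url, "hits": hits} for url, hits in counts.most_common()]

print(emulate_hits_per_url(log_events))
# [{'_id': '/posts/1', 'hits': 2}, {'_id': '/posts/2', 'hits': 1}]
```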