Elasticsearch Introduction - Features and Use Case Examples

Oliver Wolf, find me on twitter
Building microservices for three years.

Executive Summary

Like many other NoSQL databases, Elasticsearch stores JSON documents. Elasticsearch is a scalable database optimized to perform fast and advanced searches on the stored data in near real time, and that's why it is also classified as a search server.

When it comes to free-text search, Elasticsearch does not merely look for a substring in a given document attribute. Instead, it analyzes the document attributes and checks whether the words from the search query occur in them, regardless of the word order or of other words in between. It also returns documents whose attributes do not contain all of the searched words, and it ranks the found documents by their relevance to the search query.
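The difference from a plain substring search can be illustrated with a small toy model. This is only a sketch of the idea, not how Elasticsearch works internally: the functions analyze and search are hypothetical names, and the score used here (the number of distinct query words found in the attribute) is a deliberately simple stand-in for the real relevance calculation.

```python
import re


def analyze(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def search(query, documents, field):
    """Return (score, document) pairs, best match first.

    Toy score: the number of distinct query words that occur in the field.
    Word order and words in between do not matter, and documents that
    contain only some of the query words are still returned.
    """
    query_terms = set(analyze(query))
    hits = []
    for doc in documents:
        field_terms = set(analyze(doc.get(field, "")))
        score = len(query_terms & field_terms)
        if score > 0:
            hits.append((score, doc))
    # Sort by score only; documents with equal scores keep their input order.
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


documents = [
    {"title": "Quick brown fox"},
    {"title": "The fox is quick and brown"},
    {"title": "Lazy dog"},
    {"title": "Brown bear"},
]

for score, doc in search("brown quick", documents, "title"):
    print(score, doc["title"])
```

A substring search for "brown quick" would find none of these titles, whereas the word-based search returns three of them and ranks the two titles that contain both words first.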

Beyond that, there are even more sophisticated search features, such as taking synonyms, word base forms, spelling variations, and typos into account.

Elasticsearch Features

Index a JSON Document

Before a JSON document can be found, it naturally has to be stored in Elasticsearch. In Elasticsearch terminology this action is called “indexing”. But this is only one meaning: the term “index” is used in two ways in this context, once as a verb and once as a noun.

As a noun, an index is a logical unit that groups JSON documents. Inserting documents into a specific index (indexing) affects how the documents are stored and distributed across the servers on which the Elasticsearch cluster runs.
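As a rough illustration, indexing a document over HTTP could look like the following sketch. It only builds the request and does not send it. The host, the index name "blog", the type name "post", and the helper build_index_request are assumptions made for this example. The path layout /{index}/{type}/{id} is the one used by Elasticsearch versions that have types; newer versions use the fixed path segment _doc in place of a type name.

```python
import json
import urllib.request


def build_index_request(base_url, index, doc_type, doc_id, document):
    """Build (but do not send) an HTTP PUT request that indexes one document.

    Assumed path layout: /{index}/{type}/{id}.
    """
    url = f"{base_url}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, index, type, and document.
request = build_index_request(
    "http://localhost:9200",
    "blog",
    "post",
    "1",
    {"title": "Elasticsearch Introduction", "author": "Oliver Wolf"},
)
print(request.get_method(), request.full_url)

# To actually send it to a running cluster:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```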

Inside an index there is another concept intended to further organize documents. Whereas indices group documents based on technical attributes, the “type” concept groups documents based on their semantics or, more precisely, based on the “mapping” they share. A mapping in Elasticsearch describes the attribute data types of the JSON documents that are grouped in the same type. By default, the mapping of a type is created automatically as documents get inserted, and it is extended when an inserted document has attributes the existing documents don't have. But there are some scenarios where the automatically created mapping can lead to unexpected behavior.
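The idea of an automatically created mapping, and one way it can surprise, can be sketched with a toy model. This is not Elasticsearch's actual algorithm: the helpers infer_mapping and merge_mapping are hypothetical, the data type labels are illustrative and differ between Elasticsearch versions, and real Elasticsearch applies additional rules (for example, coercing some values) that are left out here.

```python
def infer_mapping(document):
    """Guess an attribute -> data type mapping from one JSON document (toy rules)."""
    mapping = {}
    for field, value in document.items():
        if isinstance(value, bool):  # check bool first: bool is a subclass of int
            mapping[field] = "boolean"
        elif isinstance(value, int):
            mapping[field] = "long"
        elif isinstance(value, float):
            mapping[field] = "double"
        elif isinstance(value, str):
            mapping[field] = "string"
        elif isinstance(value, dict):
            mapping[field] = "object"
        else:
            mapping[field] = "unknown"
    return mapping


def merge_mapping(existing, document):
    """Extend a mapping with the attributes of a new document.

    New attributes are added automatically; an attribute whose data type
    differs from the one already recorded raises a ValueError.
    """
    merged = dict(existing)
    for field, data_type in infer_mapping(document).items():
        if field in merged and merged[field] != data_type:
            raise ValueError(
                f"attribute {field!r} is mapped as {merged[field]!r}, got {data_type!r}"
            )
        merged[field] = data_type
    return merged


mapping = merge_mapping({}, {"title": "First post", "views": 3})
mapping = merge_mapping(mapping, {"title": "Second post", "published": True})
print(mapping)

try:
    # The first document fixed "views" as a number; a text value now conflicts.
    merge_mapping(mapping, {"views": "many"})
except ValueError as error:
    print("conflict:", error)
```

In this model, the data type of an attribute is fixed by the first document that contains it, which is one reason an explicitly defined mapping can be preferable.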

There is an official blog post from Elastic that points out the differences between an index and a type. One statement from that post is:

“In the past we tried to make elasticsearch easier to understand by building an analogy with relational databases: indices would be like a database, and types like a table in a database. This was a mistake: the way data is stored is so different that any comparisons can hardly make sense, and this ultimately led to an overuse of types in cases where they were more harmful than helpful.”

As a rule of thumb: storing document types that have very different mappings in the same index will increase the required disk space dramatically. Storing documents in different indices will increase latency when searching across multiple document types.

Search Documents

Elasticsearch provides an HTTP API and a query language to search for indexed JSON documents in near real time. The query language is based on JSON, so even a search query is represented as a JSON document.
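A minimal sketch of such a JSON search query, sent to the _search endpoint of an index, might look like this. The host, the index name "blog", the attribute "title", and the helper build_search_request are assumptions for illustration; the request is only built, not sent.

```python
import json
import urllib.request


def build_search_request(base_url, index, field, text):
    """Build (but do not send) a search request using a full-text match query."""
    query = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(query).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, index, and attribute name.
search_request = build_search_request(
    "http://localhost:9200", "blog", "title", "elasticsearch introduction"
)
print(search_request.full_url)
print(search_request.data.decode("utf-8"))
```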

When searching for documents, Elasticsearch returns the found documents enriched with metadata. One of the more important pieces of metadata is the “relevance score” of a document with respect to a search query. The score indicates how well the document matches the search query.
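To show where the score sits in a search response, here is a small sketch that reads it from a response body. The sample response is hand-written for illustration: its index name, IDs, documents, and score values are made up, and only the fields relevant here are included. The helper scored_documents is a hypothetical name.

```python
def scored_documents(response):
    """Return (relevance score, original document) pairs from a search response."""
    return [(hit["_score"], hit["_source"]) for hit in response["hits"]["hits"]]


# Hand-written sample in the shape of a search response; all values are made up.
sample_response = {
    "hits": {
        "max_score": 1.2,
        "hits": [
            {
                "_index": "blog",
                "_id": "1",
                "_score": 1.2,
                "_source": {"title": "Elasticsearch Introduction"},
            },
            {
                "_index": "blog",
                "_id": "7",
                "_score": 0.4,
                "_source": {"title": "Introduction to Microservices"},
            },
        ],
    }
}

for score, document in scored_documents(sample_response):
    print(score, document["title"])
```

Each hit carries the stored document under _source next to metadata such as _index, _id, and _score. Unless another sort order is requested, hits are returned with the highest score first.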

As mentioned above, there are sophisticated ways to find JSON documents: Elasticsearch can search for documents that contain words from a search query while taking synonyms, word base forms, spelling variations, and typos into account.
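Of these features, tolerance for typos is the easiest to show in a query. The sketch below builds a match query with the fuzziness parameter; the attribute name "title", the misspelled search text, and the helper fuzzy_match_query are assumptions for illustration. Synonyms and word base forms are typically handled by how the attributes are analyzed rather than by the query itself, so they are not covered here.

```python
import json


def fuzzy_match_query(field, text, fuzziness="AUTO"):
    """Build a match query body that tolerates small spelling differences."""
    return {"query": {"match": {field: {"query": text, "fuzziness": fuzziness}}}}


# "elasticsaerch" is misspelled on purpose.
query_body = fuzzy_match_query("title", "elasticsaerch")
print(json.dumps(query_body, indent=2))
```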