Like many other NoSQL databases, Elasticsearch stores JSON documents. Elasticsearch is a scalable database optimized to perform fast and advanced searches on the stored data in near real time, and that's why it is also classified as a search server.
When it comes to free-text search, Elasticsearch does not just look for a substring in a given document attribute. Instead, it analyzes the document attributes and checks whether the words from the search query occur in them, regardless of word order and of any other words in between. It also returns documents whose attributes do not contain all of the searched words, and it ranks the documents it finds by their relevance to the search query.
Besides that, there are even more sophisticated search features, such as taking synonyms, word base forms, spelling variations and typos into account.
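To make the difference to a plain substring search more tangible, here is a toy sketch in Python. It is not how Elasticsearch scores documents internally; the functions `tokenize` and `toy_score` and the sample titles are made up for illustration. It only shows the idea of matching words regardless of order, accepting partial matches, and ranking the results.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words (a very crude 'analysis')."""
    return re.findall(r"\w+", text.lower())


def toy_score(query, attribute_value):
    """Fraction of the query words that occur in the attribute, ignoring order.

    This is a toy relevance measure, not Elasticsearch's real scoring.
    """
    query_words = set(tokenize(query))
    document_words = set(tokenize(attribute_value))
    if not query_words:
        return 0.0
    return len(query_words & document_words) / len(query_words)


# Hypothetical attribute values of four documents.
documents = [
    "The quick brown fox",
    "A fox that is quick",
    "A slow brown dog",
    "Quick release notes",
]

query = "quick fox"

# Rank all documents by score, best first, and drop those without any match.
ranked = sorted(
    ((toy_score(query, doc), doc) for doc in documents),
    key=lambda pair: pair[0],
    reverse=True,
)
matches = [(score, doc) for score, doc in ranked if score > 0]

for score, doc in matches:
    print(f"{score:.1f}  {doc}")
```

A substring search for "quick fox" would find none of these titles, whereas the word-based approach finds three of them and puts the two that contain both words first.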
Index a JSON Document
Before a JSON document can be found, it has to be stored in Elasticsearch. In Elasticsearch terminology this action is called “indexing”. But this is only one meaning: the term “index” is used in two ways in this context, once as a verb and once as a noun.
As a noun, an index is a logical unit that groups JSON documents. Inserting documents into a specific index (indexing) affects how the documents are stored and distributed across the servers on which the Elasticsearch cluster runs.
Inside an index there is another concept intended to further organize documents. Whereas indices group documents based on technical attributes, the “type” concept groups documents based on their semantics, or more precisely, based on the “mapping” they share. A mapping in Elasticsearch describes the data types of the attributes of the JSON documents that are grouped in the same type. By default, the mapping of a type is created automatically as documents get inserted (and it is extended when an inserted document has attributes the existing documents don't have). However, there are some scenarios where the automatically created mapping can lead to unexpected behavior.
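As a rough sketch, indexing a document could look like the following in Python. It assumes a node reachable at `http://localhost:9200` and an Elasticsearch version that addresses documents as `/{index}/{type}/{id}` (newer versions use a different path). The index name `library`, the type `book`, the id and the document fields are hypothetical. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: a local Elasticsearch node on the default port.
BASE_URL = "http://localhost:9200"


def build_index_request(index, doc_type, doc_id, document):
    """Build (but do not send) an HTTP PUT request that indexes one document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical document, index name and type name.
book = {"title": "The quick brown fox", "pages": 120}
index_request = build_index_request("library", "book", "1", book)

print(index_request.get_method(), index_request.full_url)
# urllib.request.urlopen(index_request) would send it to a running cluster.
```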
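Instead of relying on the automatically created mapping, a mapping can also be defined explicitly when the index is created. The following sketch builds such a request body in Python. It assumes an Elasticsearch version that still supports mapping types (the type name is a key under `mappings`) and that knows the `text` and `keyword` field types; older and newer versions differ here. The type name `book` and the attribute names are hypothetical.

```python
import json


def build_mapping(type_name, field_types):
    """Build the body for creating an index with an explicit mapping for one type."""
    properties = {name: {"type": es_type} for name, es_type in field_types.items()}
    return {"mappings": {type_name: {"properties": properties}}}


# Hypothetical attributes of a "book" document and their data types.
mapping_body = build_mapping(
    "book",
    {
        "title": "text",      # analyzed for free-text search
        "isbn": "keyword",    # stored as one exact value
        "published": "date",
        "pages": "integer",
    },
)

# This JSON would be the body of the request that creates the index.
print(json.dumps(mapping_body, indent=2))
```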
There is an official blog post from Elastic that points out the differences between an index and a type. One statement of that post is:
As a rule of thumb: Storing document types that have very different mappings in the same index will increase the required disk space dramatically. Storing documents in different indices will increase latency when searching for documents across multiple document types.
Elasticsearch provides an HTTP API and a query language to search for indexed JSON documents in near real time. The query language is based on JSON, so even a search query is represented as a JSON document.
When searching for documents, Elasticsearch returns the documents it finds enriched with metadata. One of the more important pieces of metadata is the “relevance score” of a document with respect to a search query. The score indicates how well the document matches the search query.
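A search query expressed as a JSON document could be sent roughly like this. The sketch assumes a node at `http://localhost:9200` and uses a `match` query against the `_search` endpoint. The index name `library` and the attribute `title` are hypothetical, and the request is only built, not sent.

```python
import json
import urllib.request

# Assumption: a local Elasticsearch node on the default port.
BASE_URL = "http://localhost:9200"


def build_search_request(index, query):
    """Build (but do not send) a search request whose body is a JSON query."""
    url = f"{BASE_URL}/{index}/_search"
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# A "match" query: find documents whose title contains words from the text.
search_request = build_search_request("library", {"match": {"title": "quick fox"}})

print(search_request.get_method(), search_request.full_url)
print(search_request.data.decode("utf-8"))
# urllib.request.urlopen(search_request) would send it to a running cluster.
```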
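The score is part of each hit in the search response. The following sketch reads it from a trimmed, hand-written response. The field names `hits`, `_score` and `_source` follow the search response format, while the concrete values, ids and titles are invented for illustration and do not come from a real cluster.

```python
import json

# Hand-written, shortened example of a search response (values are made up).
sample_response = json.loads("""
{
  "took": 3,
  "timed_out": false,
  "hits": {
    "max_score": 1.2,
    "hits": [
      {"_index": "library", "_type": "book", "_id": "1", "_score": 1.2,
       "_source": {"title": "The quick brown fox"}},
      {"_index": "library", "_type": "book", "_id": "4", "_score": 0.4,
       "_source": {"title": "Quick release notes"}}
    ]
  }
}
""")


def scored_titles(response):
    """Return (score, title) pairs in the order the hits were returned."""
    return [
        (hit["_score"], hit["_source"]["title"])
        for hit in response["hits"]["hits"]
    ]


for score, title in scored_titles(sample_response):
    print(f"{score:.1f}  {title}")
```

The original document is found under `_source`; everything else in a hit is metadata added by Elasticsearch.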
As mentioned above, there are sophisticated ways to find JSON documents: Elasticsearch can search for documents that contain words from a search query while taking synonyms, word base forms, spelling variations and typos into account.
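Tolerance for typos, for example, can be requested directly in the query. The sketch below builds a `match` query with the `fuzziness` parameter set to `AUTO`. The attribute name `title` and the misspelled search text are hypothetical. Synonyms and word base forms are usually not a query option but a matter of how the attribute is analyzed, which is not shown here.

```python
import json


def fuzzy_match_query(field, text):
    """Build a search body with a match query that tolerates small typos."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,
                    "fuzziness": "AUTO",  # allowed edits depend on word length
                }
            }
        }
    }


# "quikc" is a deliberate typo for "quick".
fuzzy_body = fuzzy_match_query("title", "quikc fox")

print(json.dumps(fuzzy_body, indent=2))
```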