Mapping defines the structure of documents and configures how their values are indexed in Elasticsearch.
Elasticsearch doesn't require us to create a mapping for our indices, because it supports dynamic mapping: it infers each field's data type from the values in the documents we index.
Mapping is flexible, so we can also combine explicit mapping with dynamic mapping. We can create an index with explicit data types and, when later documents contain new fields, Elasticsearch adds those fields to the mapping with inferred types.
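To make the inference step concrete, here is a minimal Python sketch of the default dynamic mapping rules. It is a simplified model and not Elasticsearch code: the name infer_type is hypothetical, only plain yyyy-MM-dd strings are treated as dates, and the keyword sub-field that dynamic mapping adds to text fields is left out.

```python
import re

# Simplification: real date detection accepts more formats and validates the date.
DATE_PATTERN = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def infer_type(value):
    """Return the field type that default dynamic mapping would pick (simplified)."""
    if isinstance(value, bool):  # checked before int, because bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "date" if DATE_PATTERN.match(value) else "text"
    if isinstance(value, dict):
        return "object"
    if isinstance(value, list):
        # An array takes the type of its first non-null element.
        for item in value:
            if item is not None:
                return infer_type(item)
    return None  # null values and empty arrays add no field


document = {"name": "Lucas", "birthday": "1990-09-28", "age": 32, "active": True}
inferred = {field: infer_type(value) for field, value in document.items()}
print(inferred)
```

Declaring a field explicitly is how we override one of these defaults, for example when a string should be indexed as keyword rather than text.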
Data Types
Elasticsearch provides the usual data types that we also find in many programming languages, such as short, integer, long, float, double, boolean and date.
The object data type
The object data type represents how Elasticsearch handles JSON values. Every document is an object, and an object can contain other objects. Let's see an example:
PUT /users
{
  "mappings": {
    "properties": {
      "name": {"type": "text"},
      "birthday": {"type": "date"},
      "address": {
        "properties": {
          "country": {"type": "text"},
          "zipCode": {"type": "text"}
        }
      }
    }
  }
}
This will create an index called users with a text field, a date field and an object field, which represents an address and contains its own text fields.
To index a document into this index, we can make the following request:
POST /users/_doc/1
{
  "name": "Lucas",
  "birthday": "1990-09-28",
  "address": {
    "country": "Brazil",
    "zipCode": "04896060"
  }
}
Since Elasticsearch runs on top of Apache Lucene, which has no concept of inner objects, we should be aware that these objects are not really indexed as nested JSON. When we index an object field, like the address field from the last example, Elasticsearch flattens it into dotted field names, like this:
{
  "name": "Lucas",
  "birthday": "1990-09-28",
  "address.country": "Brazil",
  "address.zipCode": "04896060"
}
What if a field held an array of objects instead? In this case, Elasticsearch stores each inner field as an array of values, for example an array of reviewer names and an array of ratings. So this document:
POST /articles/_doc
{
  "name": "Elasticsearch article",
  "reviews": [
    {
      "name": "Lucas",
      "rating": 5
    },
    {
      "name": "Eduardo",
      "rating": 3
    }
  ]
}
Will be stored like this:
{
  "name": "Elasticsearch article",
  "reviews.name": ["Lucas", "Eduardo"],
  "reviews.rating": [5, 3]
}
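The flattening described above can be sketched in a few lines of Python. This is only a rough model of what happens to object fields at index time, not how Elasticsearch or Lucene is implemented, and the function name flatten is hypothetical.

```python
def flatten(doc, prefix=""):
    """Flatten inner objects into dotted field names (rough model of the object type).

    Arrays of objects are merged into one list of values per inner field,
    which is where the link between values of the same object gets lost.
    """
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, f"{path}."))
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            for item in value:
                for sub_key, sub_value in flatten(item, f"{path}.").items():
                    flat.setdefault(sub_key, []).append(sub_value)
        else:
            flat[path] = value
    return flat


user = {
    "name": "Lucas",
    "birthday": "1990-09-28",
    "address": {"country": "Brazil", "zipCode": "04896060"},
}
article = {
    "name": "Elasticsearch article",
    "reviews": [
        {"name": "Lucas", "rating": 5},
        {"name": "Eduardo", "rating": 3},
    ],
}

print(flatten(user))
print(flatten(article))
```

Notice that after flattening nothing records that the rating 3 belongs to Eduardo; the two lists are independent.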
The nested data type
In the previous example, we indexed a document representing an article that contains two reviews from two different users. Let's run the following query to find articles that Eduardo reviewed with a rating greater than 4:
GET /articles/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": { "reviews.name": "Eduardo" }},
        {"range": { "reviews.rating": {"gt": 4} }}
      ]
    }
  }
}
We'll get the following result:
...
"hits" : [
  {
    "_index" : "articles",
    "_id" : "sboYZIEBFSkh39rxJql6",
    "_score" : 1.287682,
    "_source" : {
      "name" : "Elasticsearch article",
      "reviews" : [
        {
          "name" : "Lucas",
          "rating" : 5
        },
        {
          "name" : "Eduardo",
          "rating" : 3
        }
      ]
    }
  }
]
...
This is not what we are looking for: Eduardo's rating is 3, yet the article matched. Why is that? Since Elasticsearch flattened our document, the link between each reviewer's name and rating was lost, so the query matched Eduardo in reviews.name and the 5 in reviews.rating independently. But what if we need to keep this relationship? That's where the nested data type comes in.
This data type tells Elasticsearch to index each object in the array as a separate hidden document, so each one can be queried independently of the others. Let's see the same example, but defining the reviews field as nested.
PUT /articles_v2
{
  "mappings": {
    "properties": {
      "name": {"type": "text"},
      "reviews": {"type": "nested"}
    }
  }
}
Let's index the same document as before:
POST /articles_v2/_doc
{
  "name": "Elasticsearch article",
  "reviews": [
    {
      "name": "Lucas",
      "rating": 5
    },
    {
      "name": "Eduardo",
      "rating": 3
    }
  ]
}
Let's now create a nested query to search for articles reviewed by Eduardo with a rating greater than 4:
GET /articles_v2/_search
{
  "query": {
    "nested": {
      "path": "reviews",
      "query": {
        "bool": {
          "must": [
            { "match": {"reviews.name": "Eduardo"} },
            { "range": {"reviews.rating": {"gt": 4}} }
          ]
        }
      }
    }
  }
}
And the response shows us that no articles match these conditions, which is correct!
{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 0,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  }
}
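The difference between the two results can be modelled in plain Python. The sketch below is an illustration of the matching semantics only, under the assumption that a match on the name is a case-insensitive comparison; the function names matches_object and matches_nested are hypothetical and nothing here calls Elasticsearch.

```python
def matches_object(reviews, reviewer, min_rating):
    """object semantics: each clause may be satisfied by a different review."""
    name_ok = any(r["name"].lower() == reviewer.lower() for r in reviews)
    rating_ok = any(r["rating"] > min_rating for r in reviews)
    return name_ok and rating_ok


def matches_nested(reviews, reviewer, min_rating):
    """nested semantics: both clauses must hold for the same review."""
    return any(
        r["name"].lower() == reviewer.lower() and r["rating"] > min_rating
        for r in reviews
    )


reviews = [
    {"name": "Lucas", "rating": 5},
    {"name": "Eduardo", "rating": 3},
]

print(matches_object(reviews, "Eduardo", 4))  # True: Eduardo's name plus Lucas's rating
print(matches_nested(reviews, "Eduardo", 4))  # False: Eduardo's own rating is 3
```

With the object type, the name and the rating can come from different reviews, which is why the first query returned the article. With the nested type, both conditions are checked against one review at a time.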