We have already seen how Elasticsearch indexes text values using its standard analyzer. But what if we want to store an entire phrase as a single term in our inverted index? For this, we need to use the keyword data type.
Fields mapped as keyword are handled differently: the value is indexed as it is, which is the behavior of the keyword analyzer. It is a so-called no-op analyzer, because it doesn't make any changes to its input. Let's see an example:
If we analyze the text "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. :)" using the standard analyzer, we'll see this result:
POST /_analyze
{
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. :)"
}
{
"tokens" : [
{
"token" : "the",
"start_offset" : 0,
"end_offset" : 3,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "2",
"start_offset" : 4,
"end_offset" : 5,
"type" : "<NUM>",
"position" : 1
},
{
"token" : "quick",
"start_offset" : 6,
"end_offset" : 11,
"type" : "<ALPHANUM>",
"position" : 2
},
{
"token" : "brown",
"start_offset" : 16,
"end_offset" : 21,
"type" : "<ALPHANUM>",
"position" : 3
},
{
"token" : "foxes",
"start_offset" : 22,
"end_offset" : 27,
"type" : "<ALPHANUM>",
"position" : 4
},
{
"token" : "jumped",
"start_offset" : 28,
"end_offset" : 34,
"type" : "<ALPHANUM>",
"position" : 5
},
{
"token" : "over",
"start_offset" : 35,
"end_offset" : 39,
"type" : "<ALPHANUM>",
"position" : 6
},
{
"token" : "the",
"start_offset" : 40,
"end_offset" : 43,
"type" : "<ALPHANUM>",
"position" : 7
},
{
"token" : "lazy",
"start_offset" : 44,
"end_offset" : 48,
"type" : "<ALPHANUM>",
"position" : 8
},
{
"token" : "dog's",
"start_offset" : 49,
"end_offset" : 54,
"type" : "<ALPHANUM>",
"position" : 9
},
{
"token" : "bone",
"start_offset" : 55,
"end_offset" : 59,
"type" : "<ALPHANUM>",
"position" : 10
}
]
}
The standard analyzer changed the text considerably: it split the text into separate terms, lowercased them, and dropped the punctuation. For a text field, each of these terms is what gets stored in the inverted index. Now, let's modify our request by specifying that we want to use the keyword analyzer.
POST /_analyze
{
"text": "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. :)",
"analyzer": "keyword"
}
{
"tokens" : [
{
"token" : "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. :)",
"start_offset" : 0,
"end_offset" : 63,
"type" : "word",
"position" : 0
}
]
}
This time, the only token generated is the original text itself, unchanged.
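To make the contrast between the two results easier to experiment with, here is a minimal Python sketch. It is not Elasticsearch code: standard_like_analyze is a hypothetical, regex-based approximation of the standard analyzer (the real one follows Unicode text segmentation rules), and keyword_analyze simply mimics the no-op behavior.

```python
import re


def standard_like_analyze(text):
    """Rough stand-in for the standard analyzer (an assumption, not the
    real implementation): lowercase the text, then keep runs of letters
    and digits, allowing apostrophes inside a word."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())


def keyword_analyze(text):
    """No-op analysis: the whole input becomes a single token."""
    return [text]


text = "The 2 QUICK Brown-Foxes jumped over the lazy dog's bone. :)"

print(standard_like_analyze(text))  # eleven lowercase terms, no punctuation
print(keyword_analyze(text))        # one token: the untouched input
```

For this particular sentence the sketch yields the same terms as the response from the standard analyzer shown earlier, but it should not be expected to agree with Elasticsearch on arbitrary input.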
What are the effects of storing a field as keyword? If we store it as text, we are able to use Elasticsearch's full-text queries, which match on the individual words of the value. If we use keyword, that is not possible, since the only term stored is exactly the string that we sent in the request, so a query finds the document only when it supplies that whole value exactly.
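The effect on searching can be sketched with a toy inverted index in Python. Everything here is an illustrative assumption: the documents, the build_index helper, and the simplified analyzers are made up for the example and only model the idea, not how Elasticsearch is implemented.

```python
import re
from collections import defaultdict


def standard_like_analyze(text):
    """Hypothetical approximation of the standard analyzer."""
    return re.findall(r"[a-z0-9]+(?:'[a-z0-9]+)*", text.lower())


def keyword_analyze(text):
    """No-op analysis: the whole value is the only term."""
    return [text]


def build_index(docs, analyze):
    """Map each term produced by `analyze` to the ids of the documents
    that contain it (a toy inverted index)."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        for term in analyze(value):
            index[term].add(doc_id)
    return index


# Made-up documents: id -> field value.
docs = {1: "Quick Brown Foxes", 2: "Lazy Dog"}

text_index = build_index(docs, standard_like_analyze)
keyword_index = build_index(docs, keyword_analyze)

# A single word is a term in the text index, but not in the keyword index.
print(text_index.get("quick", set()))
print(keyword_index.get("quick", set()))

# The keyword index only answers for the exact original string.
print(keyword_index.get("Quick Brown Foxes", set()))
```

Note that the lookup against the keyword index is case-sensitive in this sketch: "quick brown foxes" would not find document 1, because the stored term keeps its original capitalization.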
A keyword field can also be used for aggregations, which we'll see in upcoming articles.
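As a small preview of why whole values matter for aggregations, the following Python sketch groups made-up documents by a field and counts them. It is only a conceptual model: the status field and its values are hypothetical, and counting with Counter stands in for the bucketing an aggregation performs.

```python
from collections import Counter

# Made-up documents; "status" plays the role of a keyword field,
# so each value is treated as one unmodified term.
docs = [
    {"status": "In Progress"},
    {"status": "Done"},
    {"status": "In Progress"},
]

# One bucket per distinct whole value, with a document count for each.
buckets = Counter(doc["status"] for doc in docs)

for value, count in buckets.most_common():
    print(value, count)
```

Because the values are kept whole, "In Progress" forms a single bucket instead of being split into separate "in" and "progress" terms.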