How to make Elasticsearch add a timestamp field to every document in all indices?
Solution 1:
Elasticsearch used to support automatically adding timestamps to documents being indexed, but deprecated this feature in 2.0.0 and removed it in 5.0.
From the version 5.5 documentation:
The _timestamp and _ttl fields were deprecated and are now removed. As a replacement for _timestamp, you should populate a regular date field with the current timestamp on application side.
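A minimal sketch of populating such a field on the application side in Python. The field name indexed_at, the ISO 8601 UTC format and the helper name with_timestamp are my own assumptions, not part of any Elasticsearch API; map the field as a date in your index.

```python
from datetime import datetime, timezone


def with_timestamp(doc, field="indexed_at", now=None):
    """Return a copy of `doc` with `field` set to an ISO 8601 UTC timestamp.

    `field` is a hypothetical name; map it as a `date` in your index.
    """
    stamped = dict(doc)  # shallow copy so the caller's dict is untouched
    moment = now if now is not None else datetime.now(timezone.utc)
    stamped[field] = moment.isoformat()
    return stamped


if __name__ == "__main__":
    fixed = datetime(2021, 5, 1, 12, 30, tzinfo=timezone.utc)
    print(with_timestamp({"title": "hello"}, now=fixed))
    # {'title': 'hello', 'indexed_at': '2021-05-01T12:30:00+00:00'}
```

You would then pass the returned dict as the document body of your usual index call.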
Solution 2:
On versions before 5.0, you can do this by enabling the _timestamp field in the mappings when you create the index:
$ curl -XPOST localhost:9200/test -d '{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "_default_": {
      "_timestamp": {
        "enabled": true,
        "store": true
      }
    }
  }
}'
Elasticsearch will then automatically add a _timestamp to every document you put into the index, and the value is returned when you explicitly request the _timestamp field after indexing.
Solution 3:
Here is another way to get the indexing timestamp; hopefully it helps someone.
An ingest pipeline can add a timestamp when a document is indexed. Here is an example:
PUT _ingest/pipeline/indexed_at
{
  "description": "Adds indexed_at timestamp to documents",
  "processors": [
    {
      "set": {
        "field": "_source.indexed_at",
        "value": "{{_ingest.timestamp}}"
      }
    }
  ]
}
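Before wiring the pipeline into indexing, you can check it with the simulate API (POST _ingest/pipeline/<id>/_simulate), which runs the pipeline on sample documents without indexing them. A minimal sketch using only the Python standard library; the URL http://localhost:9200, the absence of authentication and the helper name simulate_request are assumptions.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without auth


def simulate_request(pipeline_id, sources, base_url=ES_URL):
    """Build POST _ingest/pipeline/<id>/_simulate for sample _source dicts.

    Elasticsearch runs the pipeline on these documents and returns the
    result without indexing anything. The request is only built here.
    """
    body = {"docs": [{"_source": source} for source in sources]}
    return urllib.request.Request(
        f"{base_url}/_ingest/pipeline/{pipeline_id}/_simulate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    req = simulate_request("indexed_at", [{"title": "hello"}])
    print(req.get_method(), req.full_url)
    # Against a running cluster, each returned doc should carry indexed_at:
    # with urllib.request.urlopen(req) as resp:
    #     for item in json.load(resp)["docs"]:
    #         print(item["doc"]["_source"])
```

Sending it (the commented urlopen lines) only makes sense once the pipeline has been created with the PUT above.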
Previously, a pipeline only ran when it was named explicitly, so the pipeline parameter had to be specified on every Elasticsearch request that writes/indexes documents (Ref: link). This was a bit troublesome because you needed to change those requests on the application side.
With Elasticsearch 6.5 and later, you can specify a default pipeline for an index using the index.default_pipeline setting (see link for details).
Here is how to set the default pipeline:
PUT ms-test/_settings
{
"index.default_pipeline": "indexed_at"
}
I haven't tried this yet, as I haven't upgraded to ES 6.5, but the above command should work.
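To make the difference between the two approaches concrete, here is a rough Python sketch of both request shapes using only the standard library. The cluster URL, the _doc endpoint (6.x with _doc as the type name, or 7.x) and the helper names are assumptions; the requests are built but not sent.

```python
import json
import urllib.parse
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without auth


def _json_request(method, url, body):
    """Build a JSON request; nothing is sent until urlopen() is called."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method=method,
    )


def index_with_pipeline(index, doc_id, doc, pipeline, base_url=ES_URL):
    """Named pipeline: every indexing request must carry ?pipeline=<id>."""
    query = urllib.parse.urlencode({"pipeline": pipeline})
    return _json_request("PUT", f"{base_url}/{index}/_doc/{doc_id}?{query}", doc)


def set_default_pipeline(index, pipeline, base_url=ES_URL):
    """ES >= 6.5: set index.default_pipeline once on the index."""
    return _json_request(
        "PUT", f"{base_url}/{index}/_settings", {"index.default_pipeline": pipeline}
    )


if __name__ == "__main__":
    per_request = index_with_pipeline("ms-test", "1", {"abc": "def"}, "indexed_at")
    once = set_default_pipeline("ms-test", "indexed_at")
    print(per_request.get_method(), per_request.full_url)
    print(once.get_method(), once.full_url)
```

Once the default pipeline is set, plain indexing URLs without the pipeline parameter should go through it, which is the application-side change the setting saves you.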
Solution 4:
You can make use of default index pipelines, leverage the script processor, and thus emulate the auto_now_add functionality you may know from Django and DEFAULT GETDATE() from SQL.
The process of adding a default yyyy-MM-dd HH:mm:ss date goes like this:
1. Create the pipeline and specify which indices it'll be allowed to run on:
PUT _ingest/pipeline/auto_now_add
{
  "description": "Assigns the current date if not yet present and if the index name is whitelisted",
  "processors": [
    {
      "script": {
        "source": """
          // skip if not whitelisted
          if (![ "myindex",
                 "logs-index",
                 "..."
              ].contains(ctx['_index'])) { return; }

          // don't overwrite if present
          if (ctx['created_at'] != null) { return; }

          ctx['created_at'] = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date());
        """
      }
    }
  ]
}
Side note: the ingest processor's Painless script context is documented here.
2. Update the default_pipeline setting in all of your indices:
PUT _all/_settings
{
"index": {
"default_pipeline": "auto_now_add"
}
}
Side note: you can restrict the target indices using the multi-target syntax:
PUT myindex,logs-2021-*/_settings?allow_no_indices=true
{
"index": {
"default_pipeline": "auto_now_add"
}
}
3. Ingest a document to one of the configured indices:
PUT myindex/_doc/1
{
"abc": "def"
}
4. Verify that the date string has been added:
GET myindex/_search
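If you want to unit-test which documents the auto_now_add script will touch without a cluster, a small Python model of its three rules can help. This is only an emulation of the Painless script, not something Elasticsearch runs; ALLOWED_INDICES and the function name are hypothetical, and datetime.now() uses the Python process's local time zone, whereas the script's SimpleDateFormat output depends on the time zone of the Elasticsearch node's JVM.

```python
from datetime import datetime

# Hypothetical whitelist; keep it in sync with the list in the Painless script.
ALLOWED_INDICES = {"myindex", "logs-index"}


def auto_now_add(ctx, allowed=ALLOWED_INDICES, now=None):
    """Mimic the auto_now_add script on a dict shaped like the ingest ctx."""
    # skip if not whitelisted
    if ctx.get("_index") not in allowed:
        return ctx
    # don't overwrite if present
    if ctx.get("created_at") is not None:
        return ctx
    moment = now if now is not None else datetime.now()
    # Java's yyyy-MM-dd HH:mm:ss corresponds to %Y-%m-%d %H:%M:%S here
    ctx["created_at"] = moment.strftime("%Y-%m-%d %H:%M:%S")
    return ctx


if __name__ == "__main__":
    fixed = datetime(2021, 6, 1, 8, 15, 0)
    print(auto_now_add({"_index": "myindex", "abc": "def"}, now=fixed))
    # {'_index': 'myindex', 'abc': 'def', 'created_at': '2021-06-01 08:15:00'}
    print(auto_now_add({"_index": "other-index", "abc": "def"}, now=fixed))
    # {'_index': 'other-index', 'abc': 'def'}
```

One thing worth checking: if created_at is left to dynamic mapping, a space-separated format like this may not be detected as a date, so consider mapping it explicitly as a date with format yyyy-MM-dd HH:mm:ss.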
Solution 5:
An example for Elasticsearch 6.6.2 in Python 3:
from elasticsearch import Elasticsearch

es = Elasticsearch(hosts=["localhost"])

# Ingest pipeline that copies the ingest timestamp into a regular field
timestamp_pipeline_setting = {
    "description": "insert timestamp field for all documents",
    "processors": [
        {
            "set": {
                "field": "ingest_timestamp",
                "value": "{{_ingest.timestamp}}"
            }
        }
    ]
}
es.ingest.put_pipeline(id="timestamp_pipeline", body=timestamp_pipeline_setting)

conf = {
    "settings": {
        "number_of_shards": 2,
        "number_of_replicas": 1,
        # apply the pipeline to every document indexed into this index (ES >= 6.5)
        "default_pipeline": "timestamp_pipeline"
    },
    "mappings": {
        "articles": {  # mapping type name (6.x)
            "dynamic": "false",
            "_source": {"enabled": True},
            "properties": {
                "title": {"type": "text"},
                "content": {"type": "text"}
            }
        }
    }
}

# create the index; ignore=400 suppresses errors such as "index already exists"
response = es.indices.create(index="articles_index", body=conf, ignore=400)
print('\nresponse:', response)

doc = {
    'title': 'automatically adding a timestamp to documents',
    'content': 'prior to version 5 of Elasticsearch, documents had a metadata field called _timestamp. When enabled, this _timestamp was automatically added to every document. It would tell you the exact time a document had been indexed.',
}
# no pipeline parameter needed: the index's default pipeline adds ingest_timestamp
res = es.index(index="articles_index", doc_type="articles", id=100001, body=doc)
print(res)

# the returned _source now contains ingest_timestamp
res = es.get(index="articles_index", doc_type="articles", id=100001)
print(res)
For ES 7.x, the example should work after removing the doc_type parameters, since mapping types are no longer supported; for the same reason, the "articles" level has to be removed from the mappings in conf.
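A minimal sketch of the 7.x mapping change: 7.x rejects a type name inside mappings by default, so the type level has to go. The helper to_typeless is hypothetical and uses only the standard library; the 7.x client calls are shown as comments and are not run here.

```python
import copy


def to_typeless(conf, type_name):
    """Return a copy of a 6.x create-index body with the mapping-type level removed."""
    typeless = copy.deepcopy(conf)  # leave the 6.x body untouched
    mappings = typeless.get("mappings", {})
    if type_name in mappings:
        typeless["mappings"] = mappings[type_name]
    return typeless


if __name__ == "__main__":
    conf_6x = {
        "settings": {"number_of_shards": 2, "default_pipeline": "timestamp_pipeline"},
        "mappings": {
            "articles": {
                "dynamic": "false",
                "properties": {"title": {"type": "text"}, "content": {"type": "text"}},
            }
        },
    }
    conf_7x = to_typeless(conf_6x, "articles")
    print(conf_7x["mappings"])
    # With the 7.x client the calls would then become:
    # es.indices.create(index="articles_index", body=conf_7x, ignore=400)
    # es.index(index="articles_index", id=100001, body=doc)
    # es.get(index="articles_index", id=100001)
```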