Elasticsearch - Adding a field to an index and querying the historical data written before the field existed

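1. Problem statement

Our project had the following requirement: an Elasticsearch index already holds a large amount of historical data, and a new field is then added to the index. We need to query the historical documents by condition, but the new field does not exist in them. How can we find those documents?

1. Index two documents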

PUT /user/_doc/1
{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

PUT /user/_doc/2
{
    "first_name" : "zhangsan",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

2. Add a new field to the index

PUT /user/_mapping
{
  "properties": {
      "height": {
        "type": "long"
      }
  }
}

3. Index one more document

This document includes a value for the new height field.

PUT /user/_doc/3
{
    "first_name" : "lisi",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ],
    "height":175
}

4. View the documents in the index

GET /user/_search

{
  "took" : 817,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "user",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index" : "user",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "zhangsan",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index" : "user",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "lisi",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ],
          "height" : 175
        }
      }
    ]
  }
}

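As the result above shows, adding a new field to an existing index does not add that field to the documents already stored, and no default value is assigned to them. That is why the first two documents have no height field.

In Elasticsearch, a field that is missing from a document, or whose value is null, is not indexed. It is ignored at search time, so there is no value to match and a search for an "empty" value cannot find it.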

PUT /my_index/_doc/1
{
  "first_name": "zhangsan"
}

PUT /my_index/_doc/2
{
  "first_name": "wangwu",
  "height": null
}

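For example, neither of the two documents above can be retrieved by searching on the height field. So how do we query the historical data written before the field was added?

2. must_not & exists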

POST /user/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "exists": {
            "field" : "height" 
          }
        }
      ]
    }
  }
}

{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 0.0,
    "hits" : [
      {
        "_index" : "user",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.0,
        "_source" : {
          "first_name" : "John",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      },
      {
        "_index" : "user",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.0,
        "_source" : {
          "first_name" : "zhangsan",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ]
        }
      }
    ]
  }
}
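
The original requirement was to query historical data "by condition", so in practice the must_not/exists clause is usually combined with other filters. The following is a minimal Python sketch of how such a request body could be assembled. The helper name missing_field_query and the age filter are hypothetical and only for illustration; the resulting dictionary is plain JSON that could be sent as the body of POST /user/_search.

```python
import json


def missing_field_query(field, filters=None):
    """Build a search body matching documents with no indexed value for `field`.

    `filters` is an optional list of additional query clauses (hypothetical
    business conditions) that are placed in the bool query's filter context.
    """
    bool_query = {"must_not": [{"exists": {"field": field}}]}
    if filters:
        bool_query["filter"] = list(filters)
    return {"query": {"bool": bool_query}}


# Historical documents (no height) that also satisfy age == 25.
body = missing_field_query("height", [{"term": {"age": 25}}])
print(json.dumps(body, indent=2))
```

Placing the extra conditions under filter rather than must keeps them out of scoring, which matches the 0.0 scores seen in the response above.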

The exists query returns documents that contain at least one non-null value for the given field:

GET /user/_search
{
    "query": {
        "exists" : { "field" : "height" }
    }
}

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "user",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "first_name" : "lisi",
          "last_name" : "Smith",
          "age" : 25,
          "about" : "I love to go rock climbing",
          "interests" : [
            "sports",
            "music"
          ],
          "height" : 175
        }
      }
    ]
  }
}
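
To make the matching rule easier to reason about, here is a small local model of it in Python. It is only an approximation under stated assumptions: it looks at a top-level field of the _source, assumes the mapping defines no null_value for that field, and ignores mapping options that can also prevent a value from being indexed. The function name has_indexed_value is hypothetical.

```python
def has_indexed_value(source, field):
    """Approximate whether an `exists` query on `field` would match `source`.

    A document does not match when the field is absent, is null, is an
    empty array, or is an array containing only nulls.
    """
    if field not in source:
        return False
    value = source[field]
    values = value if isinstance(value, list) else [value]
    return any(v is not None for v in values)


docs = [
    {"first_name": "zhangsan"},                 # field missing
    {"first_name": "wangwu", "height": None},   # explicit null
    {"first_name": "lisi", "height": 175},      # real value
]

matches = [has_indexed_value(d, "height") for d in docs]
print(matches)
```

Under this model the first two documents behave identically, which is why must_not/exists returns both the documents that lack the field and the ones that store it as null.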

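3. Assign an initial value to the historical data

Adding a field to an existing index does not touch the historical documents, so another option is to update those documents, give the field a default value, and then search by that default value.
Use a script with _update_by_query to update the documents in bulk: if the field is null (which, inside the script, is also the case when the field is missing), set it to 0.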

POST /user/_update_by_query
{
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.height == null) { ctx._source.height = 0 }"
  }
}
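
The request above runs the script against every document in the index, including those that already have a height. One possible refinement, sketched below in Python, is to add a query so that only documents without the field are selected, and to pass the field name and default through script params. The helper name backfill_request is hypothetical; the dictionary it returns is meant as the body of POST /user/_update_by_query.

```python
import json


def backfill_request(field, default):
    """Build an _update_by_query body that sets `field` to `default`
    only on documents that have no indexed value for it."""
    return {
        # Select only the historical documents.
        "query": {"bool": {"must_not": [{"exists": {"field": field}}]}},
        "script": {
            "lang": "painless",
            # Map-style access lets the field name come from params.
            "source": "ctx._source[params.field] = params.value",
            "params": {"field": field, "value": default},
        },
    }


request_body = backfill_request("height", 0)
print(json.dumps(request_body, indent=2))
```

Because the selection happens in the query, the script no longer needs its own null check, and documents that already carry a value are not rewritten.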

View the documents in the index again (GET /user/_search):

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "user",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "about" : "I love to go rock climbing",
          "last_name" : "Smith",
          "interests" : [
            "sports",
            "music"
          ],
          "first_name" : "John",
          "age" : 25,
          "height" : 0
        }
      },
      {
        "_index" : "user",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "about" : "I love to go rock climbing",
          "last_name" : "Smith",
          "interests" : [
            "sports",
            "music"
          ],
          "first_name" : "zhangsan",
          "age" : 25,
          "height" : 0
        }
      },
      {
        "_index" : "user",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "about" : "I love to go rock climbing",
          "last_name" : "Smith",
          "interests" : [
            "sports",
            "music"
          ],
          "first_name" : "lisi",
          "age" : 25,
          "height" : 175
        }
      }
    ]
  }
}
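
Once the default has been written, the historical documents can be found by searching for that value. A minimal Python sketch of such a request body follows; the helper name default_value_query is hypothetical. Note the assumption behind this approach: 0 must never be a legitimate height, otherwise backfilled documents and real ones cannot be told apart.

```python
import json


def default_value_query(field, default):
    """Build a search body matching documents whose `field` equals `default`."""
    return {"query": {"term": {field: default}}}


search_body = default_value_query("height", 0)
print(json.dumps(search_body))
```

With the three sample documents above, this body sent to POST /user/_search would be expected to select documents 1 and 2, since only they carry height 0.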