ES Chinese prefix phrase matching (search autocomplete): prefix query vs. completion suggester


Requirement: use Elasticsearch (ES) prefix matching to implement search autocomplete.
Approach 1: a regular index with a field of type text
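Process: a normal prefix query matches individual terms. Chinese text is mostly tokenized character by character rather than into meaningful words (the standard analyzer, for example, emits one term per Chinese character), so a prefix query on the analyzed text field cannot return correct prefix matches. match_phrase_prefix, which does handle phrases, matches all but the last term as a phrase and treats only the last term as a prefix; it also does not anchor the phrase to the start of the field, so it does not meet the requirement either. After going through a lot of material, I found the answer was just one "keyword" away: run the prefix query against the text.keyword sub-field, which indexes the whole string as a single, un-analyzed term.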
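To make the difference concrete, here is a toy model in plain Python (not Elasticsearch itself) of how a prefix query behaves against per-character tokens versus a single keyword term. The character-level tokenizer is an assumed stand-in for how the standard analyzer splits Chinese text, and the sample strings are made up.

```python
def char_tokens(text):
    # Toy stand-in for an analyzer that splits Chinese text into single characters
    return [ch for ch in text if not ch.isspace()]


def prefix_on_tokens(prefix, text):
    # Prefix query on an analyzed field: matches if ANY single term starts with the prefix
    return any(tok.startswith(prefix) for tok in char_tokens(text))


def prefix_on_keyword(prefix, text):
    # Prefix query on a keyword field: the whole string is one term
    return text.startswith(prefix)


docs = ["中云街道办事处", "北京中云街", "中国银行"]
for doc in docs:
    print(doc, prefix_on_tokens("中云街", doc), prefix_on_keyword("中云街", doc))
```

In this model a multi-character prefix never matches single-character tokens, while a one-character prefix such as "中" matches a token anywhere in the text; only the keyword version anchors the match at the start of the string.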
Code:

curl -X POST "localhost:9200/information_completion/_search?pretty" -H 'Content-Type: application/json' -d '{
  "_source": ["text"],
  "query": {
    "prefix": {
      "text.keyword": "中云街"
    }
  }
}
'
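The same request body can be built from Python. A minimal sketch, assuming the official elasticsearch-py client for the actual call; the helper name `build_keyword_prefix_query` is hypothetical. One caveat: with Elasticsearch's default dynamic mapping for strings, the `keyword` sub-field has `ignore_above: 256`, so values longer than 256 characters are not indexed there and cannot be found by this query.

```python
import json


def build_keyword_prefix_query(prefix, field="text.keyword", size=10):
    """Build a prefix query against an un-analyzed keyword (sub-)field."""
    return {
        "_source": ["text"],
        "size": size,
        "query": {
            "prefix": {
                field: prefix  # whole-string prefix, anchored at the start
            }
        },
    }


body = build_keyword_prefix_query("中云街")
print(json.dumps(body, ensure_ascii=False))
# With a live client (not run here):
# es.search(index="information_completion", body=body)
```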


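The problem with this approach is that it cannot rank results: a prefix query is constant-score, so every record it matches gets the same score (_score). That led to approach 2, which rebuilds the index.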

Approach 2: build the index for the completion suggester
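Process: create the index with the code below so that the field has type completion, then do prefix matching through the suggester. By assigning custom weights when building the index, you can give results different scores and rank the hot terms you care about higher.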
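Weights are set per document at index time by giving the completion field an object with `input` and `weight` instead of a plain string. A minimal sketch; the `hot_counts` dictionary (for example, how often each phrase was searched) and the helper name are hypothetical, and unknown phrases fall back to a weight of 1.

```python
def make_weighted_doc(doc_id, phrase, hot_counts):
    """Build a document whose completion input carries a custom weight."""
    # Hypothetical: hot_counts maps a phrase to a popularity count
    weight = int(hot_counts.get(phrase, 1))
    return {
        "id": doc_id,
        "query": {
            "input": [phrase],  # strings the suggester prefix-matches
            "weight": weight,   # higher weight ranks higher among suggestions
        },
    }


hot_counts = {"农业银行": 500}
print(make_weighted_doc("1", "农业银行", hot_counts))
print(make_weighted_doc("2", "农业银行股份有限公司", hot_counts))
```

With documents in this shape, `_source.query` is an object rather than a string, so code that reads suggestions should take the matched string from each option's `text` field.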

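curl command to create the index (the ik_max_word analyzer comes from the IK Analysis plugin, which must be installed on the cluster):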

curl -X PUT "localhost:9200/info_completion" -H 'Content-Type: application/json' -d '
{
    "mappings" : {
      "properties" : {
        "id" : {
          "type" : "text"
        },
        "query" : {
          "type" : "completion",
          "analyzer" : "ik_max_word"
        },
        "text" : {
          "type" : "text",
          "analyzer" : "ik_max_word",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
}
'

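(Note: once a field has type completion, it can no longer be queried normally with term, match, and similar queries. If you need both, add a separate text field that stores the same content.)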
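Following that advice, each document can carry the same string in both fields: `query` for the suggester and `text` for regular queries. A minimal sketch; the helper names are hypothetical and the field names follow the `info_completion` mapping above.

```python
def make_dual_doc(doc_id, phrase):
    """Store one phrase in both the completion field and the searchable text field."""
    return {
        "id": doc_id,
        "query": phrase,  # completion field: served by the suggester
        "text": phrase,   # text field (+ keyword sub-field): match / term / prefix queries
    }


def build_match_query(text):
    # Regular full-text query on the text field, which a completion field cannot serve
    return {"query": {"match": {"text": text}}}


doc = make_dual_doc("1", "农业银行")
print(doc)
print(build_match_query("农业"))
```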

Indexing script (Python, elasticsearch-py client):

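from elasticsearch import Elasticsearch

# Assumes an unsecured local node; adjust the URL and credentials for your cluster
es = Elasticsearch("http://localhost:9200")
# The index created with the completion mapping above
index_name = "info_completion"

file_path = "/home/xxx/xxx.txt"  # one record per line: "<id> <phrase>"
with open(file_path, 'r', encoding="utf-8") as f:
    lines = f.readlines()

for i, line in enumerate(lines):
    data = line.split()
    if len(data) < 2:
        continue  # skip blank or malformed lines
    d = {"id": data[0], "query": data[1]}
    es.index(index=index_name, id=i, body=d)

# Refresh so the new documents are visible to search and suggest requests
es.indices.refresh(index=index_name)

print("finish")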
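For larger files, sending one request per document is slow; the client's `elasticsearch.helpers.bulk` helper can send them in batches. A minimal sketch of the action generator it consumes, assuming the same `<id> <phrase>` line format as the script above; `iter_actions` is a hypothetical name.

```python
def iter_actions(lines, index="info_completion"):
    """Yield one bulk 'index' action per valid input line."""
    for i, line in enumerate(lines):
        data = line.split()
        if len(data) < 2:
            continue  # skip blank or malformed lines
        yield {
            "_index": index,
            "_id": i,  # same id scheme as the one-by-one script
            "_source": {"id": data[0], "query": data[1]},
        }


# Usage with a live cluster (not run here):
# from elasticsearch import Elasticsearch, helpers
# es = Elasticsearch("http://localhost:9200")
# with open(file_path, encoding="utf-8") as f:
#     helpers.bulk(es, iter_actions(f))
# es.indices.refresh(index="info_completion")
```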

curl command for the query:

curl -X POST "localhost:9200/info_completion/_search?pretty" -H 'Content-Type: application/json' -d '
{
  "suggest": {
    "info_suggest": {
      "prefix": "农业银行",
      "completion": {
        "field": "query",
        "size": 10
      }
    }
  }
}
'
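The suggestions come back under `suggest.info_suggest[0].options`, sorted by `_score` in descending order. A small sketch of pulling them out of the response; the helper name is hypothetical and the sample response is hand-written for illustration, not real output.

```python
def extract_suggestions(response, name="info_suggest"):
    """Return (text, score) pairs from a completion-suggester response."""
    entries = response.get("suggest", {}).get(name, [])
    if not entries:
        return []
    # One entry per suggested input; each option has the matched text and its score
    return [(opt["text"], opt["_score"]) for opt in entries[0]["options"]]


sample = {
    "suggest": {
        "info_suggest": [
            {
                "text": "农业银行",
                "offset": 0,
                "length": 4,
                "options": [
                    {"text": "农业银行", "_score": 500.0,
                     "_source": {"query": "农业银行"}},
                    {"text": "农业银行股份有限公司", "_score": 1.0,
                     "_source": {"query": "农业银行股份有限公司"}},
                ],
            }
        ]
    }
}
print(extract_suggestions(sample))
```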

Query script snippet:

def get_prefix_res(query, size):
    """Return up to `size` completion suggestions for `query`, with rescaled scores."""
    res = []
    query_body = {
        "suggest": {
            "info_suggest": {
                "prefix": query,
                "completion": {
                    "field": "query",
                    "size": size
                }
            }
        }
    }
    # `es` is the Elasticsearch client created as in the indexing script
    data = es.search(index="info_completion", body=query_body)
    print("get_prefix: ")
    # Options are already sorted by _score (descending) and capped at `size`
    suggest = data["suggest"]["info_suggest"][0]["options"]
    if not suggest:
        return res
    for option in suggest:
        d = {"score": option["_score"], "content": option["_source"]["query"]}
        print(d)
        res.append(d)
    max_val = suggest[0]["_score"]
    min_val = suggest[-1]["_score"]
    # cal_val is not shown in the original; it rescales each score using max/min
    for item in res:
        item["score"] = cal_val(item["score"], max_val, min_val)
    print(res)
    print(max_val, ' ', min_val, ' ', len(suggest))
    return res
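`cal_val` is not defined in the snippet. Judging from its arguments, it rescales a score using the highest and lowest scores; a plausible min-max normalization is sketched below. This is an assumption about its intent, not the original implementation.

```python
def cal_val(score, max_val, min_val):
    """Hypothetical min-max normalization of a suggestion score into [0, 1]."""
    if max_val == min_val:
        # All suggestions scored the same (e.g. equal weights); avoid dividing by zero
        return 1.0
    return (score - min_val) / (max_val - min_val)


print(cal_val(500.0, 500.0, 1.0))  # 1.0
print(cal_val(1.0, 500.0, 1.0))    # 0.0
```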