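This article is based on: https://blog.csdn.net/Q54665642ljf/article/details/127701719
This article applies to elasticsearch.
I'm a beginner, so please point out anything I got wrong (I only started learning elasticsearch recently).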
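1. Preparation
1.1 Install the ES text-extraction plugin
(1) Why is a text-extraction plugin needed?
Files such as Word and PDF documents contain a lot of data besides plain text. A Word file, for example, also stores page setup, font sizes, colors, and other information unrelated to the text itself.
A text-extraction plugin strips out everything that is not text, so that only the document's text content gets indexed.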
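To get a feel for what "extraction" means, here is a toy sketch in Java. It pulls the visible text out of a hand-written fragment of Word's XML and ignores the formatting tags around it. The class name ToyTextExtractor and the sample XML are my own illustration; the real ingest-attachment plugin relies on Apache Tika and handles far more than this regular expression does.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy illustration only: real extraction is done by the ingest-attachment plugin.
final class ToyTextExtractor {
    // In WordprocessingML the visible text lives inside <w:t>...</w:t> elements.
    private static final Pattern TEXT_RUN =
            Pattern.compile("<w:t(?:\\s[^>]*)?>([^<]*)</w:t>");

    // Concatenate the content of every text run, dropping all formatting markup.
    static String extractText(String xml) {
        StringBuilder sb = new StringBuilder();
        Matcher m = TEXT_RUN.matcher(xml);
        while (m.find()) {
            sb.append(m.group(1));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A bold, red, 14pt run followed by a plain run (hand-written sample).
        String xml = "<w:p><w:r><w:rPr><w:b/><w:sz w:val=\"28\"/>"
                + "<w:color w:val=\"FF0000\"/></w:rPr><w:t>Hello, </w:t></w:r>"
                + "<w:r><w:t>Elasticsearch</w:t></w:r></w:p>";
        System.out.println(extractText(xml)); // prints "Hello, Elasticsearch"
    }
}
```

The font size, color, and bold flags are exactly the kind of "unrelated information" that we do not want in the search index.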
(2) How to install the text-extraction plugin
From the bin directory of your elasticsearch installation, use elasticsearch-plugin to install the text-extraction plugin ingest-attachment.
# Windows (run from the bin directory):
elasticsearch-plugin install ingest-attachment
# Linux (run from the bin directory):
./elasticsearch-plugin install ingest-attachment
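To make the extracted text easier to search later, also install the IK analyzer plugin (official repository: https://github.com/medcl/elasticsearch-analysis-ik). The repository explains how to download it. Pick the release whose version matches your elasticsearch version, for example: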
# Windows (run from the bin directory):
elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.2/elasticsearch-analysis-ik-7.6.2.zip
# Linux (run from the bin directory):
./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.2/elasticsearch-analysis-ik-7.6.2.zip
Once the commands finish, the installed plugins appear in the plugins directory.
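1.2 Define the text-extraction pipeline
(1) What is a pipeline?
A pipeline, also called a "preprocessing pipeline", processes fields at the moment the content is stored.
Take an odd string such as &*he@¥#ll%&o……¥%. If I hand it to users unprocessed, they see &*he@¥#ll%&o……¥% and cannot pick out the key information.
Suppose, however, that I have a string-processing machine: I feed in &*he@¥#ll%&o……¥% and out comes hello, which is the data users actually want.
The pipeline is that "processing machine": its job is to process data.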
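The "processing machine" idea can be sketched in a few lines of Java: a pipeline is just a list of processors applied one after another. Everything here (the class ToyPipeline, the two processors) is made up for illustration and has nothing to do with the real Elasticsearch implementation.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Toy model of a pipeline: each processor takes a value and returns a cleaned-up value.
final class ToyPipeline {
    // Apply every processor in order, feeding each one the previous result.
    static String run(List<UnaryOperator<String>> processors, String input) {
        String value = input;
        for (UnaryOperator<String> processor : processors) {
            value = processor.apply(value);
        }
        return value;
    }

    // Two hypothetical processors: drop everything that is not an ASCII letter, then lowercase.
    static List<UnaryOperator<String>> cleaningProcessors() {
        return List.<UnaryOperator<String>>of(
                s -> s.replaceAll("[^A-Za-z]", ""),
                s -> s.toLowerCase());
    }

    public static void main(String[] args) {
        System.out.println(run(cleaningProcessors(), "&*he@¥#ll%&o……¥%")); // prints "hello"
    }
}
```

An ingest pipeline works on whole documents rather than single strings, but the shape is the same: an ordered list of processors, each transforming what the previous one produced.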
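(2) Define the text-extraction pipeline
In the kibana console, create a pipeline named "attachment".
Kibana is a free and open front-end application that provides search and data visualization for data indexed in Elasticsearch.
(Kibana installation is not covered here.)
The "attachment" pipeline names content as the field to process, so the document content must be placed in the content field when writing to elasticsearch.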
PUT /_ingest/pipeline/attachment
{
  "description": "Extract attachment information",
  "processors": [
    {
      "attachment": {
        "field": "content",
        "ignore_missing": true
      }
    },
    {
      "remove": {
        "field": "content"
      }
    }
  ]
}
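Note!!!
Once the pipeline is defined, we only need to convert the document file to Base64 and put it in the content field. The pipeline extracts the plain text automatically and stores it in the attachment.content field, and the remove processor then deletes the original Base64 content field. The IK analyzer is not part of the pipeline: it tokenizes attachment.content at index time, according to the analyzer configured in the index mapping.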
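The data flow through the two processors can be modelled with a plain Map. This is only a sketch under a big assumption: the extractText method is a stand-in that treats the Base64 payload as UTF-8 text, whereas the real attachment processor parses actual file formats such as docx and pdf. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

// Toy model of the "attachment" pipeline defined above; not the real implementation.
final class ToyAttachmentPipeline {
    // Stand-in for real extraction: assumes the Base64 payload is plain UTF-8 text.
    static String extractText(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    // Returns a processed copy of the document; the input map is left untouched.
    static Map<String, Object> process(Map<String, Object> doc) {
        Map<String, Object> out = new HashMap<>(doc);
        Object raw = out.get("content");
        if (raw instanceof String) {
            // "attachment" processor: write the extracted text under attachment.content.
            Map<String, Object> attachment = new HashMap<>();
            attachment.put("content", extractText((String) raw));
            out.put("attachment", attachment);
        }
        // "remove" processor: drop the bulky Base64 source field.
        out.remove("content");
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("docName", "es.txt");
        doc.put("content", Base64.getEncoder()
                .encodeToString("You know, for search".getBytes(StandardCharsets.UTF_8)));
        System.out.println(process(doc));
    }
}
```

The point to take away is where the text ends up: the searchable text is under attachment.content, and the top-level content field no longer exists in the stored document.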
Next, we create the index and define the attachment.content field in it.
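1.3 Create the index
(1) Index structure
- id: uniquely identifies a record.
- userId: id of the user who owns the file; add as needed.
- docId: file id; add as needed.
- docName: file name, analyzed with the ik_max_word Chinese analyzer (splits Chinese text as finely as possible).
- docType: file type; add as needed.
- attachment.content: key!! Holds the text extracted from the file's Base64 content, analyzed with the ik_smart Chinese analyzer (splits text according to common usage).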
PUT /docwrite
{
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "userId": {
        "type": "keyword"
      },
      "docId": {
        "type": "keyword"
      },
      "docName": {
        "type": "text",
        "analyzer": "ik_max_word"
      },
      "docType": {
        "type": "keyword"
      },
      "attachment": {
        "properties": {
          "content": {
            "type": "text",
            "analyzer": "ik_smart"
          }
        }
      }
    }
  }
}
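2. Testing document indexing in Kibana
2.1 Convert the file to Base64
Use an online Base64 conversion site to convert a document file into a Base64 string.
Alternatively, use the Base64 content I give below directly.
It comes from a Word document I created myself, whose content is copied from the elasticsearch website. Converted to Base64, the document is: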
UEsDBBQABgAIAAAAIQDfpNJsWgEAACAFAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC0lMtuwjAQRfeV+g+Rt1Vi6KKqKgKLPpYtUukHGHsCVv2Sx7z+vhMCUVUBkQpsIiUz994zVsaD0dqabAkRtXcl6xc9loGTXmk3K9nX5C1/ZBkm4ZQw3kHJNoBsNLy9GUw2ATAjtcOSzVMKT5yjnIMVWPgAjiqVj1Ykeo0zHoT8FjPg973eA5feJXApT7UHGw5eoBILk7LXNX1uSCIYZNlz01hnlUyEYLQUiep86dSflHyXUJBy24NzHfCOGhg/mFBXjgfsdB90NFEryMYipndhqYuvfFRcebmwpCxO2xzg9FWlJbT62i1ELwGRztyaoq1Yod2e/ygHpo0BvDxF49sdDymR4BoAO+dOhBVMP69G8cu8E6Si3ImYGrg8RmvdCZFoA6F59s/m2NqciqTOcfQBaaPjP8ber2ytzmngADHp039dm0jWZ88H9W2gQB3I5tv7bfgDAAD//wMAUEsDBBQABgAIAAAAIQAekRq37wAAAE4CAAALAAgCX3JlbHMvLnJlbHMgogQCKKAAAgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAArJLBasMwDEDvg/2D0b1R2sEYo04vY9Db
GNkHCFtJTBPb2GrX/v082NgCXelhR8vS05PQenOcRnXglF3wGpZVDYq9Cdb5XsNb+7x4AJWFvKUxeNZw4gyb5vZm/cojSSnKg4tZFYrPGgaR+IiYzcAT5SpE9uWnC2kiKc/UYySzo55xVdf3mH4zoJkx1dZqSFt7B6o9Rb6GHbrOGX4KZj+xlzMtkI/C3rJdxFTqk7gyjWop9SwabDAvJZyRYqwKGvC80ep6o7+nxYmFLAmhCYkv+3xmXBJa/ueK5hk/Nu8hWbRf4W8bnF1B8wEAAP//AwBQSwMEFAAGAAgAAAAhAPd6kjMuCQAAGyQAABEAAAB3b3JkL2RvY3VtZW50LnhtbORaWVPbWBZ+n6r5Dy6/J7Zsea2GLvDSlanuHnroeaaELGxN25JLEhDmyQnYYQmYniYJDU6gISzTHZZMFoyX8F/SupJ5yl+Yc3XlJTakZafJMNUpyouk891zz/nOd8+9zmef304lbROcJPOi0GenbjrtNk5gxRgvxPvsf/82esNvt8kKI8SYpChwffYpTrZ/3v/nP302GYyJ7HiKExQbQAhycDLN9tkTipIOOhwym+BSjHwzxbOSKItjyk1WTDnEsTGe5RyTohRzuJyU0/iUlkSWk2UYL8QIE4xsN+HY29bQYhIzCcYYkHawCUZSuNtNDKprEI8j4PB3Arl6AIIZuqhOKHfXUF4H9qoDiO4JCLzqQPL0hnTB5Ly9Ibk6kXy9Ibk7kfy9IXXQKdVJcDHNCXBzTJRSjAJfpbgjxUjfjadvAHCaUfhRPskrU4Dp9NZhGF74rgePwKqBkHLHukbwOVJijEu6Y3UUsc8+LglB0/5Gwx67HiT25lvdQrIyf2ISNsXBmLlD4pIQC1GQE3y6UeGpXtHgZqIOMvGhSUykkvXnJtOUxXK5TJ7CJJRNQCvum/FPJYnnH0aknBYygiEaFlZceH/MuicpYGFz4J5C0xJcyqKA1AFcHQBelrMo+HUMv4nhYJsVinF4i6VRxyFZwTh8M7CURR1rd6YFQI4psURXKK56XB3YllGYBCM3iI4Rue6c8jTgplItMUrHP64QvpDE8XQTjf84tFtNWZvEDUYXWGZBtRa5/HHODCeYNKhdig3eiguixIwmwSMoDxsw3GZkAL8CUfCb8ZG7bVzHubZhjbH3Q2c0Ksam8Hsa7tHBNCMxt4CUbift9oZ90FDhq7CuKPiqz/wHV4PQhcX+1md3OiM+bzg02Lg0JF1wMcyNMeNJpfPOUMslw4shCb9J5G3UYbyGZHh3mFcdzYcuHLLFXIqKgiLDUwlegLE5RlYGZJ6xE9QLsCeDSr+2XNBfbr3NrMCfVsigjZJWeKavzaCTHZQ9wY8qxIC40hk6lyc06PEN9BA6y1FqjdEHJtk6LX3pGBX231Xua7Pfq+Vn6OQ/ejmPDnbVyppayqHdu2r1gVo901f235ui+WL0zUE5zbDAsbTEyZw0wdn7bZEkDMezMsdIbMKmVgtqsaQWM2rxZ/0wgxYfQNxsX4vD33xpO99e0jYrOI6zOVScRpW89vCe9tM2OlhFd/f1F2W9vPFr5q7pXf5ILe+g4xzBsv1l+K9f28jz2twKmj1uG/j83n2YmT7zGi3DKKfa4qHtVtimbWcgkwCqVnNtBnphQzvYRpUSerqnni6gpzMQHLW0RAZWy0vao83a3gx6swAeaw+OAVEtHmhzb9A9oMOcNpNXT+fglnr2k3bniJAFrN5mCgBInn+beQyOkiv1xx6DMxYY5KQjkQDloa6IQQbx3x+RjtIBn5emr5qzSn8bY4qLKP8zcHCE8OBdZVZWRODXu8rciJlxyCxmVSlXm67Wtn4h9QmhVU83MHmLJZMYJ2U0vzmC71UeAA4vxLjbJhBA1M5+BANt9QhjlTPq6SyhGuG8Xv5BezKD7j/EhG1hGzbMwPCLYAiUII/hvJcAcB3Nvia5tpZWj5MeCEdczk+XVq/b7fc6XVee1u6kiIQW6gnX1lEeChwdPNIO
X51PV6HkIcUkh+j0tVpcIRWJ876xUzva6kaccOUXF9VyFtRNO3gKuYQiJ7pEwLRHr+Hi+4w0FcBghU2by0C1A5mYZBKIZMMkKq2b7hk+Y3KWd7TpebVUgumcZzZQabd2toLWnxhkKYATYIV5V93Ujou1swL2qbJWy0zXfvweBlErQKsNwiNU+gEtldGdNUwxkL7Sulpa0KazVmUjFKZdbu+n45eH8lFhl+uqhKpXfhkKAVm7kCz9JNE4neYyc2cN1ORdZc1CiClvwOeiI1dVUBeE2B31RqLhwWsWYlxIz8tqpQJrZjclGedAlSWbNvuwlsnamDiHS0rbPMF9gVFPanG+TZFJqUKdtY0DjbA4FpEk8FWZSsMwcppLJocVRlJIHE1X/yHywghsUDhL9hEh1mp9yTQuc1pb+hcq5bHQHD9H03noe+CZtplAXYOttYp2OZ0+j98zcM2Sn93HEm50V7ihJF2RIZXo/ul5dhFlX5yvHEJssMCTLu/0tXZ/Ts/uXVySI/p6EYy0zB5UrcQluQlGYDnSAxghhb6uTVStrLd02ENTvuj1ih5eXYAv0EBksihbgfYC4gY9qLU5UbQ74vdEXdeMEW9m0dM1oruwisNqhyoZ/fA5WnyBS3ltptHDoWwWNgAaFMPSajfC0da+A6+qBTS/jnGrm+eFTG33jr6cq+1tG+FdUItz0FZou3fR2Stryu4OD3i8LiOEn0jZnaHBKO2P9kDP//st9uomOp65YIH+rdEuA/yKSad5IQ6CYUVUKWiJA55rtj/X8vna2bHZBhb2W/tilMMdLOzPyR7JwhSdFBUJhag/ILWwuDxZ/h2pNSAwySmZx/tIK/Ls8Qe8/tBVbfF65Ja5YIMqExo9nQGthI4Fn3w0tleGeIOwWpilx+2nPWH8I9EfjV7nWzm0saHPnaLDNdLO1Y4OavuZ35Fw34xz0pQtPPylRcb5olRg8H9xcnNAjs/0O6+0l2W8IrfEw4oMD4QjtM8ZuF6lQrozLb8MW8LG8YR+95R0NeQ4EiZdO3p8vrXaPE09zoFmtzPAQLykn9FOnqPsDik6iGVG/2WzdrSrl3Pnq8uwDqDDaYDF56GFZ/jWfP1g9Dinlhf08iI+iSjOgWONkwJyUE1anvqBZBYffhQP0fy/oUvCbt97ic/R9rbJwE1FONlRqwWAsrgtCXl80UEqfEWpu6BX8gz4Q75wpG0XTHvpAB11Yu63jOj0RLy4P/4Ea3YjJ2T3V8g0Tr1RdQtV8tAFo70FkrFu2GHW2MDQrcsSInOsMtTQmJZ5G7GLD/8Tbk2COlABQ6aDCXx24Xf7zeDGv2KwsSKm8ZaCxutVUOLjCZgs5XcaX0dFRRFTzdtJbqzlboJjYCPfZ/cbp9TBMVFU8NdAwIh8fFwxvjrJcKyYxME050m7PORyTGS/kPDPU+YGPMkLnGwMBR+GeIUFp92Uy4w6mbHxkfxk5Wj+r57+/wIAAP//AwBQSwMEFAAGAAgAAAAhANZks1H0AAAAMQMAABwACAF3b3JkL19yZWxzL2RvY3VtZW50LnhtbC5yZWxzIKIEASigAAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAArJLLasMwEEX3hf6DmH0tO31QQuRsSiHb1v0ARR4/qCwJzfThv69ISevQYLrwcq6Yc8+ANtvPwYp3jNR7p6DIchDo
jK971yp4qR6v7kEQa1dr6x0qGJFgW15ebJ7Qak5L1PWBRKI4UtAxh7WUZDocNGU+oEsvjY+D5jTGVgZtXnWLcpXndzJOGVCeMMWuVhB39TWIagz4H7Zvmt7ggzdvAzo+UyE/cP+MzOk4SlgdW2QFkzBLRJDnRVZLitAfi2Myp1AsqsCjxanAYZ6rv12yntMu/rYfxu+wmHO4WdKh8Y4rvbcTj5/oKCFPPnr5BQAA//8DAFBLAwQUAAYACAAAACEAOgXMGeEGAADOIAAAFQAAAHdvcmQvdGhlbWUvdGhlbWUxLnhtbOxZW2sbRxR+L/Q/LPuu6Lari4kcpJUUN7ETEyspeRyvRrtjze6ImZEdEQIheSqFQiEteWig9KUPpTTQQEP70P9Sl4Q0/RGdmZW0O9IsTmIbQrFtrLl858w355w5c7R7+cq9CFuHkDJE4pZdvlSyLRj7ZIjioGXfHvQLDdtiHMRDgEkMW/YMMvvK5qefXAYbPIQRtIR8zDZAyw45n2wUi8wXw4BdIhMYi7kRoRHgokuD4pCCI6E3wsVKqVQrRgDFthWDSKi9ORohH1p/v/zjzQ9P/3r4pfizNxdr9LD4F3MmB3xM9+QKUBNU2OG4LD/YjHmYWocAt2yx3JAcDeA9blsYMC4mWnZJ/djFzcvFpRDmObIZub76mcvNBYbjipKjwf5S0HFcp9Ze6lcAzNdxvXqv1qst9SkA8H2x04SLrrNe8Zw5NgNKmgbd3Xq3WtbwGf3VNXzblb8aXoGSprOG7/e91IYZUNJ01/Bup9np6voVKGnW1vD1Urvr1DW8AoUYxeM1dMmtVb3FbpeQEcFbRnjTdfr1yhyeooqZ6ErkY54XaxE4ILQvAMq5gKPY4rMJHAFf4F7//MXr3/+0tlEQiribgJgwMVqqlPqlqvgvfx3VUg4FGxBkhJMhn60NSToW8yma8JZ9TWi1M5BXL18eP3px/Oi348ePjx/9Ml97XW4LxEFW7u2PX//77KH1z6/fv33yjRnPsnhta0Y412h9+/z1i+evnn715qcnBnibgv0sfIAiyKwb8Mi6RSKxQcMCcJ++n8QgBCgr0Y4DBmIgZQzoHg819I0ZwMCA60DdjneoyBYm4NXpgUZ4L6RTjgzA62GkAXcIwR1CjXu6LtfKWmEaB+bF6TSLuwXAoWltb8XLvelEhD0yqfRCqNHcxcLlIIAx5JacI2MIDWJ3EdLsuoN8ShgZcesusjoAGU0yQPtaNKVCWygSfpmZCAp/a7bZuWN1CDap78JDHSnOBsAmlRBrZrwKphxERsYgwlnkNuChieTejPqawRkXng4gJlZvCBkzydykM43udSDSltHtO3gW6UjK0diE3AaEZJFdMvZCEE2MnFEcZrGfsbEIUWDtEm4kQfQTIvvCDyDOdfcdBDV3n3y2b4s0ZA4QOTOlpiMBiX4eZ3gEoEl5m0Zaim1TZIyOzjTQQnsbQgyOwBBC6/ZnJjyZaDZPSV8LRVbZgibbXAN6rMp+DBm0VG1jcCxiWsjuwYDk8NmZrSSeGYgjQPM03xjrIdPbp+IwmuIV+2MtlSIqD62ZxE0WafvL1bobAi2sZJ+Z43VGNf+9yxkTMgcfIAPfW0Yk9ne2zQBgbYE0YAYAWdumdCtENPenIvI4KbGpUW6kH9rUDcWVmidC8UkF0Erp455f6SMKjFffPTNgz6bcMQNPU+jk5ZLV8iYPt1rUeIQO0cdf03TBNN6F4hoxQC9KmouS5n9f0uSd54tC5qKQuShkzCLnUMiktYt6ALR4zKO0RLnPfEYI4z0+w3CbqaqHibM/7ItB1VFCy0dMk1A058tpuIAC1bYo4Z8jHu6FYCKWKasVAjZXHTBrQpgonNSwUbecwNNohwyT0XJ58VRTCACejovCazEuqjSejNbq6eO7pXrVC9Rj1gUBKfs+JDKL6SSqBhL1xeAJJNTOzoRF08CiIdXnslAfc6+Iy8kC8rm46ySMRLiJkB5KPyXyC++euafz
jKlvu2LYXlNyPRtPayQy4aaTyIRhKC6P1eEz9nUzdalGT5pinUa9cR6+lklkJTfgWO9ZR+LMVV2hxgeTlj0S35hEM5oIfUxmKoCDuGX7fG7oD8ksE8p4F7AwgampZP8R4pBaGEUi1rNuwHHKrVypyz1+pOSapY/Pcuoj62Q4GkGf54ykXTGXKDHOnhIsO2QqSO+FwyNrH0/pLSAM5dbL0oBDxPjSmkNEM8GdWnElXc2Pova2JT2iAE9CML9Rssk8gav2kk5mH4rp6q70/nwz+4F00qlv3ZOF5EQmaeZcIPLWNOeP87vkM6zSvK+xSlL3aq5rLnJd3i1x+gshQy1dTKMmGRuopaM6tTMsCDLLLUMz744469tgNWrlBbGoK1Vv7bU22T8Qkd8V1eoUc6aoim8tFHiLF5JJJlCji+xyj1tTilr2/ZLbdryK6xVKDbdXcKpOqdBw29VC23Wr5Z5bLnU7lQfCKDyMym6ydl982cez+ct7Nb72Aj9alNqXfBIViaqDi0pYvcAvV0wv8Ady3raQsMz9WqXfrDY7tUKz2u4XnG6nUWh6tU6hW/Pq3X7XcxvN/gPbOlRgp131nFqvUaiVPa/g1EqSfqNZqDuVStuptxs9p/1gbmux88XnwryK1+Z/AAAA//8DAFBLAwQUAAYACAAAACEAMW9THM4EAABVDQAAEQAAAHdvcmQvc2V0dGluZ3MueG1stFfbUttIEH3fqv0Hl57X+IJlwBUnhW0cSEGSQrB5Hklta5a5qGZGNmZr/317bpYTNinIVl5g1Kf7dKunL/Kbd4+cdTagNJVimgyO+kkHRCFLKtbT5P5u2T1NOtoQURImBUyTHejk3dvff3uznWgwBtV0BymEnvBimlTG1JNeTxcVcKKPZA0CwZVUnBh8VOseJ+qhqbuF5DUxNKeMml1v2O+Pk0Ajp0mjxCRQdDktlNRyZazJRK5WtIDwL1qol/j1JgtZNByEcR57ChjGIIWuaK0jG/9ZNgSrSLL50UtsOIt620H/Ba+7larcW7wkPGtQK1mA1nhBnMUAqWgdj54R7X0foe/wio4KzQd9dzqMPH0dwfAZwbiAx9dxnAaOHloe8tDydTzjPQ9tEzsY/1wwBwS6NGX1KpZhzGvP2hJDKqL3VWQZ4XVBpXu6HW9zpNlLqsZD1zRXRPmeDCXDi8nVWkhFcobhYOl08PY7Ljr7F5No/7kjPDq5zUPyFmfEk5S8s53UoApsFBww/X7Ss0COYeLUWciP0mSNUrIR5SUQlH0XXkppAozFLVeZIQYDmOgaGHPzqmBAMN7tZK0Ix0kTJc6mhBVpmLkjeWZkjUobgmkZDUNApSJbJHmvaPknKEMLwrKaFCiKqoN0HFSprhnZXUpFn6QwhC1a2wucpbtoEam9fqT9nvbQaxcVUaTANw3u5+hCSRa17ORU2NifG1GYxs2vYOdGqj1pNISlVPfXPpeEEVFAhlwMZjuDc6vJ/ekLLU3lY7SZvgaygRkpHjTDMjy3E9+BDbtThLp8eIHTvniscS9kFV2ZWzA4xRxEyr8aba6pgEug68pciTtbN55Hw/LimuxkYw5CzvwewRcUhIN/w/1uuJEl2BttFH15J1iDcGWHufnWkcTs4yWACzAzO4ZJEyajT3Auyg/4FhQZfYZ/PoIfBQDCev6ErXi3q2EJBLOI+/XXOHN3tmS0vqHYT+pKlNiRv8wZXa1AoQOKPXqDbUeV3Lo8+yb/VX6xwr6gMk7CYyzZ4mEmjZH8cldXmOv/d5OumXuHfYYfTaWOh1ucTnvV/jw9OT9Z+kgt+hLkop9ejMMQ+AY5GS/ms+A/eOUT+8HxWcWTLd0O9xZzwnNFSefGfpL0rEauHmZURDwHnPtwiGRNHsFu1wOaE8aWmMQIuARwN80WsHJndkPUuuUNGuo/pTh/P+y57DYA9R7neu3RrSK1L8moMhiNgiUVOE94lOsmz6KVwE11AOGS+LRRLk9terYTg1fsWvuatNMcRPc+
s5cLRJtzTck0eaq684+hupjKbGXADalrX2D5ejBNmB1pA2tm8KnEj1n3kK+HARs6bOgx90AK+7KoHQ6tbBhlB3rHUXbcykZRNmplaZSlrWwcZWMrq3CkKNyLD1jr8WjlK8mY3EJ52eLPRD4JuiI1LPzaxIqTXhD2qO5sJvCIKx1KavA3Qk1LTh7thh+6NRm0mZv2X+lazCrXXzPYr5/Q3b2vjF3VfxOLXecFxQrNdjxvl+ORD5xRjZOhxj1qpIrYHw4bpG7BGjcd8GJvYTUjGsqAlbK4sl8zqbf5ezRPF7P0vN9dLEdpd3RyPOyezk/H3f5penp2Njs7mR0v/wmNGX8Pvf0XAAD//wMAUEsDBBQABgAIAAAAIQDwgl4meQsAAARyAAAPAAAAd29yZC9zdHlsZXMueG1svJ1Nc9s4EobvW7X/gaXT7sGRv52kxplynGTt2jjjGTmbM0RCFsYgoQWp2J5fvwBISZCboNhgry+JRbEfgnjxNtAkJf3y61Muk59cl0IV56ODN/ujhBepykRxfz76fvdl7+0oKStWZEyqgp+Pnnk5+vXD3//2y+P7snqWvEwMoCjf5+n5aF5Vi/fjcZnOec7KN2rBC/PmTOmcVealvh/nTD8sF3upyhesElMhRfU8PtzfPx01GN2HomYzkfJPKl3mvKhc/FhzaYiqKOdiUa5oj31oj0pnC61SXpbmpHNZ83ImijXm4BiAcpFqVapZ9cacTNMihzLhB/vur1xuACc4wCEAnKb8Ccd42zDGJtLniAzHOV1zROZx4hrjAcqsyuYoyuGqX8c2llVszsq5T+S4Rp2scc+57aM8fX99XyjNptKQjOqJES5xYPuvOX/7n/uTP7nt9hRGH4wXMpV+4jO2lFVpX+pb3bxsXrn/vqiiKpPH96xMhbgzDTRHyYU54NVFUYqReYezsrooBfPf/Nxss+/P7Y6tkWlZeZs/ikyMxvagD1wX5u2fTJ6PDutN5V/rDQerLZe2XfW2Zi/JivvVNl7sfZ/47Tsf/TXfu/xmN03Noc5HTO9NLmzguDnd+n+vExbrV/VeL3rMuNd4eVKnFPMun31V6QPPJpV543y0bw9lNn6/vtVCaZM2zkfv3jUbJzwXVyLLeOHtWMxFxn/MefG95Nlm++9fnPWbDalaFubvo7NTp6Iss89PKV/YRGLeLZjt0G82QNq9l2JzcBf+3xWs6cfW+DlnNpsmBy8RrvkoxKGNKL2zbWcuX5y72wt1oKPXOtDxax3o5LUOdPpaBzp7rQO9fa0DOcz/80CiyEzidvvDwwDqLk7AjWhOwGxoTsBLaE7AKmhOwAloTmCgozmBcYzmBIYpglOpNDQKvcF+FBjt3dzdc0Qcd/eUEMfdPQPEcXcn/Dju7vwex92dzuO4u7N3HHd3ssZz66VWcm1sVlSDXTZTqipUxZOKPw2nscKwXIlJw7OTHtckJ0mAqTNbMxEPpqXMvd49QpxJ4+fzylZqiZolM3G/1Lwc3HBe/ORSLXjCsszwCIGaV0sd6JGYMa35jGtepJxyYNNBpSh4UizzKcHYXLB7MhYvMuLuWxFJksJ6QLNlNbcmEQSDOmepVsObphhZfvgqyuF9ZSHJx6WUnIj1jWaIOdbw2sBhhpcGDjO8MnCY4YWBpxlVFzU0op5qaEQd1tCI+q0en1T91tCI+q2hEfVbQxveb3eiki7F+6uOg/7X7i6lsjcFBrdjIu4LZhYAw6eb5pppcss0u9dsMU/sVeV2rH/O2ON8VNlzckcxp61JVOt6N0QuzVmLYjm8Q7doVOZa84jsteYRGWzNG26xG7NMtgu0K5p6ZrKcVq2mdaRepp0wuawXtMPdxqrhI2xjgC9Cl2Q2aMcSjOBvdjlr5aTIfJtWDm/YhjXcVi+zEmnzGiRBK6VKH2jS8NXzgmtTlj0MJn1RUqpHntERJ5VW9VjzLX/oJOll+c/5Ys5K4WqlLUT/qX71OEFywxaDT+hWMlHQ6PZ5L2dCJnQr
iKu7m6/JnVrYMtN2DA3wo6oqlZMxmyuB//jBp/+kaeCFKYKLZ6KzvSC6PORgl4JgkqlJKiMimWWmKATJHOp4/+bPU8V0RkO71bx+gqfiRMQJyxf1ooPAWyYvPpr8Q7Aacrz/MC3sdaHBNO9KX7mc/snT4dnpm0pILub8tqzcJUO3OnXRdLjhM/sWbvisfueu8k2EHXIEJ7uFG36yWziqk72UrCxF8K5nNI/qdFc86vMdXq81PCWVni0lXQeugGQ9uAKSdaGSy7woKc/Y8QhP2PGoz5dwyDgewVU0x/uXFhmZGA5GpYSDUcngYFQaOBipAMMfqvFgw5+s8WDDH6+pYURLAA9GNc5Ip3+iGzMejGqcORjVOHMwqnHmYFTj7OhTwmczswimm2I8JNWY85B0E01R8XyhNNPPRMjPkt8zgmuaNe1Wq5n9NIYq6ueuCZD2srIkXGzXOCqRf/ApWdMsi+BaJpNSKaJLWJtJwkVuPyIWDruVLOVzJTOuA+0Ix5q6dLJgaXMFG9wJ63VF8Ku4n1fJZL6+EO5jTvd3Rq4K462w3Qds66fT1Ycy2sJueCaW+aqh8HMGp0f9g93I2Qo+3h28mbG3Ik96RsJjnu6O3KxGtyLPekbCY77tGemy8FZk1xj+xPRD60A46xo/61oqMPjOukbROrj1sF0DaR3ZNgTPukbRllWSizS1F9KhOv08E47vZ55wPMZFYQrGTmFKb1+FEV0G+4P/FHYGxSRNd7z1gwUgV7vFaq/M+ftS1Ze0t+7F9P+807VZoBQlT1o5R/3v6WxlmXA/9k43YUTvvBNG9E5AYUSvTBQMR6WkMKV3bgojeiepMAKdreCMgMtWMB6XrWB8TLaClJhsNWAVEEb0Xg6EEWijQgTaqANWCmEEyqggPMqokII2KkSgjQoRaKPCBRjOqDAeZ1QYH2NUSIkxKqSgjQoRaKNCBNqoEIE2KkSgjRq5tg+GRxkVUtBGhQi0USECbVS3XhxgVBiPMyqMjzEqpMQYFVLQRoUItFEhAm1UiEAbFSLQRoUIlFFBeJRRIQVtVIhAGxUi0EatP4UXb1QYjzMqjI8xKqTEGBVS0EaFCLRRIQJtVIhAGxUi0EaFCJRRQXiUUSEFbVSIQBsVItBGdTflBhgVxuOMCuNjjAopMUaFFLRRIQJtVIhAGxUi0EaFCLRRIQJlVBAeZVRIQRsVItBGhYiu8dncCgw9gX6Av+oZfJi9/62rplF/+J9y9lFH/VGrVoVZ/R/T/6jUQ9L6mbwjV2/0g4ipFMpdog7cvva57tED1M3K3y67P/zi0wd+H1HzMQF3exTAj/tGgmsqx11D3o8ERd5x10j3I8Gq87gr+/qRYBo87kq6zperhz/MdASCu9KMF3wQCO/K1l447OKuHO0Fwh7uysxeIOzgrnzsBZ4kNjm/jD7p2U+n6+c4AaFrOHqEszCha1hCrVbpGBqjr2hhQl/1woS+MoYJKD2DGLywYRRa4TAqTmpoM6zU8UYNE7BSQ0KU1AATLzVERUsNUXFSw8SIlRoSsFLHJ+cwIUpqgImXGqKipYaoOKnhVIaVGhKwUkMCVuqBE3IQEy81REVLDVFxUsPFHVZqSMBKDQlYqSEhSmqAiZcaoqKlhqg4qUGVjJYaErBSQwJWakiIkhpg4qWGqGipIapLancVZUtqlMJeOG4R5gXiJmQvEJecvcCIasmLjqyWPEJktQS1WmmOq5Z80cKEvuqFCX1lDBNQegYxeGHDKLTCYVSc1LhqqU3qeKOGCVipcdVSUGpctdQpNa5a6pQaVy2FpcZVS21S46qlNqnjk3OYECU1rlrqlBpXLXVKjauWwlLjqqU2qXHVUpvUuGqpTeqBE3IQEy81rlrqlBpXLYWlxlVLbVLjqqU2qXHVUpvUuGopKDWuWuqUGlctdUqNq5bCUuOqpTapcdVSm9S4aqlNaly1FJQaVy11So2rljqlxlVLNyZEEHw70iRnukrovkrtipXzig3/3r7v
healkj95ltCe6lfUWY4ft34ZyrLdz86Z/SvTZ/bLwb2PK2X1l6M2QLfjtSEx9+NOthFJ84NWzW86ubY2d2rd34v6l7oeRaYe7cektZKrkGaI/pmuNkxVNW+a6MLGzRFhG9O5aWTafB9UqI37oJGBr3p1zdiM09XeTc9vurXeb6tT69YGWllZX3S18CDQjbWjQu1616SIXQ0zzZjKuvvNH9dFZgCPze9s1Q3MnliNMu9fcilvWL23WoR3lXxW1e8e7LsvDnjx/rT+2rpgvHZJPAgYbzemftk9GOovsm+eLgh19WFLV7vHXIb28qZdq7/KD/8DAAD//wMAUEsDBBQABgAIAAAAIQACD8w67wEAAEcIAAAUAAAAd29yZC93ZWJTZXR0aW5ncy54bWzsld9umzAUxu8n7R2Q7xsgCzRBTSplVadJ0zR13QMY2wRrtg+ynZD06Wc7kNJlF6FSd9UbfPjs78f5I4ub270U0Y5pw0EtUTpJUMQUAcrVZol+Pd5fzVFkLFYUC1BsiQ7MoNvVxw83bdGy8iez1p00kaMoU0iyRLW1TRHHhtRMYjOBhim3WYGW2LpXvYkl1r+3zRUB2WDLSy64PcTTJMlRh9GXUKCqOGF3QLaSKRv8sWbCEUGZmjemp7WX0FrQtNFAmDGuHimOPIm5OmHS2RlIcqLBQGUnrpguo4By9jQJkRTPgGwcYHoGyAnbj2PMO0bsnEMOp+M4+YnD6YDzumQGAEMtrUdRpn1fY+/FFtfY1EMiG5dUdsIdpO+RJMXXjQKNS+FIbuqRG1wUwP7p6vdLCNk+6L4EtHIXgvKd6daoLXyL08UsT9NZli3CgRLo4S5s7rBwuyj2qrsP31hlezU5qQ98U/9DfoTmXFyDtSD/0l0ia6p9ZJ89yt1j5F7Mkz/ngwYT1sUEBLjrh7cWjggxyGycs3yR0TivHlY+xhoPi/bz+FxzQV8OZTrPsiS/nl2Hmbx3/z93P/uULZIkS967/3bdP4b92o/hItWnAI3lkj+xe9BrDa1hOmSGhYD2x/cvx28Nfv+rPwAAAP//AwBQSwMEFAAGAAgAAAAhAKMhTSYeAgAAkwYAABIAAAB3b3JkL2ZvbnRUYWJsZS54bWzck99umzAUxu8n7R2Q7xsMSSiLSqo2baRJUy+mTtqtYwxYwzaynX+PsIfZC+xmj9PX6LGBNGpaLexyIMB8x+d3fD7M1fVO1MGGacOVzFA0wihgkqqcyzJD3x6XFykKjCUyJ7WSLEN7ZtD1/OOHq+2sUNKaAPKlmQmaocraZhaGhlZMEDNSDZMQLJQWxMKrLkNB9I91c0GVaIjlK15zuw9jjBPUYfQ5FFUUnLI7RdeCSevzQ81qICppKt6YnrY9h7ZVOm+0oswY6FnULU8QLg+YaHICEpxqZVRhR9BMtyKPgvQI+5GoXwDTYYD4BJBQthvGSDtGCJnHHJ4P4yQHDs+POP+2mCOAyW1eDaLEva+hyyWWVMRUx0Q2bFHTA24vnEeCzj6XUmmyqoEEXz2ADxd4sLtD/+7hh2znddcCmne/QrCdSSIg8+nXz6fff7xOavsAGoQ2pM7QHZPld04kCl2wIVIZFvVB7HZOgjEew7M724m0ItowV8BPTJNWLojg9b5XydqqjsstrXp5QzR3DbUhw0sIrM0KZ+gGSuH4dolaJcrQOF0sLxfLm06JYU3+iJJOGfcKxk6hngMvE7g8h3rOYQ7UDFtzTkx65IKZ4IFtg69KgCNvGxKDIWM8hQJTGI/x5E1D2kqvDdGeO8SRe2fI/fLIkQUol+n09rUj+NNfHAHTWs75jrTbJvjCy8q+Y8f/vD+6gZk/AwAA//8DAFBLAwQUAAYACAAAACEAtY3V8GoBAADdAgAAEQAIAWRvY1Byb3BzL2NvcmUueG1sIKIEASigAAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAjJJdT4MwFIbvTfwPpPdQCkYXAixRsyuXGJ2Z2V1tz7Y6KE3bje3fW2Awibvw7ny858npe5pOj2XhHUAbUckMkSBEHkhWcSE3GfpYzPwJ8oylktOikpChExg0zW9vUqYSVml41ZUCbQUYz5GkSZjK0NZalWBs2BZKagKnkK65rnRJrUv1BivKdnQDOArDe1yCpZxaihugrwYiOiM5G5Bqr4sWwBmGAkqQ1mASEHzRWtCluTrQdn4pS2FPCq5K++agPhoxCOu6Duq4lbr9Cf6cv7y3T/WFbLxigPKUs8QKW0Ce4kvoIrP/+gZmu/KQuJhpoLbS+XL11vb6vHF6B6e60ty4qVHmZBwM00JZd7+OOSo4dUGNnbuDrgXwx1OH/1tulBoOovkHedQqhjQ9m9qtBNxzZiSddX1nGT89L2Yoj8Io9sMHn5AFuUuiSRKGq2ar0fwFWJ4X+DcxJmNiD+iMGX/I/AcAAP//AwBQSwMEFAAGAAgAAAAhAHOEmAR3AQAAywIAABAACAFkb2NQcm9wcy9hcHAueG1sIKIEASigAAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAnFLLTsMwELwj8Q9R7tQpqKVCWyPUCnHgUakBzpa9SSwc27JN1f49G9KGIG7ktDPrHc9sDLf71mQ7DFE7u8ynkyLP0EqntK2X+Wt5f7HIs5iEVcI4i8v8gDG/5ednsAnOY0gaY0YSNi7zJiV/w1iUDbYiTqhtqVO50IpEMNTMVZWWuHbys0Wb2GVRzBnuE1qF6sIPgnmveLNL/xVVTnb+4lt58KTHocTWG5GQP3eTZqJcaoENLJQuCVPqFvkV0QOAjagx8imwvoB3FxTh6QxYX8KqEUHIRBvk89kC2AjDnfdGS5Fot/xJy+Ciq1L28m046+aBjY8Ahdii/Aw6HXgBbAzhUVsyQPf2BTkLog7CN0d7A4KtFAZXFJ9XwkQE9kPAyrVeWJJjQ0V6H/HVl27dbeI48pschXzXqdl6IcnC9fXlOO6oA1tiUZH/wcJAwAP9kmA6fZq1NarTmb+NboFv/dvk0/mkoO97YyeOcg+Phn8BAAD//wMAUEsBAi0AFAAGAAgAAAAhAN+k0mxaAQAAIAUAABMAAAAAAAAAAAAAAAAAAAAAAFtDb250ZW50X1R5cGVzXS54bWxQSwECLQAUAAYACAAAACEAHpEat+8AAABOAgAACwAAAAAAAAAAAAAAAACTAwAAX3JlbHMvLnJlbHNQSwECLQAUAAYACAAAACEA93qSMy4JAAAbJAAAEQAAAAAAAAAAAAAAAACzBgAAd29yZC9kb2N1bWVudC54bWxQSwECLQAUAAYACAAAACEA1mSzUfQAAAAxAwAAHAAAAAAAAAAAAAAAAAAQEAAAd29yZC9fcmVscy9kb2N1bWVudC54bWwucmVs
c1BLAQItABQABgAIAAAAIQA6BcwZ4QYAAM4gAAAVAAAAAAAAAAAAAAAAAEYSAAB3b3JkL3RoZW1lL3RoZW1lMS54bWxQSwECLQAUAAYACAAAACEAMW9THM4EAABVDQAAEQAAAAAAAAAAAAAAAABaGQAAd29yZC9zZXR0aW5ncy54bWxQSwECLQAUAAYACAAAACEA8IJeJnkLAAAEcgAADwAAAAAAAAAAAAAAAABXHgAAd29yZC9zdHlsZXMueG1sUEsBAi0AFAAGAAgAAAAhAAIPzDrvAQAARwgAABQAAAAAAAAAAAAAAAAA/SkAAHdvcmQvd2ViU2V0dGluZ3MueG1sUEsBAi0AFAAGAAgAAAAhAKMhTSYeAgAAkwYAABIAAAAAAAAAAAAAAAAAHiwAAHdvcmQvZm9udFRhYmxlLnhtbFBLAQItABQABgAIAAAAIQC1jdXwagEAAN0CAAARAAAAAAAAAAAAAAAAAGwuAABkb2NQcm9wcy9jb3JlLnhtbFBLAQItABQABgAIAAAAIQBzhJgEdwEAAMsCAAAQAAAAAAAAAAAAAAAAAA0xAABkb2NQcm9wcy9hcHAueG1sUEsFBgAAAAALAAsAwQIAALozAAAAAA==
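A quick sanity check on a Base64 string like the one above: a docx file is a ZIP container, so its first four bytes should be the ZIP signature "PK" followed by 0x03 0x04, which in Base64 shows up as the prefix UEsDB. The small Java sketch below decodes just the start of the string and checks for that signature. The class name DocxSniffer is my own, and the check only tells you the payload looks like a ZIP, not that it is a valid Word document.

```java
import java.util.Base64;

// Sketch: check whether a Base64 payload starts with the ZIP signature used by docx files.
final class DocxSniffer {
    static boolean looksLikeZip(String base64) {
        if (base64 == null || base64.length() < 8) {
            return false;
        }
        // 8 Base64 characters decode to 6 bytes, enough to cover the 4-byte signature.
        byte[] head = Base64.getDecoder().decode(base64.substring(0, 8));
        return head[0] == 'P' && head[1] == 'K' && head[2] == 0x03 && head[3] == 0x04;
    }

    public static void main(String[] args) {
        // First characters of the sample document above.
        System.out.println(looksLikeZip("UEsDBBQABgAIAAAAIQDf")); // prints "true"
    }
}
```

If this prints false for your own file, the conversion step probably went wrong (for example, the site encoded the file name instead of the file content).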
2.2 Add a record to ES
In the kibana console, add a record, pasting the Base64 content obtained above into the content field (remember the double quotes):
POST /docwrite/_doc?pipeline=attachment
{
  "userId": 1001,
  "docId": 10003,
  "docName": "es.docx",
  "docType": "docx",
  "content": "[Base64 content goes here]"
}
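The same request can be assembled outside Kibana. The sketch below builds (but does not send) the HTTP request with the JDK's built-in java.net.http classes (Java 11+). The base URL http://localhost:9200, the class name, and the hand-rolled escape helper are my assumptions for illustration; in a real project a JSON library would be a better choice for building the body.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch: build the "index one document through the attachment pipeline" request.
final class IndexRequestSketch {
    // Minimal JSON string escaping (quotes and backslashes only); illustration, not a full encoder.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Base64-encode the file bytes and place them in the content field.
    static String buildBody(long userId, long docId, String docName, String docType, byte[] fileBytes) {
        String base64 = Base64.getEncoder().encodeToString(fileBytes);
        return "{"
                + "\"userId\":" + userId + ","
                + "\"docId\":" + docId + ","
                + "\"docName\":\"" + escape(docName) + "\","
                + "\"docType\":\"" + escape(docType) + "\","
                + "\"content\":\"" + base64 + "\""
                + "}";
    }

    // The pipeline is selected with the ?pipeline=attachment query parameter.
    static HttpRequest buildRequest(String esBaseUrl, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(esBaseUrl + "/docwrite/_doc?pipeline=attachment"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        String body = buildBody(1001, 10003, "es.docx", "docx", new byte[] {'h', 'i'});
        HttpRequest request = buildRequest("http://localhost:9200", body);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(body);
    }
}
```

To actually send it, you would pass the request to an HttpClient; I left that out so the sketch runs without an elasticsearch server.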
Run the following query to check whether the record's content field has been processed by the text-extraction pipeline.
GET /docwrite/_search
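In the result, the Base64 content field is gone and attachment.content holds the extracted plain text. That text is what the IK analyzer tokenizes and stores in the index.
2.3 Test keyword highlight search
Our ultimate goal is to search by keyword and display the matching document information.
This is where highlighted keyword search comes in.
For example, to search for the keyword "Elasticsearch", run: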
GET /docwrite/_search
{
  "query": {
    "match": {
      "attachment.content": {
        "query": "Elasticsearch",
        "analyzer": "ik_smart"
      }
    }
  },
  "highlight": {
    "fields": {
      "attachment.content": {
        "pre_tags": "<strong>",
        "post_tags": "</strong>"
      }
    }
  }
}
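This finds the matching records. Each record's "highlight" field shows the text fragments that match the keyword, with the keyword wrapped in <strong> tags for highlighting.
It works like searching for a keyword on Baidu: the results show text related to the keyword, and the keyword itself is highlighted (for example, in red).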
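To picture what a highlighted fragment looks like, here is a toy Java imitation that wraps every occurrence of a keyword in a pair of tags. Please read it as a picture of the output shape only: elasticsearch highlights the tokens produced by the analyzer, not raw substrings, and the class ToyHighlighter is entirely my own invention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy imitation of highlight output; elasticsearch itself works on analyzed tokens.
final class ToyHighlighter {
    // Wrap each case-insensitive occurrence of keyword in preTag/postTag.
    static String highlight(String text, String keyword, String preTag, String postTag) {
        Pattern pattern = Pattern.compile(Pattern.quote(keyword), Pattern.CASE_INSENSITIVE);
        return pattern.matcher(text).replaceAll(
                match -> Matcher.quoteReplacement(preTag + match.group() + postTag));
    }

    public static void main(String[] args) {
        String text = "Elasticsearch is a distributed search engine.";
        // prints "<strong>Elasticsearch</strong> is a distributed search engine."
        System.out.println(highlight(text, "elasticsearch", "<strong>", "</strong>"));
    }
}
```

The front end can then render the fragment as HTML, so the tagged keyword stands out from the surrounding text.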
3. SpringBoot implementation
If the steps above went smoothly, implementing the SpringBoot back end is easy.
3.1 elasticsearch configuration
(1) pom.xml
Add the elasticsearch and IOUtils (commons-io) dependencies. The utility class in (4) also calls JSON.toJSONString, which appears to come from fastjson; that dependency is not listed here and would need to be added as well.
<!-- elasticsearch -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
<!-- IOUtils -->
<dependency>
    <groupId>commons-io</groupId>
    <artifactId>commons-io</artifactId>
    <version>2.8.0</version>
</dependency>
<!-- lombok -->
<dependency>
    <groupId>org.projectlombok</groupId>
    <artifactId>lombok</artifactId>
    <optional>true</optional>
</dependency>
(2) application.yml
Add the elasticsearch server address and port.
# custom parameters
my-config:
  # custom elasticsearch settings
  elasticsearch:
    url: localhost
    port: 9200
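(3) ElasticSearchConfig class
import lombok.extern.slf4j.Slf4j;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@Slf4j
public class ElasticSearchConfig {
    @Value("${my-config.elasticsearch.url}")
    private String esHost;

    @Value("${my-config.elasticsearch.port}")
    private int esPort;

    /**
     * Create the ES client and register it as a bean.
     * @return the ES client
     */
    @Bean("myESClient")
    public RestHighLevelClient myElasticsearchClient() {
        return new RestHighLevelClient(RestClient.builder(
                new HttpHost(esHost, esPort, "http")
        ));
    }
}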
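(4) elasticsearch utility class
// Package names below match the 7.x high-level REST client; they may differ in other versions.
// com.alibaba.fastjson.JSON is an assumption: the original does not show where JSON comes from.
import com.alibaba.fastjson.JSON;
import lombok.extern.slf4j.Slf4j;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.rest.RestStatus;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Component;

import java.util.UUID;

@Component
@Slf4j
public class ElasticSearchClient {
    // Assumed value: the original does not show how TIME_VALUE_SECONDS is defined.
    private static final TimeValue TIME_VALUE_SECONDS = TimeValue.timeValueSeconds(30);

    @Autowired
    @Qualifier("myESClient")
    private RestHighLevelClient restHighLevelClient;

    /**
     * Run a keyword search.
     * @param index         index name
     * @param sourceBuilder search source (query, highlighter, ...)
     * @return the matching hits, or an empty array if nothing is found or the search fails
     */
    public SearchHit[] selectDocumentList(String index, SearchSourceBuilder sourceBuilder) {
        try {
            SearchRequest request = new SearchRequest(index);
            if (sourceBuilder != null) {
                // Return the actual total hit count
                sourceBuilder.trackTotalHits(true);
                request.source(sourceBuilder);
            }
            SearchResponse response = restHighLevelClient.search(request, RequestOptions.DEFAULT);
            if (response.getHits() != null) {
                return response.getHits().getHits();
            }
        } catch (Exception e) {
            log.error("[es] search failed", e);
        }
        // Return an empty array rather than null so callers can iterate safely
        return new SearchHit[0];
    }

    /**
     * Insert or update a document.
     * @param index index name
     * @param data  document data
     * @return true if the document was created or updated
     */
    public boolean insertDocument(String index, Object data) {
        try {
            String id = UUID.randomUUID().toString().replaceAll("-", "").toUpperCase();
            IndexRequest request = new IndexRequest(index);
            request.timeout(TIME_VALUE_SECONDS);
            request.id(id);
            // Important!! The pipeline must be set
            request.setPipeline("attachment");
            request.source(JSON.toJSONString(data), XContentType.JSON);
            IndexResponse response = restHighLevelClient.index(request, RequestOptions.DEFAULT);
            log.debug("[es] index response: status:{},id:{}", response.status().getStatus(), response.getId());
            RestStatus status = response.status();
            if (status == RestStatus.CREATED || status == RestStatus.OK) {
                log.debug("[es] document indexed successfully");
                return true;
            }
        } catch (Exception e) {
            log.error("[es] failed to index document", e);
        }
        return false;
    }
}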
3.2 DocumentObj entity class
Holds a few attributes of a document file.
import lombok.Data;

import java.io.Serializable;

@Data
public class DocumentObj implements Serializable {
    private static final long serialVersionUID = 1L;

    /** id of the user who owns the file */
    private Long userId;
    /** file id in mysql */
    private Long docId;
    /** file name */
    private String docName;
    /** file type */
    private String docType;
    /** Base64 content of the file */
    private String content;

    public DocumentObj() {}
}
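3.3 Service interface
import java.util.List;

public interface ISearchService {
    /**
     * (Test) Search documents by keyword.
     * @param keyword the search keyword
     * @return the matching documents, with highlighted fragments in content
     */
    List<DocumentObj> testSearch(String keyword);

    /**
     * (Test) Load a local document into elasticsearch.
     * @return true if the document was indexed
     */
    boolean testLoadDocument();
}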
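3.4 ServiceImpl implementation class
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.io.IOUtils;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Map;

@Slf4j
@Service
public class SearchServiceImpl implements ISearchService {
    @Autowired
    private ElasticSearchClient esClient;

    @Override
    public List<DocumentObj> testSearch(String keyword) {
        // Highlight query: matched keywords are rendered red and bold
        HighlightBuilder highlightBuilder = new HighlightBuilder()
                .field("attachment.content")
                .preTags("<font color='red' style='font-weight:bold'>")
                .postTags("</font>");
        // Plain match query against the extracted text
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder()
                .query(QueryBuilders.matchQuery("attachment.content", keyword).analyzer("ik_smart"))
                .highlighter(highlightBuilder);
        SearchHit[] searchHits = esClient.selectDocumentList("docwrite", searchSourceBuilder);
        // Process each record (each document) and collect its highlighted text
        List<DocumentObj> results = new ArrayList<>();
        if (searchHits == null) {
            return results;
        }
        for (SearchHit hit : searchHits) {
            Map<String, Object> sourceAsMap = hit.getSourceAsMap();
            DocumentObj obj = new DocumentObj();
            // docId may come back as Integer or Long depending on its size
            Object docId = sourceAsMap.get("docId");
            if (docId instanceof Number) {
                obj.setDocId(((Number) docId).longValue());
            }
            obj.setDocName((String) sourceAsMap.get("docName"));
            HighlightField contentHighlightField = hit.getHighlightFields().get("attachment.content");
            // A document can yield several highlight fragments; join at most the first two
            StringBuilder highlightMessage = new StringBuilder();
            if (contentHighlightField != null) {
                Text[] fragments = contentHighlightField.fragments();
                for (int i = 0; i < fragments.length && i < 2; i++) {
                    if (i > 0) {
                        highlightMessage.append(' ');
                    }
                    highlightMessage.append(fragments[i].toString());
                }
            }
            obj.setContent(highlightMessage.toString());
            results.add(obj);
        }
        return results;
    }

    @Override
    public boolean testLoadDocument() {
        // Test with a local document
        File file = new File("D:\\桌面文件\\es介绍.docx");
        // try-with-resources closes the stream even if reading fails
        try (InputStream fileInputStream = new FileInputStream(file)) {
            // Read the file and encode it as Base64
            byte[] bytes = IOUtils.toByteArray(fileInputStream);
            String base64 = Base64.getEncoder().encodeToString(bytes);
            // Add the document to es
            DocumentObj obj = new DocumentObj();
            obj.setUserId(1001L);
            obj.setDocId(666L);
            obj.setDocName("es介绍.docx");
            obj.setDocType("docx");
            obj.setContent(base64);
            return esClient.insertDocument("docwrite", obj);
        } catch (IOException e) {
            log.error("[es] failed to read local document", e);
        }
        return false;
    }
}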
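One detail of testSearch is worth isolating: it shows only the first two highlight fragments, but a hit may have just one fragment, or none at all. A small helper that joins "at most N" fragments avoids indexing past the end of the array. The sketch below uses plain String arrays so it runs without elasticsearch; the class name FragmentJoiner is hypothetical.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Sketch: join at most `limit` highlight fragments, tolerating short or missing arrays.
final class FragmentJoiner {
    static String joinFirst(String[] fragments, int limit) {
        if (fragments == null || fragments.length == 0) {
            return "";
        }
        // limit() simply stops early when there are fewer fragments than requested.
        return Arrays.stream(fragments)
                .limit(limit)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(joinFirst(new String[] {"first", "second", "third"}, 2)); // prints "first second"
        System.out.println(joinFirst(new String[] {"only one"}, 2));                 // prints "only one"
    }
}
```

With the real client you would first convert each fragment to a String and then pass the array to a helper like this.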
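3.5 Controller layer
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/test")
public class TestController {
    // Injected service; the original snippet used searchService without declaring it
    @Autowired
    private ISearchService searchService;

    @RequestMapping("/es/search")
    public ResponseEntity<?> testSearch(String keyword) {
        return ResponseEntity.ok(searchService.testSearch(keyword));
    }

    @RequestMapping("/es/addone")
    public ResponseEntity<?> testAddone() {
        return ResponseEntity.ok(searchService.testLoadDocument());
    }
}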
3.6 Testing
(1) Load the file
http://localhost:8002/test/es/addone
(2) Keyword search
The search keyword is "Elasticsearch".
http://localhost:8002/test/es/search?keyword=Elasticsearch
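The URL above works as typed because the keyword is plain ASCII. For Chinese keywords or keywords with spaces, the value should be URL-encoded before it goes into the query string. A minimal sketch with the JDK (Java 10+ for this URLEncoder overload); the class name is hypothetical, and the base URL is the one from my local test setup:

```java
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: build the search URL with a properly encoded keyword parameter.
final class SearchUrlSketch {
    static URI buildSearchUri(String baseUrl, String keyword) {
        // URLEncoder produces form encoding: spaces become '+', non-ASCII becomes %XX bytes.
        String encoded = URLEncoder.encode(keyword, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/test/es/search?keyword=" + encoded);
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUri("http://localhost:8002", "Elasticsearch"));
        System.out.println(buildSearchUri("http://localhost:8002", "全文 检索"));
    }
}
```

A browser does this encoding for you when you type the URL into the address bar; it matters mostly when another program calls the endpoint.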
When the front end passes a keyword to this endpoint, the endpoint retrieves the matching records from the elasticsearch server.