Elasticsearch in Practice: Combining query and filter with aggs for Query-Scoped and Global Aggregations
1. Prepare the data
POST /testcopy/_bulk
{"index":{"_id": 1}}
{"empId" : "111","name" : "员工1","age" : 20,"sex" : "男","mobile" : "19000001111","salary":1333,"deptName" : "技术部","provice" : "湖北省","city":"武汉","area":"光谷大道","address":"湖北省武汉市洪山区光谷大厦","content" : "i like to write best elasticsearch article"}
{"index":{"_id": 2}}
{"empId" : "222","name" : "员工2","age" : 25,"sex" : "男","mobile" : "19000002222","salary":15963,"deptName" : "销售部","provice" : "湖北省","city":"武汉","area":"江汉区","address" : "湖北省武汉市江汉路","content" : "i think java is the best programming language"}
{"index":{"_id": 3}}
{ "empId" : "333","name" : "员工3","age" : 30,"sex" : "男","mobile" : "19000003333","salary":20000,"deptName" : "技术部","provice" : "湖北省","city":"武汉","area":"经济技术开发区","address" : "湖北省武汉市经济开发区","content" : "i am only an elasticsearch beginner"}
{"index":{"_id": 4}}
{"empId" : "444","name" : "员工4","age" : 20,"sex" : "女","mobile" : "19000004444","salary":5600,"deptName" : "销售部","provice" : "湖北省","city":"武汉","area":"沌口开发区","address" : "湖北省武汉市沌口开发区","content" : "elasticsearch and hadoop are all very good solution, i am a beginner"}
{"index":{"_id": 5}}
{ "empId" : "555","name" : "员工5","age" : 20,"sex" : "男","mobile" : "19000005555","salary":9665,"deptName" : "测试部","provice" : "湖北省","city":"高新开发区","area":"武汉","address" : "湖北省武汉市东湖隧道","content" : "spark is best big data solution based on scala ,an programming language similar to java"}
{"index":{"_id": 6}}
{"empId" : "666","name" : "员工6","age" : 30,"sex" : "女","mobile" : "19000006666","salary":30000,"deptName" : "技术部","provice" : "武汉市","city":"湖北省","area":"江汉区","address" : "湖北省武汉市江汉路","content" : "i like java developer"}
{"index":{"_id": 7}}
{"empId" : "777","name" : "员工7","age" : 60,"sex" : "女","mobile" : "19000007777","salary":52130,"deptName" : "测试部","provice" : "湖北省","city":"黄冈市","area":"边城区","address" : "湖北省黄冈市边城区","content" : "i like elasticsearch developer"}
{"index":{"_id": 8}}
{"empId" : "888","name" : "员工8","age" : 19,"sex" : "女","mobile" : "19000008888","salary":60000,"deptName" : "技术部","provice" : "湖北省","city":"武汉","area":"汉阳区","address" : "湖北省武汉市江汉大学","content" : "i like spark language"}
{"index":{"_id": 9}}
{"empId" : "999","name" : "员工9","age" : 40,"sex" : "男","mobile" : "19000009999","salary":23000,"deptName" : "销售部","provice" : "河南省","city":"郑州市","area":"二七区","address" : "河南省郑州市郑州大学","content" : "i like java developer"}
{"index":{"_id": 10}}
{"empId" : "101010","name" : "张湖北","age" : 35,"sex" : "男","mobile" : "19000001010","salary":18000,"deptName" : "测试部","provice" : "湖北省","city":"武汉","area":"高新开发区","address" : "湖北省武汉市东湖高新","content" : "i like java developer i also like elasticsearch"}
{"index":{"_id": 11}}
{"empId" : "111111","name" : "王河南","age" : 61,"sex" : "男","mobile" : "19000001011","salary":10000,"deptName" : "销售部","provice" : "河南省","city":"开封市","area":"金明区","address" : "河南省开封市河南大学","content" : "i am not like java "}
{"index":{"_id": 12}}
{"empId" : "121212","name" : "张大学","age" : 26,"sex" : "女","mobile" : "19000001012","salary":1321,"deptName" : "测试部","provice" : "河南省","city":"开封市","area":"金明区","address" : "河南省开封市河南大学","content" : "i am java developer thing java is good"}
{"index":{"_id": 13}}
{"empId" : "131313","name" : "李江汉","age" : 36,"sex" : "男","mobile" : "19000001013","salary":1125,"deptName" : "销售部","provice" : "河南省","city":"郑州市","area":"二七区","address" : "河南省郑州市二七区","content" : "i like java and java is very best i like it do you like java "}
{"index":{"_id": 14}}
{"empId" : "141414","name" : "王技术","age" : 45,"sex" : "女","mobile" : "19000001014","salary":6222,"deptName" : "测试部","provice" : "河南省","city":"郑州市","area":"金水区","address" : "河南省郑州市金水区","content" : "i like c++"}
{"index":{"_id": 15}}
{"empId" : "151515","name" : "张测试","age" : 18,"sex" : "男","mobile" : "19000001015","salary":20000,"deptName" : "技术部","provice" : "河南省","city":"郑州市","area":"高新开发区","address" : "河南省郑州高新开发区","content" : "i think spark is good"}
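Every line of a bulk body has to be valid JSON on its own, so it can help to check the lines before sending them. The following is a minimal sketch of such a check in Python; the helper name `find_invalid_lines` and the two-document `SAMPLE` body are hypothetical and only illustrate the idea (the second source line deliberately contains a doubled comma).

```python
import json


def find_invalid_lines(ndjson_text):
    """Return (line_number, error_message) for every line that is not valid JSON."""
    problems = []
    for number, line in enumerate(ndjson_text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        try:
            json.loads(line)
        except json.JSONDecodeError as exc:
            problems.append((number, exc.msg))
    return problems


# Hypothetical two-document bulk body; line 4 has a doubled comma.
SAMPLE = "\n".join([
    '{"index":{"_id": 1}}',
    '{"empId": "111", "age": 20, "deptName": "技术部"}',
    '{"index":{"_id": 2}}',
    '{"empId": "222", "age": 25,, "deptName": "销售部"}',
])

problems = find_invalid_lines(SAMPLE)
for number, message in problems:
    print(f"line {number}: {message}")
```

Running the same function over the bulk body above (without the `POST` line) would flag any source line that cannot be parsed.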
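2. Combining query and filter with aggs
2.1 Aggregating over the documents matched by a query
All the aggregations covered so far ran without a query: the request went straight to aggs and computed avg, count, and so on over every document. How would we compute the average age of only the 技术部 (Technology department) employees?
The approach is to query for one department (技术部) first, then compute the maximum, minimum, and average age over the query hits.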
# Run the query first. The aggs section sits at the same level as query,
# so max, min, and avg are computed over the query hits only.
GET /testcopy/_search
{
  "query": {
    "match_phrase": {
      "deptName.keyword": "技术部"
    }
  },
  "aggs": {
    "tech_avg_age": {
      "avg": { "field": "age" }
    },
    "max_age": {
      "max": { "field": "age" }
    },
    "min_age": {
      "min": { "field": "age" }
    }
  }
}
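In the author's run the query matches four 技术部 employees, and the statistics are computed over those four hits only:
技术部 max age: 30, min age: 19, avg age: 24.75
These figures are consistent with only 11 of the 15 documents having been indexed (documents 11, 12, 14, and 15 missing). With all 15 documents indexed, 技术部 has five employees and the result is max age 30, min age 18, avg age 23.4.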
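The arithmetic behind this request can be checked without a cluster. The sketch below is plain Python, not an Elasticsearch client call: it copies the `_id`, `deptName`, and `age` values from the bulk request in section 1 and mimics "query, then aggregate". The function name `age_stats` is made up for illustration, and the 11-document subset is an assumption based on the reported figures.

```python
from statistics import mean

# (_id, deptName, age) copied from the bulk request in section 1.
EMPLOYEES = [
    (1, "技术部", 20), (2, "销售部", 25), (3, "技术部", 30), (4, "销售部", 20),
    (5, "测试部", 20), (6, "技术部", 30), (7, "测试部", 60), (8, "技术部", 19),
    (9, "销售部", 40), (10, "测试部", 35), (11, "销售部", 61), (12, "测试部", 26),
    (13, "销售部", 36), (14, "测试部", 45), (15, "技术部", 18),
]


def age_stats(rows, dept):
    """Mimic a deptName query followed by max / min / avg aggs on age."""
    ages = [age for _doc_id, d, age in rows if d == dept]
    return {"count": len(ages), "max": max(ages), "min": min(ages), "avg": mean(ages)}


# All 15 documents indexed.
stats_15 = age_stats(EMPLOYEES, "技术部")

# Assumption: the reported run did not contain documents 11, 12, 14, and 15.
indexed_11 = [row for row in EMPLOYEES if row[0] not in {11, 12, 14, 15}]
stats_11 = age_stats(indexed_11, "技术部")

print(stats_15)  # {'count': 5, 'max': 30, 'min': 18, 'avg': 23.4}
print(stats_11)  # {'count': 4, 'max': 30, 'min': 19, 'avg': 24.75}
```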
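2.2 Aggregating over filtered documents
The previous section aggregated over the hits of a query. Can aggs be combined with a filter in the same way? Yes.
The goal here is to keep only the employees aged between 25 and 40 (both bounds inclusive) and then compute the average age over that subset.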
# Filter for ages 25 to 40, then compute avg over the filtered documents.
GET /testcopy/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "age": {
              "gte": 25,
              "lte": 40
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "avg_age": {
      "avg": { "field": "age" }
    }
  }
}
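In the author's run the filter keeps six employees (from all departments, not only 技术部), and the average age over them is approximately 32.67 (196 / 6). With all 15 documents indexed, seven employees pass the filter and the average is approximately 31.71 (222 / 7).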
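As a cross-check of the range semantics (`gte` and `lte` are both inclusive), here is a small Python sketch that applies the same bounds to the ages from section 1. It is an offline illustration, not client code; `avg_age_in_range` is a hypothetical helper, and the 11-document subset is an assumption based on the reported figures.

```python
from statistics import mean

# _id -> age, copied from the bulk request in section 1.
AGES = {1: 20, 2: 25, 3: 30, 4: 20, 5: 20, 6: 30, 7: 60, 8: 19,
        9: 40, 10: 35, 11: 61, 12: 26, 13: 36, 14: 45, 15: 18}


def avg_age_in_range(ages, gte, lte):
    """Mimic a range filter (both bounds inclusive) followed by an avg agg."""
    kept = [age for age in ages if gte <= age <= lte]
    return len(kept), mean(kept)


# All 15 documents indexed.
count_15, avg_15 = avg_age_in_range(AGES.values(), 25, 40)

# Assumption: the reported run did not contain documents 11, 12, 14, and 15.
ages_11 = [age for doc_id, age in AGES.items() if doc_id not in {11, 12, 14, 15}]
count_11, avg_11 = avg_age_in_range(ages_11, 25, 40)

print(count_15, round(avg_15, 2))  # 7 31.71
print(count_11, round(avg_11, 2))  # 6 32.67
```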
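2.3 Aggregating over documents selected by both a query and a filter
Sections 2.1 and 2.2 used a query and a filter separately. Can we aggregate over documents that have passed both a query and a filter? Yes.
The goal here is to query for 技术部, keep only the employees aged between 25 and 60 (both bounds inclusive), and then compute the average age over what remains.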
# Query first (must), then filter for ages 25 to 60 (filter, at the same
# level as must inside bool). The aggs section sits at the same level as
# query, so avg is computed over the documents that pass both.
GET /testcopy/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "deptName.keyword": "技术部"
          }
        }
      ],
      "filter": [
        {
          "range": {
            "age": {
              "gte": 25,
              "lte": 60
            }
          }
        }
      ]
    }
  },
  "aggs": {
    "avg_age": {
      "avg": { "field": "age" }
    }
  }
}
The query and filter together leave two 技术部 employees, both aged 30, so the average age computed over them is 30.
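The combined condition can be replayed in plain Python: a document must satisfy both the `must` clause and the `filter` clause. This is a sketch of the selection logic only (it ignores relevance scoring), using the values from section 1; `bool_query` is a hypothetical helper name.

```python
from statistics import mean

# (_id, deptName, age) copied from the bulk request in section 1.
EMPLOYEES = [
    (1, "技术部", 20), (2, "销售部", 25), (3, "技术部", 30), (4, "销售部", 20),
    (5, "测试部", 20), (6, "技术部", 30), (7, "测试部", 60), (8, "技术部", 19),
    (9, "销售部", 40), (10, "测试部", 35), (11, "销售部", 61), (12, "测试部", 26),
    (13, "销售部", 36), (14, "测试部", 45), (15, "技术部", 18),
]


def bool_query(rows, dept, gte, lte):
    """must: deptName == dept; filter: gte <= age <= lte. Both have to hold."""
    return [row for row in rows if row[1] == dept and gte <= row[2] <= lte]


hits = bool_query(EMPLOYEES, "技术部", 25, 60)
hit_ids = [doc_id for doc_id, _dept, _age in hits]
avg_age = mean(age for _doc_id, _dept, age in hits)

print(hit_ids)            # [3, 6]
print(f"{avg_age:.1f}")   # 30.0
```

Document 15 (age 18) falls outside the range, so the outcome is the same whether or not it is in the index.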
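3 Global bucket: statistics over all documents
3.1 Query-scoped aggregation versus the global bucket
Suppose we want to compare one department's average age with the average age of the whole company. Do we have to run two searches, one for the department and one for everything, and then compare the results ourselves? That is more work than necessary.
- No. Elasticsearch provides the global aggregation, which defines a single global bucket.
- The global bucket ignores the query and aggregates over all documents in the searched index.
Scenario:
Compute the average age of one department and the average age across all documents in a single request.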
# "global": {} inside a named aggregation ignores the query above and
# aggregates over all documents. all_avg_age sits at the same level as
# tech_avg inside aggs; its own sub-aggregation computes the global avg.
GET /testcopy/_search
{
  "size": 0,
  "query": {
    "match": {
      "deptName.keyword": "技术部"
    }
  },
  "aggs": {
    "tech_avg": {
      "avg": { "field": "age" }
    },
    "all_avg_age": {
      "global": {},
      "aggs": {
        "all_of_age": {
          "avg": { "field": "age" }
        }
      }
    }
  }
}
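In the author's run the global bucket covers 11 documents with an average age of 30.45.
The 技术部 query on its own matches 4 documents with an average age of 24.75.
With all 15 documents indexed, the global bucket covers 15 documents with an average age of approximately 32.33, and 技术部 has 5 documents with an average age of 23.4.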
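To show where the two numbers sit in the response, here is a short Python sketch that reads them out of the `aggregations` object. The `response` dict is a hand-written stand-in filled with the figures reported above (335 is the sum of ages over those 11 documents); it was not captured from a live cluster, and real responses carry additional top-level fields.

```python
# Hand-written stand-in for the "aggregations" part of the search response.
# Assumption: values are the reported figures, not output from a real cluster.
response = {
    "aggregations": {
        "tech_avg": {"value": 24.75},
        "all_avg_age": {
            "doc_count": 11,
            "all_of_age": {"value": 335 / 11},
        },
    }
}

aggs = response["aggregations"]
dept_avg = aggs["tech_avg"]["value"]                     # scoped to the query hits
global_docs = aggs["all_avg_age"]["doc_count"]           # size of the global bucket
global_avg = aggs["all_avg_age"]["all_of_age"]["value"]  # sub-aggregation of the bucket

print(f"技术部 avg age: {dept_avg:.2f}")                     # 技术部 avg age: 24.75
print(f"all {global_docs} docs avg age: {global_avg:.2f}")  # all 11 docs avg age: 30.45
print(f"difference: {dept_avg - global_avg:+.2f}")          # difference: -5.70
```

The metric aggregation directly under `aggs` returns a bare `value`, while the global bucket returns a `doc_count` plus its nested sub-aggregation.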
Source: https://www.toymoban.com/news/detail-497612.html
This concludes the basics of combining query and filter with aggs: aggregating over query hits, over filtered documents, and over both together, and comparing a query-scoped aggregation with a global aggregation in one request. The next article covers Top N ranking.