Elasticsearch Force-Merge Experiment – 消失的夜丶
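1. The problem
Because of disk space limits on the cluster, we deleted more than 1 billion documents, but the free disk space did not rise quickly afterward. The reason is that Elasticsearch does not physically remove a deleted document; it only marks it as deleted. The document is physically removed only when a merge happens.
An index with too many documents in the deleted state also becomes slower to query. According to this article [1], with 50% of the documents marked as deleted, search throughput drops by roughly 30% or more: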
Because deleted documents remain in the index, they must still be decoded from the postings lists and then skipped during searching, so there is added search cost. To test how much, I ran a search performance test for varying queries using the 100 M document index with no deletions as the baseline, and the same index with 50% deleted documents (i.e., 150 M documents with 50M deleted). Both indices were single-segment. Here are the results:
Query | QPS | StdDev | QPS with deletes | StdDev with deletes | % change |
---|---|---|---|---|---|
Int Range query | 1.2 | (5.1%) | 0.6 | (1.8%) | 46% |
Prefix query | 5.7 | (5.0%) | 3.4 | (2.3%) | 41% |
Wildcard | 5.3 | (4.4%) | 3.2 | (2.2%) | 39% |
And High+Low | 91.1 | (2.0%) | 59.5 | (2.1%) | 34% |
Med Phrase | 36.2 | (2.8%) | 24.4 | (1.3%) | 32% |
And High+Med | 16.6 | (1.5%) | 11.2 | (1.0%) | 32% |
…… | | | | | |
The bad news is there is clearly a non-trivial performance cost to deleted documents, and this is something we can work to reduce over time (patches welcome!). The good news is the cost is typically quite a bit lower than the percentage deletes (50% in this test) because these documents are filtered out at a low level before any of the costly query matchers and scorers see them. The more costly queries (Phrase, Span) tend to see the lowest impact, which is also good because it is the slow queries that determine node capacity for most applications.
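However, a force merge can put a heavy load on the cluster, so I first tested it on a small index.
2. Force-merging a small index
The small index looked like this: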
docs.count docs.deleted store.size
6176077350 417260277 1.4tb
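1. Flush pending writes first
Repeat the flush until the response reports no failed shards. Note that synced flush is deprecated since 7.6 and removed in 8.0; on those versions use a plain POST index_business_v1/_flush instead.
POST index_business_v1/_flush/synced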
2. Block writes to the index
PUT index_business_v1/_settings
{
  "index": {
    "blocks.write": true
  }
}
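3. Run the force merge
On ES 8.x you can add wait_for_completion=false so that the request does not block until the merge finishes. only_expunge_deletes=true merges only the segments that contain a certain proportion of deleted documents. That proportion is controlled by index.merge.policy.expunge_deletes_allowed, which defaults to 10.0: a segment is merged when its deleted/total ratio exceeds 10%.
POST /index_business_v1/_forcemerge?only_expunge_deletes=true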
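To make the threshold concrete, here is a minimal Python sketch of the eligibility rule described above. It is a simplification, not the actual merge policy code: the function names are hypothetical, the segment numbers are made up, and it assumes docs.count is the live document count so that the total is docs.count + docs.deleted.

```python
def deleted_pct(docs_count: int, docs_deleted: int) -> float:
    """Percentage of a segment's documents that are marked as deleted.

    docs_count is the number of live documents and docs_deleted the number
    of deleted ones, so the segment total is their sum.
    """
    total = docs_count + docs_deleted
    return 100.0 * docs_deleted / total if total else 0.0


def eligible_for_expunge(docs_count: int, docs_deleted: int,
                         allowed_pct: float = 10.0) -> bool:
    """True if the deleted percentage exceeds expunge_deletes_allowed."""
    return deleted_pct(docs_count, docs_deleted) > allowed_pct


# (docs.count, docs.deleted) pairs; the values are made up for illustration.
segments = {"seg_a": (900_000, 50_000), "seg_b": (400_000, 600_000)}
for name, (live, deleted) in segments.items():
    print(name, round(deleted_pct(live, deleted), 1),
          eligible_for_expunge(live, deleted))
```

With the default of 10.0, seg_a (about 5.3% deleted) is left alone while seg_b (60% deleted) is rewritten; passing allowed_pct=5.0 would make seg_a eligible as well.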
4. Check task progress
GET _tasks?detailed=true&actions=*forcemerge&human
Or query a single task by its ID:
GET _tasks/rZqBdGdSTGy2qdyZAtDD6Q:3358737
rZqBdGdSTGy2qdyZAtDD6Q:3604476
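The tasks response is nested by node, which makes it awkward to read while a long merge is running. The sketch below flattens an already parsed response into one row per force-merge task. The function name is hypothetical, and the sample response is hand-written and trimmed to the fields used here (the running times are invented), so treat it as an illustration of the response shape rather than real output.

```python
def summarize_forcemerge_tasks(tasks_response: dict) -> list[dict]:
    """Flatten a parsed `GET _tasks?detailed=true` response, one row per task."""
    rows = []
    for node_id, node in tasks_response.get("nodes", {}).items():
        for task_id, task in node.get("tasks", {}).items():
            # Keep only force-merge actions; skip everything else.
            if "forcemerge" not in task.get("action", ""):
                continue
            rows.append({
                "task_id": task_id,
                "node": node_id,
                "action": task["action"],
                # 3.6e12 nanoseconds per hour.
                "hours_running": task.get("running_time_in_nanos", 0) / 3.6e12,
            })
    # Longest-running tasks first.
    return sorted(rows, key=lambda r: r["hours_running"], reverse=True)


# Hypothetical response, trimmed to the fields used above.
sample = {
    "nodes": {
        "rZqBdGdSTGy2qdyZAtDD6Q": {
            "tasks": {
                "rZqBdGdSTGy2qdyZAtDD6Q:3358737": {
                    "action": "indices:admin/forcemerge",
                    "running_time_in_nanos": 7_200_000_000_000,
                },
                "rZqBdGdSTGy2qdyZAtDD6Q:1": {
                    "action": "indices:data/write/bulk",
                    "running_time_in_nanos": 1_000_000,
                },
            }
        }
    }
}

for row in summarize_forcemerge_tasks(sample):
    print(row["task_id"], f'{row["hours_running"]:.1f}h')
```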
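3. Changing the merge settings
With only_expunge_deletes=true and the default settings, the merge removed about 300 million deleted documents and, by my calculation, freed roughly 70 GB. So even with the defaults a force merge can substantially reduce disk usage, and the higher the share of deleted documents, the larger the reduction.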
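However, the production index is too large and needs to be split, so before splitting I wanted to purge as many deleted documents as possible. Lowering the threshold makes more segments eligible for the merge:
PUT base_domain/_settings
{
  "index.merge.policy.expunge_deletes_allowed": "5.0"
}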
index docs.count docs.deleted store.size
ip 12503736313 4207846666 40.2tb
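Calculation: 4207846666 / (4207846666 + 12503736313) × 40 TB ≈ 10 TB, so at least 10 TB can be freed. The delete markers themselves also take up disk space, so I estimated the final savings at about 12 TB.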
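The same estimate can be written as a small Python function. This is only a sketch of the arithmetic above: the function name is hypothetical, and it assumes deleted documents take up the same average space as live ones, which is exactly why it yields a lower bound rather than an exact figure.

```python
def estimate_reclaimable_tb(docs_count: int, docs_deleted: int,
                            store_tb: float) -> float:
    """Store size scaled by the share of deleted documents.

    Assumes deleted and live documents have the same average size.
    """
    deleted_fraction = docs_deleted / (docs_count + docs_deleted)
    return deleted_fraction * store_tb


# Numbers for the ip index shown above, with the store size rounded to 40 TB.
estimate = estimate_reclaimable_tb(12_503_736_313, 4_207_846_666, 40.0)
print(f"about {estimate:.0f} TB reclaimable")
```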
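During the merge, disk usage repeatedly rises and then falls. In this test the index grew from 40 TB to 44 TB before it started to shrink. So reserve some free space in proportion to the index before merging; otherwise the merge fails outright.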
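A quick pre-flight check along these lines can be sketched in Python. The 10% default comes only from the single run above (40 TB peaking at 44 TB) and is an assumption, not a documented bound; the function name is hypothetical, and a real cluster would also need to consider per-node disk watermarks.

```python
def has_merge_headroom(index_tb: float, free_tb: float,
                       peak_growth: float = 0.10) -> bool:
    """True if free space covers the temporary growth seen during a merge.

    peak_growth is the fraction by which the index may swell at its peak;
    0.10 reflects the one run described above and is only an assumption.
    """
    return free_tb >= index_tb * peak_growth


print(has_merge_headroom(index_tb=40.0, free_tb=5.0))  # enough for a 10% peak
print(has_merge_headroom(index_tb=40.0, free_tb=3.0))  # not enough
```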
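After about 1.1 days the merge finished. Disk usage fell from 40 TB to 22 TB, freeing 18 TB. However, every shard ended up with some fairly large segments, the largest reaching 40 GB.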
4. Inspecting segment information
1. See which indices are currently merging
GET /_cat/indices/?s=segmentsCount:desc&v&h=index,segmentsCount,segmentsMemory,memoryTotal,mergesCurrent,mergesCurrentDocs,storeSize,p,r
index | segmentsCount | segmentsMemory | memoryTotal | mergesCurrent | mergesCurrentDocs | storeSize | p | r |
---|---|---|---|---|---|---|---|---|
domain | 22766 | 7.5gb | 12.2gb | 48 | 21343329 | 87.5tb | 160 | 0 |
ip | 14270 | 13.3gb | 50.7gb | 47 | 18168708 | 40.4tb | 160 | 0 |
dns | 2220 | 2.8gb | 15.8gb | 0 | 0 | 3.7tb | 60 | 0 |
2. See the individual segments of an index
GET _cat/segments?index=sumap_domain*&v&s=docs.deleted:desc
index | shard | prirep | ip | segment | generation | docs.count | docs.deleted | size | size.memory | committed | searchable | version | compound |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
domain_index | 36 | p | 172.16.80.1 | _stmt | 1344773 | 161171 | 692426 | 4.3gb | 1181094 | true | true | 8.2.0 | true |
3. See the overall deleted count of an index
GET _cat/indices?index=sumap_domain*&v&h=index,docs.count,docs.deleted,store.size
index | docs.count | docs.deleted | store.size |
---|---|---|---|
domain | 2683311417 | 943551705 | 87.5tb |
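When several indices are candidates, it helps to rank them by their share of deleted documents before deciding which one to merge first. The sketch below assumes the rows were fetched with format=json added to the _cat/indices request, in which case each row is a dict of strings; the function name is hypothetical and the rows are copied from the figures in this post.

```python
def rank_by_deleted_ratio(cat_rows: list[dict]) -> list[tuple[str, float]]:
    """Sort _cat/indices rows by deleted / (live + deleted), highest first."""
    ranked = []
    for row in cat_rows:
        live = int(row["docs.count"])
        deleted = int(row["docs.deleted"])
        total = live + deleted
        ratio = deleted / total if total else 0.0
        ranked.append((row["index"], ratio))
    return sorted(ranked, key=lambda item: item[1], reverse=True)


# Rows in the shape of `_cat/indices?format=json`, using the numbers above.
rows = [
    {"index": "ip", "docs.count": "12503736313", "docs.deleted": "4207846666"},
    {"index": "domain", "docs.count": "2683311417", "docs.deleted": "943551705"},
]

for index, ratio in rank_by_deleted_ratio(rows):
    print(f"{index}: {ratio:.1%} deleted")
```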
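5. Problems caused by force merging
A force merge produces segments of 5 GB or more, and I am not sure what happens once a segment reaches 100 GB. From what I could find, the known consequence so far is:
- If documents in such a segment keep being updated, background merging no longer considers segments of 5 GB or more, so you have to run forcemerge manually.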
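To find the segments that fall into this category, the human-readable sizes from _cat/segments can be compared against the 5 GB limit. The following Python sketch assumes sizes use the lowercase binary-unit suffixes seen in the output above (kb, mb, gb, tb); the helper names and the "_big" segment are hypothetical.

```python
UNITS = {"b": 1, "kb": 1024, "mb": 1024**2, "gb": 1024**3,
         "tb": 1024**4, "pb": 1024**5}


def parse_size(text: str) -> float:
    """Convert a _cat size string such as '4.3gb' to bytes."""
    text = text.strip().lower()
    # Check two-letter suffixes before the bare 'b' suffix.
    for suffix in ("kb", "mb", "gb", "tb", "pb", "b"):
        if text.endswith(suffix):
            return float(text[: -len(suffix)]) * UNITS[suffix]
    raise ValueError(f"unrecognized size: {text!r}")


def oversized(segments: list[tuple[str, str]], limit: str = "5gb") -> list[str]:
    """Names of segments whose size is at or above the limit."""
    limit_bytes = parse_size(limit)
    return [name for name, size in segments if parse_size(size) >= limit_bytes]


# '_stmt' is the segment listed earlier; '_big' is a made-up 40 GB segment.
print(oversized([("_stmt", "4.3gb"), ("_big", "40gb")]))
```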
References
[1] How Do Deleted Documents Affect Search Performance?