Working with Elasticsearch snapshots

Question

I have many (10+) Elasticsearch clusters, used for different purposes (storing logs, storing business and analytical data). For example, I have a 3-node Elasticsearch cluster that holds business data (users' shopping carts on an e-commerce website). I take a snapshot of it every day, and the snapshots are written to an NFS share. My admins told me that I must clear the snapshot repository, keeping only the last 10 snapshots, to free disk space.

Now suppose somebody (or I) accidentally runs curl -XDELETE/* , which deletes all indices in the cluster, and I have to restore all the business data that was there, but I only have 10 snapshots from the last 10 days. Can I restore all the data, or will it only restore data from the dates of those last snapshots? I ask because the documentation says that snapshots are incremental: "each snapshot only stores data that is not part of an earlier snapshot".

For example, customer Joe adds something to his cart on 01/09/2020. On 15/09/2020 I delete all data from the cluster, and my last snapshot in the snapshot repository is from 03/09/2020. If I restore from this snapshot, will it contain the old data or not?

Recommended answer

An interesting test to understand this is to perform the following process:

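  1. Create an index
  2. Index one document
  3. Create a first snapshot A
  4. Index a second document
  5. Create a second snapshot B
  6. Delete the first snapshot A
  7. Delete the index
  8. Restore snapshot B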

Do you think the first document is gone? Let's find out... here are all the steps to reproduce the above process:

# 1. create an index
PUT test

# 2. index one document
PUT test/_doc/1
{
  "id": 1
}

# 3. create a first snapshot A
PUT /_snapshot/my-snapshots/snapshot_a?wait_for_completion=true
{
  "indices": "test",
  "ignore_unavailable": true,
  "include_global_state": false
}

# 4. index a second document
PUT test/_doc/2
{
  "id": 2
}

# 5. create a second snapshot B
PUT /_snapshot/my-snapshots/snapshot_b?wait_for_completion=true
{
  "indices": "test",
  "ignore_unavailable": true,
  "include_global_state": false
}

# 6. delete the first snapshot A
DELETE /_snapshot/my-snapshots/snapshot_a

# 7. delete the index
DELETE test

# 8. restore the snapshot B
POST /_snapshot/my-snapshots/snapshot_b/_restore?wait_for_completion=true

# 9. And now check the content of the index
GET test/_search

=>
    "hits" : [
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "id" : 1
        }
      },
      {
        "_index" : "test",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "id" : 2
        }
      }
    ]

So the bottom line of this is that older documents are still contained in newer snapshots and deleting old snapshots doesn't mean deleting old documents.

A snapshot references an exact copy of every shard segment file that exists at the moment the snapshot is created; segment files that are already in the repository are reused rather than copied again. Over time, smaller segment files get merged into bigger ones. When the next snapshot is taken, it copies the newer, bigger segment files, while the older snapshots still reference the older, smaller segment files.
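To make the segment-file behaviour concrete, here is a toy Python model. It is only a sketch and not how Elasticsearch is actually implemented: the `Repository` class, its methods, and the segment names `_0`, `_1`, `_2` are all made up for illustration. A snapshot is modelled as the list of segment files present at that moment, and only files that are not yet in the repository get uploaded.

```python
class Repository:
    """Toy snapshot repository (hypothetical, for illustration only)."""

    def __init__(self):
        self.blobs = {}      # segment file name -> set of doc ids it holds
        self.snapshots = {}  # snapshot name -> set of segment file names

    def take_snapshot(self, name, shard_segments):
        """Record every current segment; upload only those not stored yet."""
        uploaded = [seg for seg in shard_segments if seg not in self.blobs]
        for seg in uploaded:
            self.blobs[seg] = set(shard_segments[seg])
        self.snapshots[name] = set(shard_segments)
        return sorted(uploaded)

    def restore(self, name):
        """Return all doc ids reachable through the snapshot's segment files."""
        docs = set()
        for seg in self.snapshots[name]:
            docs |= self.blobs[seg]
        return docs


repo = Repository()

# Day 1: the shard has a single small segment holding document 1.
shard = {"_0": {1}}
uploaded_a = repo.take_snapshot("snapshot_a", shard)

# Day 2: document 2 arrives in a new small segment.
shard["_1"] = {2}
uploaded_b = repo.take_snapshot("snapshot_b", shard)

# Day 3: a merge rewrites both small segments into one bigger segment.
shard = {"_2": {1, 2}}
uploaded_c = repo.take_snapshot("snapshot_c", shard)

print(uploaded_a, uploaded_b, uploaded_c)
print(repo.restore("snapshot_b"), repo.restore("snapshot_c"))
```

In this model snapshot B uploads only the new segment, yet restoring it still yields both documents, because it also references the segment that snapshot A uploaded earlier.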

This does not mean, however, that it is always safe to keep only the latest snapshot and assume that all the data is in it: a snapshot captures the index as it was at that moment, so anything deleted or overwritten before then can only be recovered from an older snapshot. But if you take daily snapshots, I think it's safe to keep only the last 10 snapshots and expect that all the data is there.
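As a rough illustration of a "keep the last 10" rule, the following Python sketch picks which snapshots fall outside the retention window. The function name `snapshots_to_delete` and the `daily-YYYY.MM.DD` snapshot names are assumptions made for this example; the function only computes names and does not talk to a cluster.

```python
from datetime import date


def snapshots_to_delete(snapshots, keep=10):
    """Return the names of all snapshots except the `keep` most recent ones.

    `snapshots` is a list of (name, creation_date) pairs.
    """
    if keep < 0:
        raise ValueError("keep must be zero or positive")
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    return [name for name, _ in newest_first[keep:]]


# Fifteen hypothetical daily snapshots, 1 to 15 September 2020.
daily = [(f"daily-2020.09.{day:02d}", date(2020, 9, day)) for day in range(1, 16)]

expired = snapshots_to_delete(daily, keep=10)
print(expired)
```

Each returned name could then be removed with a request like the `DELETE /_snapshot/my-snapshots/snapshot_a` call used in step 6 of the test.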

The last thing worth noting is that when you delete a snapshot, ES will delete all files associated with the snapshot that are not in-use by other snapshots, which basically makes deleting snapshots inherently safe.
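The clean-up rule can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Elasticsearch's real code: `delete_snapshot`, the dictionary layout, and the segment names are hypothetical. A file is removed only when no remaining snapshot references it.

```python
def delete_snapshot(snapshots, blobs, name):
    """Drop a snapshot, then remove files no remaining snapshot references."""
    removed_files = snapshots.pop(name)
    still_referenced = set().union(*snapshots.values())
    garbage = removed_files - still_referenced
    for seg in garbage:
        del blobs[seg]
    return sorted(garbage)


# snapshot name -> segment files it references (made-up example data)
snapshots = {
    "snapshot_a": {"_0"},
    "snapshot_b": {"_0", "_1"},
    "snapshot_c": {"_2"},
}
# segment file name -> doc ids stored in that file
blobs = {"_0": {1}, "_1": {2}, "_2": {1, 2}}

# Deleting A frees nothing: snapshot B still needs segment _0.
freed_a = delete_snapshot(snapshots, blobs, "snapshot_a")

# Deleting B frees _0 and _1: snapshot C only needs the merged segment _2.
freed_b = delete_snapshot(snapshots, blobs, "snapshot_b")

print(freed_a, freed_b, sorted(blobs))
```

After both deletions, snapshot C is untouched and still covers documents 1 and 2, which mirrors the outcome of the test above.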

09-23 08:00