This article looks at whether HDFS can keep more replicas of individual files to improve their availability, and how to set the replication factor on a per-file basis; it should be a useful reference for anyone facing the same problem.

Problem description

I'm new to HDFS, so sorry if my question is naive.

Suppose we store files in a Hadoop cluster. Some files are really popular and will be requested much more often than others (but not often enough to keep them in memory). It would be worth keeping more copies (replicas) of those files.

Can I implement this in HDFS, or is there a best practice for tackling this task?

Recommended answer

Yes, you can do this for the entire cluster, for a directory, or for an individual file.

You can change the replication factor (let's say to 3) on a per-file basis using the Hadoop FS shell.

[sys@localhost ~]$ hadoop fs -setrep -w 3 /my/file
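
To confirm that the change actually took effect, the current replication factor of the file can be read back from the same shell. A minimal sketch, reusing the /my/file path from the example above (the %r format specifier of -stat and the second column of -ls both report it on current Hadoop releases):

# Print only the replication factor of the file
[sys@localhost ~]$ hadoop fs -stat "replication=%r" /my/file
# The second column of a plain listing shows the same value
[sys@localhost ~]$ hadoop fs -ls /my/file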

Alternatively, you can change the replication factor (let's say to 3) of all the files under a directory.

[sys@localhost ~]$ hadoop fs -setrep -w 3 -R /my/dir
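
Note that the replication factor is stored per file, not per directory: -setrep -R only rewrites the files that already exist under /my/dir, so files uploaded later still get the cluster default dfs.replication. If you want new files to start out with a higher factor, one option is to override the property at write time. A sketch, assuming a hypothetical local file localfile.txt and that your fs shell accepts generic -D options (it is run through ToolRunner, so most versions do):

# Upload a file with replication 3 in one step instead of running -setrep afterwards
[sys@localhost ~]$ hadoop fs -D dfs.replication=3 -put localfile.txt /my/dir/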

To change the replication of the entire HDFS to 1:

[sys@localhost ~]$ hadoop fs -setrep -w 1 -R /
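
A cluster-wide change like this does not finish instantly: the NameNode has to schedule deletion of the now-excess block copies (or re-replication when the factor is raised). hdfs fsck is a convenient way to watch the remaining over- or under-replicated blocks; a sketch against the root path:

# Per-file block report, including each file's replication status
[sys@localhost ~]$ hdfs fsck / -files -blocks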

But the replication factor should lie between the dfs.replication.min and dfs.replication.max values.
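
The effective default and the configured bounds can be read back from the client configuration with hdfs getconf. A minimal sketch; note that the exact key for the lower bound depends on the Hadoop version (newer releases use dfs.namenode.replication.min in place of the deprecated dfs.replication.min):

# Default replication applied to newly created files
[sys@localhost ~]$ hdfs getconf -confKey dfs.replication
# Upper bound enforced by the NameNode
[sys@localhost ~]$ hdfs getconf -confKey dfs.replication.max
# Lower bound (key name varies by version)
[sys@localhost ~]$ hdfs getconf -confKey dfs.namenode.replication.min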

That concludes this article on whether HDFS can specify a replication factor per file to improve availability. I hope the answer above is helpful.
