Problem Description
I am currently running a cluster with 2 nodes. One node is master/slave and the other is just a slave. I have a file, and I set the block size to half the size of that file. Then I run
hdfs dfs -put file /
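For reference, one way to set a per-file block size at write time is the -D generic option; the 64 MB value below is purely illustrative (HDFS requires the block size to be a multiple of dfs.bytes-per-checksum, 512 bytes by default):

hdfs dfs -D dfs.blocksize=67108864 -put file /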
The file gets copied to HDFS without a problem, but when I check the HDFS web UI, I see that both blocks that were created are on one datanode (the blocks are on the datanode where I used the -put command). I even tried to call the balancer script, but both blocks are still on the same datanode.
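Block placement can also be confirmed from the command line; a quick check (with /file standing in for the uploaded file's path):

hdfs fsck /file -files -blocks -locations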
I need the data to be spread out as evenly as possible across all nodes.
Am I missing something here?
Recommended Answer
As the hdfs dfs -ls output shows, your replication factor is set to 1, so there is no compelling reason for HDFS to distribute the data blocks across the datanodes. In fact, the default block placement policy writes the first (and here, only) replica of each block to the local datanode when the client runs on one, which is why both blocks ended up on the node where you ran -put, and why the balancer leaves them there.
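For reference, the replication factor is the second column of hdfs dfs -ls output; a sketch of what that looks like (file name, owner, size, and date are made up for illustration):

$ hdfs dfs -ls /file
-rw-r--r--   1 hdfs supergroup  134217728 2024-01-01 12:00 /file

A 1 in that column means only one copy of each block exists anywhere in the cluster.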
You need to increase the replication factor to at least 2 to get what you expect, e.g.:
hdfs dfs -setrep 2 /input/data1.txt
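Alternatively, the replication factor can be set at write time so the blocks are replicated across nodes from the start (again using the -D generic option; the path is illustrative):

hdfs dfs -D dfs.replication=2 -put file /

The cluster-wide default can also be raised by setting the dfs.replication property in hdfs-site.xml.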