This article describes how to deal with Hadoop HDFS not distributing data blocks evenly across datanodes. It should serve as a useful reference for anyone running into the same problem.

Problem Description

I am currently running a cluster with 2 nodes. One node is master/slave and the other is just a slave. I have a file, and I set the block size to half the size of that file. Then I do

hdfs dfs -put file /
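The question does not show how the block size was set. As a hedged illustration, one way is to pass dfs.blocksize as a generic option on the upload itself; the value 67108864 (64 MB, in bytes) here is only a placeholder:

# upload with an explicit per-file block size (dfs.blocksize is given in bytes)
hdfs dfs -D dfs.blocksize=67108864 -put file /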

The file gets copied to HDFS with no problem, but when I check the HDFS web UI, I see that both of the blocks that were created are on a single datanode (the blocks are on the datanode where I ran the -put command). I even tried to call the balancer script, but both blocks are still on the same datanode.
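One way to confirm where the blocks actually ended up is fsck. Assuming the local file is named file and was uploaded to / as above (so it lives at /file), something along these lines lists each block and the datanode(s) holding it:

# show the file's blocks and which datanodes hold each replica
hdfs fsck /file -files -blocks -locations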

I need the data to be spread out as evenly as possible across all nodes.

Am I missing something here?

Recommended Answer

As the hdfs dfs -ls output shows, your replication factor is set to 1, so there is no compelling reason for HDFS to distribute the data blocks across the datanodes.

You need to increase the replication level to at least 2 to get what you expect, e.g.:

hdfs dfs -setrep 2 /input/data1.txt
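To verify the change, the replication factor appears in the second column of hdfs dfs -ls output, and fsck should report a replica on the second datanode once re-replication has finished. A quick check might look like this (the path /input/data1.txt is taken from the setrep example above):

# second column of the listing shows the current replication factor
hdfs dfs -ls /input/data1.txt
# confirm that each block now has a replica on the other datanode
hdfs fsck /input/data1.txt -files -blocks -locations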

That concludes this article on Hadoop HDFS not distributing data blocks evenly. Hopefully the recommended answer above is helpful.
