Does hdfs put/moveFromLocal not distribute data across data nodes?

Question

I found a similar question, "Hadoop HDFS is not distributing blocks of data evenly", but my question concerns the case where the replication factor is 1.

I still want to understand why HDFS does not distribute file blocks evenly across the cluster nodes. This causes data skew from the start when I load such files and run DataFrame operations on them. Am I missing something?

Recommended answer

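Even with a replication factor of 1, files are still split into blocks of at most the HDFS block size (dfs.blocksize); only the last block of a file can be smaller. Block placement is best effort, AFAIK, not strictly balanced. With a replication factor of 3, the default policy puts the first replica on the writer's own node if the client runs on a datanode (otherwise on a random node), the second on a node in a different rack, and the third on another node in that same remote rack. With a replication factor of 1 only the first rule applies, so running put or moveFromLocal from a datanode will normally place every block of the file on that node.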
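
To make the point above concrete, here is a minimal Python sketch. It is a toy model, not HDFS code: the 128 MiB block size, the node names dn1..dn4, and the helpers num_blocks and place_blocks are all assumptions made for illustration, and the real placement policy also considers free space and node load.

```python
import random
from collections import Counter

BLOCK_SIZE = 128 * 1024 * 1024  # assumed dfs.blocksize (128 MiB)


def num_blocks(file_size, block_size=BLOCK_SIZE):
    """Number of blocks a file occupies; the last block may be partial."""
    return -(-file_size // block_size)  # ceiling division on integers


def place_blocks(n_blocks, datanodes, writer=None, seed=0):
    """Toy first-replica placement for replication factor 1.

    If the writer is itself a datanode, every block goes to it;
    otherwise each block goes to a randomly chosen datanode.
    """
    rng = random.Random(seed)
    placement = Counter()
    for _ in range(n_blocks):
        target = writer if writer in datanodes else rng.choice(datanodes)
        placement[target] += 1
    return placement


nodes = ["dn1", "dn2", "dn3", "dn4"]   # hypothetical datanode names
blocks = num_blocks(1024 ** 3)          # a 1 GiB file
local = place_blocks(blocks, nodes, writer="dn1")   # put run on a datanode
remote = place_blocks(blocks, nodes, writer=None)   # put run from an edge node
print(blocks, dict(local), dict(remote))
```

In this model a put issued on dn1 leaves all blocks on dn1, while a put from a machine that is not a datanode spreads them randomly, which is still not guaranteed to be even.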

You'll need to clarify how large your files are and where you are looking to check whether the data is being split.
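
One way to see where the blocks of a file actually live is hdfs fsck with the -files, -blocks and -locations options. The Python wrapper below is only a sketch: the helper names and the example path are hypothetical, and it does nothing if the hdfs CLI is not on the PATH.

```python
import shutil
import subprocess


def fsck_command(path):
    """Build the hdfs fsck invocation that lists each block and its datanodes."""
    return ["hdfs", "fsck", path, "-files", "-blocks", "-locations"]


def show_block_locations(path):
    """Run fsck if the hdfs CLI is installed; return its output, else None."""
    if shutil.which("hdfs") is None:
        return None
    result = subprocess.run(
        fsck_command(path), capture_output=True, text=True, check=False
    )
    return result.stdout


if __name__ == "__main__":
    # "/user/me/data.csv" is a placeholder path, not one from the question.
    print(" ".join(fsck_command("/user/me/data.csv")))
```

If every block in the report lists the same datanode, the file was most likely written from that node.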

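Note: not all file formats are splittable. For example, a gzip-compressed text file has to be read by a single task no matter how many blocks it spans, whereas uncompressed text or bzip2 can be split.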
