Problem description
Does the parameter "mapred.min.split.size" change the size of the blocks in which a file was previously written?
Suppose that, when starting my job, I pass the parameter "mapred.min.split.size" with a value of 134217728 (128 MB). Which of the following statements correctly describes what happens?
1 - Each map task processes the equivalent of 2 HDFS blocks (assuming each block is 64 MB);
2 - My input file (already stored in HDFS) is re-divided so that it occupies 128 MB blocks in HDFS.
The split size is calculated by the formula:
max(mapred.min.split.size, min(mapred.max.split.size, dfs.block.size))
In your case, with sizes in MB and mapred.max.split.size left at its default, that gives:
split size = max(128, min(Long.MAX_VALUE (default), 64)) = 128
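The calculation above can be checked with a small sketch. This is only an illustration of the formula in Python, not Hadoop code; the names compute_split_size, LONG_MAX and MB are made up for the example, and all sizes are taken in bytes.

```python
# Illustrative only: mirrors the formula quoted above, not Hadoop's own code.
LONG_MAX = 2**63 - 1   # Java's Long.MAX_VALUE, the default maximum split size
MB = 1024 * 1024


def compute_split_size(min_size, max_size, block_size):
    """Return max(min_size, min(max_size, block_size)); all sizes in bytes."""
    return max(min_size, min(max_size, block_size))


# Values from the question: minimum split size 134217728, block size 64 MB.
split_size = compute_split_size(134217728, LONG_MAX, 64 * MB)
print(split_size // MB)  # 128
```

Because the minimum (128 MB) is larger than the block size (64 MB), the outer max() picks the minimum, so the split size becomes 128 MB.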
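So, for the two statements above:
Each map task processes 2 HDFS blocks (assuming each block is 64 MB): True
My input file (already stored in HDFS) is re-divided so that it occupies 128 MB blocks in HDFS: False. Splits are a logical division made when the job reads its input; the blocks already stored in HDFS are not changed.
Note, however, that setting the minimum split size larger than the block size increases the split size at the cost of data locality, because the two blocks in a split may be stored on different nodes.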
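To make the effect concrete, here is a rough sketch for a hypothetical 1 GB input file (the file size is an assumption for illustration, not something from the question). It is simplified and ignores details such as how Hadoop treats a small final split.

```python
import math

MB = 1024 * 1024
block_size = 64 * MB     # HDFS block size assumed in the question
split_size = 128 * MB    # split size once the minimum is raised to 128 MB
file_size = 1024 * MB    # hypothetical input file, chosen for illustration

num_blocks = math.ceil(file_size / block_size)   # blocks stored in HDFS
num_splits = math.ceil(file_size / split_size)   # map tasks (simplified)
blocks_per_split = split_size // block_size      # blocks read by each map

print(num_blocks, num_splits, blocks_per_split)  # 16 8 2
```

The file still occupies 16 blocks of 64 MB in HDFS; only the number of map tasks drops to 8, each reading two blocks, one of which may have to come from another node.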