This article looks at the question "Huge block size in HDFS! How is the unused space accounted for?" and its answer, which should be a useful reference for anyone running into the same issue.

Problem Description

We all know that the block size in HDFS is quite large (64 MB or 128 MB) compared to the block size in traditional file systems. This is done to reduce the proportion of seek time relative to transfer time (improvements in transfer rate have been on a much larger scale than improvements in disk seek time, so the goal when designing a file system is always to reduce the number of seeks relative to the amount of data to be transferred). But this comes with the additional disadvantage of internal fragmentation (which is why traditional file system block sizes are not as large and are only on the order of a few KB, usually 4 KB or 8 KB).
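To make the seek-time-versus-transfer-time trade-off concrete, here is a rough back-of-the-envelope sketch in Python; the 10 ms seek time and 100 MB/s transfer rate are illustrative assumptions, not figures from the question.

# Back-of-the-envelope sketch: how block size affects the share of time spent seeking.
# Assumed illustrative figures (not from the question): 10 ms per seek, 100 MB/s transfer rate.
SEEK_TIME_S = 0.010           # one disk seek, in seconds
TRANSFER_RATE_BPS = 100e6     # sustained transfer rate, bytes per second

def seek_overhead(block_size_bytes):
    """Fraction of total time spent seeking when reading one full block."""
    transfer_time_s = block_size_bytes / TRANSFER_RATE_BPS
    return SEEK_TIME_S / (SEEK_TIME_S + transfer_time_s)

for size_mb in (4, 64, 128):
    frac = seek_overhead(size_mb * 1024 * 1024)
    print(f"{size_mb:>4} MB block -> seeking is {frac:.1%} of the read time")

With these assumptions, a 4 MB block spends roughly a fifth of its read time seeking, while a 128 MB block spends well under 1%, which is the motivation described above.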

I was going through the book Hadoop: The Definitive Guide and found it written somewhere that a file smaller than the block size of HDFS does not occupy the full block and does not account for the full block's space, but I couldn't understand how. Can somebody please shed some light on this?

Solution

@Abhishek: The block division in HDFS is just logically built over the physical blocks of the underlying filesystem (e.g. ext3/FAT). The filesystem is not physically divided into blocks of 64 MB or 128 MB (or whatever the block size may be). It is just an abstraction used to store the metadata in the NameNode. Since the NameNode has to load the entire metadata into memory, there is a limit on the number of metadata entries, which explains the need for a large block size.

Therefore, three 8 MB files stored on HDFS logically occupy 3 blocks (3 metadata entries in the NameNode) but physically occupy only 8 × 3 = 24 MB of space in the underlying filesystem.
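A minimal sketch of that accounting, assuming a 128 MB block size and ignoring replication; the helper function is purely illustrative, not an HDFS API:

import math

BLOCK_SIZE = 128 * 1024 * 1024      # assumed HDFS block size: 128 MB

def hdfs_accounting(file_sizes_bytes):
    """Return (logical blocks, i.e. NameNode entries; physical bytes used on DataNodes).

    A file smaller than the block size still costs one block entry in the NameNode,
    but on disk it only consumes its actual length (replication ignored here)."""
    blocks = sum(max(1, math.ceil(size / BLOCK_SIZE)) for size in file_sizes_bytes)
    physical = sum(file_sizes_bytes)
    return blocks, physical

files = [8 * 1024 * 1024] * 3        # the three 8 MB files from the answer
blocks, physical = hdfs_accounting(files)
print(f"{blocks} blocks (NameNode entries), {physical / 1024**2:.0f} MB on disk")
# prints: 3 blocks (NameNode entries), 24 MB on disk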

The large block size is there to make proper use of storage space while respecting the limit on the NameNode's memory.
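A rough sketch of why that memory limit matters, assuming a commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per block entry and a 1 TB dataset; both figures are assumptions for illustration, not from the answer:

import math

BYTES_PER_BLOCK_ENTRY = 150          # assumed rough rule of thumb for NameNode heap per block
DATASET_BYTES = 1 * 1024**4          # assumed 1 TB dataset, for illustration only

def namenode_block_memory(block_size_bytes):
    """Very rough NameNode heap needed just for the block entries of the dataset."""
    num_blocks = math.ceil(DATASET_BYTES / block_size_bytes)
    return num_blocks * BYTES_PER_BLOCK_ENTRY

for size_mb in (4, 128):
    mem = namenode_block_memory(size_mb * 1024 * 1024)
    print(f"{size_mb:>4} MB blocks -> about {mem / 1024**2:.1f} MB of heap for block metadata")

Under these assumptions, shrinking the block size from 128 MB to 4 MB multiplies the NameNode's block metadata by 32×, which is exactly the trade-off the answer points to.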

That concludes this article on "Huge block size in HDFS! How is the unused space accounted for?". We hope the answer above is helpful.
