
Question

I am trying to get the byte count of a specific file in an HDFS directory.

I tried to use fs.getFileStatus(), but I do not see any method for getting the byte count of the file; I can only see the getBlockSize() method.

Is there any way to get the byte count of a specific file in HDFS?

Answer

fs.getFileStatus() returns a FileStatus object which has a method getLen() that returns the "length of this file, in bytes." Maybe you should have a closer look at this: https://hadoop.apache.org/docs/r2.6.1/api/org/apache/hadoop/fs/FileStatus.html
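A minimal sketch of what that looks like in practice (the path is a placeholder, and the code assumes a Hadoop client configuration pointing at your cluster):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileLength {
    public static void main(String[] args) throws Exception {
        // Hypothetical path -- replace with your own file
        Path file = new Path("/user/hadoop/input/data.txt");

        // Picks up fs.defaultFS etc. from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // getLen() is the actual length of the file in bytes;
        // getBlockSize() is only the HDFS block size used to store it
        FileStatus status = fs.getFileStatus(file);
        System.out.println("bytes: " + status.getLen());
        System.out.println("block size: " + status.getBlockSize());
    }
}
```

Note the distinction the two methods draw: getBlockSize() tells you how the file is chunked for storage, while getLen() is the figure the question asks for.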

BUT be aware that file length and storage footprint work differently on HDFS than on a local filesystem. Files are organized in so-called data blocks, by default 128 MB each in Hadoop 2.x (64 MB in older Hadoop 1.x releases). A block only occupies as much disk space as the data it actually holds, but every file and block is tracked in the NameNode's memory, so if you deal with many small files (a well-known anti-pattern on HDFS) you may exhaust capacity sooner than you expect. See this link for more details:

https://hadoop.apache.org/docs/r2.6.1/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Data_Blocks
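If you just need the number from the command line rather than the Java API, the HDFS shell can report the same information (the path is a placeholder and a running cluster is assumed):

```shell
# %b prints the file length in bytes, %o the block size used to store it
hdfs dfs -stat "%b %o" /user/hadoop/input/data.txt

# Alternatively, -du lists the size in bytes of each file in a directory
hdfs dfs -du /user/hadoop/input/
```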
