This article explains how to count the number of files in a directory on HDFS from within an MR job. The answer below may be a useful reference for anyone facing the same problem.

Problem description



I'm new to Hadoop and Java for that matter. I'm trying to count the number of files in a folder on HDFS from the MapReduce driver I'm writing. I'd like to do this without calling the HDFS Shell as I want to be able to pass in the directory I use when I run the MapReduce job. I've tried a number of methods but have had no success in implementation due to my inexperience with Java.

Any help would be greatly appreciated.

Thanks,

Nomad.

Solution

You can just use the FileSystem API and iterate over the files inside the path. Here is some example code:

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Inside a driver that extends Configured, so getConf() is available:
int count = 0;
FileSystem fs = FileSystem.get(getConf());
boolean recursive = false; // set to true to also descend into subdirectories
RemoteIterator<LocatedFileStatus> ri = fs.listFiles(new Path("hdfs://my/path"), recursive);
while (ri.hasNext()) {
    ri.next();   // advance the iterator; we only need the count
    count++;
}
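The same iterate-and-count pattern can be tried locally without a running Hadoop cluster. The sketch below uses `java.nio.file` as a stand-in for the Hadoop `FileSystem` API (the class name `FileCounter` and the temp-directory setup are illustrative, not from the original answer); `listFiles` on HDFS only returns files, so the local version skips subdirectories to match.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileCounter {
    // Count the regular files directly inside a directory by walking an
    // iterator, mirroring the RemoteIterator loop used against HDFS.
    static long countFiles(Path dir) throws IOException {
        long count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    count++; // directories are skipped, matching listFiles()
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempDirectory("count-demo");
        Files.createFile(tmp.resolve("a.txt"));
        Files.createFile(tmp.resolve("b.txt"));
        Files.createDirectory(tmp.resolve("sub")); // not counted
        System.out.println(countFiles(tmp));       // prints 2
    }
}
```

In the real driver the directory would come in as a job argument (e.g. `args[0]`) and be wrapped in `new Path(...)`, which is exactly why the HDFS shell is not needed.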

This concludes the article on counting the number of files in HDFS from an MR job. We hope the answer above is helpful.
