How do I count the number of lines in a file using hdfs commands?

I have a file on HDFS (testfile) and I want to know how many lines it has.

In Linux, I can do:

wc -l <filename>

Can I do something similar with the "hadoop fs" command? I can print the file contents with:

hadoop fs -text /user/mklein/testfile

How can I find out how many lines it has? I want to avoid copying the file to the local filesystem and then running the wc command.

Note: my file is Snappy-compressed, which is why I have to use -text instead of -cat.

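You cannot do this with a "hadoop fs" command alone: none of its subcommands counts lines. You have to use the approach in this post, or this Pig script would help: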

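-- 'file' is a placeholder for the HDFS path; no schema is needed just to count
A = LOAD 'file' USING PigStorage();
B = GROUP A ALL;
-- COUNT_STAR counts every tuple; COUNT would skip tuples whose first field is null
cnt = FOREACH B GENERATE COUNT_STAR(A);
DUMP cnt;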
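As an alternative to the Pig script, if it is acceptable to stream the decompressed contents through a single client, the two commands from the question can be combined in a shell pipeline. `hadoop fs -text` writes to stdout, so nothing is copied to the local filesystem. This is a minimal sketch assuming bash and a `hadoop` client on the PATH; the helper name `count_hdfs_lines` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: count the lines of an HDFS file by streaming it through the client.
# count_hdfs_lines is a hypothetical helper name; assumes bash and a `hadoop`
# client on the PATH.
count_hdfs_lines() (
  # The ( ... ) body runs in a subshell, so pipefail does not leak to the caller.
  # With pipefail, a failing `hadoop` makes the function fail instead of
  # silently reporting a wrong count.
  set -o pipefail
  # -text decompresses the Snappy-compressed file (as in the question) and
  # streams plain text to stdout; nothing is written to the local filesystem.
  # wc -l counts newline characters; tr strips the padding some wc versions add.
  hadoop fs -text "$1" | wc -l | tr -d ' '
)

# Usage, with the path from the question:
# count_hdfs_lines /user/mklein/testfile
```

All of the data passes through one machine here, so for very large files the Pig script, which runs as a distributed job, may be faster. Also note that wc -l counts newline characters, so a last line without a trailing newline is not counted.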

这篇关于如何使用 hdfs 命令计算文件中的行数?的文章就介绍到这了，希望我们推荐的答案对大家有所帮助，也希望大家多多支持！