Question
I have a file (testfile) on HDFS, and I want to know how many lines it contains.
On Linux, I can do:
wc -l <filename>
Can I do something similar with the "hadoop fs" command? I can print the file contents with:
hadoop fs -text /user/mklein/testfile
How do I find out how many lines it has? I want to avoid copying the file to the local filesystem and then running the wc command.
Note: my file is compressed with Snappy, which is why I have to use -text instead of -cat.
Recommended answer
You cannot do it with a hadoop fs command alone. Either write MapReduce code with the logic explained in this post, or the following Pig script will help:
A = LOAD 'file' USING PigStorage() AS (...);  -- load every record
B = GROUP A ALL;                              -- collapse all records into one group
cnt = FOREACH B GENERATE COUNT(A);            -- count the records in that group
DUMP cnt;                                     -- print the result
Make sure your Snappy file has the correct extension so that Pig can detect and read it.
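That said, if a shell pipeline is acceptable, a common workaround is to stream the file through hadoop fs -text and pipe the decompressed output straight into wc -l, so the file is never written to the local filesystem. A minimal sketch, assuming a working hadoop client on PATH and using the example path from the question:

```shell
# Count lines in an HDFS file without copying it locally:
# -text decompresses Snappy on the fly, wc -l counts the lines.
# (Guarded so the sketch is harmless where no hadoop client exists.)
if command -v hadoop >/dev/null; then
    hadoop fs -text /user/mklein/testfile | wc -l
fi

# The pipe itself is ordinary shell plumbing; a local illustration:
printf 'line1\nline2\nline3\n' | wc -l    # counts 3 lines
```

This streams the data once through the client, which is usually fine for a quick check; for very large files or repeated counts, the Pig (or MapReduce) route above runs the count on the cluster instead.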