How do I get the schema / column names from a Parquet file?

Question

I have a file stored in HDFS as part-m-00000.gz.parquet

I tried running hdfs dfs -text dir/part-m-00000.gz.parquet, but the output is compressed. I then ran gunzip part-m-00000.gz.parquet, but that doesn't decompress the file because gunzip doesn't recognise the .parquet extension.

How do I get the schema / column names for this file?

Answer

You won't be able to "open" the file using hdfs dfs -text because it's not a text file. Parquet files are written to disk very differently from text files.
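One way to see why neither hdfs dfs -text nor gunzip helps is to look at the first bytes of the file. A Parquet file begins with the 4-byte magic PAR1, whereas a gzip stream begins with the bytes 0x1f 0x8b; the .gz in the file name presumably refers to the codec used for the data inside the file, not to the file as a whole. The following is a minimal sketch, assuming a local copy of the file; sniff_format is a hypothetical helper name, not part of any tool mentioned here.

```python
PARQUET_MAGIC = b"PAR1"    # first 4 bytes of an (unencrypted) Parquet file
GZIP_MAGIC = b"\x1f\x8b"   # first 2 bytes of a gzip stream


def sniff_format(path):
    """Guess the container format of a file from its leading bytes."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head == PARQUET_MAGIC:
        return "parquet"
    if head[:2] == GZIP_MAGIC:
        return "gzip"
    return "unknown"
```

If sniff_format reports "parquet" for part-m-00000.gz.parquet, the file needs a Parquet reader rather than a decompression tool.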

For exactly this kind of task, the Parquet project provides parquet-tools, which lets you open a file and inspect its schema, data, metadata, etc.

Check out the parquet-tools project (which is, put simply, a jar file).

Cloudera, which supports and contributes heavily to Parquet, also has a nice page with examples of parquet-tools usage. An example from that page for your use case is:

parquet-tools schema part-m-00000.parquet
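If parquet-tools is not at hand, the column names can also be read from a script. This is only a sketch of an alternative, not part of the original answer: it assumes the pyarrow library is installed and that the file has first been copied out of HDFS to a local path. column_names is a hypothetical helper name.

```python
def column_names(path):
    """Return the top-level column names of a local Parquet file."""
    # Assumption: pyarrow is installed. It is imported lazily so that the
    # helper can be defined even on a machine where it is missing.
    import pyarrow.parquet as pq

    # read_schema reads the schema from the file metadata, not the row data.
    schema = pq.read_schema(path)
    return schema.names
```

For example, after hdfs dfs -get dir/part-m-00000.gz.parquet . has copied the file locally, column_names("part-m-00000.gz.parquet") would return the column names as a list of strings.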

See the Cloudera page: Using the Parquet File Format with Impala, Hive, Pig, HBase, and MapReduce

