Problem description
We are storing zip files that contain XML files in HDFS. We need to be able to programmatically unzip a file and stream out the contained XML files, using Java. FileSystem.open returns an FSDataInputStream, but the ZipFile constructors only take a File or a String as parameters. I really don't want to have to use FileSystem.copyToLocalFile.
Is it possible to stream the contents of a zip file stored in HDFS without first copying the zip file to the local file system? If so, how?
Recommended answer
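Hi, please find the sample code below. It wraps the FSDataInputStream returned by FileSystem.open in a ZipInputStream, so the archive is read straight from HDFS without a local copy.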
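import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Reads every file entry of a zip stored in HDFS into memory, keyed by entry name.
public static Map<String, byte[]> loadZipFileData(String hdfsFilePath) throws IOException {
    // Assumption: the Configuration found on the classpath points at the target HDFS cluster.
    FileSystem fileSystem = FileSystem.get(new Configuration());
    Map<String, byte[]> listOfFiles = new LinkedHashMap<>();
    byte[] buf = new byte[1024];
    // try-with-resources closes the zip stream and the HDFS stream underneath it.
    try (ZipInputStream zipInputStream = readZipFileFromHDFS(fileSystem, new Path(hdfsFilePath))) {
        ZipEntry zipEntry;
        while ((zipEntry = zipInputStream.getNextEntry()) != null) {
            if (!zipEntry.isDirectory()) {
                ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
                int bytesRead;
                // read() returns -1 at the end of the current entry, not the end of the archive.
                while ((bytesRead = zipInputStream.read(buf, 0, buf.length)) > -1) {
                    outputStream.write(buf, 0, bytesRead);
                }
                listOfFiles.put(zipEntry.getName(), outputStream.toByteArray());
            }
            zipInputStream.closeEntry();
        }
    }
    return listOfFiles;
}

// Opens the HDFS file and wraps it in a ZipInputStream; nothing is copied to local disk.
protected static ZipInputStream readZipFileFromHDFS(FileSystem fileSystem, Path path) throws IOException {
    if (!fileSystem.exists(path)) {
        throw new IllegalArgumentException(path.getName() + " does not exist");
    }
    FSDataInputStream fsInputStream = fileSystem.open(path);
    return new ZipInputStream(fsInputStream);
}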
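The sample above collects every entry into a byte array, which may be a problem for large XML files. Since the question asks to stream the entries out, here is a minimal sketch of a variant that hands each entry to a callback as it is read. It uses only java.util.zip and accepts any InputStream, so an in-memory archive stands in for HDFS here. The class ZipStreamSketch, the EntryHandler interface, and the helpers readAll and sampleZip are hypothetical names made up for this illustration; they are not part of Hadoop or the JDK.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class ZipStreamSketch {

    /** Hypothetical callback, not part of Hadoop or the JDK. */
    public interface EntryHandler {
        // entryData starts at the first byte of the entry; the handler must not close it.
        void handle(String entryName, InputStream entryData) throws IOException;
    }

    /** Calls handler once per file entry, in archive order; returns the number of files seen. */
    public static int forEachEntry(InputStream source, EntryHandler handler) throws IOException {
        int files = 0;
        try (ZipInputStream zip = new ZipInputStream(source)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (!entry.isDirectory()) {
                    // The zip stream itself serves as the entry's data stream.
                    handler.handle(entry.getName(), zip);
                    files++;
                }
                // The next getNextEntry() call skips whatever the handler left unread.
            }
        }
        return files;
    }

    /** Reads the rest of the current entry into a string (demo helper only). */
    public static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /** Builds a tiny in-memory archive that stands in for the zip file in HDFS. */
    public static byte[] sampleZip() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("docs/"));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("docs/a.xml"));
            zip.write("<a/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("docs/b.xml"));
            zip.write("<b/>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        InputStream source = new ByteArrayInputStream(sampleZip());
        int files = forEachEntry(source, (name, data) ->
                System.out.println(name + ": " + readAll(data)));
        System.out.println(files + " file entries");
    }
}
```

Against HDFS, the source argument would be the stream returned by fileSystem.open(path), since FSDataInputStream is an InputStream. A real handler would probably pass entryData to an XML parser rather than read it into a string; that part is left out of the sketch.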