This article describes how to handle uploading zip files to an HDFS sink with Flume; it may be a useful reference if you are dealing with the same problem.

Problem description

I am new to Flume. My Flume agent uses an HTTP server as its source, from which it receives zip files (compressed XML files) at regular intervals. The zip files are small (less than 10 MB), and I want to extract them and write their contents to the HDFS sink. Please share some ideas on how to do this. Do I have to write a custom interceptor?

Recommended answer

Flume will try to read your files line by line, unless you configure a specific deserializer. A deserializer lets you control how a file is parsed and split into events. You could of course follow the example of the blob deserializer, which is designed for PDFs and similar binary files, but as I understand it you actually want to unpack the zip files and then read their contents line by line. In that case you would need to write a custom deserializer that reads the zip archive and emits one event per line, as sketched below.
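
The following is a minimal sketch of such a custom deserializer, assuming Flume 1.x's org.apache.flume.serialization API. The class and package names (com.example.flume.ZipLineDeserializer) are invented for illustration, and the mark()/reset() handling is deliberately simplified; it marks the underlying compressed stream rather than the uncompressed position, so real recovery semantics would need more bookkeeping.

```java
// Hypothetical sketch of a custom deserializer that unpacks a zip archive
// and produces one Flume event per text line.
package com.example.flume;

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;
import org.apache.flume.serialization.EventDeserializer;
import org.apache.flume.serialization.ResettableInputStream;

public class ZipLineDeserializer implements EventDeserializer {

  private final ResettableInputStream in;
  private final ZipInputStream zip;
  private ZipEntry currentEntry;
  private boolean open = true;

  ZipLineDeserializer(Context context, ResettableInputStream in) {
    this.in = in;
    // Adapt Flume's ResettableInputStream to java.io.InputStream so it can
    // be wrapped by ZipInputStream.
    this.zip = new ZipInputStream(new InputStream() {
      @Override public int read() throws IOException { return in.read(); }
      @Override public int read(byte[] b, int off, int len) throws IOException {
        return in.read(b, off, len);
      }
    });
  }

  @Override
  public Event readEvent() throws IOException {
    String line = readLine();
    return line == null ? null : EventBuilder.withBody(line, StandardCharsets.UTF_8);
  }

  @Override
  public List<Event> readEvents(int numEvents) throws IOException {
    List<Event> events = new ArrayList<>();
    for (int i = 0; i < numEvents; i++) {
      Event event = readEvent();
      if (event == null) break;
      events.add(event);
    }
    return events;
  }

  // Reads the next text line, walking through every entry of the archive.
  private String readLine() throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    while (true) {
      if (currentEntry == null) {
        currentEntry = zip.getNextEntry();
        if (currentEntry == null) {               // no more entries
          return buf.size() == 0 ? null : buf.toString("UTF-8");
        }
      }
      int b = zip.read();
      if (b == -1) {                              // end of the current entry
        currentEntry = null;
        if (buf.size() > 0) return buf.toString("UTF-8");
        continue;
      }
      if (b == '\n') return buf.toString("UTF-8");
      if (b != '\r') buf.write(b);
    }
  }

  @Override
  public void mark() throws IOException { in.mark(); }   // simplified

  @Override
  public void reset() throws IOException { in.reset(); } // simplified

  @Override
  public void close() throws IOException {
    if (open) {
      zip.close();
      open = false;
    }
  }

  /** Builder class that Flume instantiates from the configuration. */
  public static class Builder implements EventDeserializer.Builder {
    @Override
    public EventDeserializer build(Context context, ResettableInputStream in) {
      return new ZipLineDeserializer(context, in);
    }
  }
}
```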

Here is the relevant reference in the documentation:

https://flume.apache.org/FlumeUserGuide.html#event-deserializers
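
For completeness, here is a hypothetical flume.conf fragment showing how such a deserializer could be wired in. Note that deserializers are configured on sources such as the spooling directory source (the HTTP source uses handlers instead), so this sketch assumes the zip files are dropped into a spool directory; every agent, source, channel, sink name and path below is made up for illustration.

```properties
# Hypothetical agent layout; all names and paths are examples.
agent1.sources = zipSrc
agent1.channels = memCh
agent1.sinks = hdfsSink

# Spooling directory source pointing at the custom deserializer's Builder
agent1.sources.zipSrc.type = spooldir
agent1.sources.zipSrc.spoolDir = /var/flume/incoming-zips
agent1.sources.zipSrc.deserializer = com.example.flume.ZipLineDeserializer$Builder
agent1.sources.zipSrc.channels = memCh

agent1.channels.memCh.type = memory

# HDFS sink writing the extracted lines as plain text
agent1.sinks.hdfsSink.type = hdfs
agent1.sinks.hdfsSink.hdfs.path = hdfs://namenode:8020/flume/zip-events
agent1.sinks.hdfsSink.hdfs.fileType = DataStream
agent1.sinks.hdfsSink.hdfs.writeFormat = Text
agent1.sinks.hdfsSink.channel = memCh
```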

