Question

I have configured a Flume agent for my application, with a spooldir source and an HDFS sink.

The agent collects the files into HDFS successfully.

The agent configuration is:

agent.sources = src-1
agent.channels = c1
agent.sinks = k1

agent.sources.src-1.type = spooldir
agent.sources.src-1.channels = c1
agent.sources.src-1.spoolDir = /home/Documents/id/
agent.sources.src-1.deserializer = org.apache.flume.sink.solr.morphline.BlobDeserializer$Builder
agent.sources.src-1.fileHeader = true
agent.sources.src-1.basenameHeader = true
agent.sources.src-1.basenameHeaderKey = basename

agent.channels.c1.type = file

agent.sinks.k1.type = hdfs
agent.sinks.k1.channel = c1
agent.sinks.k1.hdfs.path = hdfs://localhost:8020/user/flume/events/
agent.sinks.k1.hdfs.filePrefix = %{basename}
agent.sinks.k1.hdfs.fileHeader = true
agent.sinks.k1.hdfs.fileType = DataStream

The files written to HDFS are named in the following format:

Is it possible to remove the timestamp (1411543838171), the unique number that is generated automatically for each event, from the file name?

Recommended answer

It does not seem possible to remove the timestamp through configuration alone. If you look at how the HDFS sink builds the file name, you will find the following:

long counter = fileExtensionCounter.incrementAndGet();
String fullFileName = fileName + "." + counter;

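Here fileExtensionCounter is initialized as follows:

fileExtensionCounter = new AtomicLong(clock.currentTimeMillis());

Because the counter starts at the current time in milliseconds and is incremented for every new file, the number appended to the file name looks like a timestamp.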

You can check the source code of the sink (HDFSEventSink) and of the writer (BucketWriter) for the details.
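
For orientation, here is a minimal sketch of the parts of the file name that configuration does control. hdfs.filePrefix and hdfs.fileSuffix are documented HDFS sink properties. The .log value is an assumption chosen only for illustration. No property corresponds to the counter that sits between the prefix and the suffix.

```properties
# Text placed before the counter; here the basename header set by the spooldir source
agent.sinks.k1.hdfs.filePrefix = %{basename}
# Text appended after the counter (the period is not added automatically).
# ".log" is a hypothetical value, not part of the original question.
agent.sinks.k1.hdfs.fileSuffix = .log
# Resulting name: <basename>.<counter>.log
# The counter itself cannot be removed through configuration.
```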

If what you want is to put more events into a single file, have a look at these sink properties:

  • rollInterval
  • rollSize
  • rollCount
  • batchSize
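
As a rough sketch, these properties might be set as follows for the k1 sink from the question. The values are illustrative assumptions, not recommendations. In the Flume HDFS sink the time-based roll property is named hdfs.rollInterval (in seconds), and setting a roll property to 0 disables that trigger.

```properties
# Roll to a new file every 10 minutes (value in seconds); 0 disables time-based rolling
agent.sinks.k1.hdfs.rollInterval = 600
# Roll once the file reaches about 128 MB (value in bytes); 0 disables size-based rolling
agent.sinks.k1.hdfs.rollSize = 134217728
# 0 disables rolling based on the number of events written
agent.sinks.k1.hdfs.rollCount = 0
# Number of events written to the file before it is flushed to HDFS
agent.sinks.k1.hdfs.batchSize = 100
```

With settings like these, fewer and larger files are produced, but each file that is rolled still carries the numeric counter in its name.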
