My regex fails in the File Filter property of the NiFi GetFile processor

Question

I have a list of files to copy to HDFS.

The file names are as follows:

  1. Sample-11072016
  2. Sample-11082016
  3. Sample-11062016
  4. Sample-11062016
  5. Denodo-09082016
  6. Denodo-09122016
  7. Denodo-11082016
  8. Denodo-11072016

Now I am trying to write a regex that picks up today's Sample file. The digits following the file name are dates.

The regex I tried is [Sample]-(0-9){8}. Because I am only checking for 8 digits, it would return the Sample files for every date. Could you please suggest how to find the file with today's date? The problem is that the file name prefix, Sample, stays constant while the date keeps changing. I have to write a regex that picks up only the file with today's date.

I am pretty new to regex. Is it possible to write a regex that checks whether the date is today's date?

Any suggestions would help. NiFi regex rules are the same as Java regex rules. The regex should be used against the File Filter property of the GetFile processor.

Regards,

Sai_PB.

Recommended answer

You're almost there on the regex. By putting "Sample" between square brackets ('[' and ']'), you're saying "a single character here should match any one of these characters" (S, a, m, p, l, or e). Here is a link that explains it a bit more in depth (see the "Character Classes" section).

Also, by putting "0-9" in parentheses, you're saying "capture this group that matches the characters '0-9' exactly", that is, the literal text 0-9 rather than a digit. Here is where you want the square brackets.

So the regex you should be using is "Sample-[0-9]{8}" (you can use "\d" instead of "[0-9]", but I wanted to keep as much of your initial regex as possible).
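To make the difference concrete, here is a minimal Java sketch that runs both patterns over a few names. The class name FileFilterRegexDemo and the helper matchesWholeName are made up for illustration, and the sketch assumes the filter has to match the whole file name (it uses Matcher.matches()); it is not taken from NiFi itself.

```java
import java.util.List;
import java.util.regex.Pattern;

public class FileFilterRegexDemo {
    // Pattern from the question: one character out of {S,a,m,p,l,e}, a hyphen,
    // then the literal text "0-9" repeated eight times.
    static final Pattern ORIGINAL = Pattern.compile("[Sample]-(0-9){8}");

    // Corrected pattern from the answer: the word "Sample", a hyphen, eight digits.
    static final Pattern FIXED = Pattern.compile("Sample-[0-9]{8}");

    // Assumption: the whole file name must match, so matches() is used, not find().
    static boolean matchesWholeName(Pattern pattern, String fileName) {
        return pattern.matcher(fileName).matches();
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "Sample-11072016",
                "Sample-11082016",
                "Denodo-11082016",
                // A contrived name that the ORIGINAL pattern does accept.
                "e-" + "0-9".repeat(8));

        for (String name : names) {
            System.out.println(name
                    + "  original=" + matchesWholeName(ORIGINAL, name)
                    + "  fixed=" + matchesWholeName(FIXED, name));
        }
    }
}
```

In this sketch the corrected pattern accepts the two Sample names and rejects the Denodo one, while the original pattern accepts only the contrived name built from the literal text "0-9".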

You can test your regex using this website.

To solve the second problem, picking up only the current day's file, you should be able to use the above regex as the File Filter. Then adjust the "Scheduling Strategy" so the processor runs once a day (after the file for that day is expected to have been written). Lastly, set the "Maximum File Age" to "24h" (adjust as necessary to be sure only the latest file qualifies). With these settings the processor runs once per day and picks up only a file that matches the filter and is no more than a day old.
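The combined effect of the name filter and the age limit can be sketched in plain Java as follows. This is only an illustration of the logic described above, not how GetFile is implemented; the class name RecentSampleFilter, the method accept, and the use of the file's last-modified time as its age are assumptions.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.regex.Pattern;

public class RecentSampleFilter {
    // Same pattern as the File Filter suggested in the answer.
    static final Pattern FILE_FILTER = Pattern.compile("Sample-[0-9]{8}");

    // Mirrors a "Maximum File Age" of 24h.
    static final Duration MAX_AGE = Duration.ofHours(24);

    // Accept a file only if its name matches the filter and it was
    // modified no more than MAX_AGE before "now".
    static boolean accept(String fileName, Instant lastModified, Instant now) {
        boolean nameMatches = FILE_FILTER.matcher(fileName).matches();
        Duration age = Duration.between(lastModified, now);
        return nameMatches && age.compareTo(MAX_AGE) <= 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        // Written two hours ago: matches the name filter and is recent enough.
        System.out.println(accept("Sample-11082016", now.minus(Duration.ofHours(2)), now));
        // Written thirty hours ago: name matches, but the file is too old.
        System.out.println(accept("Sample-11072016", now.minus(Duration.ofHours(30)), now));
        // Recent, but the name does not match the filter.
        System.out.println(accept("Denodo-11082016", now.minus(Duration.ofHours(2)), now));
    }
}
```

Note that this approach selects by when the file was written rather than by the date in its name, so it relies on each day's file actually being written on that day.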

11-03 09:45