Problem description

I need to modify a CSV file in an Apache NiFi environment.

My CSV file looks like this:

Advertiser ID,Campaign Start Date,Campaign End Date,Campaign Name
10730729,1/29/2020 3:00:00 AM,2/20/2020 3:00:00 AM,Nestle
40376079,2/1/2020 3:00:00 AM,4/1/2020 3:00:00 AM,Heinz
...

I want to convert the dates with AM/PM values to a simple date format: from 1/29/2020 3:00:00 AM to 2020-01-29, for every row. I read about the UpdateRecord processor, but there is a problem. As you can see, the CSV headers contain spaces, and I can't parse these fields with either Replacement Value Strategy (Literal or Record Path).

Any ideas on how to solve this problem? Maybe I should somehow rename the headers, for example from Advertiser ID to advertiser_id?

Recommended answer

You don't need to make the transformation yourself; you can let your Readers and Writers handle it for you. To get the CSV Reader to recognize dates, though, you will need to define a schema for your rows. Your schema would look something like this (I've removed the spaces from the column names because they are not allowed in Avro field names):

{
    "type": "record",
    "name": "ExampleCSV",
    "namespace": "Stackoverflow",
    "fields": [
        {"name": "AdvertiserID", "type": "string"},
        {"name": "CampaignStartDate", "type" : {"type": "long", "logicalType" : "timestamp-micros"}},
        {"name": "CampaignEndDate", "type" : {"type": "long", "logicalType" : "timestamp-micros"}},
        {"name": "CampaignName", "type": "string"}
    ]
}
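The Avro specification requires a name to start with a letter or underscore and to contain only letters, digits, and underscores, which is why the spaces had to go. As a rough illustration only (plain Python, not NiFi code; the helper `to_avro_name` is a name made up for this sketch), the headers from the question could be normalized and checked like this:

```python
import re

# Pattern for a valid Avro name, as described in the Avro specification.
AVRO_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def to_avro_name(header: str) -> str:
    """Hypothetical helper: drop characters that Avro does not allow in a name."""
    cleaned = re.sub(r"[^A-Za-z0-9_]", "", header)
    # A name may not start with a digit, so prefix an underscore if needed.
    if cleaned and cleaned[0].isdigit():
        cleaned = "_" + cleaned
    return cleaned


headers = ["Advertiser ID", "Campaign Start Date", "Campaign End Date", "Campaign Name"]
field_names = [to_avro_name(h) for h in headers]

for original, name in zip(headers, field_names):
    print(f"{original!r} -> {name!r} (valid: {bool(AVRO_NAME.match(name))})")
```

The resulting names match the ones used in the schema above. A snake_case variant such as advertiser_id would pass the same check.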

To configure the reader, set the following properties:

  • Schema Access Strategy = Use 'Schema Text' Property
  • Schema Text = (the schema in the code block above)
  • Treat First Line as Header = True
  • Timestamp Format = "MM/dd/yyyy hh:mm:ss a"

Additionally, if you don't want to or are unable to change the upstream system to remove the spaces, you can set this property to ignore the header of the CSV:

  • Ignore CSV Header Column Names = True
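To make the reader settings more concrete, here is a small Python sketch of what they amount to: the header row is skipped, each column is matched to a schema field by position, and the two date columns are parsed as timestamps. This is an illustration only, not NiFi code. `SCHEMA_FIELDS`, `TIMESTAMP_FIELDS`, and `read_rows` are names invented for the sketch, and the `strptime` pattern is a Python approximation of the Java-style pattern "MM/dd/yyyy hh:mm:ss a".

```python
import csv
import io
from datetime import datetime

# Field names taken from the schema; matched to CSV columns by position.
SCHEMA_FIELDS = ["AdvertiserID", "CampaignStartDate", "CampaignEndDate", "CampaignName"]
TIMESTAMP_FIELDS = ("CampaignStartDate", "CampaignEndDate")
# Python counterpart (an approximation) of "MM/dd/yyyy hh:mm:ss a".
INPUT_FORMAT = "%m/%d/%Y %I:%M:%S %p"

SAMPLE = """Advertiser ID,Campaign Start Date,Campaign End Date,Campaign Name
10730729,1/29/2020 3:00:00 AM,2/20/2020 3:00:00 AM,Nestle
40376079,2/1/2020 3:00:00 AM,4/1/2020 3:00:00 AM,Heinz
"""


def read_rows(text):
    """Yield one dict per data row, ignoring the header's own column names."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # the first line is a header, and its names are ignored
    for values in reader:
        record = dict(zip(SCHEMA_FIELDS, values))
        for name in TIMESTAMP_FIELDS:
            record[name] = datetime.strptime(record[name], INPUT_FORMAT)
        yield record


records = list(read_rows(SAMPLE))
for record in records:
    print(record)
```

Because the columns are matched by position, the spaces in the original header never have to be parsed.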

Then, in your CSVRecordSetWriter service, you can specify the following:

  • Schema Access Strategy = Inherit Record Schema
  • Timestamp Format = "yyyy-MM-dd"

You can use UpdateRecord or ConvertRecord (or any other processor that lets you specify both a reader and a writer), and it will do the conversion for you. The difference between the two is that UpdateRecord requires you to specify a user-defined property, so if this is the only change you will make, just use ConvertRecord. If you have other transformations, use UpdateRecord and make those changes at the same time.

Caveat: This will rewrite the file using the new column names (in my example, the ones without spaces), so keep that in mind for downstream usage.
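To show what downstream consumers should expect, the following Python sketch mimics the overall effect of the reader and writer pair on the sample rows from the question. It is a stand-in for illustration, not NiFi code: `convert`, `OUTPUT_HEADER`, and `DATE_COLUMNS` are hypothetical names, and the `strptime`/`strftime` patterns are Python approximations of the two Timestamp Format settings.

```python
import csv
import io
from datetime import datetime

# Header written to the output: the schema's field names, without spaces.
OUTPUT_HEADER = ["AdvertiserID", "CampaignStartDate", "CampaignEndDate", "CampaignName"]
DATE_COLUMNS = (1, 2)  # positions of the start and end date columns


def convert(text: str) -> str:
    """Re-emit the CSV with renamed headers and dates as yyyy-MM-dd."""
    rows = csv.reader(io.StringIO(text))
    next(rows)  # drop the original header that contains spaces
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(OUTPUT_HEADER)
    for row in rows:
        for i in DATE_COLUMNS:
            parsed = datetime.strptime(row[i], "%m/%d/%Y %I:%M:%S %p")
            row[i] = parsed.strftime("%Y-%m-%d")
        writer.writerow(row)
    return out.getvalue()


source = (
    "Advertiser ID,Campaign Start Date,Campaign End Date,Campaign Name\n"
    "10730729,1/29/2020 3:00:00 AM,2/20/2020 3:00:00 AM,Nestle\n"
    "40376079,2/1/2020 3:00:00 AM,4/1/2020 3:00:00 AM,Heinz\n"
)
result = convert(source)
print(result)
```

The first line of the output carries the new column names, so anything downstream that looks up "Advertiser ID" by name would need to be updated.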
