
Question

How do you parse a CSV file using gawk? Simply setting FS="," is not enough, because a quoted field with a comma inside it will be treated as multiple fields.

Example using FS=",", which does not work:

File contents:

one,two,"three, four",five
"six, seven",eight,"nine"

gawk script:

BEGIN { FS="," }
{
  for (i=1; i<=NF; i++) printf "field #%d: %s\n", i, $(i)
  printf "---------------------------\n"
}

Incorrect output:

field #1: one
field #2: two
field #3: "three
field #4:  four"
field #5: five
---------------------------
field #1: "six
field #2:  seven"
field #3: eight
field #4: "nine"
---------------------------

Desired output:

field #1: one
field #2: two
field #3: "three, four"
field #4: five
---------------------------
field #1: "six, seven"
field #2: eight
field #3: "nine"
---------------------------

Recommended answer

The short answer is "I wouldn't use gawk to parse CSV if the CSV contains awkward data", where "awkward" means things like commas inside the CSV field data.

The next question is "What other processing are you going to be doing?", since that will influence which alternatives you use.

I'd probably use Perl and the Text::CSV or Text::CSV_XS modules to read and process the data. Remember, Perl was originally written in part as an awk and sed killer, hence the a2p and s2p programs, which convert awk and sed scripts (respectively) into Perl. They were distributed with Perl for many years and have been available from CPAN since their removal from the core in Perl 5.22.

