Problem description
How do you parse a CSV file using gawk? Simply setting FS="," is not enough, as a quoted field with a comma inside will be treated as multiple fields.
Example using FS="," which does not work:
File contents:
one,two,"three, four",five
"six, seven",eight,"nine"
gawk script:
BEGIN { FS="," }
{
  for (i=1; i<=NF; i++) printf "field #%d: %s\n", i, $(i)
  printf "---------------------------\n"
}
Incorrect output:
field #1: one
field #2: two
field #3: "three
field #4: four"
field #5: five
---------------------------
field #1: "six
field #2: seven"
field #3: eight
field #4: "nine"
---------------------------
Desired output:
field #1: one
field #2: two
field #3: "three, four"
field #4: five
---------------------------
field #1: "six, seven"
field #2: eight
field #3: "nine"
---------------------------
Recommended answer
The short answer is "I wouldn't use gawk to parse CSV if the CSV contains awkward data", where 'awkward' means things like commas in the CSV field data.
The next question is "What other processing are you going to be doing?", since that will influence which alternative you use.
I'd probably use Perl and the Text::CSV or Text::CSV_XS modules to read and process the data. Remember, Perl was originally written in part as an awk and sed killer; hence the a2p and s2p programs, which convert awk and sed scripts (respectively) into Perl. They were distributed with Perl for many years and have been available from CPAN since Perl 5.22.