I need to insert a new field containing the MD5 hash of the first field into each line of an 80 GB CSV file.

For small projects, I have been able to do this in Excel by passing the field value to

    =WEBSERVICE(CONCATENATE("https://helloacm.com/api/md5/?s="&ENCODEURL(A1)))

However, with the 80 GB file, that is not an option. Using AWK, is it possible to pull the first field of each row in this massive CSV, calculate the MD5 of that field's content, and insert the value back into the same line?

Example line:

Original:

    "value001","value002","Value003","Value004","Value005","Value006","Value007"

Revised example line with the MD5 of value001 inserted:

    "value001","MD5ofValue001","value002","Value003","Value004","Value005","Value006","Value007"

Solution

awk is great, but for your problem it will probably be much too slow if you have to use system() to calculate the md5. awk may also be poorly suited to the task if the first field has any embedded commas.

In any case, here is a fast (or at least much faster) solution using php, which I've found to have excellent support for CSV of various stripes and hues.
You should be able to run this as a script on a Mac or Linux-like platform.

    #!/usr/bin/env php
    <?php
    # Syntax: $0 [PATHNAME]
    # A filter that expects its input to have the CSV format.
    # Input is taken from STDIN if PATHNAME is - or not specified.
    # Output is the same CSV but with the md5 of the first field tacked on.

    $file = ($argc > 1 && $argv[1] != "") ? $argv[1] : 'php://stdin';
    if ($file == "-") { $file = 'php://stdin'; }

    $handle = @fopen($file, "r");
    $sep = ",";

    if ($handle) {
        # Read one CSV record at a time, so memory use stays flat even for huge files.
        while (($data = fgetcsv($handle, 0, $sep)) !== FALSE) {
            # Append the md5 of the first field as a new last field.
            $data[] = md5($data[0]);
            fputcsv(STDOUT, $data, $sep);
        }
        fclose($handle);
    } else {
        echo "{$argv[0]}: unable to fopen $file\n";
        exit(1);
    }
    ?>

Note that fputcsv() re-encodes each record, so the quoting of the output may differ from the input. If you want to leave the input lines unaltered, then you could read each line in literally and use str_getcsv() to parse it, etc.
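The str_getcsv() idea in the last sentence could look roughly like the sketch below. It is only an illustration under stated assumptions: the helper names appendMd5 and filterLines are hypothetical (they do not come from the answer), every record is assumed to fit on one physical line (no newlines embedded in quoted fields), and line endings are normalized to "\n" on output. The parsed fields are used only to find the first field; the raw text of each line is written back as it was read, with the quoted md5 tacked on the end.

```php
<?php
// Hypothetical helper: return the raw CSV line with the md5 of its first
// field appended as a new quoted field. The line itself is not re-encoded.
function appendMd5(string $line, string $sep = ","): string
{
    // Parse only to learn what the first field contains; str_getcsv()
    // handles quoted fields, including ones with embedded separators.
    $fields = str_getcsv($line, $sep, '"', '\\');
    $first = $fields[0] ?? '';
    return $line . $sep . '"' . md5($first) . '"';
}

// Hypothetical helper: copy a stream line by line, appending the md5 field.
// Assumes one record per physical line; blank lines pass through unchanged.
function filterLines($in, $out, string $sep = ","): void
{
    while (($line = fgets($in)) !== false) {
        $line = rtrim($line, "\r\n");
        fwrite($out, ($line === '' ? '' : appendMd5($line, $sep)) . "\n");
    }
}
```

To use it as a filter in the same way as the script above, one would add a call such as `filterLines(STDIN, STDOUT);` at the end of the file. Because only fgets() and string concatenation touch the line, the original quoting survives, which fputcsv() does not guarantee.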