How to improve grep efficiency in Perl when the number of files is huge



Question


I want to grep some log information from the log files located in the following directory structure using Perl: $jobDir/jobXXXX/host.log, where XXXX is a job number ranging from 1 to a few thousand. There are no other kinds of subdirectories under $jobDir, and no files other than the log under jobXXXX. The script is:

my @Info;   # store the log information
my $Num = 0;
@Info = qx(grep "information" -r $jobDir);   # is this OK?

foreach (@Info) {
    if ($_ =~ /\((\d+)\)(.*)\((\d+)\)/) {
        Output(xxxxxxxx);
    }
    $Num = $Num + 1;   # number count
}


It turns out that when the job number reaches a few thousand, this script takes a very long time to output the information.


Is there any way to improve its efficiency?

Thanks!

Answer


You should go through those log files one by one, scanning each log file line by line, instead of reading the whole output of grep into memory (which can consume a lot of memory and slow down your program, or even your system):

# untested script

my $Num = 0;
foreach my $log (<$jobDir/job*/host.log>) {
    open my $logfh, '<', $log or die "Cannot open $log: $!";
    while (<$logfh>) {
        if (m/information/) {                  # same filter as the external grep
            if (m/\((\d+)\)(.*)\((\d+)\)/) {
                Output(xxx);
            }
            $Num++;                            # number count
        }
    }
    close $logfh;
}
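For reference, here is a minimal self-contained sketch of the same line-by-line approach. The question does not show the body of Output() or which fields it needs, so the Output() below is only a hypothetical stand-in, and taking $jobDir from the command line is an assumption made for the example:

#!/usr/bin/perl
use strict;
use warnings;

# Assumed: the job directory is passed as the first command-line argument.
my $jobDir = shift @ARGV or die "Usage: $0 <jobDir>\n";
my $Num = 0;

# Hypothetical stand-in for the Output() call in the question;
# it just prints the file name and the two captured numbers.
sub Output {
    my (@fields) = @_;
    print join("\t", @fields), "\n";
}

foreach my $log (glob "$jobDir/job*/host.log") {
    open my $logfh, '<', $log or die "Cannot open $log: $!";
    while (my $line = <$logfh>) {
        next unless $line =~ /information/;        # skip lines without the keyword
        if ($line =~ /\((\d+)\)(.*)\((\d+)\)/) {
            Output($log, $1, $3);                  # pass whatever fields you actually need
        }
        $Num++;                                    # count every matching line, as in the original
    }
    close $logfh;
}

print "Matched $Num lines\n";

Because each file is read and discarded one line at a time, memory use stays flat no matter how many job directories there are, which is the main point of the answer above.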
