This article covers the question "What is the best way to simulate a 'group by' in bash?", walking through the problem and a recommended answer.

Problem description

Suppose you have a file that contains IP addresses, one address in each line:

10.0.10.1
10.0.10.1
10.0.10.3
10.0.10.2
10.0.10.1

You need a shell script that counts, for each IP address, how many times it appears in the file. For the input above you need the following output:

10.0.10.1 3
10.0.10.2 1
10.0.10.3 1

One way to do it is:

# Note: plain uniq only collapses adjacent duplicates, so the distinct
# addresses are taken with sort -u; grep -Fx counts exact whole-line
# matches, so 10.0.10.1 does not also match 10.0.10.10.
sort -u ip_addresses | while read -r ip
do
    echo -n "$ip "
    grep -cFx "$ip" ip_addresses
done

However, it is far from efficient.

How would you solve this problem more efficiently using bash?

(One thing to add: I know it can be solved with perl or awk; I'm interested in a better solution in bash, not in those languages.)

Additional information:

Suppose that the source file is 5GB and the machine running the algorithm has 4GB of RAM. So sort is not an efficient solution, nor is reading the file more than once.

I liked the hashtable-like solution - can anybody provide improvements to that solution?
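
A minimal sketch of such a hashtable-like approach, assuming bash 4 or later (associative arrays do not exist in older versions) and an input file named ip_addresses as above:

#!/bin/bash
# Count occurrences in a single pass over the file; memory use is
# proportional to the number of distinct addresses, not to the 5GB input.
declare -A count
while read -r ip; do
    ((count[$ip]++))
done < ip_addresses

# Print each address with its count (iteration order is unspecified).
for ip in "${!count[@]}"; do
    echo "$ip ${count[$ip]}"
done

This reads the file only once and stays within memory, though bash's read loop is slow on inputs this large. Since the distinct addresses are usually far fewer than the input lines, piping the final loop's output through sort is cheap if ordered output is needed.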

Additional information #2:

Some people asked why I would bother doing this in bash when it is much easier in, e.g., perl. The reason is that on the machine where I had to do this, perl wasn't available. It was a custom-built linux machine without most of the tools I'm used to. And I think it was an interesting problem.

So please don't blame the question; just ignore it if you don't like it. :-)

Recommended answer

sort ip_addresses | uniq -c

This will print the count first, but other than that it should be exactly what you want.
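
If the address-first format shown in the question is required, a small sketch (still using only sort, uniq, and bash builtins) can swap the two columns:

sort ip_addresses | uniq -c | while read -r count ip
do
    echo "$ip $count"
done

read strips the leading whitespace that uniq -c produces, so count and ip pick up the two fields cleanly.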
