前言:

相信提高文本检索工具,大家脑海里肯定有很多工具会自动跳出来,比如,grep,egrep,sed,cat,more,less,cut,awk,vim,vi,nano,strings等等工具(vim,nano 这样的文本编辑工具也是兼具文本检索功能的嘛,要是不能检索就没有存在的必要了~~~~~)

那么,这些文本信息检索工具里如果按照使用频率来说,第一个毫无疑问是文本编辑工具vim了,其次就是文本检索工具cat和grep了,在次就是sed和awk这样的流检索工具了,而egrep可能使用频率上就比较低了,但,其实有些工具就和人的性格一样:低调不代表能力不行,相反,egrep的强大应该是和vim一样毋庸置疑的啊!!!

其实呢,grep和grep这两个命令是比较有意思的,也就是非常的有用的,在Linux这里,grep和egrep这两个命令是必要要熟练掌握和运用的

那么,本文将就grep和egrep命令在做一个详细的介绍

一、

grep和egrep的相似处和不同处

正则表达式分类:

1、基本的正则表达式(Basic Regular Expression 又叫 Basic RegEx  简称 BREs)

2、扩展的正则表达式(Extended Regular Expression 又叫 Extended RegEx 简称 EREs)

3、Perl 的正则表达式(Perl Regular Expression 又叫 Perl RegEx 简称 PREs)

grep , egrep 正则表达式特点:

1)grep 支持:BREs、EREs、PREs 正则表达式

grep 指令后不跟任何参数,则表示要使用 ”BREs“

grep 指令后跟 ”-E" 参数,则表示要使用 “EREs“

grep 指令后跟 “-P" 参数,则表示要使用 “PREs"

2)egrep 支持:EREs、PREs 正则表达式

egrep 指令后不跟任何参数,则表示要使用 “EREs”

egrep 指令后跟 “-P" 参数,则表示要使用 “PREs"

  • egrep执行效果与"grep-E"相似,使用的语法及参数可参照grep指令,与grep的不同点在于解读字符串的方法。
  • egrep是用extended regular expression语法来解读的,而grep则用basic regular expression 语法解读,extended regular expression比basic regular expression的表达更规范,且egep支持更多的元字符,即egrep使用的是拓展的正则表达式
  • grep和egrep这两个命令的帮助基本是一致的,so,学会了一个命令,另一个命令就也学会了

那么,也就是可以简单的认为grep和egrep基本是一个命令,只是egrep使用的时候更为简练,搜索的时候正则规则egrep比grep更多,它们之间的关系可以认为是vim和vi (vim比vi多了语法,搜索高亮等等附加功能,但主要的功能---文本编辑并没有任何区别)的关系,这里,我可以保证,这么理解绝对绝对的没有问题~~~~!!!

🆗,那么,其实grep这个命令并没有使用的太多必要,如果只是简单的文本搜索过滤的话,好吧,grep还是可以勉强一用的

二、

grep和egrep的基本无脑用法

示例文件,我这里选择使用本地yum仓库的xml配置文件,选择此文件的原因是数字和字母都有,比较像配置文件,其实也确实是一个配置文件

可以看到,该文件的文本特征是数字和字母混杂在一起了,有纯字母行,但没有纯数字行

[root@centos7 repodata]# cat repomd.xml 
<?xml version="1.0" encoding="UTF-8"?>
<repomd xmlns="http://linux.duke.edu/metadata/repo" xmlns:rpm="http://linux.duke.edu/metadata/rpm">
 <revision>1709240060</revision>
<data type="filelists">
  <checksum type="sha256">406fe311374b513103fc6968c07da6f9a99b561c90871608cfd89cad119a1126</checksum>
  <open-checksum type="sha256">ba1d9203986defbf63d9361351ed24d33aabad40a014061e669f6fa62dccd868</open-checksum>
  <location href="repodata/406fe311374b513103fc6968c07da6f9a99b561c90871608cfd89cad119a1126-filelists.xml.gz"/>
  <timestamp>1709240061</timestamp>
  <size>87505</size>
  <open-size>935761</open-size>
</data>
<data type="primary">
  <checksum type="sha256">8595bbc4226db56075dfaf6317e5effd7f92f18d5a6d7526e77c6f6552cebf94</checksum>
  <open-checksum type="sha256">ffc6df89c274e8925cc631ead5f6cfe9435a193c22d5b22a9c99b400a51d881a</open-checksum>
  <location href="repodata/8595bbc4226db56075dfaf6317e5effd7f92f18d5a6d7526e77c6f6552cebf94-primary.xml.gz"/>
  <timestamp>1709240061</timestamp>
  <size>49182</size>
  <open-size>444084</open-size>
</data>
<data type="primary_db">
  <checksum type="sha256">e65d3a6cf36df5021676d932b2a2d0b0c03e4271655c045c07e94222a5290b18</checksum>
  <open-checksum type="sha256">c24e01dc5b3388cba3b876c64ddf9406cffb175ee3b2a7fd7e4b11a43c62a84a</open-checksum>
  <location href="repodata/e65d3a6cf36df5021676d932b2a2d0b0c03e4271655c045c07e94222a5290b18-primary.sqlite.bz2"/>
  <timestamp>1709240061</timestamp>
  <database_version>10</database_version>
  <size>115549</size>
  <open-size>524288</open-size>
</data>
<data type="other_db">
  <checksum type="sha256">fb5a57bd9729ce92ea197c79fb637d77f6c2abf1bedd8e6fe8407f6582b118d3</checksum>
  <open-checksum type="sha256">a2fb0f11d2991d2756117198c24cc5e92ddcefa763a6708b4a3f6a55c55e41ce</open-checksum>
  <location href="repodata/fb5a57bd9729ce92ea197c79fb637d77f6c2abf1bedd8e6fe8407f6582b118d3-other.sqlite.bz2"/>
  <timestamp>1709240061</timestamp>
  <database_version>10</database_version>
  <size>60744</size>
  <open-size>296960</open-size>
</data>
<data type="other">
  <checksum type="sha256">f31c3eb35c261d4437cfa9dc1fef589cd0142ce2575cedc6ccca0e9c355c48f8</checksum>
  <open-checksum type="sha256">2708ba2eac9dff6e1118067b0b1bc42f138ccf58335c92214dcae503813e3112</open-checksum>
  <location href="repodata/f31c3eb35c261d4437cfa9dc1fef589cd0142ce2575cedc6ccca0e9c355c48f8-other.xml.gz"/>
  <timestamp>1709240061</timestamp>
  <size>43373</size>
  <open-size>319989</open-size>
</data>
<data type="filelists_db">
  <checksum type="sha256">258ab32bd065ef14ed56b34a06798c95ee2de2652b83dce93a3cd8d62928d3c3</checksum>
  <open-checksum type="sha256">25a2e76fe87f7fed34ace38b0daae9de3b9b8f4aca259441d226f6633db3ab10</open-checksum>
  <location href="repodata/258ab32bd065ef14ed56b34a06798c95ee2de2652b83dce93a3cd8d62928d3c3-filelists.sqlite.bz2"/>
  <timestamp>1709240061</timestamp>
  <database_version>10</database_version>
  <size>106570</size>
  <open-size>476160</open-size>
</data>
</repomd>

🆗,什么叫无脑呢?这里自然说的是不需要绞尽脑汁想正则表达式,也就是跟cat命令一样,仅仅查看文本文档的内容

说明:空单引号表示没有字符串过滤,自然就是取的所有文本内容了,通过管道符结合tail -n 10 回显了该文件的最后10行,其实这个命令等同于cat repomd.xml|tail -n 10

[root@centos7 repodata]# grep '' repomd.xml |tail -n 10
<data type="filelists_db">
  <checksum type="sha256">258ab32bd065ef14ed56b34a06798c95ee2de2652b83dce93a3cd8d62928d3c3</checksum>
  <open-checksum type="sha256">25a2e76fe87f7fed34ace38b0daae9de3b9b8f4aca259441d226f6633db3ab10</open-checksum>
  <location href="repodata/258ab32bd065ef14ed56b34a06798c95ee2de2652b83dce93a3cd8d62928d3c3-filelists.sqlite.bz2"/>
  <timestamp>1709240061</timestamp>
  <database_version>10</database_version>
  <size>106570</size>
  <open-size>476160</open-size>
</data>
</repomd>

自然的,egrep的效果和grep是一模一样的,无需废话多言:

[root@centos7 repodata]# egrep '' repomd.xml |tail -n 10
<data type="filelists_db">
  <checksum type="sha256">258ab32bd065ef14ed56b34a06798c95ee2de2652b83dce93a3cd8d62928d3c3</checksum>
  <open-checksum type="sha256">25a2e76fe87f7fed34ace38b0daae9de3b9b8f4aca259441d226f6633db3ab10</open-checksum>
  <location href="repodata/258ab32bd065ef14ed56b34a06798c95ee2de2652b83dce93a3cd8d62928d3c3-filelists.sqlite.bz2"/>
  <timestamp>1709240061</timestamp>
  <database_version>10</database_version>
  <size>106570</size>
  <open-size>476160</open-size>
</data>
</repomd>

三、

反向过滤 -v   逻辑符 not

egrep 过滤,去除所有带数字的行,只保留纯字母行

[root@centos7 repodata]# egrep -v '[0-9]' repomd.xml 
<repomd xmlns="http://linux.duke.edu/metadata/repo" xmlns:rpm="http://linux.duke.edu/metadata/rpm">
<data type="filelists">
</data>
<data type="primary">
</data>
<data type="primary_db">
</data>
<data type="other_db">
</data>
<data type="other">
</data>
<data type="filelists_db">
</data>
</repomd>

同样的,过滤去除所有带字母的行,只保留纯数字行(没有纯数字行,因此没有输出):

[root@centos7 repodata]# egrep -v '[a-z]' repomd.xml 
[root@centos7 repodata]# 

四、

过滤单词  完全精确匹配 -w

###说明:sed命令精确匹配要写很多,比较麻烦

[root@centos7 repodata]# egrep -w 'size' repomd.xml 
  <size>87505</size>
  <open-size>935761</open-size>
  <size>49182</size>
  <open-size>444084</open-size>
  <size>115549</size>
  <open-size>524288</open-size>
  <size>60744</size>
  <open-size>296960</open-size>
  <size>43373</size>
  <open-size>319989</open-size>
  <size>106570</size>
  <open-size>476160</open-size>


[root@centos7 repodata]# egrep -w 'siz' repomd.xml ###这个无输出回显

五、

匹配行显示行号

过滤的同时显示行号 -n 显示行号

[root@centos7 repodata]# egrep -n -v '[0-9]' repomd.xml 
2:<repomd xmlns="http://linux.duke.edu/metadata/repo" xmlns:rpm="http://linux.duke.edu/metadata/rpm">
4:<data type="filelists">
11:</data>
12:<data type="primary">
19:</data>
20:<data type="primary_db">
28:</data>
29:<data type="other_db">
37:</data>
38:<data type="other">
45:</data>
46:<data type="filelists_db">
54:</data>
55:</repomd>
[root@centos7 repodata]# cat repomd.xml |egrep -n -v '[0-9]'
2:<repomd xmlns="http://linux.duke.edu/metadata/repo" xmlns:rpm="http://linux.duke.edu/metadata/rpm">
4:<data type="filelists">
11:</data>
12:<data type="primary">
19:</data>
20:<data type="primary_db">
28:</data>
29:<data type="other_db">
37:</data>
38:<data type="other">
45:</data>
46:<data type="filelists_db">
54:</data>
55:</repomd>

 🆗,这里就有意思了,egrep可以回显上下文

行号+上下文回显  -C 1 显示匹配行的上下文各一行 数字按需指定

[root@centos7 repodata]# egrep -C 1 -n  -w 'other' repomd.xml 
31-  <open-checksum type="sha256">a2fb0f11d2991d2756117198c24cc5e92ddcefa763a6708b4a3f6a55c55e41ce</open-checksum>
32:  <location href="repodata/fb5a57bd9729ce92ea197c79fb637d77f6c2abf1bedd8e6fe8407f6582b118d3-other.sqlite.bz2"/>
33-  <timestamp>1709240061</timestamp>
--
37-</data>
38:<data type="other">
39-  <checksum type="sha256">f31c3eb35c261d4437cfa9dc1fef589cd0142ce2575cedc6ccca0e9c355c48f8</checksum>
40-  <open-checksum type="sha256">2708ba2eac9dff6e1118067b0b1bc42f138ccf58335c92214dcae503813e3112</open-checksum>
41:  <location href="repodata/f31c3eb35c261d4437cfa9dc1fef589cd0142ce2575cedc6ccca0e9c355c48f8-other.xml.gz"/>
42-  <timestamp>1709240061</timestamp>

行号+上下文回显  -A 1 显示匹配行的下文一行  数字按需指定

[root@centos7 repodata]# egrep -A 1 -n  -w 'other' repomd.xml 
32:  <location href="repodata/fb5a57bd9729ce92ea197c79fb637d77f6c2abf1bedd8e6fe8407f6582b118d3-other.sqlite.bz2"/>
33-  <timestamp>1709240061</timestamp>
--
38:<data type="other">
39-  <checksum type="sha256">f31c3eb35c261d4437cfa9dc1fef589cd0142ce2575cedc6ccca0e9c355c48f8</checksum>
--
41:  <location href="repodata/f31c3eb35c261d4437cfa9dc1fef589cd0142ce2575cedc6ccca0e9c355c48f8-other.xml.gz"/>
42-  <timestamp>1709240061</timestamp>

Linux技巧|centos7|重新认识和学习egrep和grep命令-LMLPHP

行号+上下文回显  -B 1 显示匹配行的上文一行 数字按需指定

[root@centos7 repodata]# egrep -B 1 -n  -w 'other' repomd.xml 
31-  <open-checksum type="sha256">a2fb0f11d2991d2756117198c24cc5e92ddcefa763a6708b4a3f6a55c55e41ce</open-checksum>
32:  <location href="repodata/fb5a57bd9729ce92ea197c79fb637d77f6c2abf1bedd8e6fe8407f6582b118d3-other.sqlite.bz2"/>
--
37-</data>
38:<data type="other">
--
40-  <open-checksum type="sha256">2708ba2eac9dff6e1118067b0b1bc42f138ccf58335c92214dcae503813e3112</open-checksum>
41:  <location href="repodata/f31c3eb35c261d4437cfa9dc1fef589cd0142ce2575cedc6ccca0e9c355c48f8-other.xml.gz"/>

Linux技巧|centos7|重新认识和学习egrep和grep命令-LMLPHP

六、

特征匹配

正则表达式 以">结束的行 

[root@centos7 repodata]# egrep -n '.*">$' repomd.xml 
2:<repomd xmlns="http://linux.duke.edu/metadata/repo" xmlns:rpm="http://linux.duke.edu/metadata/rpm">
4:<data type="filelists">
12:<data type="primary">
20:<data type="primary_db">
29:<data type="other_db">
38:<data type="other">
46:<data type="filelists_db">

 Linux技巧|centos7|重新认识和学习egrep和grep命令-LMLPHP

以</ 开始的行

[root@centos7 repodata]# egrep -n '^</' repomd.xml 
11:</data>
19:</data>
28:</data>
37:</data>
45:</data>
54:</data>
55:</repomd>

很多时候,查看配置文件的时候,不想看那些以#开始的注释文件,那么egrep将会非常简单:

下面一个是不过滤#开始的行,一个是过滤掉#开始的行

[root@centos7 ~]# egrep '^#'  /etc/selinux/config 
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
# SELINUXTYPE= can take one of three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected. 
#     mls - Multi Level Security protection.
[root@centos7 ~]# egrep -v '^#'  /etc/selinux/config 

SELINUX=disabled
SELINUXTYPE=targeted 

 七、

递归目录匹配并忽略大小写

egrep -r 递归扫描  -i 忽略大小写  搜索/etc/目录下所有包含root这个单词的文件

Binary file /etc/aliases.db matches
[root@centos7 repodata]# egrep -r -i 'root' /etc/ |more
/etc/fstab:/dev/mapper/centos-root /                       xfs     defaults        0 0
/etc/pki/ca-trust/ca-legacy.conf:# The upstream Mozilla.org project tests all changes to the root CA
/etc/pki/ca-trust/ca-legacy.conf:# to temporarily keep certain (legacy) root CA certificates trusted,
/etc/pki/ca-trust/ca-legacy.conf:#   It may keep root CA certificate as trusted, which the upstream 
/etc/pki/ca-trust/extracted/README:root CA certificates.
/etc/pki/ca-trust/extracted/java/README:root CA certificates.
Binary file /etc/pki/ca-trust/extracted/java/cacerts matches
/etc/pki/ca-trust/extracted/openssl/README:root CA certificates.
/etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt:# Actalis Authentication Root CA
/etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt:# AddTrust External Root
/etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt:# AddTrust Low-Value Services Root
/etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt:# Amazon Root CA 1
/etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt:# Amazon Root CA 2
/etc/pki/ca-trust/extracted/openssl/ca-bundle.trust.crt:# Amazon Root CA 3

八、

搜索包含有5个连续数字的行,5个数字后接的是<  ,<是这个文件内的数字后的特征符, 可以看到有10位数字被匹配到了,并不是十分准确

下面这个命令等价于 egrep '[0-9]{5}\b' repomd.xml

[root@centos7 repodata]# egrep  '[.*[:digit:]]{5}<' repomd.xml 
 <revision>1709240060</revision>
  <timestamp>1709240061</timestamp>
  <size>87505</size>
  <open-size>935761</open-size>
  <timestamp>1709240061</timestamp>
  <size>49182</size>
  <open-size>444084</open-size>
  <timestamp>1709240061</timestamp>
  <size>115549</size>
  <open-size>524288</open-size>
  <timestamp>1709240061</timestamp>
  <size>60744</size>
  <open-size>296960</open-size>
  <timestamp>1709240061</timestamp>
  <size>43373</size>
  <open-size>319989</open-size>
  <timestamp>1709240061</timestamp>
  <size>106570</size>
  <open-size>476160</open-size>

Linux技巧|centos7|重新认识和学习egrep和grep命令-LMLPHP

由于没有达到预期:精确匹配任意的只有5个数字的行   修正表达式如下:

[root@centos7 repodata]# egrep '\b[0-9]{5}\b' repomd.xml 
  <size>87505</size>
  <size>49182</size>
  <size>60744</size>
  <size>43373</size>

🆗,那么,-w 参数等价于 \b 正则表达式,证据如下(两个命令输出是一样的):

[root@centos7 repodata]# egrep '\bsize\b' repomd.xml 
  <size>87505</size>
  <open-size>935761</open-size>
  <size>49182</size>
  <open-size>444084</open-size>
  <size>115549</size>
  <open-size>524288</open-size>
  <size>60744</size>
  <open-size>296960</open-size>
  <size>43373</size>
  <open-size>319989</open-size>
  <size>106570</size>
  <open-size>476160</open-size>
[root@centos7 repodata]# egrep -w 'size' repomd.xml 
  <size>87505</size>
  <open-size>935761</open-size>
  <size>49182</size>
  <open-size>444084</open-size>
  <size>115549</size>
  <open-size>524288</open-size>
  <size>60744</size>
  <open-size>296960</open-size>
  <size>43373</size>
  <open-size>319989</open-size>
  <size>106570</size>
  <open-size>476160</open-size>
03-03 05:02