Problem description
I have a log file like the following:
example.com - - - 127.0.01 [22/Sep/2013:07:22:22 +0000] "POST /api/test.php HTTP/1.1" 200 355 "-" "-" "{\x22id\x22:\x22 ... }}}"
example.com - - - 127.0.01 [22/Sep/2013:07:22:22 +0000] "POST /api/test.php HTTP/1.1" 200 355 "-" "-" "{\x22id\x22:"{\x22 ... }}}"
I want to extract everything from the first {\x22 to the last }.
So I am using the following sed command:
cat test.txt | sed -r 's/.+?"(\{.+\})".*/\1/g'
However, it gives me:
{\x22id\x22:\x22 ... }}}
{\x22 ... }}}
But I want:
{\x22id\x22:\x22 ... }}}
{\x22id\x22:"{\x22 ... }}}
Recommended answer
This might work for you (GNU sed):
sed 's/\({\\x22.*}\).*/\n\1/;s/.*\n//' file
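A minimal sketch of running that command against the two sample lines from the question. It assumes bash and GNU sed (the \n in the replacement is a GNU extension). The function name extract_json is hypothetical, and the input is fed through a quoted heredoc so the backslashes in \x22 stay literal.

```shell
#!/usr/bin/env bash
# Requires GNU sed: "\n" in the replacement (a GNU extension) inserts a
# newline marker, and "\n" in the second regex matches that marker.

# Hypothetical helper wrapping the answer's two substitute commands.
extract_json() {
  sed 's/\({\\x22.*}\).*/\n\1/;s/.*\n//'
}

# The quoted 'EOF' delimiter keeps the backslashes in \x22 literal.
extract_json <<'EOF'
example.com - - - 127.0.01 [22/Sep/2013:07:22:22 +0000] "POST /api/test.php HTTP/1.1" 200 355 "-" "-" "{\x22id\x22:\x22 ... }}}"
example.com - - - 127.0.01 [22/Sep/2013:07:22:22 +0000] "POST /api/test.php HTTP/1.1" 200 355 "-" "-" "{\x22id\x22:"{\x22 ... }}}"
EOF
# Expected output:
# {\x22id\x22:\x22 ... }}}
# {\x22id\x22:"{\x22 ... }}}
```

In a real script you would run the same sed expression on the log file directly instead of going through a heredoc.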
Greed is your enemy in the first match (sed has no non-greedy quantifier, so the ? in .+? does not make it lazy), so use the divide-and-conquer idiom for the {\x22. That is, place a unique marker (in this case \n) where the match starts, then use a second substitute command to remove the first part of the string up to and including that marker. For the last }, greed is your friend, as .*} will find the last match by itself.
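The sketch below contrasts a greedy leading .* with the marker idiom on a shortened version of the second log line. It assumes bash and GNU sed; the function names greedy_prefix and marker_idiom are hypothetical, chosen only for this illustration.

```shell
#!/usr/bin/env bash
# GNU sed assumed. Function names are hypothetical.

# A leading .* is greedy: it consumes everything up to the LAST {\x22
# that still lets the rest of the pattern match.
greedy_prefix() {
  sed 's/.*\({\\x22.*}\).*/\1/'
}

# Marker idiom: replace the leftmost match with "\n" + the match itself,
# then delete everything up to and including that newline.
marker_idiom() {
  sed 's/\({\\x22.*}\).*/\n\1/;s/.*\n//'
}

# Shortened form of the second sample line; single quotes keep \x22 literal.
line='"-" "{\x22id\x22:"{\x22 ... }}}"'
printf '%s\n' "$line" | greedy_prefix   # {\x22 ... }}}
printf '%s\n' "$line" | marker_idiom    # {\x22id\x22:"{\x22 ... }}}
```

The greedy version reproduces the wrong second line from the question, while the marker version keeps the leftmost {\x22.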
N.B. If the first match were a single character, say X, then a negated character class [^X]* would suffice. However, as it is a string (two or more characters), this will not work.
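To illustrate the note, here is a sketch using { as the single character X. It assumes bash and GNU sed, and it only gives the intended result because no { appears on the line before the JSON payload. The function names single_char and string_attempt are hypothetical.

```shell
#!/usr/bin/env bash
# GNU sed assumed. Function names are hypothetical.

# Single-character delimiter: [^{]* stops at the FIRST {, so one
# substitute command is enough (assuming no earlier { on the line).
single_char() {
  sed 's/^[^{]*\({.*}\).*/\1/'
}

# A bracket expression is a set of characters, not a sequence, so
# [^{\\x2]* stops at the first {, \, x or 2. Here it stops at the x in
# "example.com", the pattern cannot match, and the line is left unchanged.
string_attempt() {
  sed 's/^[^{\\x2]*\({\\x22.*}\).*/\1/'
}

line='example.com "-" "{\x22id\x22:"{\x22 ... }}}"'
printf '%s\n' "$line" | single_char     # {\x22id\x22:"{\x22 ... }}}
printf '%s\n' "$line" | string_attempt  # line printed unchanged
```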