This article covers how to extract a substring with sed.

Problem description

I have a log file containing the following lines:

example.com - - - 127.0.01 [22/Sep/2013:07:22:22 +0000]  "POST /api/test.php HTTP/1.1" 200 355 "-" "-" "{\x22id\x22:\x22 ... }}}"

example.com - - - 127.0.01 [22/Sep/2013:07:22:22 +0000]  "POST /api/test.php HTTP/1.1" 200 355 "-" "-" "{\x22id\x22:"{\x22 ... }}}"

On each line, I want to extract everything from the first {\x22 through the last }.

So I am using the following sed command:

cat test.txt  | sed -r  's/.+?"(\{.+\})".*/\1/g'

However, it gives me

{\x22id\x22:\x22 ... }}}

{\x22 ... }}}

But I want

{\x22id\x22:\x22 ... }}}

{\x22id\x22:"{\x22 ... }}}

Recommended answer

This might work for you (GNU sed):

sed 's/\({\\x22.*}\).*/\n\1/;s/.*\n//' file

Greed is your enemy for the first match, so use the divide-and-conquer idiom for {\x22: place a unique marker (here \n, which cannot otherwise appear in a single-line pattern space) in front of the match, then use a second substitute command to delete everything up to and including that marker. For the last }, greed is your friend, because .*} finds the last } on its own.
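To make the two steps visible, here is a minimal sketch that runs the command on the two sample lines from the question. It assumes GNU sed (the \n in the replacement is a GNU extension); the variable names input and result are only illustrative, and the " ... " in the sample lines is kept as literal text.

```shell
# Sample lines copied from the question; the quoted here-doc keeps
# backslashes and double quotes exactly as written.
input=$(cat <<'EOF'
example.com - - - 127.0.01 [22/Sep/2013:07:22:22 +0000]  "POST /api/test.php HTTP/1.1" 200 355 "-" "-" "{\x22id\x22:\x22 ... }}}"
example.com - - - 127.0.01 [22/Sep/2013:07:22:22 +0000]  "POST /api/test.php HTTP/1.1" 200 355 "-" "-" "{\x22id\x22:"{\x22 ... }}}"
EOF
)

# Step 1: the regex matches at the leftmost {\x22 and runs greedily to the
#         last }, and the replacement puts a newline marker in front of it.
# Step 2: delete everything up to and including that marker.
result=$(printf '%s\n' "$input" | sed 's/\({\\x22.*}\).*/\n\1/;s/.*\n//')

printf '%s\n' "$result"
```

With GNU sed this should print the two lines listed under "But I want" in the question. The asker's original pattern differs because .+? is not a lazy quantifier in sed's extended syntax, so the leading .+ stays greedy and pushes the capture to the last "{ on the line.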

N.B. If the first match were a single character, say X, then a negated character class [^X]* would suffice. However, since it is a string (two or more characters), that approach does not work.
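As a small sketch of the single-character case, suppose the start delimiter were just { rather than the string {\x22. The sample line and the variable names below are made up for illustration and do not come from the question.

```shell
# Hypothetical sample line: the wanted part starts at the first { and
# ends at the last }.
line='noise "x" {a:{b}} trailing'

# [^{]* can only consume characters before the first {, so the group
# starts there; .*} is greedy and therefore ends at the last }.
extracted=$(printf '%s\n' "$line" | sed 's/[^{]*\({.*}\).*/\1/')

printf '%s\n' "$extracted"
# prints: {a:{b}}
```

No marker or second substitution is needed here, because the negated class cannot skip past the first {. A bracket expression only excludes individual characters, though, so it cannot express "anything that is not the string {\x22", which is why the answer falls back to the marker idiom.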


09-22 13:40