What is the best way to extract inner substrings from a string in Golang?

Input:

"Hello <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2"

Output:

"this is paragraph \n this is paragraph 2"

Does Go's strings package (or another library) already do something like this?

package main

import (
    "fmt"
    "strings"
)

func main() {
    longString := "Hello world <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2"

    newString := getInnerStrings("<p>", "</p>", longString)

    fmt.Println(newString)
    // desired output:
    //   this is paragraph
    //   this is paragraph 2
}

func getInnerStrings(start, end, str string) string {
    // Brain freeze: regex? a loop over the bytes?
    return "" // placeholder so the program compiles
}

Thanks
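If the input is only text with fixed literal delimiters rather than real HTML, the strings package is enough: strings.Index finds the next marker and slicing moves past it. The sketch below fills in the question's getInnerStrings signature under that assumption. The markers must match exactly (no attributes, no upper-case variants) and must not be nested, and each extracted piece is trimmed of surrounding spaces before the pieces are joined with "\n". It is plain text matching, not HTML parsing.

```go
package main

import (
	"fmt"
	"strings"
)

// getInnerStrings returns every piece of str found between a literal start
// marker and the next literal end marker, trimmed and joined with "\n".
// Assumption: plain text matching only; markers must match exactly
// (no attributes, no case variants) and must not be nested.
func getInnerStrings(start, end, str string) string {
	if start == "" || end == "" {
		return "" // treat empty markers as "nothing to extract"
	}
	var parts []string
	for {
		i := strings.Index(str, start)
		if i < 0 {
			break // no further start marker
		}
		str = str[i+len(start):] // drop everything up to and including start
		j := strings.Index(str, end)
		if j < 0 {
			break // unterminated start marker: ignore the remainder
		}
		parts = append(parts, strings.TrimSpace(str[:j]))
		str = str[j+len(end):] // continue scanning after the end marker
	}
	return strings.Join(parts, "\n")
}

func main() {
	longString := "Hello world <p> this is paragraph </p> this is junk <p> this is paragraph 2 </p> this is junk 2"
	fmt.Println(getInnerStrings("<p>", "</p>", longString))
}
```

For the question's input this returns "this is paragraph\nthis is paragraph 2". As soon as the markup can contain attributes (`<p class="x">`), different letter case, or nesting, this approach breaks down and a real parser is the safer choice.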

Best answer

Don't use regular expressions to try to interpret HTML. Use a fully capable HTML tokenizer and parser.

I suggest you read this article on Coding Horror.

Original Stack Overflow question (regex - Extracting text content from HTML in Golang): https://stackoverflow.com/questions/21000277/

10-17 01:39