Problem description
I have the following example HTML to parse.
<div>
<strong>Title:</strong>
Sub Editor at NEWS ABC
<strong>Name:</strong>
John
<strong>Where:</strong>
Everywhere
<strong>When:</strong>
Anytime
<strong>Everything can go down there..</strong>
Lorem Ipsum blah blah blah....
</div>
I want to extract the whole div, except that I don't want the Title, Where, and When headings or the values that follow them.
I have tested the following XPath expressions so far.
a) Without following-sibling (1: doesn't work, 2: works)
1. //div/node()[not(strong[contains(text(), "Title")])]
2. //div/node()[not(self::strong and contains(text(), "Title"))]
b) With following-sibling (1: doesn't work, 2: doesn't work)
1. //div/node()[not(strong[contains(text(), "Title")]) and not(strong[contains(text(), "Title")]/following-sibling::text())]
2. //div/node()[not(self::strong and contains(text(), "Title") and following-sibling::text())]
How can I achieve what I am after?
I think the following does what you are after: it excludes the strong element containing "Title" as well as the text node that follows it. You can extend it to cover the other strong elements you want to exclude:
//div/node()[not(self::strong and contains(text(), "Title") or self::text() and preceding-sibling::strong[1][contains(text(), "Title")])]
The strong node is skipped by this condition:
self::strong and contains(text(), "Title")
The text that follows it is skipped by this condition:
self::text() and preceding-sibling::strong[1][contains(text(), "Title")]
Note that the text node has to check its closest preceding sibling (rather than its following sibling). The self::text() test limits that check to text nodes; without it, the next strong element (Name:) would be dropped as well, because its closest preceding strong sibling is the Title one.
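To try this against the sample markup, here is a minimal sketch in Python. It assumes the third-party lxml package is installed; build_xpath, kept_text, and SKIP are names made up for this illustration, not part of any library. It extends the expression to several headings by joining the contains() tests with "or".

```python
from lxml import html  # third-party package (assumed installed): pip install lxml

SNIPPET = """<div>
<strong>Title:</strong>
Sub Editor at NEWS ABC
<strong>Name:</strong>
John
<strong>Where:</strong>
Everywhere
<strong>When:</strong>
Anytime
<strong>Everything can go down there..</strong>
Lorem Ipsum blah blah blah....
</div>"""

# Headings to drop together with the text node that follows each of them.
SKIP = ("Title", "Where", "When")


def build_xpath(labels):
    """Build the exclusion XPath for the given heading labels (hypothetical helper)."""
    heading = " or ".join(f'contains(text(), "{label}")' for label in labels)
    return (
        "//div/node()[not("
        # the heading element itself
        f"self::strong and ({heading}) "
        # a text node whose closest preceding <strong> is one of the headings
        f"or self::text() and preceding-sibling::strong[1][{heading}]"
        ")]"
    )


def kept_text(labels):
    """Return the stripped, non-empty text of every node the XPath keeps."""
    root = html.fromstring(SNIPPET)
    pieces = []
    for node in root.xpath(build_xpath(labels)):
        # lxml returns text nodes as str subclasses and elements as HtmlElement
        text = node if isinstance(node, str) else node.text_content()
        if text.strip():
            pieces.append(text.strip())
    return pieces


print(kept_text(SKIP))
# ['Name:', 'John', 'Everything can go down there..', 'Lorem Ipsum blah blah blah....']
```

Note that contains() is case-sensitive, so "Where" does not match the word "there" in the last heading.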