The question and answer below show how to use XPath to select every node in a div except certain strong headings (self::strong) and the text that follows them (self::strong/following-sibling::text()).

Problem Description

So I have the following example HTML to parse:

<div>
    <strong>Title:</strong>
    Sub Editor at NEWS ABC

    <strong>Name:</strong>
    John

    <strong>Where:</strong>
    Everywhere

    <strong>When:</strong>
    Anytime

    <strong>Everything can go down there..</strong>

    Lorem Ipsum blah blah blah....
</div>

I want to extract this whole div, except that I don't want the Title, Where, and When headings or the values that follow them.

I have tested the following XPaths so far:

a) Without following-sibling (1: doesn't work; 2: works)

1. //div/node()[not(strong[contains(text(), "Title")])]

2. //div/node()[not(self::strong and contains(text(), "Title"))]
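Why attempt 1 fails and attempt 2 works comes down to the context node. Inside a predicate, strong[...] is a relative path that selects the strong children of the node being tested, while self::strong tests the node itself. Python's standard-library ElementTree has no self:: axis or text-node steps, so this sketch does not run the XPath; it hand-emulates both predicates over xml.dom.minidom on a trimmed copy of the sample markup (the helper names are illustrative):

```python
from xml.dom import minidom

# Trimmed copy of the question's markup (two headings are enough here).
HTML = """<div>
    <strong>Title:</strong>
    Sub Editor at NEWS ABC
    <strong>Name:</strong>
    John
</div>"""


def first_text(node):
    """XPath 1.0 string value of text(): the first text child, or ''."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            return child.data
    return ""


def is_title_strong(node):
    """self::strong and contains(text(), "Title"), evaluated on node itself."""
    return (node.nodeType == node.ELEMENT_NODE
            and node.tagName == "strong"
            and "Title" in first_text(node))


def attempt_1(node):
    # not(strong[contains(text(), "Title")]): looks at the node's strong CHILDREN.
    return not any(is_title_strong(child) for child in node.childNodes)


def attempt_2(node):
    # not(self::strong and contains(text(), "Title")): looks at the node ITSELF.
    return not is_title_strong(node)


div = minidom.parseString(HTML).documentElement
nodes = list(div.childNodes)
kept_1 = [n for n in nodes if attempt_1(n)]  # nothing is filtered out
kept_2 = [n for n in nodes if attempt_2(n)]  # only the Title strong is dropped
print(len(nodes), len(kept_1), len(kept_2))
```

None of the div's children has a strong child of its own, so attempt 1's predicate is always true and every node survives.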

b) With following-sibling (1: doesn't work; 2: doesn't work)

1. //div/node()[not(strong[contains(text(), "Title")]) and not(strong[contains(text(), "Title")]/following-sibling::text())]

2. //div/node()[not(self::strong and contains(text(), "Title") and following-sibling::text())]
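Both with-sibling attempts fail for related reasons. In attempt 1, strong[...] and strong[...]/following-sibling::text() are again relative to the node's children, so they match nothing among the div's children. In attempt 2, all three clauses are tested against the same node, so adding following-sibling::text() only narrows which strong element matches; a text node never satisfies self::strong, so the value text is always kept. The same kind of hand-emulation over xml.dom.minidom (an illustration under that assumption, not an XPath engine) shows this:

```python
from xml.dom import minidom

# Trimmed copy of the question's markup.
HTML = """<div>
    <strong>Title:</strong>
    Sub Editor at NEWS ABC
    <strong>Name:</strong>
    John
</div>"""


def first_text(node):
    """XPath 1.0 string value of text(): the first text child, or ''."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            return child.data
    return ""


def is_title_strong(node):
    return (node.nodeType == node.ELEMENT_NODE
            and node.tagName == "strong"
            and "Title" in first_text(node))


def has_following_text(node):
    """following-sibling::text(): is there any text node after node?"""
    sib = node.nextSibling
    while sib is not None:
        if sib.nodeType == sib.TEXT_NODE:
            return True
        sib = sib.nextSibling
    return False


def attempt_1(node):
    # strong[...] and strong[...]/following-sibling::text() both start from the
    # node's children; if no child is a Title strong, both are empty.
    return not any(is_title_strong(child) for child in node.childNodes)


def attempt_2(node):
    # Every clause is evaluated on the node itself, so only the strong can match.
    return not (is_title_strong(node) and has_following_text(node))


def kept_values(predicate):
    """Non-blank text nodes under the div that survive the predicate."""
    div = minidom.parseString(HTML).documentElement
    return [n.data.strip() for n in div.childNodes
            if n.nodeType == n.TEXT_NODE and predicate(n) and n.data.strip()]


print(kept_values(attempt_1))
print(kept_values(attempt_2))  # the Title value text is still present
```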

How can I achieve what I am after?

Solution

I think the following does what you are trying to do: it excludes the strong element containing "Title" as well as the text node that follows it. You could extend it to cover the other strong elements you want to exclude:

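//div/node()[not(self::strong and contains(text(), "Title") or self::text() and preceding-sibling::strong[1][contains(text(), "Title")])]

The strong node is skipped by:

self::strong and contains(text(), "Title")

The text that follows it is skipped by:

self::text() and preceding-sibling::strong[1][contains(text(), "Title")]

Note that the text node needs to check its closest preceding strong sibling (rather than its following siblings). The self::text() test restricts that check to text nodes; without it, the next heading (<strong>Name:</strong>), whose closest preceding strong sibling is Title:, would be excluded as well.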
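For the question's three headings, one possible extension of the expression is:

//div/node()[not(self::strong and (contains(text(), "Title") or contains(text(), "Where") or contains(text(), "When")) or self::text() and preceding-sibling::strong[1][contains(text(), "Title") or contains(text(), "Where") or contains(text(), "When")])]

To check that logic end to end without a third-party library, the sketch below emulates the same predicate by hand over xml.dom.minidom; it is an illustration under that assumption, and with a full XPath 1.0 engine (for example the xpath() method in lxml) you would evaluate the expression directly. The EXCLUDED tuple and helper names are hypothetical, and keep_unguarded drops the text-node restriction for comparison:

```python
from xml.dom import minidom

# Sample markup from the question.
HTML = """<div>
    <strong>Title:</strong>
    Sub Editor at NEWS ABC

    <strong>Name:</strong>
    John

    <strong>Where:</strong>
    Everywhere

    <strong>When:</strong>
    Anytime

    <strong>Everything can go down there..</strong>

    Lorem Ipsum blah blah blah....
</div>"""

# Hypothetical list of heading words whose heading and value should be dropped.
EXCLUDED = ("Title", "Where", "When")


def first_text(node):
    """XPath 1.0 string value of text(): the first text child, or ''."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            return child.data
    return ""


def is_excluded_heading(node):
    """self::strong and contains(text(), <any excluded word>)."""
    return (node.nodeType == node.ELEMENT_NODE
            and node.tagName == "strong"
            and any(word in first_text(node) for word in EXCLUDED))


def nearest_preceding_strong(node):
    """preceding-sibling::strong[1]: the closest strong element before node."""
    sib = node.previousSibling
    while sib is not None:
        if sib.nodeType == sib.ELEMENT_NODE and sib.tagName == "strong":
            return sib
        sib = sib.previousSibling
    return None


def follows_excluded_heading(node):
    prev = nearest_preceding_strong(node)
    return prev is not None and is_excluded_heading(prev)


def keep(node):
    # not(<excluded heading> or self::text() and <follows excluded heading>)
    if is_excluded_heading(node):
        return False
    return not (node.nodeType == node.TEXT_NODE and follows_excluded_heading(node))


def keep_unguarded(node):
    # Same test without the self::text() restriction, for comparison.
    return not (is_excluded_heading(node) or follows_excluded_heading(node))


def render(predicate):
    """Serialize the surviving child nodes of the div, collapsing whitespace."""
    div = minidom.parseString(HTML).documentElement
    parts = [n.toxml() for n in div.childNodes if predicate(n)]
    return " ".join("".join(parts).split())


print(render(keep))
print(render(keep_unguarded))
```

Run as-is, the guarded version keeps the Name: and "Everything can go down there.." headings along with their text, while the unguarded version also drops both of those headings, because each one's nearest preceding strong sibling is an excluded heading.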


10-26 23:44