How to iterate over child text nodes (not descendants) in ElementTree?

Problem description

Given an element like this:

<A>
    hello

    <annotation> NOT part of text </annotation>

    world
</A>

How can I get just the child text nodes (like XPath text()) using ElementTree?

Both iter() and itertext() are tree walkers, so they include all descendant nodes, and there is no immediate-child iterator that I'm aware of. Besides, iter() only finds elements (it is, after all, ElementTree), so it can't be used to collect text nodes as such.
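The following sketch makes the problem concrete. It uses a whitespace-free version of the example element so the output stays readable. itertext() lets the annotation's text leak in. Iterating an element, or calling iter(), yields only elements and never text.

```python
import xml.etree.ElementTree as ET

# Whitespace-free variant of the example element (simplified for readable output).
doc = ET.fromstring("<A>hello<annotation> NOT part of text </annotation>world</A>")

# itertext() walks the whole subtree, so descendant text is included too.
descendant_text = list(doc.itertext())
print(descendant_text)  # ['hello', ' NOT part of text ', 'world']

# Iterating an element yields its direct child *elements* only.
child_tags = [child.tag for child in doc]
print(child_tags)  # ['annotation']

# iter() also yields elements (starting with the root itself), never text.
all_tags = [el.tag for el in doc.iter()]
print(all_tags)  # ['A', 'annotation']
```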

I understand that there's a library called lxml which provides better XPath support, but I'm asking here before adding another dependency. (Plus, I'm very new to Python, so I might be missing something obvious.)

Answer

The text of your example is spread, somewhat counter-intuitively, across three attributes:


  • A.text for "hello"

  • annotation.text for "NOT part of text"

  • annotation.tail for "world"

(whitespace omitted). This is somewhat cumbersome. However, because A's own text nodes are exactly A.text plus the tail of each direct child, something along these lines should help:

 import xml.etree.ElementTree as et

 xml = """
 <A>
     hello

     <annotation> NOT part of text </annotation>

     world
 </A>"""


 doc = et.fromstring(xml)


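 def all_texts(root):
     """Yield root's own text nodes: the text before its first child,
     then the tail after each direct child (text inside children is skipped)."""
     if root.text is not None:
         yield root.text
     for child in root:  # iterates direct children only
         if child.tail is not None:
             yield child.tail


 print(list(all_texts(doc)))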
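The raw text nodes above still carry the newlines and indentation from the source, so you will usually want to strip them. The following is a minimal sketch that assumes surrounding whitespace is not significant for your use case. The helper name direct_texts is hypothetical and not part of ElementTree.

```python
import xml.etree.ElementTree as ET

SOURCE = """
<A>
    hello

    <annotation> NOT part of text </annotation>

    world
</A>"""


def direct_texts(elem):
    """Yield the stripped, non-empty text nodes that are direct children of elem.

    elem.text holds the text before the first child element; each direct
    child's .tail holds the text after that child. Text nested inside the
    children themselves is never visited.
    """
    pieces = [elem.text] + [child.tail for child in elem]
    for piece in pieces:
        # .text / .tail are None when there is no text at that position;
        # whitespace-only pieces are dropped (assumption: they are insignificant).
        if piece is not None and piece.strip():
            yield piece.strip()


doc = ET.fromstring(SOURCE)
texts = list(direct_texts(doc))
print(texts)  # ['hello', 'world']
```

If you need a single string instead, " ".join(direct_texts(doc)) gives "hello world".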


11-02 19:23