XPath: how to retrieve the value of a table cell from an HTML document

Question

I have an HTML document, and somewhere inside it is the table below. I can get the table rows as Java DOM objects. What is not clear to me is how to extract the value of a table cell, both when the value is a string and when it is a binary resource (an image).
I am using code like:

    XPath xpath;
    XPathExpression expr;
    NodeList nodes = null;
    // Use XPath to obtain whatever you want from the (X)HTML
    try {
        xpath = XPathFactory.newInstance().newXPath();
        //<table class="data">
        NodeList list = doc.getElementsByTagName("table");
        // Node node = list.item(0);
        //System.out.println(node.getTextContent());
        //String textContent = node.getTextContent();
        expr = xpath.compile("//table/tr/td");
        nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

and looping like:

    for (int i = 0; i < nodes.getLength(); i++) {
        Node ln = list.item(i);
        String lnText = ln.toString();
        NodeList rowElements = ln.getChildNodes();
        Node one = rowElements.item(0);
        String oneText = one.toString();
        String nodeName = one.getNodeName();
        String valOne = one.getNodeValue();

But I am not seeing the values in the table.

    <table class="data">
      <tr><td>ImageName1</td><td width="50"></td><td><img src="/images/036000291452" alt="036000291452" /></td></tr>
      <tr><td>ImageName2</td><td width="50"></td><td><img src="/images/36000291452" alt="36000291452" /></td></tr>
      <tr><td>Description</td><td></td><td>Time Magazine</td></tr>
      <tr><td>Size/Weight</td><td></td><td>14 Issues</td></tr>
      <tr><td>Issuing Country</td><td></td><td>United States</td></tr>
    </table>

Solution

This XPath expression:
/*/tr[1]/td[1]
selects the td element (in no namespace) that is the first td child of the first tr child of the top element (table) of the provided XML document.
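
Below is a minimal Java sketch that evaluates this expression with the standard javax.xml.xpath API. It assumes the table from the question has been parsed on its own as well-formed XML, so that table is the document element. The class name FirstCell and its helper methods are illustrative only. The cell text is read with getTextContent(). Calling getNodeValue() on an element, as the question's loop does, returns null, which may be part of why no values showed up.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class FirstCell {
    // The table from the question, assumed here to be the whole (well-formed) document.
    static final String TABLE = "<table class=\"data\">"
        + "<tr><td>ImageName1</td><td width=\"50\"></td>"
        + "<td><img src=\"/images/036000291452\" alt=\"036000291452\" /></td></tr>"
        + "<tr><td>ImageName2</td><td width=\"50\"></td>"
        + "<td><img src=\"/images/36000291452\" alt=\"36000291452\" /></td></tr>"
        + "<tr><td>Description</td><td></td><td>Time Magazine</td></tr>"
        + "<tr><td>Size/Weight</td><td></td><td>14 Issues</td></tr>"
        + "<tr><td>Issuing Country</td><td></td><td>United States</td></tr>"
        + "</table>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates /*/tr[1]/td[1] and returns that cell's text, or null if nothing matches.
    static String firstCellText(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node td = (Node) xpath.evaluate("/*/tr[1]/td[1]", doc, XPathConstants.NODE);
        // getTextContent() returns the concatenated descendant text;
        // getNodeValue() would return null because td is an element node.
        return td == null ? null : td.getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(firstCellText(parse(TABLE))); // prints ImageName1
    }
}
```
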
The XPath expression:
/*/tr[1]/td[2]
selects the td element (in no namespace) that is the second td child of the first tr child of the top element (table) of the provided XML document.
In general:
/*/tr[$m]/td[$n]
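selects the td element (in no namespace) that is the $n-th td child of the $m-th tr child of the top element (table) of the provided XML document. Just replace $m and $n with the desired integer values. This assumes the table is the document element, as in the sample above. If the table sits deeper inside a larger HTML document, replace /* with a path that reaches it, for example //table[@class='data'].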
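
Instead of pasting the integers into the expression text, you can also bind $m and $n as real XPath variables through javax.xml.xpath.XPathVariableResolver. The sketch below assumes the same standalone table, and the class and method names are hypothetical. JAXP represents XPath numbers as java.lang.Double, so the indices are supplied as Double values. The resolver must be installed before the expression is evaluated.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class CellByIndex {
    // The table from the question, assumed to be the whole (well-formed) document.
    static final String TABLE = "<table class=\"data\">"
        + "<tr><td>ImageName1</td><td width=\"50\"></td>"
        + "<td><img src=\"/images/036000291452\" alt=\"036000291452\" /></td></tr>"
        + "<tr><td>ImageName2</td><td width=\"50\"></td>"
        + "<td><img src=\"/images/36000291452\" alt=\"36000291452\" /></td></tr>"
        + "<tr><td>Description</td><td></td><td>Time Magazine</td></tr>"
        + "<tr><td>Size/Weight</td><td></td><td>14 Issues</td></tr>"
        + "<tr><td>Issuing Country</td><td></td><td>United States</td></tr>"
        + "</table>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the td selected by /*/tr[$m]/td[$n], or null if there is no such cell.
    static Node cell(Document doc, int m, int n) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind $m and $n before evaluating; XPath numbers are java.lang.Double in JAXP.
        xpath.setXPathVariableResolver(name -> {
            switch (name.getLocalPart()) {
                case "m": return (double) m;
                case "n": return (double) n;
                default: return null; // no other variables are used in the expression
            }
        });
        return (Node) xpath.evaluate("/*/tr[$m]/td[$n]", doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(TABLE);
        System.out.println(cell(doc, 3, 3).getTextContent()); // prints Time Magazine
    }
}
```

Binding variables avoids rebuilding the expression string for every cell. Plain textual substitution, as described above, works just as well for a one-off query.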
You can use the standard XPath function string() to obtain their string value:
string(/*/tr[$m]/td[$n])
evaluates to the string value of the td element (in no namespace) that is the $n-th td child of the $m-th tr child of the top element (table) of the provided XML document. If there is no such cell, it evaluates to the empty string.
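
Here is a hedged sketch that applies this to both cases in the question. Requesting XPathConstants.STRING (or using the two-argument evaluate, which also returns a String) hands back the result of string() directly as a Java String. The image rows have no text in the third cell. The document only holds a reference to the image in the img element's src attribute, so the sketch reads that attribute instead. Fetching the image bytes would be a separate request. Class and helper names are illustrative, and the table is again assumed to be the document element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class TableValues {
    // The table from the question, assumed to be the whole (well-formed) document.
    static final String TABLE = "<table class=\"data\">"
        + "<tr><td>ImageName1</td><td width=\"50\"></td>"
        + "<td><img src=\"/images/036000291452\" alt=\"036000291452\" /></td></tr>"
        + "<tr><td>ImageName2</td><td width=\"50\"></td>"
        + "<td><img src=\"/images/36000291452\" alt=\"36000291452\" /></td></tr>"
        + "<tr><td>Description</td><td></td><td>Time Magazine</td></tr>"
        + "<tr><td>Size/Weight</td><td></td><td>14 Issues</td></tr>"
        + "<tr><td>Issuing Country</td><td></td><td>United States</td></tr>"
        + "</table>";

    // One shared XPath object; acceptable here because the program is single-threaded.
    static final XPath XPATH = XPathFactory.newInstance().newXPath();

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    // string(/*/tr[m]/td[n]) with the indices substituted into the text; "" if no such cell.
    static String cellText(Document doc, int m, int n) throws Exception {
        String expr = "string(/*/tr[" + m + "]/td[" + n + "])";
        return (String) XPATH.evaluate(expr, doc, XPathConstants.STRING);
    }

    // The src attribute of an <img> inside the cell; "" if the cell holds no image.
    static String imageSrc(Document doc, int m, int n) throws Exception {
        return XPATH.evaluate("string(/*/tr[" + m + "]/td[" + n + "]/img/@src)", doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(TABLE);
        // count(/*/tr) comes back as a Double when XPathConstants.NUMBER is requested.
        int rows = ((Double) XPATH.evaluate("count(/*/tr)", doc, XPathConstants.NUMBER)).intValue();
        for (int m = 1; m <= rows; m++) {
            String value = cellText(doc, m, 3);
            if (value.isEmpty()) {
                value = imageSrc(doc, m, 3); // image rows: report the URL instead of text
            }
            System.out.println(cellText(doc, m, 1) + " = " + value);
        }
    }
}
```

Run against the sample table, this prints one line per row, for example "ImageName1 = /images/036000291452" and "Description = Time Magazine".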
