This article covers how to retrieve the value of a table cell from an HTML document using XPath.

Problem description

I have an HTML document, and somewhere inside it is the table shown below. I can get the table rows as Java DOM objects. What is not clear to me is how to extract the value of a table cell when the value is a string, and also when it is a binary resource.

I am using code like:

    XPath xpath;
    XPathExpression expr;
    NodeList nodes = null;
    // Use XPath to obtain whatever you want from the (X)HTML
    try {
        xpath = XPathFactory.newInstance().newXPath();
        // <table class="data">

        NodeList list = doc.getElementsByTagName("table");
        // Node node = list.item(0);
        // System.out.println(node.getTextContent());
        // String textContent = node.getTextContent();

        expr = xpath.compile("//table/tr/td");
        nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);

and looping like:

    for (int i = 0; i < nodes.getLength(); i++) {

        Node ln = list.item(i);
        String lnText = ln.toString();
        NodeList rowElements = ln.getChildNodes();
        Node one = rowElements.item(0);

        String oneText = one.toString();
        String nodeName = one.getNodeName();
        String valOne = one.getNodeValue();

But I am not seeing the values in the table.

    <table class="data">
      <tr><td>ImageName1</td><td width="50"></td><td><img src="/images/036000291452" alt="036000291452" /></td></tr>
      <tr><td>ImageName2</td><td width="50"></td><td><img src="/images/36000291452" alt="36000291452" /></td></tr>
      <tr><td>Description</td><td></td><td>Time Magazine</td></tr>
      <tr><td>Size/Weight</td><td></td><td>14 Issues</td></tr>
      <tr><td>Issuing Country</td><td></td><td>United States</td></tr>
    </table>

Solution

This XPath expression:

/*/tr[1]/td[1]

selects the td element (in no namespace) that is the first child of the first tr child of the top element (table) of the provided XML document.

The XPath expression:

/*/tr[1]/td[2]

selects the td element (in no namespace) that is the second child of the first tr child of the top element (table) of the provided XML document.

In general:

/*/tr[$m]/td[$n]

selects the td element (in no namespace) that is the $n-th child of the $m-th tr child of the top element (table) of the provided XML document. Just replace $m and $n with the desired integer values.
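
As a rough sketch of how the general expression might be used from Java: the class name CellLookup, the helpers parse and cell, and the trimmed two-row copy of the table are illustrative assumptions, not part of the original answer. The sketch also assumes the table is well-formed XML and is the top element of the document, as the expression requires. Note that getNodeValue() returns null for element nodes, which may be part of why the loop in the question shows no values; getTextContent() returns the text inside the cell.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class CellLookup {
    // Two rows taken from the question's table (assumed to be the whole document).
    static final String TABLE =
        "<table class=\"data\">"
        + "<tr><td>ImageName1</td><td width=\"50\"></td>"
        + "<td><img src=\"/images/036000291452\" alt=\"036000291452\" /></td></tr>"
        + "<tr><td>Description</td><td></td><td>Time Magazine</td></tr>"
        + "</table>";

    // Parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the n-th td of the m-th tr under the top element, or null if absent.
    static Node cell(Document doc, int m, int n) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        String expr = "/*/tr[" + m + "]/td[" + n + "]";
        return (Node) xpath.evaluate(expr, doc, XPathConstants.NODE);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(TABLE);
        Node td = cell(doc, 2, 3);
        System.out.println(td.getNodeValue());   // prints "null": element nodes have no node value
        System.out.println(td.getTextContent()); // prints "Time Magazine"
    }
}
```

XPath positions are 1-based, so cell(doc, 2, 3) is the third cell of the second row.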

You can use the standard XPath function string() to obtain their string value:

string(/*/tr[$m]/td[$n])

evaluates to the string value of the td element (in no namespace) that is the $n-th child of the $m-th tr child of the top element (table) of the provided XML document.
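
A minimal Java sketch of the string() form follows. The class name CellText, the helper eval, and the trimmed two-row table are assumptions made for illustration. The last call goes beyond the original answer: for the image cells, one plausible approach is to read the img element's src attribute as a string and fetch the binary resource separately; that fetching step is not shown here.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CellText {
    // Two rows taken from the question's table (assumed to be the whole document).
    static final String TABLE =
        "<table class=\"data\">"
        + "<tr><td>ImageName1</td><td width=\"50\"></td>"
        + "<td><img src=\"/images/036000291452\" alt=\"036000291452\" /></td></tr>"
        + "<tr><td>Description</td><td></td><td>Time Magazine</td></tr>"
        + "</table>";

    // Evaluates an XPath expression against the sample table and returns its string result.
    static String eval(String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(TABLE)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The two-argument evaluate() returns the result converted to a String.
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Text cell: the string value is the text inside the td.
        System.out.println(eval("string(/*/tr[2]/td[3])"));          // prints "Time Magazine"
        // Image cell: the td holds only an img element, so its string value is empty.
        System.out.println(eval("string(/*/tr[1]/td[3])").isEmpty()); // prints "true"
        // Assumed approach for images: read the src attribute instead.
        System.out.println(eval("string(/*/tr[1]/td[3]/img/@src)")); // prints "/images/036000291452"
    }
}
```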

