What are the advantages of using an XSLT stylesheet compared with manually parsing an XML file with a DOM parser?

Problem description

For one of our applications, I've written a utility that uses Java's DOM parser. It takes an XML file, parses it, and then retrieves the data using one or more of the following methods:

getElementsByTagName()
getElementAtIndex()
getFirstChild()
getNextSibling()
getTextContent()
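For illustration, the kind of manual traversal described above might look like the following minimal sketch. The `order` and `customerRef` element names, the class name, and the inline XML are all hypothetical and are not taken from the question; the point is only that the document layout ends up hard-coded in Java.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class ManualDomExtraction {

    // Returns the text of a <customerRef> that is a direct child of the first <order>,
    // or null if the document does not have that (hypothetical) layout.
    public static String customerRef(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Node order = doc.getElementsByTagName("order").item(0);
        if (order == null) {
            return null;
        }
        // Walk the children by hand; whitespace text nodes are skipped by the type check.
        for (Node child = order.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child.getNodeType() == Node.ELEMENT_NODE
                    && "customerRef".equals(child.getNodeName())) {
                return child.getTextContent();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        // Layout the code was written for.
        System.out.println(customerRef("<order><customerRef>C-42</customerRef></order>"));
        // After a schema change the value is still present, but this code no longer finds it.
        System.out.println(customerRef("<order><customer><ref>C-42</ref></customer></order>"));
    }
}
```

The second call returns null even though the value is still in the document, which is the maintenance problem the rest of the question is about.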

Now I have to do the same thing again, but I am wondering whether it would be better to use an XSLT stylesheet. The organisation that sends us the XML file keeps changing its schema, which means we have to change our code to cater for these schema changes. I'm not very familiar with XSLT, so I'm trying to find out whether I'm better off using XSLT stylesheets rather than "manual parsing".

XSLT stylesheets look attractive because I think that, if the schema for the XML file changes, I will only need to change the stylesheet. Is this correct?

The other thing I would like to know is which of the two (XSLT transformer or DOM parser) performs better. For the manual option, I just use the DOM parser to parse the XML file. How does the XSLT transformer actually parse the file? Does it add overhead compared with manually parsing the XML file? I ask because performance is important, given the nature of the data I will be processing.

Any suggestions?

Thanks

Basically, what I am currently doing is parsing an XML file and processing the values in some of its elements. I don't transform the XML file into any other format. I just extract some values, fetch a row from an Oracle database, and save a new row into a different table. The XML file I parse only contains reference values that I use to retrieve data from the database.

Is XSLT not suitable in this scenario? Is there a better approach that I can use to avoid code changes if the schema changes?

Apologies for not being clear enough about what I am doing with the XML data. Basically, there is an XML file which contains some information. I extract this information from the XML file and use it to retrieve more information from a local database. The data in the XML file is more like a set of reference keys for the data I need from the database. I then take the content I extracted from the XML file, plus the content I retrieved from the database using a specific key from the XML file, and save that data into another database table.

The problem I have is that I know how to write a DOM parser to extract the information I need from the XML file, but I was wondering whether using an XSLT stylesheet would be a better option, as I wouldn't have to change the code if the schema changes.

Reading the responses below, it sounds like XSLT is only used for transforming an XML file into another XML file or some other format. Given that I don't intend to transform the XML file, there is probably no need to add the additional overhead of parsing the XSLT stylesheet as well as the XML file.

Recommended answer

I think that what you actually need is an XPath expression. You could configure that expression in a properties file, or wherever you normally keep your setup parameters.

That way, whenever your customer moves the information you use to yet another place in the document, you only have to change the XPath expression.

Basically, XSLT is overkill here; you just need XPath. A single XPath expression lets you home in on each value you are after.
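The idea of keeping the XPath expression in configuration could be sketched roughly as follows. This is only an illustration under stated assumptions: the property key `maxThread.xpath`, the class name, and the inline XML strings are hypothetical, the properties are loaded from a string rather than a real file to keep the example self-contained, and `Properties.load(Reader)` needs Java 6 or later.

```java
import java.io.StringReader;
import java.util.Properties;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ConfigurableXPath {

    // Parses an XML string into a DOM document.
    public static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Evaluates the XPath expression stored under the given (hypothetical) property key
    // and returns the string value of the first matching node.
    public static String extract(Document doc, Properties props, String key) throws Exception {
        String expression = props.getProperty(key);
        if (expression == null) {
            throw new IllegalArgumentException("No XPath configured for " + key);
        }
        XPath xpath = XPathFactory.newInstance().newXPath();
        return xpath.evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Original layout and the expression configured for it.
        Properties before = new Properties();
        before.load(new StringReader("maxThread.xpath=/config/param[@id='MaxThread']\n"));
        Document oldDoc = parse("<config><param id='MaxThread'>250</param></config>");
        System.out.println(extract(oldDoc, before, "maxThread.xpath")); // prints 250

        // The sender moves the element; only the configured expression changes.
        Properties after = new Properties();
        after.load(new StringReader("maxThread.xpath=/config/settings/param[@id='MaxThread']\n"));
        Document newDoc = parse(
                "<config><settings><param id='MaxThread'>250</param></settings></config>");
        System.out.println(extract(newDoc, after, "maxThread.xpath")); // prints 250
    }
}
```

In this sketch the Java code stays the same across both layouts; a schema change that only relocates a value is absorbed by editing the configured expression.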

Since we are now talking about JDK 1.4, I've included below three different ways of fetching text from an XML file using XPath (kept as simple as possible, with no NPE guard fluff, I'm afraid ;-)

Starting with the latest.

0. First, the sample XML config file

<?xml version="1.0" encoding="UTF-8"?>
<config>
    <param id="MaxThread" desc="MaxThread"        type="int">250</param>
    <param id="rTmo"      desc="RespTimeout (ms)" type="int">5000</param>
</config>

1. Using JAXP 1.3, a standard part of Java SE 5.0

import javax.xml.parsers.*;
import javax.xml.xpath.*;
import org.w3c.dom.Document;

public class TestXPath {

    private static final String CFG_FILE = "test.xml" ;
    private static final String XPATH_FOR_PRM_MaxThread = "/config/param[@id='MaxThread']/text()";
    public static void main(String[] args) {

        DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
        docFactory.setNamespaceAware(true);
        DocumentBuilder builder;
        try {
            builder = docFactory.newDocumentBuilder();
            Document doc = builder.parse(CFG_FILE);
            XPathExpression expr = XPathFactory.newInstance().newXPath().compile(XPATH_FOR_PRM_MaxThread);
            Object result = expr.evaluate(doc, XPathConstants.NUMBER);
            if ( result instanceof Double ) {
                System.out.println( ((Double)result).intValue() );
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
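When several values are needed rather than one, the same JAXP 1.3 API can return every matching node in a single evaluation. The following is a minimal sketch, assuming the sample config layout from step 0; the class and method names are hypothetical, and the XML is passed in as a string only to keep the example self-contained.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class AllParams {

    // Collects every /config/param element into an id -> text map, in document order.
    public static Map<String, String> readParams(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate("/config/param", doc, XPathConstants.NODESET);
        Map<String, String> values = new LinkedHashMap<String, String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element param = (Element) nodes.item(i);
            values.put(param.getAttribute("id"), param.getTextContent());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<config>"
                + "<param id=\"MaxThread\" type=\"int\">250</param>"
                + "<param id=\"rTmo\" type=\"int\">5000</param>"
                + "</config>";
        System.out.println(readParams(xml)); // prints {MaxThread=250, rTmo=5000}
    }
}
```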

2. Using JAXP 1.2, a standard part of Java SE 1.4.2

import javax.xml.parsers.*;
import org.apache.xpath.XPathAPI;
import org.w3c.dom.*;

public class TestXPath {

    private static final String CFG_FILE = "test.xml" ;
    private static final String XPATH_FOR_PRM_MaxThread = "/config/param[@id='MaxThread']/text()";

    public static void main(String[] args) {

        try {
            DocumentBuilderFactory docFactory = DocumentBuilderFactory.newInstance();
            docFactory.setNamespaceAware(true);
            DocumentBuilder builder = docFactory.newDocumentBuilder();
            Document doc = builder.parse(CFG_FILE);
            Node param = XPathAPI.selectSingleNode( doc, XPATH_FOR_PRM_MaxThread );
            if ( param instanceof Text ) {
                System.out.println( Integer.decode(((Text)(param)).getNodeValue() ) );
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

3. Using JAXP 1.1, a standard part of Java SE 1.4, plus JDOM and Jaxen

You need to add these 2 jars (available from www.jdom.org; Jaxen is included in the binaries).

import java.io.File;
import org.jdom.*;
import org.jdom.input.SAXBuilder;
import org.jdom.xpath.XPath;

public class TestXPath {

    private static final String CFG_FILE = "test.xml" ;
    private static final String XPATH_FOR_PRM_MaxThread = "/config/param[@id='MaxThread']/text()";

    public static void main(String[] args) {
        try {
            SAXBuilder sxb = new SAXBuilder();
            Document doc = sxb.build(new File(CFG_FILE));
            Element root = doc.getRootElement();
            XPath xpath = XPath.newInstance(XPATH_FOR_PRM_MaxThread);
            Text param = (Text) xpath.selectSingleNode(root);
            Integer maxThread = Integer.decode( param.getText() );
            System.out.println( maxThread );
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
