Two ways to parse XML: DOM and SAX

DOM: Document Object Model. This is the W3C-recommended way of processing XML.

SAX: Simple API for XML. It is not an official standard; it comes from the open-source XML-DEV community, and almost all XML parsers support it.

1. DOM parsing:

The whole XML document is loaded into a Document tree in one pass. Node objects are obtained through the Document object, and the content of the XML document (tags, attributes, text, comments) is accessed through those node objects.

a) JAXP parsing: JAXP (Java API for XML Processing) is the parsing API introduced by Sun and shipped with the JDK.

The examples below use book.xml:

<bookshelf>
     <book>
          <title>computer network</title>
          <author>Zhang San</author>
          <price>100</price>
     </book>
     <book>
          <title>data structure</title>
          <author>Li Si</author>
          <price>80</price>
     </book>
</bookshelf>

Obtaining the DOM parser in JAXP:
1. Call the DocumentBuilderFactory.newInstance() method to get the factory that creates DOM parsers.
2. Call the newDocumentBuilder() method of the factory object to get the DOM parser object.
3. Call the parse() method of the DOM parser object to parse the XML document and obtain the Document object representing the entire document. The whole document can then be manipulated through the DOM API.

// Requires the javax.xml.parsers and org.w3c.dom packages
// Create the parser object and parse the file
DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document document = db.parse("src/book.xml");
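The same three steps also work on XML held in memory, which is handy for trying things out without a file on disk. The following is only a minimal sketch: the class name ParseFromString and its parse helper are made up for illustration and are not part of JAXP.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ParseFromString {
    // Hypothetical helper: factory -> builder -> parse, reading from a String.
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<bookshelf><book><title>computer network</title></book></bookshelf>");
        // The root element of the parsed tree
        System.out.println(doc.getDocumentElement().getNodeName());
    }
}
```

Wrapping the String in an InputSource matters here, because parse(String) treats its argument as a URI rather than as XML text.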

  1. Get the content of a specific node, e.g. get the price of the computer network book

public static void test1(Document document) {
    // Get all price nodes
    NodeList nl = document.getElementsByTagName("price");
    // Get the price node of the computer network book (the first book, index 0)
    Node n = nl.item(0);

    // Print the price text
    System.out.println(n.getTextContent());
}
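Picking a node by position breaks as soon as the order of the books changes. One possible alternative, sketched here under the assumption that the tags are named book, title and price, is to look the book up by its title. PriceByTitle, parse and priceOf are illustrative names only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PriceByTitle {
    // Hypothetical helper: parses XML from a String so the sketch needs no file.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the price text of the book with the given title, or null if not found.
    static String priceOf(Document doc, String title) {
        NodeList books = doc.getElementsByTagName("book");
        for (int i = 0; i < books.getLength(); i++) {
            Element book = (Element) books.item(i);
            Node t = book.getElementsByTagName("title").item(0);
            if (t != null && title.equals(t.getTextContent().trim())) {
                Node p = book.getElementsByTagName("price").item(0);
                return p == null ? null : p.getTextContent().trim();
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<bookshelf>"
                + "<book><title>computer network</title><price>100</price></book>"
                + "<book><title>data structure</title><price>80</price></book>"
                + "</bookshelf>");
        System.out.println(priceOf(doc, "computer network")); // prints 100
    }
}
```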

  2. Traverse all element nodes

public static void test2(Node node) {
    // Get the child nodes of the given node
    NodeList nl = node.getChildNodes();
    // Loop over them and check each one
    for (int i = 0; i < nl.getLength(); i++) {
        Node n = nl.item(i);
        if (n.getNodeType() == Node.ELEMENT_NODE) {
            // This node is an element (tag) node: print its name
            System.out.println(n.getNodeName());
            // Recurse into its children
            test2(n);
        }
    }
}
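The recursive traversal can also carry the nesting depth along, which makes the tree shape visible. This is just a sketch: ElementWalker and collect are hypothetical names, and the results are collected into a list instead of being printed so they are easy to inspect.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ElementWalker {
    // Hypothetical helper: parses XML from a String so the sketch needs no file.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Depth-first walk that records "depth:name" for every element below node.
    static void collect(Node node, int depth, List<String> out) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Text, comment and other non-element nodes are skipped
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                out.add(depth + ":" + child.getNodeName());
                collect(child, depth + 1, out);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<bookshelf><book><title>a</title></book><book/></bookshelf>");
        List<String> names = new ArrayList<>();
        collect(doc, 0, names);
        System.out.println(names); // prints [0:bookshelf, 1:book, 2:title, 1:book]
    }
}
```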

  3. Modify the text content of an element node, e.g. change the price of the computer network book to 80

public static void test3(Document document) throws Exception {
    // Get all price nodes
    NodeList nl = document.getElementsByTagName("price");
    // Get the price node of the computer network book (the first book, index 0)
    Node n = nl.item(0);
    // Modify the text content
    n.setTextContent("80");

    // Save the modified document back to disk
    // Needs javax.xml.transform.*, javax.xml.transform.dom.DOMSource and javax.xml.transform.stream.StreamResult
    Transformer tf = TransformerFactory.newInstance().newTransformer();
    tf.transform(new DOMSource(document), new StreamResult("src/book.xml"));
}
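While experimenting, it can be safer to serialize the modified tree to a String first and inspect it before overwriting the source file. A minimal sketch, assuming the same Transformer approach as above; DomToString, parse and serialize are illustrative names, not part of the JDK.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DomToString {
    // Hypothetical helper: parses XML from a String so the sketch needs no file.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serializes the DOM tree into a String instead of a file.
    static String serialize(Document doc) throws Exception {
        Transformer tf = TransformerFactory.newInstance().newTransformer();
        // Leave out the <?xml ...?> declaration to keep the output short
        tf.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        tf.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<bookshelf><book><price>100</price></book></bookshelf>");
        doc.getElementsByTagName("price").item(0).setTextContent("80");
        System.out.println(serialize(doc));
    }
}
```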

  4. Add a child element node to a specified element node, e.g. add an internal price node (50) under the price of the data structure book

public static void test4(Document document) throws Exception {
    // Get all price nodes
    NodeList nl = document.getElementsByTagName("price");
    // Get the price node of the data structure book (the second book, index 1)
    Node n = nl.item(1);

    // Create the new node (element names cannot contain spaces)
    Element el = document.createElement("internalPrice");
    // Set the text content of the node
    el.setTextContent("50");

    // Attach the internal price node to the price node
    n.appendChild(el);

    // Save the modified document back to disk
    Transformer tf = TransformerFactory.newInstance().newTransformer();
    tf.transform(new DOMSource(document), new StreamResult("src/book.xml"));
}

  5. Add a sibling element node next to a specified element node, e.g. add a wholesale price node (60) at the same level as the price of the data structure book

public static void test5(Document document) throws Exception {
    // Get all book nodes
    NodeList nl = document.getElementsByTagName("book");
    // Get the book node of the data structure book (the second book, index 1)
    Node n = nl.item(1);

    // Create the wholesale price node (element names cannot contain spaces)
    Element el = document.createElement("wholesalePrice");
    // Set the text content of the node
    el.setTextContent("60");

    // Attach the wholesale price node to the book node, making it a sibling of price
    n.appendChild(el);

    // Save the modified document back to disk
    Transformer tf = TransformerFactory.newInstance().newTransformer();
    tf.transform(new DOMSource(document), new StreamResult("src/book.xml"));
}
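appendChild always puts the new node at the end of the parent. If the sibling should sit directly after a particular node, insertBefore on the parent is one way to do it. The sketch below assumes the element name wholesalePrice; SiblingInsert and insertAfter are hypothetical helpers.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class SiblingInsert {
    // Hypothetical helper: parses XML from a String so the sketch needs no file.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Creates <name>text</name> and places it directly after the reference node.
    static Element insertAfter(Node ref, String name, String text) {
        Document doc = ref.getOwnerDocument();
        Element el = doc.createElement(name);
        el.setTextContent(text);
        // When ref is the last child, getNextSibling() is null and
        // insertBefore(el, null) behaves like appendChild(el).
        ref.getParentNode().insertBefore(el, ref.getNextSibling());
        return el;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<book><price>80</price><author>Li Si</author></book>");
        Node price = doc.getElementsByTagName("price").item(0);
        insertAfter(price, "wholesalePrice", "60");
        System.out.println(price.getNextSibling().getNodeName()); // prints wholesalePrice
    }
}
```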

  6. Delete a specified element node, e.g. delete the internal price node

public static void test6(Document document) throws Exception {
    // Get all internal price nodes
    NodeList nl = document.getElementsByTagName("internalPrice");
    // Get the internal price node of the data structure book (the only one, index 0)
    Node node = nl.item(0);

    // A node is removed through its parent
    node.getParentNode().removeChild(node);

    // Save the modified document back to disk
    Transformer tf = TransformerFactory.newInstance().newTransformer();
    tf.transform(new DOMSource(document), new StreamResult("src/book.xml"));
}

  7. Manipulate XML attributes, e.g. add an ISBN attribute with the value "Little Programmer" to the data structure book

public static void test7(Document document) throws Exception {
    // Get all book nodes
    NodeList nl = document.getElementsByTagName("book");
    // Get the book node of the data structure book (the second book, index 1)
    Node n = nl.item(1);

    // Add an attribute (only Element has setAttribute, hence the cast)
    ((Element) n).setAttribute("ISBN", "Little Programmer");

    // Save the modified document back to disk
    Transformer tf = TransformerFactory.newInstance().newTransformer();
    tf.transform(new DOMSource(document), new StreamResult("src/book.xml"));
}
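Reading an attribute back has one small trap: Element.getAttribute returns an empty string, not null, when the attribute is missing. A possible way to handle that is sketched below; AttributeReader and attrOrDefault are made-up names for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttributeReader {
    // Hypothetical helper: parses XML from a String so the sketch needs no file.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Returns the attribute value, or the fallback when the attribute is absent.
    static String attrOrDefault(Element e, String name, String fallback) {
        return e.hasAttribute(name) ? e.getAttribute(name) : fallback;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<bookshelf><book ISBN=\"Little Programmer\"/><book/></bookshelf>");
        Element first = (Element) doc.getElementsByTagName("book").item(0);
        Element second = (Element) doc.getElementsByTagName("book").item(1);
        System.out.println(attrOrDefault(first, "ISBN", "(none)"));  // prints Little Programmer
        System.out.println(attrOrDefault(second, "ISBN", "(none)")); // prints (none)
        // Attributes are removed by name
        first.removeAttribute("ISBN");
        System.out.println(first.hasAttribute("ISBN")); // prints false
    }
}
```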

b) Dom4J parsing:

i. Step 1: Read the XML document and generate the document object

  1. Read an XML file and get the document object.

SAXReader reader = new SAXReader();
Document document = reader.read(new File("input.xml"));

  2. Parse text in XML form and get the document object.

String text = "<members></members>";
Document document = DocumentHelper.parseText(text);

  3. Create the document object from scratch.

Document document = DocumentHelper.createDocument();
// Create the root node
Element root = document.addElement("members");

ii. Step 2: Modify the XML document

  1. Add: documents, tags, attributes

// 1. Create a document
Document doc = DocumentHelper.createDocument();
// 2. Add tags
Element rootElem = doc.addElement("contactList");
// doc.addElement("contactList");
Element contactElem = rootElem.addElement("contact");
contactElem.addElement("name");
// 3. Add attributes
contactElem.addAttribute("id", "001");
contactElem.addAttribute("name", "eric");
  2. Delete: tags, attributes

Delete a tag: 1.1 get the tag object, 1.2 delete the tag object

// 1.1 Get the tag object
Element ageElem = doc.getRootElement().element("contact").element("age");
// 1.2 Delete the tag object
ageElem.detach();
// Alternative: get the parent tag first, then remove the tag through its parent
// ageElem.getParent().remove(ageElem);

Delete an attribute: 2.1 get the tag object, 2.2 get the attribute object, 2.3 delete the attribute

// 2.1 Get the tag object: the second contact tag
Element contactElem = (Element) doc.getRootElement().elements().get(1);
// 2.2 Get the attribute object (the name must match exactly, with no trailing space)
Attribute idAttr = contactElem.attribute("id");
// 2.3 Delete the attribute
idAttr.detach();
// idAttr.getParent().remove(idAttr);

  3. Change: attribute value, text

Option 1 for modifying an attribute value: 1.1 get the tag object, 1.2 get the attribute object, 1.3 modify the attribute value

// 1.1 Get the tag object
Element contactElem = doc.getRootElement().element("contact");
// 1.2 Get the attribute object
Attribute idAttr = contactElem.attribute("id");
// 1.3 Modify the attribute value
idAttr.setValue("003");

Option 2 for modifying an attribute value: 1.1 get the tag object, 1.2 modify the attribute value by adding an attribute with the same name

// 1.1 Get the tag object
Element contactElem = doc.getRootElement().element("contact");
// 1.2 Modify the attribute value by adding an attribute with the same name
contactElem.addAttribute("id", "004");

Modify the text: 1. get the tag object, 2. modify the text

// 1. Get the tag object
Element nameElem = doc.getRootElement().element("contact").element("name");
// 2. Modify the text ("new name" is a placeholder value)
nameElem.setText("new name");

iii. Step 3: Write the modified XML document to a file

FileOutputStream out = new FileOutputStream("e:/contact.xml");
// 1. Specify the output format
// Compact format: no extra spaces or line breaks. Use it once the project is live.
OutputFormat format = OutputFormat.createCompactFormat();
// Pretty format: with indentation and line breaks. Use it during development and debugging.
// OutputFormat format = OutputFormat.createPrettyPrint();
/*
 * Specify the encoding of the generated XML document.
 * It affects both the encoding the document is saved in and the encoding
 * declared in the XML declaration (the one the XML parser decodes with).
 * Conclusion: XML documents written this way avoid garbled Chinese characters.
 */
format.setEncoding("utf-8");
// 2. Create the writer object
XMLWriter writer = new XMLWriter(out, format);
// 3. Write the document
writer.write(doc);
// 4. Close the stream
writer.close();

c) XPath: mainly used to obtain the required node objects quickly

1) Import the XPath support jar: jaxen-1.1-beta-6.jar

2) Use the XPath methods

List selectNodes("xpath expression");   Query multiple node objects

Node selectSingleNode("xpath expression");   Query a single node object

3) XPath syntax:

/   Absolute path: selects from the root of the XML document, or a direct child element (one level of hierarchy)

//   Relative path: selects matching elements at any depth, regardless of hierarchy

*   Wildcard: matches all elements

[]   Condition: selects the elements that satisfy the condition

@   Attribute: selects the attribute node

and   Relation: logical AND between conditions (equivalent to &&)

text()   Text: selects the text content
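To see the syntax rules in action without extra jars, the sketch below swaps in the JDK's built-in javax.xml.xpath API in place of the dom4j selectNodes / selectSingleNode methods; the expressions themselves are the same XPath. The ISBN values A1 and B2, the class name XPathDemo and its helpers are assumptions made up for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDemo {
    // Sample data; the ISBN attribute values are invented for the example.
    static final String XML = "<bookshelf>"
            + "<book ISBN=\"A1\"><title>computer network</title>"
            + "<author>Zhang San</author><price>100</price></book>"
            + "<book ISBN=\"B2\"><title>data structure</title>"
            + "<author>Li Si</author><price>80</price></book>"
            + "</bookshelf>";

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Number of nodes the expression selects (similar in spirit to selectNodes).
    static int count(Document doc, String expr) throws Exception {
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList nl = (NodeList) xp.evaluate(expr, doc, XPathConstants.NODESET);
        return nl.getLength();
    }

    // String value of the first selected node (similar in spirit to selectSingleNode).
    static String first(Document doc, String expr) throws Exception {
        return XPathFactory.newInstance().newXPath().evaluate(expr, doc);
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(XML);
        System.out.println(count(doc, "/bookshelf/book")); // absolute path, prints 2
        System.out.println(count(doc, "//price"));         // any depth, prints 2
        System.out.println(count(doc, "/bookshelf/*"));    // wildcard, prints 2
        // [] condition plus text()
        System.out.println(first(doc, "//book[title='data structure']/price/text()")); // prints 80
        // and relation plus @ attribute
        System.out.println(first(doc, "//book[author='Zhang San' and price=100]/@ISBN")); // prints A1
    }
}
```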

2. SAX parsing: load a little, read a little, process a little. The memory requirements are relatively low.

// Create a SAX parser
SAXParser sax = SAXParserFactory.newInstance().newSAXParser();
// Get the content reader
XMLReader xml = sax.getXMLReader();
// Register a content handler
xml.setContentHandler(new DefaultHandler() {
    String curName = "";   // name of the tag currently being read
    int index = 0;         // number of author tags read so far

    public void startElement(String uri, String localName,
            String qName, Attributes attributes) throws SAXException {
        curName = qName;
        if ("author".equals(curName)) {
            index++;
        }
    }

    public void endElement(String uri, String localName, String qName)
            throws SAXException {
        curName = "";
    }

    public void characters(char[] ch, int start, int length)
            throws SAXException {
        if ("author".equals(curName) && index == 2) {
            // The author of the second book is being read
            System.out.println(new String(ch, start, length));
        }
    }
});
// Load the XML document
xml.parse("src/book.xml");
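A parser is allowed to deliver the text of one element in several characters() calls, so a handler that needs the full text usually buffers it until endElement. The following is a minimal sketch of that pattern, assuming the author tag from book.xml; AuthorCollector and authorsOf are hypothetical names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AuthorCollector extends DefaultHandler {
    final List<String> authors = new ArrayList<>();
    private StringBuilder text; // non-null only while inside an <author> element

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("author".equals(qName)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called more than once per element, so append instead of assigning
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("author".equals(qName)) {
            authors.add(text.toString().trim());
            text = null;
        }
    }

    // Parses XML held in a String and returns every author in document order.
    static List<String> authorsOf(String xml) throws Exception {
        AuthorCollector handler = new AuthorCollector();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.authors;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(authorsOf("<bookshelf>"
                + "<book><author>Zhang San</author></book>"
                + "<book><author>Li Si</author></book>"
                + "</bookshelf>")); // prints [Zhang San, Li Si]
    }
}
```

Only the current author's text is held in memory at any time, which is what makes this style suitable for large files.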

=========== DOM parsing vs SAX parsing ===========

DOM parsing | SAX parsing
Principle: loads the whole XML document at once; not suitable for reading large files | Principle: load a little, read a little, process a little; suitable for reading large files
Nodes can be added, deleted, and modified freely | Read-only
Data can be read from anywhere in the document, including going back | Can only be read in order from top to bottom; cannot go back
Object-oriented programming model (Node, Element, Attribute); relatively simple for Java developers to code | Event-based programming model; relatively complex for Java developers to code
