SAX Readers (Java & XML, 2nd Edition)

3.2.1. Instantiating a Reader

SAX provides an interface all SAX-compliant XML parsers should implement. This allows SAX to know exactly what methods are available for callback and use within an application. For example, the Xerces main SAX parser class, org.apache.xerces.parsers.SAXParser, implements the org.xml.sax.XMLReader interface. If you have access to the source of your parser, you should see the same interface implemented in your parser's main SAX parser class. Each XML parser must have one class (and sometimes has more than one) that implements this interface, and that is the class you need to instantiate to allow for parsing XML:

// Instantiate a Reader
XMLReader reader = 
  new org.xml.sax.SAXParser( );

// Do something with the parser
reader.parse(uri);

With that in mind, it's worth looking at a more realistic example. Example 3-1 is the skeleton for the SAXTreeViewer class I was just referring to, which allows viewing of an XML document as a graphical tree. This also gives you a chance to look at each of the SAX events and associated callback methods that can be used to perform action within the parsing of an XML document.

Example 3-1. The SAXTreeViewer skeleton

package javaxml2;

import java.io.IOException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;

// This is an XML book - no need for explicit Swing imports
import java.awt.*;
import javax.swing.*;
import javax.swing.tree.*;

public class SAXTreeViewer extends JFrame {

    /** Default parser to use */
    private String vendorParserClass = 
        "org.apache.xerces.parsers.SAXParser";

    /** The base tree to render */
    private JTree jTree;

    /** Tree model to use */
    DefaultTreeModel defaultTreeModel;

    public SAXTreeViewer( ) {
        // Handle Swing setup
        super("SAX Tree Viewer");
        setSize(600, 450);
    }

    public void init(String xmlURI) throws IOException, SAXException {
        DefaultMutableTreeNode base = 
            new DefaultMutableTreeNode("XML Document: " + 
                xmlURI);
        
        // Build the tree model
        defaultTreeModel = new DefaultTreeModel(base);
        jTree = new JTree(defaultTreeModel);

        // Construct the tree hierarchy
        buildTree(defaultTreeModel, base, xmlURI);

        // Display the results
        getContentPane( ).add(new JScrollPane(jTree), 
            BorderLayout.CENTER);
    }

    public void buildTree(DefaultTreeModel treeModel, 
                          DefaultMutableTreeNode base, String xmlURI) 
        throws IOException, SAXException {

        // Create instances needed for parsing
        XMLReader reader = 
            XMLReaderFactory.createXMLReader(vendorParserClass);

        // Register content handler

        // Register error handler

        // Parse
    }

    public static void main(String[] args) {
        try {
            if (args.length != 1) {
                System.out.println(
                    "Usage: java javaxml2.SAXTreeViewer " +
                    "[XML Document URI]");
                System.exit(0);
            }
            SAXTreeViewer viewer = new SAXTreeViewer( );
            viewer.init(args[0]);
            viewer.setVisible(true);
        } catch (Exception e) {
            e.printStackTrace( );
        }
    }
}

This should all be fairly straightforward.[2] Other than setting up the visual properties for Swing, this code takes in the URI of an XML document (our contents.xml from the last chapter). In the init( ) method, a JTree is created for displaying the contents of the URI. These objects (the tree and URI) are then passed to the method that is worth focusing on, the buildTree( ) method. This is where parsing will take place, and the visual representation of the XML document supplied will be created. Additionally, the skeleton takes care of creating a base node for the graphical tree, with the path to the supplied XML document as that node's text.

[2]Don't be concerned if you are not familiar with the Swing concepts involved here; to be honest, I had to look most of them up myself! For a good reference on Swing, pick up a copy of Java Swing by Robert Eckstein, Marc Loy, and Dave Wood (O'Reilly).

U-R-What?

I've just breezed by what URIs are both here and in the last chapter. In short, a URI is a uniform resource indicator. As the name suggests, it provides a standard means of identifying (and thereby locating, in most cases) a specific resource; this resource is almost always some sort of XML document, for the purposes of this book. URIs are related to URLs, uniform resource locators. In fact, a URL is always a URI (although the reverse is not true). So in the examples in this and other chapters, you could specify a filename or a URL, like http://www.newInstance.com/javaxml2/copyright.xml, and either would be accepted.

You should be able to load and compile this program if you made the preparations talked about earlier to ensure that an XML parser and the SAX classes are in your class path. If you have a parser other than Apache Xerces, you can replace the value of the vendorParserClass variable to match your parser's XMLReader implementation class, and leave the rest of the code as is. This simple program doesn't do much yet; in fact, if you run it and supply a legitimate filename as an argument, it should happily grind away and show you an empty tree, with the document's filename as the base node. That's because you have only instantiated a reader, not requested that the XML document be parsed.

WARNING: If you have trouble compiling this source file, you most likely have problems with your IDE or system's class path. First, make sure you obtained the Apache Xerces parser (or your vendor's parser). For Xerces, this involves downloading azipped or gzipped file. This archive can then be extracted, and will contain a xerces.jar file; it is this jar file that contains the compiled class files for the program. Add this archive to your class path. You should then be able to compile the source file listing.

3.2.2. Parsing the Document

Once a reader is loaded and ready for use, you can instruct it to parse an XML document. This is conveniently handled by the parse( ) method of org.xml.sax.XMLReader class, and this method can accept either an org.xml.sax.InputSource or a simple string URI. It's a much better idea to use the SAX InputSource class, as that can provide more information than a simple location. I'll talk more about that later, but suffice it to say that an InputSource can be constructed from an I/O InputStream, Reader, or a string URI.

You can now add construction of an InputSource from the provided URI, as well as the invocation of the parse( ) method to the example. Because the document must be loaded, either locally or remotely, a java.io.IOException may result, and must be caught. In addition, the org.xml.sax.SAXException will be thrown if problems occur while parsing the document. Notice that the buildTree method can throw both of these exceptions:

    public void buildTree(DefaultTreeModel treeModel, 
                          DefaultMutableTreeNode base, File file) 
        throws IOException, SAXException {

        // Create instances needed for parsing
        XMLReader reader = 
            XMLReaderFactory.createXMLReader(vendorParserClass);

        // Register content handler

        // Register error handler

        // Parse
        InputSource inputSource = 
            new InputSource(xmlURI);
        reader.parse(inputSource);
    }

Compile these changes and you are ready to execute the parsing example. You should specify the path to your file as the first argument to the program:

c:\javaxml2\build>java javaxml2.SAXTreeViewer ..\Ch03\xml\contents.xml

WARNING: Supplying an XML URI can be a rather strange task. In versions of Xerces before 1.1, a normal filename could be supplied (for example, on Windows, ..\xml\contents.xml). However, this behavior changed in Xerces 1.1 and 1.2, and the URI had to be in this form: file:///c:/javaxml2/xml/contents.xml. However, in the latest versions of Xerces (from 1.3 up, as well as 2.0), this behavior has moved back to accepting normal filenames. Be aware of these issues if you are using Xerces 1.1 through 1.2.

The rather boring output shown in Figure 3-1 may make you doubt that anything has happened. However, if you lean nice and close, you may hear your hard drive spin briefly (or you can just have faith in the bytecode). In fact, the XML document is parsed. However, no callbacks have been implemented to tell SAX to take action during the parsing; without these callbacks, a document is parsed quietly and without application intervention. Of course, we want to intervene in that process, so it's now time to look at creating some parser callback methods. A callback method is a method that is not directly invoked by you or your application code. Instead, as the parser begins to work, it calls these methods at certain events, without any intervention. In other words, instead of your code calling into the parser, the parser calls back to yours. That allows you to programmatically insert behavior into the parsing process. This intervention is the most important part of using SAX. Parser callbacks let you insert action into the program flow, and turn the rather boring, quiet parsing of an XML document into an application that can react to the data, elements, attributes, and structure of the document being parsed, as well as interact with other programs and clients along the way.

Figure 3-1. An uninteresting JTree

3.2.3. Using an InputSource

I mentioned earlier that I would touch on using a SAX InputSource again, albeit briefly. The advantage to using an InputSource instead of directly supplying a URI is simple: it can provide more information to the parser. An InputSource encapsulates information about a single object, the document to parse. In situations where a system identifier, public identifier, or stream may all be tied to one URI, using an InputSource for encapsulation can become very handy. The class has accessor and mutator methods for its system ID and public ID, a character encoding, a byte stream (java.io.InputStream), and a character stream (java.io.Reader). Passed as an argument to the parse( ) method, SAX also guarantees that the parser will never modify the InputSource. The original input to a parser is still available unchanged after its use by a parser or XML-aware application. In our example, it's important because the XML document uses a relative path to the DTD in it:

<!DOCTYPE Book SYSTEM "DTD/JavaXML.dtd">

By using an InputSource and wrapping the supplied XML URI, you have set the system ID of the document. This effectively sets up the path to the document for the parser and allows it to resolve all relative paths within that document, like the JavaXML.dtd file. If instead of setting this ID, you parsed an I/O stream, the DTD wouldn't be located (as it has no frame of reference); you could simulate this by changing the code in the buildTree( ) method as shown here:

        // Parse
        InputSource inputSource = 
            new InputSource(new java.io.FileInputStream(
                new java.io.File(xmlURI)));
        reader.parse(inputSource);

As a result, you would get the following exception when running the viewer:

C:\javaxml2\build>java javaxml2.SAXTreeViewer ..\ch03\xml\contents.xml
org.xml.sax.SAXParseException: File 
  "file:///C:/javaxml2/build/DTD/JavaXML.dtd" not found.

While this seems a little silly (wrapping a URI in a file and I/O stream), it's actually quite common to see people using I/O streams as input to parsers. Just be sure that you don't reference any other files in the XML and that you set a system ID for the XML stream (using the setSystemID( ) method on InputSource). So the above code sample could be "fixed" by changing it to the following:

        // Parse
        InputSource inputSource = 
            new InputSource(new java.io.FileInputStream(
                new java.io.File(xmlURI)));
        inputSource.setSystemID(xmlURI);
        reader.parse(inputSource);

Always set a system ID. Sorry for the excessive detail; now you can bore coworkers with your knowledge about SAX InputSources.


3. SAX		3.3. Content Handlers

3.2. SAX Readers

3.2.1. Instantiating a Reader

Example 3-1. The SAXTreeViewer skeleton

U-R-What?

3.2.2. Parsing the Document

Figure 3-1. An uninteresting JTree

3.2.3. Using an InputSource