Handling Lexical Events
You saw earlier that if you are writing text out as XML, you need to know whether you are in a CDATA section. If you are, then angle brackets (<) and ampersands (&) should be output unchanged. But if you're not in a CDATA section, they should be replaced by the predefined entities &lt; and &amp;. But how do you know if you're processing a CDATA section?
Then again, if you are filtering XML in some way, you would want to pass comments along. Normally the parser ignores comments. How can you get comments so that you can echo them?
Finally, there are the parsed entity definitions. If an XML-filtering app sees &myEntity; it needs to echo the same string--not the text that is inserted in its place. How do you go about doing that?
This section of the tutorial answers those questions. It shows you how to use org.xml.sax.ext.LexicalHandler to identify comments, CDATA sections, and references to parsed entities.
Comments, CDATA tags, and references to parsed entities constitute lexical information--that is, information that concerns the text of the XML itself, rather than the XML's information content. Most applications, of course, are concerned only with the content of an XML document. Such apps will not use the LexicalHandler API. But apps that output XML text will find it invaluable.
Note: Lexical event handling is an optional parser feature. Parser implementations are not required to support it. (The reference implementation does so.) This discussion assumes that the parser you are using does so, as well.
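Because support is optional, an application may want to check for it before relying on lexical events. The following is a minimal sketch of one way to do that, not part of the Echo program. It tries to register a handler under the standard lexical-handler property and treats a SAXNotRecognizedException or SAXNotSupportedException as "not supported". The class LexicalSupportCheck and its two helper methods are hypothetical names introduced here. The sketch also assumes a JDK that ships org.xml.sax.ext.DefaultHandler2 (Java 5 or later), which is used only as a convenient do-nothing LexicalHandler.

```java
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;
import org.xml.sax.ext.LexicalHandler;

// Hypothetical helper, not part of the Echo program.
class LexicalSupportCheck {
    // Property name defined by the SAX standard.
    static final String LEXICAL_HANDLER =
        "http://xml.org/sax/properties/lexical-handler";

    // Returns true if the reader accepts the handler, false if the
    // parser does not recognize or does not support the property.
    static boolean tryRegister(XMLReader reader, LexicalHandler handler) {
        try {
            reader.setProperty(LEXICAL_HANDLER, handler);
            return true;
        } catch (SAXNotRecognizedException | SAXNotSupportedException e) {
            return false;
        }
    }

    // Checks the parser that JAXP hands out by default.
    static boolean defaultParserSupportsLexicalEvents() {
        try {
            XMLReader reader = SAXParserFactory.newInstance()
                                               .newSAXParser()
                                               .getXMLReader();
            return tryRegister(reader, new DefaultHandler2());
        } catch (ParserConfigurationException | SAXException e) {
            return false; // no usable parser could be created
        }
    }

    public static void main(String[] args) {
        System.out.println("Lexical events supported: "
            + defaultParserSupportsLexicalEvents());
    }
}
```

An application that can run without lexical information could use the boolean result to fall back to plain content handling instead of failing.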
How the LexicalHandler Works
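To be informed when the SAX parser sees lexical information, you configure the XMLReader that underlies the parser with a LexicalHandler. The LexicalHandler interface defines these event-handling methods:
- comment(char[] ch, int start, int length): Passes comments to the application.
- startCDATA(), endCDATA(): Tells when a CDATA section is starting and ending, which tells your application what kind of characters to expect the next time characters() is called.
- startEntity(String name), endEntity(String name): Gives the name of a parsed entity as its processing starts and ends.
- startDTD(String name, String publicId, String systemId), endDTD(): Tells when a DTD is being processed, and identifies it.
Working with a LexicalHandler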
In the remainder of this section, you'll convert the Echo app into a lexical handler and play with its features.
Note: The code shown in this section is in Echo11.java. The output is shown in Echo11-09.txt. (The browsable version is Echo11-09.html.)
To start, add the code highlighted below to implement the LexicalHandler interface and add the appropriate methods.
    import org.xml.sax.*;
    import org.xml.sax.helpers.DefaultHandler;
    import org.xml.sax.ext.LexicalHandler;
    ...
    public class Echo extends DefaultHandler
        implements LexicalHandler
    {
        public static void main(String argv[])
        {
            ...
            // Use an instance of ourselves as the SAX event handler
            // (replaces: DefaultHandler handler = new Echo();)
            Echo handler = new Echo();
            ...
At this point, the Echo class extends one class and implements an additional interface. You changed the class of the handler variable accordingly, so you can use the same instance as either a DefaultHandler or a LexicalHandler, as appropriate.
Next, add the code highlighted below to get the XMLReader that the parser delegates to, and configure it to send lexical events to your lexical handler:
    public static void main(String argv[])
    {
        ...
        try {
            ...
            // Parse the input
            SAXParser saxParser = factory.newSAXParser();
            XMLReader xmlReader = saxParser.getXMLReader();
            xmlReader.setProperty(
                "http://xml.org/sax/properties/lexical-handler",
                handler
                );
            saxParser.parse( new File(argv[0]), handler);
        } catch (SAXParseException spe) {
            ...
Here, you configured the XMLReader using the setProperty() method defined in the XMLReader interface. The property name, defined as part of the SAX standard, is the URL, http://xml.org/sax/properties/lexical-handler.
Finally, add the code highlighted below to define the appropriate methods that implement the interface.
    public void warning(SAXParseException err)
    ...
    }

    public void comment(char[] ch, int start, int length)
    throws SAXException
    {
    }

    public void startCDATA()
    throws SAXException
    {
    }

    public void endCDATA()
    throws SAXException
    {
    }

    public void startEntity(String name)
    throws SAXException
    {
    }

    public void endEntity(String name)
    throws SAXException
    {
    }

    public void startDTD(
        String name, String publicId, String systemId)
    throws SAXException
    {
    }

    public void endDTD()
    throws SAXException
    {
    }

    private void echoText()
    ...
You have now turned the Echo class into a lexical handler. In the next section, you'll start experimenting with lexical events.
Echoing Comments
The next step is to do something with one of the new methods. Add the code highlighted below to echo comments in the XML file:
    public void comment(char[] ch, int start, int length)
    throws SAXException
    {
        String text = new String(ch, start, length);
        nl();
        emit("COMMENT: "+text);
    }
When you compile the Echo program and run it on your XML file, the result looks something like this:
    COMMENT: A SAMPLE set of slides
    COMMENT: FOR WALLY / WALLIES
    COMMENT: DTD for a simple "slide show".
    COMMENT: Defines the %inline; declaration
    COMMENT: ...
The line endings in the comments are passed as part of the comment string, once again normalized to newlines. You can also see that comments in the DTD are echoed along with comments from the file. (That can pose problems when you want to echo only comments that are in the data file. To get around that problem, you can use the startDTD and endDTD methods.)
Echoing Other Lexical Information
To finish up this section, you'll exercise the remaining LexicalHandler methods.
Note: The code shown in this section is in Echo12.java. The file it operates on is slideSample10.xml. (The browsable version is slideSample10-xml.html.) The results of processing are in Echo12-10.
Make the changes highlighted below to remove the comment echo (you don't need that any more) and echo the other events, along with any characters that have been accumulated when an event occurs:
    public void comment(char[] ch, int start, int length)
    throws SAXException
    {
        // Comment echo removed
    }

    public void startCDATA()
    throws SAXException
    {
        echoText();
        nl();
        emit("START CDATA SECTION");
    }

    public void endCDATA()
    throws SAXException
    {
        echoText();
        nl();
        emit("END CDATA SECTION");
    }

    public void startEntity(String name)
    throws SAXException
    {
        echoText();
        nl();
        emit("START ENTITY: "+name);
    }

    public void endEntity(String name)
    throws SAXException
    {
        echoText();
        nl();
        emit("END ENTITY: "+name);
    }

    public void startDTD(String name, String publicId, String systemId)
    throws SAXException
    {
        nl();
        emit("START DTD: "+name
            +" publicId=" + publicId
            +" systemId=" + systemId);
    }

    public void endDTD()
    throws SAXException
    {
        nl();
        emit("END DTD");
    }
Here is what you see when the DTD is processed:
    START DTD: slideshow
        publicId=null
        systemId=file:/..../samples/slideshow3.dtd
    START ENTITY: ...
    ...
    END DTD
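The START DTD and END DTD events also give you a way to separate DTD comments from comments in the data file, as suggested earlier. The following is a minimal, self-contained sketch of that idea rather than a change to the Echo program: a flag set in startDTD() and cleared in endDTD() decides whether comment() keeps a comment. The class DocumentCommentsOnly, its collect() helper, and the sample XML string are all hypothetical, and the sketch assumes a JDK that provides org.xml.sax.ext.DefaultHandler2 as a base class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler that keeps only comments found outside the DTD.
class DocumentCommentsOnly extends DefaultHandler2 {
    private boolean inDTD = false;
    final List<String> comments = new ArrayList<>();

    @Override
    public void startDTD(String name, String publicId, String systemId) {
        inDTD = true;   // comments reported from here on belong to the DTD
    }

    @Override
    public void endDTD() {
        inDTD = false;  // back to the document itself
    }

    @Override
    public void comment(char[] ch, int start, int length) {
        if (!inDTD) {
            comments.add(new String(ch, start, length));
        }
    }

    // Parses the given XML text and returns the non-DTD comments.
    static List<String> collect(String xml) {
        try {
            DocumentCommentsOnly handler = new DocumentCommentsOnly();
            XMLReader reader = SAXParserFactory.newInstance()
                                               .newSAXParser()
                                               .getXMLReader();
            reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", handler);
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.comments;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<!DOCTYPE doc [<!-- DTD comment --><!ELEMENT doc (#PCDATA)>]>"
            + "<doc><!-- document comment -->text</doc>";
        System.out.println(collect(xml));
    }
}
```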
Note: To see events that occur while the DTD is being processed, use org.xml.sax.ext.DeclHandler.
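As a rough illustration of that note, the sketch below registers a DeclHandler under the standard SAX property http://xml.org/sax/properties/declaration-handler and records the element and internal entity declarations it is told about. It is an assumption-laden example, not code from the tutorial: the class DeclEcho, its collect() helper, and the sample document are hypothetical, and DefaultHandler2 is used as the base class because it already implements DeclHandler with empty methods.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler that records DTD declaration events.
class DeclEcho extends DefaultHandler2 {
    final List<String> events = new ArrayList<>();

    @Override
    public void elementDecl(String name, String model) {
        events.add("ELEMENT " + name + " " + model);
    }

    @Override
    public void internalEntityDecl(String name, String value) {
        events.add("ENTITY " + name + " = " + value);
    }

    // Parses the given XML text and returns the recorded declarations.
    static List<String> collect(String xml) {
        try {
            DeclEcho handler = new DeclEcho();
            XMLReader reader = SAXParserFactory.newInstance()
                                               .newSAXParser()
                                               .getXMLReader();
            reader.setProperty(
                "http://xml.org/sax/properties/declaration-handler", handler);
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<!DOCTYPE doc [<!ELEMENT doc (#PCDATA)>"
            + "<!ENTITY products \"WonderWidgets\">]><doc/>";
        for (String event : collect(xml)) {
            System.out.println(event);
        }
    }
}
```

Like lexical handling, declaration handling is an optional feature, so setProperty() may be rejected by parsers that do not support it.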
Here is some of the additional output you see when the internally defined products entity is processed with the latest version of the program:
    START ENTITY: products
    CHARS: WonderWidgets
    END ENTITY: products
And here is the additional output you see as a result of processing the external copyright entity:
    START ENTITY: copyright
    CHARS: This is the standard copyright message that our lawyers make us put everywhere so we don't have to shell out a million bucks every time someone spills hot coffee in their lap...
    END ENTITY: copyright
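These start and end events are what let a filter echo the reference itself (for example, &products;) instead of the replacement text. The following sketch shows one possible approach, under stated assumptions: it writes "&name;" when a general entity starts and suppresses characters() until that entity ends. The class EntityRefEcho and its echo() helper are hypothetical, the sketch only echoes character data (not tags), and it does not escape < or & in ordinary text. It skips names that begin with "%" (parameter entities) and the pseudo-entity "[dtd]" (the external DTD subset), which SAX also reports through startEntity().

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler that echoes entity references unexpanded.
class EntityRefEcho extends DefaultHandler2 {
    private final StringBuilder out = new StringBuilder();
    private int entityDepth = 0; // > 0 while inside a general entity

    private static boolean isDtdLevel(String name) {
        return name.startsWith("%") || name.equals("[dtd]");
    }

    @Override
    public void startEntity(String name) {
        if (isDtdLevel(name)) return;
        if (entityDepth == 0) {
            out.append('&').append(name).append(';');
        }
        entityDepth++; // nested entities stay hidden inside the outer one
    }

    @Override
    public void endEntity(String name) {
        if (isDtdLevel(name)) return;
        entityDepth--;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (entityDepth == 0) {
            out.append(ch, start, length); // text outside any entity
        }
    }

    // Parses the given XML text and returns its character data with
    // general entity references left unexpanded.
    static String echo(String xml) {
        try {
            EntityRefEcho handler = new EntityRefEcho();
            XMLReader reader = SAXParserFactory.newInstance()
                                               .newSAXParser()
                                               .getXMLReader();
            reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", handler);
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<!DOCTYPE doc [<!ENTITY products \"WonderWidgets\">]>"
            + "<doc>Buy &products; now</doc>";
        System.out.println(echo(xml));
    }
}
```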
Finally, you get output that shows when the CDATA section was processed:
    START CDATA SECTION
    CHARS: Diagram:
    frobmorten <--------------fuznaten
        |          <3>            ^
        |   <1>                   |   <1> = fozzle
        V                         |   <2> = framboze
      staten----------------------+   <3> = frenzle
                   <2>
    END CDATA SECTION
In summary, the LexicalHandler gives you the event notifications you need to produce an accurate reflection of the original XML text.
Note: To accurately echo the input, you would modify the characters() method to echo the text it sees in the appropriate fashion, depending on whether or not the program was in CDATA mode.
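One way that modification might look is sketched below. It is not the tutorial's Echo code: the class CDataAwareEcho and its echoText() helper are hypothetical, and the sketch assumes a JDK that provides org.xml.sax.ext.DefaultHandler2. A flag set by startCDATA() and cleared by endCDATA() tells characters() whether to copy text unchanged or to replace < and & with the predefined entities.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical handler that re-creates character data as XML text.
class CDataAwareEcho extends DefaultHandler2 {
    private final StringBuilder out = new StringBuilder();
    private boolean inCDATA = false;

    @Override
    public void startCDATA() {
        inCDATA = true;
        out.append("<![CDATA[");
    }

    @Override
    public void endCDATA() {
        out.append("]]>");
        inCDATA = false;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            char c = ch[i];
            if (inCDATA) {
                out.append(c);          // CDATA text is output unchanged
            } else if (c == '<') {
                out.append("&lt;");     // escape markup characters
            } else if (c == '&') {
                out.append("&amp;");
            } else {
                out.append(c);
            }
        }
    }

    // Parses the given XML text and returns its character data,
    // written back out the way it would need to appear in XML.
    static String echoText(String xml) {
        try {
            CDataAwareEcho handler = new CDataAwareEcho();
            XMLReader reader = SAXParserFactory.newInstance()
                                               .newSAXParser()
                                               .getXMLReader();
            reader.setProperty(
                "http://xml.org/sax/properties/lexical-handler", handler);
            reader.setContentHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
            return handler.out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse sample", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(
            echoText("<doc>a &lt; b <![CDATA[<1> & <2>]]></doc>"));
    }
}
```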
This tutorial contains information on the 1.0 version of the Java Web Services Developer Pack.
All of the material in The Java Web Services Tutorial is copyright-protected and may not be published in other works without express written permission from Sun Microsystems.