Allegro CL version 10.1
Unrevised from 10.0 to 10.1.

A Sax XML Parser for Allegro Common Lisp

This document contains the following sections:

1.0 Sax parser introduction
   1.1 XML introduction
   1.2 XML versions
   1.3 Parsing XML documents
   1.4 Well-formed XML documents
   1.5 Valid XML documents
2.0 The sax API in Allegro CL
3.0 Testing the sax parser: the test-sax-parser class
4.0 LXML
   4.1 What is LXML?
   4.2 The SAX-LXML Module
   4.3 The PXML-SAX Module
   4.4 The PXML-DUAL Module
   4.5 PXML reference
5.0 Index


1.0 Sax parser introduction

This document has been revised since its first release. A patch made available around June 7, 2004, updates the sax module to conform to the description in this document.

This utility provides a validating parser for XML 1.0 and XML 1.1. The interface to the parser is based on the SAX (Simple API for XML) specification.

A SAX parser reads the input file and checks it for correctness. While it is parsing the file, it makes callbacks to user code that report what the parser is seeing. If the parser finds an error in the input file, it signals an error.

There are two levels of correctness for an XML file: well-formed, described in Section 1.4 Well-formed XML documents, and valid, described in Section 1.5 Valid XML documents.

When the sax parser is invoked it creates an instance of the class sax-parser (or a subclass of sax-parser). This instance holds the data accumulated during the parse and is also used to select the method to call when a callback is made. A user of the parser will usually subclass sax-parser and then write methods on this subclass for the callbacks they wish to handle. The sax-parser class has a default method for every callback, so the user need only write the methods whose behavior they wish to change.

All symbols in this module are exported from the net.xml.sax package. The module is named sax. You load the sax module with a form like:

(require :sax)

See also dom.htm, which describes Document Object Model support in Allegro CL.


1.1 XML introduction

XML (the Extensible Markup Language) is a language for writing structured documents. An XML document contains characters and elements. An element looks like

<name att1="value1" att2="value2"> body content here </name>

or

<name att1="value1" att2="value2"/>

The elements are used to assign a meaning to the text between the start and end tags of the elements. For example

<name>
   <lastname>Smith</lastname>
   <firstname>John</firstname>
</name>

The designers of XML intended to write a clear, concise specification of a structured document language. While they did not achieve that very ambitious goal, XML has nevertheless become very popular, in large part because of the popularity of the world wide web and of HTML, the language in which web pages are written. XML is very similar to HTML.


1.2 XML versions

There are two versions of XML (1.0 and 1.1) and they differ in the characters they permit inside documents. In XML 1.0 the XML designers decided what characters were permitted and declared that all other characters were forbidden. In XML 1.1 the XML designers decided which characters were forbidden and any characters not forbidden were permitted. Far more characters are permitted in XML 1.1 than XML 1.0. All XML 1.0 documents are XML 1.1 documents but not the other way around.

The way that the two versions of XML documents are distinguished is by what appears at the beginning of the document. An XML 1.1 document always begins

<?xml version="1.1"?>

and this form may also include encoding and standalone attributes.

An XML 1.0 document begins with

<?xml version="1.0"?>

or begins with no <?xml..?> form at all.
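
As a rough illustration of the rule above, the following portable Common Lisp sketch reports which version a document declares by looking only at the start of a string. The name declared-xml-version is hypothetical and is not part of the sax module. The function does no real parsing: it assumes the declaration, if present, is the very first thing in the string and that no whitespace surrounds the = sign.

```lisp
(defun declared-xml-version (text)
  "Return the version named in a leading <?xml ...?> declaration of TEXT,
or \"1.0\" when TEXT does not start with such a declaration."
  (let ((decl-end (and (> (length text) 5)
                       (string= "<?xml" text :end2 5)
                       ;; Rule out targets such as <?xml-stylesheet ...?>.
                       (member (char text 5)
                               '(#\Space #\Tab #\Newline #\Return))
                       (search "?>" text))))
    (if (null decl-end)
        "1.0"
        (let* ((key (search "version=" text :end2 decl-end))
               ;; Index of the opening quote of the version value.
               (value-start (and key (+ key (length "version="))))
               (quote-char (and value-start
                                (< value-start decl-end)
                                (char text value-start)))
               ;; Index of the matching closing quote, if any.
               (value-end (and quote-char
                               (member quote-char '(#\" #\'))
                               (position quote-char text
                                         :start (1+ value-start)
                                         :end decl-end))))
          (if value-end
              (subseq text (1+ value-start) value-end)
              "1.0")))))
```

For example, (declared-xml-version "<foo/>") falls through to the no-declaration case and returns "1.0".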


1.3 Parsing XML documents

There are two popular models for parsing an XML document, DOM and SAX:

1. DOM (Document Object Model) parsing

The parser reads the whole XML document and returns an object which represents the whole XML document. A program can query this object and the objects this object points to in order to find out what is in the XML document.

The advantage of DOM parsing is the XML document is now in a form that's easily studied and manipulated by a program. It has the disadvantage that there is a limit to the size of the XML document you can parse this way since the whole document represented by objects must fit in the address space of the program.

2. SAX parsing

While the parser is reading the XML document it is calling back to user code to tell it what it's encountering. The callbacks occur immediately, before the parser has even determined that the XML document is completely error free.

The advantage of SAX parsing is that the user code can ignore what it does not care about and keep only the data it considers important. Thus it can handle huge XML documents. The disadvantage is that the callbacks occur before it is known whether the document is correct XML. If the goal is to analyze the document, the sax user code will often end up building an ad hoc DOM structure.


1.4 Well-formed XML documents

An XML document must be well-formed or technically it is not an XML document. There is no simple definition of well-formed (and readers are invited to read the XML specification at http://www.w3c.org for all the details). Basically, though, a well-formed document follows these rules:

  1. There is one top-level element (and no top-level characters other than whitespace, except Misc in the Prolog). Here are some well-formed XML documents based on that criterion:
      document 1:
      <foo/>
      
      document 2:
      <foo></foo> 
      
      document 3:
      <foo> hello <bar>
      </bar>  hello
      <baz/> hello
      </foo>
    

    and some not well-formed ones:

      document 4:
      <foo/>
      <bar/>
      
      document 5:
      <foo/>
      hello
      
      document 6:
      no elements here.
    
  2. Every start tag has an end tag. So this document is well-formed
      <foo> </foo>
    

    and this one isn't:

      <foo>
    
  3. Elements are nested correctly. This document is well-formed:
      <foo> hello <bar/> <baz> hello </baz> </foo>
    

    and this one isn't:

      <foo> hello <bar> hello </foo> hello </bar>
    

The ACL Sax parser will signal an error if it detects that the document is not well-formed.
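
Because a document that is not well-formed causes an error to be signaled, a simple predicate can be built around the parser. The sketch below is a minimal one: well-formed-p is a hypothetical name, and the code assumes that the condition signaled by the parser is a subtype of the standard error class.

```lisp
(require :sax)
(use-package :net.xml.sax)

(defun well-formed-p (xml-string)
  "Return T if the sax parser accepts XML-STRING, NIL if it signals an error."
  (handler-case
      (progn
        ;; :validate defaults to nil, so only well-formedness is checked.
        (sax-parse-string xml-string)
        t)
    ;; Assumption: parser errors are subtypes of ERROR.
    (error () nil)))
```

Applied to the examples under rule 3, (well-formed-p "<foo> hello <bar/> <baz> hello </baz> </foo>") should return t, while the incorrectly nested document should return nil.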


1.5 Valid XML documents

A well-formed document can also be valid. A valid document contains or references a DTD (document type definition) and obeys that DTD.

A DTD contains two things:

  1. Declarations that constrain what can be used in the XML document. For example the DTD can specify that within the body of the 'name' element you will have in sequence a 'lastname' element and a 'firstname' element and nothing else.
  2. Definitions of entities (these are text substitution macros). Even documents that are not intended to be valid will include a DTD just for the purpose of declaring entities.

The ACL Sax parser will test a document for validity only if the :validate argument is given as true (the :validate argument defaults to false). The Sax parser takes longer to parse if it must validate as well. Even if the parser is not validating, it may detect problems in the document for which it would have signaled an error if it were validating. In this case the parser will issue a warning. You can suppress those warnings by passing nil to the :warn argument (the default for :warn is true).
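
The following sketch shows the :validate and :warn arguments in use. The names valid-p, *valid-doc* and *invalid-doc* are hypothetical, the DTD simply encodes the 'name' constraint described above, and the code again assumes that validity errors are subtypes of the standard error class.

```lisp
(require :sax)
(use-package :net.xml.sax)

(defparameter *valid-doc*
  "<!DOCTYPE name [
  <!ELEMENT name (lastname, firstname)>
  <!ELEMENT lastname (#PCDATA)>
  <!ELEMENT firstname (#PCDATA)>
]>
<name><lastname>Smith</lastname><firstname>John</firstname></name>")

;; Same DTD, but the two child elements appear in the wrong order.
(defparameter *invalid-doc*
  "<!DOCTYPE name [
  <!ELEMENT name (lastname, firstname)>
  <!ELEMENT lastname (#PCDATA)>
  <!ELEMENT firstname (#PCDATA)>
]>
<name><firstname>John</firstname><lastname>Smith</lastname></name>")

(defun valid-p (xml-string)
  "Return T if XML-STRING parses without error when validation is on."
  (handler-case
      (progn
        (sax-parse-string xml-string :validate t :warn nil)
        t)
    ;; Assumption: validity errors are subtypes of ERROR.
    (error () nil)))
```

Both documents are well-formed, so only a validating parse is expected to tell them apart.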

The parser collects all the DTD information about the document and stores it in the parser object that's passed to all the callback functions. You can use the accessors shown below to retrieve information about the DTD.



2.0 The sax API in Allegro CL

There are two predefined classes that you can pass to the :class argument of the sax-parse functions.

The class sax-parser defines callback functions that do nothing except for compute-external-address and compute-external-format, which do the work necessary to ensure that the parse will be able to handle external references.

In the example code shown below we'll assume that we've created our own subclass of sax-parser called my-sax-parser. We do this by evaluating:

(defclass my-sax-parser (sax-parser)
    ((private :initform nil :accessor private)))

The class test-sax-parser defines the callback methods to print the values of their arguments. This allows you to see how the sax parser would treat an xml document. See Section 3.0 Testing the sax parser: the test-sax-parser class for more information on this class.


start-document

Generic Function

Package: net.xml.sax

Arguments: (parser sax-parser)

User should define their own method on their subclass of sax-parser. start-document is called just before the parser begins parsing its input. This function can be used to initialize values in the instance of the parser object. The default method returns nil.

This callback is a good place to do initialization of the data structures you will be using during the parse, as we do with the following method (we assume private and make-private-object are elsewhere defined):

(defmethod start-document ((parser my-sax-parser))
  (setf (private parser) (make-private-object)))


end-document

Generic Function

Package: net.xml.sax

Arguments: (parser sax-parser)

User should define their own method on their subclass of sax-parser. end-document is called after the parse is complete. end-document will only be called if the document is well formed, and in the case where the parser was called with :validate t then end-document will only be called if the document is also valid. The default method returns nil.

Example

(defmethod end-document ((parser my-sax-parser))
  (finalize-parse (private parser)))


start-element

Generic Function

Package: net.xml.sax

Arguments: (parser sax-parser) iri localname qname attrs

This method is called when a start element (like <foo> or <foo/>) is seen. If you sax-parse with :namespace t, then iri is the iri that denotes the namespace of the start element tag (this is specified by a namespace binding); localname is the part after the colon in the element tag; qname is what was actually seen as the tag (e.g. "rdf:foo"); and attrs is a list of conses of the form ("attrname" . "value") where attrname can contain colons (i.e. namespace processing has not been done on attribute names).

If, on the other hand, you sax-parse with :namespace nil, then iri is nil; localname is the actual element tag (e.g. "rdf:foo"); qname is the same as localname; and attrs is a list of conses of the form ("attrname" . "value") where attrname can contain colons (i.e. namespace processing has not been done on attribute names).

Given this xml source:

<foo xmlns="urn:defnamespace" 
     xmlns:pack="http://mydef.com/pack">
    <bar/>
    <pack:baz/>
</foo>

If the parser is called with :namespace t then during the parse three calls to start-element are made, with the arguments to the calls being:

iri="urn:defnamespace", localname="foo", qname="foo"

iri="urn:defnamespace", localname="bar", qname="bar"

iri="http://mydef.com/pack", localname="baz", qname="pack:baz"

If the parse is called with :namespace nil then again three calls are made to start-element, but this time the arguments are:

iri=nil, localname="foo", qname="foo"

iri=nil, localname="bar", qname="bar"

iri=nil, localname="pack:baz", qname="pack:baz"

The default method does nothing and returns nil.
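
The two sets of arguments listed above can be observed with a small parser subclass. This is only a sketch: element-log-parser, *seen-elements* and logged-elements are hypothetical names, and the results are recorded in a special variable because this document does not specify what the sax-parse functions return.

```lisp
(require :sax)
(use-package :net.xml.sax)

(defvar *seen-elements* '()
  "Hypothetical log of (iri localname qname) lists, most recent first.")

(defclass element-log-parser (sax-parser)
  ())

(defmethod start-element ((parser element-log-parser)
                          iri localname qname attrs)
  (declare (ignore attrs))
  (push (list iri localname qname) *seen-elements*)
  nil)

(defun logged-elements (xml-string &key (namespace t))
  "Parse XML-STRING and return the logged start-element arguments in order."
  (let ((*seen-elements* '()))
    (sax-parse-string xml-string
                      :class 'element-log-parser
                      :namespace namespace)
    (reverse *seen-elements*)))

(defparameter *namespace-example*
  "<foo xmlns='urn:defnamespace' xmlns:pack='http://mydef.com/pack'><bar/><pack:baz/></foo>")
```

Calling (logged-elements *namespace-example*) and (logged-elements *namespace-example* :namespace nil) should each return three entries, corresponding to the two lists of calls shown above.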



end-element

Generic Function

Package: net.xml.sax

Arguments: (parser sax-parser) iri localname qname

This method is called when an end element (</foo> or <foo/>) is seen. As with start-element, the values of the iri, localname, and qname arguments depend on whether sax-parse is called with :namespace t or :namespace nil. See start-element for details.

The default method does nothing and returns nil.



start-prefix-mapping

Generic Function

Package: net.xml.sax

Arguments: (parser sax-parser) prefix iri

This method is called when the parser enters a context where the namespace prefix is mapped to the given iri. A prefix of "" means the default mapping. start-prefix-mapping is called before the start-element call for the element that defines the prefix mapping.

The default method does nothing and returns nil.



end-prefix-mapping

Generic Function

Package: net.xml.sax

Arguments: (parser sax-parser) prefix

This method is called when the parser leaves a context where the prefix mapping applies.

The default method does nothing and returns nil.



processing-instruction

Generic Function

Package: net.xml.sax

Arguments: (parser sax-parser) target data

This method is called when <?name data?> is seen, with target being "name" and data being "data". If <?name data values?> is seen then target is "name" and data is "data values".

The default method does nothing and returns nil.



content

Generic Function

Package: net.xml.sax

Arguments: (parser sax-parser) content start end ignorable

This method is called when text is seen between elements. Note: for a given string of characters between elements, one or more calls to content or content-character may be made. For example, given the XML fragment <tag>abcdefghijkl</tag>, the content method may be called once with a string argument of "abcdefghijkl", or twice with string arguments "abcd" and then "efghijkl", and so on for every other way of splitting the string.

If an application requires access to the entire content string as a single string, the application program must collect the fragments into a contiguous string. The parse-to-lxml function and the DOM module implement normalize options that ensure contiguous string content appears as a single Lisp string.

This is the most common error people make with this sax parser: assuming that all content between the start and end element tags will be passed in exactly one call to content or content-character. As we said, the content may be delivered in more than one call to content and content-character.

content is a character array. start is the index of the first character with content. end is one past the index of the last character with content.

ignorable is true if content is whitespace inside an element not permitting character data. This can only happen when the parser is validating since it is only then that the parser knows from an element's specification whether that element's body can contain non-whitespace characters.

The default method does nothing and returns nil.



content-character

Generic Function

Package: net.xml.sax

Arguments: (parser sax-parser) character ignorable

This method is called when a single character of text is seen between elements. character is that character.

ignorable is true if character is whitespace inside an element not permitting character data. This can only happen when the parser is validating since it is only then that the parser knows from an element's specification whether that element's body can contain non-whitespace characters.

The default method does nothing and returns nil.
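
Since text may arrive through any mix of content and content-character calls, an application that wants one contiguous string has to accumulate the pieces itself. The sketch below shows one way to do that with a string output stream. text-collecting-parser and *collected-text* are hypothetical names, and for simplicity the parser gathers all the text in the document rather than the text of one element.

```lisp
(require :sax)
(use-package :net.xml.sax)

(defvar *collected-text* nil
  "Hypothetical variable that end-document sets to the accumulated text.")

(defclass text-collecting-parser (sax-parser)
  ((buffer :initform (make-string-output-stream) :reader buffer)))

(defmethod content ((parser text-collecting-parser)
                    content start end ignorable)
  (declare (ignore ignorable))
  ;; Only the characters from START up to (but not including) END are valid.
  (write-string content (buffer parser) :start start :end end)
  nil)

(defmethod content-character ((parser text-collecting-parser)
                              character ignorable)
  (declare (ignore ignorable))
  (write-char character (buffer parser))
  nil)

(defmethod end-document ((parser text-collecting-parser))
  ;; Join every fragment seen during the parse into one string.
  (setf *collected-text* (get-output-stream-string (buffer parser)))
  nil)
```

After evaluating (sax-parse-string "<tag>abcdefghijkl</tag>" :class 'text-collecting-parser), the value of *collected-text* should be the single string "abcdefghijkl", however the parser chose to split it.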



comment

Generic Function

Package: net.xml.sax

Arguments: (parser sax-parser) string

This method is called when an XML comment (i.e. <!-- ..... -->) is seen.

The default method does nothing and returns nil.
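
The signatures of the sax-parse functions show no default for the comments argument, so the sketch below passes :comments t explicitly in order to have the comment method called. comment-collecting-parser, *comments-seen* and document-comments are hypothetical names. The code copies each string as a precaution, on the assumption (not stated in this document) that the parser might reuse its buffers.

```lisp
(require :sax)
(use-package :net.xml.sax)

(defvar *comments-seen* '()
  "Hypothetical list of comment strings, most recent first.")

(defclass comment-collecting-parser (sax-parser)
  ())

(defmethod comment ((parser comment-collecting-parser) string)
  ;; Copy defensively; buffer reuse by the parser is an assumption.
  (push (copy-seq string) *comments-seen*)
  nil)

(defun document-comments (xml-string)
  "Parse XML-STRING and return the text of its comments in document order."
  (let ((*comments-seen* '()))
    (sax-parse-string xml-string
                      :class 'comment-collecting-parser
                      :comments t)
    (reverse *comments-seen*)))
```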



compute-external-address

Generic Function

Package: net.xml.sax

Arguments: (parser sax-parser) system public current-filename

This method is called when the parser has to locate another file in order to continue parsing. It should return a filename to open next. It can return nil if it cannot compute a name.

system is nil or a string holding the value after SYSTEM in the xml source. public is nil or a string holding the value after PUBLIC in the xml source. current-filename is the filename of the file being parsed.

The default method does not handle non-file identifiers such as those beginning with "http:". It merges the pathname of system with the pathname of current-filename, if current-filename is non-nil, and otherwise returns the value of system. The default method signals an error if system is nil. Thus, the body of the default method looks like this:

  (if* (null system)
     then (error "Can't compute external address with no system address"))
  
  (if* current-filename
     then (namestring (merge-pathnames (pathname system)
                                       (pathname current-filename)
                                       ))
     else system))
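
A subclass can specialize this method to change how external identifiers are resolved. The sketch below declines any system identifier beginning with "http:" by returning nil and defers to the default method otherwise. local-only-parser is a hypothetical class name, and what the parser does after receiving nil is not described in this document.

```lisp
(require :sax)
(use-package :net.xml.sax)

(defclass local-only-parser (sax-parser)
  ())

(defmethod compute-external-address ((parser local-only-parser)
                                     system public current-filename)
  (declare (ignorable public current-filename))
  (if (and (stringp system)
           (>= (length system) 5)
           (string-equal "http:" system :end2 5))
      ;; Report that no filename can be computed for web addresses.
      nil
      ;; Otherwise use the default merging behavior shown above.
      (call-next-method)))
```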


compute-external-format

Generic Function

Package: net.xml.sax

Arguments: (parser sax-parser) encoding ef

Given an encoding, this method should return an external format or the name of an external-format. The default method does the following:

  (find-external-format
   (if* (equalp encoding "shift_jis")
      then :shiftjis
    elseif (equalp encoding "euc-jp")
      then :euc
    elseif (equalp encoding "utf-16")
      then ef ; already must have the correct ef
      else encoding)))
 


sax-parse-file

Function

Package: net.xml.sax

Arguments: filename &key (namespace t) (external t) (validate nil) (class (quote sax-parser)) (warn t) comments show-xmlns

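This function parses the file specified by filename. The keyword arguments are:

namespace (default t): obey the xml namespace rules.
external (default t): read external entities.
validate (default nil): do validation.
class (default sax-parser): the class of the parser instance created for the parse; see the discussion of the :class argument at the start of this section.
warn (default t): issue warnings for items that may signify problems in the xml but which aren't actual errors.
comments (default nil): call the comment generic function when comments are seen.
show-xmlns (default nil): add xmlns attributes to start-element attribute lists.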



sax-parse-stream

Function

Package: net.xml.sax

Arguments: stream &key (namespace t) (external t) (validate nil) (class (quote sax-parser)) (warn t) comments show-xmlns

This function is like sax-parse-file but parses the data from stream, which must be an open stream. stream is closed when sax-parse-stream returns.



sax-parse-string

Function

Package: net.xml.sax

Arguments: string &key (namespace t) (external t) (validate nil) (class (quote sax-parser)) (warn t) comments show-xmlns

This function is like sax-parse-file but parses the data from the string argument, which should be a string.

Note that the string should not begin with a byte-order-marker (BOM) character and the XML form parsed should not contain a conflicting encoding declaration, as a string-input stream does not have an associated external format. Since the string contents consist of characters already, any transformation from octets or interpretation of octets as meta-data is assumed to have been done when the character content of the string was created.



sax-parser-flag

Function

Package: net.xml.sax

Arguments: parser flag-name

The parser flags are initially set from the values supplied (or defaulted) to the sax-parse-xxxx functions (sax-parse-file, sax-parse-stream, and sax-parse-string). You can use sax-parser-flag to read the current value of the flag. You can use (setf sax-parser-flag) to set certain flags. Some flags should not be modified after the parse has begun.

parser is an instance of the sax-parser class (or a subclass of sax-parser). flag-name is one of the values from the table below. sax-parser-flag returns t or nil as the flag is or is not set. When you use setf with this function to modify a flag, specify a non-nil value to set a flag and nil to unset it.


Flags

The table below lists flags; writeable means that user code can change the value during the parse. Setting a flag denoted as not writeable will result in undefined behavior.

Flag name     Writeable   Meaning
:namespace    no          obey the xml namespace rules
:external     no          read external entities
:validate     no          do validation
:warn         yes         issue warnings for items that may signify problems in the xml but which aren't actual errors
:show-xmlns   yes         add xmlns attributes to start-element attribute lists
:comments     yes         call the comment generic function when comments are seen
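
As a small illustration of reading and setting flags, the sketch below turns off warnings from within start-document and records two flag values. quiet-parser and *flag-report* are hypothetical names. Only :warn is modified, since it is listed as writeable, and the code assumes the flags have already been initialized when start-document is called.

```lisp
(require :sax)
(use-package :net.xml.sax)

(defvar *flag-report* nil
  "Hypothetical variable recording the flag values seen by start-document.")

(defclass quiet-parser (sax-parser)
  ())

(defmethod start-document ((parser quiet-parser))
  ;; :warn is listed as writeable, so user code may change it here.
  (setf (sax-parser-flag parser :warn) nil)
  ;; :validate is not writeable, so it is only read.
  (setf *flag-report*
        (list :validate (sax-parser-flag parser :validate)
              :warn (sax-parser-flag parser :warn)))
  nil)

(sax-parse-string "<a/>" :class 'quiet-parser)
```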

DTD information access

The parser first parses the DTD and then the content of the file. The information found in the DTD is stored in the parser object where it is referenced by the parser during the parse.

An xml document need not have a DTD. However if you tell the parser to validate a document then the document must have a DTD.

When the first start-element callback is made the whole DTD has been parsed and the information is stored in the parser object.

After the parse completes the DTD information is still stored in the parser object.

The following accessors retrieve DTD information from the parser object.


parser-root

Function

Package: net.xml.sax

Arguments: parser

Returns a string naming the root element. Every xml file contains exactly one element at 'top level' and may contain other elements inside that root element.



parser-general-entities

Function

Package: net.xml.sax

Arguments: parser

Returns a hash table where the key is the general entity name and the value is an entity object.



parser-parameter-entities

Function

Package: net.xml.sax

Arguments: parser

Returns a hash table where the key is the parameter entity name and the value is an entity object.



parser-notations

Function

Package: net.xml.sax

Arguments: parser

Returns a hash table where the key is the notation name and the value is a notation object.


Accessors for entity objects


entity-name

Function

Package: net.xml.sax

Arguments: entity

Returns a string naming the entity.



entity-replacment-text

Function

Package: net.xml.sax

Arguments: entity

Returns nil or a string holding the replacement text for the entity. If the entity is internal then this field will be a string.



entity-system

Function

Package: net.xml.sax

Arguments: entity

Returns nil for internal entities. For external entities this is a string describing the location of the entity's value (the string is often a location on the filesystem relative to the file that references it).



entity-public

Function

Package: net.xml.sax

Arguments: entity

Returns nil or a string. For certain external entities that have public identifiers, this is that public identifier.



entity-ndata

Function

Package: net.xml.sax

Arguments: entity

Returns nil or a string. If this is an external unparsed entity then this is the name of a notation that describes its format.



entity-ext-subset

Function

Package: net.xml.sax

Arguments: entity

Returns true if this entity was defined in the 'external subset' which is a term referring to files other than the main file being parsed.
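
Because the DTD information remains in the parser object, a callback such as end-document can inspect it. The sketch below prints one line per general entity using the accessors described above. entity-listing-parser is a hypothetical name, and the code makes no assumption about whether predefined entities also appear in the table.

```lisp
(require :sax)
(use-package :net.xml.sax)

(defclass entity-listing-parser (sax-parser)
  ())

(defmethod end-document ((parser entity-listing-parser))
  (let ((table (parser-general-entities parser)))
    ;; Guard in case no table exists for a document without a DTD.
    (when (hash-table-p table)
      (maphash (lambda (name entity)
                 (format t "entity ~a: system=~s public=~s ndata=~s~%"
                         name
                         (entity-system entity)
                         (entity-public entity)
                         (entity-ndata entity)))
               table)))
  nil)

(sax-parse-string
 "<!DOCTYPE a [<!ENTITY greeting 'hello'>]><a>&greeting;</a>"
 :class 'entity-listing-parser
 :warn nil)
```

Since greeting is an internal entity, its system, public and ndata fields are expected to print as nil.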



notation-name

Function

Package: net.xml.sax

Arguments: notation

Returns a string naming the notation.



notation-public

Function

Package: net.xml.sax

Arguments: notation

Returns nil or a string naming the public identifier for this notation.



notation-system

Function

Package: net.xml.sax

Arguments: notation

Returns a string naming the location of a description of the notation.



attribute-name

Function

Package: net.xml.sax

Arguments: attribute

Returns a string naming the attribute.



attribute-type

Function

Package: net.xml.sax

Arguments: attribute

Returns the type of the attribute, which is one of:



attribute-default

Function

Package: net.xml.sax

Arguments: attribute

The value returned is one of: :required, :implied, (:fixed value), (:value value).



attribute-ext-subset

Function

Package: net.xml.sax

Arguments: attribute

Returns true if the attribute was declared in the external subset.



element-name

Function

Package: net.xml.sax

Arguments: element

Returns a string naming the element.



element-attrs

Function

Package: net.xml.sax

Arguments: element

Returns a list of attribute objects describing the attributes of this element.



element-spec

Function

Package: net.xml.sax

Arguments: element

A description of the specification of the body of the element. The format is:

spec := :empty
        :any
        cp
        (:mixed ["name" ...])

cp := (:cp cho/seq modifier)

cho/seq :=
           (:choice cp [cp ...])
           (:sequence cp [cp ...])
           "name"

modifier :=
            nil
            "*"
            "?"
            "+"

element-ext-subset

Function

Package: net.xml.sax

Arguments: element

Returns true if the element was defined in the external subset.
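
The value returned by element-spec, whose format is given in the element-spec entry above, is an ordinary list structure and can be processed with portable Common Lisp. As a sketch, the functions below turn such a value back into DTD content-model syntax. The names spec->string and cp->string are hypothetical, and the code relies on one reading of the grammar: each line under a rule is an alternative, and a cp always has the three-element form (:cp cho/seq modifier).

```lisp
(defun cp->string (cp)
  "Render a content particle of the form (:cp cho/seq modifier)."
  (destructuring-bind (tag body modifier) cp
    (assert (eq tag :cp))
    (concatenate
     'string
     (cond ((stringp body) body)
           ((eq (first body) :choice)
            (format nil "(~{~a~^|~})" (mapcar #'cp->string (rest body))))
           ((eq (first body) :sequence)
            (format nil "(~{~a~^,~})" (mapcar #'cp->string (rest body))))
           (t (error "Unexpected cho/seq: ~s" body)))
     ;; A modifier of nil means the particle occurs exactly once.
     (or modifier ""))))

(defun spec->string (spec)
  "Render an element spec in DTD content-model syntax."
  (cond ((eq spec :empty) "EMPTY")
        ((eq spec :any) "ANY")
        ((and (consp spec) (eq (first spec) :mixed))
         (if (rest spec)
             (format nil "(#PCDATA|~{~a~^|~})*" (rest spec))
             "(#PCDATA)"))
        ((and (consp spec) (eq (first spec) :cp))
         (cp->string spec))
        (t (error "Unexpected spec: ~s" spec))))
```

For instance, the hand-written spec (:cp (:sequence (:cp "lastname" nil) (:cp "firstname" "?")) nil) is rendered as "(lastname,firstname?)".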




3.0 Testing the sax parser: the test-sax-parser class

If you wish to test the sax-parser, we have defined several example classes. The class test-sax-parser and its associated methods are already defined in the system (after the sax module is loaded). The class sax-count-parser, defined below in this section, is not defined in the sax module but the definition code can be copied from this document.

The examples in this section assume that the SAX module has been loaded and the relevant package (net.xml.sax) has been used. If you do not want to use the package, package-qualify the relevant symbols. The following forms load the module and use the package:

(require :sax)
(use-package :net.xml.sax)

Here are the definitions of the class test-sax-parser and its associated methods (again, these definitions are included in the sax module, so they need not be evaluated again). The methods on test-sax-parser print the arguments to the callbacks.



(defclass test-sax-parser (sax-parser)
  ())


(defmethod start-document ((parser test-sax-parser))
  (format t "sax callback: Start Document~%"))

(defmethod end-document ((parser test-sax-parser))
  (format t  "sax callback: End Document~%"))

(defmethod start-element ((parser test-sax-parser) iri localname qname attrs)
  (format t "sax callback: start element ~s (iri: ~s) (qname: ~s) attrs: ~s~%"
	  localname iri qname attrs)
  nil)

(defmethod end-element ((parser test-sax-parser) iri localname qname)
  (format t "sax callback: end element ~s (iri: ~s) (qname: ~s)~%"
	  localname iri qname)
  nil)

(defmethod start-prefix-mapping ((parser test-sax-parser) prefix iri)
  
  (format t "sax callback: start-prefix-mapping ~s -> ~s~%" prefix iri)
  nil
  )

(defmethod end-prefix-mapping ((parser test-sax-parser) prefix)
  
  (format t "sax callback: end-prefix-mapping ~s~%" prefix)
  )

(defmethod processing-instruction ((parser test-sax-parser) target data)
  (format t "sax callback: processing-instruction  target: ~s, data: ~s~%" 
	  target data)
  ;; 
  nil)


(defmethod content ((parser test-sax-parser) content start end ignorable)
  (format t "sax callback: ~:[~;ignorable~] content(~s,~s) ~s~%" ignorable
	  start end
	  (subseq content start end))
  nil)


(defmethod content-character ((parser test-sax-parser) character ignorable)
  (format t "sax callback: ~:[~;ignorable~] content-char ~s~%" 
	  ignorable character)
  nil)

(defmethod compute-external-format ((parser test-sax-parser) encoding ef)
  (let ((ans (call-next-method)))
    (format t "sax callback: compute-external-format of ~s is ~s (current is ~s)~%" 
	    encoding ans ef)
    ans))

(defmethod comment ((parser test-sax-parser) string)
  ;;
  ;; called when <!-- ..... --> is seen
  ;;
  (format t "sax callback: comment: ~s~%" string)
  nil)

Counting elements in a document

This is an example of another useful sax-parser subclass. The sax-count-parser class maintains a count of the elements, attributes and characters in an xml file. This class is not defined when the sax parser is loaded but you can just copy the definition below and load it into Lisp if you wish to try it.

; definition of a sax parser to count items
(defstruct counter
  (elements 0)
  (attributes 0)
  (characters 0))


(defclass sax-count-parser (sax-parser)
  ((counts :initform (make-counter)
	   :reader counts)))

(defmethod start-element ((parser sax-count-parser) iri localname qname
			  attrs)
  (declare (ignore iri localname qname))
  (let ((counter (counts parser)))
    (incf (counter-elements counter))
    (let ((attlen (length attrs)))
      
      (if* (> attlen 0)
	 then (incf (counter-attributes counter) attlen)))))

(defmethod content ((parser sax-count-parser) content start end ignorable)
  (declare (ignore content ignorable))
  (let ((counter (counts parser)))
    (incf (counter-characters counter) (- end start))))

(defmethod content-character ((parser sax-count-parser) char ignorable)
  (declare (ignore char ignorable))
  (let ((counter (counts parser)))
    (incf (counter-characters counter))))


4.0 LXML


4.1 What is LXML?

LXML is a list representation of an XML parse tree. The notation was introduced initially with the PXML module (see pxml.htm, but note the PXML module is deprecated and may be removed in a release later than 9.0), and is supported for compatibility with existing applications. It is also a convenient representation for moderately sized XML documents.

The representation is made up of lists of LXML tags containing LXML nodes. An LXML node is either a string or a list of an LXML tag followed by zero or more LXML nodes. An LXML tag is either a symbol or a list of a symbol followed by attribute/value pairs, where the attribute is a symbol and the value is a string. In brief:

  LXML-node -> string | (LXML-tag [LXML-node] ... )
  LXML-tag  -> symbol | (symbol [attr-name attr-value] ... )

And more formally:

  - An LXML node may be a string representing textual element content.
  - An LXML node may be a list representing a named XML element.
       - The first element in the list represents the element tag
             - If no attributes were present in the element tag,
               then the element tag is represented by a Lisp symbol;
               the symbol-name of the Lisp symbol is the local name of the tag;
               the XML namespace of the tag is represented by the Lisp
               home package of the symbol.
             - If attributes were present in the element tag,
               then the element tag is represented by a list where
               the first element is the tag (as above) and the remainder of
               the list is a lisp property list where the property keys are
               lisp symbols that represent the attribute names and the
               property values are strings that represent the attribute values.
       - The remainder of the list is a list of LXML nodes that
         represent the content of the XML tag.
  - An LXML node may be a list of the form (:comment text-string)
    to represent a comment in the XML document.
  - An LXML node may be a list of the form (:pi target data)
    to represent a processing instruction in the XML document.
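
Because LXML is made of ordinary lists, symbols and strings, it can be taken apart with portable Common Lisp. The sketch below defines three accessors over the structure described above. The names lxml-tag-symbol, lxml-attributes and lxml-text are hypothetical and are not part of the sax module. Note that lxml-text treats any list headed by :comment or :pi as a comment or processing instruction, which is ambiguous if element tags are themselves interned in the keyword package.

```lisp
(defun lxml-tag-symbol (node)
  "Return the tag symbol of an element node, with or without attributes."
  (let ((tag (first node)))
    (if (consp tag) (first tag) tag)))

(defun lxml-attributes (node)
  "Return the attribute property list of an element node, or NIL."
  (let ((tag (first node)))
    (if (consp tag) (rest tag) nil)))

(defun lxml-text (node)
  "Concatenate all string content found in NODE and its descendants."
  (cond ((stringp node) node)
        ;; Skip comments and processing instructions.
        ((and (consp node) (member (first node) '(:comment :pi))) "")
        ((consp node)
         (apply #'concatenate 'string (mapcar #'lxml-text (rest node))))
        (t "")))

;; A hand-written LXML node for illustration.
(defparameter *sample-node*
  '((name att1 "value1") (lastname "Smith") " " (firstname "John")))
```

With this sample, (getf (lxml-attributes *sample-node*) 'att1) retrieves "value1", and (lxml-text *sample-node*) joins the text of the two child elements and the space between them.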

Each distinct XML namespace is mapped to a Lisp package. An application may specify the namespace-to-package mapping in full, in part, or not at all. If there is no pre-specified Lisp package for some XML namespace, then the parser creates a new package with a name "pppnn" where "ppp" is a prefix specified by the user and "nn" is an integer that guarantees uniqueness. The default prefix is the symbol-name of :net.xml.namespace. (ending with a period).


4.2 The SAX-LXML Module

The :sax module implements the lxml-parser sub-class of sax-parser. The methods on this class use the SAX parser to build an LXML data structure from the parsed XML input. (In earlier releases, it was possible to require a module named :sax-lxml, which would not be included by default in the :sax module. Now that module is always loaded when the :sax module is loaded and cannot be required separately.)


lxml-parser

Class

Package: net.xml.sax

A subclass of sax-parser. Slots include normalize, default-package, package-prefix, and skip-ignorable. The add-parser-package method is defined.

The initial value of the package slot is :keyword. The initial value of the normalize slot is nil.



parser-normalize

Generic Function

Package: net.xml.sax

Arguments: lxml-parser

Returns the value of the normalize slot of its argument, which must be an instance of lxml-parser.

If the normalize slot is nil, string element content may appear as a list of strings. The length of each fragment is determined by the implementation and may vary from one parse to the next.

If the normalize slot is non-nil, then if an element contains only string content, this content will appear as one contiguous string. This option will naturally require the parser to do more consing during the parse.



add-parser-package

Generic Function

Package: net.xml.sax

Arguments: parser iri package &rest prefixes

The default method, defined on (lxml-parser t t), adds a new iri-to-package mapping to the parser or adds a prefix to an existing mapping.

The iri argument may be a string or a net.uri:uri instance (see uri.htm). The package argument may be a package or the name of a package. The prefixes may be symbols or strings. When the iri argument is a uri instance, it is converted to its string form for use during the parse.

Note that the Allegro CL implementation of uri instances may map many different uri instances to the same string. To avoid possible ambiguities, it is best to specify the iri argument as a string that will be used without any interpretation or change.

To pre-specify namespace-to-package mappings in a program, the application program must call add-parser-package in a start-document method for an application-specific sub-class of lxml-parser.
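As a minimal sketch of that pattern, the following defines an application-specific subclass and registers one mapping when the parse begins. The class name my-lxml-parser, the package my-app-ns, the namespace IRI, and the prefix "ex" are all hypothetical names chosen for illustration. The package is created beforehand on the assumption that this is the safest way to name an existing package.

```lisp
(require :sax)

;; Hypothetical package that will hold the symbols for one XML namespace.
(defpackage :my-app-ns (:use))

;; Hypothetical application-specific subclass of lxml-parser.
(defclass my-lxml-parser (net.xml.sax:lxml-parser) ())

;; Register the namespace-to-package mapping at the start of the parse.
;; The IRI is passed as a string so that it is used without any
;; interpretation or change.
(defmethod net.xml.sax:start-document :before ((parser my-lxml-parser))
  (net.xml.sax:add-parser-package parser
                                  "http://example.com/ns"  ; hypothetical IRI
                                  :my-app-ns
                                  "ex"))                   ; hypothetical prefix
```

The :before qualifier follows the same convention as the flag-setting example given for parse-to-lxml, and leaves the primary start-document method inherited from lxml-parser in place.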



parser-default-package

Generic Function

Package: net.xml.sax

Arguments: lxml-parser

Returns the default package of the lxml-parser instance.



parser-package-prefix

Generic Function

Package: net.xml.sax

Arguments: lxml-parser

Returns the prefix string used to generate package names for packages that represent namespaces that were not specified with add-parser-package. The default value is :net.xml.namespace. (with a trailing period).



parser-skip-ignorable

Generic Function

Package: net.xml.sax

Arguments: lxml-parser

Returns whether ignorable text will be skipped for the lxml-parser instance. The default value is nil.



parser-lxml

Generic Function

Package: net.xml.sax

Arguments: lxml-parser

When a parse is complete, this accessor returns the resulting lxml data structure.



pxml-parser

Class

Package: net.xml.sax

A subclass of lxml-parser. The initial value of the package slot is the value of *package*. The initial value of the normalize slot is t.



parse-to-lxml

Generic Function

Package: net.xml.sax

Arguments: string-or-stream &key external-callback content-only general-entities parameter-entities uri-to-package package class normalize comments warn

The arguments to this function are like the arguments to net.xml.parser:parse-xml (see pxml.htm). The class and methods are included for compatibility with pxml.

The content-only, external-callback, general-entities, and parameter-entities arguments are ignored: content-only silently, the others with a warning.

The package keyword argument specifies the Lisp package of XML names without a namespace qualifier. If the argument is omitted or nil, the initial value in the class is used.

The class keyword argument specifies the class of the parser. The choice of class can affect the default package and normalize behavior, and many other behaviors. The default is lxml-parser.

The class argument may be the name of a class, a class object, or an instance of a suitable class. If an instance is passed, it must be one that has never been used by the SAX or LXML parser.

The normalize keyword argument specifies the value of the normalize slot in the parser. Values other than nil or t must be specified in the call. It can be one of the following values:

Whitespace characters are defined by the parser-char-table in the parser instance. The various trim behaviors are not specified in the XML standard but are often useful when parsing to LXML.

The uri-to-package argument is a list of conses of the form (iri . package) where iri may be a string or a uri instance and package may be a package name or a package instance.

The :comments argument may be nil or non-nil. When nil (the default), XML comments are discarded during the parse. When non-nil, XML comments are included in the LXML output as expressions of the form (:comment text-string).

The :warn argument is propagated to the sax-parse-* function called by parse-to-lxml.

The (iri . package) form of the uri-to-package argument is more general than that allowed by the parse-xml function.
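A call using several of these keyword arguments might look like the sketch below. The XML text, the namespace IRI, and the package my-app-ns are hypothetical and serve only as illustration. The input is passed as a string containing the XML text, on the assumption that parse-to-lxml treats a string argument as parse-xml does.

```lisp
(require :sax)

;; Hypothetical package for the namespace used in the sample document.
(defpackage :my-app-ns (:use))

;; Hypothetical sample input.
(defparameter *sample-xml*
  "<?xml version=\"1.0\"?>
<ex:greeting xmlns:ex=\"http://example.com/ns\"><!-- a comment -->hello</ex:greeting>")

;; Parse to LXML:
;;   :normalize t     deliver string-only element content as one string
;;   :comments t      keep comments as (:comment text-string) expressions
;;   :uri-to-package  list of (iri . package) conses
(defparameter *sample-lxml*
  (net.xml.sax:parse-to-lxml
   *sample-xml*
   :normalize t
   :comments t
   :uri-to-package (list (cons "http://example.com/ns" :my-app-ns))))
```

The exact shape of the returned LXML depends on the class, package, and normalize settings, so no expected value is shown here.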

SAX parser flag values and how to change them

This function calls the SAX parser with the following flag values:

      :namespace  t 
      :show-xmlns t 
      :comments   nil 
      :validate   nil 
      :external   as specified in argument
      :warn       SAX parser default

If it is necessary to modify the flag settings for a specific application, the following code can be used:

  (defclass local-lxml (lxml-parser) ())

  (defmethod start-document :before ((p local-lxml))
     (setf (sax-parser-flag xxx) yyy)) 

  (parse-to-lxml what :class 'local-lxml ...)


*lxml-parser*

Variable

Package: net.xml.sax

The lxml-parser instance created in the most recent call to parse-to-lxml.



4.3 The PXML-SAX Module

The :pxml-sax module implements a partial pxml API to the SAX parser. This module replaces the :pxml module. It requires the modules :sax and :sax-lxml. Symbols naming operators, variables, etc. in the module are in the :net.xml.parser package. Load this module with

(require :pxml-sax)

The operators in this module are:


4.4 The PXML-DUAL Module

The :pxml-dual module allows an application to switch at run time between the base implementation of pxml and the partial SAX implementation. It requires the modules :pxml, :sax, and :sax-lxml. Symbols naming operators, variables, etc. in the module are in the :net.xml.parser package. Load this module with

(require :pxml-dual)

When the module is loaded, the initial setting is to use the SAX parser implementation.

We provide this module to allow mission-critical applications to test both parsers in the same run-time environment. You can switch between the base and the SAX parsers with pxml-version.

The operators in this module are:


4.5 PXML reference

In this section, we list the operators and variables associated with the various PXML modules. In many cases, the operators behave differently depending on what module is loaded.

Compatibility Note:

The PXML parser default behavior was to silently ignore external DTDs unless a function was specified for the external-callback argument. The SAX parser default behavior is to signal an error if an external DTD cannot be located. The built-in default function can only locate files in the local file system.

Existing applications that depend on the default external DTD behavior of the PXML parser may break when using the SAX parser through the PXML compatibility package. These applications will need to use the SAX parser more explicitly and specify a suitable compute-external-address method.


pxml-version

Function

Package: net.xml.parser

:pxml-sax module behavior

Arguments:

In the :pxml-sax module, this function works as described in pxml.htm: called with no arguments, this function returns a string naming the PXML version.

:pxml-dual module behavior

Arguments: &optional parser-type

Called with no arguments, this function returns a string naming the PXML version. If parser-type is specified, it should be either :sax, :base, or :query.

When parser-type is :sax, the SAX version of parse-xml is enabled. When parser-type is :base, the original version of parse-xml is enabled.

When parser-type is :query, this function returns :base or :sax depending on which version of parse-xml is enabled.
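The following sketch shows one way to switch implementations at run time with the :pxml-dual module loaded. It uses only the parser-type values described above. The return values of the switching calls themselves are not described here, so the sketch does not rely on them.

```lisp
(require :pxml-dual)

;; Ask which implementation of parse-xml is currently enabled.
;; According to Section 4.4, the initial setting after loading
;; :pxml-dual is the SAX implementation.
(net.xml.parser:pxml-version :query)

;; Enable the original (base) implementation of parse-xml.
(net.xml.parser:pxml-version :base)

;; Switch back to the SAX implementation.
(net.xml.parser:pxml-version :sax)

;; With no arguments, return the string naming the PXML version.
(net.xml.parser:pxml-version)
```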



parse-xml

Generic Function

Package: net.xml.parser

Arguments: input-source &key external-callback content-only general-entities parameter-entities uri-to-package

The arguments and behavior are fully described in pxml.htm. The difference among modules is whether the keyword arguments content-only, external-callback, general-entities, and parameter-entities have effect or are ignored. In the :pxml-sax module and (thus) in the :pxml-dual module when in :sax mode, those arguments are ignored (silently in the case of content-only, with a warning for the others). The implementation of parse-xml in the SAX mode cannot at this time support the use of those arguments, but is much faster than in base mode. All arguments are considered when regular PXML is loaded or the :pxml-dual module is loaded and is in :base mode.

When the SAX implementation of parse-xml is used, the uri-to-package argument may be a list of conses of the form (iri . package) where iri may be a string or a uri instance and package may be a package name or a package instance.

This form is more general than the form accepted by the base implementation of parse-xml. An application using the more general form will not be back-compatible with the base implementation of parse-xml.



with-base-pxml

Macro

Package: net.xml.parser

Arguments: &body body

Defined in the :pxml-dual module only (see Section 4.4 The PXML-DUAL Module). Within the body of this macro the implementation of parse-xml is dynamically bound to the base implementation. See also with-sax-pxml.



with-sax-pxml

Macro

Package: net.xml.parser

Arguments: &body body

Defined in the :pxml-dual module only (see Section 4.4 The PXML-DUAL Module). Within the body of this macro the implementation of parse-xml is dynamically bound to the SAX implementation. See also with-base-pxml.
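Together, with-base-pxml and with-sax-pxml make it possible to run both parsers on the same input in one image. The helper below is a hypothetical sketch of such a comparison; the function name parse-both-ways is not part of the module.

```lisp
(require :pxml-dual)

;; Hypothetical helper: parse the same XML text with both implementations
;; and return the two results as multiple values, base result first.
(defun parse-both-ways (xml-string)
  (values
   (net.xml.parser:with-base-pxml
     (net.xml.parser:parse-xml xml-string))
   (net.xml.parser:with-sax-pxml
     (net.xml.parser:parse-xml xml-string))))

;; Example use with a small hypothetical document:
;; (multiple-value-bind (base-result sax-result)
;;     (parse-both-ways "<a><b>text</b></a>")
;;   (equal base-result sax-result))
```

Because the bindings are dynamic, the setting made with pxml-version outside the macro bodies is not changed by this helper.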




5.0 Index


Copyright (c) 1998-2022, Franz Inc. Lafayette, CA., USA. All rights reserved.
This page was not revised from the 10.0 page.
Created 2019.8.20.
