Allegro CL version 11.0
This utility provides a validating parser for XML 1.0 and XML 1.1. The interface to the parser is based on the SAX (Simple API for XML) specification.
A SAX parser reads the input file and checks it for correctness. While it is parsing the file, it is making callbacks to user code that note what the parser is seeing. If the parser finds an error in the input file it signals an error.
There are two levels of correctness for an xml file: well formed, described in Well-formed XML documents and valid, described in Valid XML documents.
When the sax parser is invoked it creates an instance of the class sax-parser (or a subclass of sax-parser). This instance holds the data accumulated during the parse and is also used to select the method to call when callbacks are made. A user of the parser will usually subclass sax-parser and then write methods on this subclass for the callbacks they wish to handle. The sax-parser class has a set of default methods for the callbacks, so the user need only write those methods whose behavior they wish to change.
All symbols in this module are exported from the net.xml.sax package. The module is named sax. You load the sax module with a form like:
(require :sax)
See also dom.html, which describes Document Object Model support in Allegro CL.
XML (the Extensible Markup Language) is a language for writing structured documents. An XML document contains characters and elements. An element looks like
<name att1="value1" att2="value2"> body content here </name>
or
<name att1="value1" att2="value2"/>
The elements are used to assign a meaning to the text between the start and end tags of the elements. For example
<name>
<lastname>Smith</lastname>
<firstname>John</firstname>
</name>
The designers of XML intended to write a clear concise specification of a structured document language. While they did not achieve that very ambitious goal, XML has nevertheless become very popular, in large part because of the popularity of the world wide web and the HTML language on which it is written. XML is very similar to HTML.
There are two versions of XML (1.0 and 1.1) and they differ in the characters they permit inside documents. In XML 1.0 the XML designers decided what characters were permitted and declared that all other characters were forbidden. In XML 1.1 the XML designers decided which characters were forbidden and any characters not forbidden were permitted. Far more characters are permitted in XML 1.1 than XML 1.0. All XML 1.0 documents are XML 1.1 documents but not the other way around.
The way that the two versions of XML documents are distinguished is by what appears at the beginning of the document. An XML 1.1 document always begins
<?xml version="1.1"?>
and this form may also include encoding and standalone attributes.
An XML 1.0 document begins with
<?xml version="1.0"?>
or begins with no <?xml..?> form at all.
There are two popular models for parsing an XML document, DOM and SAX:
DOM: The parser reads the whole XML document and returns an object which represents the whole XML document. A program can query this object and the objects this object points to in order to find out what is in the XML document.
The advantage of DOM parsing is that the XML document is now in a form that's easily studied and manipulated by a program. It has the disadvantage that there is a limit to the size of the XML document you can parse this way, since the whole document represented by objects must fit in the address space of the program.
SAX: While the parser is reading the XML document it is calling back to user code to tell it what it's encountering. The callbacks occur immediately, before the parser has even determined that the XML document is completely error free.
The advantage of SAX parsing is that the user code can ignore what it does not care about and only keep the data it considers important. Thus it can handle huge XML documents. The disadvantage is that the callbacks occur before it's even known whether the document is correct XML. If the goal is to analyze the document then the sax user code will often end up building an ad-hoc DOM-like structure.
An XML document must be well-formed or technically it is not an XML document. There is no simple definition of well-formed (and readers are invited to read the XML specification at http://www.w3c.org for all the details). Basically though a well-formed document follows these rules:
There is one top-level element (and no top-level characters other than whitespace, except Misc in the Prolog). Here are some well-formed XML documents based on that criterion:
document 1:
<foo/>
document 2:
<foo></foo>
document 3:
<foo> hello <bar>
</bar> hello
<baz/> hello
</foo>
and some not well-formed ones:
document 4:
<foo/>
<bar/>
document 5:
<foo/>
hello
document 6:
no elements here.
Every start tag has an end tag. So this document is well-formed
<foo> </foo>
and this one isn't:
<foo>
Elements are nested correctly. This document is well-formed:
<foo> hello <bar/> <baz> hello </baz> </foo>
and this one isn't:
<foo> hello <bar> hello </foo> hello </bar>
The ACL Sax parser will signal an error if it detects that the document is not well-formed.
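The rules above can be tried directly from Lisp. The following is a minimal sketch, assuming the sax module is available and that the condition signaled for a malformed document is a subclass of error; well-formed-p is a hypothetical helper name, not part of the module.

```lisp
(require :sax)

;; WELL-FORMED-P is a hypothetical helper, not part of the sax module.
(defun well-formed-p (xml-string)
  "Return true if XML-STRING parses without the parser signaling an error."
  (handler-case
      (progn
        ;; The default sax-parser callbacks do nothing, so this call only
        ;; checks the input.  :warn nil keeps non-fatal warnings quiet.
        (net.xml.sax:sax-parse-string xml-string :warn nil)
        t)
    ;; Assumption: the parser's condition is a subclass of ERROR.
    (error () nil)))

;; Correctly nested elements (described above as well-formed).
(well-formed-p "<foo> hello <bar/> <baz> hello </baz> </foo>")
;; Incorrectly nested elements (described above as not well-formed).
(well-formed-p "<foo> hello <bar> hello </foo> hello </bar>")
;; Two top-level elements (document 4 above).
(well-formed-p "<foo/><bar/>")
```

Because the default class sax-parser is used, no user callbacks run; the helper only reports whether the parse completed.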
A well-formed document can also be valid. A valid document contains or references a DTD (document type definition) and obeys that DTD.
A DTD contains two things:
Declarations that constrain what can be used in the XML document. For example the DTD can specify that within the body of the 'name' element you will have in sequence a 'lastname' element and a 'firstname' element and nothing else.
Definitions of entities (these are text substitution macros). Even documents that are not intended to be valid will include a DTD just for the purpose of declaring entities.
The ACL Sax parser will test a document for validity only if the :validate argument is given as true (the :validate argument defaults to false). The Sax parser takes longer to parse if it must validate as well. Even if the parser is not validating it may detect problems in the document for which it would have signaled an error if it were validating. In this case the parser will issue a warning. You can suppress those warnings by passing nil to the :warn argument (the default for :warn is true).
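As an illustration of the :validate argument, here is a small sketch that checks a document against an internal DTD expressing the 'name' constraint mentioned above. It assumes that sax-parse-string accepts the same keyword arguments as sax-parse-file and that a validity violation is signaled as a subclass of error; *name-dtd* and valid-p are hypothetical names.

```lisp
(require :sax)

;; Hypothetical DTD: a name holds a lastname followed by a firstname.
(defparameter *name-dtd*
  "<!DOCTYPE name [
<!ELEMENT name (lastname, firstname)>
<!ELEMENT lastname (#PCDATA)>
<!ELEMENT firstname (#PCDATA)>
]>")

;; VALID-P is a hypothetical helper, not part of the sax module.
(defun valid-p (body)
  "Return true if BODY, prefixed with *NAME-DTD*, parses with :validate t."
  (handler-case
      (progn
        (net.xml.sax:sax-parse-string
         (concatenate 'string *name-dtd* body)
         :validate t
         :warn nil)
        t)
    ;; Assumption: validity errors are subclasses of ERROR.
    (error () nil)))

;; Elements in the order the DTD requires.
(valid-p "<name><lastname>Smith</lastname><firstname>John</firstname></name>")
;; Elements in the wrong order: well-formed, but not valid.
(valid-p "<name><firstname>John</firstname><lastname>Smith</lastname></name>")
```

Without :validate t the second document would still parse, since it is well-formed; the problem would at most be reported as a warning.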
The parser collects all the DTD information about the document and stores it in the parser object that's passed to all the callback functions. You can use the accessors shown below to retrieve information about the DTD.
There are two predefined classes that you can pass to the :class argument of the sax-parse functions.
The class sax-parser defines callback functions that do nothing, except for compute-external-address and compute-external-format, which do the work necessary to ensure that the parse will be able to handle external references.
In the example code shown below we'll assume that we've created our own subclass of sax-parser called my-sax-parser. We do this by evaluating:
(defclass my-sax-parser (sax-parser)
  ((private :initform nil :accessor private)))
The class test-sax-parser defines the callback methods to print the values of their arguments. This allows you to see how the sax parser would treat an xml document. See Testing the sax parser: the test-sax-parser class for more information on this class.
Generic Function, package: net.xml.sax
Arguments: (parser sax-parser)
User should define their own method on their subclass of sax-parser. start-document is called just before the parser begins parsing its input. This function can be used to initialize values in the instance of the parser object. The default method returns nil.
This callback is a good place to do initialization of the data structures you will be using during the parse, as we do with the following method (we assume private and make-private-object are elsewhere defined):
(defmethod start-document ((parser my-sax-parser))
(setf (private parser) (make-private-object)))
Generic Function, package: net.xml.sax
Arguments: (parser sax-parser)
User should define their own method on their subclass of sax-parser. end-document is called after the parse is complete. end-document will only be called if the document is well formed, and in the case where the parser was called with :validate t then end-document will only be called if the document is also valid. The default method returns nil.
(defmethod end-document ((parser my-sax-parser))
(finalize-parse (private parser)))
Generic Function, package: net.xml.sax
Arguments: (parser sax-parser) iri localname qname attrs
This method is called when a start element tag is seen. If the parser was called with :namespace t, then iri is the iri that denotes the namespace of the start element tag (this is specified by a namespace binding); localname is the part after the colon in the element tag (or the whole tag if it contains no colon); qname is what was actually seen as the tag (e.g. "rdf:foo"); and attrs is a list of ("attrname" . "value") pairs where attrname can contain colons (i.e. namespace processing has not been done on attribute names).
If, on the other hand, the parser was called with :namespace nil, then iri is nil; localname is the actual element tag (e.g. "rdf:foo"); qname is the same as localname; and attrs is a list of ("attrname" . "value") pairs where attrname can contain colons (i.e. namespace processing has not been done).
Given this xml source:
<foo xmlns="urn:defnamespace"
xmlns:pack="http://mydef.com/pack">
<bar/>
<pack:baz/>
</foo>
If the parser is called with :namespace t then during the parse three calls to start-element are made, with the arguments to the calls being:
iri="urn:defnamespace", localname="foo", qname="foo"
iri="urn:defnamespace", localname="bar", qname="bar"
iri="http://mydef.com/pack", localname="baz", qname="pack:baz"
If the parser is called with :namespace nil then again three calls are made to start-element, but this time the arguments are:
iri=nil, localname="foo", qname="foo"
iri=nil, localname="bar", qname="bar"
iri=nil, localname="pack:baz", qname="pack:baz"
The default method does nothing and returns nil.
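The two listings above can be reproduced with a small recording subclass. This is a sketch under the assumption that sax-parse-string accepts the same :class and :namespace arguments as sax-parse-file; tag-recorder, seen, record-tags and *sample* are hypothetical names introduced for illustration.

```lisp
(require :sax)

;; TAG-RECORDER is a hypothetical subclass that remembers each start tag.
(defclass tag-recorder (net.xml.sax:sax-parser)
  ((seen :initform '() :accessor seen)))

(defmethod net.xml.sax:start-element ((parser tag-recorder)
                                      iri localname qname attrs)
  (declare (ignore attrs))
  ;; Record the three naming arguments in the order they arrive.
  (push (list iri localname qname) (seen parser))
  nil)

(defun record-tags (xml &key (namespace t))
  "Parse XML and return a list of (iri localname qname), in document order."
  ;; A freshly created instance may be passed as the :class argument.
  (let ((parser (make-instance 'tag-recorder)))
    (net.xml.sax:sax-parse-string xml :class parser :namespace namespace)
    (reverse (seen parser))))

;; The xml source shown above.
(defparameter *sample*
  "<foo xmlns=\"urn:defnamespace\"
     xmlns:pack=\"http://mydef.com/pack\">
  <bar/>
  <pack:baz/>
</foo>")

(record-tags *sample* :namespace t)
(record-tags *sample* :namespace nil)
```

Each call should yield three entries corresponding to the three start-element calls listed above for that :namespace setting.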
Generic Function, package: net.xml.sax
Arguments: (parser sax-parser) iri localname qname
This method is called when an end element tag is seen. The values of iri, localname and qname depend on whether the parser was called with :namespace t or :namespace nil. See start-element for details.
The default method does nothing and returns nil.
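Pairing start-element with end-element lets user code track where it is in the document. The sketch below computes the maximum nesting depth of a document; depth-parser and max-element-depth are hypothetical names, and the code assumes that sax-parse-string accepts :class and :warn as sax-parse-file does.

```lisp
(require :sax)

;; DEPTH-PARSER is a hypothetical subclass that tracks element nesting.
(defclass depth-parser (net.xml.sax:sax-parser)
  ((depth :initform 0 :accessor depth)
   (max-depth :initform 0 :accessor max-depth)))

(defmethod net.xml.sax:start-document ((parser depth-parser))
  ;; Reset the counters before any element is seen.
  (setf (depth parser) 0
        (max-depth parser) 0)
  nil)

(defmethod net.xml.sax:start-element ((parser depth-parser)
                                      iri localname qname attrs)
  (declare (ignore iri localname qname attrs))
  ;; Entering an element: one level deeper.
  (setf (max-depth parser)
        (max (max-depth parser) (incf (depth parser))))
  nil)

(defmethod net.xml.sax:end-element ((parser depth-parser) iri localname qname)
  (declare (ignore iri localname qname))
  ;; Leaving an element: one level shallower.
  (decf (depth parser))
  nil)

(defun max-element-depth (xml)
  "Return the deepest element nesting level found in the XML string."
  (let ((parser (make-instance 'depth-parser)))
    (net.xml.sax:sax-parse-string xml :class parser :warn nil)
    (max-depth parser)))

(max-element-depth "<a><b><c/></b><d/></a>")
```

Only the three callbacks of interest are specialized; all other callbacks fall through to the do-nothing defaults on sax-parser.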
Generic Function, package: net.xml.sax
Arguments: (parser sax-parser) prefix iri
This method is called when the parser enters a context where the namespace prefix is mapped to the given iri. A prefix of "" means the default mapping. start-prefix-mapping is called before the start-element call for the element that defines the prefix mapping.
The default method does nothing and returns nil.
Generic Function, package: net.xml.sax
Arguments: (parser sax-parser) prefix
This method is called when the parser leaves a context where the prefix mapping applies.
The default method does nothing and returns nil.
Generic Function, package: net.xml.sax
Arguments: (parser sax-parser) target data
This method is called when <?name data?> is seen, with target being "name" and data being "data". If <?name data values?> is seen then target is "name" and data is "data values".
The default method does nothing and returns nil.
Generic Function, package: net.xml.sax
Arguments: (parser sax-parser) content start end ignorable
This method is called when text is seen between elements. Note: for a given string of characters between elements, one or more calls to content or content-character may be made.
If an application requires access to the entire content string as a single string, the application program must collect the fragments into a contiguous string. The parse-to-lxml function and the DOM module implement normalize options that ensure contiguous string content appears as a single Lisp string.
This is the most common error people make with this sax parser: assuming that all content between the start and end element tags will be passed in exactly one call to content or content-character. As noted above, the content may be delivered across more than one call to content and content-character.
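One way to collect the fragments is to write every piece into a string output stream and read the result once the parse is done. This is a minimal sketch, not the module's own mechanism; text-parser, buffer and document-text are hypothetical names, and the code assumes the content argument is a Lisp string, as its use with subseq in test-sax-parser suggests.

```lisp
(require :sax)

;; TEXT-PARSER is a hypothetical subclass that gathers all character data.
(defclass text-parser (net.xml.sax:sax-parser)
  ;; The initform is evaluated for each new instance, so every parser
  ;; gets its own stream.
  ((buffer :initform (make-string-output-stream) :reader buffer)))

(defmethod net.xml.sax:content ((parser text-parser)
                                content start end ignorable)
  (declare (ignore ignorable))
  ;; Only the region from START (inclusive) to END (exclusive) is valid.
  (write-string content (buffer parser) :start start :end end)
  nil)

(defmethod net.xml.sax:content-character ((parser text-parser)
                                          character ignorable)
  (declare (ignore ignorable))
  ;; Single characters arrive through this separate callback.
  (write-char character (buffer parser))
  nil)

(defun document-text (xml)
  "Return all character data in the XML string as one contiguous string."
  (let ((parser (make-instance 'text-parser)))
    (net.xml.sax:sax-parse-string xml :class parser :warn nil)
    (get-output-stream-string (buffer parser))))

(document-text "<p>Hello, <b>wide</b> world</p>")
```

Handling both content and content-character matters: the same run of text may be split between the two callbacks in any combination.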
content is a character array. start is the index of the first character with content. end is one past the index of the last character with content.
ignorable is true if content is whitespace inside an element not permitting character data. This can only happen when the parser is validating since it is only then that the parser knows from an element's specification whether that element's body can contain non-whitespace characters.
The default method does nothing and returns nil.
Generic Function, package: net.xml.sax
Arguments: (parser sax-parser) character ignorable
This method is called when a single character of text is seen between elements. character is that character.
ignorable is true if character is whitespace inside an element not permitting character data. This can only happen when the parser is validating since it is only then that the parser knows from an element's specification whether that element's body can contain non-whitespace characters.
The default method does nothing and returns nil.
Generic Function, package: net.xml.sax
Arguments: (parser sax-parser) string
This method is called when an XML comment (i.e. <!-- ..... -->) is seen.
The default method does nothing and returns nil.
Generic Function, package: net.xml.sax
Arguments: (parser sax-parser) system public current-filename
This method is called when the parser has to locate another file in order to continue parsing. It should return a filename to open next. It can return nil if it cannot compute a name.
system is nil or a string holding the value after SYSTEM in the xml source. public is nil or a string holding the value after PUBLIC in the xml source. current-filename is the filename of the file being parsed.
The default method does not handle non-file identifiers such as those beginning with "http:". It merges the pathname of system with the pathname of current-filename, if current-filename is non-nil, and otherwise returns the value of system. The default method signals an error if system is nil. Thus, the body of the default method looks like this:
(if* (null system)
   then (error "Can't compute external address with no system address"))
(if* current-filename
   then (namestring (merge-pathnames (pathname system)
                                     (pathname current-filename)))
   else system)
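Since the default method does not handle identifiers such as those beginning with "http:", a subclass can supply its own resolution. The following sketch maps every system identifier to a file of the same name in a local directory. local-dtd-parser, dtd-directory and last-path-component are hypothetical names, and the directory string is assumed to end with a slash.

```lisp
(require :sax)

;; LOCAL-DTD-PARSER is a hypothetical subclass that looks for external
;; files in one local directory, whatever the original location was.
(defclass local-dtd-parser (net.xml.sax:sax-parser)
  ((dtd-directory :initarg :dtd-directory :reader dtd-directory)))

(defun last-path-component (system)
  "Return the part of the string SYSTEM after its last slash."
  (subseq system (1+ (or (position #\/ system :from-end t) -1))))

(defmethod net.xml.sax:compute-external-address
    ((parser local-dtd-parser) system public current-filename)
  (declare (ignore public current-filename))
  (if system
      ;; Assumption: DTD-DIRECTORY is a string ending in "/".
      (concatenate 'string
                   (dtd-directory parser)
                   (last-path-component system))
      ;; Returning nil tells the parser that no name could be computed.
      nil))

;; The method can be exercised directly, outside of a parse:
(net.xml.sax:compute-external-address
 (make-instance 'local-dtd-parser :dtd-directory "/opt/dtds/")
 "http://example.org/dtd/book.dtd" nil nil)
```

To use the class in a parse, create an instance with the :dtd-directory initarg and pass that instance as the :class argument.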
Generic Function, package: net.xml.sax
Arguments: (parser sax-parser) encoding ef
Given an encoding, this method should return an external format or the name of an external-format. The default method does the following:
(find-external-format
 (if* (equalp encoding "shift_jis")
    then :shiftjis
  elseif (equalp encoding "euc-jp")
    then :euc
  elseif (equalp encoding "utf-16")
    then ef ; already must have the correct ef
    else encoding))
Function, package: net.xml.sax
Arguments: filename &key (namespace t) (external t) (validate nil) (class 'sax-parser) (warn t) comments show-xmlns
This function parses the file specified by filename. The keyword arguments are:
namespace: if true then treat element tags as namespace:localname.
external: if true then follow references to other files referenced by the file being parsed. This must be true if validate is true.
validate: make sure that the content matches its DTD. If validate is nil then some validation will still be done but problems will be reported as warnings and not errors. Even if validate is nil the parser will signal an error if the xml is not well formed.
class: the name of the class of parser to create. Specify 'test-sax-parser if you just want to experiment and see the parser in action. The value can be a symbol naming a class, a class object or an instance of the class sax-parser or a subclass of sax-parser. If an instance is passed then it must be a freshly created one that has never been passed to a sax parser function.
warn: if true (the default), the parser will emit warnings about things it finds unusual but not illegal.
comments: if true (the default is nil) then call (comment parser string) for comments seen in the xml.
show-xmlns: if namespace is true, then show-xmlns controls whether the xmlns and xmlns:xxx attributes are included in the list of attributes for an element in the start-element callback. A true value for show-xmlns means include the xmlns attributes.
If namespace is nil then the xmlns attributes are always included with the list of attributes (since in this case there is nothing special about them).
Function, package: net.xml.sax
Arguments:
This function is like sax-parse-file but parses the data from stream, which must be an open stream. stream is closed when sax-parse-stream returns.
Function, package: net.xml.sax
Arguments:
This function is like sax-parse-file but parses the data from the string argument, which should be a string.
Note that the string should not begin with a byte-order mark (BOM) character, and the XML form parsed should not contain a conflicting encoding declaration, as a string-input stream does not have an associated external format. Since the string contents consist of characters already, any transformation from octets or interpretation of octets as meta-data is assumed to have been done when the character content of the string was created.
Function, package: net.xml.sax
Arguments: parser flag-name
The parser flags are initially set from the values supplied (or defaulted) to the sax-parse-xxxx functions (sax-parse-file, sax-parse-stream, and sax-parse-string). You can use sax-parser-flag to read the current value of the flag. You can use (setf sax-parser-flag) to set certain flags.
Some flags should not be modified after the parse has begun.
parser is an instance of the sax-parser class (or a subclass of sax-parser). flag-name is one of the values from the table below. sax-parser-flag returns t or nil as the flag is or is not set. When you use setf with this function to modify a flag, specify a non-nil value to set a flag and nil to unset it.
The table below lists flags; writeable means that user code can change the value during the parse. Setting a flag denoted as not writeable will result in undefined behavior.
Flag name | Writeable | Meaning |
:namespace | no | obey the xml namespace rules |
:external | no | read external entities |
:validate | no | do validation |
:warn | yes | issue warnings for items that may signify problems in the xml but which aren't actual errors. |
:show-xmlns | yes | add xmlns attributes to start-element attribute lists |
:comments | yes | call the comment generic function when comments are seen. |
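The flags can be read and, where writeable, changed from user callbacks. The sketch below switches on the writeable :comments flag from start-document and reports all flag values after the parse. flag-demo-parser, comments-seen and flag-report are hypothetical names, and the example assumes that a flag set in start-document is not overwritten afterwards and that flags remain readable once the parse has returned; neither point is stated in this document.

```lisp
(require :sax)

;; FLAG-DEMO-PARSER is a hypothetical subclass that collects comments.
(defclass flag-demo-parser (net.xml.sax:sax-parser)
  ((comments-seen :initform '() :accessor comments-seen)))

(defmethod net.xml.sax:start-document ((parser flag-demo-parser))
  ;; :comments is listed as writeable, so user code may turn it on.
  ;; Assumption: a value set here stays in effect for the parse.
  (setf (net.xml.sax:sax-parser-flag parser :comments) t)
  nil)

(defmethod net.xml.sax:comment ((parser flag-demo-parser) string)
  (push string (comments-seen parser))
  nil)

(defun flag-report (parser)
  "Return an alist of (flag-name . value) for every flag in the table."
  (loop for flag in '(:namespace :external :validate
                      :warn :show-xmlns :comments)
        collect (cons flag (net.xml.sax:sax-parser-flag parser flag))))

(let ((parser (make-instance 'flag-demo-parser)))
  (net.xml.sax:sax-parse-string "<a><!-- note --><b/></a>" :class parser)
  (values (flag-report parser)
          (reverse (comments-seen parser))))
```

The flags marked as not writeable (:namespace, :external, :validate) are only read here, never set.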
The parser first parses the DTD and then the content of the file. The information found in the DTD is stored in the parser object where it is referenced by the parser during the parse.
An xml document need not have a DTD. However if you tell the parser to validate a document then the document must have a DTD.
When the first start-element callback is made the whole DTD has been parsed and the information is stored in the parser object.
After the parse completes the DTD information is still stored in the parser object.
The following accessors retrieve DTD information from the parser object.
Function, package: net.xml.sax
Arguments: parser
Returns a string naming the root element. Every xml file contains exactly one element at 'top level' and may contain other elements inside that root element.
Function, package: net.xml.sax
Arguments: parser
Returns a hash table where the key is the general entity name and the value is an entity object.
Function, package: net.xml.sax
Arguments: parser
Returns a hash table where the key is the parameter entity name and the value is an entity object.
Function, package: net.xml.sax
Arguments: parser
Returns a hash table where the key is the notation name and the value is a notation object.
Function, package: net.xml.sax
Arguments: parser
Returns a string naming the entity.
Function, package: net.xml.sax
Arguments: entity
Returns nil or a string holding the replacement text for the entity. If the entity is internal then this field will be a string.
Function, package: net.xml.sax
Arguments: entity
Returns nil for internal entities. For external entities this is a string describing the location of the entity's value (the string is often a location on the filesystem relative to the file that references it).
Function, package: net.xml.sax
Arguments: entity
Returns nil or a string. For certain external entities that have public identifiers, this is that public identifier.
Function, package: net.xml.sax
Arguments: entity
Returns nil or a string. If this is an external unparsed entity then this is the name of a notation that describes its format.
Function, package: net.xml.sax
Arguments: entity
Returns true if this entity was defined in the 'external subset' which is a term referring to files other than the main file being parsed.
Function, package: net.xml.sax
Arguments: notation
Returns a string naming the notation.
Function, package: net.xml.sax
Arguments: notation
Returns nil or a string naming the public identifier for this notation.
Function, package: net.xml.sax
Arguments: notation
Returns a string naming the location of a description of the notation.
Function, package: net.xml.sax
Arguments: attribute
Returns a string naming the attribute.
Function, package: net.xml.sax
Arguments: attribute
Returns the type of the attribute, which is one of:
:cdata
:id
:idref
:idrefs
:entity
:entities
:nmtoken
:nmtokens
(:notation "name" ...)
(:enum "name" ....)
Function, package: net.xml.sax
Arguments: attribute
The value returned is one of: :required, :implied, (:fixed value), (:value value).
Function, package: net.xml.sax
Arguments: attribute
Returns true if the attribute was declared in the external subset.
Function, package: net.xml.sax
Arguments: element
Returns a string naming the element.
Function, package: net.xml.sax
Arguments: element
Returns a list of attribute objects describing the attributes of this element.
Function, package: net.xml.sax
Arguments: element
A description of the specification of the body of the element. The format is:
spec := :empty
        :any
        cp
        (:mixed ["name" ...])
cp := (:cp cho/seq modifier)
cho/seq :=
        (:choice cp [cp ...])
        (:sequence cp [cp ...])
        "name"
modifier :=
        nil
        "*"
        "?"
        "+"
Function, package: net.xml.sax
Arguments: element
Returns true if the element was defined in the external subset.
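To make the element body specification format described above more concrete, here is a sketch in plain Common Lisp that turns such a spec back into DTD content-model text. It relies only on the grammar as printed in this document; content-model-string, cp-string and cho/seq-string are hypothetical helper names, and the sample specs are hand-written rather than parser output.

```lisp
(defun cho/seq-string (body)
  "Render the cho/seq part of a content particle."
  (cond ((stringp body) body)
        ((eq (first body) :choice)
         (format nil "(~{~a~^|~})" (mapcar #'cp-string (rest body))))
        ((eq (first body) :sequence)
         (format nil "(~{~a~^,~})" (mapcar #'cp-string (rest body))))
        (t (error "Unrecognized content particle body: ~s" body))))

(defun cp-string (cp)
  "Render one (:cp cho/seq modifier) form."
  (destructuring-bind (tag body modifier) cp
    (unless (eq tag :cp)
      (error "Not a content particle: ~s" cp))
    ;; The modifier is nil or one of the strings \"*\", \"?\", \"+\".
    (concatenate 'string (cho/seq-string body) (or modifier ""))))

(defun content-model-string (spec)
  "Render SPEC as the text of a DTD content model."
  (cond ((eq spec :empty) "EMPTY")
        ((eq spec :any) "ANY")
        ((and (consp spec) (eq (first spec) :mixed))
         (if (rest spec)
             (format nil "(#PCDATA~{|~a~})*" (rest spec))
             "(#PCDATA)"))
        (t (cp-string spec))))

;; A hand-written spec for: a lastname followed by a firstname.
(content-model-string
 '(:cp (:sequence (:cp "lastname" nil) (:cp "firstname" nil)) nil))
;; => "(lastname,firstname)"

;; Mixed content allowing b and i elements among the text.
(content-model-string '(:mixed "b" "i"))
;; => "(#PCDATA|b|i)*"
```

Walking the spec recursively like this is also a convenient starting point for code that checks or summarizes a DTD.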
If you wish to test the sax-parser, we have defined several example classes. The class test-sax-parser and its associated methods are already defined in the system (after the sax module is loaded). The class sax-count-parser, defined below in this section, is not defined in the sax module but the definition code can be copied from this document.
The examples in this section assume that the SAX module has been loaded and the relevant package (net.xml.sax) has been used. If you do not want to use the package, package-qualify the relevant symbols. The following forms load the module and use the package:
(require :sax)
(use-package :net.xml.sax)
Here are the definitions of the class test-sax-parser and the associated methods (again, these definitions are included in the sax module so they need not be defined again). The methods on test-sax-parser print the arguments to the callbacks.
;; This is the definition of this class:
(defclass test-sax-parser (sax-parser)
  ())

(defmethod start-document ((parser test-sax-parser))
  (format t "sax callback: Start Document~%"))

(defmethod end-document ((parser test-sax-parser))
  (format t "sax callback: End Document~%"))

(defmethod start-element ((parser test-sax-parser) iri localname qname attrs)
  (format t "sax callback: start element ~s (iri: ~s) (qname: ~s) attrs: ~s~%"
          localname iri qname attrs)
  nil)

(defmethod end-element ((parser test-sax-parser) iri localname qname)
  (format t "sax callback: end element ~s (iri: ~s) (qname: ~s)~%"
          localname iri qname)
  nil)

(defmethod start-prefix-mapping ((parser test-sax-parser) prefix iri)
  (format t "sax callback: start-prefix-mapping ~s -> ~s~%" prefix iri)
  nil)

(defmethod end-prefix-mapping ((parser test-sax-parser) prefix)
  (format t "sax callback: end-prefix-mapping ~s~%" prefix))

(defmethod processing-instruction ((parser test-sax-parser) target data)
  (format t "sax callback: processing-instruction target: ~s, data: ~s~%"
          target data)
  nil)

(defmethod content ((parser test-sax-parser) content start end ignorable)
  (format t "sax callback: ~:[~;ignorable~] content(~s,~s) ~s~%" ignorable
          start end
          (subseq content start end))
  nil)

(defmethod content-character ((parser test-sax-parser) character ignorable)
  (format t "sax callback: ~:[~;ignorable~] content-char ~s~%"
          ignorable character)
  nil)

(defmethod compute-external-format ((parser test-sax-parser) encoding ef)
  (let ((ans (call-next-method)))
    (format t "sax callback: compute-external-format of ~s is ~s (current is ~s)~%"
            encoding ans ef)
    ans))

(defmethod comment ((parser test-sax-parser) string)
  ;;
  ;; called when <!-- ..... --> is seen
  ;;
  (format t "sax callback: comment: ~s~%" string)
  nil)
This is an example of another useful sax-parser subclass. The sax-count-parser class maintains a count of the elements, attributes and characters in an xml file. This class is not defined when the sax parser is loaded but you can just copy the definition below and load it into Lisp if you wish to try it.
;; definition of a sax parser to count items
(defstruct counter
  (elements 0)
  (attributes 0)
  (characters 0))

(defclass sax-count-parser (sax-parser)
  ((counts :initform (make-counter)
           :reader counts)))

(defmethod start-element ((parser sax-count-parser) iri localname qname
                          attrs)
  (declare (ignore iri localname qname))
  (let ((counter (counts parser)))
    (incf (counter-elements counter))
    (let ((attlen (length attrs)))
      (if* (> attlen 0)
         then (incf (counter-attributes counter) attlen)))))

(defmethod content ((parser sax-count-parser) content start end ignorable)
  (declare (ignore content ignorable))
  (let ((counter (counts parser)))
    (incf (counter-characters counter) (- end start))))

(defmethod content-character ((parser sax-count-parser) char ignorable)
  (declare (ignore char ignorable))
  (let ((counter (counts parser)))
    (incf (counter-characters counter))))
LXML is a list representation of an XML parse tree. The notation was introduced initially with the PXML module (see pxml.html, but note the PXML module is deprecated and may be removed in a release later than 9.0), and is supported for compatibility with existing applications. It is also a convenient representation for moderately sized XML documents.
The representation is made up of lists of LXML tags containing LXML nodes. An LXML node is either a string or a list of an LXML tag followed by zero or more LXML nodes. An LXML tag is either a symbol or a list of a symbol followed by attribute/value pairs, where the attribute is a symbol and the value is a string. In brief:
LXML-node -> string | (LXML-tag [LXML-node] ... )
LXML-tag -> symbol | (symbol [attr-name attr-value] ... )
And more formally:
- An LXML node may be a string representing textual element content.
- An LXML node may be a list representing a named XML element.
  - The first element in the list represents the element tag.
    - If no attributes were present in the element tag,
      then the element tag is represented by a Lisp symbol;
      the symbol-name of the Lisp symbol is the local name of the tag;
      the XML namespace of the tag is represented by the Lisp
      home package of the symbol.
    - If attributes were present in the element tag,
      then the element tag is represented by a list where
      the first element is the tag (as above) and the remainder of
      the list is a Lisp property list where the property keys are
      Lisp symbols that represent the attribute names and the
      property values are strings that represent the attribute values.
  - The remainder of the list is a list of LXML nodes that
    represent the content of the XML tag.
- An LXML node may be a list of the form (:comment text-string)
  to represent a comment in the XML document.
- An LXML node may be a list of the form (:pi target data)
  to represent a processing instruction in the XML document.
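The structure described above can be walked with ordinary list operations. The following sketch, in plain Common Lisp, uses a hand-written LXML value rather than parser output; lxml-tag-symbol, lxml-attributes, lxml-text and *sample-lxml* are hypothetical names introduced for illustration.

```lisp
(defun lxml-tag-symbol (node)
  "Return the tag symbol of the list NODE, with or without attributes."
  (let ((tag (first node)))
    (if (consp tag) (first tag) tag)))

(defun lxml-attributes (node)
  "Return the attribute property list of the list NODE, or NIL."
  (let ((tag (first node)))
    (if (consp tag) (rest tag) nil)))

(defun lxml-text (node)
  "Concatenate all string content under NODE, skipping comments and
processing instructions."
  (cond ((stringp node) node)
        ((member (lxml-tag-symbol node) '(:comment :pi)) "")
        (t (apply #'concatenate 'string
                  (mapcar #'lxml-text (rest node))))))

;; A hand-written LXML value: one element with an attribute, one comment,
;; and one element whose text arrives as two fragments.
(defparameter *sample-lxml*
  '(name
    ((lastname id "1") "Smith")
    (:comment "given name follows")
    (firstname "Jo" "hn")))

(lxml-text *sample-lxml*)
;; => "SmithJohn"

(getf (lxml-attributes (second *sample-lxml*)) 'id)
;; => "1"
```

Note that lxml-text copes with content split across several strings, which is what an un-normalized parse may produce.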
Each distinct XML namespace is mapped to a Lisp package. An application may specify the namespace-to-package mapping in full, in part, or not at all. If there is no pre-specified Lisp package for some XML namespace, then the parser creates a new package with a name "pppnn" where "ppp" is a prefix specified by the user and "nn" is an integer that guarantees uniqueness. The default prefix is the symbol-name of :net.xml.namespace. (ending with a period).
The :sax module implements the lxml-parser sub-class of sax-parser. The methods on this class use the SAX parser to build an LXML data structure from the parsed XML input. (In earlier releases, it was possible to require a module named :sax-lxml, which would not be included by default in the :sax module. Now that module is always loaded when the :sax module is loaded and cannot be required separately.)
Class, package: net.xml.sax
A subclass of sax-parser. Slots include normalize, default-package, package-prefix, and skip-ignorable. The add-parser-package method is defined.
The initial value of the package slot is :keyword. The initial value of the normalize slot is nil.
generic function, package: net.xml.sax
Arguments: lxml-parser
Returns the value of the normalize slot of its argument, which must be an instance of lxml-parser.
If the normalize slot is nil, string element content may appear as a list of strings. The length of each fragment is determined by the implementation and may vary from one parse to the next.
If the normalize slot is non-nil, then if an element contains only string content, this content will appear as one contiguous string. This option will naturally require the parser to do more consing during the parse.
Here are the possible values of the normalize slot, with their behavior:
nil: do not combine strings, do not delete anything
:trim-simple: applies only to elements where the only content is strings. Combine adjacent string content into a single string, delete leading and trailing whitespace in the combined string.
:trim-complex: applies only to elements that contain other named XML elements. If character content appears between two sub-elements, then combine it into a single string and remove leading and trailing whitespace; if the resulting string is empty, then remove it entirely.
:trim-all: apply both :trim-simple and :trim-complex
:trim: same as :trim-all
t: only combine adjacent string content into a single string.
generic function, package: net.xml.sax
Arguments: parser iri package &rest prefixes
The default method, defined on (lxml-parser t t), adds a new iri-to-package mapping to the parser or adds a prefix to an existing mapping.
The iri argument may be a string or a net.uri:uri instance (see uri.html). The package argument may be a package or the name of a package. The prefixes may be symbols or strings. When the iri argument is a uri instance, it is converted to its string form for use during the parse.
Note that the Allegro CL implementation of uri instances may map many different uri instances to the same string. To avoid possible ambiguities, it is best to specify the iri argument as a string that will be used without any interpretation or change.
To pre-specify namespace-to-package mappings in a program, the application program must call add-parser-package in a start-document method for an application-specific sub-class of lxml-parser.
generic function, package: net.xml.sax
Arguments: lxml-parser
Returns the default package of the lxml-parser instance.
generic function, package: net.xml.sax
Arguments: lxml-parser
Returns the prefix string used to generate package names for packages that represent namespaces that were not specified with add-parser-package. The default value is :net.xml.namespace. (with a trailing period).
generic function, package: net.xml.sax
Arguments: lxml-parser
Returns whether ignorable text will be skipped for the lxml-parser instance. The default value is nil.
generic function, package: net.xml.sax
Arguments: lxml-parser
When a parse is complete, this accessor returns the resulting lxml data structure.
Class, package: net.xml.sax
A subclass of lxml-parser. The initial value of the package slot is the value of *package*. The initial value of the normalize slot is t.
parse-to-lxml
generic function, package: net.xml.sax
Arguments: string-or-stream &key external-callback content-only general-entities parameter-entities uri-to-package package class normalize comments warn
The arguments to this function are like the arguments to net.xml.parser:parse-xml (see pxml.html). The class and methods are included for compatibility with pxml.
The content-only, external-callback, general-entities, and parameter-entities arguments are ignored: silently in the case of content-only, with a warning for the others.
The package keyword argument specifies the Lisp package of XML names without a namespace qualifier. If the argument is omitted or nil, the initial value in the class is used.
The class keyword argument specifies the class of the parser. The choice of class can affect the default package and normalize behavior, and many other behaviors. The default is lxml-parser.
The class argument may be the name of a class, a class object, or an instance of a suitable class. If an instance is passed, it must be one that has never been used by the SAX or LXML parser.
The normalize keyword argument specifies the value of the normalize slot in the parser. Values other than nil or t must be specified in the call. It can be one of the following values:
nil
: do not combine strings, do not delete anything
:trim-simple
: applies only to elements where the only content is strings. Combine adjacent string content into a single string, and delete leading and trailing whitespace in the combined string.
:trim-complex
: applies only to elements that contain other named XML elements. If character content appears between two sub-elements, then combine it into a single string and remove leading and trailing whitespace; if the resulting string is empty, remove it entirely.
:trim-all
: apply both :trim-simple and :trim-complex
:trim
: same as :trim-all
t
: only combine adjacent string content into a single string.
Whitespace characters are defined by the parser-char-table in the parser instance. The various trim behaviors are not specified in the XML standard but are often useful when parsing to LXML.
The uri-to-package argument is a list of conses of the form (iri . package) where iri may be a string or a uri instance and package may be a package name or a package instance. This form is more general than that allowed by the parse-xml function.
The :comments argument may be nil or non-nil. When nil (the default), XML comments are discarded during the parse. When non-nil, XML comments are included in the LXML output as expressions of the form (:comment text-string).
The :warn argument is propagated to the sax-parse-* function called by parse-to-lxml.
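The keyword arguments described above can be combined in a single call. The following is a sketch only: the sample document, the namespace string, and the package my-ns are hypothetical, and it assumes that parse-to-lxml returns the LXML as its first value (as net.xml.parser:parse-xml does) and that the :sax and :sax-lxml modules provide the function.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sax)
  (require :sax-lxml))   ; assumption: this module provides parse-to-lxml

;; Hypothetical package for names in the sample namespace.
(defpackage :my-ns (:use))

;; Hypothetical input containing a comment and padded text content.
(defparameter *sample-xml*
  "<ex:name xmlns:ex=\"http://example.com/ns\">
  <!-- a person -->
  <ex:lastname> Smith </ex:lastname>
</ex:name>")

;; :uri-to-package takes conses of the form (iri . package).
;; :normalize :trim-all applies both :trim-simple and :trim-complex.
;; :comments t keeps comments as (:comment text-string) expressions.
(defparameter *sample-lxml*
  (net.xml.sax:parse-to-lxml
   *sample-xml*
   :uri-to-package (list (cons "http://example.com/ns"
                               (find-package :my-ns)))
   :normalize :trim-all
   :comments t))
```

Inspecting *sample-lxml* at the listener with different :normalize values is a quick way to see how each trim setting changes the string content.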
This function calls the SAX parser with the following flag values:
:namespace t
:show-xmlns t
:comments nil
:validate nil
:external as specified in argument
:warn SAX parser default
If it is necessary to modify the flag settings for a specific application, the following code can be used:
(defclass local-lxml (lxml-parser) ())

;; xxx and yyy are placeholders for the flag reference and its new value.
(defmethod start-document :before ((p local-lxml))
  (setf (sax-parser-flag xxx) yyy))

(parse-to-lxml what :class 'local-lxml ...)
Variable, package: net.xml.sax
The lxml-parser instance created in the most recent call to parse-to-lxml.
The :pxml-sax module implements a partial pxml API to the SAX parser. This module replaces the :pxml module. It requires the modules :sax and :sax-lxml. Symbols naming operators, variables, etc. in the module are in the :net.xml.parser package. Load this module with
(require :pxml-sax)
The operators in this module are pxml-version and parse-xml.
The :pxml-dual module allows an application to switch at run time between the base implementation of pxml and the partial SAX implementation. It requires the modules :pxml, :sax, and :sax-lxml. Symbols naming operators, variables, etc. in the module are in the :net.xml.parser package. Load this module with
(require :pxml-dual)
When the module is loaded, the initial setting is to use the SAX parser implementation.
We provide this module to allow mission-critical applications to test both parsers in the same run-time environment. You can switch between the base and the SAX parsers with pxml-version.
The operators in this module are pxml-version, parse-xml, with-base-pxml, and with-sax-pxml.
In this section, we list the operators and variables associated with the various PXML modules. In many cases, the operators behave differently depending on what module is loaded.
The PXML parser default behavior was to silently ignore external DTDs unless a function was specified for the external-callback argument. The SAX parser default behavior is to signal an error if an external DTD cannot be located. The built-in default function can only locate files in the local file system.
Existing applications that depend on the default external DTD behavior of the PXML parser may break when using the SAX parser through the PXML compatibility package. These applications will need to use the SAX parser more explicitly and specify a suitable compute-external-address method.
pxml-version
Function, package: net.xml.parser
Arguments:
In the :pxml-sax module, this function works as described in pxml.html: called with no arguments, this function returns a string naming the PXML version.
Arguments: &optional parser-type
In the :pxml-dual module, this function accepts an optional argument. Called with no arguments, it returns a string naming the PXML version. If parser-type is specified, it should be :sax, :base, or :query.
When parser-type is :sax, the SAX version of parse-xml is enabled. When parser-type is :base, the original version of parse-xml is enabled.
When parser-type is :query, this function returns :base or :sax depending on which version of parse-xml is enabled.
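A short sketch of switching implementations at run time with the :pxml-dual module loaded is shown here. It uses only the parser-type values listed above and assumes that pxml-version is exported from the net.xml.parser package.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :pxml-dual))

;; Ask which implementation of parse-xml is currently enabled.
(net.xml.parser:pxml-version :query)

;; Enable the original (base) implementation of parse-xml.
(net.xml.parser:pxml-version :base)

;; Switch back to the SAX implementation.
(net.xml.parser:pxml-version :sax)
```

Because this changes which implementation subsequent calls to parse-xml use, an application that only needs the other parser for a single call may prefer the with-base-pxml and with-sax-pxml macros.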
parse-xml
Generic Function, package: net.xml.sax
Arguments: input-source &key external-callback content-only general-entities parameter-entities uri-to-package
The arguments and behavior are fully described in pxml.html. The modules differ in whether the keyword arguments content-only, external-callback, general-entities, and parameter-entities have effect or are ignored. In the :pxml-sax module and (thus) in the :pxml-dual module when in :sax mode, those arguments are ignored (silently in the case of content-only, with a warning for the others). The implementation of parse-xml in the SAX mode cannot at this time support the use of those arguments, but is much faster than in base mode. All arguments are considered when regular PXML is loaded, or when the :pxml-dual module is loaded and is in :base mode.
When the SAX implementation of parse-xml is used, the uri-to-package argument may be a list of conses of the form (iri . package) where iri may be a string or a uri instance and package may be a package name or a package instance.
This form is more general than the form accepted by the base implementation of parse-xml. An application using the more general form will not be back-compatible with the base implementation of parse-xml.
with-base-pxml
Macro, package: net.xml.parser
Arguments: &body body
Defined in the :pxml-dual module only (see The PXML-DUAL Module). Within the body of this macro the implementation of parse-xml is dynamically bound to the base implementation. See also with-sax-pxml.
with-sax-pxml
Macro, package: net.xml.parser
Arguments: &body body
Defined in the :pxml-dual module only (see The PXML-DUAL Module). Within the body of this macro the implementation of parse-xml is dynamically bound to the SAX implementation. See also with-base-pxml.
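One way to use the two macros together is to parse the same input with each implementation and compare the results in a single run-time environment. The sketch below uses a hypothetical sample document and assumes that each macro returns the values of its body and that both operators are exported from net.xml.parser; whether the two results are equal for a given document is not guaranteed here.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :pxml-dual))

;; Hypothetical input document.
(defparameter *doc* "<name><lastname>Smith</lastname></name>")

;; Parse with the base implementation of parse-xml.
(defparameter *base-result*
  (net.xml.parser:with-base-pxml
    (net.xml.parser:parse-xml *doc*)))

;; Parse the same input with the SAX implementation.
(defparameter *sax-result*
  (net.xml.parser:with-sax-pxml
    (net.xml.parser:parse-xml *doc*)))

;; Compare the two results structurally.
(defparameter *results-match* (equal *base-result* *sax-result*))
```

Since the binding is dynamic, the implementation selected with pxml-version is unchanged once each macro body exits.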
Copyright (c) Franz Inc. Lafayette, CA., USA. All rights reserved.