Allegro CL version 10.1. Unrevised from 10.0 to 10.1.
1.0 Sax parser introduction
This document has been revised since its first release. A patch made available around June 7, 2004, updates the sax module to conform to the description in this document.
This utility provides a validating parser for XML 1.0 and XML 1.1. The interface to the parser is based on the SAX (Simple API for XML) specification.
A SAX parser reads the input file and checks it for correctness. While it is parsing the file, it is making callbacks to user code that note what the parser is seeing. If the parser finds an error in the input file it signals an error.
There are two levels of correctness for an xml file: well formed, described in Section 1.4 Well-formed XML documents and valid, described in Section 1.5 Valid XML documents.
When the sax parser is invoked it creates an instance of the class sax-parser (or a subclass of sax-parser). This instance holds the data accumulated during the parse and is also used to discriminate on the method to call when callbacks are made. A user of the parser will usually subclass sax-parser and then write methods on this subclass for the callbacks they wish to handle. The sax-parser class has a set of methods for the callbacks, so the user need only write those methods whose behavior they wish to change.
All symbols in this module are exported from the net.xml.sax package. The module is named sax. You load the sax module with a form like:
(require :sax)
See also dom.htm, which describes Document Object Model support in Allegro CL.
XML (the Extensible Markup Language) is a language for writing structured documents. An XML document contains characters and elements. An element looks like
<name att1="value1" att2="value2"> body content here </name>
or
<name att1="value1" att2="value2"/>
The elements are used to assign a meaning to the text between the start and end tags of the elements. For example
<name> <lastname>Smith</lastname> <firstname>John</firstname> </name>
The designers of XML intended to write a clear concise specification of a structured document language. While they did not achieve that very ambitious goal, XML has nevertheless become very popular, in large part because of the popularity of the world wide web and the HTML language on which it is written. XML is very similar to HTML.
There are two versions of XML (1.0 and 1.1) and they differ in the characters they permit inside documents. In XML 1.0 the XML designers decided what characters were permitted and declared that all other characters were forbidden. In XML 1.1 the XML designers decided which characters were forbidden and any characters not forbidden were permitted. Far more characters are permitted in XML 1.1 than XML 1.0. All XML 1.0 documents are XML 1.1 documents but not the other way around.
The way that the two versions of XML documents are distinguished is by what appears at the beginning of the document. An XML 1.1 document always begins
<?xml version="1.1"?>
and this form may also include encoding and standalone attributes.
An XML 1.0 document begins with
<?xml version="1.0"?>
or begins with no <?xml..?> form at all.
There are two popular models for parsing an XML document, DOM and SAX:
DOM: The parser reads the whole XML document and returns an object which represents the whole XML document. A program can query this object, and the objects this object points to, in order to find out what is in the XML document.
The advantage of DOM parsing is that the XML document is now in a form that's easily studied and manipulated by a program. The disadvantage is that there is a limit to the size of the XML document you can parse this way, since the whole document represented by objects must fit in the address space of the program.
SAX: While the parser is reading the XML document it is calling back to user code to tell it what it's encountering. The callbacks occur immediately, before the parser has even determined that the XML document is completely error free.
The advantage of SAX parsing is that the user code can ignore what it does not care about and keep only the data it considers important. Thus it can handle huge XML documents. The disadvantage is that the callbacks occur before it's even known whether the document is correct XML. If the goal is to analyze the whole document, the sax user code will often end up building an ad-hoc DOM structure.
An XML document must be well-formed or technically it is not an XML document. There is no simple definition of well-formed (and readers are invited to read the XML specification at http://www.w3c.org for all the details). Basically though a well-formed document follows these rules:
document 1: <foo/>
document 2: <foo></foo>
document 3: <foo> hello <bar> </bar> hello <baz/> hello </foo>
and some not well-formed ones:
document 4: <foo/> <bar/>
document 5: <foo/> hello
document 6: no elements here.
<foo> </foo>
and this one isn't:
<foo>
<foo> hello <bar/> <baz> hello </baz> </foo>
and this one isn't:
<foo> hello <bar> hello </foo> hello </bar>
The ACL Sax parser will signal an error if it detects that the document is not well-formed.
A well-formed document can also be valid. A valid document contains or references a DTD (document type definition) and obeys that DTD.
A DTD contains two things:
The ACL Sax parser will test a document for validity only if the :validate argument is given as true (the :validate argument defaults to false). The Sax parser takes longer to parse if it must validate as well. Even if the parser is not validating it may detect problems in the document for which it would have signaled an error if it were validating. In this case the parser will issue a warning. You can suppress those warnings by passing nil to the :warn argument (the default for :warn is true).
The parser collects all the DTD information about the document and stores it in the parser object that's passed to all the callback functions. You can use the accessors shown below to retrieve information about the DTD.
There are two predefined classes that you can pass to the :class argument of the sax-parse functions.
The class sax-parser defines callback functions that do nothing, except for compute-external-address and compute-external-format, which do the work necessary to ensure that the parse will be able to handle external references.
In the example code shown below we'll assume that we've created our own subclass of sax-parser called my-sax-parser. We do this by evaluating:
(defclass my-sax-parser (sax-parser)
  ((private :initform nil :accessor private)))
The class test-sax-parser defines the callback methods to print the values of their arguments. This allows you to see how the sax parser would treat an xml document. See Section 3.0 Testing the sax parser: the test-sax-parser class for more information on this class.
Arguments: (parser sax-parser)
Users should define their own method on their subclass of sax-parser. start-document is called just before the parser begins parsing its input. This function can be used to initialize values in the instance of the parser object. The default method returns nil.
This callback is a good place to do initialization of the data structures you will be using during the parse, as we do with the following method (we assume private and make-private-object are elsewhere defined):
(defmethod start-document ((parser my-sax-parser))
  (setf (private parser) (make-private-object)))
Arguments: (parser sax-parser)
Users should define their own method on their subclass of sax-parser. end-document is called after the parse is complete. end-document will only be called if the document is well formed and, when the parser was called with :validate t, only if the document is also valid. The default method returns nil.
(defmethod end-document ((parser my-sax-parser))
  (finalize-parse (private parser)))
Arguments: (parser sax-parser) iri localname qname attrs
start-element is called when a start element (like <foo> or <foo/>) is seen. If you call the parser with :namespace t, then iri is the iri that denotes the namespace of the start element tag (this is specified by a namespace binding); localname is the part after the colon in the element tag (or the whole tag if it has no prefix); qname is what was actually seen as the tag (e.g. "rdf:foo"); and attrs is a list of ("attrname" . "value") where attrname can contain colons (i.e. namespace processing has not been done on attribute names).
If, on the other hand, you call the parser with :namespace nil, then iri is nil; localname is the actual element tag (e.g. "rdf:foo"); qname is the same as localname; and attrs is a list of ("attrname" . "value") where attrname can contain colons (i.e. namespace processing has not been done).
Given this xml source:
<foo xmlns="urn:defnamespace" xmlns:pack="http://mydef.com/pack"> <bar/> <pack:baz/> </foo>
If the parser is called with :namespace t then during the parse three calls to start-element are made, with the arguments to the calls being:
iri="urn:defnamespace", localname="foo", qname="foo"
iri="urn:defnamespace", localname="bar", qname="bar"
iri="http://mydef.com/pack", localname="baz", qname="pack:baz"
If the parser is called with :namespace nil then again three calls are made to start-element, but this time the arguments are:
iri=nil, localname="foo", qname="foo"
iri=nil, localname="bar", qname="bar"
iri=nil, localname="pack:baz", qname="pack:baz"
The default method does nothing and returns nil.
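As a minimal sketch of a start-element method, the following subclass records the value of the href attribute of every element whose local name is "a". The names link-parser and links are hypothetical and chosen only for this illustration; the code relies only on the argument conventions described above, in particular that attrs is a list of ("attrname" . "value") conses. A freshly created instance is passed as the :class argument so that the collected data can be read back afterwards.

```lisp
;; Load the sax module and use its package, as described in the introduction.
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sax)
  (use-package :net.xml.sax))

;; LINK-PARSER and LINKS are hypothetical names used only in this sketch.
(defclass link-parser (sax-parser)
  ((links :initform nil :accessor links)))

(defmethod start-element ((parser link-parser) iri localname qname attrs)
  (declare (ignore iri qname))
  ;; With :namespace t (the default), localname has any prefix removed.
  (when (equal localname "a")
    ;; attrs is an alist keyed by strings, so ASSOC needs a string test.
    (let ((href (assoc "href" attrs :test #'equal)))
      (when href
        (push (cdr href) (links parser)))))
  nil)

;; Usage: parse a small document and list the links in document order.
(let ((parser (make-instance 'link-parser)))
  (sax-parse-string "<p><a href=\"one.htm\"/><a href=\"two.htm\"/></p>"
                    :class parser)
  (reverse (links parser)))
```

Only start-element is specialized here; every other callback falls through to the default sax-parser methods.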
Arguments: (parser sax-parser) iri localname qname
end-element is called when an end element (</foo> or <foo/>) is seen. As with start-element, the values of the iri, localname, and qname arguments depend on whether the parser is called with :namespace t or :namespace nil. See start-element for details.
The default method does nothing and returns nil.
Arguments: (parser sax-parser) prefix iri
start-prefix-mapping is called when the parser enters a context where the namespace prefix is mapped to the given iri. A prefix of "" means the default mapping. start-prefix-mapping is called before the start-element call for the element that defines the prefix mapping.
The default method does nothing and returns nil.
Arguments: (parser sax-parser) prefix
end-prefix-mapping is called when the parser leaves a context where the prefix mapping applies.
The default method does nothing and returns nil.
Arguments: (parser sax-parser) target data
processing-instruction is called when <?name data?> is seen, with target being "name" and data being "data". If <?name data values?> is seen then target is "name" and data is "data values".
The default method does nothing and returns nil.
Arguments: (parser sax-parser) content start end ignorable
content is called when text is seen between elements. Note: for a given string of characters between elements, one or more calls to content or content-character may be made. For example, given the XML fragment <tag>abcdefghijkl</tag>, the content method may be called once with a string argument of "abcdefghijkl", or twice with string arguments "abcd" and then "efghijkl", and so on for every other way of splitting the string.
If an application requires access to the entire content string as a single string, the application program must collect the fragments into a contiguous string. The parse-to-lxml function and the DOM module implement normalize options that ensure contiguous string content appears as a single Lisp string.
This is the most common error people make with this sax parser: assuming that all content between the start and end element tags will be passed in exactly one call to the content or content-character. As we said, the content may be provided in more than one call to content and content-character.
content is a character array. start is the index of the first character with content. end is one past the index of the last character with content.
ignorable is true if content is whitespace inside an element not permitting character data. This can only happen when the parser is validating since it is only then that the parser knows from an element's specification whether that element's body can contain non-whitespace characters.
The default method does nothing and returns nil.
Arguments: (parser sax-parser) character ignorable
content-character is called when a single character of text is seen between elements. character is that character.
ignorable is true if character is whitespace inside an element not permitting character data. This can only happen when the parser is validating since it is only then that the parser knows from an element's specification whether that element's body can contain non-whitespace characters.
The default method does nothing and returns nil.
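Because text may arrive in several content and content-character calls, an application that needs whole strings has to accumulate the fragments itself. The following is a minimal sketch of one way to do that, not the only one: it keeps a stack of string output streams, one per open element, and joins the fragments when the element ends. The names text-collecting-parser, buffers and texts are hypothetical. Only text that appears directly inside an element is attributed to that element.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sax)
  (use-package :net.xml.sax))

;; Hypothetical subclass for this sketch.
(defclass text-collecting-parser (sax-parser)
  ((buffers :initform nil :accessor buffers)  ; stack of string output streams
   (texts   :initform nil :accessor texts)))  ; (qname . text), last closed first

(defmethod start-element ((parser text-collecting-parser)
                          iri localname qname attrs)
  (declare (ignore iri localname qname attrs))
  ;; Open a fresh buffer for the element that just started.
  (push (make-string-output-stream) (buffers parser))
  nil)

(defmethod content ((parser text-collecting-parser) content start end ignorable)
  (declare (ignore ignorable))
  ;; Copy only the characters between START and END into the current buffer.
  (when (buffers parser)
    (write-string content (first (buffers parser)) :start start :end end))
  nil)

(defmethod content-character ((parser text-collecting-parser)
                              character ignorable)
  (declare (ignore ignorable))
  (when (buffers parser)
    (write-char character (first (buffers parser))))
  nil)

(defmethod end-element ((parser text-collecting-parser) iri localname qname)
  (declare (ignore iri localname))
  ;; Join all fragments seen for this element into one string.
  (let ((buffer (pop (buffers parser))))
    (push (cons qname (get-output-stream-string buffer)) (texts parser)))
  nil)

;; Usage: each element's direct text is returned as a single string.
(let ((parser (make-instance 'text-collecting-parser)))
  (sax-parse-string "<a>hello <b>big</b> world</a>" :class parser)
  (reverse (texts parser)))
```

Writing into a stream copies the characters immediately, so the sketch makes no assumption about whether the parser reuses the content array between calls.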
Arguments: (parser sax-parser) string
comment is called when an XML comment (i.e. <!-- ..... -->) is seen.
The default method does nothing and returns nil.
Arguments: (parser sax-parser) system public current-filename
compute-external-address is called when the parser has to locate another file in order to continue parsing. It should return a filename to open next. It can return nil if it cannot compute a name.
system is nil or a string holding the value after SYSTEM in the xml source.
public is nil or a string holding the value after PUBLIC in the xml source.
current-filename is the filename of the file being parsed.
The default method does not handle non-file identifiers such as those beginning with "http:". It signals an error if system is nil. Otherwise it merges the pathname of system with the pathname of current-filename, if current-filename is non-nil, and otherwise returns the value of system. Thus, the body of the default method looks like this:
(if* (null system)
   then (error "Can't compute external address with no system address"))
(if* current-filename
   then (namestring
         (merge-pathnames (pathname system)
                          (pathname current-filename)))
   else system)
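A subclass can specialize compute-external-address to resolve identifiers that the default method cannot. The sketch below is one possible approach under stated assumptions: it consults a small table keyed by PUBLIC identifier and falls back to the default method when a system identifier is available. The names catalog-parser and *public-id-catalog*, the public identifier, and the file path are all hypothetical.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sax)
  (use-package :net.xml.sax))

;; Hypothetical table mapping PUBLIC identifiers to local files.
(defparameter *public-id-catalog*
  '(("-//EXAMPLE//DTD Demo 1.0//EN" . "/usr/local/share/dtd/demo.dtd")))

;; Hypothetical subclass for this sketch.
(defclass catalog-parser (sax-parser) ())

(defmethod compute-external-address ((parser catalog-parser)
                                     system public current-filename)
  (declare (ignorable current-filename))
  (let ((entry (and public
                    (assoc public *public-id-catalog* :test #'equal))))
    (cond (entry (cdr entry))           ; known PUBLIC identifier
          (system (call-next-method))   ; default pathname merging
          (t nil))))                    ; no name can be computed

;; Usage: the method can be exercised directly on a fresh instance.
(compute-external-address (make-instance 'catalog-parser)
                          nil "-//EXAMPLE//DTD Demo 1.0//EN" nil)
```

Returning nil when neither identifier helps uses the documented "cannot compute a name" case instead of the error the default method signals for a missing system identifier.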
Arguments: (parser sax-parser) encoding ef
Given an encoding, compute-external-format should return an external format or the name of an external-format. The default method does the following:
(find-external-format
 (if* (equalp encoding "shift_jis")
    then :shiftjis
  elseif (equalp encoding "euc-jp")
    then :euc
  elseif (equalp encoding "utf-16")
    then ef   ; already must have the correct ef
    else encoding))
Arguments: filename &key (namespace t) (external t) (validate nil) (class (quote sax-parser)) (warn t) comments show-xmlns
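sax-parse-file parses the file specified by filename. The keyword arguments are:
- namespace: if true (the default), obey the xml namespace rules. See start-element for the effect on the callback arguments.
- external: if true (the default), read external entities.
- validate: if true, validate the document. If nil (the default), some validation will still be done but problems will be reported as warnings and not errors. Even if validate is nil the parser will signal an error if the xml is not well formed.
- class: the class of the parser object used for the parse. The default is sax-parser. Specify test-sax-parser if you just want to experiment and see the parser in action. The value can be a symbol naming a class, a class object, or an instance of the class sax-parser or a subclass of sax-parser. If an instance is passed then it must be a freshly created one that has never been passed to a sax parser function.
- warn: if true (the default), issue warnings for items that may signify problems in the xml but which aren't actual errors.
- comments: if true (the default is nil), call (comment parser string) for comments seen in the xml.
- show-xmlns: if true, add xmlns attributes to the start-element attribute lists. If namespace is nil then the xmlns attributes are always included with the list of attributes (since in this case there is nothing special about them).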
Arguments: stream &key (namespace t) (external t) (validate nil) (class (quote sax-parser)) (warn t) comments show-xmlns
This function is like sax-parse-file but parses the data from stream, which must be an open stream. stream is closed when sax-parse-stream returns.
Arguments: string &key (namespace t) (external t) (validate nil) (class (quote sax-parser)) (warn t) comments show-xmlns
This function is like sax-parse-file but parses the data from the string argument, which should be a string.
Note that the string should not begin with a byte-order-mark (BOM) character, and the XML form parsed should not contain a conflicting encoding declaration, because a string-input stream does not have an associated external format. Since the string contents already consist of characters, any transformation from octets, or interpretation of octets as meta-data, is assumed to have been done when the character content of the string was created.
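A quick way to see the effect of the keyword arguments is to parse a short string with the predefined test-sax-parser class, which prints each callback. The sketch below parses the same text twice, once with the default :namespace t and once with :namespace nil; the variable name *demo-xml* and the document itself are made up for the illustration.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sax)
  (use-package :net.xml.sax))

;; Hypothetical sample document with one prefixed element.
(defparameter *demo-xml*
  "<doc xmlns:p=\"urn:example\"><p:item n=\"1\">text</p:item></doc>")

;; Namespace processing on (the default): iri and localname are split out.
(sax-parse-string *demo-xml* :class 'test-sax-parser)

;; Namespace processing off: iri is nil and localname equals qname.
(sax-parse-string *demo-xml* :class 'test-sax-parser :namespace nil)
```

Comparing the two printouts shows the difference in the iri, localname and qname arguments described under start-element.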
Arguments: parser flag-name
The parser flags are initially set from the values supplied (or defaulted) to the sax-parse-xxxx functions (sax-parse-file, sax-parse-stream, and sax-parse-string). You can use sax-parser-flag to read the current value of the flag. You can use (setf sax-parser-flag) to set certain flags. Some flags should not be modified after the parse has begun.
parser is an instance of the sax-parser class (or a subclass of sax-parser). flag-name is one of the values from the table below. sax-parser-flag returns t or nil as the flag is or is not set. When you use setf with this function to modify a flag, specify a non-nil value to set a flag and nil to unset it.
The table below lists flags; writeable means that user code can change the value during the parse. Setting a flag denoted as not writeable will result in undefined behavior.
Flag name | Writeable | Meaning
:namespace | no | obey the xml namespace rules
:external | no | read external entities
:validate | no | do validation
:warn | yes | issue warnings for items that may signify problems in the xml but which aren't actual errors
:show-xmlns | yes | add xmlns attributes to start-element attribute lists
:comments | yes | call the comment generic function when comments are seen
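The following sketch shows one way a subclass might use the flags: it turns on the writeable :comments flag from its start-document method, so that comments are reported regardless of the comments argument given to the parse function, and it reads flags back after the parse. The names comment-parser and seen-comments are hypothetical.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sax)
  (use-package :net.xml.sax))

;; Hypothetical subclass for this sketch.
(defclass comment-parser (sax-parser)
  ((seen :initform nil :accessor seen-comments)))

(defmethod start-document ((parser comment-parser))
  ;; :comments is listed as writeable, so user code may set it.
  (setf (sax-parser-flag parser :comments) t)
  nil)

(defmethod comment ((parser comment-parser) string)
  ;; Keep a private copy of each comment's text.
  (push (copy-seq string) (seen-comments parser))
  nil)

;; Usage: read two flags and the collected comments after the parse.
(let ((parser (make-instance 'comment-parser)))
  (sax-parse-string "<a><!-- note --></a>" :class parser)
  (list (sax-parser-flag parser :comments)
        (sax-parser-flag parser :validate)
        (seen-comments parser)))
```

Flags marked as not writeable (:namespace, :external and :validate) are only read here, never set.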
The parser first parses the DTD and then the content of the file. The information found in the DTD is stored in the parser object where it is referenced by the parser during the parse.
An xml document need not have a DTD. However if you tell the parser to validate a document then the document must have a DTD.
When the first start-element callback is made the whole DTD has been parsed and the information is stored in the parser object.
After the parse completes the DTD information is still stored in the parser object.
The following accessors retrieve DTD information from the parser object.
Arguments: parser
Returns a string naming the root element. Every xml file contains exactly one element at 'top level' and may contain other elements inside that root element.
Arguments: parser
Returns a hash table where the key is the general entity name and the value is an entity object.
Arguments: parser
Returns a hash table where the key is the parameter entity name and the value is an entity object.
Arguments: parser
Returns a hash table where the key is the notation name and the value is a notation object.
Arguments: entity
Returns a string naming the entity.
Arguments: entity
Returns nil or a string holding the replacement text for the entity. If the entity is internal then this field will be a string.
Arguments: entity
Returns nil for internal entities. For external entities this is a string describing the location of the entity's value (the string is often a location on the filesystem relative to the file that references it).
Arguments: entity
Returns nil or a string. For certain external entities that have public identifiers, this is that public identifier.
Arguments: entity
Returns nil or a string. If this is an external unparsed entity then this is the name of a notation that describes its format.
Arguments: entity
Returns true if this entity was defined in the 'external subset' which is a term referring to files other than the main file being parsed.
Arguments: notation
Returns a string naming the notation.
Arguments: notation
Returns nil or a string naming the public identifier for this notation.
Arguments: notation
Returns a string naming the location of a description of the notation.
Arguments: attribute
Returns a string naming the attribute.
Arguments: attribute
Returns the type of the attribute, which is one of:
:cdata
:id
:idref
:idrefs
:entity
:entities
:nmtoken
:nmtokens
(:notation "name" ...)
(:enum "name" ....)
Arguments: attribute
The value returned is one of: :required, :implied, (:fixed value), (:value value).
Arguments: attribute
Returns true if the attribute was declared in the external subset.
Arguments: element
Returns a string naming the element.
Arguments: element
Returns a list of attribute objects describing the attributes of this element.
Arguments: element
A description of the specification of the body of the element. The format is:
spec := :empty
        :any
        cp
        (:mixed ["name" ...])
cp := (:cp cho/seq modifier)
cho/seq := (:choice cp [cp ...])
           (:sequence cp [cp ...])
           "name"
modifier := nil
            "*"
            "?"
            "+"
Arguments: element
Returns true if the element was defined in the external subset.
If you wish to test the sax-parser, we have defined several example classes. The class test-sax-parser and its associated methods are already defined in the system (after the sax module is loaded). The class sax-count-parser, defined below in this section, is not defined in the sax module but the definition code can be copied from this document.
The examples in this section assume that the SAX module has been loaded and the relevant package (net.xml.sax) has been used. If you do not want to use the package, package-qualify the relevant symbols. The following forms load the module and use the package:
(require :sax)
(use-package :net.xml.sax)
Here are the definitions of the class test-sax-parser and the associated methods (again, these definitions are included in the sax module so they need not be defined again). The methods on test-sax-parser print the arguments to the callbacks.
This is the definition of this class:
(defclass test-sax-parser (sax-parser)
  ())

(defmethod start-document ((parser test-sax-parser))
  (format t "sax callback: Start Document~%"))

(defmethod end-document ((parser test-sax-parser))
  (format t "sax callback: End Document~%"))

(defmethod start-element ((parser test-sax-parser) iri localname qname attrs)
  (format t "sax callback: start element ~s (iri: ~s) (qname: ~s) attrs: ~s~%"
          localname iri qname attrs)
  nil)

(defmethod end-element ((parser test-sax-parser) iri localname qname)
  (format t "sax callback: end element ~s (iri: ~s) (qname: ~s)~%"
          localname iri qname)
  nil)

(defmethod start-prefix-mapping ((parser test-sax-parser) prefix iri)
  (format t "sax callback: start-prefix-mapping ~s -> ~s~%" prefix iri)
  nil)

(defmethod end-prefix-mapping ((parser test-sax-parser) prefix)
  (format t "sax callback: end-prefix-mapping ~s~%" prefix))

(defmethod processing-instruction ((parser test-sax-parser) target data)
  (format t "sax callback: processing-instruction target: ~s, data: ~s~%"
          target data)
  nil)

(defmethod content ((parser test-sax-parser) content start end ignorable)
  (format t "sax callback: ~:[~;ignorable~] content(~s,~s) ~s~%"
          ignorable start end (subseq content start end))
  nil)

(defmethod content-character ((parser test-sax-parser) character ignorable)
  (format t "sax callback: ~:[~;ignorable~] content-char ~s~%"
          ignorable character)
  nil)

(defmethod compute-external-format ((parser test-sax-parser) encoding ef)
  (let ((ans (call-next-method)))
    (format t "sax callback: compute-external-format of ~s is ~s (current is ~s)~%"
            encoding ans ef)
    ans))

(defmethod comment ((parser test-sax-parser) string)
  ;;
  ;; called when <!-- ..... --> is seen
  ;;
  (format t "sax callback: comment: ~s~%" string)
  nil)
This is an example of another useful sax-parser subclass. The sax-count-parser class maintains a count of the elements, attributes and characters in an xml file. This class is not defined when the sax parser is loaded but you can just copy the definition below and load it into Lisp if you wish to try it.
; definition of a sax parser to count items
(defstruct counter
  (elements 0)
  (attributes 0)
  (characters 0))

(defclass sax-count-parser (sax-parser)
  ((counts :initform (make-counter) :reader counts)))

(defmethod start-element ((parser sax-count-parser) iri localname qname attrs)
  (declare (ignore iri localname qname))
  (let ((counter (counts parser)))
    (incf (counter-elements counter))
    (let ((attlen (length attrs)))
      (if* (> attlen 0)
         then (incf (counter-attributes counter) attlen)))))

(defmethod content ((parser sax-count-parser) content start end ignorable)
  (declare (ignore content ignorable))
  (let ((counter (counts parser)))
    (incf (counter-characters counter) (- end start))))

(defmethod content-character ((parser sax-count-parser) char ignorable)
  (declare (ignore char ignorable))
  (let ((counter (counts parser)))
    (incf (counter-characters counter))))
LXML is a list representation of an XML parse tree. The notation was introduced initially with the PXML module (see pxml.htm, but note the PXML module is deprecated and may be removed in a release later than 9.0), and is supported for compatibility with existing applications. It is also a convenient representation for moderately sized XML documents.
The representation is made up of lists of LXML tags containing LXML nodes. An LXML node is either a string or a list of an LXML tag followed by zero or more LXML nodes. An LXML tag is either a symbol or a list of a symbol followed by attribute/value pairs, where the attribute is a symbol and the value is a string. In brief:
LXML-node -> string | (LXML-tag [LXML-node] ... )
LXML-tag -> symbol | (symbol [attr-name attr-value] ... )
And more formally:
- An LXML node may be a string representing textual element content.
- An LXML node may be a list representing a named XML element.
  - The first element in the list represents the element tag.
    - If no attributes were present in the element tag, then the element tag is represented by a Lisp symbol; the symbol-name of the Lisp symbol is the local name of the tag; the XML namespace of the tag is represented by the Lisp home package of the symbol.
    - If attributes were present in the element tag, then the element tag is represented by a list where the first element is the tag (as above) and the remainder of the list is a Lisp property list where the property keys are Lisp symbols that represent the attribute names and the property values are strings that represent the attribute values.
  - The remainder of the list is a list of LXML nodes that represent the content of the XML element.
- An LXML node may be a list of the form (:comment text-string) to represent a comment in the XML document.
- An LXML node may be a list of the form (:pi target data) to represent a processing instruction in the XML document.
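The rules above translate directly into a few list operations. The helpers below are an illustrative sketch in plain Common Lisp and are not part of the sax module; the function names and the sample data are made up. They extract the tag symbol, the attribute property list and the child nodes of an LXML node, and gather all string content below a node while skipping (:comment ...) and (:pi ...) nodes. An element whose tag symbol happened to be :comment or :pi would be skipped as well, which is a limitation of this sketch.

```lisp
(defun lxml-tag (node)
  "Return the symbol naming element NODE, or nil if NODE is a string."
  (when (consp node)
    (let ((tag (first node)))
      (if (consp tag) (first tag) tag))))

(defun lxml-attributes (node)
  "Return the attribute property list of element NODE, or nil if none."
  (when (and (consp node) (consp (first node)))
    (rest (first node))))

(defun lxml-children (node)
  "Return the list of LXML nodes that form the content of element NODE."
  (when (consp node)
    (rest node)))

(defun lxml-text (node)
  "Return all string content at or below NODE as one string."
  (with-output-to-string (out)
    (labels ((walk (n)
               (cond ((stringp n) (write-string n out))
                     ;; Comments and processing instructions carry no content.
                     ((member (lxml-tag n) '(:comment :pi)) nil)
                     ((consp n) (mapc #'walk (lxml-children n))))))
      (walk node))))

;; Usage on a hand-written LXML node (hypothetical data).
(let ((node '((:item :id "7") "Hello, " (:b "LXML") (:comment "ignored") "!")))
  (list (lxml-tag node)
        (getf (lxml-attributes node) :id)
        (lxml-text node)))
```

Because attributes are stored as a property list keyed by symbols, getf is enough to look one up.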
Each distinct XML namespace is mapped to a Lisp package. An application may specify the namespace-to-package mapping in full, in part, or not at all. If there is no pre-specified Lisp package for some XML namespace, then the parser creates a new package with a name "pppnn" where "ppp" is a prefix specified by the user and "nn" is an integer that guarantees uniqueness. The default prefix is the symbol-name of :net.xml.namespace. (ending with a period).
The :sax module implements the lxml-parser sub-class of sax-parser. The methods on this class use the SAX parser to build an LXML data structure from the parsed XML input. (In earlier releases, it was possible to require a module named :sax-lxml, which would not be included by default in the :sax module. Now that module is always loaded when the :sax module is loaded and cannot be required separately.)
A subclass of sax-parser. Slots include normalize, default-package, package-prefix, and skip-ignorable. The add-parser-package method is defined.
The initial value of the package slot is :keyword. The initial value of the normalize slot is nil.
Arguments: lxml-parser
Returns the value of the normalize slot of its argument, which must be an instance of lxml-parser.
If the normalize slot is nil, string element content may appear as a list of strings. The length of each fragment is determined by the implementation and may vary from one parse to the next.
If the normalize slot is non-nil, then if an element contains only string content, this content will appear as one contiguous string. This option will naturally require the parser to do more consing during the parse.
Arguments: parser iri package &rest prefixes
The default method, defined on (lxml-parser t t), adds a new iri-to-package mapping to the parser or adds a prefix to an existing mapping.
The iri argument may be a string or a net.uri:uri instance (see uri.htm). The package argument may be a package or the name of a package. The prefixes may be symbols or strings. When the iri argument is a uri instance, it is converted to its string form for use during the parse.
Note that the Allegro CL implementation of uri instances may map many different uri instances to the same string. To avoid possible ambiguities, it is best to specify the iri argument as a string that will be used without any interpretation or change.
To pre-specify namespace-to-package mappings in a program, the application program must call add-parser-package in a start-document method for an application-specific sub-class of lxml-parser.
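As a sketch of that pattern, the code below defines an application-specific subclass of lxml-parser and registers one namespace-to-package mapping from a start-document method. The names demo-ns and demo-lxml-parser and the namespace urn:example:demo are hypothetical. A :before method is used so that any start-document method inherited from lxml-parser still runs; that choice is an assumption of this sketch. The iri is given as a string, and the package is passed as a package object.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sax)
  (use-package :net.xml.sax))

;; Hypothetical package that will hold the symbols for one XML namespace.
(defpackage :demo-ns (:use))

;; Hypothetical application-specific subclass.
(defclass demo-lxml-parser (lxml-parser) ())

(defmethod start-document :before ((parser demo-lxml-parser))
  ;; Map the (made-up) namespace urn:example:demo to the DEMO-NS package.
  (add-parser-package parser "urn:example:demo" (find-package :demo-ns)))

;; Usage: names in urn:example:demo should now be interned in DEMO-NS
;; rather than in an automatically generated package.
(parse-to-lxml "<d:item xmlns:d=\"urn:example:demo\">text</d:item>"
               :class 'demo-lxml-parser)
```

Namespaces that are not registered this way still get automatically named packages, as described above.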
Arguments: lxml-parser
Returns the default package of the lxml-parser instance.
Arguments: lxml-parser
Returns the prefix string used to generate package names for packages that represent namespaces that were not specified with add-parser-package. The default value is :net.xml.namespace. (with a trailing period).
Arguments: lxml-parser
Returns whether ignorable text will be skipped for the lxml-parser instance. The default value is nil.
Arguments: lxml-parser
When a parse is complete, this accessor returns the resulting lxml data structure.
A subclass of lxml-parser. The initial value of the package slot is the value of *package*. The initial value of the normalize slot is t.
Arguments: string-or-stream &key external-callback content-only general-entities parameter-entities uri-to-package package class normalize comments warn
The arguments to this function are like the arguments to net.xml.parser:parse-xml (see pxml.htm). The class and methods are included for compatibility with pxml.
The content-only, external-callback, general-entities, and parameter-entities arguments are ignored: silently in the case of content-only, and with a warning for the others.
The package keyword argument specifies the Lisp package of XML names without a namespace qualifier. If the argument is omitted or nil, the initial value in the class is used.
The class keyword argument specifies the class of the parser. The choice of class can affect the default package and normalize behavior, and many other behaviors. The default is lxml-parser.
The class argument may be the name of a class, a class object, or an instance of a suitable class. If an instance is passed, it must be one that has never been used by the SAX or LXML parser.
The normalize keyword argument specifies the value of the normalize slot in the parser. Values other than nil or t must be specified in the call. It can be one of the following values:
- nil: do not combine strings, do not delete anything.
- :trim-simple: applies only to elements where the only content is strings. Combine adjacent string content into a single string, delete leading and trailing whitespace in the combined string.
- :trim-complex: applies only to elements that contain other named XML elements. Combine and delete as in :trim-simple. If the resulting string is the empty string then delete it entirely.
- :trim-all: apply both :trim-simple and :trim-complex.
- :trim: same as :trim-all.
- any other non-nil value (such as t): only combine adjacent string content into a single string.
Whitespace characters are defined by the parser-char-table in the parser instance. The various trim behaviors are not specified in the XML standard but are often useful when parsing to LXML.
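To compare the normalize settings, the same text can be parsed several times with different values. The sketch below assumes, as with parse-xml, that parse-to-lxml returns the resulting LXML structure; the variable name and the sample document are made up. No particular output is shown, because with normalize nil the way strings are fragmented is determined by the implementation.

```lisp
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :sax)
  (use-package :net.xml.sax))

;; Hypothetical sample with whitespace inside and between elements.
(defparameter *demo-list-xml*
  "<list>
  <item> one </item>
  <item> two </item>
</list>")

;; No normalization: whitespace is kept and strings may be fragmented.
(parse-to-lxml *demo-list-xml* :normalize nil)

;; Combine adjacent strings only.
(parse-to-lxml *demo-list-xml* :normalize t)

;; Combine strings, trim whitespace, and drop strings that become empty
;; in elements that contain other elements.
(parse-to-lxml *demo-list-xml* :normalize :trim-all)
```

Evaluating the three forms at a listener and comparing the printed results shows which whitespace strings each setting keeps.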
The uri-to-package argument is a list of conses of the form (iri . package) where iri may be a string or a uri instance and package may be a package name or a package instance.
The :comments argument may be nil or non-nil. When nil (the default), XML comments are discarded during the parse. When non-nil, XML comments are included in the LXML output as expressions of the form (:comment text-string).
The :warn argument is propagated to the sax-parse-* function called by parse-to-lxml.
This form is more general than that allowed by the parse-xml function.
This function calls the SAX parser with the following flag values
:namespace t :show-xmlns t :comments nil :validate nil :external as specified in argument :warn SAX parser default
If it is necessary to modify the flag settings for a specific application, the following code can be used:
(defclass local-lxml (lxml-parser) ())
(defmethod start-document :before ((p local-lxml))
  (setf (sax-parser-flag xxx) yyy))
(parse-to-lxml what :class 'local-lxml ...)
*lxml-parser*: the lxml-parser instance created in the most recent call to parse-to-lxml.
The :pxml-sax module implements a partial pxml API to the SAX parser. This module replaces the :pxml module. It requires the modules :sax and :sax-lxml. Symbols naming operators, variables, etc. in the module are in the :net.xml.parser package. Load this module with
(require :pxml-sax)
The operators in this module are pxml-version and parse-xml.
The :pxml-dual module allows an application to switch at run time between the base implementation of pxml and the partial SAX implementation. It requires the modules :pxml, :sax, and :sax-lxml. Symbols naming operators, variables, etc. in the module are in the :net.xml.parser package. Load this module with
(require :pxml-dual)
When the module is loaded, the initial setting is to use the SAX parser implementation.
We provide this module to allow mission-critical applications to test both parsers in the same run-time environment. You can switch between the base and the SAX parsers with pxml-version.
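The operators in this module are pxml-version, parse-xml, with-base-pxml (which selects the :base implementation), and with-sax-pxml (which selects the :sax implementation).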
In this section, we list the operators and variables associated with the various PXML modules. In many cases, the operators behave differently depending on what module is loaded.
The PXML parser default behavior was to silently ignore external DTDs unless a function was specified for the external-callback argument. The SAX parser default behavior is to signal an error if an external DTD cannot be located. The built-in default function can only locate files in the local file system.
Existing applications that depend on the default external DTD behavior of the PXML parser may break when using the SAX parser through the PXML compatibility package. These applications will need to use the SAX parser more explicitly and specify a suitable compute-external-address method.
pxml-version (:pxml-sax module)
Arguments:
In the :pxml-sax module, this function works as described in pxml.htm: called with no arguments, this function returns a string naming the PXML version.
pxml-version (:pxml-dual module)
Arguments: &optional parser-type
Called with no arguments, this function returns a string naming the PXML version. If parser-type is specified, it should be :sax, :base, or :query.
When parser-type is :sax, the SAX version of parse-xml is enabled. When parser-type is :base, the original version of parse-xml is enabled. When parser-type is :query, this function returns :base or :sax depending on which version of parse-xml is enabled.
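The calls described above might be used as follows in a session with the :pxml-dual module loaded. This is a sketch that assumes pxml-version is exported from the net.xml.parser package. The return values of the :base and :sax calls are not described here, so the sketch does not rely on them.

```lisp
;; Sketch only. Assumes pxml-version is exported from net.xml.parser.
(require :pxml-dual)

;; No argument: returns a string naming the PXML version.
(net.xml.parser:pxml-version)

;; Ask which implementation of parse-xml is currently enabled.
(net.xml.parser:pxml-version :query)

;; Enable the original (base) implementation, then confirm.
(net.xml.parser:pxml-version :base)
(net.xml.parser:pxml-version :query)

;; Switch back to the SAX implementation.
(net.xml.parser:pxml-version :sax)
```

Unlike with-base-pxml and with-sax-pxml, which bind the implementation only within their body, these calls change the setting for subsequent calls to parse-xml.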
parse-xml
Arguments: input-source &key external-callback content-only general-entities parameter-entities uri-to-package
The arguments and behavior are fully described in pxml.htm. The modules differ in whether the keyword arguments content-only, external-callback, general-entities, and parameter-entities have effect or are ignored. In the :pxml-sax module, and thus in the :pxml-dual module when in :sax mode, those arguments are ignored (silently in the case of content-only, with a warning for the others). The SAX implementation of parse-xml cannot at this time support those arguments, but it is much faster than the base implementation. All arguments are honored when regular PXML is loaded, or when the :pxml-dual module is loaded and is in :base mode.
When the SAX implementation of parse-xml is used, the uri-to-package argument may be a list of conses of the form (iri . package) where iri may be a string or a uri instance and package may be a package name or a package instance.
This form is more general than the form accepted by the base implementation of parse-xml. An application using the more general form will not be back-compatible with the base implementation of parse-xml.
with-base-pxml
Arguments: &body body
Defined in the :pxml-dual module only (see Section 4.4 The PXML-DUAL Module). Within the body of this macro the implementation of parse-xml is dynamically bound to the base implementation. See also with-sax-pxml.
with-sax-pxml
Arguments: &body body
Defined in the :pxml-dual module only (see Section 4.4 The PXML-DUAL Module). Within the body of this macro the implementation of parse-xml is dynamically bound to the SAX implementation. See also with-base-pxml.
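One way to use the two macros together is to parse the same input with each implementation and compare the results. The sketch below assumes that parse-xml, with-base-pxml, and with-sax-pxml are exported from net.xml.parser and that parse-xml accepts a string as its input-source. The helper name compare-parsers and the sample document are hypothetical.

```lisp
;; Sketch only. compare-parsers and the sample document are hypothetical.
(require :pxml-dual)

(defun compare-parsers (xml-string)
  "Parse XML-STRING with the base and SAX implementations of parse-xml.
Returns three values: whether the results are EQUAL, the base result,
and the SAX result."
  (let ((base-result (net.xml.parser:with-base-pxml
                       (net.xml.parser:parse-xml xml-string)))
        (sax-result (net.xml.parser:with-sax-pxml
                      (net.xml.parser:parse-xml xml-string))))
    (values (equal base-result sax-result) base-result sax-result)))

(compare-parsers "<item1><item2>some text</item2></item1>")
```

Because each macro binds the implementation dynamically, the setting in effect outside the helper is left unchanged. The two results are not guaranteed to be identical for every document, so a false first value is a prompt to inspect the other two rather than proof of an error.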
Copyright (c) 1998-2022, Franz Inc. Lafayette, CA., USA. All rights reserved.
This page was not revised from the 10.0 page.
Created 2019.8.20.