URI support in Allegro CL

1.0 Introduction
2.0 The URI API definition
3.0 Parsing, escape decoding/encoding and the path
4.0 Interning URIs
5.0 Allegro CL implementation notes
6.0 Examples
Index

1.0 Introduction

URI stands for Uniform Resource Identifier. For a description of URIs, see RFC2396, which can be found in several places, including the IETF web site (http://www.ietf.org/rfc/rfc2396.txt) and the UCI/ICS web site (http://www.ics.uci.edu/pub/ietf/uri/rfc2396.txt). We prefer the UCI/ICS one as it has more examples.

URIs are a superset, in both functionality and syntax, of URLs (Uniform Resource Locators) and URNs (Uniform Resource Names). That is, RFC2396 updates and merges RFC1738 and RFC1808 into a single syntax, called the URI. It does exclude some portions of RFC1738 that define specific syntax of individual URL schemes.

In URL slang, the scheme is usually called the `protocol', but it is called scheme in RFC1738. A URL `host' corresponds to the URI `authority.' The URL slang `bookmark' or `anchor' is `fragment' in URI lingo.

We have added the URI facility as a patch to Allegro CL 5.0.1. This is meant as a preview of some of the technology available in the next major release of Allegro CL. This new facility is supported in ACL 5.0.1, and we encourage ACL customers to use it and to report problems and suggest improvements. Please address all correspondence to support@franz.com.

To get the patch, evaluate (sys:update-allegro) in Allegro CL. This will automatically download and install all patches, including the one that implements the URI facility. Alternatively, you can manually download uri.fasl and uri.txt from the code/ directory of the patch directory (for the appropriate machine) from ftp://ftp.franz.com/pub/patches/5.0.1/.

Once the file uri.fasl is available, then (require :uri) is all that is needed to load it into Allegro CL.

Broadly, the URI facility creates a Lisp object that represents a URI, and provides setters and accessors to fields in the URI object. The URI object can also be interned, much like symbols in CL are. This document describes the facility and the related operators.

In addition to the slots called out in the RFC, URIs also have a property list. Along with interning, this is another similarity between URIs and CL symbols.

2.0 The URI API definition

Symbols naming objects (functions, variables, etc.) in the uri module are exported from the net.uri package.

URIs are represented by CLOS objects. Their slots are:

scheme
host
port
path
query
fragment
plist

The host and port slots together correspond to the authority (see RFC2396). There is an accessor-like function, uri-authority, that can be used to extract the authority from a URI. See the RFC2396 specifications pointed to at the beginning of the introduction for details of all the slots except plist. The plist slot contains a standard Common Lisp property list.

All symbols are external in the net.uri package, unless otherwise noted.

uri [Class]

The class of URI objects.

uri-p [Function]

Arguments: object

This predicate function returns true when object is an instance of the class uri.

copy-uri [Function]

Arguments: uri &key place scheme host port path query fragment plist

Copy the given uri. If place is given, its value should be a uri instance, and that instance is modified rather than a new one being created. Note: do not destructively modify interned URIs. The scheme, host, port, path, query, fragment and plist keywords are used to initialize those slots of the new (or given) URI instance.

uri-scheme [Method]
uri-host [Method]
uri-port [Method]
uri-path [Method]
uri-query [Method]
uri-fragment [Method]
uri-plist [Method]

Arguments: uri-object

These are the slot accessors of the URI object. The values of uri-host and uri-port are the two parts of the authority, which is accessed via uri-authority. See the RFC2396 specifications pointed to at the beginning of the introduction for details of all the slots except plist. The plist slot contains a standard Common Lisp property list.

uri-authority [Function]

Arguments: uri-object

The combination of the host and port.

render-uri [Function] 

Arguments: uri stream

Print to stream the printed representation of uri. This is how the print-object method for uri calls it:

(defmethod print-object ((uri uri) stream)
   (if* *print-escape* then
         (format stream "#<~a ~a>" 'uri (render-uri uri nil))
     else (render-uri uri stream)))

parse-uri [Function]

Arguments: string &key (class 'uri)

Parse string into a URI object. Escaped encodings of the form %<hex><hex> are properly converted into single characters. The class keyword allows creation of subclasses of uri.
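As a brief illustration of parse-uri together with the slot accessors described above, the following sketch parses a URI and reads back a few of its slots. It assumes the uri module can be loaded with (require :uri); the variable name u is only for illustration, and the values in the comments are those shown by describe in the examples section (6.0).

```lisp
;; Load the uri module (assumes uri.fasl is available, as described in 1.0).
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :uri))

(let ((u (net.uri:parse-uri "http://franz.com")))
  (list (net.uri:uri-p u)        ; true: u is an instance of net.uri:uri
        (net.uri:uri-scheme u)   ; the keyword :http
        (net.uri:uri-host u)     ; "franz.com"
        (net.uri:uri-port u)))   ; nil, since no port was given
```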

merge-uris [Method]

Arguments: uri base-uri &optional place

Return an absolute URI, based on uri, which can be relative, and base-uri, which must be absolute. place can be used as storage for the result. Note: do not pass an interned URI as place, because destructively modifying an interned URI has unpredictable results. The result is not interned.

The rules for merging URIs are not the same as for merging pathnames. A simplified version of the merge rules from RFC2396 is (applied in order):

  • If uri has no scheme, authority or path, then the query and fragment from uri are used, and all other slots are inherited from the base-uri.
  • If uri has a scheme, then uri is returned. If it does not have a scheme, then it inherits one from the base-uri.
  • If uri has a host, the unmodified uri is returned.
  • If the path of uri is absolute (starts with a /), then the unmodified uri is returned.
  • Otherwise, the path of base-uri and uri are concatenated into a uri path, and the new uri is returned.

One comment about error checking of URIs as a result of merging: RFC2396 says that an implementation may handle too many ..'s in a merge result "by retaining these components in the resolved path, by removing them from the resolved path, or by avoiding traversal of the reference." The examples in appendix C of RFC2396 imply that an implementation should retain these invalid elements, so that is what we do. For example,

(merge-uris (parse-uri "../../../../g") (parse-uri "http://a/b/c/d;p?q")) 

returns #<uri http://a/../../g>. This is clearly a nonsense result, but our implementation returns it rather than signaling an error.

enough-uri [Method]

Arguments: uri base

Convert uri into a relative URI, using base as the base URI. This method is analogous to enough-namestring for pathnames.

uri-parsed-path [Method]

Arguments: uri

Return the parsed representation of the path. This is setf'able.

For more information on this representation, see below.

3.0 Parsing, escape decoding/encoding and the path

The method uri-path returns the path portion of the URI, in string form. The method uri-parsed-path returns the path portion of the URI, in list form. This list form is discussed below, after a discussion of decoding/encoding.

RFC2396 lays out a method for including reserved characters in URIs: you escape the character. An escaped character is defined like this:

escaped = "%" hex hex 
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f" 
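To make the grammar above concrete, here is a minimal sketch in portable Common Lisp of decoding a single string that contains such escapes. The function name decode-escapes is hypothetical and is not part of the net.uri API; it illustrates the %<hex><hex> rule only and says nothing about how Allegro CL implements decoding internally.

```lisp
(defun decode-escapes (string)
  "Return a copy of STRING with each %<hex><hex> escape replaced by the
character with that code. Hypothetical helper, not part of net.uri."
  (with-output-to-string (out)
    (let ((i 0)
          (n (length string)))
      (loop while (< i n)
            do (let ((ch (char string i)))
                 (cond ((and (char= ch #\%)
                             (< (+ i 2) n)
                             (digit-char-p (char string (+ i 1)) 16)
                             (digit-char-p (char string (+ i 2)) 16))
                        ;; "%" followed by two hex digits: emit one character.
                        (write-char (code-char (parse-integer string
                                                              :start (+ i 1)
                                                              :end (+ i 3)
                                                              :radix 16))
                                    out)
                        (incf i 3))
                       (t
                        ;; Anything else is copied through unchanged.
                        (write-char ch out)
                        (incf i))))))))

(decode-escapes "3%2f2")     ; => "3/2"
(decode-escapes "foob%61r")  ; => "foobar"
```

A "%" that is not followed by two hex digits is copied through as-is in this sketch; a stricter decoder might signal an error instead.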

In addition, the RFC defines excluded characters:

"<" | ">" | "#" | "%" | <"> | "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`" 

The set of reserved characters is:

";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | "," 

with the following exceptions:

  • within the authority component, the characters ";", ":", "@", "?", and "/" are reserved.
  • within a path segment, the characters "/", ";", "=", and "?" are reserved.
  • within a query component, the characters ";", "/", "?", ":", "@", "&", "=", "+", ",", and "$" are reserved.

From the RFC, there are two important rules about escaping and unescaping (encoding and decoding):

  • decoding should only happen when the URI is parsed into component parts;
  • encoding can only occur when a URI is made from component parts (i.e., rendered for printing).

The implication of this is that to decode the URI, it must be in a parsed state. That is, you can't convert %2f (the escaped form of "/") until the path has been parsed into its component parts. It is equally important that an application viewing the component parts sees their decoded values. For example, consider:

http://franz.com/calculator/3%2f2 

This might be a request to a calculator application, asking it to compute 3/2. Clearly, the application that implements this would want to see path components of "calculator" and "3/2". "3%2f2" would not be useful to the calculator application.

For the reasons given above, a parsed version of the path is available and has the following form:

([:absolute | :relative] component1 [component2...]) 

where components are:

element | (element param1 [param2 ...]) 

and element is a path element, and the param's are path element parameters. For example, the result of

(uri-parsed-path (parse-uri "foo;10/bar:x;y;z/baz.htm")) 

is

(:relative ("foo" "10") ("bar:x" "y" "z") "baz.htm") 
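The shape of this list form can be reproduced with a short sketch in portable Common Lisp. The helpers split-on and sketch-parsed-path are hypothetical names, not part of net.uri, and the sketch is deliberately simplified: it splits on "/" and ";" only, and performs no escape decoding or canonicalization, so it should not be read as the Allegro CL implementation.

```lisp
(defun split-on (char string)
  "Split STRING at each occurrence of CHAR, keeping empty pieces."
  (loop with start = 0
        for pos = (position char string :start start)
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))

(defun sketch-parsed-path (path)
  "Return a list of the form (:absolute|:relative component...) for PATH.
Hypothetical illustration of the list form only."
  (let* ((absolute (and (plusp (length path))
                        (char= (char path 0) #\/)))
         (segments (split-on #\/ (if absolute (subseq path 1) path))))
    (cons (if absolute :absolute :relative)
          (mapcar (lambda (segment)
                    (let ((parts (split-on #\; segment)))
                      ;; A segment without parameters stays a plain string.
                      (if (rest parts) parts (first parts))))
                  segments))))

(sketch-parsed-path "foo;10/bar:x;y;z/baz.htm")
;; => (:relative ("foo" "10") ("bar:x" "y" "z") "baz.htm")
```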

There is a certain amount of canonicalization that occurs when parsing:

  • A path of (:absolute) or (:absolute "") is equivalent to a nil path. That is, http://a/ is parsed with a nil path and printed as http://a.
  • Escaped characters that are not reserved are not escaped upon printing. For example, "foob%61r" is parsed into "foobar" and appears as "foobar" when the URI is printed.

4.0 Interning URIs

This section describes how to intern URIs. Interning is not mandatory. URIs can be used perfectly well without interning them.

Interned URIs in Allegro are like symbols. That is, a string representing a URI, when parsed and interned, will always yield an eq object. For example:

(eq (intern-uri "http://franz.com") (intern-uri "http://franz.com")) 

is always true. (Note that two strings with identical contents may or may not be eq in Common Lisp.)

make-uri-space [Function]

Arguments: &key size

Make a new object to contain interned URIs. The object is a hash-table and so it will grow as needed if size is insufficient, but specifying a correct size in advance improves efficiency.

uri-space [Function]

Arguments: nil

Return the object into which URIs are currently being interned. This is setf'able.

uri= [Function]

Arguments: uri1 uri2

Return true if uri1 and uri2 are equivalent. Of course, if the URIs are interned, then you can use common-lisp:eq to test for equivalence.

intern-uri [Method]

Arguments: (uri-name string) &optional uri-space
Arguments: (uri uri) &optional uri-space

If the argument is a string, intern the URI specified by the string uri-name, which is first parsed into a URI object.

If the argument is a uri instance, that URI object is interned.

uri-space can be used to maintain separate spaces for interning of URIs.

unintern-uri [Function]

Arguments: uri &optional uri-space

Unintern the given uri, or all URIs if uri is t. uri-space can be used to maintain separate spaces for interning of URIs.

do-all-uris [Macro]

Arguments: (var &optional uri-space result) &body body

Iterate over all the currently interned URIs, binding var to each URI in turn. (unintern-uri t) will remove all URIs and change the URIs seen by this iterator macro. uri-space can be used to maintain separate spaces for interning of URIs.
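The operators in this section can be combined as in the following sketch, which interns a few URIs in a private space and then walks over them. It assumes the uri module is loaded; the variable names space and seen, the :size value, and the example URIs are chosen only for illustration, and the order in which do-all-uris visits the URIs should be treated as unspecified.

```lisp
;; Load the uri module (assumes uri.fasl is available, as described in 1.0).
(eval-when (:compile-toplevel :load-toplevel :execute)
  (require :uri))

(let ((space (net.uri:make-uri-space :size 100)) ; private table for this sketch
      (seen '()))
  ;; Interning the same string twice in one space yields the same object.
  (let ((a (net.uri:intern-uri "http://franz.com/foo" space))
        (b (net.uri:intern-uri "http://franz.com/foo" space)))
    (assert (eq a b)))
  (net.uri:intern-uri "http://franz.com/bar" space)
  ;; Collect the rendered form of every URI interned in SPACE.
  (net.uri:do-all-uris (u space)
    (push (net.uri:render-uri u nil) seen))
  seen)
```

Using a separate space keeps the sketch from touching whatever is interned in the current (uri-space).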

5.0 Allegro CL implementation notes

  1. The following are true:
    (uri= (parse-uri "http://franz.com/")
          (parse-uri "http://franz.com"))
    (eq (intern-uri "http://franz.com/")
        (intern-uri "http://franz.com"))
  2. The following is true:
    (eq (intern-uri "http://franz.com:80/foo/bar.htm")
        (intern-uri "http://franz.com/foo/bar.htm"))
    (I.e., specifying the default port is the same as specifying no port at all. This is specified in RFC2396.)
  3. The scheme and authority are case-insensitive. In Allegro CL, the scheme is a keyword that appears in the normal case for the Lisp in which you are executing.
  4. #u"..." is shorthand for (parse-uri "..."). If a #u dispatch macro definition already exists, it will not be overridden.
  5. Setting the scheme, host, port, path, query, or fragment slots of an interned URI object will have very bad and unpredictable results. Do not modify interned URIs.
  6. The printable representation of URIs is cached, for efficiency. This caching is undone when the above slots are changed. That is, when you create a URI the printed representation is cached. When you change one of the above mentioned slots, the printed representation is cleared and calculated when the URI is next printed. For example:
user(10): (setq u #u"http://foo.bar.com/foo/bar")
#<uri http://foo.bar.com/foo/bar>
user(11): (setf (net.uri:uri-host u) "foo.com")
"foo.com"
user(12): u
#<uri http://foo.com/foo/bar>
user(13):

This allows URI behavior to follow the principle of least surprise.

6.0 Examples

uri(10): (use-package :net.uri)
t
uri(11): (parse-uri "foo")
#<uri foo>
uri(12): #u"foo"
#<uri foo>
uri(13): (setq base (intern-uri "http://franz.com/foo/bar/"))
#<uri http://franz.com/foo/bar/>
uri(14): (merge-uris (parse-uri "foo.htm") base)
#<uri http://franz.com/foo/bar/foo.htm>
uri(15): (merge-uris (parse-uri "?foo") base)
#<uri http://franz.com/foo/bar/?foo>
uri(16): (setq base (intern-uri "http://franz.com/foo/bar/baz.htm"))
#<uri http://franz.com/foo/bar/baz.htm>
uri(17): (merge-uris (parse-uri "foo.htm") base)
#<uri http://franz.com/foo/bar/foo.htm>
uri(18): (merge-uris #u"?foo" base)
#<uri http://franz.com/foo/bar/?foo>
uri(19): (describe #u"http://franz.com")
#<uri http://franz.com> is an instance of #<standard-class net.uri:uri>:
 The following slots have :instance allocation:
  scheme        :http
  host          "franz.com"
  port          nil
  path          nil
  query         nil
  fragment      nil
  plist         nil
  escaped       nil
  string        "http://franz.com"
  parsed-path   nil
  hashcode      nil
uri(20): (describe #u"http://franz.com/")
#<uri http://franz.com> is an instance of #<standard-class net.uri:uri>:
 The following slots have :instance allocation:
  scheme        :http
  host          "franz.com"
  port          nil
  path          nil
  query         nil
  fragment      nil
  plist         nil
  escaped       nil
  string        "http://franz.com"
  parsed-path   nil
  hashcode      nil
uri(21): #u"foobar#baz%23xxx"
#<uri foobar#baz#xxx>

Index

copy-uri [Function]

do-all-uris [Macro]

enough-uri [Method]

intern-uri [Method]

make-uri-space [Function]
merge-uris [Method]

parse-uri [Function]

render-uri [Function] 

unintern-uri [Function]
uri [Class]
uri-authority [Function]
uri-fragment [Method]
uri-host [Method]
uri-p [Function]
uri-parsed-path [Method]
uri-path [Method]
uri-plist [Method]
uri-port [Method]
uri-query [Method]
uri-scheme [Method]
uri-space [Function]
uri= [Function]


Last updated on March 29, 2000 09:41:30 Pacific Standard Time

Copyright © 2014 Franz Inc., All Rights Reserved