This document contains the following sections:
1.0 Introduction
2.0 The URI API definition
3.0 Parsing, escape decoding/encoding and the path
4.0 Interning URIs
5.0 Allegro CL implementation notes
6.0 Examples
This version of the Allegro CL URI support documentation is for distribution with the Open Source version of the URI code. Links to Allegro CL documentation other than URI-specific files have been suppressed. To see Allegro CL documentation, see http://www.franz.com/support/documentation/, which is the Allegro CL documentation page of the Franz Inc. website. Links to Allegro CL documentation can be found on that page.
1.0 Introduction
URI stands for Uniform Resource Identifier. For a description of URIs, see RFC2396, which can be found in several places, including the IETF web site (http://www.ietf.org/rfc/rfc2396.txt) and the UCI/ICS web site (http://www.ics.uci.edu/pub/ietf/uri/rfc2396.txt). We prefer the UCI/ICS one as it has more examples.
URIs are a superset, in functionality and syntax, of URLs (Uniform Resource Locators) and URNs (Uniform Resource Names). That is, RFC2396 updates and merges RFC1738 and RFC1808 into a single syntax, called the URI. It does exclude some portions of RFC1738 that define specific syntax of individual URL schemes.
In URL slang, the scheme is usually called the `protocol', but it is called scheme in RFC1738. A URL `host' corresponds to the URI `authority.' The URL slang `bookmark' or `anchor' is `fragment' in URI lingo.
The URI facility was available as a patch to Allegro CL 5.0.1 and is included with release 6.0. The URI facility might not be in an Allegro CL image. Evaluate (require :uri) to ensure the facility is loaded (that form returns nil if the URI module is already loaded).
Broadly, the URI facility creates a Lisp object that represents a URI, and provides setters and accessors to fields in the URI object. The URI object can also be interned, much like symbols in CL are. This document describes the facility and the related operators.
Aside from the obvious slots which are called out in the RFC, URIs also have a property list. With interning, this is another similarity between URIs and CL symbols.
Symbols naming objects (functions, variables, etc.) in the uri module are exported from the net.uri package.
2.0 The URI API definition
URIs are represented by CLOS objects. Their slots are:
scheme host port path query fragment plist
The host and port slots together correspond to the authority (see RFC2396). There is an accessor-like function, uri-authority, that can be used to extract the authority from a URI. See the RFC2396 specifications pointed to at the beginning of 1.0 Introduction for details of all the slots except plist. The plist slot contains a standard Common Lisp property list.
All symbols are external in the net.uri package, unless otherwise noted. Brief descriptions are given in this document, with complete descriptions in the individual pages.
uri: the class of URI objects.
urn: the class of URN objects.

uri-p
Arguments: object
Returns true if object is an instance of class uri.

copy-uri
Arguments: uri &key place scheme host port path query fragment plist
Copies the specified URI object. See the description page for information on the keyword arguments.

uri-scheme, uri-host, uri-port, uri-path, uri-query, uri-fragment, uri-plist
Arguments: uri-object
These accessors return the value of the associated slots of the uri-object.

uri-authority
Arguments: uri-object
Returns the authority of uri-object. The authority combines the host and port.

render-uri
Arguments: uri stream
Print to stream the printed representation of uri.

parse-uri
Arguments: string &key (class 'uri)
Parse string into a URI object.

merge-uris
Arguments: uri base-uri &optional place
Return an absolute URI, based on uri, which can be relative, and base-uri, which must be absolute.

enough-uri
Arguments: uri base
Converts uri into a relative URI using base as the base URI.

uri-parsed-path
Arguments: uri
Return the parsed representation of the path.

uri
Arguments: object
Defined methods: if the argument is a uri object, return it; otherwise create a uri object if possible and return it, or signal an error if not possible.
3.0 Parsing, escape decoding/encoding and the path
The method uri-path returns the path portion of the URI, in string form. The method uri-parsed-path returns the path portion of the URI, in list form. This list form is discussed below, after a discussion of decoding/encoding.
RFC2396 lays out a method for including reserved characters in URIs: you escape the character. An escaped character is defined like this:
escaped = "%" hex hex
hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f"
In addition, the RFC defines excluded characters:
"<" | ">" | "#" | "%" | <"> | "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`"
The set of reserved characters is:
";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | ","
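with the following exceptions:
Within the authority component, the characters ";", ":", "@", "?", and "/" are reserved.
Within a path segment, the characters "/", ";", "=", and "?" are reserved.
Within a query component, the characters ";", "/", "?", ":", "@", "&", "=", "+", ",", and "$" are reserved.
From the RFC, there are two important rules about escaping and unescaping (encoding and decoding):
Decoding should only happen once the URI has been parsed into its component parts.
Encoding can only occur when a URI is made from its component parts (that is, when it is rendered for printing).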
The implication of this is that to decode the URI, it must be in a parsed state. That is, you can't convert %2f (the escaped form of "/") until the path has been parsed into its component parts. It is also important that the application viewing the component parts sees the decoded values of the components. For example, consider:
http://www.franz.com/calculator/3%2f2
This might be the implementation of a calculator, and how someone would execute 3/2. Clearly, the application that implements this would want to see path components of "calculator" and "3/2". "3%2f2" would not be useful to the calculator application.
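The ordering requirement can be illustrated with a minimal sketch in portable Common Lisp. The helpers decode-escapes and split-path are hypothetical names written for this illustration; they are not part of net.uri and do not reproduce its parser (for instance, malformed escapes are not handled).

```lisp
;; Hypothetical helpers for illustration only; not part of net.uri.

(defun decode-escapes (string)
  "Replace each %XX escape in STRING with the character it encodes."
  (with-output-to-string (out)
    (let ((i 0)
          (n (length string)))
      (loop while (< i n)
            do (let ((ch (char string i)))
                 (cond ((and (char= ch #\%) (<= (+ i 3) n))
                        ;; The two characters after % are hex digits.
                        (write-char
                         (code-char (parse-integer string
                                                   :start (+ i 1)
                                                   :end (+ i 3)
                                                   :radix 16))
                         out)
                        (incf i 3))
                       (t
                        (write-char ch out)
                        (incf i))))))))

(defun split-path (path)
  "Split PATH on slashes, dropping empty components."
  (let ((parts '())
        (start 0))
    (loop for pos = (position #\/ path :start start)
          do (let ((piece (subseq path start pos)))
               (when (plusp (length piece))
                 (push piece parts)))
             (if pos
                 (setf start (1+ pos))
                 (return)))
    (nreverse parts)))

;; Split first, then decode each component: the calculator sees "3/2".
(mapcar #'decode-escapes (split-path "/calculator/3%2f2"))

;; Decode first, then split: the escaped slash is mistaken for a separator.
(split-path (decode-escapes "/calculator/3%2f2"))
```

The first form yields the components "calculator" and "3/2". The second yields "calculator", "3" and "2", which is why decoding has to wait until the path has been separated into components.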
For the reasons given above, a parsed version of the path is available and has the following form:
([:absolute | :relative] component1 [component2...])
where components are:
element | (element param1 [param2 ...])
and element is a path element, and the params are path element parameters. For example, the result of
(uri-parsed-path (parse-uri "foo;10/bar:x;y;z/baz.htm"))
is
(:relative ("foo" "10") ("bar:x" "y" "z") "baz.htm")
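Because a component is either a string or a list whose first item is the element, code that consumes a parsed path has to handle both shapes. The following is a small sketch of how an application might do that; component-element, component-params and path-elements are hypothetical helpers, not net.uri functions, and the parsed path is written as a literal list so the sketch runs without the uri module.

```lisp
;; Hypothetical helpers for illustration only; not part of net.uri.

(defun component-element (component)
  "Return the path element of COMPONENT, with or without parameters."
  (if (consp component)
      (first component)
      component))

(defun component-params (component)
  "Return the list of parameters of COMPONENT, or NIL if it has none."
  (if (consp component)
      (rest component)
      nil))

(defun path-elements (parsed-path)
  "Return the elements of PARSED-PATH, skipping the :absolute/:relative marker."
  (mapcar #'component-element (rest parsed-path)))

;; The parsed path from the example above, written as a literal list.
(defparameter *example-path*
  '(:relative ("foo" "10") ("bar:x" "y" "z") "baz.htm"))

(path-elements *example-path*)              ; elements without parameters
(component-params (third *example-path*))   ; parameters of "bar:x"
```

Here path-elements returns ("foo" "bar:x" "baz.htm"), and the parameters of the second component are ("y" "z").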
There is a certain amount of canonicalization that occurs when parsing:
A path of (:absolute) or (:absolute "") is equivalent to a nil path. That is, http://a/ is parsed with a nil path and printed as http://a.
"foob%61r" is parsed into "foobar" and appears as "foobar" when the URI is printed.
4.0 Interning URIs
This section describes how to intern URIs. Interning is not mandatory. URIs can be used perfectly well without interning them.
Interned URIs in Allegro are like symbols. That is, a string representing a URI, when parsed and interned, will always yield an eq object. For example:
(eq (intern-uri "http://www.franz.com") (intern-uri "http://www.franz.com"))
is always true. (Note that two strings with identical contents may or may not be eq in Common Lisp.)
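The idea behind interning can be sketched in portable Common Lisp with a hash table that hands back the same object for the same name. This is only a conceptual illustration: toy-uri, *toy-space* and toy-intern are hypothetical names, the table is keyed on the raw string, and nothing here is taken from the net.uri implementation.

```lisp
;; Hypothetical names for illustration only; not the net.uri implementation.

(defstruct (toy-uri (:constructor make-toy-uri (text)))
  text)

(defvar *toy-space* (make-hash-table :test #'equal)
  "Maps a URI string to the single object that represents it.")

(defun toy-intern (string)
  "Return the unique toy-uri for STRING, creating it on first use."
  (or (gethash string *toy-space*)
      (setf (gethash string *toy-space*) (make-toy-uri string))))

;; Interning the same string twice yields the very same object.
(eq (toy-intern "http://www.franz.com")
    (toy-intern "http://www.franz.com"))

;; Two freshly made objects are never eq, even with identical contents.
(eq (make-toy-uri "http://www.franz.com")
    (make-toy-uri "http://www.franz.com"))
```

The first eq form returns true and the second returns false. Keying on the raw string is a simplification made for this sketch, so differently spelled strings for the same URI would yield distinct objects here.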
The functions associated with interning are:

make-uri-space
Arguments: &key size
Make a new hash-table object to contain interned URIs.

uri-space
Arguments:
Return the object into which URIs are currently being interned.

uri=
Arguments: uri1 uri2
Returns true if uri1 and uri2 are equivalent.

intern-uri
Arguments: uri-name &optional uri-space
Intern the specified uri object in the specified uri-space. Methods exist for strings and uri objects.

unintern-uri
Arguments: uri &optional uri-space
Unintern the specified uri object or, if uri is t, all uri objects (in uri-space if specified).

do-all-uris
Arguments: (var &optional uri-space result) &body body
Bind var to each currently defined uri in turn (in uri-space if specified) and evaluate body.
5.0 Allegro CL implementation notes
The following forms return true:
(uri= (parse-uri "http://www.franz.com/")
      (parse-uri "http://www.franz.com"))
(eq (intern-uri "http://www.franz.com/")
    (intern-uri "http://www.franz.com"))
(eq (intern-uri "http://www.franz.com:80/foo/bar.htm")
    (intern-uri "http://www.franz.com/foo/bar.htm"))
#u"..." is shorthand for (parse-uri "...") but if an existing #u dispatch macro definition exists, it will not be overridden.
Setting a slot with setf is reflected in the printed representation of the URI:
user(10): (setq u #u"http://foo.bar.com/foo/bar")
#<uri http://foo.bar.com/foo/bar>
user(11): (setf (net.uri:uri-host u) "foo.com")
"foo.com"
user(12): u
#<uri http://foo.com/foo/bar>
user(13):
This allows URI behavior to follow the principle of least surprise.
6.0 Examples
uri(10): (use-package :net.uri)
t
uri(11): (parse-uri "foo")
#<uri foo>
uri(12): #u"foo"
#<uri foo>
uri(13): (setq base (intern-uri "http://www.franz.com/foo/bar/"))
#<uri http://www.franz.com/foo/bar/>
uri(14): (merge-uris (parse-uri "foo.htm") base)
#<uri http://www.franz.com/foo/bar/foo.htm>
uri(15): (merge-uris (parse-uri "?foo") base)
#<uri http://www.franz.com/foo/bar/?foo>
uri(16): (setq base (intern-uri "http://www.franz.com/foo/bar/baz.htm"))
#<uri http://www.franz.com/foo/bar/baz.htm>
uri(17): (merge-uris (parse-uri "foo.htm") base)
#<uri http://www.franz.com/foo/bar/foo.htm>
uri(18): (merge-uris #u"?foo" base)
#<uri http://www.franz.com/foo/bar/?foo>
uri(19): (describe #u"http://www.franz.com")
#<uri http://www.franz.com> is an instance of #<standard-class net.uri:uri>:
 The following slots have :instance allocation:
  scheme        :http
  host          "www.franz.com"
  port          nil
  path          nil
  query         nil
  fragment      nil
  plist         nil
  escaped       nil
  string        "http://www.franz.com"
  parsed-path   nil
  hashcode      nil
uri(20): (describe #u"http://www.franz.com/")
#<uri http://www.franz.com> is an instance of #<standard-class net.uri:uri>:
 The following slots have :instance allocation:
  scheme        :http
  host          "www.franz.com"
  port          nil
  path          nil
  query         nil
  fragment      nil
  plist         nil
  escaped       nil
  string        "http://www.franz.com"
  parsed-path   nil
  hashcode      nil
uri(21): #u"foobar#baz%23xxx"
#<uri foobar#baz#xxx>
Copyright (c) 1998-2001, Franz Inc. Berkeley, CA., USA. All rights reserved. Created 2001.8.16.