From ebf5927e07bc14a20410801882ce4e35ffa0c691 Mon Sep 17 00:00:00 2001
From: "Kevin M. Rosenberg"
Date: Sat, 19 Jul 2003 01:57:32 +0000
Subject: [PATCH] r5331: *** empty log message ***

---
 README | 21 +--
 debian/changelog | 6 +
 debian/rules | 2 +-
 uri.html | 408 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 427 insertions(+), 10 deletions(-)
 create mode 100644 uri.html

diff --git a/README b/README
index 2fa22b5..f81655c 100644
--- a/README
+++ b/README
@@ -1,19 +1,22 @@
 PURI - Portable URI Library
-Kevin Rosenberg Franz, Inc
-
+Kevin Rosenberg

 This is portable Universal Resource Identifier library for Common Lisp
 programs. It parses URI according to the RFC 2396 specification. It's
-is based on Franz, Inc's opensource URI package and hash been
-ported to work other CL implementations. It is licensed with the
-LLGPL as include in this distribution.
+is based on Franz, Inc's opensource URI package and has been ported to
+work with other CL implementations. It is licensed under the LLGPL which
+is included in this distribution.
+
+A regression suite is included which uses Franz's open-source tester
+library. I've ported that library for use on other CL
+implementations. Puri completes 126/126 regression tests successfully.

-A regression package is include which uses Franz's open-source
-tester library. I've also ported that library for use on other
-CL implementations. Puri completes 126/126 regression tests
-successfully.
+Franz's unmodified documentation file is included in the file
+uri.html. The only divergence in usage between Puri and Franz's
+package is that Puri's symbols are located in the package PURI while
+Franz's original uses the package NET.URI.

 Puri home: http://files.b9.com/puri/
 Portable tester home: http://files.b9.com/tester/
diff --git a/debian/changelog b/debian/changelog
index 3527958..f01cdb2 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+cl-puri (1.2.3-1) unstable; urgency=low
+
+  * Include uri.html documentation file.
+
+ -- Kevin M. Rosenberg Fri, 18 Jul 2003 19:57:03 -0600
+
 cl-puri (1.2.2-1) unstable; urgency=low

   * Improve tests.lisp
diff --git a/debian/rules b/debian/rules
index 58f06ed..165756b 100755
--- a/debian/rules
+++ b/debian/rules
@@ -51,7 +51,7 @@ binary-indep: build install
 binary-arch: build install
 	dh_testdir
 	dh_testroot
-	dh_installdocs README
+	dh_installdocs README uri.html
 	dh_installchangelogs
 	dh_strip
 	dh_compress
diff --git a/uri.html b/uri.html
new file mode 100644
index 0000000..8de336b
--- /dev/null
+++ b/uri.html
@@ -0,0 +1,408 @@
+ + + +URI support in Allegro CL + + + + +

URI support in Allegro CL

+ +

This document contains the following sections:

+ + +

1.0 Introduction
+2.0 The URI API definition
+3.0 Parsing, escape decoding/encoding and the path
+4.0 Interning URIs
+5.0 Allegro CL implementation notes
+6.0 Examples
+

+ +

This version of the Allegro CL URI support documentation is for distribution with the +Open Source version of the URI code. Links to Allegro CL documentation other than +URI-specific files have been suppressed. For the rest of the Allegro CL documentation, see http://www.franz.com/support/documentation/, +which is the Allegro CL documentation page of the Franz Inc. website. Links to Allegro CL +documentation can be found on that page.

+ +
+ +
+ +

1.0 Introduction

+ +

URI stands for Uniform Resource Identifier. For a description of +URIs, see RFC2396, which can be found in several places, including the IETF web site (http://www.ietf.org/rfc/rfc2396.txt) and +the UCI/ICS web site (http://www.ics.uci.edu/pub/ietf/uri/rfc2396.txt). +We prefer the UCI/ICS one as it has more examples.

+ +

URIs are a superset, in both functionality and syntax, of URLs (Uniform Resource Locators) +and URNs (Uniform Resource Names). That is, RFC2396 updates and merges RFC1738 and +RFC1808 into a single syntax, called the URI. It does, however, exclude the portions of RFC1738 +that define the specific syntax of individual URL schemes.

+ +

In URL slang, the scheme is usually called the `protocol', but it is called +scheme in RFC1738. A URL `host' corresponds to the URI `authority.' The URL slang +`bookmark' or `anchor' is `fragment' in URI lingo.

+ +

The URI facility was available as a patch to Allegro CL 5.0.1 and is included with +release 6.0. The URI facility might not be loaded in a given Allegro CL image, so evaluate (require +:uri) to ensure it is loaded (that form returns nil if the +URI module is already loaded).

+ +

Broadly, the URI facility creates a Lisp object that represents a URI, and provides +setters and accessors to fields in the URI object. The URI object can also be interned, +much like symbols in CL are. This document describes the facility and the related +operators.

+ +

Aside from the obvious slots which are called out in the RFC, URIs also have a property +list. Together with interning, this is another similarity between URIs and CL symbols.

+ +
+ +
+ +

2.0 The URI API definition

+ +

Symbols naming objects (functions, variables, etc.) in the uri module are +exported from the net.uri package.

+ +

URIs are represented by CLOS objects. Their slots are:

+ +
+scheme 
+host 
+port 
+path 
+query
+fragment 
+plist 
+
+ +

The host and port slots together correspond to the authority +(see RFC2396). There is an accessor-like function, uri-authority, +that can be used to extract the authority from a URI. See the RFC2396 specifications +pointed to at the beginning of the 1.0 Introduction for details +of all the slots except plist. The plist slot contains a +standard Common Lisp property list.
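As a rough illustration of how the host and port slots combine into an authority, the sketch below builds the host[:port] string. authority-string is a hypothetical helper, not a net.uri or PURI function, and it ignores the optional userinfo@ prefix that RFC2396 also permits in an authority.

```lisp
;;; authority-string is a hypothetical helper for illustration only; it is
;;; not part of net.uri or PURI.  It builds the "host[:port]" form of an
;;; RFC2396 server-based authority and ignores the optional "userinfo@" part.
(defun authority-string (host &optional port)
  "Return HOST, followed by \":PORT\" when PORT is non-NIL."
  ;; ~@[ ... ~] processes its clause only when PORT is non-NIL.
  (format nil "~A~@[:~D~]" host port))
```

For example, (authority-string "www.franz.com" 8080) returns "www.franz.com:8080", while (authority-string "www.franz.com") returns just the host.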

+ +

All symbols are external in the net.uri package, unless otherwise noted. +Brief descriptions are given in this document, with complete descriptions in the +individual pages. + +

    +
  • uri: the class of URI objects.
  • +
  • urn: the class of URN objects.
  • +
  • uri-p

    Arguments: object

    +

    Returns true if object is an instance of class uri. +

    +
  • +
  • copy-uri

    Arguments: uri &key + place scheme host port path query fragment plist

    +

    Copies the specified URI object. See the description page for information on the + keyword arguments.

    +
  • +
  • uri-scheme
    + uri-host
    + uri-port
    + uri-path
    + uri-query
    + uri-fragment
    + uri-plist
    +

    Arguments: uri-object

    +

    These accessors return the values of the associated slots of uri-object.

    +
  • +
  • uri-authority

    Arguments: uri-object +

    +

    Returns the authority of uri-object. The authority combines the host and port.

    +
  • +
  • render-uri

    Arguments: uri + stream

    +

    Print to stream the printed representation of uri.

    +
  • +
  • parse-uri

    Arguments: string &key + (class 'uri)

    +

    Parse string into a URI object.

    +
  • +
  • merge-uris

    Arguments: uri + base-uri &optional place

    +

    Return an absolute URI, based on uri, which can be relative, and base-uri, + which must be absolute.

    +
  • +
  • enough-uri

    Arguments: uri + base

    +

    Converts uri into a relative URI using base as the base URI.

    +
  • +
  • uri-parsed-path

    Arguments: uri +

    +

    Return the parsed representation of the path.

    +
  • +
  • uri

    Arguments: object

    +

    Defined methods: if the argument is a uri object, return it; otherwise, create and return a uri object if + possible, or signal an error if not possible.

    +
  • +
+ +
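The path-merging step behind merge-uris, as described in section 5.2 of RFC2396, can be sketched as follows. merge-paths is a hypothetical name; unlike a complete implementation, it works on path strings rather than uri objects and does not remove "." or ".." segments.

```lisp
;;; Hypothetical sketch of RFC2396 section 5.2, steps 6a-6b: keep the base
;;; path up to and including its last "/", then append the relative path.
;;; Dot segments ("." and "..") are NOT removed here.
(defun merge-paths (relative base-path)
  "Merge the relative path string RELATIVE against BASE-PATH."
  (let ((slash (position #\/ base-path :from-end t)))
    (concatenate 'string
                 ;; Everything after the right-most slash is dropped.
                 (if slash (subseq base-path 0 (1+ slash)) "")
                 relative)))
```

With a base path of "/foo/bar/baz.htm" or "/foo/bar/", merging "foo.htm" gives "/foo/bar/foo.htm" in both cases, the same paths as the merge-uris results in the examples section.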
+ +
+ +

3.0 Parsing, escape decoding/encoding and the path

+ +

The method uri-path returns the path +portion of the URI, in string form. The method uri-parsed-path +returns the path portion of the URI, in list form. This list form is discussed below, +after a discussion of decoding/encoding.

+ +

RFC2396 lays out a method for inserting reserved characters into URIs. You do +this by escaping the character. An escaped character is defined like this:

+ +
+escaped = "%" hex hex 
+
+hex = digit | "A" | "B" | "C" | "D" | "E" | "F" | "a" | "b" | "c" | "d" | "e" | "f" 
+
+ +

In addition, the RFC defines excluded characters:

+ +
+"<" | ">" | "#" | "%" | <"> | "{" | "}" | "|" | "\" | "^" | "[" | "]" | "`" 
+
+ +

The set of reserved characters is:

+ +
+";" | "/" | "?" | ":" | "@" | "&" | "=" | "+" | "$" | "," 
+
+ +

with the following exceptions: + +

    +
  • within the authority component, the characters ";", ":", + "@", "?", and "/" are reserved.
  • +
  • within a path segment, the characters "/", ";", "=", and + "?" are reserved.
  • +
  • within a query component, the characters ";", "/", "?", + ":", "@", "&", "=", "+", + ",", and "$" are reserved.
  • +
+ +
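Following the reserved and excluded sets above, escaping a single path segment before printing might look like the sketch below. encode-path-segment and its helpers are hypothetical names, not the net.uri implementation, and the sketch assumes every input character has a code below 128; real code must first convert non-ASCII text to octets.

```lisp
;;; Hypothetical sketch, not net.uri's encoder.  Assumes ASCII input.
(defparameter *segment-reserved* "/;=?"
  "Characters reserved within a path segment (RFC2396).")

(defparameter *excluded* "<>#%\"{}|\\^[]`"
  "RFC2396 delims and unwise characters, which must always be escaped.")

(defun escape-char-p (char)
  "True if CHAR must be written as %XX inside a path segment."
  (or (find char *segment-reserved*)
      (find char *excluded*)
      (<= (char-code char) 32)     ; space and control characters
      (>= (char-code char) 127)))  ; DEL (and, outside our assumption, non-ASCII)

(defun encode-path-segment (segment)
  "Return SEGMENT with every character that needs escaping written as %XX."
  (with-output-to-string (out)
    (loop for char across segment
          do (if (escape-char-p char)
                 (format out "%~2,'0X" (char-code char))
                 (write-char char out)))))
```

Encoding "3/2" as a single segment yields "3%2F2", so the slash is not mistaken for a segment separator. RFC2396 treats hex digits in either case as equivalent, which is why "%2f" elsewhere in this document denotes the same character.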

From the RFC, there are two important rules about escaping and unescaping (encoding and +decoding): + +

    +
  • decoding should only happen when the URI is parsed into component parts;
  • +
  • encoding can only occur when a URI is made from component parts (i.e., rendered for + printing).
  • +
+ +

The implication of this is that to decode the URI, it must be in a parsed state. That +is, you can't convert %2f (the escaped form of +"/") until the path has been parsed into its component parts. Another important +requirement is that an application viewing the component parts sees the decoded values of the +components. For example, consider:

+ +
+http://www.franz.com/calculator/3%2f2 
+
+ +

This might be a URI handled by a calculator application, requesting the computation 3/2. +Clearly, the application that implements this would want to see path components of +"calculator" and "3/2". "3%2f2" would not be useful to the +calculator application.
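The ordering rule can be made concrete by decoding the calculator path both ways. percent-decode, hex-escape-at-p, and split-on-slash are hypothetical helpers written for illustration; percent-decode maps each %XX to the character with that code and does not reassemble multi-byte (for example UTF-8) sequences.

```lisp
;;; Hypothetical helpers, not PURI's parser.
(defun hex-escape-at-p (string i)
  "True if STRING has a well-formed %XX escape starting at index I."
  (and (< (+ i 2) (length string))
       (char= (char string i) #\%)
       (digit-char-p (char string (+ i 1)) 16)
       (digit-char-p (char string (+ i 2)) 16)))

(defun percent-decode (string)
  "Replace each %XX escape in STRING by the character with that code.
Malformed escapes are copied through unchanged."
  (with-output-to-string (out)
    (let ((i 0))
      (loop while (< i (length string))
            do (cond ((hex-escape-at-p string i)
                      (write-char (code-char (parse-integer string
                                                            :start (1+ i)
                                                            :end (+ i 3)
                                                            :radix 16))
                                  out)
                      (incf i 3))
                     (t
                      (write-char (char string i) out)
                      (incf i)))))))

(defun split-on-slash (string)
  "Split STRING at every slash, without decoding anything."
  (loop with start = 0
        for pos = (position #\/ string :start start)
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))
```

Splitting first and then decoding each piece, (mapcar #'percent-decode (split-on-slash "calculator/3%2f2")), gives ("calculator" "3/2"), which is what the application wants. Decoding first gives "calculator/3/2", which then splits into three segments, ("calculator" "3" "2").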

+ +

For the reasons given above, a parsed version of the path is available and has the +following form:

+ +
+([:absolute | :relative] component1 [component2...]) 
+
+ +

where components are:

+ +
+element | (element param1 [param2 ...]) 
+
+ +

and element is a path element, and the params are path element parameters. +For example, the result of

+ +
+(uri-parsed-path (parse-uri "foo;10/bar:x;y;z/baz.htm")) 
+
+ +

is

+ +
+(:relative ("foo" "10") ("bar:x" "y" "z") "baz.htm") 
+
+ +

There is a certain amount of canonicalization that occurs when parsing: + +

    +
  • A path of (:absolute) or (:absolute "") is + equivalent to a nil path. That is, http://a/ is parsed with a nil + path and printed as http://a.
  • +
  • Escaped characters that are not reserved are not escaped upon printing. For example, "foob%61r" + is parsed into "foobar" and appears as "foobar" + when the URI is printed.
  • +
+ +
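A minimal sketch of producing the parsed path form, including the first canonicalization rule, is shown below. parse-path and its helpers are hypothetical names; the sketch takes only the path portion of a URI and leaves out escape decoding of each element (the second rule).

```lisp
;;; Hypothetical sketch of the parsed-path list form; no escape decoding.
(defun split-string (string char)
  "Split STRING at every occurrence of CHAR."
  (loop with start = 0
        for pos = (position char string :start start)
        collect (subseq string start pos)
        while pos
        do (setf start (1+ pos))))

(defun parse-segment (segment)
  "\"foo\" => \"foo\"; \"foo;10\" => (\"foo\" \"10\")."
  (let ((parts (split-string segment #\;)))
    (if (rest parts) parts (first parts))))

(defun parse-path (path)
  "Return the list form of PATH, or NIL for an empty or root path."
  (when (plusp (length path))
    (let* ((segments (split-string path #\/))
           ;; A leading "/" produces an empty first segment.
           (absolutep (string= (first segments) ""))
           (parsed (cons (if absolutep :absolute :relative)
                         (mapcar #'parse-segment
                                 (if absolutep (rest segments) segments)))))
      ;; Canonicalization: (:absolute) and (:absolute "") mean "no path".
      (if (member parsed '((:absolute) (:absolute "")) :test #'equal)
          nil
          parsed))))
```

(parse-path "foo;10/bar:x;y;z/baz.htm") returns (:relative ("foo" "10") ("bar:x" "y" "z") "baz.htm"), matching the uri-parsed-path result shown above, and (parse-path "/") returns nil, like the path of http://a/.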
+ +
+ +

4.0 Interning URIs

+ +

This section describes how to intern URIs. Interning is not mandatory. URIs can be used +perfectly well without interning them.

+ +

Interned URIs in Allegro are like symbols. That is, a string representing a URI, when +parsed and interned, will always yield an eq object. For example:

+ +
+(eq (intern-uri "http://www.franz.com") 
+    (intern-uri "http://www.franz.com")) 
+
+ +

is always true. (Note that two strings with identical contents may or may not be eq +in Common Lisp.)

+ +

The functions associated with interning are: + +

    +
  • make-uri-space

    Arguments: &key + size

    +

    Make a new hash-table object to contain interned URIs.

    +
  • +
  • uri-space

    Arguments:

    +

    Return the object into which URIs are currently being interned.

    +
  • +
  • uri=

    Arguments: uri1 uri2

    +

    Returns true if uri1 and uri2 are equivalent.

    +
  • +
  • intern-uri

    Arguments: uri-name + &optional uri-space

    +

    Intern the specified uri object in uri-space, if that argument is given. Methods exist for strings + and uri objects.

    +
  • +
  • unintern-uri

    Arguments: uri + &optional uri-space

    +

    Unintern the specified uri object or, if uri is t, all uri objects (in uri-space, + if specified).

    +
  • +
  • do-all-uris

    Arguments: (var &optional + uri-space result) &body body

    +

    Bind var to each currently interned URI in turn (in uri-space, if specified) and + evaluate body.

    +
  • +
+ +
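The interning functions above can be approximated with an equal hash table keyed by the URI string, as in this sketch. The toy- names are hypothetical stand-ins for make-uri-space, intern-uri, unintern-uri, and do-all-uris; unlike the real functions, the sketch does not parse or normalize its input, so "http://www.franz.com/" and "http://www.franz.com" would intern as different objects here.

```lisp
;;; Hypothetical toy model of a URI space; the toy- names are not net.uri API.
(defstruct toy-uri text)

(defun make-toy-uri-space (&key (size 100))
  "A URI space is just an EQUAL hash table from strings to toy-uri objects."
  (make-hash-table :test #'equal :size size))

(defvar *toy-uri-space* (make-toy-uri-space))

(defun toy-intern-uri (string &optional (space *toy-uri-space*))
  "Return the unique toy-uri for STRING in SPACE, creating it if needed."
  (or (gethash string space)
      (setf (gethash string space) (make-toy-uri :text string))))

(defun toy-unintern-uri (uri &optional (space *toy-uri-space*))
  "Remove URI from SPACE, or empty SPACE entirely when URI is T."
  (if (eq uri t)
      (clrhash space)
      (remhash (toy-uri-text uri) space)))

(defmacro do-toy-uris ((var &optional (space '*toy-uri-space*) result) &body body)
  "Evaluate BODY with VAR bound to each URI interned in SPACE, then return RESULT."
  `(progn
     (loop for ,var being the hash-values of ,space
           do (progn ,@body))
     ,result))
```

Because the table returns the stored object on every later lookup, two calls to toy-intern-uri with the same string yield eq results, which is the symbol-like property described above.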
+ +
+ +

5.0 Allegro CL implementation notes

+ +
    +
  1. The following are true:
    + (uri= (parse-uri "http://www.franz.com/")
    +     (parse-uri "http://www.franz.com"))
    + (eq (intern-uri "http://www.franz.com/")
    +    (intern-uri "http://www.franz.com"))
    +
    +
  2. The following is true:
    + (eq (intern-uri "http://www.franz.com:80/foo/bar.htm")
    +     (intern-uri "http://www.franz.com/foo/bar.htm"))
    + (I.e., specifying the default port is the same as specifying no port at all. This is + specified in RFC2396.)
    +
  3. The scheme and authority are case-insensitive. In Allegro CL, the + scheme is a keyword that appears in the normal case for the Lisp in which you are + executing.
    +
  4. #u"..." is shorthand for (parse-uri "..."), + but if a #u dispatch macro definition already exists, it will not be + overridden.
    +
  5. Setting the scheme, host, port, path, query, or fragment slots + of URI objects that have been interned will have very bad and unpredictable + results.
    +
  6. The printable representation of URIs is cached, for efficiency. This caching is undone + when the above slots are changed. That is, when you create a URI the printed + representation is cached. When you change one of the above mentioned slots, the printed + representation is cleared and calculated when the URI is next printed. For example:
    +
+ +
+user(10): (setq u #u"http://foo.bar.com/foo/bar") 
+#<uri http://foo.bar.com/foo/bar> 
+user(11): (setf (net.uri:uri-host u) "foo.com") 
+"foo.com" 
+user(12): u 
+#<uri http://foo.com/foo/bar> 
+user(13): 
+
+ +

This allows URI behavior to follow the principle of least surprise.
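The caching behavior described above can be sketched with CLOS :after methods on the slot writers, so that any setf of a component clears the cached string. cached-uri, cu-render, and the other cu- names are hypothetical; this shows one plausible design, not how net.uri or PURI actually implements its cache.

```lisp
;;; Hypothetical sketch of a lazily computed, invalidated printed form.
(defclass cached-uri ()
  ((scheme :initarg :scheme :accessor cu-scheme)
   (host :initarg :host :accessor cu-host)
   (path :initarg :path :initform "" :accessor cu-path)
   (cached-string :initform nil :accessor cu-cached-string))
  (:documentation "Toy URI whose printed form is computed lazily and cached."))

(defmacro define-cache-invalidator (accessor)
  "After any SETF of ACCESSOR, clear the cached printed representation."
  `(defmethod (setf ,accessor) :after (new-value (uri cached-uri))
     (declare (ignore new-value))
     (setf (cu-cached-string uri) nil)))

(define-cache-invalidator cu-scheme)
(define-cache-invalidator cu-host)
(define-cache-invalidator cu-path)

(defun cu-render (uri)
  "Return the printed form of URI, computing and caching it if needed."
  (or (cu-cached-string uri)
      (setf (cu-cached-string uri)
            ;; ~( ~) downcases the keyword scheme, e.g. :HTTP -> "http".
            (format nil "~(~A~)://~A~A"
                    (cu-scheme uri) (cu-host uri) (cu-path uri)))))
```

Mirroring the REPL session above, a cached-uri for http://foo.bar.com/foo/bar renders the same string object on repeated calls; after (setf (cu-host u) "foo.com") the cache is empty and the next render produces http://foo.com/foo/bar. Putting the invalidation in :after methods keeps the ordinary slot writers untouched.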

+ +
+ +
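The equivalences in the implementation notes above (a trailing "/" versus no path, the default port versus no port, and case-insensitive scheme and host) amount to comparing a normalized form of each URI. The sketch below computes such a key from components. uri-key and the *default-ports* table are hypothetical assumptions, and the sketch ignores the query and fragment.

```lisp
;;; Hypothetical normalization key; not how uri= or intern-uri are implemented.
(defparameter *default-ports* '(("http" . 80) ("https" . 443) ("ftp" . 21))
  "Assumed scheme -> default port table, for illustration only.")

(defun uri-key (scheme host port path)
  "Return a list that is EQUAL for URIs the notes above treat as the same."
  (let* ((scheme (string-downcase scheme))   ; accepts :http or "HTTP"
         (default-port (cdr (assoc scheme *default-ports* :test #'string=))))
    (list scheme
          (and host (string-downcase host))
          ;; An explicit default port is the same as no port.
          (if (eql port default-port) nil port)
          ;; NIL, "" and "/" all mean "no path".
          (if (member path '(nil "" "/") :test #'equal) nil path))))
```

Under this key, http://WWW.Franz.com:80/ and HTTP://www.franz.com compare equal, while an explicit non-default port such as 8080 keeps two URIs distinct.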
+ +

6.0 Examples

+ +
+uri(10): (use-package :net.uri)
+t
+uri(11): (parse-uri "foo")
+#<uri foo>
+uri(12): #u"foo"
+#<uri foo>
+uri(13): (setq base (intern-uri "http://www.franz.com/foo/bar/"))
+#<uri http://www.franz.com/foo/bar/>
+uri(14): (merge-uris (parse-uri "foo.htm") base)
+#<uri http://www.franz.com/foo/bar/foo.htm>
+uri(15): (merge-uris (parse-uri "?foo") base)
+#<uri http://www.franz.com/foo/bar/?foo>
+uri(16): (setq base (intern-uri "http://www.franz.com/foo/bar/baz.htm"))
+#<uri http://www.franz.com/foo/bar/baz.htm>
+uri(17): (merge-uris (parse-uri "foo.htm") base)
+#<uri http://www.franz.com/foo/bar/foo.htm>
+uri(18): (merge-uris #u"?foo" base)
+#<uri http://www.franz.com/foo/bar/?foo>
+uri(19): (describe #u"http://www.franz.com")
+#<uri http://www.franz.com> is an instance of #<standard-class net.uri:uri>:
+ The following slots have :instance allocation:
+  scheme        :http
+  host          "www.franz.com"
+  port          nil
+  path          nil
+  query         nil
+  fragment      nil
+  plist         nil
+  escaped       nil
+  string        "http://www.franz.com"
+  parsed-path   nil
+  hashcode      nil
+uri(20): (describe #u"http://www.franz.com/")
+#<uri http://www.franz.com> is an instance of #<standard-class net.uri:uri>:
+ The following slots have :instance allocation:
+  scheme        :http
+  host          "www.franz.com"
+  port          nil
+  path          nil
+  query         nil
+  fragment      nil
+  plist         nil
+  escaped       nil
+  string        "http://www.franz.com"
+  parsed-path   nil
+  hashcode      nil
+uri(21): #u"foobar#baz%23xxx"
+#<uri foobar#baz#xxx>
+
+ +

Copyright (c) 1998-2001, Franz Inc. Berkeley, CA., USA. All rights reserved. +Created 2001.8.16.

+ + -- 1.7.10.4