+;;;; UCD accessor functions
+
+;;; The first (* 8 395) => 3160 entries in **CHARACTER-DATABASE**
+;;; form the miscellaneous information table, which describes the
+;;; distinct sets of character attributes: specifically, indexes into
+;;; the GC kinds, Bidi kinds, CCC kinds, the decimal digit property,
+;;; the digit property and the bidi-mirrored boolean property. (There
+;;; are two spare bytes for other information, should that become
+;;; necessary.)
+;;;
+;;; The next (ash #x110000 -8) => 4352 entries contain single-byte
+;;; indexes into a table of pages, each holding 256 four-byte
+;;; entries. These pages follow directly on, and are of the form
+;;; {attribute-index[11b],transformed-code-point[21b]}x256, where the
+;;; attribute index is an index into the miscellaneous information
+;;; table, and the transformed code point is the code point of the
+;;; simple mapping of the character to its lowercase or uppercase
+;;; equivalent, as appropriate and if any.
+;;;
+;;; I feel the opacity of the above suggests the need for a diagram:
+;;;
+;;;         C  _______________________________________
+;;;           /                                       \
+;;;          L                                         \
+;;;  [***************|=============================|--------...]
+;;;                 (a)      \                       _
+;;;                         A \______________________/| B
+;;;
+;;; To look up information about a character, take the high 13 bits of
+;;; its code point, and index the character database with that and a
+;;; base of 3160 (going past the miscellaneous information[*], so
+;;; treating (a) as the start of the array). This, labelled A, gives
+;;; us another index into the detailed pages[-], which we can use to
+;;; look up the details for the character in question: we add the low
+;;; 8 bits of the code point, shifted left by two bits (because we
+;;; have four-byte table entries), to 1024 times the `page' index,
+;;; with a base of (+ 3160 4352) => 7512 to skip over everything
+;;; else. This gets us to point B. If we're
+;;; after a transformed code point (i.e. an upcase or downcase
+;;; operation), we can simply read it off now, beginning with an
+;;; offset of 11 bits from point B in some endianness; if we're
+;;; looking for miscellaneous information, we take the 11-bit value at
+;;; B, and index the character database once more to get to the
+;;; relevant miscellaneous information.
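+;;;
+;;; A minimal sketch of the walk described above, for illustration
+;;; only. The SKETCH- names are hypothetical, the database is passed
+;;; in explicitly as a vector of octets, and the big-endian packing
+;;; of each four-byte entry is an assumption (the text above leaves
+;;; the endianness open). The base of the pages is taken to be the
+;;; sum of the two table sizes given above.
+(defun sketch-ucd-index (database code-point)
+  ;; DATABASE is a vector of octets; CODE-POINT is below #x110000.
+  (let* ((misc-size (* 8 395))               ; region [*], up to (a)
+         (page-table-size (ash #x110000 -8)) ; one octet per page index
+         ;; Point A: the page index for the high 13 bits.
+         (page (aref database (+ misc-size (ash code-point -8)))))
+    ;; Point B: skip both tables, then 1024 octets per page and
+    ;; four octets per entry within the page.
+    (+ misc-size
+       page-table-size
+       (ash page 10)
+       (ash (ldb (byte 8 0) code-point) 2))))
+
+(defun sketch-entry-word (database index)
+  ;; ASSUMPTION: entries are big-endian, most significant octet first.
+  (logior (ash (aref database index) 24)
+          (ash (aref database (+ index 1)) 16)
+          (ash (aref database (+ index 2)) 8)
+          (aref database (+ index 3))))
+
+(defun sketch-attribute-index (database code-point)
+  ;; The 11-bit value at B, under the assumed packing.
+  (ldb (byte 11 21)
+       (sketch-entry-word database
+                          (sketch-ucd-index database code-point))))
+
+(defun sketch-transformed-code-point (database code-point)
+  ;; The 21 bits that follow the attribute index.
+  (ldb (byte 21 0)
+       (sketch-entry-word database
+                          (sketch-ucd-index database code-point))))
+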
+;;;
+;;; As an optimization for the common case (pun intended) of looking up
+;;; case information for a character, the entries in C above are
+;;; sorted such that the characters which are UPPER-CASE-P in CL terms
+;;; have index values lower than all others, followed by those which
+;;; are LOWER-CASE-P in CL terms; this permits implementation of
+;;; character case tests without actually going to the trouble of
+;;; looking up the value associated with the index. (Actually, this
+;;; isn't just a speed optimization; the information about whether a
+;;; character is BOTH-CASE-P is used just in the ordering and not
+;;; explicitly recorded in the database.)
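+;;;
+;;; A minimal sketch of such an ordering-based test, for illustration
+;;; only. The function names and the two limit parameters are
+;;; hypothetical: the actual boundary indexes depend on how the
+;;; database was generated and are not stated here.
+(defun sketch-upper-case-index-p (attribute-index upper-limit)
+  ;; Upper-case entries sort first: they occupy [0, UPPER-LIMIT).
+  (< attribute-index upper-limit))
+
+(defun sketch-lower-case-index-p (attribute-index upper-limit lower-limit)
+  ;; Lower-case entries follow: [UPPER-LIMIT, LOWER-LIMIT).
+  (and (<= upper-limit attribute-index)
+       (< attribute-index lower-limit)))
+
+(defun sketch-both-case-index-p (attribute-index lower-limit)
+  ;; Every cased entry, upper or lower, lies below LOWER-LIMIT.
+  (< attribute-index lower-limit))
+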
+;;;
+;;; The moral of all this? Next time, don't just say "FIXME: document
+;;; this".