repo.macrolet.net Git - sbcl.git/commit

author	Christophe Rhodes <csr21@cantab.net>
	Tue, 4 Jun 2013 12:00:50 +0000 (13:00 +0100)
committer	Christophe Rhodes <csr21@cantab.net>
	Tue, 4 Jun 2013 12:00:50 +0000 (13:00 +0100)
commit	f44f6d1adbaaa7057f1948369299c0b2a08bcd6e
tree	8268bfa25235097ea1fdd3c01bc769815bdf3f06
parent	0d51ca7e5e624dc3bad5c87e14211e8e6f7b3a45

fix CL case conversions of characters involving iota subscript

Oh boy.  Judging by the length of the web page explaining the issue
(at <http://www.tlg.uci.edu/~opoudjis/unicode/unicode_adscript.html>)
this is a bit of a minefield.  I hope that this doesn't contribute
further to the trouble...

Although the combined _WITH_PROSGEGRAMMENI characters have general
category "Lt" (i.e. titlecase), for CL purposes we treat them as the
uppercase equivalents of the lowercase _WITH_YPOGEGRAMMENI characters
(as directly specified by the case mapping data in UnicodeData.txt).
This is a little awkward, and involves some rearrangement of the
indices of the misc table entries to keep the (CL) uppercase/lowercase
tests efficient.  Still, it seems to be the best option given that we
must comply with CL's character-to-character case mappings: the
alternative of not providing an uppercase version of
LOWERCASE_OMEGA_WITH_YPOGEGRAMMENI seems even weirder.
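
As a quick cross-check of the categories mentioned above, here is a
minimal sketch using Python's standard unicodedata module (not SBCL's
own tables).  The helper name describe is made up for illustration.

```python
import unicodedata

def describe(cp):
    """Return (character name, general category) for a code point."""
    ch = chr(cp)
    return unicodedata.name(ch), unicodedata.category(ch)

# U+1FF3 is the lowercase omega with iota subscript; U+1FFC is the
# combined "capital" form, which Unicode classifies as titlecase (Lt).
for cp in (0x1FF3, 0x1FFC):
    name, category = describe(cp)
    print(f"U+{cp:04X} {category} {name}")
```

This only shows what Unicode says about the two characters; how SBCL
maps them onto CL's upper-case-p and lower-case-p is decided by the
code in this commit.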

The way this is done in ucd.lisp is a little bit kludgy, because we
have to avoid giving the same exception to the Serbian titlecase
digraphs (Dz and friends), which mustn't map to anything, or else
we'd break invertibility.  (The lowercase dz and uppercase DZ are
already (CL) case mappings of each other.)  Probably the thing which
will confuse future readers is that some (Unicode) titlecase
characters are (CL) upper-case-p.
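
The invertibility concern can be sketched in Python as follows.  This
is not the logic in ucd.lisp: the table is a hand-transcribed excerpt
of the simple case mappings in UnicodeData.txt, and the rule in
cl_upper_case_p (a titlecase character counts only if its lowercase
maps straight back to it) is an assumed formulation that happens to
separate the two cases described above.

```python
# (general category, simple uppercase, simple lowercase) per code point.
# Hand-transcribed excerpt for illustration; not SBCL's misc table.
UCD = {
    0x1FF3: ("Ll", 0x1FFC, None),    # omega with ypogegrammeni
    0x1FFC: ("Lt", None, 0x1FF3),    # omega with prosgegrammeni
    0x01C4: ("Lu", None, 0x01C6),    # DZ with caron
    0x01C5: ("Lt", 0x01C4, 0x01C6),  # Dz with caron (titlecase digraph)
    0x01C6: ("Ll", 0x01C4, None),    # dz with caron
}

def cl_upper_case_p(cp):
    """Hypothetical rule: Lu with a lowercase, or Lt that round-trips."""
    category, _upper, lower = UCD[cp]
    if lower is None:
        return False
    if category == "Lu":
        return True
    if category == "Lt":
        # Accept only if the lowercase's uppercase is this same character.
        return UCD[lower][1] == cp
    return False

def cl_case_pairs():
    """Sorted (upper, lower) pairs under the hypothetical rule."""
    return sorted((cp, UCD[cp][2]) for cp in UCD if cl_upper_case_p(cp))

pairs = cl_case_pairs()
lowers = [lower for _, lower in pairs]
# Invertible: no lowercase character is claimed by two uppercase ones.
assert len(lowers) == len(set(lowers))
```

Under this rule U+1FFC pairs with U+1FF3, while U+01C5 (Dz) is left
out because U+01C6 (dz) already uppercases to U+01C4 (DZ); admitting
Dz as well would give dz two uppercase partners.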

src/code/target-char.lisp
tests/character.pure.lisp
tools-for-build/ucd.lisp