doc/cmucl/internals/object.tex

   1 \chapter{Object Format}
   2
   3 \f
   4 \section{Tagging}
   5
   6 The following is a key of the three bit low-tagging scheme:
   7 \begin{description}
   8    \item[000] even fixnum
   9    \item[001] function pointer
  10    \item[010] even other-immediate (header-words, characters, symbol-value trap value, etc.)
  11    \item[011] list pointer
  12    \item[100] odd fixnum
  13    \item[101] structure pointer
  14    \item[110] odd other immediate
  15   \item[111] other-pointer to data-blocks (other than conses, structures,
  16                                      and functions)
  17 \end{description}
  18
  19 This tagging scheme forces a dual-word alignment of data-blocks on the heap,
  20 but this can be pretty negligible:
  21 \begin{itemize}
  22 \item   RATIOS and COMPLEX must have a header-word anyway since they are not a
  23       major type.  This wastes one word for these infrequent data-blocks since
  24       they require two words for the data.
  25
  26 \item BIGNUMS must have a header-word and probably contain only one other word
  27       anyway, so we probably don't waste any words here.  Most bignums just
  28       barely overflow fixnums, that is by a bit or two.
  29
  30 \item   Single and double FLOATS?
  31       no waste, or
  32       one word wasted
  33
  34 \item   SYMBOLS have a pad slot (current called the setf function, but unused.)
  35 \end{itemize}
  36 Everything else is vector-like including code, so these probably take up
  37 so many words that one extra one doesn't matter.
  38
  39
  40 \f
  41 \section{GC Comments}
  42
  43 Data-Blocks comprise only descriptors, or they contain immediate data and raw
  44 bits interpreted by the system.  GC must skip the latter when scanning the
  45 heap, so it does not look at a word of raw bits and interpret it as a pointer
  46 descriptor.  These data-blocks require headers for GC as well as for operations
  47 that need to know how to interpret the raw bits.  When GC is scanning, and it
  48 sees a header-word, then it can determine how to skip that data-block if
  49 necessary.  Header-Words are tagged as other-immediates.  See the sections
  50 "Other-Immediates" and "Data-Blocks and Header-Words" for comments on
  51 distinguishing header-words from other-immediate data.  This distinction is
  52 necessary since we scan through data-blocks containing only descriptors just as
  53 we scan through the heap looking for header-words introducing data-blocks.
  54
  55 Data-Blocks containing only descriptors do not require header-words for GC
  56 since the entire data-block can be scanned by GC a word at a time, taking
  57 whatever action is necessary or appropriate for the data in that slot.  For
  58 example, a cons is referenced by a descriptor with a specific tag, and the
  59 system always knows the size of this data-block.  When GC encounters a pointer
  60 to a cons, it can transport it into the new space, and when scanning, it can
  61 simply scan the two words manifesting the cons interpreting each word as a
  62 descriptor.  Actually there is no cons tag, but a list tag, so we make sure the
  63 cons is not nil when appropriate.  A header may still be desired if the pointer
  64 to the data-block does not contain enough information to adequately maintain
  65 the data-block.  An example of this is a simple-vector containing only
  66 descriptor slots, and we attach a header-word because the descriptor pointing
  67 to the vector lacks necessary information -- the type of the vector's elements,
  68 its length, etc.
  69
  70 There is no need for a major tag for GC forwarding pointers.  Since the tag
  71 bits are in the low end of the word, a range check on the start and end of old
  72 space tells you if you need to move the thing.  This is all GC overhead.
  73
  74
  75 \f
  76 \section{Structures}
  77
  78 A structure descriptor has the structure lowtag type code, making
  79 {\tt structurep} a fast operation.  A structure
  80 data-block has the following format:
  81 \begin{verbatim}
  82     -------------------------------------------------------
  83     |   length (24 bits) | Structure header type (8 bits) |
  84     -------------------------------------------------------
  85     |   structure type name (a symbol)                    |
  86     -------------------------------------------------------
  87     |   structure slot 0                                  |
  88     -------------------------------------------------------
  89     |   ... structure slot length - 2                     |
  90     -------------------------------------------------------
  91 \end{verbatim}
  92
  93 The header word contains the structure length, which is the number of words
  94 (other than the header word.)  The length is always at least one, since the
  95 first word of the structure data is the structure type name.
  96
  97 \f
  98 \section{Fixnums}
  99
 100 A fixnum has one of the following formats in 32 bits:
 101 \begin{verbatim}
 102     -------------------------------------------------------
 103     |        30 bit 2's complement even integer   | 0 0 0 |
 104     -------------------------------------------------------
 105 \end{verbatim}
 106 or
 107 \begin{verbatim}
 108     -------------------------------------------------------
 109     |        30 bit 2's complement odd integer    | 1 0 0 |
 110     -------------------------------------------------------
 111 \end{verbatim}
 112
 113 Effectively, there is one tag for immediate integers, two zeros.  This buys one
 114 more bit for fixnums, and now when these numbers index into simple-vectors or
 115 offset into memory, they point to word boundaries on 32-bit, byte-addressable
 116 machines.  That is, no shifting need occur to use the number directly as an
 117 offset.
 118
 119 This format has another advantage on byte-addressable machines when fixnums are
 120 offsets into vector-like data-blocks, including structures.  Even though we
 121 previously mentioned data-blocks are dual-word aligned, most indexing and slot
 122 accessing is word aligned, and so are fixnums with effectively two tag bits.
 123
 124 Two tags also allow better usage of special instructions on some machines that
 125 can deal with two low-tag bits but not three.
 126
 127 Since the two bits are zeros, we avoid having to mask them off before using the
 128 words for arithmetic, but division and multiplication require special shifting.
 129
 130
 131 \f
 132 \section{Other-immediates}
 133
 134 As for fixnums, there are two different three-bit lowtag codes for
 135 other-immediate, allowing 64 other-immediate types:
 136 \begin{verbatim}
 137 ----------------------------------------------------------------
 138 |   Data (24 bits)        | Type (8 bits with low-tag)   | 1 0 |
 139 ----------------------------------------------------------------
 140 \end{verbatim}
 141
 142 The type-code for an other-immediate type is considered to include the two
 143 lowtag bits.  This supports the concept of a single "type code" namespace for
 144 all descriptors, since the normal lowtag codes are disjoint from the
 145 other-immediate codes.
 146
 147 For other-pointer objects, the full eight bits of the header type code are used
 148 as the type code for that kind of object.  This is why we use two lowtag codes
 149 for other-immediate types: each other-pointer object needs a distinct
 150 other-immediate type to mark its header.
 151
 152 The system uses the other-immediate format for characters,
 153 the {\tt symbol-value} unbound trap value, and header-words for data-blocks on
 154 the heap.  The type codes are laid out to facilitate range checks for common
 155 subtypes; for example, all numbers will have contiguous type codes which are
 156 distinct from the contiguous array type codes.  See section
 157 \ref{data-blocks-and-o-i} for details.
 158
 159 \f
 160 \section{Data-Blocks and Header-Word Format}
 161
 162 Pointers to data-blocks have the following format:
 163 \begin{verbatim}
 164 ----------------------------------------------------------------
 165 |      Dual-word address of data-block (29 bits)       | 1 1    1 |
 166 ----------------------------------------------------------------
 167 \end{verbatim}
 168
 169 The word pointed to by the above descriptor is a header-word, and it has the
 170 same format as an other-immediate:
 171 \begin{verbatim}
 172 ----------------------------------------------------------------
 173 |   Data (24 bits)        | Type (8 bits with low-tag) | 0 1 0 |
 174 ----------------------------------------------------------------
 175 \end{verbatim}
 176 This is convenient for scanning the heap when GC'ing, but it does mean that
 177 whenever GC encounters an other-immediate word, it has to do a range check on
 178 the low byte to see if it is a header-word or just a character (for example).
 179 This is easily acceptable performance hit for scanning.
 180
 181 The system interprets the data portion of the header-word for non-vector
 182 data-blocks as the word length excluding the header-word.  For example, the
 183 data field of the header for ratio and complex numbers is two, one word each
 184 for the numerator and denominator or for the real and imaginary parts.
 185
 186 For vectors and data-blocks representing Lisp objects stored like vectors, the
 187 system ignores the data portion of the header-word:
 188 \begin{verbatim}
 189 ----------------------------------------------------------------
 190 | Unused Data (24 bits)   | Type (8 bits with low-tag) | 0 1 0 |
 191 ----------------------------------------------------------------
 192 |           Element Length of Vector (30 bits)           | 0 0 |
 193 ----------------------------------------------------------------
 194 \end{verbatim}
 195
 196 Using a separate word allows for much larger vectors, and it allows {\tt
 197 length} to simply access a single word without masking or shifting.  Similarly,
 198 the header for complex arrays and vectors has a second word, following the
 199 header-word, the system uses for the fill pointer, so computing the length of
 200 any array is the same code sequence.
 201
 202
 203 \f
 204 \section{Data-Blocks and Other-immediates Typing}
 205
 206 \label{data-blocks-and-o-i}
 207 These are the other-immediate types.  We specify them including all low eight
 208 bits, including the other-immediate tag, so we can think of the type bits as
 209 one type -- not an other-immediate major type and a subtype.  Also, fetching a
 210 byte and comparing it against a constant is more efficient than wasting even a
 211 small amount of time shifting out the other-immediate tag to compare against a
 212 five bit constant.
 213 \begin{verbatim}
 214 Number   (< 30)
 215   bignum                                        10
 216     ratio                                       14
 217     single-float                                18
 218     double-float                                22
 219     complex                                     26
 220
 221 Array   (>= 30 code 86)
 222    Simple-Array   (>= 20 code 70)
 223          simple-array                           30
 224       Vector  (>= 34 code 82)
 225          simple-string                          34
 226          simple-bit-vector                      38
 227          simple-vector                          42
 228          (simple-array (unsigned-byte 2) (*))   46
 229          (simple-array (unsigned-byte 4) (*))   50
 230          (simple-array (unsigned-byte 8) (*))   54
 231          (simple-array (unsigned-byte 16) (*))  58
 232          (simple-array (unsigned-byte 32) (*))  62
 233          (simple-array single-float (*))        66
 234          (simple-array double-float (*))        70
 235       complex-string                            74
 236       complex-bit-vector                        78
 237       (array * (*))   -- general complex vector. 82
 238    complex-array                                86
 239
 240 code-header-type                                90
 241 function-header-type                            94
 242 closure-header-type                             98
 243 funcallable-instance-header-type                102
 244 unused-function-header-1-type                   106
 245 unused-function-header-2-type                   110
 246 unused-function-header-3-type                   114
 247 closure-function-header-type                    118
 248 return-pc-header-type (a.k.a LRA)               122
 249 value-cell-header-type                          126
 250 symbol-header-type                              130
 251 base-character-type                             134
 252 system-area-pointer-type (header type)          138
 253 unbound-marker                                  142
 254 weak-pointer-type                               146
 255 structure-header-type                           150
 256 \end{verbatim}
 257 \f
 258 \section{Strings}
 259
 260 All strings in the system are C-null terminated.  This saves copying the bytes
 261 when calling out to C.  The only time this wastes memory is when the string
 262 contains a multiple of eight characters, and then the system allocates two more
 263 words (since Lisp objects are dual-word aligned) to hold the C-null byte.
 264 Since the system will make heavy use of C routines for systems calls and
 265 libraries that save reimplementation of higher level operating system
 266 functionality (such as pathname resolution or current directory computation),
 267 saving on copying strings for C should make C call out more efficient.
 268
 269 The length word in a string header, see section "Data-Blocks and Header-Word
 270 Format", counts only the characters truly in the Common Lisp string.
 271 Allocation and GC will have to know to handle the extra C-null byte, and GC
 272 already has to deal with rounding up various objects to dual-word alignment.
 273
 274
 275 \f
 276 \section{Symbols and NIL}
 277
 278 Symbol data-block has the following format:
 279 \begin{verbatim}
 280 -------------------------------------------------------
 281 |     7 (data-block words)     | Symbol Type (8 bits) |
 282 -------------------------------------------------------
 283 |               Value Descriptor                      |
 284 -------------------------------------------------------
 285 |                       Function Pointer              |
 286 -------------------------------------------------------
 287 |                     Raw Function Address            |
 288 -------------------------------------------------------
 289 |                        Setf Function                |
 290 -------------------------------------------------------
 291 |                        Property List                |
 292 -------------------------------------------------------
 293 |                          Print Name                 |
 294 -------------------------------------------------------
 295 |                           Package                   |
 296 -------------------------------------------------------
 297 \end{verbatim}
 298
 299 Most of these slots are self-explanatory given what symbols must do in Common
 300 Lisp, but a couple require comments.  We added the Raw Function Address slot to
 301 speed up named call which is the most common calling convention.  This is a
 302 non-descriptor slot, but since objects are dual word aligned, the value
 303 inherently has fixnum low-tag bits.  The GC method for symbols must know to
 304 update this slot.  The Setf Function slot is currently unused, but we had an
 305 extra slot due to adding Raw Function Address since objects must be dual-word
 306 aligned.
 307
 308 The issues with nil are that we want it to act like a symbol, and we need list
 309 operations such as CAR and CDR to be fast on it.  CMU Common Lisp solves this
 310 by putting nil as the first object in static space, where other global values
 311 reside, so it has a known address in the system:
 312 \begin{verbatim}
 313 -------------------------------------------------------  <-- space
 314 |                               0                     |      start
 315 -------------------------------------------------------
 316 |     7 (data-block words)     | Symbol Type (8 bits) |
 317 -------------------------------------------------------  <-- nil
 318 |                           Value/CAR                 |
 319 -------------------------------------------------------
 320 |                         Definition/CDR              |
 321 -------------------------------------------------------
 322 |                      Raw Function Address           |
 323 -------------------------------------------------------
 324 |                         Setf Function               |
 325 -------------------------------------------------------
 326 |                         Property List               |
 327 -------------------------------------------------------
 328 |                           Print Name                |
 329 -------------------------------------------------------
 330 |                            Package                  |
 331 -------------------------------------------------------
 332 |                              ...                    |
 333 -------------------------------------------------------
 334 \end{verbatim}
 335 In addition, we make the list typed pointer to nil actually point past the
 336 header word of the nil symbol data-block.  This has usefulness explained below.
 337 The value and definition of nil are nil.  Therefore, any reference to nil used
 338 as a list has quick list type checking, and CAR and CDR can go right through
 339 the first and second words as if nil were a cons object.
 340
 341 When there is a reference to nil used as a symbol, the system adds offsets to
 342 the address the same as it does for any symbol.  This works due to a
 343 combination of nil pointing past the symbol header-word and the chosen list and
 344 other-pointer type tags.  The list type tag is four less than the other-pointer
 345 type tag, but nil points four additional bytes into its symbol data-block.
 346
 347
 348 \f
 349 ;;;; Array Headers.
 350
 351 The array-header data-block has the following format:
 352 \begin{verbatim}
 353 ----------------------------------------------------------------
 354 | Header Len (24 bits) = Array Rank +5   | Array Type (8 bits) |
 355 ----------------------------------------------------------------
 356 |               Fill Pointer (30 bits)                   | 0 0 |
 357 ----------------------------------------------------------------
 358 |               Available Elements (30 bits)             | 0 0 |
 359 ----------------------------------------------------------------
 360 |               Data Vector (29 bits)                  | 1 1 1 |
 361 ----------------------------------------------------------------
 362 |               Displacement (30 bits)                   | 0 0 |
 363 ----------------------------------------------------------------
 364 |               Displacedp (29 bits) -- t or nil       | 1 1 1 |
 365 ----------------------------------------------------------------
 366 |               Range of First Index (30 bits)           | 0 0 |
 367 ----------------------------------------------------------------
 368                               .
 369                               .
 370                               .
 371
 372 \end{verbatim}
 373 The array type in the header-word is one of the eight-bit patterns from section
 374 "Data-Blocks and Other-immediates Typing", indicating that this is a complex
 375 string, complex vector, complex bit-vector, or a multi-dimensional array.  The
 376 data portion of the other-immediate word is the length of the array header
 377 data-block.  Due to its format, its length is always five greater than the
 378 array's number of dimensions.  The following words have the following
 379 interpretations and types:
 380 \begin{description}
 381    \item[Fill Pointer:]
 382       This is a fixnum indicating the number of elements in the data vector
 383       actually in use.  This is the logical length of the array, and it is
 384       typically the same value as the next slot.  This is the second word, so
 385       LENGTH of any array, with or without an array header, is just four bytes
 386       off the pointer to it.
 387    \item[Available Elements:]
 388       This is a fixnum indicating the number of elements for which there is
 389       space in the data vector.  This is greater than or equal to the logical
 390       length of the array when it is a vector having a fill pointer.
 391    \item[Data Vector:]
 392       This is a pointer descriptor referencing the actual data of the array.
 393       This a data-block whose first word is a header-word with an array type as
 394       described in sections "Data-Blocks and Header-Word Format" and
 395       "Data-Blocks and Other-immediates Typing"
 396    \item[Displacement:]
 397       This is a fixnum added to the computed row-major index for any array.
 398       This is typically zero.
 399    \item[Displacedp:]
 400       This is either t or nil.  This is separate from the displacement slot, so
 401       most array accesses can simply add in the displacement slot.  The rare
 402       need to know if an array is displaced costs one extra word in array
 403       headers which probably aren't very frequent anyway.
 404    \item[Range of First Index:]
 405       This is a fixnum indicating the number of elements in the first dimension
 406       of the array.  Legal index values are zero to one less than this number
 407       inclusively.  IF the array is zero-dimensional, this slot is
 408       non-existent.
 409    \item[... (remaining slots):]
 410       There is an additional slot in the header for each dimension of the
 411       array.  These are the same as the Range of First Index slot.
 412 \end{description}
 413
 414 \f
 415 \section{Bignums}
 416
 417 Bignum data-blocks have the following format:
 418 \begin{verbatim}
 419 -------------------------------------------------------
 420 |      Length (24 bits)        | Bignum Type (8 bits) |
 421 -------------------------------------------------------
 422 |             least significant bits                  |
 423 -------------------------------------------------------
 424                             .
 425                             .
 426                             .
 427 \end{verbatim}
 428 The elements contain the two's complement representation of the integer with
 429 the least significant bits in the first element or closer to the header.  The
 430 sign information is in the high end of the last element.
 431
 432
 433
 434 \f
 435 \section{Code Data-Blocks}
 436
 437 A code data-block is the run-time representation of a "component".  A component
 438 is a connected portion of a program's flow graph that is compiled as a single
 439 unit, and it contains code for many functions.  Some of these functions are
 440 callable from outside of the component, and these are termed "entry points".
 441
 442 Each entry point has an associated user-visible function data-block (of type
 443 {\tt function}).  The full call convention provides for calling an entry point
 444 specified by a function object.
 445
 446 Although all of the function data-blocks for a component's entry points appear
 447 to the user as distinct objects, the system keeps all of the code in a single
 448 code data-block.  The user-visible function object is actually a pointer into
 449 the middle of a code data-block.  This allows any control transfer within a
 450 component to be done using a relative branch.
 451
 452 Besides a function object, there are other kinds of references into the middle
 453 of a code data-block.  Control transfer into a function also occurs at the
 454 return-PC for a call.  The system represents a return-PC somewhat similarly to
 455 a function, so GC can also recognize a return-PC as a reference to a code
 456 data-block.  This representation is known as a Lisp Return Address (LRA).
 457
 458 It is incorrect to think of a code data-block as a concatenation of "function
 459 data-blocks".  Code for a function is not emitted in any particular order with
 460 respect to that function's function-header (if any).  The code following a
 461 function-header may only be a branch to some other location where the
 462 function's "real" definition is.
 463
 464
 465 The following are the three kinds of pointers to code data-blocks:
 466 \begin{description}
 467    \item[Code pointer (labeled A below):]
 468       A code pointer is a descriptor, with other-pointer low-tag bits, pointing
 469       to the beginning of the code data-block.  The code pointer for the
 470       currently running function is always kept in a register (CODE).  In
 471       addition to allowing loading of non-immediate constants, this also serves
 472       to represent the currently running function to the debugger.
 473    \item[LRA (labeled B below):]
 474       The LRA is a descriptor, with other-pointer low-tag bits, pointing
 475       to a location for a function call.  Note that this location contains no
 476       descriptors other than the one word of immediate data, so GC can treat
 477       LRA locations the same as instructions.
 478    \item[Function (labeled C below):]
 479       A function is a descriptor, with function low-tag bits, that is user
 480       callable.  When a function header is referenced from a closure or from
 481       the function header's self-pointer, the pointer has other-pointer low-tag
 482       bits, instead of function low-tag bits.  This ensures that the internal
 483       function data-block associated with a closure appears to be uncallable
 484       (although users should never see such an object anyway).
 485
 486       Information about functions that is only useful for entry points is kept
 487       in some descriptors following the function's self-pointer descriptor.
 488       All of these together with the function's header-word are known as the
 489       "function header".  GC must be able to locate the function header.  We
 490       provide for this by chaining together the function headers in a NIL
 491       terminated list kept in a known slot in the code data-block.
 492 \end{description}
 493
 494 A code data-block has the following format:
 495 \begin{verbatim}
 496 A -->
 497 ****************************************************************
 498 |  Header-Word count (24 bits)    |   Code-Type (8 bits)       |
 499 ----------------------------------------------------------------
 500 |  Number of code words (fixnum tag)                           |
 501 ----------------------------------------------------------------
 502 |  Pointer to first function header (other-pointer tag)        |
 503 ----------------------------------------------------------------
 504 |  Debug information (structure tag)                           |
 505 ----------------------------------------------------------------
 506 |  First constant (a descriptor)                               |
 507 ----------------------------------------------------------------
 508 |  ...                                                         |
 509 ----------------------------------------------------------------
 510 |  Last constant (and last word of code header)                |
 511 ----------------------------------------------------------------
 512 |  Some instructions (non-descriptor)                          |
 513 ----------------------------------------------------------------
 514 |     (pad to dual-word boundary if necessary)                 |
 515
 516 B -->
 517 ****************************************************************
 518 |  Word offset from code header (24)   |   Return-PC-Type (8)  |
 519 ----------------------------------------------------------------
 520 |  First instruction after return                              |
 521 ----------------------------------------------------------------
 522 |  ... more code and LRA header-words                          |
 523 ----------------------------------------------------------------
 524 |     (pad to dual-word boundary if necessary)                 |
 525
 526 C -->
 527 ****************************************************************
 528 |  Offset from code header (24)  |   Function-Header-Type (8)  |
 529 ----------------------------------------------------------------
 530 |  Self-pointer back to previous word (with other-pointer tag) |
 531 ----------------------------------------------------------------
 532 |  Pointer to next function (other-pointer low-tag) or NIL     |
 533 ----------------------------------------------------------------
 534 |  Function name (a string or a symbol)                        |
 535 ----------------------------------------------------------------
 536 |  Function debug arglist (a string)                           |
 537 ----------------------------------------------------------------
 538 |  Function type (a list-style function type specifier)        |
 539 ----------------------------------------------------------------
 540 |  Start of instructions for function (non-descriptor)         |
 541 ----------------------------------------------------------------
 542 |  More function headers and instructions and return PCs,      |
 543 |  until we reach the total size of header-words + code        |
 544 |  words.                                                      |
 545 ----------------------------------------------------------------
 546 \end{verbatim}
 547
 548 The following are detailed slot descriptions:
 549 \begin{description}
 550    \item[Code data-block header-word:]
 551       The immediate data in the code data-block's header-word is the number of
 552       leading descriptors in the code data-block, the fixed overhead words plus
 553       the number of constants.  The first non-descriptor word, some code,
 554       appears at this word offset from the header.
 555    \item[Number of code words:]
 556       The total number of non-header-words in the code data-block.  The total
 557       word size of the code data-block is the sum of this slot and the
 558       immediate header-word data of the previous slot.
 559       header-word.
 560    \item[Pointer to first function header:]
 561       A NIL-terminated list of the function headers for all entry points to
 562       this component.
 563    \item[Debug information:]
 564       The DEBUG-INFO structure describing this component.  All information that
 565       the debugger wants to get from a running function is kept in this
 566       structure.  Since there are many functions, the current PC is used to
 567       locate the appropriate debug information.  The system keeps the debug
 568       information separate from the function data-block, since the currently
 569       running function may not be an entry point.  There is no way to recover
 570       the function object for the currently running function, since this
 571       data-block may not exist.
 572    \item[First constant ... last constant:]
 573       These are the constants referenced by the component, if there are any.
 574 \vspace{1ex}
 575    \item[LRA header word:]
 576       The immediate header-word data is the word offset from the enclosing code
 577       data-block's header-word to this word.  This allows GC and the debugger
 578       to easily recover the code data-block from a LRA.  The code at the
 579       return point restores the current code pointer using a subtract immediate
 580       of the offset, which is known at compile time.
 581 \vspace{1ex}
 582    \item[Function entry point header-word:]
 583       The immediate header-word data is the word offset from the enclosing code
 584       data-block's header-word to this word.  This is the same as for the
 585       retrun-PC header-word.
 586    \item[Self-pointer back to header-word:]
 587       In a non-closure function, this self-pointer to the previous header-word
 588       allows the call sequence to always indirect through the second word in a
 589       user callable function.  See section "Closure Format".  With a closure,
 590       indirecting through the second word gets you a function header-word.  The
 591       system ignores this slot in the function header for a closure, since it
 592       has already indirected once, and this slot could be some random thing
 593       that causes an error if you jump to it.  This pointer has an
 594       other-pointer tag instead of a function pointer tag, indicating it is not
 595       a user callable Lisp object.
 596    \item[Pointer to next function:]
 597       This is the next link in the thread of entry point functions found in
 598       this component.  This value is NIL when the current header is the last
 599       entry point in the component.
 600    \item[Function name:]
 601       This function's name (for printing).  If the user defined this function
 602       with DEFUN, then this is the defined symbol, otherwise it is a
 603       descriptive string.
 604    \item[Function debug arglist:]
 605       A printed string representing the function's argument list, for human
 606       readability.  If it is a macroexpansion function, then this is the
 607       original DEFMACRO arglist, not the actual expander function arglist.
 608    \item[Function type:]
 609       A list-style function type specifier representing the argument signature
 610       and return types for this function.  For example,
 611       \begin{verbatim}
 612 (function (fixnum fixnum fixnum) fixnum)
 613       \end{verbatim}
 614       or
 615       \begin{verbatim}
 616 (function (string &key (:start unsigned-byte)) string)
 617       \end{verbatim}
 618       This information is intended for machine readablilty, such as by the
 619       compiler.
 620 \end{description}
 621
 622 \f
 623 \section{Closure Format}
 624
 625 A closure data-block has the following format:
 626 \begin{verbatim}
 627 ----------------------------------------------------------------
 628 |  Word size (24 bits)           |  Closure-Type (8 bits)      |
 629 ----------------------------------------------------------------
 630 |  Pointer to function header (other-pointer low-tag)          |
 631 ----------------------------------------------------------------
 632 |                                 .                            |
 633 |                      Environment information                 |
 634 |                                 .                            |
 635 ----------------------------------------------------------------
 636 \end{verbatim}
 637
 638 A closure descriptor has function low-tag bits.  This means that a descriptor
 639 with function low-tag bits may point to either a function header or to a
 640 closure.  The idea is that any callable Lisp object has function low-tag bits.
 641 Insofar as call is concerned, we make the format of closures and non-closure
 642 functions compatible.  This is the reason for the self-pointer in a function
 643 header.  Whenever you have a callable object, you just jump through the second
 644 word, offset some bytes, and go.
 645
 646
 647 \f
 648 \section{Function call}
 649
 650 Due to alignment requirements and low-tag codes, it is not possible to use a
 651 hardware call instruction to compute the LRA.  Instead the LRA
 652 for a call is computed by doing an add-immediate to the start of the code
 653 data-block.
 654
 655 An advantage of using a single data-block to represent both the descriptor and
 656 non-descriptor parts of a function is that both can be represented by a
 657 single pointer.  This reduces the number of memory accesses that have to be
 658 done in a full call.  For example, since the constant pool is implicit in a
 659 LRA, a call need only save the LRA, rather than saving both the
 660 return PC and the constant pool.
 661
 662
 663 \f
 664 \section{Memory Layout}
 665
 666 CMU Common Lisp has four spaces, read-only, static, dynamic-0, and dynamic-1.
 667 Read-only contains objects that the system never modifies, moves, or reclaims.
 668 Static space contains some global objects necessary for the system's runtime or
 669 performance (since they are located at a known offset at a know address), and
 670 the system never moves or reclaims these.  However, GC does need to scan static
 671 space for references to moved objects.  Dynamic-0 and dynamic-1 are the two
 672 heap areas for stop-and-copy GC algorithms.
 673
 674 What global objects are at the head of static space???
 675 \begin{verbatim}
 676    NIL
 677    eval::*top-of-stack*
 678    lisp::*current-catch-block*
 679    lisp::*current-unwind-protect*
 680    FLAGS (RT only)
 681    BSP (RT only)
 682    HEAP (RT only)
 683 \end{verbatim}
 684
 685 In addition to the above spaces, the system has a control stack, binding stack,
 686 and a number stack.  The binding stack contains pairs of descriptors, a symbol
 687 and its previous value.  The number stack is the same as the C stack, and the
 688 system uses it for non-Lisp objects such as raw system pointers, saving
 689 non-Lisp registers, parts of bignum computations, etc.
 690
 691
 692 \f
 693 \section{System Pointers}
 694
 695 The system pointers reference raw allocated memory, data returned by foreign
 696 function calls, etc.  The system uses these when you need a pointer to a
 697 non-Lisp block of memory, using an other-pointer.  This provides the greatest
 698 flexibility by relieving contraints placed by having more direct references
 699 that require descriptor type tags.
 700
 701 A system area pointer data-block has the following format:
 702 \begin{verbatim}
 703 -------------------------------------------------------
 704 |     1 (data-block words)        | SAP Type (8 bits) |
 705 -------------------------------------------------------
 706 |             system area pointer                     |
 707 -------------------------------------------------------
 708 \end{verbatim}
 709
 710 "SAP" means "system area pointer", and much of our code contains this naming
 711 scheme.  We don't currently restrict system pointers to one area of memory, but
 712 if they do point onto the heap, it is up to the user to prevent being screwed
 713 by GC or whatever.