doc/internals/objects-in-memory.texinfo

   1 @node Objects In Memory
   2 @comment  node-name,  next,  previous,  up
   3 @chapter Objects In Memory
   4
   5 @menu
   6 * Type tags::
   7 * Heap Object Layout::
   8 @end menu
   9
  10 @node Type tags
  11 @section Type tags
  12
  13 The in-memory representation of Lisp data includes type information
  14 about each object.  This type information takes the form of a lowtag
  15 in the low bits of each pointer to heap space, a widetag for each
  16 boxed immediate value and a header (also with a widetag) at the start
  17 of the allocated space for each object.  These tags are used to inform
  18 both the GC and Lisp code about the type and allocated size of Lisp
  19 objects.
  20
  21 @c FIXME: Add diagrams showing tag allocation to next two sections.
  22
  23 @subsection Lowtags
  24
  25 Objects allocated on the Lisp heap are aligned to a double-word
  26 boundary, leaving the low-order bits (which would normally identify a
  27 particular octet within the first two words) available for use to hold
  28 type information.  This turns out to be three bits on 32-bit systems
  29 and four bits on 64-bit systems.
  30
  31 Of these 8 or 16 tags, we have some constraints for allocation:
  32
  33 @itemize
  34 @item
  35 We need 6 of the low 8 bits of the word for widetags, meaning that one
  36 out of every four lowtags must be an @code{other-immediate} lowtag.
  37 @item
  38 We have four pointer types.  Instance (struct and CLOS) pointers,
  39 function pointers, list pointers, and other pointers.
  40 @item
  41 @code{fixnum}s are required to have their lowtags be comprised
  42 entirely of zeros.
  43 @item
  44 There are additional constraints around the ordering of the pointer
  45 types, particularly with respect to list pointers (the NIL-cons hack).
  46 @end itemize
  47
  48 Complicating this issue is that while the lowtag @emph{space} is three
  49 or four bits wide, some of the lowtags are effectively narrower.  The
  50 @code{other-immediate} tags effectively have a two-bit lowtag, and
  51 @code{fixnum}s have historically been one bit narrower than the other
  52 lowtags (thus @code{even-fixnum-lowtag} and @code{odd-fixnum-lowtag})
  53 though with the recent work on wider fixnums on 64-bit systems this is
  54 no longer necessarily so.
  55
  56 The lowtags are specified in
  57 @file{src/compiler/generic/early-objdef.lisp}.
  58
  59 @c 32-bit lowtag assignment
  60 @c x00 -- Fixnum
  61 @c x10 -- Other-immediate
  62 @c xx1 -- Pointer
  63 @c 001 --   Instance-pointer
  64 @c 011 --   List-pointer
  65 @c 101 --   Function-pointer
  66 @c 111 --   Other-pointer
  67
  68 @c 32-bit lowtag assignment (proposed wider-fixnum branch)
  69 @c xxx0 -- Fixnum
  70 @c xx01 -- Other-immediate
  71 @c xx11 -- Pointer
  72 @c x011 --   List-pointer
  73 @c 0111 --   (Either instance or function pointer)
  74 @c 1111 --   Other-pointer
  75
  76 @c 64-bit lowtag assignment (pre-wider-fixnums)
  77 @c x000 -- Fixnum
  78 @c xx10 -- Other-immediate
  79 @c xyyy -- Unused (where 011 <= yyy <= 110)
  80 @c xyy1 -- Pointer (where yy is either 00 or 11)
  81 @c 0001 --   Instance-pointer
  82 @c 0111 --   List-pointer
  83 @c 1001 --   Function-pointer
  84 @c 1111 --   Other-pointer
  85
  86 @c 64-bit lowtag assignment (wider-fixnums)
  87 @c xyz0 -- Fixnum (where z or yz may also be 0 depending on n-fixnum-tag-bits)
  88 @c xx01 -- Other-immediate
  89 @c xx11 -- Pointer
  90 @c 0011 --   Instance-pointer
  91 @c 0111 --   List-pointer
  92 @c 1011 --   Function-pointer
  93 @c 1111 --   Other-pointer
  94
  95 @subsubsection Fixnums
  96
  97 @code{Fixnum}s are signed integers represented as immediate values.
  98 In SBCL, these integers are @code{(- n-word-bits n-fixnum-tag-bits)}
  99 bits wide, stored in the most-significant section of a machine word.
 100
 101 The reason that @code{fixnum} tags are required to have the low
 102 @code{n-fixnum-tag-bits} as zeros is that it allows for addition and
 103 subtraction to be performed using native machine instructions
 104 directly, and multiplication and division can be performed likewise
 105 using a simple shift instruction to compensate for the effect of the
 106 tag.
 107
 108 @subsubsection Other-immediates
 109
 110 @code{Other-immediate}s are the lowtag part of widetag values.  Due to
 111 the constraints of widetag allocation, one out of every four lowtags
 112 must be a widetag (alternately, the width of the
 113 @code{other-immediate} lowtag is two bits).
 114
 115 @subsubsection Pointers
 116
 117 There are four different pointer lowtags, largely for optimization
 118 purposes.
 119
 120 @itemize
 121 @item
 122 We have a distinct list pointer tag so that we can do a listp test by
 123 simply checking the pointer tag instead of needing to retrieve a
 124 header word for each @code{cons} cell.  This effectively halves the
 125 memory cost of @code{cons} cells.
 126 @item
 127 We have a distinct instance pointer tag so that we do not need to
 128 check a header word for each instance when doing a type check.  This
 129 saves a memory access for retrieving the class of an instance.
 130 @item
 131 We have a distinct function pointer tag so that we do not need to
 132 check a header word to determine if a given pointer is directly
 133 funcallable (that is, if the pointer is to a closure, a simple-fun, or
 134 a funcallable-instance).  This saves a memory access in the type test
 135 prior to @code{funcall} or @code{apply} of a function object.
 136 @item
 137 We have one last pointer tag for everything else.  Obtaining further
 138 type information from these pointers requires fetching the header word
 139 and dispatching on the widetag.
 140 @end itemize
 141
 142 @subsection Widetags
 143
 144 Widetags are used for three purposes.  First, to provide type
 145 information for immediate (non-pointer) data such as characters.
 146 Second, to provide ``marker'' values for things such as unbound slots.
 147 Third, to provide type information for objects stored on the heap.
 148
 149 Because widetags are used for immediate data they must have a lowtag
 150 component.  This ends up being the @code{other-immediate} lowtags.
 151 For various reasons it was deemed convenient for widetags to be no
 152 more than eight bits wide, and with 27 or more distinct array types
 153 (depending on build-time configuration), seven numeric types, markers,
 154 and non-numeric heap object headers there ends up being more than 32
 155 widetags required (though less than 64).  This combination of factors
 156 leads to the requirement that one out of every four lowtags be an
 157 @code{other-immediate} lowtag.
 158
 159 As widetags are involved in type tests for non-CLOS objects, their
 160 allocation is carefully arranged to allow for certain type tests to be
 161 cheaper than they might otherwise be.
 162
 163 @itemize
 164 @item
 165 The numeric types are arranged to make @code{rational}, @code{float},
 166 @code{real}, @code{complex} and @code{number} type tests become range
 167 tests on the widetag.
 168 @item
 169 The array types are arranged to make various type tests become range
 170 tests on the widetag.
 171 @item
 172 The string types have disjoint ranges, but have been arranged so that
 173 their ranges differ only by one bit, allowing the @code{stringp} type
 174 test to become a masking operation followed by a range test or a
 175 masking operation followed by a simple comparison.
 176 @item
 177 There may be other clevernesses, these are just what can be found
 178 through reading the comments above the widetag definition.
 179 @end itemize
 180
 181 The widetags are specified in
 182 @file{src/compiler/generic/early-objdef.lisp}.
 183
 184 @node Heap Object Layout
 185 @section Heap Object Layout
 186
 187 Objects stored in the heap are of two kinds: those with headers, and
 188 cons cells.  If the first word of an object has a header widetag then
 189 the object has the type and layout associated with that widetag.
 190 Otherwise, the object is assumed to be a @code{cons} cell.
 191
 192 Some objects have ``unboxed'' words without any associated type
 193 information as well as the more usual ``boxed'' words with lowtags.
 194 Obvious cases include the specialized array types, some of the numeric
 195 types, @code{system-area-pointer}s, and so on.
 196
 197 The primitive object layouts are specified in
 198 @file{src/compiler/generic/objdef.lisp}.
 199
 200 @subsection Header Values
 201
 202 As a widetag is only eight bits wide but a heap object header takes a
 203 full machine word, there is an extra 24 or 56 bits of space available
 204 for unboxed data storage in each heap object.  This space is called
 205 the ``header value'', and is used for various purposes depending on
 206 the type of heap object in question.
 207
 208 @subsection Symbols
 209
 210 In contrast to the simple model of symbols provided in the Common Lisp
 211 standard, symbol objects in SBCL do not have a function cell.
 212 Instead, the mapping from symbols to functions is done via the
 213 compiler globaldb.
 214
 215 There are two additional slots associated with symbols.  One is a hash
 216 value for the symbol (based on the symbol name), which avoids having
 217 to recompute the hash from the name every time it is required.
 218
 219 The other additional slot, on threaded systems only, is the TLS index,
 220 which is either @code{no-tls-value-marker-widetag} or an unboxed byte
 221 offset within the TLS area to the TLS slot associated with the symbol.
 222 Because the unboxed offset is aligned to a word boundary it appears as
 223 a @code{fixnum} when viewed as boxed data.  It is not, in general,
 224 safe to increment this value as a @code{fixnum}, however, in case
 225 @code{n-fixnum-tag-bits} changes@footnote{This is not as unlikely as
 226 it might seem at first; while historically @code{n-fixnum-tag-bits}
 227 has always been the same as @code{word-shift} there is a branch where
 228 it is permitted to vary at build time from @code{word-shift} to as low
 229 as 1 on 64-bit ports, and a proposed scheme to allow the same on
 230 32-bit ports}.
 231
 232 @subsection The NIL-cons Hack
 233
 234 As an ``optimization'', the symbol @code{nil} has
 235 @code{list-pointer-lowtag} rather than @code{other-pointer-lowtag},
 236 and is aligned in memory so that the value and hash slots are the
 237 @code{car} and @code{cdr} of the @code{cons}, with both slots
 238 containing @code{nil}.  This allows for @code{car} and @code{cdr} to
 239 simply do a lowtag test and slot access instead of having to
 240 explicitly test for @code{nil}, at the cost of requiring all symbol
 241 type tests and slot accesses to test for @code{nil}.
 242
 243 @subsection Functions and Code Components
 244
 245 @c This could do with a diagram showing a code-component with a couple
 246 @c of simple-fun entry points.
 247
 248 All compiled code resides in @code{code-component} objects.  These
 249 objects consist of a header, some number of boxed literal values, a
 250 ``data block'' containing machine code and @code{simple-fun} headers,
 251 and a ``trace table'' which is currently unused@footnote{Trace tables
 252 were originally used to support garbage collection using gengc in
 253 CMUCL.  As there is still vestigial support for carrying them around
 254 at the end of @code{code-component}s, they may end up being used for
 255 something else in the future.}.
 256
 257 The @code{simple-fun} headers represent simple function objects (not
 258 @code{funcallable-instance}s or closures), and each
 259 @code{code-component} will typically have one for the main entry point
 260 and one per closure entry point (as the function underlying the
 261 closure, not the closure object proper).  In a compiler trace-file,
 262 the @code{simple-fun} headers are all listed as entries in the IR2
 263 component.
 264
 265 The @code{simple-fun} headers are held in a linked list per
 266 @code{code-component} in order to allow the garbage collector to find
 267 them during relocation.  In order to be able to find the start of a
 268 @code{code-component} from a @code{simple-fun}, the header value is
 269 the offset in words from the start of the @code{code-component} to the
 270 start of the @code{simple-fun}.