doc/manual/efficiency.texinfo

   1 @node Efficiency
   2 @comment  node-name,  next,  previous,  up
   3 @chapter Efficiency
   4 @cindex Efficiency
   5
   6 @menu
   7 * Slot access::
   8 * Dynamic-extent allocation::
   9 * Modular arithmetic::
  10 * Global and Always-Bound variables::
  11 * Miscellaneous Efficiency Issues::
  12 @end menu
  13
  14 @node  Slot access
  15 @comment  node-name,  next,  previous,  up
  16 @section Slot access
  17 @cindex Slot access
  18
  19 @subsection Structure object slot access
  20
  21 Structure slot accessors are efficient only if the compiler is able to
  22 open code them: compiling a call to a structure slot accessor before
  23 the structure is defined, declaring one @code{notinline}, or passing
  24 it as a functional argument to another function causes severe
  25 perfomance degradation.
  26
  27 @subsection Standard object slot access
  28
  29 The most efficient way to access a slot of a @code{standard-object} is
  30 by using @code{slot-value} with a constant slot name argument inside a
  31 @code{defmethod} body, where the variable holding the instance is a
  32 specializer parameter of the method and is never assigned to. The cost
  33 is roughly 1.6 times that of an open coded structure slot accessor.
  34
  35 Second most efficient way is to use a CLOS slot accessor, or
  36 @code{slot-value} with a constant slot name argument, but in
  37 circumstances other than specified above. This may be up to 3 times as
  38 slow as the method described above.
  39
  40 Example:
  41
  42 @lisp
  43 (defclass foo () ((bar)))
  44
  45 ;; Fast: specializer and never assigned to
  46 (defmethod quux ((foo foo) new)
  47   (let ((old (slot-value foo 'bar)))
  48     (setf (slot-value foo 'bar) new)
  49     old))
  50
  51 ;; Slow: not a specializer
  52 (defmethod quux ((foo foo) new)
  53   (let* ((temp foo)
  54          (old (slot-value temp 'bar)))
  55     (setf (slot-value temp 'bar) new)
  56     old))
  57
  58 ;; Slow: assignment to FOO
  59 (defmethod quux ((foo foo) new)
  60   (let ((old (slot-value foo 'bar)))
  61     (setf (slot-value foo 'bar) new)
  62     (setf foo new)
  63     old))
  64 @end lisp
  65
  66 Note that when profiling code such as this, the first few calls to the
  67 generic function are not representative, as the dispatch mechanism is
  68 lazily set up during those calls.
  69
  70 @node  Dynamic-extent allocation
  71 @comment  node-name,  next,  previous,  up
  72 @section Dynamic-extent allocation
  73 @cindex Dynamic-extent declaration
  74
  75 SBCL has limited support for performing allocation on the stack when a
  76 variable is declared @code{dynamic-extent}. The @code{dynamic-extent}
  77 declarations are not verified, but are simply trusted as long as
  78 @code{sb-ext:*stack-allocate-dynamic-extent*} is true.
  79
  80 If dynamic extent constraints specified in the Common Lisp standard
  81 are violated, the best that can happen is for the program to have
  82 garbage in variables and return values; more commonly, the system will
  83 crash.
  84
  85 @include var-sb-ext-star-stack-allocate-dynamic-extent-star.texinfo
  86
  87 There are many cases when @code{dynamic-extent} declarations could be
  88 useful. At present, SBCL implements stack allocation for
  89
  90 @itemize
  91
  92 @item
  93 @code{&rest} lists, when these are declared @code{dynamic-extent}.
  94
  95 @item
  96 @code{cons}, @code{list}, @code{list*}, and @code{vector} when the
  97 result is bound to a variable declared @code{dynamic-extent}.
  98
  99 @item
 100 simple forms of @code{make-array}, whose result is bound to a variable
 101 declared @code{dynamic-extent}: stack allocation is possible only if
 102 the resulting array is known to be both simple and one-dimensional,
 103 and has a constant @code{:element-type}.
 104
 105 @strong{Note}: stack space is limited, so allocation of a large vector
 106 may cause stack overflow. For this reason potentially large vectors,
 107 which might circumvent stack overflow detection, are stack allocated
 108 only in zero @code{safety} policies.
 109
 110 @item
 111 closures defined with @code{flet} or @code{labels}, with a bound
 112 @code{dynamic-extent} declaration. Closed-over variables, which are
 113 assigned to (either inside or outside the closure) are still allocated
 114 on the heap. Blocks and tags are also allocated on the heap, unless
 115 all non-local control transfers to them are compiled with zero
 116 @code{safety}.
 117
 118 @item
 119 user-defined structures when the structure constructor defined using
 120 @code{defstruct} has been declared @code{inline} and the result of the
 121 call to the constructor is bound to a variable declared
 122 @code{dynamic-extent}.
 123
 124 @strong{Note}: structures with ``raw'' slots can currently be
 125 stack-allocated only on x86 and x86-64.
 126
 127 @item
 128 all of the above when they appear as initial parts of another
 129 stack-allocated object.
 130
 131 @end itemize
 132
 133 Examples:
 134
 135 @lisp
 136 ;;; Declaiming a structure constructor inline before definition makes
 137 ;;; stack allocation possible.
 138 (declaim (inline make-thing))
 139 (defstruct thing obj next)
 140
 141 ;;; Stack allocation of various objects bound to DYNAMIC-EXTENT
 142 ;;; variables.
 143 (let* ((list (list 1 2 3))
 144        (nested (cons (list 1 2) (list* 3 4 (list 5))))
 145        (vector (make-array 3 :element-type 'single-float))
 146        (thing (make-thing :obj list
 147                           :next (make-thing :obj (make-array 3)))))
 148   (declare (dynamic-extent list nested vector thing))
 149   ...)
 150
 151 ;;; Stack allocation of arguments to a local function is equivalent
 152 ;;; to stack allocation of local variable values.
 153 (flet ((f (x)
 154          (declare (dynamic-extent x))
 155          ...))
 156   ...
 157   (f (list 1 2 3))
 158   (f (cons (cons 1 2) (cons 3 4)))
 159   ...)
 160
 161 ;;; Stack allocation of &REST lists
 162 (defun foo (&rest args)
 163   (declare (dynamic-extent args))
 164   ...)
 165 @end lisp
 166
 167 Future plans include
 168
 169 @itemize
 170
 171 @item
 172 Stack allocation of assigned-to closed-over variables, where these are
 173 declared @code{dynamic-extent};
 174
 175 @item
 176 Automatic detection of the common idiom of applying a function to some
 177 defaults and a @code{&rest} list, even when this is not declared
 178 @code{dynamic-extent};
 179
 180 @item
 181 Automatic detection of the common idiom of calling quantifiers with a
 182 closure, even when the closure is not declared @code{dynamic-extent}.
 183
 184 @end itemize
 185
 186 @node  Modular arithmetic
 187 @comment  node-name,  next,  previous,  up
 188 @section Modular arithmetic
 189 @cindex Modular arithmetic
 190 @cindex Arithmetic, modular
 191 @cindex Arithmetic, hardware
 192
 193 Some numeric functions have a property: @var{N} lower bits of the
 194 result depend only on @var{N} lower bits of (all or some)
 195 arguments. If the compiler sees an expression of form @code{(logand
 196 @var{exp} @var{mask})}, where @var{exp} is a tree of such ``good''
 197 functions and @var{mask} is known to be of type @code{(unsigned-byte
 198 @var{w})}, where @var{w} is a ``good'' width, all intermediate results
 199 will be cut to @var{w} bits (but it is not done for variables and
 200 constants!). This often results in an ability to use simple machine
 201 instructions for the functions.
 202
 203 Consider an example.
 204
 205 @lisp
 206 (defun i (x y)
 207   (declare (type (unsigned-byte 32) x y))
 208   (ldb (byte 32 0) (logxor x (lognot y))))
 209 @end lisp
 210
 211 The result of @code{(lognot y)} will be negative and of type
 212 @code{(signed-byte 33)}, so a naive implementation on a 32-bit
 213 platform is unable to use 32-bit arithmetic here. But modular
 214 arithmetic optimizer is able to do it: because the result is cut down
 215 to 32 bits, the compiler will replace @code{logxor} and @code{lognot}
 216 with versions cutting results to 32 bits, and because terminals
 217 (here---expressions @code{x} and @code{y}) are also of type
 218 @code{(unsigned-byte 32)}, 32-bit machine arithmetic can be used.
 219
 220 As of SBCL 0.8.5 ``good'' functions are @code{+}, @code{-};
 221 @code{logand}, @code{logior}, @code{logxor}, @code{lognot} and their
 222 combinations; and @code{ash} with the positive second
 223 argument. ``Good'' widths are 32 on HPPA, MIPS, PPC, Sparc and x86 and
 224 64 on Alpha.  While it is possible to support smaller widths as well,
 225 currently this is not implemented.
 226
 227 @node  Global and Always-Bound variables
 228 @comment  node-name,  next,  previous,  up
 229 @section Global and Always-Bound variables
 230
 231 @include macro-sb-ext-defglobal.texinfo
 232
 233 @deftp {Declaration} sb-ext:global
 234
 235 Syntax: @code{(sb-ext:global symbol*)}
 236
 237 Only valid as a global proclamation.
 238
 239 Specifies that the named symbols cannot be proclaimed or locally
 240 declared @code{special}. Proclaiming an already special or constant
 241 variable name as @code{global} signal an error. Allows more efficient
 242 value lookup in threaded environments in addition to expressing
 243 programmer intention.
 244 @end deftp
 245
 246 @deftp {Declaration} sb-ext:always-bound
 247
 248 Syntax: @code{(sb-ext:always-bound symbol*)}
 249
 250 Only valid as a global proclamation.
 251
 252 Specifies that the named symbols is always bound. Inhibits @code{makunbound}
 253 of the named symbols. Proclaiming an unbound symbol as @code{always-bound} signals
 254 an error. Allows compiler to elide boundness checks from value lookups.
 255 @end deftp
 256
 257 @node  Miscellaneous Efficiency Issues
 258 @comment  node-name,  next,  previous,  up
 259 @section Miscellaneous Efficiency Issues
 260
 261 FIXME: The material in the CMUCL manual about getting good
 262 performance from the compiler should be reviewed, reformatted in
 263 Texinfo, lightly edited for SBCL, and substituted into this
 264 manual. In the meantime, the original CMUCL manual is still 95+%
 265 correct for the SBCL version of the Python compiler. See the
 266 sections
 267
 268 @itemize
 269 @item Advanced Compiler Use and Efficiency Hints
 270 @item Advanced Compiler Introduction
 271 @item More About Types in Python
 272 @item Type Inference
 273 @item Source Optimization
 274 @item Tail Recursion
 275 @item Local Call
 276 @item Block Compilation
 277 @item Inline Expansion
 278 @item Object Representation
 279 @item Numbers
 280 @item General Efficiency Hints
 281 @item Efficiency Notes
 282 @end itemize
 283
 284 Besides this information from the CMUCL manual, there are a few other
 285 points to keep in mind.
 286
 287 @itemize
 288
 289 @item
 290 The CMUCL manual doesn't seem to state it explicitly, but Python has a
 291 mental block about type inference when assignment is involved. Python
 292 is very aggressive and clever about inferring the types of values
 293 bound with @code{let}, @code{let*}, inline function call, and so
 294 forth. However, it's much more passive and dumb about inferring the
 295 types of values assigned with @code{setq}, @code{setf}, and
 296 friends. It would be nice to fix this, but in the meantime don't
 297 expect that just because it's very smart about types in most respects
 298 it will be smart about types involved in assignments.  (This doesn't
 299 affect its ability to benefit from explicit type declarations
 300 involving the assigned variables, only its ability to get by without
 301 explicit type declarations.)
 302
 303 @c <!-- FIXME: Python dislikes assignments, but not in type
 304 @c     inference. The real problems are loop induction, closed over
 305 @c     variables and aliases. -->
 306
 307 @item
 308 Since the time the CMUCL manual was written, CMUCL (and thus SBCL) has
 309 gotten a generational garbage collector. This means that there are
 310 some efficiency implications of various patterns of memory usage which
 311 aren't discussed in the CMUCL manual. (Some new material should be
 312 written about this.)
 313
 314 @item
 315 SBCL has some important known efficiency problems.  Perhaps the most
 316 important are
 317
 318 @itemize @minus
 319
 320 @item
 321 The garbage collector is not particularly efficient, at least on
 322 platforms without the generational collector (as of SBCL 0.8.9, all
 323 except x86).
 324
 325 @item
 326 Various aspects of the PCL implementation of CLOS are more inefficient
 327 than necessary.
 328
 329 @end itemize
 330
 331 @end itemize
 332
 333 Finally, note that Common Lisp defines many constructs which, in
 334 the infamous phrase, ``could be compiled efficiently by a
 335 sufficiently smart compiler''. The phrase is infamous because
 336 making a compiler which actually is sufficiently smart to find all
 337 these optimizations systematically is well beyond the state of the art
 338 of current compiler technology. Instead, they're optimized on a
 339 case-by-case basis by hand-written code, or not optimized at all if
 340 the appropriate case hasn't been hand-coded. Some cases where no such
 341 hand-coding has been done as of SBCL version 0.6.3 include
 342
 343 @itemize
 344
 345 @item
 346 @code{(reduce #'f x)} where the type of @code{x} is known at compile
 347 time
 348
 349 @item
 350 various bit vector operations, e.g.  @code{(position 0
 351 some-bit-vector)}
 352
 353 @item
 354 specialized sequence idioms, e.g.  @code{(remove item list :count 1)}
 355
 356 @item
 357 cases where local compilation policy does not require excessive type
 358 checking, e.g.  @code{(locally (declare (safety 1)) (assoc item
 359 list))} (which currently performs safe @code{endp} checking internal
 360 to assoc).
 361
 362 @end itemize
 363
 364 If your system's performance is suffering because of some construct
 365 which could in principle be compiled efficiently, but which the SBCL
 366 compiler can't in practice compile efficiently, consider writing a
 367 patch to the compiler and submitting it for inclusion in the main
 368 sources. Such code is often reasonably straightforward to write;
 369 search the sources for the string ``@code{deftransform}'' to find many
 370 examples (some straightforward, some less so).