doc/manual/efficiency.texinfo

   1 @node Efficiency
   2 @comment  node-name,  next,  previous,  up
   3 @chapter Efficiency
   4
   5 FIXME: The material in the CMUCL manual about getting good
   6 performance from the compiler should be reviewed, reformatted in
   7 Texinfo, lightly edited for SBCL, and substituted into this
   8 manual. In the meantime, the original CMUCL manual is still 95+%
   9 correct for the SBCL version of the Python compiler. See the
  10 sections
  11
  12 @itemize
  13 @item Advanced Compiler Use and Efficiency Hints
  14 @item Advanced Compiler Introduction
  15 @item More About Types in Python
  16 @item Type Inference
  17 @item Source Optimization
  18 @item Tail Recursion
  19 @item Local Call
  20 @item Block Compilation
  21 @item Inline Expansion
  22 @item Object Representation
  23 @item Numbers
  24 @item General Efficiency Hints
  25 @item Efficiency Notes
  26 @end itemize
  27
  28 Besides this information from the CMUCL manual, there are a few other
  29 points to keep in mind.
  30
  31 @itemize
  32
  33 @item
  34 The CMUCL manual doesn't seem to state it explicitly, but Python has a
  35 mental block about type inference when assignment is involved. Python
  36 is very aggressive and clever about inferring the types of values
  37 bound with @code{let}, @code{let*}, inline function call, and so
  38 forth. However, it's much more passive and dumb about inferring the
  39 types of values assigned with @code{setq}, @code{setf}, and
  40 friends. It would be nice to fix this, but in the meantime don't
  41 expect that just because it's very smart about types in most respects
  42 it will be smart about types involved in assignments.  (This doesn't
  43 affect its ability to benefit from explicit type declarations
  44 involving the assigned variables, only its ability to get by without
  45 explicit type declarations.)
  46
  47 @c <!-- FIXME: Python dislikes assignments, but not in type
  48 @c     inference. The real problems are loop induction, closed over
  49 @c     variables and aliases. -->
  50
  51 @item
  52 Since the time the CMUCL manual was written, CMUCL (and thus SBCL) has
  53 gotten a generational garbage collector. This means that there are
  54 some efficiency implications of various patterns of memory usage which
  55 aren't discussed in the CMUCL manual. (Some new material should be
  56 written about this.)
  57
  58 @item
  59 SBCL has some important known efficiency problems.  Perhaps the most
  60 important are
  61
  62 @itemize @minus
  63
  64 @item
  65 There is only limited support for the ANSI @code{dynamic-extent}
  66 declaration.  @xref{Dynamic-extent allocation}.
  67
  68 @item
  69 The garbage collector is not particularly efficient, at least on
  70 platforms without the generational collector (as of SBCL 0.8.9, all
  71 except x86).
  72
  73 @item
  74 Various aspects of the PCL implementation of CLOS are more inefficient
  75 than necessary.
  76
  77 @end itemize
  78
  79 @end itemize
  80
  81 Finally, note that Common Lisp defines many constructs which, in
  82 the infamous phrase, ``could be compiled efficiently by a
  83 sufficiently smart compiler''. The phrase is infamous because
  84 making a compiler which actually is sufficiently smart to find all
  85 these optimizations systematically is well beyond the state of the art
  86 of current compiler technology. Instead, they're optimized on a
  87 case-by-case basis by hand-written code, or not optimized at all if
  88 the appropriate case hasn't been hand-coded. Some cases where no such
  89 hand-coding has been done as of SBCL version 0.6.3 include
  90
  91 @itemize
  92
  93 @item
  94 @code{(reduce #'f x)} where the type of @code{x} is known at compile
  95 time
  96
  97 @item
  98 various bit vector operations, e.g.  @code{(position 0
  99 some-bit-vector)}
 100
 101 @item
 102 specialized sequence idioms, e.g.  @code{(remove item list :count 1)}
 103
 104 @item
 105 cases where local compilation policy does not require excessive type
 106 checking, e.g.  @code{(locally (declare (safety 1)) (assoc item
 107 list))} (which currently performs safe @code{endp} checking internal
 108 to assoc).
 109
 110 @end itemize
 111
 112 If your system's performance is suffering because of some construct
 113 which could in principle be compiled efficiently, but which the SBCL
 114 compiler can't in practice compile efficiently, consider writing a
 115 patch to the compiler and submitting it for inclusion in the main
 116 sources. Such code is often reasonably straightforward to write;
 117 search the sources for the string ``@code{deftransform}'' to find many
 118 examples (some straightforward, some less so).
 119
 120 @menu
 121 * Dynamic-extent allocation::
 122 * Modular arithmetic::
 123 @end menu
 124
 125 @node  Dynamic-extent allocation
 126 @comment  node-name,  next,  previous,  up
 127 @section Dynamic-extent allocation
 128 @cindex Dynamic-extent declaration
 129
 130 SBCL has limited support for performing allocation on the stack when a
 131 variable is declared @code{dynamic-extent}.  The @code{dynamic-extent}
 132 declarations are not verified, but are simply trusted; if the
 133 constraints in the Common Lisp standard are violated, the best that
 134 can happen is for the program to have garbage in variables and return
 135 values; more commonly, the system will crash.
 136
 137 As a consequence of this, the condition for performing stack
 138 allocation is stringent: either of the @code{speed} or @code{space}
 139 optimization qualities must be higher than the maximum of
 140 @code{safety} and @code{debug} at the point of the allocation.  For
 141 example:
 142
 143 @lisp
 144 (locally
 145   (declare (optimize speed (safety 1) (debug 1)))
 146   (defun foo (&rest rest)
 147     (declare (dynamic-extent rest))
 148     (length rest)))
 149 @end lisp
 150
 151 Here the @code{&rest} list will be allocated on the stack.  Note that
 152 it would not be in the following situation:
 153
 154 @lisp
 155 (defun foo (&rest rest)
 156   (declare (optimize speed (safety 1) (debug 1)))
 157   (declare (dynamic-extent rest))
 158   (length rest))
 159 @end lisp
 160
 161 because both the allocation of the @code{&rest} list and the variable
 162 binding are outside the scope of the @code{optimize} declaration.
 163
 164 There are many cases when dynamic-extent declarations could be useful.
 165 At present, SBCL implements
 166
 167 @itemize
 168
 169 @item
 170 Stack allocation of @code{&rest} lists, where these are declared
 171 @code{dynamic-extent}.
 172
 173 @end itemize
 174
 175 Future plans include
 176
 177 @itemize
 178
 179 @item
 180 Stack allocation of closures, where these are declared
 181 @code{dynamic-extent};
 182
 183 @item
 184 Stack allocation of @code{list}, @code{list*} and @code{cons}
 185 (including following chains during initialization, and also for
 186 binding mutation), where the allocation is declared
 187 @code{dynamic-extent};
 188
 189 @item
 190 Automatic detection of the common idiom of applying a function to some
 191 defaults and a @code{&rest} list, even when this is not declared
 192 @code{dynamic-extent};
 193
 194 @item
 195 Automatic detection of the common idiom of calling quantifiers with a
 196 closure, even when the closure is not declared @code{dynamic-extent}.
 197
 198 @end itemize
 199
 200 @node  Modular arithmetic
 201 @comment  node-name,  next,  previous,  up
 202 @section Modular arithmetic
 203 @cindex Modular arithmetic
 204 @cindex Arithmetic, modular
 205 @cindex Arithmetic, hardware
 206
 207 Some numeric functions have a property: @var{N} lower bits of the
 208 result depend only on @var{N} lower bits of (all or some)
 209 arguments. If the compiler sees an expression of form @code{(logand
 210 @var{exp} @var{mask})}, where @var{exp} is a tree of such ``good''
 211 functions and @var{mask} is known to be of type @code{(unsigned-byte
 212 @var{w})}, where @var{w} is a ``good'' width, all intermediate results
 213 will be cut to @var{w} bits (but it is not done for variables and
 214 constants!). This often results in an ability to use simple machine
 215 instructions for the functions.
 216
 217 Consider an example.
 218
 219 @lisp
 220 (defun i (x y)
 221   (declare (type (unsigned-byte 32) x y))
 222   (ldb (byte 32 0) (logxor x (lognot y))))
 223 @end lisp
 224
 225 The result of @code{(lognot y)} will be negative and of type
 226 @code{(signed-byte 33)}, so a naive implementation on a 32-bit
 227 platform is unable to use 32-bit arithmetic here. But modular
 228 arithmetic optimizer is able to do it: because the result is cut down
 229 to 32 bits, the compiler will replace @code{logxor} and @code{lognot}
 230 with versions cutting results to 32 bits, and because terminals
 231 (here---expressions @code{x} and @code{y}) are also of type
 232 @code{(unsigned-byte 32)}, 32-bit machine arithmetic can be used.
 233
 234 As of SBCL 0.8.5 ``good'' functions are @code{+}, @code{-};
 235 @code{logand}, @code{logior}, @code{logxor}, @code{lognot} and their
 236 combinations; and @code{ash} with the positive second
 237 argument. ``Good'' widths are 32 on HPPA, MIPS, PPC, Sparc and x86 and
 238 64 on Alpha.  While it is possible to support smaller widths as well,
 239 currently this is not implemented.