doc/manual/efficiency.texinfo

   1 @node Efficiency
   2 @comment  node-name,  next,  previous,  up
   3 @chapter Efficiency
   4 @cindex Efficiency
   5
   6 FIXME: The material in the CMUCL manual about getting good
   7 performance from the compiler should be reviewed, reformatted in
   8 Texinfo, lightly edited for SBCL, and substituted into this
   9 manual. In the meantime, the original CMUCL manual is still 95+%
  10 correct for the SBCL version of the Python compiler. See the
  11 sections
  12
  13 @itemize
  14 @item Advanced Compiler Use and Efficiency Hints
  15 @item Advanced Compiler Introduction
  16 @item More About Types in Python
  17 @item Type Inference
  18 @item Source Optimization
  19 @item Tail Recursion
  20 @item Local Call
  21 @item Block Compilation
  22 @item Inline Expansion
  23 @item Object Representation
  24 @item Numbers
  25 @item General Efficiency Hints
  26 @item Efficiency Notes
  27 @end itemize
  28
  29 Besides this information from the CMUCL manual, there are a few other
  30 points to keep in mind.
  31
  32 @itemize
  33
  34 @item
  35 The CMUCL manual doesn't seem to state it explicitly, but Python has a
  36 mental block about type inference when assignment is involved. Python
  37 is very aggressive and clever about inferring the types of values
  38 bound with @code{let}, @code{let*}, inline function call, and so
  39 forth. However, it's much more passive and dumb about inferring the
  40 types of values assigned with @code{setq}, @code{setf}, and
  41 friends. It would be nice to fix this, but in the meantime don't
  42 expect that just because it's very smart about types in most respects
  43 it will be smart about types involved in assignments.  (This doesn't
  44 affect its ability to benefit from explicit type declarations
  45 involving the assigned variables, only its ability to get by without
  46 explicit type declarations.)
  47
  48 @c <!-- FIXME: Python dislikes assignments, but not in type
  49 @c     inference. The real problems are loop induction, closed over
  50 @c     variables and aliases. -->
  51
  52 @item
  53 Since the time the CMUCL manual was written, CMUCL (and thus SBCL) has
  54 gotten a generational garbage collector. This means that there are
  55 some efficiency implications of various patterns of memory usage which
  56 aren't discussed in the CMUCL manual. (Some new material should be
  57 written about this.)
  58
  59 @item
  60 SBCL has some important known efficiency problems.  Perhaps the most
  61 important are
  62
  63 @itemize @minus
  64
  65 @item
  66 There is only limited support for the ANSI @code{dynamic-extent}
  67 declaration.  @xref{Dynamic-extent allocation}.
  68
  69 @item
  70 The garbage collector is not particularly efficient, at least on
  71 platforms without the generational collector (as of SBCL 0.8.9, all
  72 except x86).
  73
  74 @item
  75 Various aspects of the PCL implementation of CLOS are more inefficient
  76 than necessary.
  77
  78 @end itemize
  79
  80 @end itemize
  81
  82 Finally, note that Common Lisp defines many constructs which, in
  83 the infamous phrase, ``could be compiled efficiently by a
  84 sufficiently smart compiler''. The phrase is infamous because
  85 making a compiler which actually is sufficiently smart to find all
  86 these optimizations systematically is well beyond the state of the art
  87 of current compiler technology. Instead, they're optimized on a
  88 case-by-case basis by hand-written code, or not optimized at all if
  89 the appropriate case hasn't been hand-coded. Some cases where no such
  90 hand-coding has been done as of SBCL version 0.6.3 include
  91
  92 @itemize
  93
  94 @item
  95 @code{(reduce #'f x)} where the type of @code{x} is known at compile
  96 time
  97
  98 @item
  99 various bit vector operations, e.g.  @code{(position 0
 100 some-bit-vector)}
 101
 102 @item
 103 specialized sequence idioms, e.g.  @code{(remove item list :count 1)}
 104
 105 @item
 106 cases where local compilation policy does not require excessive type
 107 checking, e.g.  @code{(locally (declare (safety 1)) (assoc item
 108 list))} (which currently performs safe @code{endp} checking internal
 109 to assoc).
 110
 111 @end itemize
 112
 113 If your system's performance is suffering because of some construct
 114 which could in principle be compiled efficiently, but which the SBCL
 115 compiler can't in practice compile efficiently, consider writing a
 116 patch to the compiler and submitting it for inclusion in the main
 117 sources. Such code is often reasonably straightforward to write;
 118 search the sources for the string ``@code{deftransform}'' to find many
 119 examples (some straightforward, some less so).
 120
 121 @menu
 122 * Dynamic-extent allocation::
 123 * Modular arithmetic::
 124 @end menu
 125
 126 @node  Dynamic-extent allocation
 127 @comment  node-name,  next,  previous,  up
 128 @section Dynamic-extent allocation
 129 @cindex Dynamic-extent declaration
 130
 131 SBCL has limited support for performing allocation on the stack when a
 132 variable is declared @code{dynamic-extent}.  The @code{dynamic-extent}
 133 declarations are not verified, but are simply trusted; if the
 134 constraints in the Common Lisp standard are violated, the best that
 135 can happen is for the program to have garbage in variables and return
 136 values; more commonly, the system will crash.
 137
 138 As a consequence of this, the condition for performing stack
 139 allocation is stringent: either of the @code{speed} or @code{space}
 140 optimization qualities must be higher than the maximum of
 141 @code{safety} and @code{debug} at the point of the allocation.  For
 142 example:
 143
 144 @lisp
 145 (locally
 146   (declare (optimize speed (safety 1) (debug 1)))
 147   (defun foo (&rest rest)
 148     (declare (dynamic-extent rest))
 149     (length rest)))
 150 @end lisp
 151
 152 Here the @code{&rest} list will be allocated on the stack.  Note that
 153 it would not be in the following situation:
 154
 155 @lisp
 156 (defun foo (&rest rest)
 157   (declare (optimize speed (safety 1) (debug 1)))
 158   (declare (dynamic-extent rest))
 159   (length rest))
 160 @end lisp
 161
 162 because both the allocation of the @code{&rest} list and the variable
 163 binding are outside the scope of the @code{optimize} declaration.
 164
 165 There are many cases when @code{dynamic-extent} declarations could be
 166 useful. At present, SBCL implements
 167
 168 @itemize
 169
 170 @item
 171 Stack allocation of @code{&rest} lists, where these are declared
 172 @code{dynamic-extent}.
 173
 174 @item
 175 Stack allocation of @code{list} and @code{list*}, whose result is
 176 bound to a variable, declared @code{dynamic-extent}, such as
 177
 178 @lisp
 179 (let ((list (list 1 2 3)))
 180   (declare (dynamic-extent list)
 181   ...))
 182 @end lisp
 183
 184 or
 185
 186 @lisp
 187 (flet ((f (x)
 188          (declare (dynamic-extent x))
 189          ...))
 190   ...
 191   (f (list 1 2 3))
 192   ...)
 193 @end lisp
 194
 195 @item
 196 Stack allocation of simple forms of @code{make-array}, whose result is
 197 bound to a variable, declared @code{dynamic-extent}. The resulting
 198 array should be one-dimensional, the only allowed keyword argument is
 199 @code{:element-type}.
 200
 201 Notice, that stack space is limited, so allocation of a large vector
 202 may cause stack overflow and abnormal termination of the SBCL process.
 203
 204 @item
 205 Stack allocation of closures, defined with @code{flet} or
 206 @code{labels} with a bound declaration @code{dynamic-extent}.
 207 Closed-over variables, which are assigned (either inside or outside
 208 the closure) are still allocated on the heap. Blocks and tags are also
 209 allocated on the heap, unless all non-local control transfers to them
 210 are compiled with zero @code{safety}.
 211
 212 @end itemize
 213
 214 Future plans include
 215
 216 @itemize
 217
 218 @item
 219 Stack allocation of closures, where these are declared
 220 @code{dynamic-extent};
 221
 222 @item
 223 Stack allocation of @code{list}, @code{list*} and @code{cons}
 224 (including following chains during initialization, and also for
 225 binding mutation), where the allocation is declared
 226 @code{dynamic-extent};
 227
 228 @item
 229 Automatic detection of the common idiom of applying a function to some
 230 defaults and a @code{&rest} list, even when this is not declared
 231 @code{dynamic-extent};
 232
 233 @item
 234 Automatic detection of the common idiom of calling quantifiers with a
 235 closure, even when the closure is not declared @code{dynamic-extent}.
 236
 237 @end itemize
 238
 239 @node  Modular arithmetic
 240 @comment  node-name,  next,  previous,  up
 241 @section Modular arithmetic
 242 @cindex Modular arithmetic
 243 @cindex Arithmetic, modular
 244 @cindex Arithmetic, hardware
 245
 246 Some numeric functions have a property: @var{N} lower bits of the
 247 result depend only on @var{N} lower bits of (all or some)
 248 arguments. If the compiler sees an expression of form @code{(logand
 249 @var{exp} @var{mask})}, where @var{exp} is a tree of such ``good''
 250 functions and @var{mask} is known to be of type @code{(unsigned-byte
 251 @var{w})}, where @var{w} is a ``good'' width, all intermediate results
 252 will be cut to @var{w} bits (but it is not done for variables and
 253 constants!). This often results in an ability to use simple machine
 254 instructions for the functions.
 255
 256 Consider an example.
 257
 258 @lisp
 259 (defun i (x y)
 260   (declare (type (unsigned-byte 32) x y))
 261   (ldb (byte 32 0) (logxor x (lognot y))))
 262 @end lisp
 263
 264 The result of @code{(lognot y)} will be negative and of type
 265 @code{(signed-byte 33)}, so a naive implementation on a 32-bit
 266 platform is unable to use 32-bit arithmetic here. But modular
 267 arithmetic optimizer is able to do it: because the result is cut down
 268 to 32 bits, the compiler will replace @code{logxor} and @code{lognot}
 269 with versions cutting results to 32 bits, and because terminals
 270 (here---expressions @code{x} and @code{y}) are also of type
 271 @code{(unsigned-byte 32)}, 32-bit machine arithmetic can be used.
 272
 273 As of SBCL 0.8.5 ``good'' functions are @code{+}, @code{-};
 274 @code{logand}, @code{logior}, @code{logxor}, @code{lognot} and their
 275 combinations; and @code{ash} with the positive second
 276 argument. ``Good'' widths are 32 on HPPA, MIPS, PPC, Sparc and x86 and
 277 64 on Alpha.  While it is possible to support smaller widths as well,
 278 currently this is not implemented.