Efficiency</>

<para>FIXME: The material in the &CMUCL; manual about getting good
performance from the compiler should be reviewed, reformatted in
DocBook, lightly edited for &SBCL;, and substituted into this manual.
In the meantime, the original &CMUCL; manual is still 95+% correct for
the &SBCL; version of the &Python; compiler. See the sections
<itemizedlist>
<listitem><para>Advanced Compiler Use and Efficiency Hints</></>
<listitem><para>Advanced Compiler Introduction</></>
<listitem><para>More About Types in Python</></>
<listitem><para>Type Inference</></>
<listitem><para>Source Optimization</></>
<listitem><para>Tail Recursion</></>
<listitem><para>Local Call</></>
<listitem><para>Block Compilation</></>
<listitem><para>Inline Expansion</></>
<listitem><para>Object Representation</></>
<listitem><para>Numbers</></>
<listitem><para>General Efficiency Hints</></>
<listitem><para>Efficiency Notes</></>
</itemizedlist>
</para>

<para>Besides this information from the &CMUCL; manual, there are a
few other points to keep in mind.
<itemizedlist>
<listitem><para>The &CMUCL; manual doesn't state it explicitly, but
&Python; has a mental block about type inference when assignment is
involved. &Python; is very aggressive and clever about inferring the
types of values bound with <function>let</>, <function>let*</>, inline
function call, and so forth. However, it's much more passive and dumb
about inferring the types of values assigned with
<function>setq</>, <function>setf</>, and friends. It would be nice to
fix this, but in the meantime don't expect that just because the
compiler is very smart about types in most respects it will be smart
about types involved in assignments. (This doesn't affect its ability
to benefit from explicit type declarations involving the assigned
variables, only its ability to get by without explicit type
declarations.)</para></listitem>
<!-- FIXME: Python dislikes assignments, but not in type inference.
The real problems are loop induction, closed-over variables, and
aliases. -->
<listitem><para>Since the time the &CMUCL; manual was written, &CMUCL;
(and thus &SBCL;) has gotten a generational garbage collector. This
means that various patterns of memory usage have efficiency
implications which aren't discussed in the &CMUCL; manual. (Some new
material should be written about this.)</para></listitem>
<listitem><para>&SBCL; has some important known efficiency problems.
Perhaps the most important are
<itemizedlist>
<listitem><para>There is no support for the &ANSI;
<parameter>dynamic-extent</> declaration, not even for closures or
<parameter>&rest</> lists.</para></listitem>
<listitem><para>The garbage collector is not particularly
efficient.</para></listitem>
<listitem><para>Various aspects of the PCL implementation of CLOS are
less efficient than they need to be.</para></listitem>
</itemizedlist>
</para></listitem>
</itemizedlist>
</para>

<para>Finally, note that &CommonLisp; defines many constructs which,
in the infamous phrase, <quote>could be compiled efficiently by a
sufficiently smart compiler</quote>. The phrase is infamous because
making a compiler which actually is sufficiently smart to find all
these optimizations systematically is well beyond the state of the art
of current compiler technology. Instead, they're optimized on a
case-by-case basis by hand-written code, or not optimized at all if
the appropriate case hasn't been hand-coded. Some cases where no such
hand-coding has been done as of &SBCL; version 0.6.3 include
<itemizedlist>
<listitem><para><literal>(reduce #'f x)</> where the type of
<varname>x</> is known at compile time</para></listitem>
<listitem><para>various bit vector operations, e.g.
<literal>(position 0 some-bit-vector)</></para></listitem>
</itemizedlist>
If your system's performance is suffering because of some construct
which could in principle be compiled efficiently, but which the &SBCL;
compiler can't in practice compile efficiently, consider writing a
patch to the compiler and submitting it for inclusion in the main
sources. Such code is often reasonably straightforward to write;
search the sources for the string <quote><function>deftransform</></>
to find many examples (some straightforward, some less
so).</para>

<sect1 id="modular-arithmetic"><title>Modular arithmetic</>
<para>
Some numeric functions have the property that the <varname>N</> lower
bits of the result depend only on the <varname>N</> lower bits of (all
or some of) the arguments. If the compiler sees an expression of the
form <literal>(logand exp mask)</>, where <varname>exp</> is a tree of
such <quote>good</quote> functions and <varname>mask</> is known to be
of type <type>(unsigned-byte w)</>, where <varname>w</> is a
<quote>good</quote> width, all intermediate results will be cut to
<varname>w</> bits (but this is not done for variables and
constants!). This often makes it possible to implement these functions
with simple machine instructions.
</para>
<para>
Consider an example.
<programlisting>
(defun i (x y)
  (declare (type (unsigned-byte 32) x y))
  (ldb (byte 32 0) (logxor x (lognot y))))
</programlisting>
The result of <literal>(lognot y)</> will be negative and of type
<type>(signed-byte 33)</>, so a naive implementation on a 32-bit
platform is unable to use 32-bit arithmetic here. But the modular
arithmetic optimizer is able to do it: because the result is cut down
to 32 bits, the compiler will replace <function>logxor</> and
<function>lognot</> with versions that cut their results to 32 bits,
and because the terminals (here the expressions <literal>x</> and
<literal>y</>) are also of type <type>(unsigned-byte 32)</>, 32-bit
machine arithmetic can be used.
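</para>
<para>
As a sketch of the same idiom, the mask can also be written directly
with <function>logand</>. (The function name <function>add32</> below
is just an illustrative example, not something defined by &SBCL;.)
<programlisting>
;; The (logand ... #xffffffff) wrapper tells the compiler that only the
;; low 32 bits of the sum matter, so on a 32-bit platform it can use a
;; single machine addition instead of falling back to bignum arithmetic.
(defun add32 (x y)
  (declare (type (unsigned-byte 32) x y))
  (logand #xffffffff (+ x y)))
</programlisting>
Both parts matter here: the <function>logand</> mask triggers the
modular arithmetic optimization, and the <type>(unsigned-byte 32)</>
declarations let it reach all the way down to the terminals.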
</para>
<note><para>
As of &SBCL; 0.8.5, the <quote>good</quote> functions are
<function>+</> and <function>-</>; <function>logand</>,
<function>logior</>, <function>logxor</>, <function>lognot</>, and
their combinations; and <function>ash</> with a positive second
argument. The <quote>good</quote> widths are 32 on HPPA, MIPS, PPC,
Sparc and X86, and 64 on Alpha. While it would be possible to support
smaller widths as well, this is not currently implemented.
</para></note>
</sect1>

</chapter>