+(declaim (optimize speed))
+
+(defun cpu-test (n)
+  (let ((a 0))
+    (dotimes (i (expt 2 n) a)
+      (setf a (logxor a
+                      (* i 5)
+                      (+ a i))))))
+
+;;;; CPU profiling
+
+;;; Take up to 1000 samples of running (CPU-TEST 26), and give a flat
+;;; table report at the end. Profiling will end as soon as the body has
+;;; been evaluated once, whether or not 1000 samples have been taken.
+(sb-sprof:with-profiling (:max-samples 1000
+                          :report :flat
+                          :loop nil)
+  (cpu-test 26))
+
+;;; Take 1000 samples of running (CPU-TEST 24), and give a flat
+;;; table report at the end. The body will be re-evaluated in a loop
+;;; until 1000 samples have been taken. A sample count will be printed
+;;; after each iteration.
+(sb-sprof:with-profiling (:max-samples 1000
+                          :report :flat
+                          :loop t
+                          :show-progress t)
+  (cpu-test 24))
+
+;;;; Allocation profiling
+
+(defun foo (&rest args)
+  (mapcar (lambda (x) (float x 1d0)) args))
+
+(defun bar (n)
+  (declare (fixnum n))
+  (apply #'foo (loop repeat n collect n)))
+
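+;;; Profile (BAR 1000) in allocation mode, taking up to 10000 samples,
+;;; and give a flat table report at the end.
+(sb-sprof:with-profiling (:max-samples 10000
+                          :mode :alloc
+                          :report :flat)
+  (bar 1000))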
+@end lisp
+
+@subsection Output
+
+The flat report format shows a table of all functions that the
+profiler encountered on the call stack during sampling, ordered by the
+number of samples taken while directly executing that function (the
+Self count).
+
+@lisp
+           Self        Total        Cumul
+  Nr  Count     %  Count     %  Count     %    Function
+------------------------------------------------------------------------
+   1    165  38.3    165  38.3    165  38.3    SB-KERNEL:TWO-ARG-XOR
+   2    141  32.7    141  32.7    306  71.0    SB-VM::GENERIC-+
+   3     67  15.5    145  33.6    373  86.5    CPU-TEST-2
+@end lisp
+
+For each function, the table shows three sample counts, each given as
+an absolute number and as a percentage. The Self column shows samples
+taken while directly executing that function. The Total column shows
+samples taken while executing that function or functions called from it
+(sampled to a platform-specific depth). The Cumul column shows the sum
+of all Self columns up to and including that line in the table.
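+
+As a small worked check, the Cumul column of the sample table above can
+be reproduced from its Self column with a running sum. This is only an
+illustrative sketch: the list of counts is copied by hand from the table
+and is not produced by any profiler function.
+
+@lisp
+;; Self counts copied from the sample report above.
+(loop for self in '(165 141 67)
+      sum self into cumul
+      collect cumul)
+;; => (165 306 373)
+@end lisp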
+
+The profiler also hooks into the disassembler so that instructions that
+have been sampled are annotated with their relative sampling
+frequency. This information is not stored across different sampling
+runs.
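+
+A minimal sketch of how this might be used, assuming @code{CPU-TEST}
+from the example above is defined and is profiled in the current
+session: calling the standard @code{DISASSEMBLE} function on it
+afterwards should print the annotated listing. The annotation format is
+not shown here, since it depends on the platform and SBCL version.
+
+@lisp
+;; Profile first, so that there are samples to annotate with.
+(sb-sprof:with-profiling (:max-samples 1000
+                          :report :flat)
+  (cpu-test 26))
+
+;; Then disassemble the profiled function in the same session.
+(disassemble 'cpu-test)
+@end lisp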