X-Git-Url: http://repo.macrolet.net/gitweb/?a=blobdiff_plain;f=OPTIMIZATIONS;h=0c8e790a86ae463ece5795b9fcaac1d1312cf8b3;hb=d771d9b799b609e7b75b799d2879b4fda1ba8fa7;hp=b0c8f47fe343d475fa73a9392c955dd31969ca9c;hpb=8902b8b6bd2e9285749dd39d313b33b6c69c5213;p=sbcl.git diff --git a/OPTIMIZATIONS b/OPTIMIZATIONS index b0c8f47..0c8e790 100644 --- a/OPTIMIZATIONS +++ b/OPTIMIZATIONS @@ -62,34 +62,6 @@ VOP DATA-VECTOR-SET/SIMPLE-STRING V2!14[EDI] t32[EAX] t30[S2]>t33[CL] * And why two moves? -------------------------------------------------------------------------------- -#6 -09:49:05 I have found a case in those where suboptimal code is - generate with nested loops, it might be moderately easy to fix that -09:49:28 see - http://www.bagley.org/~doug/shootout/bench/nestedloop/nestedloop.cmucl -09:50:30 if you add declarations to dotimes, generated code is - almost optimal, but most inner loops run out of registers and use - memory location for iteration variable - -;;; -*- mode: lisp -*- -;;; http://www.bagley.org/~doug/shootout/ -;;; from Friedrich Dominicus - -(defun main () - (let ((n (parse-integer (or (car (last extensions:*command-line-strings*)) "1"))) - (x 0)) - (declare (fixnum n) - (fixnum x) - (optimize (speed 3) (debug 0) (safety 0))) - (dotimes (a n) - (dotimes (b n) - (dotimes (c n) - (dotimes (d n) - (dotimes (e n) - (dotimes (f n) - (incf x))))))) - (format t "~A~%" x))) --------------------------------------------------------------------------------- #8 (defun foo (d) (declare (optimize (speed 3) (safety 0) (debug 0))) @@ -215,9 +187,9 @@ stack-allocation of structures is impossible. -------------------------------------------------------------------------------- #21 (defun-with-dx foo () - (let ((dx (list (list 1 2) (list 3 4) + (let ((dx (list (list 1 2) (list 3 4)))) (declare (dynamic-extent dx)) - ...))))) + ...)) External list in DX is allocated on stack, but internal are not. -------------------------------------------------------------------------------- @@ -231,3 +203,80 @@ Python does not know that &REST lists are LISTs (and cannot derive it). a. Iterations on &REST lists, returning them as VALUES could be rewritten with &MORE vectors. b. Implement local unknown-values mv-call (useful for fast type checking). +-------------------------------------------------------------------------------- +#26 +SBCL cannot derive upper bound for I and uses generic arithmetic here: + +(defun foo (l) + (declare (vector l)) + (dotimes (i (length l)) + (if (block nil + (map-foo (lambda (x) (if x (return t))) + l)) + t + nil))) + +(So the constraint propagator or a possible future SSA-convertor +should know the connection between an NLE and its CLEANUP.) +-------------------------------------------------------------------------------- +#27 +Initialization of stack-allocated arrays is inefficient: we always +fill the vector with zeroes, even when it is not needed (as for +platforms with conservative GC or for arrays of unboxed objectes) and +is performed later explicitely. +-------------------------------------------------------------------------------- +#28 +a. Accessing raw slots in structure instances is more inefficient than +it could be; if we placed raw slots before the header word, we would +not need to do arithmetic at runtime to access them. (But beware: +this would complicate handling of the interior pointer). + +b. (Also note that raw slots are currently disabled on HPPA) +-------------------------------------------------------------------------------- +#29 +Python is overly zealous when converting high-level CL functions, such +as MIN/MAX, LOGBITP, and LOGTEST, to low-level CL functions. Reducing +Python's aggressiveness would make it easier to effect changes such as + +x86-64: +* direct MIN/MAX on {SINGLE,DOUBLE}-FLOATs ({MIN,MAX}S{S,D}) + +x86{,-64}: +* direct LOGBITP on word-sized integers and fixnums (BT + JC) + +x86{,-64}/PPC: +* branch-free MIN/MAX on word-sized integers and fixnums +* efficient LOGTESTs on word-sized integers and fixnums (TEST / AND.) + +PPC: +* efficient LDB on word-sized integers and fixnums (RLWINM) + +etc., etc. + +The "easier" part claimed above would come about because the functions +would be available for :TRANSLATE through a VOP or similar, whereas with +the current architecture, one would have to pattern-match IR1. While +IR1 pattern-matching would be useful in other contexts, it seems better +here to attempt the direct :TRANSLATE route. + +I (NJF) don't know how to implement such architecture-specific +optimizations whilst keeping the high->low transformations for other +architectures. Certainly adding #!+/- magic in compiler/*.lisp could be +done, but such a solution is somewhat inelegant. Moving the relevant +DEFTRANSFORMs to the architecture-specific compiler/* areas is also +possible, but that would duplicate quite a bit of code. +-------------------------------------------------------------------------------- +#30 +(defun foo (x y) + (< x y)) + +FOO's IR1 representation is roughly: + +(defun foo (x y) + (if (< x y) + T + NIL)) + +However, if a full call is generated for < (and similarly for other +predicate functions), then the IF is unnecessary, since the return value +of (< x y) is already T or NIL.