From 1c74f342b23aafaa8f514112c9bcca7526e07a58 Mon Sep 17 00:00:00 2001
From: Lutz Euler
Date: Mon, 10 Jun 2013 13:44:20 +0200
Subject: [PATCH] Micro-optimize DOUBLE-FLOAT-LOW-BITS on x86-64.

Instead of loading a 64-bit register from memory and zeroing the upper
32 bits of it by the sequence SHL reg, 32; SHR reg, 32 simply load the
corresponding 32-bit register from memory, relying on the implicit
zero-extension to 64 bits this does. This is smaller and faster.

For example, if the input to the VOP is a descriptor register, the old
instruction sequence is:

  MOV RDX, [RDX-7]
  SHL RDX, 32
  SHR RDX, 32

and the new one:

  MOV EDX, [RDX-7]

Regarding store-to-load forwarding this change should make no
difference: Most current processors can forward a 64-bit store to a
32-bit load from the same address. The exception is Intel's Atom which
can forward only to a load of the same size as the store; but it also
supports this only between integer registers, and DOUBLE-FLOAT-LOW-BITS
mostly or even always acts on memory slots written from an XMM register
(of the three storage classes it supports as input, for the first it
does the store itself from an XMM register; for the other two I have
investigated some disassemblies and always found the prior store to be
from an XMM register).
---
 NEWS                           |    1 +
 src/compiler/x86-64/float.lisp |   26 +++++++++++++++-----------
 2 files changed, 16 insertions(+), 11 deletions(-)

diff --git a/NEWS b/NEWS
index c39179a..531f156 100644
--- a/NEWS
+++ b/NEWS
@@ -18,6 +18,7 @@ changes relative to sbcl-1.1.8:
   * optimization: comparisons between rationals and constant floats or between
     integers and constant ratios are now converted to rationals/integers at
     compile time.
+  * optimization: Smaller and faster DOUBLE-FLOAT-LOW-BITS on x86-64.
   * bug fix: problems with NCONC type derivation (reported by Jerry James).
   * bug fix: EXPT type derivation no longer constructs bogus floating-point
     types. (reported by Vsevolod Dyomkin)
diff --git a/src/compiler/x86-64/float.lisp b/src/compiler/x86-64/float.lisp
index 0a03d50..397ffdf 100644
--- a/src/compiler/x86-64/float.lisp
+++ b/src/compiler/x86-64/float.lisp
@@ -1266,17 +1266,21 @@
   (:policy :fast-safe)
   (:vop-var vop)
   (:generator 5
-     (sc-case float
-       (double-reg
-        (inst movsd temp float)
-        (move lo-bits temp))
-       (double-stack
-        (loadw lo-bits ebp-tn (frame-word-offset (tn-offset float))))
-       (descriptor-reg
-        (loadw lo-bits float double-float-value-slot
-               other-pointer-lowtag)))
-     (inst shl lo-bits 32)
-     (inst shr lo-bits 32)))
+    (let ((dword-lo-bits (reg-in-size lo-bits :dword)))
+      (sc-case float
+        (double-reg
+         (inst movsd temp float)
+         (inst mov dword-lo-bits
+               (make-ea :dword :base rbp-tn
+                        :disp (frame-byte-offset (tn-offset temp)))))
+        (double-stack
+         (inst mov dword-lo-bits
+               (make-ea :dword :base rbp-tn
+                        :disp (frame-byte-offset (tn-offset float)))))
+        (descriptor-reg
+         (inst mov dword-lo-bits
+               (make-ea-for-object-slot-half float double-float-value-slot
+                                             other-pointer-lowtag)))))))
-- 
1.7.10.4
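
The equivalence the commit message relies on (SHL reg, 32; SHR reg, 32 on a
64-bit register leaves the same value as a zero-extending 32-bit load) can be
checked at the integer level. The following is a minimal sketch in portable
Common Lisp, not code from the patch: LOW-BITS-BY-SHIFTS and
LOW-BITS-BY-DWORD-LOAD are hypothetical helpers that model the register
contents as plain integers, assuming a 64-bit register.

```lisp
;; Illustrative model only. These helper names are hypothetical and are not
;; part of SBCL or of this patch; registers are modelled as integers.

(defun low-bits-by-shifts (word)
  "Model of the old sequence: SHL reg, 32 followed by SHR reg, 32."
  ;; SHL drops everything shifted past bit 63 of the 64-bit register.
  (let ((shifted-left (ldb (byte 64 0) (ash word 32))))
    ;; SHR shifts zeros in from the top, leaving the original low 32 bits.
    (ash shifted-left -32)))

(defun low-bits-by-dword-load (word)
  "Model of the new sequence: a 32-bit load, zero-extended to 64 bits."
  (ldb (byte 32 0) word))

;; #x400921FB54442D18 is the IEEE 754 bit pattern of the double-float
;; closest to pi; both models yield its low word, #x54442D18.
(let ((word #x400921FB54442D18))
  (list (low-bits-by-shifts word)
        (low-bits-by-dword-load word)))
```

Both functions agree for every 64-bit input, which is why the single MOV into
the 32-bit register can replace the three-instruction sequence.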