@node Calling Convention
@comment  node-name,  next,  previous,  up
@chapter Calling Convention

@menu
* Assembly Routines::
* Local Calls::
* Full Calls::
* Unknown-Values Returns::
* IR2 Conversion::
* Additional Notes::
@end menu

The calling convention used within Lisp code on SBCL/x86 was, for the
longest time, really bad. If it weren't for the fact that it predates
modern x86 CPUs, one might almost believe it to have been designed
explicitly to defeat the branch-prediction hardware therein. This
chapter is something of a brain-dump of information that might be
useful when attempting to improve the situation further, mostly
written immediately after having made a dent in the problem.

Assumptions about the calling convention are embedded throughout the
system. The runtime knows how to call into Lisp and receive a value
from Lisp, the assembly-routines have intimate knowledge of which
registers are involved in a call situation,
@file{src/compiler/target/call.lisp} contains the VOPs involved in
implementing function call/return, and
@file{src/compiler/ir2tran.lisp} has assumptions about frame
allocation and argument/return-value passing locations.

Note that most of this documentation also applies to other CPUs,
modulo the actual registers involved, the displacement used in the
single-value return convention, and the fact that they use the ``old''
convention anywhere it is mentioned.


@node Assembly Routines
@comment  node-name,  next,  previous,  up
@section Assembly Routines

@example
;;; The :full-call assembly-routines must use the same full-call
;;; unknown-values return convention as a normal call, as some
;;; of the routines will tail-chain to a static-function. The
;;; routines themselves, however, take all of their arguments
;;; in registers (this will typically be one or two arguments,
;;; and is one of the lower bounds on the number of argument-
;;; passing registers), and thus don't need a call frame, which
;;; simplifies things for the normal call/return case. When it
;;; is necessary for one of the assembly-functions to call a
;;; static-function it will construct the required call frame.
;;; Also, none of the assembly-routines return other than one
;;; value, which again simplifies the return path.
;;;    -- AB, 2006/Feb/05.
@end example

There are a couple of assembly-routines that implement parts of the
process of returning or tail-calling with a variable number of values.
These are @code{return-multiple} and @code{tail-call-variable} in
@file{src/assembly/x86/assem-rtns.lisp}. They have their own calling
convention for invocation from a VOP, but implement various block-move
operations on the stack contents followed by a return or tail-call
operation.
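
As a rough illustration of the kind of block move these routines
perform, the following Python sketch models the stack as a list of
words and slides a run of values from one place to another. The
function name @code{block_move} and the list-based stack are
hypothetical; the real routines are hand-written assembly. Since the
source and destination regions can overlap, the sketch picks the copy
direction the way @code{memmove} does.

```python
def block_move(stack, src, dst, count):
    """Copy stack[src:src+count] to stack[dst:dst+count], overlap-safe.

    Hypothetical model: `stack` is a Python list of words and the
    indices are word offsets rather than machine addresses.
    """
    if dst < src:
        # Destination lies below the source: copy front to back.
        for i in range(count):
            stack[dst + i] = stack[src + i]
    else:
        # Destination lies at or above the source: copy back to front so
        # every value is read before it can be overwritten.
        for i in reversed(range(count)):
            stack[dst + i] = stack[src + i]
    return stack


# Five values at word offsets 4..8 slide down to offsets 2..6.
print(block_move(list(range(10)), 4, 2, 5))
# [0, 1, 4, 5, 6, 7, 8, 7, 8, 9]
```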

That's about all I have to say about the assembly-routines.


@node Local Calls
@comment  node-name,  next,  previous,  up
@section Local Calls

Calls within a block, whatever a block is, can use a local calling
convention in which the compiler knows where all of the values are to
be stored, and thus can elide the check for the number of return
values, stack-pointer restoration, etc. Alternatively, they can use the
full unknown-values return convention while trying to short-circuit
the call convention. There is probably some low-hanging fruit here in
terms of CPU branch-prediction.
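
The difference between the two return conventions can be sketched in
Python. This is a model, not SBCL code: a frame is a list, a value
``location'' is a list index, @code{None} stands in for @code{NIL},
and both function names are hypothetical.

```python
NIL = None  # stand-in for Lisp NIL in this model


def known_values_return(frame, result_slots, values):
    # Known values: caller and callee were compiled together, so the callee
    # stores each value straight into the slot the caller picked.  Nothing
    # is counted or checked at run time.
    for slot, value in zip(result_slots, values):
        frame[slot] = value
    return frame


def unknown_values_receive(count, values, wanted):
    # Unknown values: the caller learns `count` only at run time, so it must
    # drop surplus values and default missing ones to NIL.
    received = list(values[:count])[:wanted]
    return received + [NIL] * (wanted - len(received))


print(known_values_return([0, 0, 0, 0], [1, 3], ["a", "b"]))  # [0, 'a', 0, 'b']
print(unknown_values_receive(1, [42], 3))                     # [42, None, None]
```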

The local (known-values) calling convention is implemented by the
@code{known-call-local} and @code{known-return} VOPs.

Local unknown-values calls are handled at the call site by the
@code{call-local} and @code{multiple-call-local} VOPs. The main
difference between the full call and local call protocols here is that
local calls use a different frame setup protocol, and will tend not to
use the normal frame layout for the old frame-pointer and
return-address.


@node Full Calls
@comment  node-name,  next,  previous,  up
@section Full Calls

@example
;;; There is something of a cross-product effect with full calls.
;;; Different versions are used depending on whether we know the
;;; number of arguments or the name of the called function, and
;;; whether we want fixed values, unknown values, or a tail call.
;;;
;;; In full call, the arguments are passed by creating a partial frame
;;; on the stack top and storing stack arguments into that frame. On
;;; entry to the callee, this partial frame is pointed to by FP.
@end example

Basically, we use caller-allocated frames, pass an fdefinition,
function, or closure in @code{EAX}, the argument count in @code{ECX},
and the first three arguments in @code{EDX}, @code{EDI}, and
@code{ESI}. @code{EBP} points to just past the start of the frame (the
first frame slot is at @code{[EBP-4]}, not the traditional
@code{[EBP]}, due in part to how the frame allocation works). The
caller stores the link for the old frame at @code{[EBP-4]} and
reserves space for a return address at @code{[EBP-8]}.
@code{[EBP-12]} appears to be an empty slot that conveniently makes
just enough space for the first three multiple return values (returned
in the argument-passing registers) to be written over the beginning of
the frame by the receiver. The first stack argument is at
@code{[EBP-16]}. The callee then reallocates the frame to include
sufficient space for its local variables, after possibly converting
any @code{&rest} arguments to a proper list.
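
The locations described above can be tabulated with a short Python
sketch of the pre-1.0.27 32-bit x86 layout. The register assignment
and the offsets of slots 0 through 3 come from the text. The rule
that later stack arguments occupy successive slots further down
(@code{[EBP-20]}, @code{[EBP-24]}, and so on) is an assumption, and
@code{arg_location} is a hypothetical helper.

```python
WORD = 4                               # bytes per word on 32-bit x86
ARG_REGISTERS = ("EDX", "EDI", "ESI")  # first three arguments


def frame_slot_offset(slot):
    # Slot n lives at [EBP - 4*(n+1)]: slot 0 is the old-frame link at
    # [EBP-4], slot 1 the return address at [EBP-8], slot 2 the empty slot
    # at [EBP-12].
    return -WORD * (slot + 1)


def arg_location(i):
    """Where argument i (0-based) travels in a full call, as a string."""
    if i < len(ARG_REGISTERS):
        return ARG_REGISTERS[i]
    # Assumption: stack argument i occupies frame slot i, so argument 3
    # (the first stack argument) lands at [EBP-16].
    return "[EBP%d]" % frame_slot_offset(i)


print([arg_location(i) for i in range(5)])
# ['EDX', 'EDI', 'ESI', '[EBP-16]', '[EBP-20]']
```

In this model, there are exactly as many header slots (old-frame link,
return address, empty slot) as there are register arguments. This
matches the remark that the empty slot leaves room for the first three
return values.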

The above scheme was changed in 1.0.27 on x86 and x86-64 by swapping
the old frame pointer with the return address and making @code{EBP}
point two words later.

On x86/x86-64 the stack now looks like this (stack grows downwards):

@verbatim
----------
RETURN PC
----------
OLD FP
---------- <- FP points here
EMPTY SLOT
----------
FIRST ARG
----------
@end verbatim

This is just as if the function had been @code{CALL}ed and upon entry
executed the standard prologue: @code{PUSH EBP; MOV EBP, ESP}. On other
architectures the stack looks like this (stack grows upwards):

@verbatim
----------
FIRST ARG
----------
EMPTY SLOT
----------
RETURN PC
----------
OLD FP
---------- <- FP points here
@end verbatim
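
To see why the new x86/x86-64 layout matches a conventional frame, the
following Python sketch replays @code{CALL} followed by
@code{PUSH EBP; MOV EBP, ESP} on a modelled 32-bit stack. The
@code{Stack} class and the addresses are made up for illustration;
only the resulting offsets matter.

```python
class Stack:
    """Tiny model of a downward-growing 32-bit x86 stack (hypothetical)."""

    WORD = 4

    def __init__(self, esp, ebp):
        self.mem = dict()  # address -> word
        self.esp = esp
        self.ebp = ebp

    def push(self, value):
        # PUSH: decrement ESP by one word, then store the value at [ESP].
        self.esp -= self.WORD
        self.mem[self.esp] = value

    def call(self, return_pc):
        # CALL pushes the return address before transferring control.
        self.push(return_pc)

    def prologue(self):
        # Standard prologue: PUSH EBP; MOV EBP, ESP.
        self.push(self.ebp)
        self.ebp = self.esp


s = Stack(esp=0x1000, ebp=0x2000)
s.call(return_pc=0xCAFE)
s.prologue()
print(hex(s.mem[s.ebp]))               # [EBP] holds the old FP: 0x2000
print(hex(s.mem[s.ebp + Stack.WORD]))  # [EBP+4] holds the return PC: 0xcafe
```

Reading the diagram with 32-bit words, the caller-allocated empty slot
and first stack argument then sit just below this point, at
@code{[EBP-4]} and @code{[EBP-8]}. The simulation itself does not
model those slots.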


@node Unknown-Values Returns
@comment  node-name,  next,  previous,  up
@section Unknown-Values Returns

The unknown-values return convention consists of two parts. The first
part is that of returning a single value. The second is that of
returning any other number of values. We also changed the convention
in 0.9.10, so we should describe both the old and new versions. The
three interesting VOPs here are @code{return-single}, @code{return},
and @code{return-multiple}.

For a single-value return, we load the return value in the first
argument-passing register (@code{A0}, or @code{EDX}), reload the old
frame pointer, burn the stack frame, and return. The old convention
was to increment the return address by two before returning, typically
via a @code{JMP}, which was guaranteed to screw up branch-prediction
hardware. The new convention is to return with the carry flag clear.
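
Here is a minimal Python model of what the call site observes after a
single-value return under each convention. The function name and the
tuple it returns are hypothetical; @code{A0} is the first
argument-passing register as named above.

```python
def single_value_return(value, return_address, old_convention):
    """What the call site sees after a single-value return (model only).

    Returns (registers, resume_address, carry_flag); carry_flag is None
    when the convention does not use it.
    """
    registers = dict(A0=value)  # the value travels in the first arg register
    if old_convention:
        # Old: resume two bytes past the return address, skipping the
        # two-byte instruction placed there for multiple-value returns.
        return registers, return_address + 2, None
    # New: an ordinary return to the return address, with the carry flag clear.
    return registers, return_address, False


regs, resume, carry = single_value_return(42, 0x100, old_convention=True)
print(regs["A0"], hex(resume), carry)  # 42 0x102 None
```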

For a multiple-value return, we pass the first three values in the
argument-passing registers, and the remainder on the stack. @code{ECX}
contains the total number of values as a fixnum, @code{EBX} points to
where the callee frame was, @code{EBP} has been restored to point to
the caller frame, and the first of the values on the stack (the fourth
overall) is at @code{[EBP-16]}. The old convention was just to jump to
the return address at this point. The new one has us set the carry
flag first.
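
The register and stack state at the moment of a multiple-value return
can be sketched as follows. This sketch assumes the 32-bit x86 fixnum
encoding with two tag bits, so a count of n is stored as n shifted left
by two. @code{A0} to @code{A2} label the three argument-passing
registers. The function name and the layout of the returned mapping are
hypothetical.

```python
N_FIXNUM_TAG_BITS = 2          # assumption: 32-bit x86 fixnum encoding
ARG_REGS = ("A0", "A1", "A2")  # the three argument-passing registers


def multiple_value_return(values, callee_frame, old_convention):
    """Machine state handed back by a multiple-value return (model only)."""
    state = dict()
    # The first three values travel in the argument-passing registers.
    for reg, value in zip(ARG_REGS, values):
        state[reg] = value
    # The remainder stay on the stack; the first of them is at [EBP-16].
    state["stack"] = list(values[len(ARG_REGS):])
    # ECX holds the total number of values as a fixnum.
    state["ECX"] = len(values) << N_FIXNUM_TAG_BITS
    # EBX records where the callee frame was, so the receiver can reset ESP.
    state["EBX"] = callee_frame
    # Only the new convention sets the carry flag to mean "multiple values".
    state["CF"] = None if old_convention else True
    return state


st = multiple_value_return([1, 2, 3, 4, 5], callee_frame=0x0FE0,
                           old_convention=False)
print(st["A2"], st["stack"], st["ECX"], st["CF"])  # 3 [4, 5] 20 True
```

With two tag bits, the fixnum count (20 here) is numerically equal to
the size in bytes of five 4-byte words.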

The code at the call site for accepting some number of unknown values
is fairly well boilerplated. If we are expecting zero or one values,
then we need to reset the stack pointer if we are in a multiple-value
return. In the old convention we just encoded a @code{MOV ESP, EBX}
instruction, which neatly fit in the two-byte gap that was skipped by
a single-value return. In the new convention we have to explicitly
check the carry flag with a conditional jump around the
@code{MOV ESP, EBX} instruction. When expecting more than one value,
we need to arrange to set up default values when a single-value return
happens, so we encode a jump around a stub of code which fakes up the
register use convention of a multiple-value return. Again, in the old
convention this was a two-byte unconditional jump, and in the new
convention this is a conditional jump based on the carry flag.
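
The call-site logic for the new convention can be sketched in Python,
again assuming two fixnum tag bits and using @code{None} for
@code{NIL}. The function and its parameters are hypothetical. The
model has no stack pointer, so the stack-pointer reset
(@code{MOV ESP, EBX}) appears only in a comment.

```python
N_FIXNUM_TAG_BITS = 2  # assumption: 32-bit x86 fixnum encoding
NIL = None             # stand-in for Lisp NIL


def receive_values(carry, a_regs, ecx, stack_values, wanted):
    """Accept `wanted` values at a call site (new, carry-flag convention).

    `a_regs` holds the contents of A0..A2 and `stack_values` the values
    the callee left on the stack.  Hypothetical model, not SBCL code.
    """
    if not carry:
        # Carry clear: single-value return.  Only A0 is meaningful, so fake
        # up the multiple-value convention with a count of one.
        values = list(a_regs[:1])
    else:
        # Carry set: multiple-value return.  When expecting zero or one
        # values, the real code would execute MOV ESP, EBX at this point.
        count = ecx >> N_FIXNUM_TAG_BITS
        values = (list(a_regs) + list(stack_values))[:count]
    # Keep the first `wanted` values and default any missing ones to NIL.
    values = values[:wanted]
    return values + [NIL] * (wanted - len(values))


print(receive_values(False, [42, 0, 0], 0, [], 3))         # [42, None, None]
print(receive_values(True, [1, 2, 3], 5 << 2, [4, 5], 6))  # [1, 2, 3, 4, 5, None]
```

Under the old convention, the same decision was made by where control
resumed (the return address, or two bytes past it) rather than by
testing the carry flag.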


@node IR2 Conversion
@comment  node-name,  next,  previous,  up
@section IR2 Conversion

The actual selection of VOPs for implementing call/return for a given
function is handled in @file{src/compiler/ir2tran.lisp}. Returns are
handled by @code{ir2-convert-return}; calls are handled by
@code{ir2-convert-local-call}, @code{ir2-convert-full-call}, and
@code{ir2-convert-mv-call}; and function prologues are handled by
@code{ir2-convert-bind} (which calls @code{init-xep-environment} for
the case of an entry point for a full call).


@node Additional Notes
@comment  node-name,  next,  previous,  up
@section Additional Notes

The low-hanging fruit is going to be changing every call and return to
use @code{CALL} and @code{RET} instructions instead of @code{JMP}
instructions. This is partly done on x86oids: a trampoline is
@code{CALL}ed, and it @code{JMP}s to the target, which appears to be
sufficient to negate most of the penalty.
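
A toy model shows why returns via @code{JMP} hurt. It assumes a
perfect return-stack buffer for @code{RET} and a ``same target as last
time'' predictor for an indirect @code{JMP}. Real predictors are more
elaborate, and @code{count_mispredictions} is a hypothetical helper,
not anything in SBCL.

```python
def count_mispredictions(return_addresses, use_ret):
    """Count mispredicted returns from one shared callee (toy model).

    Each element of `return_addresses` is where one call should return.
    use_ret=True: the return is a RET predicted by a return-stack buffer
    that the matching CALL filled in.  use_ret=False: the return is an
    indirect JMP predicted to go wherever it went last time.
    """
    return_stack = []
    last_jmp_target = None
    misses = 0
    for return_address in return_addresses:
        if use_ret:
            return_stack.append(return_address)  # CALL pushes a prediction
            predicted = return_stack.pop()       # RET consumes it
        else:
            predicted = last_jmp_target          # last-target guess
            last_jmp_target = return_address
        if predicted != return_address:
            misses += 1
    return misses


# Two call sites alternating calls to the same function.
sites = [0x10, 0x20] * 4
print(count_mispredictions(sites, use_ret=False))  # 8
print(count_mispredictions(sites, use_ret=True))   # 0
```

Assuming the callee eventually returns with @code{RET}, a trampoline
that is @code{CALL}ed and then @code{JMP}s to the target behaves like
the @code{use_ret=True} case here. The @code{CALL} into the trampoline
supplies the matching return-stack entry.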

A more involved change would be to reduce the number of
argument-passing registers from three to two, which may help with our
quest to free up a GPR on Win32 boxes for use with the thread
structure.

Another possible win could be to store multiple return values
somewhere other than the stack, such as a dedicated area of the thread
structure. The main concern here in terms of clobbering would be to
make sure that interrupts (and presumably the internal-error
machinery) know to save the area, and that the compiler knows that the
area cannot be live across a function call. Actually implementing this
would involve hacking the IR2 conversion, since as it stands now the
same conventions (the same TNs) are used for both argument passing and
return-value storage.
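
The clobbering concern can be sketched in Python, with
@code{threading.local} standing in for the thread structure.
Everything here is hypothetical: the area, the helper names, and the
save/restore wrapper only illustrate the two rules above. Interrupt
entry must preserve the area, and the area must not be live across a
call.

```python
import threading

_area = threading.local()  # hypothetical per-thread multiple-value area


def return_values(*values):
    # Callee side: stash the values in the thread's area, not on the stack.
    _area.values = list(values)
    return len(values)


def take_values():
    # Caller side: read the area at once, since any further call may
    # overwrite it (it cannot be live across a function call).
    return _area.values


def run_interrupt_handler(handler):
    # Interrupt entry saves and restores the area, because the handler
    # may itself return multiple values.
    saved = getattr(_area, "values", None)
    try:
        handler()
    finally:
        _area.values = saved


return_values(1, 2, 3)
run_interrupt_handler(lambda: return_values("x"))
print(take_values())  # [1, 2, 3]
```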