doc/internals/calling-convention.texinfo

@node Calling Convention
@comment  node-name,  next,  previous,  up
@chapter Calling Convention

@menu
* Assembly Routines::
* Local Calls::
* Full Calls::
* Unknown-Values Returns::
* IR2 Conversion::
* Additional Notes::
@end menu

The calling convention used within Lisp code on SBCL/x86 was,
for the longest time, really bad. If it weren't for the fact
that it predates modern x86 CPUs, one might almost believe it to
have been designed explicitly to defeat the branch-prediction
hardware therein. This chapter is somewhat of a brain-dump of
information that might be useful when attempting to improve the
situation further, mostly written immediately after having made
a dent in the problem.

Assumptions about the calling convention are embedded throughout
the system. The runtime knows how to call into Lisp and receive
a value from Lisp; the assembly-routines have intimate knowledge
of which registers are involved in a call;
@file{src/compiler/target/call.lisp} contains the VOPs that
implement function call and return; and
@file{src/compiler/ir2tran.lisp} makes assumptions about frame
allocation and about argument and return-value passing locations.

The current round of changes has been limited to VOPs, assembly-routines,
related support functions, and the required support in the runtime.

Note that most of this documentation also applies to other CPUs,
modulo the actual registers involved, the displacement used
in the single-value return convention, and the fact that they
use the ``old'' convention anywhere it is mentioned.


@node Assembly Routines
@comment  node-name,  next,  previous,  up
@section Assembly Routines

@example
;;; The :full-call assembly-routines must use the same full-call
;;; unknown-values return convention as a normal call, as some
;;; of the routines will tail-chain to a static-function. The
;;; routines themselves, however, take all of their arguments
;;; in registers (this will typically be one or two arguments,
;;; and is one of the lower bounds on the number of argument-
;;; passing registers), and thus don't need a call frame, which
;;; simplifies things for the normal call/return case. When it
;;; is necessary for one of the assembly-functions to call a
;;; static-function it will construct the required call frame.
;;; Also, none of the assembly-routines return other than one
;;; value, which again simplifies the return path.
;;;    -- AB, 2006/Feb/05.
@end example

There are a couple of assembly-routines that implement parts of
the process of returning or tail-calling with a variable number
of values. These are @code{return-multiple} and @code{tail-call-variable} in
@file{src/assembly/x86/assem-rtns.lisp}. They have their own calling
convention for invocation from a VOP, and each performs various
block-move operations on the stack contents followed by a return
or tail-call operation.

That's about all I have to say about the assembly-routines.


@node Local Calls
@comment  node-name,  next,  previous,  up
@section Local Calls

Calls within a block, whatever a block is, can use a local
calling convention in which the compiler knows where all of the
values are to be stored, and thus can elide the check for number
of return values, stack-pointer restoration, etc. Alternatively,
they can use the full unknown-values return convention while
trying to short-circuit the call convention. There is probably
some low-hanging fruit here in terms of CPU branch-prediction.

The local (known-values) calling convention is implemented by
the @code{known-call-local} and @code{known-return} VOPs.

Local unknown-values calls are handled at the call site by the
@code{call-local} and @code{multiple-call-local} VOPs. The main difference
between the full call and local call protocols here is that
local calls use a different frame setup protocol, and tend
not to use the normal frame layout for the old frame-pointer and
return-address.


@node Full Calls
@comment  node-name,  next,  previous,  up
@section Full Calls

@example
;;; There is something of a cross-product effect with full calls.
;;; Different versions are used depending on whether we know the
;;; number of arguments or the name of the called function, and
;;; whether we want fixed values, unknown values, or a tail call.
;;;
;;; In full call, the arguments are passed creating a partial frame on
;;; the stack top and storing stack arguments into that frame. On
;;; entry to the callee, this partial frame is pointed to by FP.
@end example

Basically, we use caller-allocated frames, pass an fdefinition,
function, or closure in @code{EAX}, the
argument count in @code{ECX}, and the first three arguments in @code{EDX}, @code{EDI},
and @code{ESI}. @code{EBP} points to just past the start of the frame
(the first frame slot is at @code{[EBP-4]}, not the traditional @code{[EBP]},
due in part to how the frame allocation works). The caller stores the
link for the old frame at @code{[EBP-4]} and reserves space for a
return address at @code{[EBP-8]}. @code{[EBP-12]} appears to be an
empty slot available to the compiler within a function; it
may or may not be used by some of the call/return junk. The first stack
argument is at @code{[EBP-16]}. The callee then reallocates the
frame to include sufficient space for its local variables, after
possibly converting any @code{&rest} arguments to a proper list.

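To make the full-call frame layout concrete, here is a minimal sketch in
portable Common Lisp that models where each frame slot and argument
lives on entry to the callee. The names @code{frame-slot-offset} and
@code{argument-location} are hypothetical helpers invented for this
illustration and do not exist in SBCL. The sketch also assumes that
stack arguments after the fourth continue downward one word at a time,
which is consistent with the slot numbering but is not stated above.

@example
;;; Illustrative model only; none of these names exist in SBCL.
(defparameter *word-bytes* 4)
(defparameter *argument-registers* '(:edx :edi :esi))

(defun frame-slot-offset (slot)
  "Displacement from EBP, in bytes, of 0-based frame slot SLOT."
  (- (* *word-bytes* (1+ slot))))

(defun argument-location (index)
  "Location of 0-based argument INDEX on entry to the callee."
  (if (< index (length *argument-registers*))
      (nth index *argument-registers*)
      ;; Assumption: argument N lives in frame slot N.
      (list :ebp (frame-slot-offset index))))

;; Slots 0, 1 and 2 hold the old frame link, the return address
;; and the spare slot, so the first stack argument is slot 3.
(frame-slot-offset 0)   ; => -4
(argument-location 0)   ; => :EDX
(argument-location 3)   ; => (:EBP -16)
@end example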

@node Unknown-Values Returns
@comment  node-name,  next,  previous,  up
@section Unknown-Values Returns

The unknown-values return convention consists of two parts. The
first part is that of returning a single value. The second is
that of returning any other number of values. We also changed
the convention here recently, so both the old and new versions
are described. The three interesting VOPs here are @code{return-single},
@code{return}, and @code{return-multiple}.

For a single-value return, we load the return value in the first
argument-passing register (@code{A0}, or @code{EDX}), reload the old frame
pointer, burn the stack frame, and return. The old convention
was to increment the return address by two before returning,
typically via a @code{JMP}, which was guaranteed to screw up
branch-prediction hardware. The new convention is to return with the
carry flag clear.

For a multiple-value return, we pass the first three values in
the argument-passing registers, and the remainder on the stack.
@code{ECX} contains the total number of values as a fixnum, @code{EBX} points
to where the callee frame was, @code{EBP} has been restored to point to
the caller frame, and the first of the values on the stack (the
fourth overall) is at @code{[EBP-16]}. The old convention was just to
jump to the return address at this point. The new convention
sets the carry flag first.

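The return side of the new convention can be summarized with a small
model in portable Common Lisp. This is only a sketch:
@code{model-return} is a hypothetical name, the register order
@code{EDX}, @code{EDI}, @code{ESI} is taken from the full-call
description, and the displacements after the first stack value assume
one word per value going downward. It says nothing about what unused
registers contain.

@example
;;; Illustrative model only; MODEL-RETURN does not exist in SBCL.
(defparameter *value-registers* '(:edx :edi :esi))

(defun model-return (vals)
  "Describe, as a plist, how the list VALS would be returned."
  (if (= (length vals) 1)
      ;; Single value: carry clear, value in the first register.
      (list :carry nil
            :registers (list (cons :edx (first vals))))
      ;; Any other count: carry set, count in ECX, up to three
      ;; values in registers, the rest at displacement -16, -20, ...
      (list :carry t
            :count (length vals)
            :registers (loop for reg in *value-registers*
                             for v in vals
                             collect (cons reg v))
            :stack (loop for v in (nthcdr 3 vals)
                         for offset from 16 by 4
                         collect (cons (- offset) v)))))

(model-return '(a))
;; => (:CARRY NIL :REGISTERS ((:EDX . A)))
(model-return '(1 2 3 4 5))
;; => (:CARRY T :COUNT 5
;;     :REGISTERS ((:EDX . 1) (:EDI . 2) (:ESI . 3))
;;     :STACK ((-16 . 4) (-20 . 5)))
@end example
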
The code at the call site for accepting some number of
unknown values is fairly well boilerplated. If we are expecting zero or
one values, then we need to reset the stack pointer if we are in
a multiple-value return. In the old convention we just encoded a
@code{MOV ESP, EBX} instruction, which neatly fit in the two-byte gap
that was skipped by a single-value return. In the new convention
we have to explicitly check the carry flag with a conditional
jump around the @code{MOV ESP, EBX} instruction. When expecting more
than one value, we need to arrange to set up default values when
a single-value return happens, so we encode a jump around a
stub of code which fakes up the register use convention of a
multiple-value return. Again, in the old convention this was a
two-byte unconditional jump, and in the new convention this is
a conditional jump based on the carry flag.

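Seen from the Lisp side, the two call-site cases above correspond to
the standard rules for receiving values: extra values are dropped when
one value is expected, and missing values default to @code{NIL} when
several are expected. The following sketch uses only standard Common
Lisp; @code{one-value} and @code{three-values} are hypothetical
functions made up for the example. Whether a given call actually goes
through the unknown-values path depends on what the compiler knows
about the callee.

@example
;;; Hypothetical callees, defined only for this illustration.
(defun one-value () 42)
(defun three-values () (values 1 2 3))

;; Expecting one value from a multiple-value return:
;; the extra values are discarded.
(list (three-values))               ; => (1)

;; Expecting three values from a single-value return:
;; the missing values default to NIL.
(multiple-value-bind (a b c) (one-value)
  (list a b c))                     ; => (42 NIL NIL)
@end example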

@node IR2 Conversion
@comment  node-name,  next,  previous,  up
@section IR2 Conversion

The actual selection of VOPs for implementing call/return for a
given function is handled in @file{src/compiler/ir2tran.lisp}. Returns are handled
by @code{ir2-convert-return}; calls are handled by @code{ir2-convert-local-call},
@code{ir2-convert-full-call}, and @code{ir2-convert-mv-call}; and
function prologues are handled by @code{ir2-convert-bind} (which calls
@code{init-xep-environment} for the case of an entry point for a full
call).


@node Additional Notes
@comment  node-name,  next,  previous,  up
@section Additional Notes

The low-hanging fruit here is going to be changing every call
and return to use @code{CALL} and @code{RET} instructions
instead of @code{JMP} instructions.

A more involved change would be to reduce the number of argument
passing registers from three to two, which may be beneficial in
terms of our quest to free up a GPR for use on Win32 boxes for a
thread structure.

Another possible win could be to store multiple return-values
somewhere other than the stack, such as a dedicated area of the
thread structure. The main concern here in terms of clobbering
would be to make sure that interrupts (and presumably the
internal-error machinery) know to save the area and that the
compiler knows that the area cannot be live across a function
call. Actually implementing this would involve hacking the IR2
conversion, since as it stands now the same argument conventions
are used for both call and return value storage (same TNs).