X-Git-Url: http://repo.macrolet.net/gitweb/?a=blobdiff_plain;f=doc%2Finternals%2Fcalling-convention.texinfo;h=671ea6ce6b874880c8c02be6044672eea4b8ebe7;hb=18c093eb771c1ab038090863d99bf4baf4224966;hp=3e403f035b04598585ea919a90976a8c03f56c4d;hpb=67a29630594d9ed342ceb6d61fd1b7b16d215dbe;p=sbcl.git diff --git a/doc/internals/calling-convention.texinfo b/doc/internals/calling-convention.texinfo index 3e403f0..671ea6c 100644 --- a/doc/internals/calling-convention.texinfo +++ b/doc/internals/calling-convention.texinfo @@ -11,31 +11,27 @@ * Additional Notes:: @end menu -The calling convention used within Lisp code on SBCL/x86 was, -for the longest time, really bad. If it weren't for the fact -that it predates modern x86 CPUs, one might almost believe it to -have been designed explicitly to defeat the branch-prediction -hardware therein. This chapter is somewhat of a brain-dump of -information that might be useful when attempting to improve the -situation further, mostly written immediately after having made -a dent in the problem. - -Assumptions about the calling convention are embedded throughout -the system. The runtime knows how to call in to Lisp and receive -a value from Lisp, the assembly-routines have intimate knowledge -of what registers are involved in a call situation, +The calling convention used within Lisp code on SBCL/x86 was, for the +longest time, really bad. If it weren't for the fact that it predates +modern x86 CPUs, one might almost believe it to have been designed +explicitly to defeat the branch-prediction hardware therein. This +chapter is somewhat of a brain-dump of information that might be +useful when attempting to improve the situation further, mostly +written immediately after having made a dent in the problem. + +Assumptions about the calling convention are embedded throughout the +system. The runtime knows how to call in to Lisp and receive a value +from Lisp, the assembly-routines have intimate knowledge of what +registers are involved in a call situation, @file{src/compiler/target/call.lisp} contains the VOPs involved in -implementing function call/return, and @file{src/compiler/ir2tran.lisp} has -assumptions about frame allocation and argument/return-value -passing locations. - -The current round of changes has been limited to VOPs, assembly-routines, -related support functions, and the required support in the runtime. +implementing function call/return, and +@file{src/compiler/ir2tran.lisp} has assumptions about frame +allocation and argument/return-value passing locations. Note that most of this documentation also applies to other CPUs, -modulo the actual registers involved, the displacement used -in the single-value return convention, and the fact that they -use the ``old'' convention anywhere it is mentioned. +modulo the actual registers involved, the displacement used in the +single-value return convention, and the fact that they use the ``old'' +convention anywhere it is mentioned. @node Assembly Routines @@ -58,13 +54,13 @@ use the ``old'' convention anywhere it is mentioned. ;;; -- AB, 2006/Feb/05. @end example -There are a couple of assembly-routines that implement parts of -the process of returning or tail-calling with a variable number -of values. These are @code{return-multiple} and @code{tail-call-variable} in +There are a couple of assembly-routines that implement parts of the +process of returning or tail-calling with a variable number of values. +These are @code{return-multiple} and @code{tail-call-variable} in @file{src/assembly/x86/assem-rtns.lisp}. They have their own calling -convention for invocation from a VOP, but implement various -block-move operations on the stack contents followed by a return -or tail-call operation. +convention for invocation from a VOP, but implement various block-move +operations on the stack contents followed by a return or tail-call +operation. That's about all I have to say about the assembly-routines. @@ -73,22 +69,22 @@ That's about all I have to say about the assembly-routines. @comment node-name, next, previous, up @section Local Calls -Calls within a block, whatever a block is, can use a local -calling convention in which the compiler knows where all of the -values are to be stored, and thus can elide the check for number -of return values, stack-pointer restoration, etc. Alternately, -they can use the full unknown-values return convention while -trying to short-circuit the call convention. There is probably -some low-hanging fruit here in terms of CPU branch-prediction. +Calls within a block, whatever a block is, can use a local calling +convention in which the compiler knows where all of the values are to +be stored, and thus can elide the check for number of return values, +stack-pointer restoration, etc. Alternately, they can use the full +unknown-values return convention while trying to short-circuit the +call convention. There is probably some low-hanging fruit here in +terms of CPU branch-prediction. -The local (known-values) calling convention is implemented by -the @code{known-call-local} and @code{known-return} VOPs. +The local (known-values) calling convention is implemented by the +@code{known-call-local} and @code{known-return} VOPs. Local unknown-values calls are handled at the call site by the -@code{call-local} and @code{mutiple-call-local} VOPs. The main difference -between the full call and local call protocols here is that -local calls use a different frame setup protocol, and will tend -to not use the normal frame layout for the old frame-pointer and +@code{call-local} and @code{multiple-call-local} VOPs. The main +difference between the full call and local call protocols here is that +local calls use a different frame setup protocol, and will tend to not +use the normal frame layout for the old frame-pointer and return-address. @@ -108,96 +104,131 @@ return-address. @end example Basically, we use caller-allocated frames, pass an fdefinition, -function, or closure in @code{EAX}, -argcount in @code{ECX}, and first three args in @code{EDX}, @code{EDI}, -and @code{ESI}. @code{EBP} points to just past the start of the frame -(the first frame slot is at @code{[EBP-4]}, not the traditional @code{[EBP]}, -due in part to how the frame allocation works). The caller stores the -link for the old frame at @code{[EBP-4]} and reserved space for a -return address at @code{[EBP-8]}. @code{[EBP-12]} appears to be an -empty slot available to the compiler within a function, it -may-or-may-not be used by some of the call/return junk. The first stack -argument is at @code{[EBP-16]}. The callee then reallocates the +function, or closure in @code{EAX}, argcount in @code{ECX}, and first +three args in @code{EDX}, @code{EDI}, and @code{ESI}. @code{EBP} +points to just past the start of the frame (the first frame slot is at +@code{[EBP-4]}, not the traditional @code{[EBP]}, due in part to how +the frame allocation works). The caller stores the link for the old +frame at @code{[EBP-4]} and reserved space for a return address at +@code{[EBP-8]}. @code{[EBP-12]} appears to be an empty slot that +conveniently makes just enough space for the first three multiple +return values (returned in the argument passing registers) to be +written over the beginning of the frame by the receiver. The first +stack argument is at @code{[EBP-16]}. The callee then reallocates the frame to include sufficient space for its local variables, after possibly converting any @code{&rest} arguments to a proper list. +The above scheme was changed in 1.0.27 on x86 and x86-64 by swapping +the old frame pointer with the return address and making EBP point two +words later: + +On x86/x86-64 the stack now looks like this (stack grows downwards): + +@verbatim +---------- +RETURN PC +---------- +OLD FP +---------- <- FP points here +EMPTY SLOT +---------- +FIRST ARG +---------- +@end verbatim + +just as if the function had been CALLed and upon entry executed the +standard prologue: PUSH EBP; MOV EBP, ESP. On other architectures the +stack looks like this (stack grows upwards): + +@verbatim +---------- +FIRST ARG +---------- +EMPTY SLOT +---------- +RETURN PC +---------- +OLD FP +---------- <- FP points here +@end verbatim + @node Unknown-Values Returns @comment node-name, next, previous, up @section Unknown-Values Returns -The unknown-values return convention consists of two parts. The -first part is that of returning a single value. The second is -that of returning a different number of values. We also changed -the convention here recently, so we should describe both the old -and new versions. The three interesting VOPs here are @code{return-single}, -@code{return}, and @code{return-multiple}. +The unknown-values return convention consists of two parts. The first +part is that of returning a single value. The second is that of +returning a different number of values. We also changed the convention +in 0.9.10, so we should describe both the old and new versions. The +three interesting VOPs here are @code{return-single}, @code{return}, +and @code{return-multiple}. For a single-value return, we load the return value in the first -argument-passing register (@code{A0}, or @code{EDI}), reload the old frame -pointer, burn the stack frame, and return. The old convention -was to increment the return address by two before returning, -typically via a @code{JMP}, which was guaranteed to screw up branch- -prediction hardware. The new convention is to return with the -carry flag clear. - -For a multiple-value return, we pass the first three values in -the argument-passing registers, and the remainder on the stack. -@code{ECX} contains the total number of values as a fixnum, @code{EBX} points -to where the callee frame was, @code{EBP} has been restored to point to -the caller frame, and the first of the values on the stack (the -fourth overall) is at @code{[EBP-16]}. The old convention was just to -jump to the return address at this point. The newer one has us -setting the carry flag first. - -The code at the call site for accepting some number of unknown- -values is fairly well boilerplated. If we are expecting zero or -one values, then we need to reset the stack pointer if we are in -a multiple-value return. In the old convention we just encoded a -@code{MOV ESP, EBX} instruction, which neatly fit in the two byte gap -that was skipped by a single-value return. In the new convention -we have to explicitly check the carry flag with a conditional -jump around the @code{MOV ESP, EBX} instruction. When expecting more -than one value, we need to arrange to set up default values when -a single-value return happens, so we encode a jump around a -stub of code which fakes up the register use convention of a -multiple-value return. Again, in the old convention this was a -two-byte unconditionl jump, and in the new convention this is -a conditional jump based on the carry flag. +argument-passing register (@code{A0}, or @code{EDI}), reload the old +frame pointer, burn the stack frame, and return. The old convention +was to increment the return address by two before returning, typically +via a @code{JMP}, which was guaranteed to screw up branch- prediction +hardware. The new convention is to return with the carry flag clear. + +For a multiple-value return, we pass the first three values in the +argument-passing registers, and the remainder on the stack. @code{ECX} +contains the total number of values as a fixnum, @code{EBX} points to +where the callee frame was, @code{EBP} has been restored to point to +the caller frame, and the first of the values on the stack (the fourth +overall) is at @code{[EBP-16]}. The old convention was just to jump to +the return address at this point. The newer one has us setting the +carry flag first. + +The code at the call site for accepting some number of unknown- values +is fairly well boilerplated. If we are expecting zero or one values, +then we need to reset the stack pointer if we are in a multiple-value +return. In the old convention we just encoded a @code{MOV ESP, EBX} +instruction, which neatly fit in the two byte gap that was skipped by +a single-value return. In the new convention we have to explicitly +check the carry flag with a conditional jump around the @code{MOV ESP, +EBX} instruction. When expecting more than one value, we need to +arrange to set up default values when a single-value return happens, +so we encode a jump around a stub of code which fakes up the register +use convention of a multiple-value return. Again, in the old +convention this was a two-byte unconditional jump, and in the new +convention this is a conditional jump based on the carry flag. @node IR2 Conversion @comment node-name, next, previous, up @section IR2 Conversion -The actual selection of VOPs for implementing call/return for a -given function is handled in ir2tran.lisp. Returns are handled -by @code{ir2-convert-return}, calls are handled by @code{ir2-convert-local-call}, -@code{ir2-convert-full-call}, and @code{ir2-convert-mv-call}, and -function prologues are handled by @code{ir2-convert-bind} (which calls -@code{init-xep-environment} for the case of an entry point for a full -call). +The actual selection of VOPs for implementing call/return for a given +function is handled in ir2tran.lisp. Returns are handled by +@code{ir2-convert-return}, calls are handled by +@code{ir2-convert-local-call}, @code{ir2-convert-full-call}, and +@code{ir2-convert-mv-call}, and function prologues are handled by +@code{ir2-convert-bind} (which calls @code{init-xep-environment} for +the case of an entry point for a full call). @node Additional Notes @comment node-name, next, previous, up @section Additional Notes -The low-hanging fruit here is going to be changing every call -and return to use @code{CALL} and @code{RETURN} instructions -instead of @code{JMP} instructions. +The low-hanging fruit is going to be changing every call and return to +use @code{CALL} and @code{RETURN} instructions instead of @code{JMP} +instructions which is partly done on x86oids: a trampoline is +@code{CALL}ed and that @code{JMP}s to the target which is sufficient +to negate (most of?) the penalty. A more involved change would be to reduce the number of argument -passing registers from three to two, which may be beneficial in -terms of our quest to free up a GPR for use on Win32 boxes for a -thread structure. +passing registers from three to two, which may be beneficial in terms +of our quest to free up a GPR for use on Win32 boxes for a thread +structure. Another possible win could be to store multiple return-values -somewhere other than the stack, such as a dedicated area of the -thread structure. The main concern here in terms of clobbering -would be to make sure that interrupts (and presumably the -internal-error machinery) know to save the area and that the -compiler knows that the area cannot be live across a function -call. Actually implementing this would involve hacking the IR2 -conversion, since as it stands now the same argument conventions -are used for both call and return value storage (same TNs). +somewhere other than the stack, such as a dedicated area of the thread +structure. The main concern here in terms of clobbering would be to +make sure that interrupts (and presumably the internal-error +machinery) know to save the area and that the compiler knows that the +area cannot be live across a function call. Actually implementing this +would involve hacking the IR2 conversion, since as it stands now the +same argument conventions are used for both call and return value +storage (same TNs).