\part{Compiler Retargeting}
In general, it is a danger sign if a generator references a TN that isn't an
operand or temporary, since lifetime analysis hasn't been done for that use.
We are doing weird stuff for the old-cont and return-pc passing locations,
hoping that the conflicts at the called function have the desired effect.
Other stuff? When a function returns unknown values, we don't reference the
values locations when a single-value return is done. But nothing is live at a
Have a way for template conversion to special-case constant arguments?
If an arg restriction is (:satisfies [<predicate function>]), and the
corresponding argument is constant, with the constant value satisfying the
predicate, then (if the other restrictions are also satisfied) the template
will be emitted with the literal value passed as an info argument. If the
predicate is omitted, then any constant will do.
We could sugar this up a bit by allowing (:member <object>*) for
(:satisfies (lambda (x) (member x '(<object>*))))
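As a sanity check on this sugar, here is a minimal Python sketch (all names hypothetical; the real mechanism would live in define-vop) of desugaring a (:member ...) restriction into the equivalent (:satisfies ...) form and testing a compile-time constant against it:

```python
# Illustrative sketch only -- not the actual define-vop implementation.

def desugar_restriction(restriction):
    """Turn (:member obj*) into (:satisfies pred); pass others through."""
    if isinstance(restriction, tuple) and restriction[0] == ":member":
        members = restriction[1:]
        return (":satisfies", lambda x: x in members)
    return restriction

def constant_arg_matches(restriction, value):
    """Does a compile-time constant VALUE satisfy the restriction?
    An omitted predicate means any constant will do."""
    restriction = desugar_restriction(restriction)
    if restriction == (":satisfies",):      # predicate omitted
        return True
    if restriction[0] == ":satisfies":
        return restriction[1](value)
    return False
```

For example, `constant_arg_matches((":member", 1, 2, 4), 2)` succeeds, so the template would be emitted with 2 passed as an info argument.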
We could allow this to be translated into a Lisp type by adding a new Constant
type specifier. This could only appear as an argument to a function type.
To satisfy (Constant <type>), the argument must be a compile-time constant of
the specified type. Just Constant means any constant (i.e. (Constant *)).
This would be useful for the type constraints on ICR transforms.
Constant TNs: we count on being able to indirect to the leaf, and don't try to
wedge the information into the offset. We set the FSC to an appropriate
Allow "more operands" to VOPs in define-vop. You can't do much with the
more operands: define-vop just fills in the cost information according to
the loading costs for a SC you specify. You can't restrict more operands,
and you can't make local preferences. In the generator, the named variable
is bound to the TN-ref for the first extra operand. This should be good
enough to handle all the variable-arg VOPs (primarily function call and
return). Usually more operands are used just to get TN lifetimes to work
out; the generator actually ignores them.
Variable-arg VOPs can't be used with the VOP macro. You must use VOP*.
VOP* doesn't do anything with these extra operands except stick them on the
ends of the operand lists passed into the template. VOP* is often useful
within the convert functions for non-VOP templates, since it can emit a VOP
using an already prepared TN-Ref list.
It is pretty basic to the whole primitive-type idea that there is only one
primitive type for a given Lisp type. This is really the same as saying
primitive types are disjoint. A primitive type serves two somewhat
different purposes:
-- It is an abstraction of a Lisp type used to select type-specific
   operations. Originally kind of an efficiency hack, but it lets a
   template's type signature be used both for selection and operand
   representation determination.
-- It represents a set of possible representations for a value (SCs). The
   primitive type is used to determine the legal SCs for a TN, and is also
   used to determine which type-coercion/move VOP to use.
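A toy Python model of these two roles, with invented primitive types, SCs and move-VOP names (nothing here reflects an actual backend):

```python
# Hypothetical tables, for illustration only: a primitive type both limits
# the SCs a TN may be packed in and selects the move/coercion VOP between
# two representations.

PRIMITIVE_TYPES = {
    "fixnum":       {"scs": ["any-reg", "signed-reg", "stack"]},
    "double-float": {"scs": ["double-float-reg", "double-stack"]},
}

MOVE_VOPS = {
    ("any-reg", "signed-reg"): "move-to-signed",
    ("signed-reg", "any-reg"): "move-from-signed",
}

def legal_scs(ptype):
    """Role 2: the SCs a TN of this primitive type may legally be packed in."""
    return PRIMITIVE_TYPES[ptype]["scs"]

def move_vop(from_sc, to_sc):
    """Select the type-coercion/move VOP for a pair of representations."""
    if from_sc == to_sc:
        return "move"          # plain same-representation move
    return MOVE_VOPS[(from_sc, to_sc)]
```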
There are basically three levels of target dependence:

-- Code in the "front end" (before VMR conversion) deals only with Lisp
   semantics, and is totally target independent.
-- Code after VMR conversion and before code generation depends on the VM,
   but should work with little modification across a wide range of
   "conventional" architectures.
-- Code generation depends on the machine's instruction set and other
   implementation details, so it will have to be redone for each
   implementation. Most of the work here is in defining the translation into
   assembly code of all the supported VOPs.
\chapter{Storage bases and classes}

New interface: instead of CURRENT-FRAME-SIZE, have CURRENT-SB-SIZE <name>, which
returns the current element size of the named SB.
How can we have primitive types that overlap, i.e. (UNSIGNED-BYTE 32),
(SIGNED-BYTE 32), FIXNUM?

Primitive types are used for two things:

Representation selection: which SCs can be used to represent this value?
    For this purpose, it isn't necessary that primitive types be disjoint,
    since any primitive type can choose an arbitrary set of
    representations. For moves between the overlapping representations,
    the move/load operations can just be no-ops when the locations are the
    same (vanilla MOVE), since any bad moves should be caught out by type

Is this operand legal for this VOP? When ptypes overlap in interesting
    ways, there is a problem with allowing just a simple ptype restriction,
    since we might want to allow multiple ptypes. This could be handled
    by allowing "union primitive types", or by allowing multiple primitive
    types to be specified (only in the operand restriction). The latter
    would be along the lines of other, more flexible VOP operand restriction
    mechanisms (constant, etc.)
Ensure that load/save-operand never need to do representation conversion.
The PRIMITIVE-TYPE more/coerce info would be moved into the SC. This could
perhaps go along with flushing the TN-COSTS. We would annotate the TN with the
best SC, which implies the representation (boxed or unboxed). We would still
need to represent the legal SCs for restricted TNs somehow, and also would have
to come up with some other way for pack to keep track of which SCs we have
already
A SC would have a list of "alternate" SCs and a boolean SAVE-P value that
indicates it needs to be saved across calls in some non-SAVE-P SC. A TN is
initially given its "best" SC. The SC is annotated with VOPs that are used for
moving between the SC and its alternate SCs (load/save operand, save/restore
register). It is also annotated with the "move" VOPs used for moving between
this SC and all other SCs it is possible to move between. We flush the idea
that there is only c-to-t and c-from-t.
But how does this mesh with the idea of putting operand load/save back into the
generator? Maybe we should instead specify a load/save function? The
load/save functions would also differ from the move VOPs in that they would
only be called when the TN is in fact in that particular alternate SC, whereas
the move VOPs will be associated with the primary SC, and will be emitted
before it is known whether the TN will be packed in the primary SC or an
I guess a packed SC could also have immediate SCs as alternate SCs, and
constant loading functions could be associated with SCs using this mechanism.
So given a TN packed in SC X and an SC restriction for Y and Z, how do we know
which load function to call? There would be ambiguity if X were an alternate
for both Y and Z and they specified different load functions. This seems
unlikely to arise in practice, though, so we could just detect the ambiguity
and give an error at define-vop time. If they are doing something totally
weird, they can always inhibit loading and roll their own.
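The define-vop-time ambiguity check suggested above can be sketched in a few lines of Python (the table layout and names are hypothetical):

```python
# Sketch: given a table mapping (restriction SC, alternate SC) to a load
# function name, detect the case where a packed SC X is an alternate of two
# restriction SCs Y and Z that name different load functions.

def check_load_ambiguity(load_fns, packed_sc, restriction_scs):
    """Return the unique load function for PACKED_SC (or None if no
    restriction SC claims it); raise if the restriction SCs disagree."""
    candidates = {load_fns[(sc, packed_sc)]
                  for sc in restriction_scs if (sc, packed_sc) in load_fns}
    if len(candidates) > 1:
        raise ValueError(
            f"ambiguous load function for {packed_sc}: {sorted(candidates)}")
    return candidates.pop() if candidates else None
```

With agreeing entries the unique load function is returned; with disagreeing entries the "error at define-vop time" is signalled.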
Note that loading costs can be specified at the same time (same syntax) as
association of loading functions with SCs. It seems that maybe we will be
rolling DEFINE-SAVE-SCS and DEFINE-MOVE-COSTS into DEFINE-STORAGE-CLASS.

Fortunately, these changes will affect most VOP definitions very little.
A Storage Base represents a physical storage resource such as a register set or
stack frame. Storage bases for non-global resources such as the stack are
relativized by the environment that the TN is allocated in. Packing conflict
information is kept in the storage base, but non-packed storage resources such
as closure environments also have storage bases.

    General purpose registers
    Floating point registers
    Boxed (control) stack environment
    Unboxed (number) stack environment
A storage class is a potentially arbitrary set of the elements in a storage
base. Although conceptually there may be a hierarchy of storage classes such
as "all registers", "boxed registers", "boxed scratch registers", this doesn't
exist at the implementation level. Such things can be done by specifying
storage classes whose locations overlap. A TN shouldn't have lots of
overlapping SC's as legal SC's, since time would be wasted repeatedly
attempting to pack in the same locations.
There will be some SC's whose locations overlap a great deal, since we get Pack
to do our representation analysis by having lots of SC's. A SC is basically a
way of looking at a storage resource. Although we could keep a fixnum and an
unboxed representation of the same number in the same register, they correspond
to different SC's since they are different representation choices.
TNs are annotated with the primitive type of the object that they hold:

    T: random boxed object with only one representation.
    Fixnum, Integer, XXX-Float: object is always of the specified numeric type.
    String-Char: object is always a string-char.

When a TN is packed, it is annotated with the SC it was packed into. The code
generator for a VOP must be able to uniquely determine the representation of
its operands from the SC. (debugger also...)
    Reg: any register (immediate objects)
    Save-Reg: a boxed register near r15 (registers easily saved in a call)
    Boxed-Reg: any boxed register (any boxed object)
    Unboxed-Reg: any unboxed register (any unboxed object)
    Float-Reg, Double-Float-Reg: float in FP register.
    Stack: boxed object on the stack (on cstack)
    Word: any 32-bit unboxed object on nstack.
    Double: any 64-bit unboxed object on nstack.
We have a number of non-packed storage classes which serve to represent access
costs associated with values that are not allocated using conflicts
information. Non-packed TNs appear to already be packed in the appropriate
storage base so that Pack doesn't get confused. Costs for relevant non-packed
SC's appear in the TN-Ref cost information, but need not ever be summed into
the TN cost vectors, since TNs cannot be packed into them.

There are SCs for non-immediate constants and for each significant kind of
immediate operand in the architecture. On the RT, 4, 8 and 20 bit integer SCs
are probably worth having.
Immediate constant SCs:
    Signed-Byte-<N>, Unsigned-Byte-<N>, for various architecture dependent
    Magic values: T, NIL, 0.
\chapter{Type system parameterization}

The main aspect of the VM that is likely to vary for good reason is the type
system:

-- Different systems will have different ways of representing dynamic type
   information. The primary effect this has on the compiler is causing VMR
   conversion of type tests and checks to be implementation dependent.
   Rewriting this code for each implementation shouldn't be a big problem,
   since the portable semantics of types has already been dealt with.
-- Different systems will have different specialized number and array types,
   and different VOPs specialized for these types. It is easy to add this kind
   of knowledge without affecting the rest of the compiler. All you have to
   do is define the VOPs and translations.
-- Different systems will offer different specialized storage resources
   such as floating-point registers, and will have additional kinds of
   primitive-types. The storage class mechanism handles a large part of this,
   but there may be some problem in getting VMR conversion to realize the
   possibly large hidden costs in implicit moves to and from these specialized
   storage resources. Probably the answer is to have some sort of general
   mechanism for determining the primitive-type for a TN given the Lisp type,
   and then to have some sort of mechanism for automatically using specialized
   Move VOPs when the source or destination has some particular primitive-type.
How to deal with list/null(symbol)/cons in primitive-type structure? Since
cons and symbol aren't used for type-specific template selection, it isn't
really all that critical. Probably Primitive-Type should return the List
primitive type for all of Cons, List and Null (indicating when it is exact).
This would allow type-dispatch for simple sequence functions (such as length)
to be done using the standard template-selection mechanism. [Not a wired
\chapter{VOP Definition}

Before the operand TN-refs are passed to the emit function, the following is
done:

-- The refs in the operand and result lists are linked together in order using
   the Across slot. This list is properly NIL terminated.
-- The TN slot in each ref is set, and the ref is linked into that TN's refs.
-- The Write-P slot is set depending on whether the ref is an argument or
   result.
-- The other slots have the default values.

The template emit function fills in the Vop, Costs, Cost-Function,
SC-Restriction and Preference slots, and links together the Next-Ref chain as
\section{Lifetime model}

Note in doc that the same TN may not be used as both a more operand and as any
other operand to the same VOP, to simplify more-operand LTN number coalescing.
It seems we need a fairly elaborate model for intra-VOP conflicts in order to
allocate temporaries without introducing spurious conflicts. Consider the
important case of a VOP such as a miscop that must have operands in certain
registers. We allocate a wired temporary, create a local preference for the
corresponding operand, and move to (or from) the temporary. If all temporaries
conflict with all arguments, the result will be correct, but arguments could
never be packed in the actual passing register. If temporaries didn't conflict
with any arguments, then the temporary for an earlier argument might get packed
in the same location as the operand for a later argument; loading would then
destroy an argument before it was read.
A temporary's intra-VOP lifetime is represented by the times at which its life
starts and ends. There are various instants during the evaluation that start
and end VOP lifetimes. Two TNs conflict if their live intervals overlap.
Lifetimes are open intervals: if one TN's lifetime begins at a point where
another's ends, then the TNs don't conflict.
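The open-interval rule above can be stated as executable pseudocode; touching at an endpoint is not a conflict:

```python
# Two intra-VOP lifetimes conflict only if their open intervals properly
# overlap. Each life is a (start, end) pair of time codes with start < end.

def conflicts(life1, life2):
    s1, e1 = life1
    s2, e2 = life2
    # Open intervals: equality at an endpoint does not count as overlap.
    return s1 < e2 and s2 < e1
```

So a temporary whose life ends exactly where an argument's begins, e.g. `conflicts((0, 2), (2, 4))`, does not conflict and the two may share a location.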
The times within a VOP are the following:

:Load
    This is the beginning of the arguments' lives, as far as intra-VOP
    conflicts are concerned. If load-TNs are allocated, then this is the
    beginning of their lives.

(:Argument <n>)
    The point at which the N'th argument is read for the last time (by this
    VOP). If the argument is dead after this VOP, then the argument becomes
    dead at this time, and may be reused as a temporary or result load-TN.

(:Eval <n>)
    The N'th evaluation step. There may be any number of evaluation steps, but
    it is unlikely that more than two are needed.

(:Result <n>)
    The point at which the N'th result is first written into. This is the
    point at which that result becomes live.

:Save
    Similar to :Load, but marks the end of time. This is the point at which
    result load-TNs are stored back to the actual location.
In any of the list-style time specifications, the keyword by itself stands for
the first such time, i.e.
    :argument <==> (:argument 0)
Note that argument/result read/write times don't actually have to be in the
order specified, but they must *appear* to happen in that order as far as
conflict analysis is concerned. For example, the arguments can be read in any
order as long as no TN is written that has a life beginning at or after
(:Argument <n>), where N is the number of an argument whose reading was
We probably also want some syntactic sugar in Define-VOP for automatically
moving operands to/from explicitly allocated temporaries so that this kind of
thing is somewhat easy. There isn't really any reason to consider the
temporary to be a load-TN, but we want to compute costs as though it were and
want to use the same operand loading routines.
We also might consider allowing the lifetime of an argument/result to be
extended forward/backward. This would in many cases eliminate the need for
temporaries when operands are read/written out of order.
\section{VOP Cost model}

Note that in this model, if an operand has no restrictions, it has no cost.
This makes sense, since the purpose of the cost is to indicate the
relative value of packing in different SCs. If the operand isn't required to
be in a good SC (i.e. a register), then we might as well leave it in memory.
The SC restriction mechanism can be used even when doing a move into the SC is
too complex to be generated automatically (perhaps requiring temporary
registers), since Define-VOP allows operand loading to be done explicitly.
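The rule "unrestricted operands cost nothing" can be sketched as follows (the cost numbers and SC names are invented for illustration; this is not the real pack cost machinery):

```python
# Sketch of the cost model above: an operand with no restrictions
# contributes zero cost everywhere, and pack prefers the legal SC with the
# lowest total load cost.

def operand_cost(sc, restriction_costs):
    """restriction_costs maps each restricted SC to its load cost; an
    empty map means the operand is unrestricted and costs nothing."""
    if not restriction_costs:
        return 0
    # Packing outside every restricted SC is modelled as very expensive.
    return restriction_costs.get(sc, 1000)

def best_sc(legal_scs, restriction_costs):
    """Pick the cheapest legal SC for a TN given one operand's costs."""
    return min(legal_scs, key=lambda sc: operand_cost(sc, restriction_costs))
```

With no restrictions every SC costs 0, so pack is free to leave the value in memory, which is exactly the behavior the text argues for.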
\section{Efficiency notes}

being used to tell whether a particular unsafe template might get emitted, we
can also use it to give better efficiency notes:

-- We can say what is wrong with the call types, rather than just saying we
-- We can tell whether any of the "better" templates could possibly apply,
   i.e. whether the inapplicability of a template is because of inadequate type
   information or because the type is just plain wrong. We don't want to
   flame people when a template that couldn't possibly match doesn't match,
   e.g. complaining that we can't use fixnum+ when the arguments are known to
This is how we give better efficiency notes:

The Template-Note is a short noun-like string without capitalization or
punctuation that describes what the template "does", i.e. we say
"Unable to do ~A, doing ~A instead."
The Cost is moved from the Vop-Info to the Template structure, and is used to
determine the "goodness" of possibly applicable templates. [Could flush the
Template/Vop-Info distinction] The cost is used to choose the best applicable
template to emit, and also to determine what better templates we might have
A template is possibly applicable if there is an intersection between all of
the arg/result types and the corresponding arg/result restrictions, i.e. the
template is not clearly impossible: more declarations might allow it to be
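The "possibly applicable" test and the note format above can be restated in miniature, with types modelled as plain Python sets purely for illustration:

```python
# A template survives only if every argument type intersects the
# corresponding restriction; otherwise it is clearly impossible and no
# efficiency note should flame the user about it.

def possibly_applicable(arg_types, restrictions):
    """Each arg type and restriction is a set of type names."""
    return all(args & restr for args, restr in zip(arg_types, restrictions))

def efficiency_note(template_note, fallback_note):
    """Template-Notes are short noun-like phrases, as described above."""
    return f"Unable to do {template_note}, doing {fallback_note} instead."
```

A template whose restriction is `{"fixnum"}` against a known `{"float"}` argument fails the intersection test, so it is plain wrong rather than merely unproven, and no note is issued for it.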
\chapter{Assembler Retargeting}

\chapter{Writing Assembly Code}
You write when you port the assembler.)

Assembler interface like INST. Takes a label you made and says "stick it

    Returns a new label suitable for use with EMIT-LABEL exactly once and
    for referencing as often as necessary.

    Recognizes and dispatches to instructions you defined for the assembler.

    This takes the number of zero bits you want in the low end of the address
    of the next instruction.

    Get ready for assembling stuff. Takes a VOP and arbitrary PROGN-style
    body. Wrap these around instruction emission code announcing the first
    pass of our assembler.

    This returns a TN for the NFP if the caller uses the number stack, or

    This returns the size of some storage base used by the currently

    These move a value from a register to the control stack, or from the
    control stack to a register. They take care of checking the TN types,
    modifying offsets according to the address units per word, etc.
\chapter{Required VOPS}

Note: the move VOP cannot have any wired temps. (Move-Argument also?) This is
so we can move stuff into wired TNs without stepping on our toes.
We create set closure variables using the Value-Cell VOP, which takes a value
and returns a value cell containing the value. We can basically use this
instead of a Move VOP when initializing the variable. Value-Cell-Set and
Value-Cell-Ref are used to access the value cell. We can have a special effect
for value cells so that value cell references can be discovered to be common
subexpressions or loop invariants.
Represent unknown-values continuations as (start, count). Unknown-values
continuations are always outside of the current frame (on stack top). Within a
function, we always set up and receive values in the standard passing
locations. If we receive stack values, then we must BLT them down to the start
of our frame, filling in any unsupplied values. If we generate unknown values
(i.e. PUSH-VALUES), then we set the values up in the standard locations, then
BLT them to stack top. When doing a tail-return of MVs, we just set them up in
the standard locations and decrement SP: no BLT is necessary.
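The receive-side BLT with defaulting can be sketched like this (the stack is a Python list and NIL a sentinel string, purely for illustration):

```python
# Sketch of receiving stack values: BLT the returned values down to the
# start of our frame, defaulting any unsupplied values to NIL.

NIL = "NIL"

def receive_stack_values(stack, start, count, frame_start, wanted):
    """Move COUNT values beginning at STACK[start] down to FRAME_START,
    filling in NIL for any of the WANTED values that were not supplied."""
    values = stack[start:start + count]
    values += [NIL] * (wanted - len(values))   # default unsupplied values
    stack[frame_start:frame_start + wanted] = values[:wanted]
    return stack
```

For example, with two values sitting at stack top and three wanted, the third slot at the frame start is defaulted to NIL.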
Unknown argument call (MV-CALL) takes its arguments on stack top (is given a
base pointer). If not a tail call, then we just set the arg pointer to the
base pointer and call. If a tail call, we must BLT the arguments down to the
beginning of the current frame.
Implement more args by BLT'ing the more args *on top* of the current frame.
This solves two problems:

-- Any register more arguments can be made uniformly accessible by copying
   them into memory. [We can't store the registers in place, since the
   beginning of the frame gets double use for storing the old-cont, return-pc
-- It solves the deallocation problem: the arguments will be deallocated when
   the frame is returned from or a tail full call is done out of it. So
   keyword args will be properly tail-recursive without any special mechanism
   for squeezing out the more arg once the parsing is done. Note that a tail
   local call won't blast the more arg, since in local call the callee just
   takes the frame it is given (in this case containing the more arg).
More args in local call??? Perhaps we should not attempt local call conversion
in this case. We already special-case keyword args in local call. It seems
that the main importance of more args is primarily related to full call: it is
used for defining various kinds of frobs that need to take arbitrary arguments:

-- "Pass through" applications such as dispatch functions

Given the marginal importance of more args in local call, it seems not worth
going to any implementation difficulty. In fact, it seems that it would cause
complications both at the VMR level and also in the VM definition. This being
the case, we should flush it.
\section{Function Call}

\subsection{Registers and frame format}
These registers are used in function call and return:

A0...A2
    In full call, the first three arguments. In unknown-values return, the
    first three return values.

CFP
    The current frame pointer. In full call, this initially points to a
    partial frame large enough to hold the passed stack arguments (zero-length

CSP
    The current control stack top pointer.

OCFP
    In full call, the passing location for the frame to return to.

    In unknown-values return of other than one value, the pointer to returned
    stack values. In such a return, OCFP is always initialized to point to
    the frame returned from, even when no stack values are returned. This
    allows OCFP to be used to restore CSP.

LRA
    In full call, the passing location for the return PC.

NARGS
    In full call, the number of arguments passed. In unknown-values return of
    other than one value, the number of values returned.
\subsection{Full call}

What is our usage of CFP, OCFP and CSP?

It is an invariant that CSP always points after any useful information, so that
at any time an interrupt can come and allocate stuff on the stack.

TR call is also a constraint: we can't deallocate the caller's frame before the
call, since it holds the stack arguments for the call.

What we do is have the caller set up CFP, and have the callee set CSP to CFP
plus the frame size. The caller leaves CSP alone: the callee is the one who
does any necessary stack deallocation.

In a TR call, we don't do anything: CFP is left as CFP, and CSP points to the
end of the frame, keeping the stack arguments from being trashed.

In a normal call, CFP is set to CSP, causing the callee's frame to be allocated
after the current frame.
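The caller/callee division of labor just described can be simulated in a few lines (pointers are plain integers here, for illustration only):

```python
# Toy simulation of the protocol above: the caller sets CFP (to CSP for a
# normal call, unchanged for a tail call), and the callee sets CSP to CFP
# plus its frame size.

def caller_setup(cfp, csp, tail_call):
    """Return the CFP the callee will run with."""
    return cfp if tail_call else csp

def callee_entry(cfp, frame_size):
    """Callee allocates its frame: CSP <- CFP + frame size."""
    return cfp + frame_size
```

In the tail-call case CFP is unchanged, so the callee's frame overlays the caller's and the stack arguments below CSP survive; in the normal case the callee's frame lands after the current frame.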
\subsection{Unknown values return}

The unknown-values return convention is always used in full call, and is used
in local call when the compiler either can't prove that a fixed number of
values are returned, or decides not to use the fixed-values convention to allow
tail-recursive XEP calls.

The unknown-values return convention has variants: single value and variable
values. We make this distinction to optimize the important case of a returner
that knows exactly one value is being returned. Note that it is possible to
return a single value using the variable-values convention, but it is less

We indicate single-value return by returning at the return-pc + 4; variable-value
return is indicated by returning at the return PC.
Single-value return makes only the following guarantees:
    A0 holds the value returned.
    CSP has been reset: there is no garbage on the stack.

In variable-value return, more information is passed back:
    A0..A2 hold the first three return values. If fewer than three values are
    returned, then the unused registers are initialized to NIL.

    OCFP points to the frame returned from. Note that because of our
    tail-recursive implementation of call, the frame receiving the values is
    always immediately under the frame returning the values. This means that
    we can use OCFP to index the values when we access them, and to restore
    CSP when we want to discard them.

    NARGS holds the number of values returned.

    CSP is always (+ OCFP (* NARGS 4)), i.e. there is room on the stack
    allocated for all returned values, even if they are all actually passed in
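The variable-values invariant CSP = (+ OCFP (* NARGS 4)) can be checked in miniature (word size of 4 address units, as in the text; addresses are plain integers for illustration):

```python
# OCFP both indexes the returned stack values and restores CSP when the
# values are discarded, because CSP = OCFP + NARGS * word size.

WORD = 4  # address units per word, as in the text

def csp_after_return(ocfp, nargs):
    return ocfp + nargs * WORD

def value_address(ocfp, n):
    """Address of the N'th returned stack value, indexed off OCFP."""
    return ocfp + n * WORD
```

Discarding the values is just CSP <- OCFP, which is the nargs = 0 case of the same formula.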
\subsection{External Entry Points}

Things that need to be done on XEP entry:
2] Move more arg above the frame, saving context
3] Set up env, saving closure pointer if closure
4] Move arguments from closure to local home
   Move old-cont and return-pc to the save locations
5] Argument count checking and dispatching

Copy-More-Arg <nargs-tn> 'fixed {in a3} => <context>, <count>
Setup-Closure-Environment => <closure>
Verify-Argument-Count <nargs-tn> 'count {for fixed-arg lambdas}
Argument-Count-Error <nargs-tn> {Drop-thru on hairy arg dispatching}
Use fast-if-=/fixnum and fast-if-</fixnum for dispatching.
make-closure <fun entry> <slot count> => <closure>
closure-init <closure> <values> 'slot

Things that need to be done on all function entry:
-- Move arguments to the variable home (consing value cells as necessary)
-- Move environment values to the local home
-- Move old-cont and return-pc to the save locations
Calling VOP's are a cross product of the following sets (with some members

    multiple (all values)
    fixed (calling with unknown-values conventions, wanting a certain
    known (only in local call where caller/callee agree on number of
    tail (doesn't return but does tail call)

    named (going through symbol, like full but stash fun name for error sys)
    full (have a function)

    fixed (number of args known at compile-time)
    variable (MULTIPLE-VALUE-CALL and APPLY)
Note that on all jumps for calls and returns we want to put some instruction
in the jump's delay slot(s).
Register usage at the time of the call:

LEXENV
    This holds the lexical environment to use during the call if it's a
    closure, and it is undefined otherwise.

CNAME
    This holds the symbol for a named call and garbage otherwise.

OCFP
    This holds the frame pointer, which the system restores upon return. The
    callee saves this if necessary; this is passed as a pseudo-argument.

A0...An
    These hold the first n+1 arguments.

NARGS
    This holds the number of arguments, as a fixnum.

LRA
    This holds the lisp-return-address object which indicates where to return.
    For a tail call, this retains its current value. The callee saves this
    if necessary; this is passed as a pseudo-argument.

CODE
    This holds the function object being called.

CSP
    The caller ignores this. The callee sets it as necessary based on CFP.

CFP
    This holds the callee's frame pointer. Caller sets this to the new frame
    pointer, which it remembered when it started computing arguments; this is
    CSP if there were no stack arguments. For a tail call CFP retains its

NSP
    The system uses this within a single function. A function using NSP must
    allocate and deallocate before returning or making a tail call.
Register usage at the time of the return for single value return, which
goes with the unknown-values convention the caller used:

    This holds the lisp-return-address at which the system continues executing.

    This holds the CFP. That is, the stack is guaranteed to be clean, and there
    is no code at the return site to adjust the CSP.
Additional register usage for multiple value return:

NARGS
    This holds the number of values returned.

A0...An
    These hold the first n+1 values, or NIL if there are fewer than n+1 values.

CSP
    Returner stores CSP to hold its CFP + NARGS * <address units per word>.

OCFP
    Returner stores this as its CFP, so the returnee has a handle on the
    start of the returned values on the stack.
ALLOCATE FULL CALL FRAME.

If the number of call arguments (passed to the VOP as an info argument)
indicates that there are stack arguments, then it makes some callee frame for

    CSP <- CSP + value of VOP info arg times address units per word.
In a call sequence, move some arguments to the right places.

There's a variety of MOVE-ARGUMENT VOP's.

(variations determined by whether it's named, whether it's a tail call,
whether there is a variable arg count, etc.)
if variable number of arguments
    NARGS <- (CSP - value of VOP argument) shifted right by address units per word
    A0...An <- values off of VOP argument (just fill them all)
else
    NARGS <- value of VOP info argument (always a constant)
OCFP <- value from VOP argument
LRA <- value from VOP argument
CFP stays the same since we reuse the frame
LRA <- compute LRA by adding an assemble-time determined constant to
CFP <- new frame pointer (remembered when starting to compute args)
    This is CSP if no stack args.
when (current-nfp-tn VOP-self-pointer)

CNAME <- function symbol name
the-fun <- function object out of symbol

LEXENV <- the-fun (from previous line or VOP argument)
CODE <- function-entry (the first word after the-fun)
LIP <- calc first instruction addr (CODE + constant-offset)
jump and run off temp
<emit Lisp return address data-block>
<default and move return values OR receive return values>
when (current-nfp-tn VOP-self-pointer)
emit function header (maybe initializes offset back to component start,
    but other pointers are set up at load-time. Pads
    to dual-word boundary.)
CSP <- CFP + compile-time determined constant (frame size)
if the function uses the number stack
    NSP <- NSP + compile-time determined constant (number stack frame size)
(either use this or the next one)
CODE <- CODE - assembler-time determined offset from function-entry back to
    the code data-block address.

SETUP-CLOSURE-ENVIRONMENT
(either use this or the previous one)
After this the CLOSURE-REF VOP can reference closure variables.
CODE <- CODE - assembler-time determined offset from function-entry back to
    the code data-block address.
RETURN and RETURN-MULTIPLE are for the unknown-values return convention.
The previous caller either wants n values (and it doesn't know how many are
coming), or it wants all the values returned (and it doesn't know how many
are coming).
(known fixed number of values, used with the unknown-values convention
in the caller)

When the compiler invokes the VOP, all values are already where they should
be; just get back to the caller.
when (current-nfp-tn VOP-self-pointer)
  ;; The number stack grows down in memory.
  NSP <- NFP + number stack frame size for calls within the currently
         compiling component times address units per word
CODE <- value of VOP argument with LRA
if VOP info arg is 1 (number of values we know we're returning)
  LIP <- calc target addr
         (CODE + skip over LRA header word + skip over address units per branch)
  (The branch is in the caller to skip down to the MV code.)
else
  NARGS <- value of VOP info arg
  nil out unused arg regs
  OCFP <- CFP (This indicates the start of return values on the stack,
               but you leave space for those in registers for convenience.)
  CSP <- CFP + NARGS * address-units-per-word
  LIP <- calc target addr (CODE + skip over LRA header word)
CFP <- value of VOP argument with OCFP
(unknown number of values, used with the unknown-values convention in
the caller)

When the compiler invokes the VOP, it gets TN's representing a pointer to
the values on the stack and how many values were computed.
when (current-nfp-tn VOP-self-pointer)
  ;; The number stack grows down in memory.
  NSP <- NFP + number stack frame size for calls within the currently
         compiling component times address units per word
NARGS <- value of VOP argument
copy the args to the beginning of the current (returner's) frame.
Actually some go into the argument registers. When putting the rest at
the beginning of the frame, leave room for those in the argument registers.
CSP <- CFP + NARGS * address-units-per-word
nil out unused arg regs
OCFP <- CFP (This indicates the start of return values on the stack,
             but you leave space for those in registers for convenience.)
CFP <- value of VOP argument with OCFP
CODE <- value of VOP argument with LRA
LIP <- calc target addr (CODE + skip over LRA header word)
The call VOPs call DEFAULT-UNKNOWN-VALUES or RECEIVE-UNKNOWN-VALUES after
spitting out the transfer of control to get stuff from the returner.
DEFAULT-UNKNOWN-VALUES
(We know what we want and we got something.)
If the returnee wants one value, it never does anything to deal with a
shortage of return values. However, if it starts at PC, then it has to
adjust the stack pointer to dump extra values (move OCFP into CSP). If it
starts at PC+N, then it just goes along with the "want one value, got it"
case.
If the returnee wants multiple values, and there's a shortage of return
values, there are two cases to handle. One, if the returnee wants fewer
values than there are return registers, and we start at PC+N, then it fills
in return registers A1..A<desired values necessary>; if we start at PC,
then the returnee is fine since the returning conventions have filled in
the unused return registers with nil, but the returnee must adjust the
stack pointer to dump possible stack return values (move OCFP to CSP).
Two, if the returnee wants more values than the number of return registers,
and it starts at PC+N (got one value), then it sets up returnee state as if
an unknown number of values came back:
  OCFP gets CSP, so general code described below can move OCFP into CSP
If we start at PC, then branch down to the general "got k values, wanted n"
code which takes care of the following issues:
  If k < n, fill in stack return values of nil for shortage of return
  values and move OCFP into CSP
  If k >= n, move OCFP into CSP
This also restores CODE from LRA by subtracting an assemble-time constant.
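The "got k values, wanted n" behavior above can be modeled abstractly. The
following is a minimal Python sketch (hypothetical function name; the real
logic is emitted as machine code by the VOPs), where padding with NIL models
filling unused registers and stack slots, and dropping excess values models
moving OCFP into CSP:

```python
NIL = None  # stand-in for Lisp NIL

def default_unknown_values(returned, wanted):
    """Model of DEFAULT-UNKNOWN-VALUES: given k returned values and
    n wanted values, pad a shortage with NIL and dump any excess."""
    got = len(returned)
    if got >= wanted:
        return list(returned[:wanted])            # k >= n: dump extras
    return list(returned) + [NIL] * (wanted - got)  # k < n: fill with NIL
```

For example, `default_unknown_values([1, 2, 3], 2)` yields `[1, 2]`, while
`default_unknown_values([1], 3)` yields `[1, None, None]`.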
RECEIVE-UNKNOWN-VALUES
(I want whatever I get.)
We want these at the end of our frame. When the returnee starts at
PC, it moves the return value registers to OCFP..OCFP[An] ignoring where
the end of the stack is and whether all the return value registers had
values. The returner left room on the stack before the stack return values
for the register return values. When the returnee starts at PC+N, bump CSP
by 1 and copy A0 there.
This also restores CODE from LRA by subtracting an assemble-time constant.
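The two entry points described above can likewise be modeled. A hedged Python
sketch (hypothetical names; in reality this is split across the returner and
the returnee's two return addresses): entering at PC+N means a single value
arrived in A0, while entering at PC means NARGS values start at OCFP.

```python
def receive_values(entry, a0=None, stack_values=None, nargs=None):
    """Model of RECEIVE-UNKNOWN-VALUES' two entry points: return the
    received values together with their count."""
    if entry == "PC+N":
        # Single-value return: bump CSP by one word and store A0 there.
        return [a0], 1
    # Multiple-value return: NARGS values starting at OCFP.
    return list(stack_values), nargs
```

So `receive_values("PC+N", a0=5)` yields `([5], 1)`, and
`receive_values("PC", stack_values=[1, 2], nargs=2)` yields `([1, 2], 2)`.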
There are three flavors:
1] KNOWN-CALL-LOCAL
   Uses known call convention where caller and callee agree where all
   the values are, and there's a fixed number of return values.
2] CALL-LOCAL
   Uses the unknown-values convention, but we expect a particular
   number of values in return.
3] MULTIPLE-CALL-LOCAL
   Uses the unknown-values convention, but we want all values returned.
If the number of call arguments (passed to the VOP as an info argument)
indicates that there are stack arguments, then it makes some callee frame for
them.

CSP <- CSP + control stack frame size for calls within the currently
       compiling component times address units per word.
when (callee-nfp-tn <VOP info arg holding callee>)
  ;; The number stack grows down.
  ;; May have to round to dual-word boundary if the machine's C calling
  ;; conventions demand this.
  NSP <- NSP - number stack frame size for calls within the currently
         compiling component times address units per word
KNOWN-CALL-LOCAL, CALL-LOCAL, MULTIPLE-CALL-LOCAL
KNOWN-CALL-LOCAL has no need to affect CODE since CODE is the same for the
caller/returnee and the returner. This uses KNOWN-RETURN.
With CALL-LOCAL and MULTIPLE-CALL-LOCAL, the caller/returnee must fix up
CODE since the callee may do a tail full call. This happens in the code
emitted by DEFAULT-UNKNOWN-VALUES and RECEIVE-UNKNOWN-VALUES. We use these
return conventions since we don't know what kind of values the returner
will give us. This could happen due to a tail full call to an unknown
function, or because the callee had different return points that returned
various numbers of values.
when (current-nfp-tn VOP-self-pointer) ;Get VOP self-pointer with
                                       ;DEFINE-VOP switch :vop-var.

CFP <- value of VOP arg
when (callee-nfp-tn <VOP info arg holding callee>)
  <where-callee-wants-NFP-tn> <- value of VOP arg
<where-callee-wants-LRA-tn> <- compute LRA by adding an assemble-time
                               determined constant to CODE.
jump and run off VOP info arg holding start instruction for callee

<emit Lisp return address data-block>
<case call convention
  call: default and move return values
  multiple: receive return values>
when (current-nfp-tn VOP-self-pointer)

when (current-nfp-tn VOP-self-pointer)
  ;; The number stack grows down in memory.
  NSP <- NFP + number stack frame size for calls within the currently
         compiling component times address units per word
LIP <- calc target addr (value of VOP arg + skip over LRA header word)
CFP <- value of VOP arg
\chapter{Standard Primitives}


\chapter{Customizing VMR Conversion}
Another way in which different implementations differ is in the relative cost
of operations. On machines without an integer multiply instruction, it may be
desirable to convert multiplication by a constant into shifts and adds, while
this is surely a bad idea on machines with hardware support for
multiplication. Part of the tuning process for an implementation will be
adding implementation-dependent transforms and disabling undesirable standard
transforms.
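As an illustration of the kind of rewrite such a transform performs, a
multiplication by a non-negative constant can be decomposed into one shift
and add per set bit of the constant. This is a hedged Python sketch of the
technique only (hypothetical name; it is not the compiler's actual transform,
which would emit shift/add VOPs rather than compute at run time):

```python
def shift_add_multiply(x, c):
    """Compute x * c for a non-negative integer constant c using only
    shifts and adds, one add per set bit in c's binary representation."""
    result = 0
    shift = 0
    while c:
        if c & 1:
            result += x << shift  # contribute x * 2^shift
        c >>= 1
        shift += 1
    return result
```

For instance, multiplying by 10 (binary 1010) becomes `(x << 1) + (x << 3)`.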
When practical, ICR transforms should be used instead of VMR generators, since
transforms are more portable and less error-prone. Note that the Lisp code
need not be implementation independent: it may contain all sorts of
sub-primitives and similar stuff. Generally a function should be implemented
using a transform instead of a VMR translator unless it cannot be implemented
as a transform due to being totally evil, or it is just as easy to implement
as a translator because it is so simple.
\section{Constant Operands}

If the code emitted for a VOP when an argument is constant is very different
from the non-constant case, then it may be desirable to special-case the
operation in VMR conversion by emitting different VOPs. An example would be if
SVREF is only open-coded when the index is a constant, and turns into a miscop
call otherwise. We wouldn't want constant references to spuriously allocate
all the miscop linkage registers on the off chance that the offset might not
be constant. See the :constant feature of VOP primitive type restrictions.
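The SVREF special-casing above amounts to a dispatch in VMR conversion on
whether the operand is a compile-time constant. A minimal Python sketch of
that decision (the VOP names here are hypothetical, chosen only to illustrate
the two emission paths):

```python
def choose_svref_vop(index_is_constant):
    """Model of special-casing in VMR conversion: pick the open-coded
    VOP only when the index is a compile-time constant; otherwise fall
    back to the general miscop call, so no miscop linkage registers are
    allocated for the constant case."""
    if index_is_constant:
        return "svref/constant-index"   # hypothetical open-coded VOP
    return "svref/miscop-call"          # hypothetical general-case VOP
```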
\section{Supporting Multiple Hardware Configurations}

A winning way to change emitted code depending on the hardware configuration,
i.e. what FPA is present, is to do this using primitive types. Note that the
Primitive-Type function is VM supplied, and can look at any appropriate
hardware configuration switches. Short-Float can become 6881-Short-Float,
AFPA-Short-Float, etc. There would be separate SBs and SCs for the registers
of each kind of FP hardware, with each hardware-specific primitive type
using the appropriate float register SC. Then the hardware-specific templates
would provide AFPA-Short-Float as the argument type restriction.
Primitive type changes:

The primitive-type structure is given a new %Type slot, which is the CType
structure that is equivalent to this type. There is also a Guard slot which,
if true, is a function that controls whether this primitive type is allowed
(due to hardware configuration, etc.)
We add new :Type and :Guard keywords to Def-Primitive-Type. Type is the type
specifier that is equivalent (defaulting to the primitive-type name), and
Guard is an expression evaluated in the null environment that controls
whether this type applies (defaulting to none, i.e. constant T).
The Primitive-Type-Type function returns the Lisp CType corresponding to a
primitive type. This is the %Type unless there is a guard that returns false,
in which case it is the empty type (i.e. NIL).
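The guard semantics just described can be sketched in Python (the dictionary
representation and names are hypothetical stand-ins for the primitive-type
structure's %Type and Guard slots):

```python
EMPTY_TYPE = "NIL"  # stand-in for the empty Lisp type

def primitive_type_type(ptype):
    """Model of Primitive-Type-Type: return the equivalent Lisp type
    (%Type), unless a guard is present and returns false, in which
    case the result is the empty type."""
    guard = ptype.get("guard")
    if guard is not None and not guard():
        return EMPTY_TYPE
    return ptype["%type"]
```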
[But this doesn't do what we want it to do, since we will compute the
function type for a template at load-time, so they will correspond to
whatever configuration was in effect then. Maybe we don't want to dick with
guards here (if at all). I guess we can defer this issue until we actually
support different FP configurations. But it would seem pretty losing to
separately flame about all the different FP configurations that could be used
to open-code + whenever we are forced to closed-code +.

If we separately report each better possibly applicable template that we
couldn't use, then it would be reasonable to report any conditional template
allowed by the configuration.
But it would probably also be good to give some sort of hint that perhaps it
would be a good time to make sure you understand how to tell the compiler to
compile for a particular configuration. Perhaps if there is a template that
applies *but for the guard*, then we could give a note. This way, if someone
thinks they are being efficient by throwing in lots of declarations, we can
let them know that they may have to do more.
I guess the guard should be associated with the template rather than the
primitive type. This would allow LTN and friends to easily tell whether a
template applies in this configuration. It is also probably more natural for
some sorts of things: with some hardware variants, it may be that the SBs and
representations (SCs) are really the same, but there are some different
allowed operations. In this case, we could easily conditionalize VOPs without
the increased complexity due to bogus SCs. If there are different storage
resources, then we would conditionalize Primitive-Type as well.
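Moving the guard onto the template makes the applicability test trivial for
LTN. A minimal Python sketch of that check, under the same hypothetical
dictionary representation as before (a template with no guard always
applies; a guarded one applies only if its guard approves the current
configuration):

```python
def template_applies_p(template, configuration):
    """Model of template-level guards: a template is usable in this
    configuration iff it has no guard or its guard returns true for
    the configuration."""
    guard = template.get("guard")
    return guard is None or bool(guard(configuration))
```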
\section{Special-case VMR convert methods}

(defun continuation-tn (cont \&optional (check-p t))
  Return the TN which holds Continuation's first result value. In general
  this may emit code to load the value into a TN. If Check-P is true, then
  when policy indicates, code should be emitted to check that the value
  satisfies the continuation's asserted type.

(defun result-tn (cont)
  Return the TN that Continuation's first value is delivered in. In general,
  may emit code to default any additional values to NIL.

(defun result-tns (cont n)
  Similar to Result-TN, except that it returns a list of N result TNs, one
  for each of the first N values.
Nearly all open-coded functions should be handled using standard template
selection. Some (all?) exceptions:
-- List, List* and Vector take arbitrary numbers of arguments. Could
   implement Vector as a source transform. Could even do List in a transform
   if we explicitly represent the stack args using %More-Args or something.
-- %Typep varies a lot depending on the type specifier. We don't want to
   transform it, since we want %Typep as a canonical form so that we can do
-- Funny functions emitted by the compiler: %Listify-Rest-Args, Arg,
   %More-Args, %Special-Bind, %Catch, %Unknown-Values (?), %Unwind-Protect,
   %Unwind, %%Primitive.