doc/cmucl/internals/retargeting.tex

   1 \part{Compiler Retargeting}
   2
   3 [\#\#\#
   4
   5 In general, it is a danger sign if a generator references a TN that isn't an
   6 operand or temporary, since lifetime analysis hasn't been done for that use.
   7 We are doing weird stuff for the old-cont and return-pc passing locations,
   8 hoping that the conflicts at the called function have the desired effect.
   9 Other stuff?  When a function returns unknown values, we don't reference the
  10 values locations when a single-value return is done.  But nothing is live at a
  11 return point anyway.
  12
  13
  14
  15 Have a way for template conversion to special-case constant arguments?
  16 How about:
  17     If an arg restriction is (:satisfies [<predicate function>]), and the
  18     corresponding argument is constant, with the constant value satisfying the
  19     predicate, then (if any other restrictions are satisfied), the template
  20     will be emitted with the literal value passed as an info argument.  If the
  21     predicate is omitted, then any constant will do.
  22
  23     We could sugar this up a bit by allowing (:member <object>*) for
  24     (:satisfies (lambda (x) (member x '(<object>*))))
  25
  26 We could allow this to be translated into a Lisp type by adding a new Constant
  27 type specifier.  This could only appear as an argument to a function type.
  28 To satisfy (Constant <type>), the argument must be a compile-time constant of
  29 the specified type.  Just Constant means any constant (i.e. (Constant *)).
  30 This would be useful for the type constraints on ICR transforms.
  31
  32
  33 Constant TNs: we count on being able to indirect to the leaf, and don't try to
  34 wedge the information into the offset.  We set the FSC to an appropriate
  35 immediate SC.
  36
  37     Allow "more operands" to VOPs in define-vop.  You can't do much with the
  38     more operands: define-vop just fills in the cost information according to
  39     the loading costs for a SC you specify.  You can't restrict more operands,
  40     and you can't make local preferences.  In the generator, the named variable
  41     is bound to the TN-ref for the first extra operand.  This should be good
  42     enough to handle all the variable arg VOPs (primarily function call and
  43     return).  Usually more operands are used just to get TN lifetimes to work
  44     out; the generator actually ignores them.
  45
  46     Variable-arg VOPs can't be used with the VOP macro.  You must use VOP*.
  47     VOP* doesn't do anything with these extra operand except stick them on the
  48     ends of the operand lists passed into the template.  VOP* is often useful
  49     within the convert functions for non-VOP templates, since it can emit a VOP
  50     using an already prepared TN-Ref list.
  51
  52
  53     It is pretty basic to the whole primitive-type idea that there is only one
  54     primitive-type for a given lisp type.  This is really the same as saying
  55     primitive types are disjoint.  A primitive type serves two somewhat
  56     unrelated purposes:
  57      -- It is an abstraction a Lisp type used to select type specific
  58         operations.  Originally kind of an efficiency hack, but it lets a
  59         template's type signature be used both for selection and operand
  60         representation determination.
  61      -- It represents a set of possible representations for a value (SCs).  The
  62         primitive type is used to determine the legal SCs for a TN, and is also
  63         used to determine which type-coercion/move VOP to use.
  64
  65 ]
  66
  67 There are basically three levels of target dependence:
  68
  69  -- Code in the "front end" (before VMR conversion) deals only with Lisp
  70     semantics, and is totally target independent.
  71
  72  -- Code after VMR conversion and before code generation depends on the VM,
  73     but should work with little modification across a wide range of
  74     "conventional" architectures.
  75
  76  -- Code generation depends on the machine's instruction set and other
  77     implementation details, so it will have to be redone for each
  78     implementation.  Most of the work here is in defining the translation into
  79     assembly code of all the supported VOPs.
  80
  81
  82 \f
  83 \chapter{Storage bases and classes}
  84 New interface: instead of CURRENT-FRAME-SIZE, have CURRENT-SB-SIZE <name> which
  85 returns the current element size of the named SB.
  86
  87 How can we have primitive types that overlap, i.e. (UNSIGNED-BYTE 32),
  88 (SIGNED-BYTE 32), FIXNUM?
  89 Primitive types are used for two things:
  90     Representation selection: which SCs can be used to represent this value?
  91         For this purpose, it isn't necessary that primitive types be disjoint,
  92         since any primitive type can choose an arbitrary set of
  93         representations.  For moves between the overlapping representations,
  94         the move/load operations can just be noops when the locations are the
  95         same (vanilla MOVE), since any bad moves should be caught out by type
  96         checking.
  97     VOP selection:
  98         Is this operand legal for this VOP?  When ptypes overlap in interesting
  99         ways, there is a problem with allowing just a simple ptype restriction,
 100         since we might want to allow multiple ptypes.  This could be handled
 101         by allowing "union primitive types", or by allowing multiple primitive
 102         types to be specified (only in the operand restriction.)  The latter
 103         would be long the lines of other more flexible VOP operand restriction
 104         mechanisms, (constant, etc.)
 105
 106
 107
 108 Ensure that load/save-operand never need to do representation conversion.
 109
 110 The PRIMITIVE-TYPE more/coerce info would be moved into the SC.  This could
 111 perhaps go along with flushing the TN-COSTS.  We would annotate the TN with
 112 best SC, which implies the representation (boxed or unboxed).  We would still
 113 need represent the legal SCs for restricted TNs somehow, and also would have to
 114 come up with some other way for pack to keep track of which SCs we have already
 115 tried.
 116
 117 A SC would have a list of "alternate" SCs and a boolean SAVE-P value that
 118 indicates it needs to be saved across calls in some non-SAVE-P SC.  A TN is
 119 initially given its "best" SC.  The SC is annotated with VOPs that are used for
 120 moving between the SC and its alternate SCs (load/save operand, save/restore
 121 register).  It is also annotated with the "move" VOPs used for moving between
 122 this SC and all other SCs it is possible to move between.  We flush the idea
 123 that there is only c-to-t and c-from-t.
 124
 125 But how does this mesh with the idea of putting operand load/save back into the
 126 generator?  Maybe we should instead specify a load/save function?  The
 127 load/save functions would also differ from the move VOPs in that they would
 128 only be called when the TN is in fact in that particular alternate SC, whereas
 129 the move VOPs will be associated with the primary SC, and will be emitted
 130 before it is known whether the TN will be packed in the primary SC or an
 131 alternate.
 132
 133 I guess a packed SC could also have immediate SCs as alternate SCs, and
 134 constant loading functions could be associated with SCs using this mechanism.
 135
 136 So given a TN packed in SC X and a SC restriction for Y and Z, how do we know
 137 which load function to call?  There would be ambiguity if X was an alternate
 138 for both Y and Z and they specified different load functions.  This seems
 139 unlikely to arise in practice, though, so we could just detect the ambiguity
 140 and give an error at define-vop time.  If they are doing something totally
 141 weird, they can always inhibit loading and roll their own.
 142
 143 Note that loading costs can be specified at the same time (same syntax) as
 144 association of loading functions with SCs.  It seems that maybe we will be
 145 rolling DEFINE-SAVE-SCS and DEFINE-MOVE-COSTS into DEFINE-STORAGE-CLASS.
 146
 147 Fortunately, these changes will affect most VOP definitions very little.
 148
 149
 150 A Storage Base represents a physical storage resource such as a register set or
 151 stack frame.  Storage bases for non-global resources such as the stack are
 152 relativized by the environment that the TN is allocated in.  Packing conflict
 153 information is kept in the storage base, but non-packed storage resources such
 154 as closure environments also have storage bases.
 155 Some storage bases:
 156     General purpose registers
 157     Floating point registers
 158     Boxed (control) stack environment
 159     Unboxed (number) stack environment
 160     Closure environment
 161
 162 A storage class is a potentially arbitrary set of the elements in a storage
 163 base.  Although conceptually there may be a hierarchy of storage classes such
 164 as "all registers", "boxed registers", "boxed scratch registers", this doesn't
 165 exist at the implementation level.  Such things can be done by specifying
 166 storage classes whose locations overlap.  A TN shouldn't have lots of
 167 overlapping SC's as legal SC's, since time would be wasted repeatedly
 168 attempting to pack in the same locations.
 169
 170 There will be some SC's whose locations overlap a great deal, since we get Pack
 171 to do our representation analysis by having lots of SC's.  A SC is basically a
 172 way of looking at a storage resource.  Although we could keep a fixnum and an
 173 unboxed representation of the same number in the same register, they correspond
 174 to different SC's since they are different representation choices.
 175
 176 TNs are annotated with the primitive type of the object that they hold:
 177     T: random boxed object with only one representation.
 178     Fixnum, Integer, XXX-Float: Object is always of the specified numeric type.
 179     String-Char: Object is always a string-char.
 180
 181 When a TN is packed, it is annotated with the SC it was packed into.  The code
 182 generator for a VOP must be able to uniquely determine the representation of
 183 its operands from the SC. (debugger also...)
 184
 185 Some SCs:
 186     Reg: any register (immediate objects)
 187     Save-Reg: a boxed register near r15 (registers easily saved in a call)
 188     Boxed-Reg: any boxed register (any boxed object)
 189     Unboxed-Reg: any unboxed register (any unboxed object)
 190     Float-Reg, Double-Float-Reg: float in FP register.
 191     Stack: boxed object on the stack (on cstack)
 192     Word: any 32bit unboxed object on nstack.
 193     Double: any 64bit unboxed object on nstack.
 194
 195 We have a number of non-packed storage classes which serve to represent access
 196 costs associated with values that are not allocated using conflicts
 197 information.  Non-packed TNs appear to already be packed in the appropriate
 198 storage base so that Pack doesn't get confused.  Costs for relevant non-packed
 199 SC's appear in the TN-Ref cost information, but need not ever be summed into
 200 the TN cost vectors, since TNs cannot be packed into them.
 201
 202 There are SCs for non-immediate constants and for each significant kind of
 203 immediate operand in the architecture.  On the RT, 4, 8 and 20 bit integer SCs
 204 are probably worth having.
 205
 206 Non-packed SCs:
 207     Constant
 208     Immediate constant SCs:
 209         Signed-Byte-<N>, Unsigned-Byte-<N>, for various architecture dependent
 210             values of <N>
 211         String-Char
 212         XXX-Float
 213         Magic values: T, NIL, 0.
 214
 215 \f
 216 \chapter{Type system parameterization}
 217
 218 The main aspect of the VM that is likely to vary for good reason is the type
 219 system:
 220
 221  -- Different systems will have different ways of representing dynamic type
 222     information.  The primary effect this has on the compiler is causing VMR
 223     conversion of type tests and checks to be implementation dependent.
 224     Rewriting this code for each implementation shouldn't be a big problem,
 225     since the portable semantics of types has already been dealt with.
 226
 227  -- Different systems will have different specialized number and array types,
 228     and different VOPs specialized for these types.  It is easy add this kind
 229     of knowledge without affecting the rest of the compiler.  All you have to
 230     do is define the VOPs and translations.
 231
 232  -- Different systems will offer different specialized storage resources
 233     such as floating-point registers, and will have additional kinds of
 234     primitive-types.  The storage class mechanism handles a large part of this,
 235     but there may be some problem in getting VMR conversion to realize the
 236     possibly large hidden costs in implicit moves to and from these specialized
 237     storage resources.  Probably the answer is to have some sort of general
 238     mechanism for determining the primitive-type for a TN given the Lisp type,
 239     and then to have some sort of mechanism for automatically using specialized
 240     Move VOPs when the source or destination has some particular primitive-type.
 241
 242 \#|
 243 How to deal with list/null(symbol)/cons in primitive-type structure?  Since
 244 cons and symbol aren't used for type-specific template selection, it isn't
 245 really all that critical.  Probably Primitive-Type should return the List
 246 primitive type for all of Cons, List and Null (indicating when it is exact).
 247 This would allow type-dispatch for simple sequence functions (such as length)
 248 to be done using the standard template-selection mechanism.  [Not a wired
 249 assumption]
 250 |\#
 251
 252
 253 \f
 254 \chapter{VOP Definition}
 255
 256 Before the operand TN-refs are passed to the emit function, the following
 257 stuff is done:
 258  -- The refs in the operand and result lists are linked together in order using
 259     the Across slot.  This list is properly NIL terminated.
 260  -- The TN slot in each ref is set, and the ref is linked into that TN's refs
 261     using the Next slot.
 262  -- The Write-P slot is set depending on whether the ref is an argument or
 263     result.
 264  -- The other slots have the default values.
 265
 266 The template emit function fills in the Vop, Costs, Cost-Function,
 267 SC-Restriction and Preference slots, and links together the Next-Ref chain as
 268 appropriate.
 269
 270 \f
 271 \section{Lifetime model}
 272
 273 \#|
 274 Note in doc that the same TN may not be used as both a more operand and as any
 275 other operand to the same VOP, to simplify more operand LTN number coalescing.
 276 |\#
 277
 278 It seems we need a fairly elaborate model for intra-VOP conflicts in order to
 279 allocate temporaries without introducing spurious conflicts.  Consider the
 280 important case of a VOP such as a miscop that must have operands in certain
 281 registers.  We allocate a wired temporary, create a local preference for the
 282 corresponding operand, and move to (or from) the temporary.  If all temporaries
 283 conflict with all arguments, the result will be correct, but arguments could
 284 never be packed in the actual passing register.  If temporaries didn't conflict
 285 with any arguments, then the temporary for an earlier argument might get packed
 286 in the same location as the operand for a later argument; loading would then
 287 destroy an argument before it was read.
 288
 289 A temporary's intra-VOP lifetime is represented by the times at which its life
 290 starts and ends.  There are various instants during the evaluation that start
 291 and end VOP lifetimes.  Two TNs conflict if the live intervals overlap.
 292 Lifetimes are open intervals: if one TN's lifetime begins at a point where
 293 another's ends, then the TNs don't conflict.
 294
 295 The times within a VOP are the following:
 296
 297 :Load
 298     This is the beginning of the argument's lives, as far as intra-vop
 299     conflicts are concerned.  If load-TNs are allocated, then this is the
 300     beginning of their lives.
 301
 302 (:Argument <n>)
 303     The point at which the N'th argument is read for the last time (by this
 304     VOP).  If the argument is dead after this VOP, then the argument becomes
 305     dead at this time, and may be reused as a temporary or result load-TN.
 306
 307 (:Eval <n>)
 308     The N'th evaluation step.  There may be any number of evaluation steps, but
 309     it is unlikely that more than two are needed.
 310
 311 (:Result <n>)
 312     The point at which the N'th result is first written into.  This is the
 313     point at which that result becomes live.
 314
 315 :Save
 316     Similar to :Load, but marks the end of time.  This is point at which result
 317     load-TNs are stored back to the actual location.
 318
 319 In any of the list-style time specifications, the keyword by itself stands for
 320 the first such time, i.e.
 321     :argument  <==>  (:argument 0)
 322
 323
 324 Note that argument/result read/write times don't actually have to be in the
 325 order specified, but they must *appear* to happen in that order as far as
 326 conflict analysis is concerned.  For example, the arguments can be read in any
 327 order as long no TN is written that has a life beginning at or after
 328 (:Argument <n>), where N is the number of an argument whose reading was
 329 postponed.
 330
 331 [\#\#\# (???)
 332
 333 We probably also want some syntactic sugar in Define-VOP for automatically
 334 moving operands to/from explicitly allocated temporaries so that this kind of
 335 thing is somewhat easy.  There isn't really any reason to consider the
 336 temporary to be a load-TN, but we want to compute costs as though it was and
 337 want to use the same operand loading routines.
 338
 339 We also might consider allowing the lifetime of an argument/result to be
 340 extended forward/backward.  This would in many cases eliminate the need for
 341 temporaries when operands are read/written out of order.
 342 ]
 343
 344 \f
 345 \section{VOP Cost model}
 346
 347 Note that in this model, if a operand has no restrictions, it has no cost.
 348 This makes make sense, since the purpose of the cost is to indicate the
 349 relative value of packing in different SCs.  If the operand isn't required to
 350 be in a good SC (i.e. a register), then we might as well leave it in memory.
 351 The SC restriction mechanism can be used even when doing a move into the SC is
 352 too complex to be generated automatically (perhaps requiring temporary
 353 registers), since Define-VOP allows operand loading to be done explicitly.
 354
 355 \f
 356 \section{Efficiency notes}
 357
 358   In addition to
 359 being used to tell whether a particular unsafe template might get emitted, we
 360 can also use it to give better efficiency notes:
 361  -- We can say what is wrong with the call types, rather than just saying we
 362     failed to open-code.
 363  -- We can tell whether any of the "better" templates could possibly apply,
 364     i.e. is the inapplicability of a template because of inadequate type
 365     information or because the type is just plain wrong.  We don't want to
 366     flame people when a template that couldn't possibly match doesn't match,
 367     e.g. complaining that we can't use fixnum+ when the arguments are known to
 368     be floats.
 369
 370
 371 This is how we give better efficiency notes:
 372
 373 The Template-Note is a short noun-like string without capitalization or
 374 punctuation that describes what the template "does", i.e. we say
 375 "Unable to do ~A, doing ~A instead."
 376
 377 The Cost is moved from the Vop-Info to the Template structure, and is used to
 378 determine the "goodness" of possibly applicable templates.  [Could flush
 379 Template/Vop-Info distinction]  The cost is used to choose the best applicable
 380 template to emit, and also to determine what better templates we might have
 381 been able to use.
 382
 383 A template is possibly applicable if there is an intersection between all of
 384 the arg/result types and the corresponding arg/result restrictions, i.e. the
 385 template is not clearly impossible: more declarations might allow it to be
 386 emitted.
 387
 388 \f
 389 \chapter{Assembler Retargeting}
 390
 391 \f
 392 \chapter{Writing Assembly Code}
 393
 394 VOP writers expect:
 395    MOVE
 396       You write when you port the assembler.)
 397    EMIT-LABEL
 398       Assembler interface like INST.  Takes a label you made and says "stick it
 399       here."
 400    GEN-LABEL
 401       Returns a new label suitable for use with EMIT-LABEL exactly once and
 402       for referencing as often as necessary.
 403    INST
 404       Recognizes and dispatches to instructions you defined for assembler.
 405    ALIGN
 406       This takes the number of zero bits you want in the low end of the address
 407       of the next instruction.
 408    ASSEMBLE
 409    ASSEMBLE-ELSEWHERE
 410       Get ready for assembling stuff.  Takes a VOP and arbitrary PROGN-style
 411       body.  Wrap these around instruction emission code announcing the first
 412       pass of our assembler.
 413    CURRENT-NFP-TN
 414       This returns a TN for the NFP if the caller uses the number stack, or
 415       nil.
 416    SB-ALLOCATED-SIZE
 417       This returns the size of some storage based used by the currently
 418       compiling component.
 419    ...
 420
 421 ;;;
 422 ;;; VOP idioms
 423 ;;;
 424
 425 STORE-STACK-TN
 426 LOAD-STACK-TN
 427    These move a value from a register to the control stack, or from the
 428    control stack to a register.  They take care of checking the TN types,
 429    modifying offsets according to the address units per word, etc.
 430
 431 \f
 432 \chapter{Required VOPS}
 433
 434
 435 Note: the move VOP cannot have any wired temps.  (Move-Argument also?)  This is
 436 so we can move stuff into wired TNs without stepping on our toes.
 437
 438
 439 We create set closure variables using the Value-Cell VOP, which takes a value
 440 and returns a value cell containing the value.  We can basically use this
 441 instead of a Move VOP when initializing the variable.  Value-Cell-Set and
 442 Value-Cell-Ref are used to access the value cell.  We can have a special effect
 443 for value cells so that value cells references can be discovered to be common
 444 subexpressions or loop invariants.
 445
 446
 447
 448
 449 Represent unknown-values continuations as (start, count).  Unknown values
 450 continuations are always outside of the current frame (on stack top).  Within a
 451 function, we always set up and receive values in the standard passing
 452 locations.  If we receive stack values, then we must BLT them down to the start
 453 of our frame, filling in any unsupplied values.  If we generate unknown values
 454 (i.e. PUSH-VALUES), then we set the values up in the standard locations, then
 455 BLT them to stack top.  When doing a tail-return of MVs, we just set them up in
 456 the standard locations and decrement SP: no BLT is necessary.
 457
 458 Unknown argument call (MV-CALL) takes its arguments on stack top (is given a
 459 base pointer).  If not a tail call, then we just set the arg pointer to the
 460 base pointer and call.  If a tail call, we must BLT the arguments down to the
 461 beginning of the current frame.
 462
 463
 464 Implement more args by BLT'ing the more args *on top* of the current frame.
 465 This solves two problems:
 466  -- Any register more arguments can be made uniformly accessibly by copying
 467     them into memory.  [We can't store the registers in place, since the
 468     beginning of the frame gets double use for storing the old-cont, return-pc
 469     and env.]
 470  -- It solves the deallocation problem: the arguments will be deallocated when
 471     the frame is returned from or a tail full call is done out of it.  So
 472     keyword args will be properly tail-recursive without any special mechanism
 473     for squeezing out the more arg once the parsing is done.  Note that a tail
 474     local call won't blast the more arg, since in local call the callee just
 475     takes the frame it is given (in this case containing the more arg).
 476
 477 More args in local call???  Perhaps we should not attempt local call conversion
 478 in this case.  We already special-case keyword args in local call.  It seems
 479 that the main importance of more args is primarily related to full call: it is
 480 used for defining various kinds of frobs that need to take arbitrary arguments:
 481  -- Keyword arguments
 482  -- Interpreter stubs
 483  -- "Pass through" applications such as dispatch functions
 484
 485 Given the marginal importance of more args in local call, it seems unworth
 486 going to any implementation difficulty.  In fact, it seems that it would cause
 487 complications both at the VMR level and also in the VM definition.  This being
 488 the case, we should flush it.
 489
 490
 491 \section{Function Call}
 492
 493
 494 \f
 495 \subsection{Registers and frame format}
 496
 497 These registers are used in function call and return:
 498
 499 A0..A{\it n}
 500     In full call, the first three arguments.  In unknown values return, the
 501     first three return values.
 502
 503 CFP
 504     The current frame pointer.  In full call, this initially points to a
 505     partial frame large enough to hold the passed stack arguments (zero-length
 506     if none).
 507
 508 CSP
 509     The current control stack top pointer.
 510
 511 OCFP
 512     In full call, the passing location for the frame to return to.
 513
 514     In unknown-values return of other than one value, the pointer to returned
 515     stack values.  In such a return, OCFP is always initialized to point to
 516     the frame returned from, even when no stack values are returned.  This
 517     allows OCFP to be used to restore CSP.
 518
 519 LRA
 520     In full call, the passing location for the return PC.
 521
 522 NARGS
 523     In full call, the number of arguments passed.  In unknown-values return of
 524     other than one value, the number of values returned.
 525
 526 \f
 527 \subsection{Full call}
 528
 529 What is our usage of CFP, OCFP and CSP?
 530
 531 It is an invariant that CSP always points after any useful information so that
 532 at any time an interrupt can come and allocate stuff in the stack.
 533
 534 TR call is also a constraint: we can't deallocate the caller's frame before the
 535 call, since it holds the stack arguments for the call.
 536
 537 What we do is have the caller set up CFP, and have the callee set CSP to CFP
 538 plus the frame size.  The caller leaves CSP alone: the callee is the one who
 539 does any necessary stack deallocation.
 540
 541 In a TR call, we don't do anything: CFP is left as CFP, and CSP points to the
 542 end of the frame, keeping the stack arguments from being trashed.
 543
 544 In a normal call, CFP is set to CSP, causing the callee's frame to be allocated
 545 after the current frame.
 546
 547 \f
 548 \subsection{Unknown values return}
 549
 550 The unknown values return convention is always used in full call, and is used
 551 in local call when the compiler either can't prove that a fixed number of
 552 values are returned, or decides not to use the fixed values convention to allow
 553 tail-recursive XEP calls.
 554
 555 The unknown-values return convention has variants: single value and variable
 556 values.  We make this distinction to optimize the important case of a returner
 557 whose knows exactly one value is being returned.  Note that it is possible to
 558 return a single value using the variable-values convention, but it is less
 559 efficient.
 560
 561 We indicate single-value return by returning at the return-pc+4; variable value
 562 return is indicated by returning at the return PC.
 563
 564 Single-value return makes only the following guarantees:
 565     A0 holds the value returned.
 566     CSP has been reset: there is no garbage on the stack.
 567
 568 In variable value return, more information is passed back:
 569     A0..A2 hold the first three return values.  If fewer than three values are
 570     returned, then the unused registers are initialized to NIL.
 571
 572     OCFP points to the frame returned from.  Note that because of our
 573     tail-recursive implementation of call, the frame receiving the values is
 574     always immediately under the frame returning the values.  This means that
 575     we can use OCFP to index the values when we access them, and to restore
 576     CSP when we want to discard them.
 577
 578     NARGS holds the number of values returned.
 579
 580     CSP is always (+ OCFP (* NARGS 4)), i.e. there is room on the stack
 581     allocated for all returned values, even if they are all actually passed in
 582     registers.
 583
 584 \f
 585 \subsection{External Entry Points}
 586
 587 Things that need to be done on XEP entry:
 588  1] Allocate frame
 589  2] Move more arg above the frame, saving context
 590  3] Set up env, saving closure pointer if closure
 591  4] Move arguments from closure to local home
 592     Move old-cont and return-pc to the save locations
 593  5] Argument count checking and dispatching
 594
 595 XEP VOPs:
 596
 597 Allocate-Frame
 598 Copy-More-Arg <nargs-tn> 'fixed {in a3} => <context>, <count>
 599 Setup-Environment
 600 Setup-Closure-Environment => <closure>
 601 Verify-Argument-Count <nargs-tn> 'count {for fixed-arg lambdas}
 602 Argument-Count-Error <nargs-tn> {Drop-thru on hairy arg dispatching}
 603 Use fast-if-=/fixnum and fast-if-</fixnum for dispatching.
 604
 605 Closure vops:
 606 make-closure <fun entry> <slot count> => <closure>
 607 closure-init <closure> <values> 'slot
 608
 609
 610 Things that need to be done on all function entry:
 611  -- Move arguments to the variable home (consing value cells as necessary)
 612  -- Move environment values to the local home
 613  -- Move old-cont and return-pc to the save locations
 614
 615 \f
 616 \section{Calls}
 617
 618 Calling VOP's are a cross product of the following sets (with some members
 619 missing):
 620    Return values
 621       multiple (all values)
 622       fixed (calling with unknown values conventions, wanting a certain
 623              number.)
 624       known (only in local call where caller/callee agree on number of
 625              values.)
 626       tail (doesn't return but does tail call)
 627    What function
 628       local
 629       named (going through symbol, like full but stash fun name for error sys)
 630       full (have a function)
 631    Args
 632       fixed (number of args are known at compile-time)
 633       variable (MULTIPLE-VALUE-CALL and APPLY)
 634
 635 Note on all jumps for calls and returns that we want to put some instruction
 636 in the jump's delay slot(s).
 637
 638 Register usage at the time of the call:
 639
 640 LEXENV
 641    This holds the lexical environment to use during the call if it's a closure,
 642    and it is undefined otherwise.
 643
 644 CNAME
 645    This holds the symbol for a named call and garbage otherwise.
 646
 647 OCFP
 648    This holds the frame pointer, which the system restores upon return.  The
 649    callee saves this if necessary; this is passed as a pseudo-argument.
 650
 651 A0 ... An
 652    These holds the first n+1 arguments.
 653
 654 NARGS
 655    This holds the number of arguments, as a fixnum.
 656
 657 LRA
 658    This holds the lisp-return-address object which indicates where to return.
 659    For a tail call, this retains its current value.  The callee saves this
 660    if necessary; this is passed as a pseudo-argument.
 661
 662 CODE
 663    This holds the function object being called.
 664
 665 CSP
 666    The caller ignores this.  The callee sets it as necessary based on CFP.
 667
 668 CFP
 669    This holds the callee's frame pointer.  Caller sets this to the new frame
 670    pointer, which it remembered when it started computing arguments; this is
 671    CSP if there were no stack arguments.  For a tail call CFP retains its
 672    current value.
 673
 674 NSP
 675    The system uses this within a single function.  A function using NSP must
 676    allocate and deallocate before returning or making a tail call.
 677
 678 Register usage at the time of the return for single value return, which
 679 goes with the unknown-values convention the caller used.
 680
 681 A0
 682    The holds the value.
 683
 684 CODE
 685    This holds the lisp-return-address at which the system continues executing.
 686
 687 CSP
 688    This holds the CFP.  That is, the stack is guaranteed to be clean, and there
 689    is no code at the return site to adjust the CSP.
 690
 691 CFP
 692    This holds the OCFP.
 693
 694 Additional register usage for multiple value return:
 695
 696 NARGS
 697    This holds the number of values returned.
 698
 699 A0 ... An
 700    These holds the first n+1 values, or NIL if there are less than n+1 values.
 701
 702 CSP
 703    Returner stores CSP to hold its CFP + NARGS * <address units per word>
 704
 705 OCFP
 706    Returner stores this as its CFP, so the returnee has a handle on either
 707    the start of the returned values on the stack.
 708
 709
 710 ALLOCATE FULL CALL FRAME.
 711
 712 If the number of call arguments (passed to the VOP as an info argument)
 713 indicates that there are stack arguments, then it makes some callee frame for
 714 arguments:
 715    VOP-result <- CSP
 716    CSP <- CSP + value of VOP info arg times address units per word.
 717
 718 In a call sequence, move some arguments to the right places.
 719
 720 There's a variety of MOVE-ARGUMENT VOP's.
 721
 722 FULL CALL VOP'S
 723 (variations determined by whether it's named, it's a tail call, there
 724 is a variable arg count, etc.)
 725
 726   if variable number of arguments
 727     NARGS <- (CSP - value of VOP argument) shift right by address units per word.
 728     A0...An <- values off of VOP argument (just fill them all)
 729   else
 730     NARGS <- value of VOP info argument (always a constant)
 731
 732   if tail call
 733     OCFP <- value from VOP argument
 734     LRA <- value from VOP argument
 735     CFP stays the same since we reuse the frame
 736     NSP <- NFP
 737   else
 738     OCFP <- CFP
 739     LRA <- compute LRA by adding an assemble-time determined constant to
 740            CODE.
 741     CFP <- new frame pointer (remembered when starting to compute args)
 742            This is CSP if no stack args.
 743     when (current-nfp-tn VOP-self-pointer)
 744       stack-temp <- NFP
 745
 746   if named
 747     CNAME <- function symbol name
 748     the-fun <- function object out of symbol
 749
 750   LEXENV <- the-fun (from previous line or VOP argument)
 751   CODE <- function-entry (the first word after the-fun)
 752   LIP <- calc first instruction addr (CODE + constant-offset)
 753   jump and run off temp
 754
 755   <emit Lisp return address data-block>
 756   <default and move return values OR receive return values>
 757   when (current-nfp-tn VOP-self-pointer)
 758     NFP <- stack-temp
 759
 760 Callee:
 761
 762 XEP-ALLOCATE-FRAME
 763   emit function header (maybe initializes offset back to component start,
 764                         but other pointers are set up at load-time.  Pads
 765                         to dual-word boundary.)
 766   CSP <- CFP + compile-time determined constant (frame size)
 767   if the function uses the number stack
 768     NFP <- NSP
 769     NSP <- NSP + compile-time determined constant (number stack frame size)
 770
 771 SETUP-ENVIRONMENT
 772 (either use this or the next one)
 773
 774 CODE <- CODE - assembler-time determined offset from function-entry back to
 775                the code data-block address.
 776
 777 SETUP-CLOSURE-ENVIRONMENT
 778 (either use this or the previous one)
 779 After this the CLOSURE-REF VOP can reference closure variables.
 780
 781 VOP-result <- LEXENV
 782 CODE <- CODE - assembler-time determined offset from function-entry back to
 783                the code data-block address.
 784
 785 Return VOP's
 786 RETURN and RETURN-MULTIPLE are for the unknown-values return convention.
 787 For some previous caller this is either it wants n values (and it doesn't
 788 know how many are coming), or it wants all the values returned (and it
 789 doesn't know how many are coming).
 790
 791
 792 RETURN
 793 (known fixed number of values, used with the unknown-values convention
 794  in the caller.)
 795 When compiler invokes VOP, all values are already where they should be;
 796 just get back to caller.
 797
 798 when (current-nfp-tn VOP-self-pointer)
 799   ;; The number stack grows down in memory.
 800   NSP <- NFP + number stack frame size for calls within the currently
 801                   compiling component
 802                times address units per word
 803 CODE <- value of VOP argument with LRA
 804 if VOP info arg is 1 (number of values we know we're returning)
 805   CSP <- CFP
 806   LIP <- calc target addr
 807           (CODE + skip over LRA header word + skip over address units per branch)
 808           (The branch is in the caller to skip down to the MV code.)
 809 else
 810   NARGS <- value of VOP info arg
 811   nil out unused arg regs
 812   OCFP <- CFP  (This indicates the start of return values on the stack,
 813                 but you leave space for those in registers for convenience.)
 814   CSP <- CFP + NARGS * address-units-per-word
 815   LIP <- calc target addr (CODE + skip over LRA header word)
 816 CFP <- value of VOP argument with OCFP
 817 jump and run off LIP
 818
 819 RETURN-MULTIPLE
 820 (unknown number of values, used with the unknown-values convention in
 821  the caller.)
 822 When compiler invokes VOP, it gets TN's representing a pointer to the
 823 values on the stack and how many values were computed.
 824
 825 when (current-nfp-tn VOP-self-pointer)
 826   ;; The number stack grows down in memory.
 827   NSP <- NFP + number stack frame size for calls within the currently
 828                   compiling component
 829                times address units per word
 830 NARGS <- value of VOP argument
 831 copy the args to the beginning of the current (returner's) frame.
 832    Actually some go into the argument registers.  When putting the rest at
 833    the beginning of the frame, leave room for those in the argument registers.
 834 CSP <- CFP + NARGS * address-units-per-word
 835 nil out unused arg regs
 836 OCFP <- CFP  (This indicates the start of return values on the stack,
 837               but you leave space for those in registers for convenience.)
 838 CFP <- value of VOP argument with OCFP
 839 CODE <- value of VOP argument with LRA
 840 LIP <- calc target addr (CODE + skip over LRA header word)
 841 jump and run off LIP
 842
 843
 844 Returnee
 845 The call VOP's call DEFAULT-UNKNOWN-VALUES or RECEIVE-UNKNOWN-VALUES after
 846 spitting out transfer control to get stuff from the returner.
 847
 848 DEFAULT-UNKNOWN-VALUES
 849 (We know what we want and we got something.)
 850 If returnee wants one value, it never does anything to deal with a shortage
 851 of return values.  However, if start at PC, then it has to adjust the stack
 852 pointer to dump extra values (move OCFP into CSP).  If it starts at PC+N,
 853 then it just goes along with the "want one value, got it" case.
 854 If the returnee wants multiple values, and there's a shortage of return
 855 values, there are two cases to handle.  One, if the returnee wants fewer
 856 values than there are return registers, and we start at PC+N, then it fills
 857 in return registers A1..A<desired values necessary>; if we start at PC,
 858 then the returnee is fine since the returning conventions have filled in
 859 the unused return registers with nil, but the returnee must adjust the
 860 stack pointer to dump possible stack return values (move OCFP to CSP).
 861 Two, if the returnee wants more values than the number of return registers,
 862 and it starts at PC+N (got one value), then it sets up returnee state as if
 863 an unknown number of values came back:
 864    A0 has the one value
 865    A1..An get nil
 866    NARGS gets 1
 867    OCFP gets CSP, so general code described below can move OCFP into CSP
 868 If we start at PC, then branch down to the general "got k values, wanted n"
 869 code which takes care of the following issues:
 870    If k < n, fill in stack return values of nil for shortage of return
 871       values and move OCFP into CSP
 872    If k >= n, move OCFP into CSP
 873 This also restores CODE from LRA by subtracting an assemble-time constant.
 874
 875 RECEIVE-UKNOWN-VALUES
 876 (I want whatever I get.)
 877 We want these at the end of our frame.  When the returnee starts starts at
 878 PC, it moves the return value registers to OCFP..OCFP[An] ignoring where
 879 the end of the stack is and whether all the return value registers had
 880 values.  The returner left room on the stack before the stack return values
 881 for the register return values.  When the returnee starts at PC+N, bump CSP
 882 by 1 and copy A0 there.
 883 This also restores CODE from LRA by subtracting an assemble-time constant.
 884
 885
 886 Local call
 887
 888 There are three flavors:
 889    1] KNOWN-CALL-LOCAL
 890       Uses known call convention where caller and callee agree where all
 891       the values are, and there's a fixed number of return values.
 892    2] CALL-LOCAL
 893       Uses the unknown-values convention, but we expect a particular
 894       number of values in return.
 895    3] MULTIPLE-CALL-LOCAL
 896       Uses the unknown-values convention, but we want all values returned.
 897
 898 ALLOCATE-FRAME
 899
 900 If the number of call arguments (passed to the VOP as an info argument)
 901 indicates that there are stack arguments, then it makes some callee frame for
 902 arguments:
 903    VOP-result1 <- CSP
 904    CSP <- CSP + control stack frame size for calls within the currently
 905                    compiling component
 906                 times address units per word.
 907    when (callee-nfp-tn <VOP info arg holding callee>)
 908      ;; The number stack grows down.
 909      ;; May have to round to dual-word boundary if machines C calling
 910      ;;    conventions demand this.
 911      NSP <- NSP - number stack frame size for calls within the currently
 912                      compiling component
 913                   times address units per word
 914      VOP-result2 <- NSP
 915
 916 KNOWN-CALL-LOCAL, CALL-LOCAL, MULTIPLE-CALL-LOCAL
 917 KNOWN-CALL-LOCAL has no need to affect CODE since CODE is the same for the
 918 caller/returnee and the returner.  This uses KNOWN-RETURN.
 919 With CALL-LOCAL and MULTIPLE-CALL-LOCAL, the caller/returnee must fixup
 920 CODE since the callee may do a tail full call.  This happens in the code
 921 emitted by DEFAULT-UNKNOWN-VALUES and RECEIVE-UNKNOWN-VALUES.  We use these
 922 return conventions since we don't know what kind of values the returner
 923 will give us.  This could happen due to a tail full call to an unknown
 924 function, or because the callee had different return points that returned
 925 various numbers of values.
 926
 927 when (current-nfp-tn VOP-self-pointer)   ;Get VOP self-pointer with
 928                                          ;DEFINE-VOP switch :vop-var.
 929   stack-temp <- NFP
 930 CFP <- value of VOP arg
 931 when (callee-nfp-tn <VOP info arg holding callee>)
 932   <where-callee-wants-NFP-tn>  <-  value of VOP arg
 933 <where-callee-wants-LRA-tn>  <-  compute LRA by adding an assemble-time
 934                                  determined constant to CODE.
 935 jump and run off VOP info arg holding start instruction for callee
 936
 937 <emit Lisp return address data-block>
 938 <case call convention
 939   known: do nothing
 940   call: default and move return values
 941   multiple: receive return values
 942 >
 943 when (current-nfp-tn VOP-self-pointer)
 944   NFP <- stack-temp
 945
 946 KNOWN-RETURN
 947
 948 CSP <- CFP
 949 when (current-nfp-tn VOP-self-pointer)
 950   ;; number stack grows down in memory.
 951   NSP <- NFP + number stack frame size for calls within the currently
 952                   compiling component
 953                times address units per word
 954 LIP <- calc target addr (value of VOP arg + skip over LRA header word)
 955 CFP <- value of VOP arg
 956 jump and run off LIP
 957
 958
 959
 960 \f
 961 \chapter{Standard Primitives}
 962
 963 \f
 964 \chapter{Customizing VMR Conversion}
 965
 966 Another way in which different implementations differ is in the relative cost
 967 of operations.  On machines without an integer multiply instruction, it may be
 968 desirable to convert multiplication by a constant into shifts and adds, while
 969 this is surely a bad idea on machines with hardware support for multiplication.
 970 Part of the tuning process for an implementation will be adding implementation
 971 dependent transforms and disabling undesirable standard transforms.
 972
 973 When practical, ICR transforms should be used instead of VMR generators, since
 974 transforms are more portable and less error-prone.  Note that the Lisp code
 975 need not be implementation independent: it may contain all sorts of
 976 sub-primitives and similar stuff.  Generally a function should be implemented
 977 using a transform instead of an VMR translator unless it cannot be implemented
 978 as a transform due to being totally evil or it is just as easy to implement as
 979 a translator because it is so simple.
 980
 981 \f
 982 \section{Constant Operands}
 983
 984 If the code emitted for a VOP when an argument is constant is very different
 985 than the non-constant case, then it may be desirable to special-case the
 986 operation in VMR conversion by emitting different VOPs.  An example would be if
 987 SVREF is only open-coded when the index is a constant, and turns into a miscop
 988 call otherwise.  We wouldn't want constant references to spuriously allocate
 989 all the miscop linkage registers on the off chance that the offset might not be
 990 constant.  See the :constant feature of VOP primitive type restrictions.
 991
 992 \f
 993 \section{Supporting Multiple Hardware Configurations}
 994
 995
 996 A winning way to change emitted code depending on the hardware configuration,
 997 i.e. what FPA is present is to do this using primitive types.  Note that the
 998 Primitive-Type function is VM supplied, and can look at any appropriate
 999 hardware configuration switches.  Short-Float can become 6881-Short-Float,
1000 AFPA-Short-Float, etc.  There would be separate SBs and SCs for the registers
1001 of each kind of FP hardware, with the each hardware-specific primitive type
1002 using the appropriate float register SC.  Then the hardware specific templates
1003 would provide AFPA-Short-Float as the argument type restriction.
1004
1005 Primitive type changes:
1006
1007 The primitive-type structure is given a new %Type slot, which is the CType
1008 structure that is equivalent to this type.  There is also a Guard slot, with,
1009 if true is a function that control whether this primitive type is allowed (due
1010 to hardware configuration, etc.)
1011
1012 We add new :Type and :Guard keywords to Def-Primitive-Type.  Type is the type
1013 specifier that is equivalent (default to the primitive-type name), and Guard is
1014 an expression evaluated in the null environment that controls whether this type
1015 applies (default to none, i.e. constant T).
1016
1017 The Primitive-Type-Type function returns the Lisp CType corresponding to a
1018 primitive type.  This is the %Type unless there is a guard that returns false,
1019 in which case it is the empty type (i.e. NIL).
1020
1021 [But this doesn't do what we want it to do, since we will compute the
1022 function type for a template at load-time, so they will correspond to whatever
1023 configuration was in effect then.  Maybe we don't want to dick with guards here
1024 (if at all).  I guess we can defer this issue until we actually support
1025 different FP configurations.  But it would seem pretty losing to separately
1026 flame about all the different FP configurations that could be used to open-code
1027 + whenever we are forced to closed-code +.
1028
1029 If we separately report each better possibly applicable template that we
1030 couldn't use, then it would be reasonable to report any conditional template
1031 allowed by the configuration.
1032
1033 But it would probably also be good to give some sort of hint that perhaps it
1034 would be a good time to make sure you understand how to tell the compiler to
1035 compile for a particular configuration.  Perhaps if there is a template that
1036 applies *but for the guard*, then we could give a note.  This way, if someone
1037 thinks they are being efficient by throwing in lots of declarations, we can let
1038 them know that they may have to do more.
1039
1040 I guess the guard should be associated with the template rather than the
1041 primitive type.  This would allow LTN and friends to easily tell whether a
1042 template applies in this configuration.  It is also probably more natural for
1043 some sorts of things: with some hardware variants, it may be that the SBs and
1044 representations (SCs) are really the same, but there some different allowed
1045 operations.  In this case, we could easily conditionalize VOPs without the
1046 increased complexity due to bogus SCs.  If there are different storage
1047 resources, then we would conditionalize Primitive-Type as well.
1048
1049
1050 \f
1051 \section{Special-case VMR convert methods}
1052
1053     (defun continuation-tn (cont \&optional (check-p t))
1054       ...)
1055 Return the TN which holds Continuation's first result value.  In general
1056 this may emit code to load the value into a TN.  If Check-P is true, then
1057 when policy indicates, code should be emitted to check that the value satisfies
1058 the continuation asserted type.
1059
1060     (defun result-tn (cont)
1061       ...)
1062 Return the TN that Continuation's first value is delivered in.  In general,
1063 may emit code to default any additional values to NIL.
1064
1065     (defun result-tns (cont n)
1066       ...)
1067 Similar to Result-TN, except that it returns a list of N result TNs, one
1068 for each of the first N values.
1069
1070
1071 Nearly all open-coded functions should be handled using standard template
1072 selection.  Some (all?) exceptions:
1073  -- List, List* and Vector take arbitrary numbers of arguments.  Could
1074     implement Vector as a source transform.  Could even do List in a transform
1075     if we explicitly represent the stack args using %More-Args or something.
1076  -- %Typep varies a lot depending on the type specifier.  We don't want to
1077     transform it, since we want %Typep as a canonical form so that we can do
1078     type optimizations.
1079  -- Apply is weird.
1080  -- Funny functions emitted by the compiler: %Listify-Rest-Args, Arg,
1081     %More-Args, %Special-Bind, %Catch, %Unknown-Values (?), %Unwind-Protect,
1082     %Unwind, %%Primitive.