doc/cmucl/internals/middle.tex

   1 % -*- Dictionary: design -*-
   2
   3 \f
   4 \chapter{Virtual Machine Representation Introduction}
   5
   6 \f
   7 \chapter{Global TN assignment}
   8
   9 [\#\#\# Rename this phase so as not to be confused with the local/global TN
  10 representation.]
  11
  12 The basic mechanism for closing over values is to pass the values as additional
  13 implicit arguments in the function call.  This technique is only applicable
  14 when:
  15  -- the calling function knows which values the called function wants to close
  16     over, and
  17  -- the values to be closed over are available in the calling environment.
  18
  19 The first condition is always true of local function calls.  Environment
  20 analysis can guarantee that the second condition holds by closing over any
  21 needed values in the calling environment.
  22
  23 If the function that closes over values may be called in an environment where
  24 the closed over values are not available, then we must store the values in a
  25 "closure" so that they are always accessible.  Closures are called using the
  26 "full call" convention.  When a closure is called, control is transferred to
  27 the "external entry point", which fetches the values out of the closure and
  28 then does a local call to the real function, passing the closure values as
  29 implicit arguments.
  30
  31 In this scheme there is no such thing as a "heap closure variable" in code,
  32 since the closure values are moved into TNs by the external entry point.  There
  33 is some potential for pessimization here, since we may end up moving the values
  34 from the closure into a stack memory location, but the advantages are also
  35 substantial.  Simplicity is gained by always representing closure values the
  36 same way, and functions with closure references may still be called locally
  37 without allocating a closure.  All the TN based VMR optimizations will apply
  38 to closure variables, since closure variables are represented in the same way
  39 as all other variables in VMR.  Closure values will be allocated in registers
  40 where appropriate.
  41
  42 Closures are created at the point where the function is referenced, eliminating
  43 the need to be able to close over closures.  This lazy creation of closures has
  44 the additional advantage that when a closure reference is conditionally not
  45 done, then the closure consing will never be done at all.  The corresponding
  46 disadvantage is that a closure over the same values may be created multiple
  47 times if there are multiple references.  Note however, that VMR loop and common
  48 subexpression optimizations can eliminate redundant closure consing.  In any
  49 case, multiple closures over the same variables doesn't seem to be that common.
  50
  51 \#|
  52 Having the Tail-Info would also make return convention determination trivial.
  53 We could just look at the type, checking to see if it represents a fixed number
  54 of values.  To determine if the standard return convention is necessary to
  55 preserve tail-recursion, we just iterate over the equivalent functions, looking
  56 for XEPs and uses in full calls.
  57 |\#
  58
  59 The Global TN Assignment pass (GTN) can be considered a post-pass to
  60 environment analysis.  This phase assigns the TNs used to hold local lexical
  61 variables and pass arguments and return values and determines the value-passing
  62 strategy used in local calls.
  63
  64 To assign return locations, we look at the function's tail-set.
  65
  66 If the result continuation for an entry point is used as the continuation for a
  67 full call, then we may need to constrain the continuation's values passing
  68 convention to the standard one.  This is not necessary when the call is known
  69 not to be part of a tail-recursive loop (due to being a known function).
  70
  71 Once we have figured out where we must use the standard value passing strategy,
  72 we can use a more flexible strategy to determine the return locations for local
  73 functions.  We determine the possible numbers of return values from each
  74 function by examining the uses of all the result continuations in the
  75 equivalence class of the result continuation.
  76
  77 If the tail-set type is for a fixed number of
  78 values, then we return that fixed number of values from all the functions whose
  79 result continuations are equated.  If the number of values is not fixed, then
  80 we must use the unknown-values convention, although we are not forced to use
  81 the standard locations.  We assign the result TNs at this time.
  82
  83 We also use the tail-sets to see what convention we want to use.  What we do is
  84 use the full convention for any function that has a XEP its tail-set, even if
  85 we aren't required to do so by a tail-recursive full call, as long as there are
  86 no non-tail-recursive local calls in the set.  This prevents us from
  87 gratuitously using a non-standard convention when there is no reason to.
  88
  89 \f
  90 \chapter{Local TN assignment}
  91
  92 [Want a different name for this so as not to be confused with the different
  93 local/global TN representations.  The really interesting stuff in this phase is
  94 operation selection, values representation selection, return strategy, etc.
  95 Maybe this phase should be conceptually lumped with GTN as "implementation
  96 selection", since GTN determines call strategies and locations.]
  97
  98 \#|
  99
 100 [\#\#\# I guess I believe that it is OK for VMR conversion to dick the ICR flow
 101 graph.  An alternative would be to give VMR its very own flow graph, but that
 102 seems like overkill.
 103
 104 In particular, it would be very nice if a TR local call looked exactly like a
 105 jump in VMR.  This would allow loop optimizations to be done on loops written
 106 as recursions.  In addition to making the call block transfer to the head of
 107 the function rather than to the return, we would also have to do something
 108 about skipping the part of the function prolog that moves arguments from the
 109 passing locations, since in a TR call they are already in the right frame.
 110
 111
 112 In addition to directly indicating whether a call should be coded with a TR
 113 variant, the Tail-P annotation flags non-call nodes that can directly return
 114 the value (an "advanced return"), rather than moving the value to the result
 115 continuation and jumping to the return code.  Then (according to policy), we
 116 can decide to advance all possible returns.  If all uses of the result are
 117 Tail-P, then LTN can annotate the result continuation as :Unused, inhibiting
 118 emission of the default return code.
 119
 120 [\#\#\# But not really.  Now there is a single list of templates, and a given
 121 template has only one policy.]
 122
 123 In LTN, we use the :Safe template as a last resort even when the policy is
 124 unsafe.  Note that we don't try :Fast-Safe; if this is also a good unsafe
 125 template, then it should have the unsafe policies explicitly specified.
 126
 127 With a :Fast-Safe template, the result type must be proven to satisfy the
 128 output type assertion.  This means that a fast-safe template with a fixnum
 129 output type doesn't need to do fixnum overflow checking.  [\#\#\# Not right to
 130 just check against the Node-Derived-Type, since type-check intersects with
 131 this.]
 132
 133 It seems that it would be useful to have a kind of template where the args must
 134 be checked to be fixnum, but the template checks for overflow and signals an
 135 error.  In the case where an output assertion is present, this would generate
 136 better code than conditionally branching off to make a bignum, and then doing a
 137 type check on the result.
 138
 139     How do we deal with deciding whether to do a fixnum overflow check?  This
 140     is perhaps a more general problem with the interpretation of result type
 141     restrictions in templates.  It would be useful to be able to discriminate
 142     between the case where the result has been proven to be a fixnum and where
 143     it has simply been asserted to be so.
 144
 145     The semantics of result type restriction is that the result must be proven
 146     to be of that type *except* for safe generators, which are assumed to
 147     verify the assertion.  That way "is-fixnum" case can be a fast-safe
 148     generator and the "should-be-fixnum" case is a safe generator.  We could
 149     choose not to have a safe "should-be-fixnum" generator, and let the
 150     unrestricted safe generator handle it.  We would then have to do an
 151     explicit type check on the result.
 152
 153     In other words, for all template except Safe, a type restriction on either
 154     an argument or result means "this must be true; if it is not the system may
 155     break."  In contrast, in a Safe template, the restriction means "If this is
 156     not true, I will signal an error."
 157
 158     Since the node-derived-type only takes into consideration stuff that can be
 159     proved from the arguments, we can use the node-derived-type to select
 160     fast-safe templates.  With unsafe policies, we don't care, since the code
 161     is supposed to be unsafe.
 162
 163 |\#
 164
 165 Local TN assignment (LTN) assigns all the TNs needed to represent the values of
 166 continuations.  This pass scans over the code for the component, examining each
 167 continuation and its destination.  A number of somewhat unrelated things are
 168 also done at the same time so that multiple passes aren't necessary.
 169  -- Determine the Primitive-Type for each continuation value and assigns TNs
 170     to hold the values.
 171  -- Use policy information to determine the implementation strategy for each
 172     call to a known function.
 173  -- Clear the type-check flags in continuations whose destinations have safe
 174     implementations.
 175  -- Determine the value-passing strategy for each continuation: known or
 176     unknown.
 177  -- Note usage of unknown-values continuations so that stack analysis can tell
 178     when stack values must be discarded.
 179
 180 If safety is more important that speed and space, then we consider generating
 181 type checks on the values of nodes whose CONT has the Type-Check flag set.  If
 182 the destinatation for the continuation value is safe, then we don't need to do
 183 a check.  We assume that all full calls are safe, and use the template
 184 information to determine whether inline operations are safe.
 185
 186 This phase is where compiler policy switches have most of their effect.  The
 187 speed/space/safety tradeoff can determine which of a number of coding
 188 strategies are used.  It is important to make the policy choice in VMR
 189 conversion rather than in code generation because the cost and storage
 190 requirement information which drives TNBIND will depend strongly on what actual
 191 VOP is chosen.  In the case of +/FIXNUM, there might be three or more
 192 implementations, some optimized for speed, some for space, etc.  Some of these
 193 VOPS might be open-coded and some not.
 194
 195 We represent the implementation strategy for a call by either marking it as a
 196 full call or annotating it with a "template" representing the open-coding
 197 strategy.  Templates are selected using a two-way dispatch off of operand
 198 primitive-types and policy.  The general case of LTN is handled by the
 199 LTN-Annotate function in the function-info, but most functions are handled by a
 200 table-driven mechanism.  There are four different translation policies that a
 201 template may have:
 202 \begin{description}
 203 \item[Safe]
 204         The safest implementation; must do argument type checking.
 205
 206 \item[Small]
 207         The (unsafe) smallest implementation.
 208
 209 \item[Fast]
 210         The (unsafe) fastest implementation.
 211
 212 \item[Fast-Safe]
 213         An implementation optimized for speed, but which does any necessary
 214         checks exclusive of argument type checking.  Examples are array bounds
 215         checks and fixnum overflow checks.
 216 \end{description}
 217
 218 Usually a function will have only one or two distinct templates.  Either or
 219 both of the safe and fast-safe templates may be omitted; if both are specified,
 220 then they should be distinct.  If there is no safe template and our policy is
 221 safe, then we do a full call.
 222
 223 We use four different coding strategies, depending on the policy:
 224 \begin{description}
 225 \item[Safe:]  safety $>$ space $>$ speed, or
 226 we want to use the fast-safe template, but there isn't one.
 227
 228 \item[Small:] space $>$ (max speed safety)
 229
 230 \item[Fast:] speed $>$ (max space safety)
 231
 232 \item[Fast-Safe (and type check):] safety $>$ speed $>$ space, or we want to use
 233 the safe template, but there isn't one.
 234 \end{description}
 235
 236 ``Space'' above is actually the maximum of space and cspeed, under the theory
 237 that less code will take less time to generate and assemble.  [\#\#\# This could
 238 lose if the smallest case is out-of-line, and must allocate many linkage
 239 registers.]
 240
 241 \f
 242 \chapter{Control optimization}
 243
 244 In this phase we annotate blocks with drop-throughs.  This controls how code
 245 generation linearizes code so that drop-throughs are used most effectively.  We
 246 totally linearize the code here, allowing code generation to scan the blocks
 247 in the emit order.
 248
 249 There are basically two aspects to this optimization:
 250  1] Dynamically reducing the number of branches taken v.s. branches not
 251     taken under the assumption that branches not taken are cheaper.
 252  2] Statically minimizing the number of unconditional branches, saving space
 253     and presumably time.
 254
 255 These two goals can conflict, but if they do it seems pretty clear that the
 256 dynamic optimization should get preference.  The main dynamic optimization is
 257 changing the sense of a conditional test so that the more commonly taken branch
 258 is the fall-through case.  The problem is determining which branch is more
 259 commonly taken.
 260
 261 The most clear-cut case is where one branch leads out of a loop and the other
 262 is within.  In this case, clearly the branch within the loop should be
 263 preferred.  The only added complication is that at some point in the loop there
 264 has to be a backward branch, and it is preferable for this branch to be
 265 conditional, since an unconditional branch is just a waste of time.
 266
 267 In the absence of such good information, we can attempt to guess which branch
 268 is more popular on the basis of difference in the cost between the two cases.
 269 Min-max strategy suggests that we should choose the cheaper alternative, since
 270 the percentagewise improvement is greater when the branch overhead is
 271 significant with respect to the cost of the code branched to.  A tractable
 272 approximation of this is to compare only the costs of the two blocks
 273 immediately branched to, since this would avoid having to do any hairy graph
 274 walking to find all the code for the consequent and the alternative.  It might
 275 be worthwhile discriminating against ultra-expensive functions such as ERROR.
 276
 277 For this to work, we have to detect when one of the options is empty.  In this
 278 case, the next for one branch is a successor of the other branch, making the
 279 comparison meaningless.  We use dominator information to detect this situation.
 280 When a branch is empty, one of the predecessors of the first block in the empty
 281 branch will be dominated by the first block in the other branch.  In such a
 282 case we favor the empty branch, since that's about as cheap as you can get.
 283
 284 Statically minimizing branches is really a much more tractable problem, but
 285 what literature there is makes it look hard.  Clearly the thing to do is to use
 286 a non-optimal heuristic algorithm.
 287
 288 A good possibility is to use an algorithm based on the depth first ordering.
 289 We can modify the basic DFO algorithm so that it chooses an ordering which
 290 favors any drop-thrus that we may choose for dynamic reasons.  When we are
 291 walking the graph, we walk the desired drop-thru arc last, which will place it
 292 immediately after us in the DFO unless the arc is a retreating arc.
 293
 294 We scan through the DFO and whenever we find a block that hasn't been done yet,
 295 we build a straight-line segment by setting the drop-thru to the unreached
 296 successor block which has the lowest DFN greater than that for the block.  We
 297 move to the drop-thru block and repeat the process until there is no such
 298 block.  We then go back to our original scan through the DFO, looking for the
 299 head of another straight-line segment.
 300
 301 This process will automagically implement all of the dynamic optimizations
 302 described above as long as we favor the appropriate IF branch when creating the
 303 DFO.  Using the DFO will prevent us from making the back branch in a loop the
 304 drop-thru, but we need to be clever about favoring IF branches within loops
 305 while computing the DFO.  The IF join will be favored without any special
 306 effort, since we follow through the most favored path until we reach the end.
 307
 308 This needs some knowledge about the target machine, since on most machines
 309 non-tail-recursive calls will use some sort of call instruction.  In this case,
 310 the call actually wants to drop through to the return point, rather than
 311 dropping through to the beginning of the called function.
 312
 313 \f
 314 \chapter{VMR conversion}
 315
 316 \#|
 317 Single-use let var continuation substitution not really correct, since it can
 318 cause a spurious type error.  Maybe we do want stuff to prove that an NLX can't
 319 happen after all.  Or go back to the idea of moving a combination arg to the
 320 ref location, and having that use the ref cont (with its output assertion.)
 321 This lossage doesn't seem very likely to actually happen, though.
 322 [\#\#\# must-reach stuff wouldn't work quite as well as combination substitute in
 323 psetq, etc., since it would fail when one of the new values is random code
 324 (might unwind.)]
 325
 326 Is this really a general problem with eager type checking?  It seems you could
 327 argue that there was no type error in this code:
 328     (+ :foo (throw 'up nil))
 329 But we would signal an error.
 330
 331
 332 Emit explicit you-lose operation when we do a move between two non-T ptypes,
 333 even when type checking isn't on.  Can this really happen?  Seems we should
 334 treat continuations like this as though type-check was true.  Maybe LTN should
 335 leave type-check true in this case, even when the policy is unsafe.  (Do a type
 336 check against NIL?)
 337
 338 At continuation use time, we may in general have to do both a coerce-to-t and a
 339 type check, allocating two temporary TNs to hold the intermediate results.
 340
 341
 342 VMR Control representation:
 343
 344 We represent all control transfer explicitly.  In particular, :Conditional VOPs
 345 take a single Target continuation and a Not-P flag indicating whether the sense
 346 of the test is negated.  Then an unconditional Branch VOP will be emitted
 347 afterward if the other path isn't a drop-through.
 348
 349 So we linearize the code before VMR-conversion.  This isn't a problem,
 350 since there isn't much change in control flow after VMR conversion (none until
 351 loop optimization requires introduction of header blocks.)  It does make
 352 cost-based branch prediction a bit ucky, though, since we don't have any cost
 353 information in ICR.  Actually, I guess we do have pretty good cost information
 354 after LTN even before VMR conversion, since the most important thing to know is
 355 which functions are open-coded.
 356
 357 |\#
 358
 359 VMR preserves the block structure of ICR, but replaces the nodes with a target
 360 dependent virtual machine (VM) representation.  Different implementations may
 361 use different VMs without making major changes in the back end.  The two main
 362 components of VMR are Temporary Names (TNs) and Virtual OPerations (VOPs).  TNs
 363 represent the locations that hold values, and VOPs represent the operations
 364 performed on the values.
 365
 366 A "primitive type" is a type meaningful at the VM level.  Examples are Fixnum,
 367 String-Char, Short-Float.  During VMR conversion we use the primitive type of
 368 an expression to determine both where we can store the result of the expression
 369 and which type-specific implementations of an operation can be applied to the
 370 value.  [Ptype is a set of SCs == representation choices and representation
 371 specific operations]
 372
 373 The VM specific definitions provide functions that do stuff like find the
 374 primitive type corresponding to a type and test for primitive type subtypep.
 375 Usually primitive types will be disjoint except for T, which represents all
 376 types.
 377
 378 The primitive type T is special-cased.  Not only does it overlap with all the
 379 other types, but it implies a descriptor ("boxed" or "pointer") representation.
 380 For efficiency reasons, we sometimes want to use
 381 alternate representations for some objects such as numbers.  The majority of
 382 operations cannot exploit alternate representations, and would only be
 383 complicated if they had to be able to convert alternate representations into
 384 descriptors.  A template can require an operand to be a descriptor by
 385 constraining the operand to be of type T.
 386
 387 A TN can only represent a single value, so we bare the implementation of MVs at
 388 this point.  When we know the number of multiple values being handled, we use
 389 multiple TNs to hold them.  When the number of values is actually unknown, we
 390 use a convention that is compatible with full function call.
 391
 392 Everything that is done is done by a VOP in VMR.  Calls to simple primitive
 393 functions such as + and CAR are translated to VOP equivalents by a table-driven
 394 mechanism.  This translation is specified by the particular VM definition; VMR
 395 conversion makes no assumptions about which operations are primitive or what
 396 operand types are worth special-casing.  The default calling mechanisms and
 397 other miscellaneous builtin features are implemented using standard VOPs that
 398 must implemented by each VM.
 399
 400 Type information can be forgotten after VMR conversion, since all type-specific
 401 operation selections have been made.
 402
 403 Simple type checking is explicitly done using CHECK-xxx VOPs.  They act like
 404 innocuous effectless/unaffected VOPs which return the checked thing as a
 405 result.  This allows loop-invariant optimization and common subexpression
 406 elimination to remove redundant checks.  All type checking is done at the time
 407 the continuation is used.
 408
 409 Note that we need only check asserted types, since if type inference works, the
 410 derived types will also be satisfied.  We can check whichever is more
 411 convenient, since both should be true.
 412
 413 Constants are turned into special Constant TNs, which are wired down in a SC
 414 that is determined by their type.  The VM definition provides a function that
 415 returns constant a TN to represent a Constant Leaf.
 416
 417 Each component has a constant pool.  There is a register dedicated to holding
 418 the constant pool for the current component.  The back end allocates
 419 non-immediate constants in the constant pool when it discovers them during
 420 translation from ICR.
 421
 422 [\#\#\# Check that we are describing what is actually implemented.  But this
 423 really isn't very good in the presence of interesting unboxed
 424 representations...]
 425 Since LTN only deals with values from the viewpoint of the receiver, we must be
 426 prepared during the translation pass to do stuff to the continuation at the
 427 time it is used.
 428  -- If a VOP yields more values than are desired, then we must create TNs to
 429     hold the discarded results.  An important special-case is continuations
 430     whose value is discarded.  These continuations won't be annotated at all.
 431     In the case of a Ref, we can simply skip evaluation of the reference when
 432     the continuation hasn't been annotated.  Although this will eliminate
 433     bogus references that for some reason weren't optimized away, the real
 434     purpose is to handle deferred references.
 435  -- If a VOP yields fewer values than desired, then we must default the extra
 436     values to NIL.
 437  -- If a continuation has its type-check flag set, then we must check the type
 438     of the value before moving it into the result location.  In general, this
 439     requires computing the result in a temporary, and having the type-check
 440     operation deliver it in the actual result location.
 441  -- If the template's result type is T, then we must generate a boxed
 442     temporary to compute the result in when the continuation's type isn't T.
 443
 444
 445 We may also need to do stuff to the arguments when we generate code for a
 446 template.  If an argument continuation isn't annotated, then it must be a
 447 deferred reference.  We use the leaf's TN instead.  We may have to do any of
 448 the above use-time actions also.  Alternatively, we could avoid hair by not
 449 deferring references that must be type-checked or may need to be boxed.
 450
 451 \f
 452 \section{Stack analysis}
 453
 454 Think of this as a lifetime problem: a values generator is a write and a values
 455 receiver is a read.  We want to annotate each VMR-Block with the unknown-values
 456 continuations that are live at that point.  If we do a control transfer to a
 457 place where fewer continuations are live, then we must deallocate the newly
 458 dead continuations.
 459
 460 We want to convince ourselves that values deallocation based on lifetime
 461 analysis actually works.  In particular, we need to be sure that it doesn't
 462 violate the required stack discipline.  It is clear that it is impossible to
 463 deallocate the values before they become dead, since later code may decide to
 464 use them.  So the only thing we need to ensure is that the "right" time isn't
 465 later than the time that the continuation becomes dead.
 466
 467 The only reason why we couldn't deallocate continuation A as soon as it becomes
 468 dead would be that there is another continuation B on top of it that isn't dead
 469 (since we can only deallocate the topmost continuation).
 470
 471 The key to understanding why this can't happen is that each continuation has
 472 only one read (receiver).  If B is on top of A, then it must be the case that A
 473 is live at the receiver for B.  This means that it is impossible for B to be
 474 live without A being live.
 475
 476
 477 The reason that we don't solve this problem using a normal iterative flow
 478 analysis is that we also need to know the ordering of the continuations on the
 479 stack so that we can do deallocation.  When it comes time to discard values, we
 480 want to know which discarded continuation is on the bottom so that we can reset
 481 SP to its start.
 482
 483 [I suppose we could also decrement SP by the aggregate size of the discarded
 484 continuations.]  Another advantage of knowing the order in which we expect
 485 continuations to be on the stack is that it allows us to do some consistency
 486 checking.  Also doing a localized graph walk around the values-receiver is
 487 likely to be much more efficient than doing an iterative flow analysis problem
 488 over all the code in the component (not that big a consideration.)
 489
 490
 491
 492 \#|
 493 Actually, what we do is do a backward graph walk from each unknown-values
 494 receiver.   As we go, we mark each walked block with ther ordered list of
 495 continuations we believe are on the stack.  Starting with an empty stack, we:
 496  -- When we encounter another unknown-values receiver, we push that
 497     continuation on our simulated stack.
 498  -- When we encounter a receiver (which had better be for the topmost
 499     continuation), we pop that continuation.
 500  -- When we pop all continuations, we terminate our walk.
 501
 502 [\#\#\# not quite right...  It seems we may run into "dead values" during the
 503 graph walk too.  It seems that we have to check if the pushed continuation is
 504 on stack top, and if not, add it to the ending stack so that the post-pass will
 505 discard it.]
 506
 507
 508
 509 [\#\#\# Also, we can't terminate our walk just because we hit a block previously
 510 walked.  We have to compare the the End-Stack with the values received along
 511 the current path: if we have more values on our current walk than on the walk
 512 that last touched the block, then we need to re-walk the subgraph reachable
 513 from from that block, using our larger set of continuations.  It seems that our
 514 actual termination condition is reaching a block whose End-Stack is already EQ
 515 to our current stack.]
 516
 517
 518
 519
 520
 521 If at the start, the block containing the values receiver has already been
 522 walked, the we skip the walk for that continuation, since it has already been
 523 handled by an enclosing values receiver.  Once a walk has started, we
 524 ignore any signs of a previous walk, clobbering the old result with our own,
 525 since we enclose that continuation, and the previous walk doesn't take into
 526 consideration the fact that our values block underlies its own.
 527
 528 When we are done, we have annotated each block with the stack current both at
 529 the beginning and at the end of that block.  Blocks that aren't walked don't
 530 have anything on the stack either place (although they may hack MVs
 531 internally).
 532
 533 We then scan all the blocks in the component, looking for blocks that have
 534 predecessors with a different ending stack than that block's starting stack.
 535 (The starting stack had better be a tail of the predecessor's ending stack.)
 536 We insert a block intervening between all of these predecessors that sets SP to
 537 the end of the values for the continuation that should be on stack top.  Of
 538 course, this pass needn't be done if there aren't any global unknown MVs.
 539
 540 Also, if we find any block that wasn't reached during the walk, but that USEs
 541 an outside unknown-values continuation, then we know that the DEST can't be
 542 reached from this point, so the values are unused.  We either insert code to
 543 pop the values, or somehow mark the code to prevent the values from ever being
 544 pushed.  (We could cause the popping to be done by the normal pass if we
 545 iterated over the pushes beforehand, assigning a correct END-STACK.)
 546
 547 [\#\#\# But I think that we have to be a bit clever within blocks, given the
 548 possibility of blocks being joined.  We could collect some unknown MVs in a
 549 block, then do a control transfer out of the receiver, and this control
 550 transfer could be squeezed out by merging blocks.  How about:
 551
 552     (tagbody
 553       (return
 554        (multiple-value-prog1 (foo)
 555          (when bar
 556            (go UNWIND))))
 557
 558      UNWIND
 559       (return
 560        (multiple-value-prog1 (baz)
 561          bletch)))
 562
 563 But the problem doesn't happen here (can't happen in general?) since a node
 564 buried within a block can't use a continuation outside of the block.  In fact,
 565 no block can have more then one PUSH continuation, and this must always be be
 566 last continuation.  So it is trivially (structurally) true that all pops come
 567 before any push.
 568
 569 [\#\#\# But not really: the DEST of an embedded continuation may be outside the
 570 block.  There can be multiple pushes, and we must find them by iterating over
 571 the uses of MV receivers in LTN.  But it would be hard to get the order right
 572 this way.  We could easily get the order right if we added the generators as we
 573 saw the uses, except that we can't guarantee that the continuations will be
 574 annotated at that point.  (Actually, I think we only need the order for
 575 consistency checks, but that is probably worthwhile).  I guess the thing to do
 576 is when we process the receiver, add the generator blocks to the
 577 Values-Generators, then do a post-pass that re-scans the blocks adding the
 578 pushes.]
 579
 580 I believe that above concern with a dead use getting mashed inside a block
 581 can't happen, since the use inside the block must be the only use, and if the
 582 use isn't reachable from the push, then the use is totally unreachable, and
 583 should have been deleted, which would prevent the prevent it from ever being
 584 annotated.
 585 ]
 586 ]
 587 |\#
 588
 589 We find the partial ordering of the values globs for unknown values
 590 continuations in each environment.  We don't have to scan the code looking for
 591 unknown values continuations since LTN annotates each block with the
 592 continuations that were popped and not pushed or pushed and not popped.  This
 593 is all we need to do the inter-block analysis.
 594
 595 After we have found out what stuff is on the stack at each block boundary, we
 596 look for blocks with predecessors that have junk on the stack.  For each such
 597 block, we introduce a new block containing code to restore the stack pointer.
 598 Since unknown-values continuations are represented as <start, count>, we can
 599 easily pop a continuation using the Start TN.
 600
 601 Note that there is only doubt about how much stuff is on the control stack,
 602 since only it is used for unknown values.  Any special stacks such as number
 603 stacks will always have a fixed allocation.
 604
 605 \f
 606 \section{Non-local exit}
 607
 608
 609 If the starting and ending continuations are not in the same environment, then
 610 the control transfer is a non-local exit.  In this case just call Unwind with
 611 the appropriate stack pointer, and let the code at the re-entry point worry
 612 about fixing things up.
 613
 614 It seems like maybe a good way to organize VMR conversion of NLX would be to
 615 have environment analysis insert funny functions in new interposed cleanup
 616 blocks.  The thing is that we need some way for VMR conversion to:
 617  1] Get its hands on the returned values.
 618  2] Do weird control shit.
 619  3] Deliver the values to the original continuation destination.
 620 I.e. we need some way to interpose arbitrary code in the path of value
 621 delivery.
 622
 623 What we do is replace the NLX uses of the continuation with another
 624 continuation that is received by a MV-Call to %NLX-VALUES in a cleanup block
 625 that is interposed between the NLX uses and the old continuation's block.  The
 626 MV-Call uses the original continuation to deliver it's values to.
 627
 628 [Actually, it's not really important that this be an MV-Call, since it has to
 629 be special-cased by LTN anyway.  Or maybe we would want it to be an MV call.
 630 If did normal LTN analysis of an MV call, it would force the returned values
 631 into the unknown values convention, which is probably pretty convenient for use
 632 in NLX.
 633
 634 Then the entry code would have to use some special VOPs to receive the unknown
 635 values.  But we probably need special VOPs for NLX entry anyway, and the code
 636 can share with the call VOPs.  Also we probably need the technology anyway,
 637 since THROW will use truly unknown values.]
 638
 639
 640 On entry to a dynamic extent that has non-local-exists into it (always at an
 641 ENTRY node), we take a complete snapshot of the dynamic state:
 642     the top pointers for all stacks
 643     current Catch and Unwind-Protect
 644     current special binding (binding stack pointer in shallow binding)
 645
 646 We insert code at the re-entry point which restores the saved dynamic state.
 647 All TNs live at a NLX EP are forced onto the stack, so we don't have to restore
 648 them, and we don't have to worry about getting them saved.
 649