doc/cmucl/internals/front.tex

   1 \chapter{ICR conversion} % -*- Dictionary: design -*-
   2
   3
   4 \f
   5 \section{Canonical forms}
   6
   7 \#|
   8
   9 Would be useful to have a Freeze-Type proclamation.  Its primary use would to
  10 be say that the indicated type won't acquire any new subtypes in the future.
  11 This allows better open-coding of structure type predicates, since the possible
  12 types that would satisfy the predicate will be constant at compile time, and
  13 thus can be compiled as a skip-chain of EQ tests.
  14
  15 Of course, this is only a big win when the subtypes are few: the most important
  16 case is when there are none.  If the closure of the subtypes is much larger
  17 than the average number of supertypes of an inferior, then it is better to grab
  18 the list of superiors out of the object's type, and test for membership in that
  19 list.
  20
  21 Should type-specific numeric equality be done by EQL rather than =?  i.e.
  22 should = on two fixnums become EQL and then convert to EQL/FIXNUM?
  23 Currently we transform EQL into =, which is complicated, since we have to prove
  24 the operands are the class of numeric type before we do it.  Also, when EQL
  25 sees one operand is a FIXNUM, it transforms to EQ, but the generator for EQ
  26 isn't expecting numbers, so it doesn't use an immediate compare.
  27
  28
  29 Array hackery:
  30
  31
  32 Array type tests are transformed to %array-typep, separation of the
  33 implementation-dependent array-type handling.  This way we can transform
  34 STRINGP to:
  35      (or (simple-string-p x)
  36          (and (complex-array-p x)
  37               (= (array-rank x) 1)
  38               (simple-string-p (%array-data x))))
  39
  40 In addition to the similar bit-vector-p, we also handle vectorp and any type
  41 tests on which the dimension isn't wild.
  42 [Note that we will want to expand into frobs compatible with those that
  43 array references expand into so that the same optimizations will work on both.]
  44
  45 These changes combine to convert hairy type checks into hairy typep's, and then
  46 convert hairyp typeps into simple typeps.
  47
  48
  49 Do we really need non-VOP templates?  It seems that we could get the desired
  50 effect through implementation-dependent ICR transforms.  The main risk would be
  51 of obscuring the type semantics of the code.  We could fairly easily retain all
  52 the type information present at the time the tranform is run, but if we
  53 discover new type information, then it won't be propagated unless the VM also
  54 supplies type inference methods for its internal frobs (precluding the use of
  55 %PRIMITIVE, since primitives don't have derive-type methods.)
  56
  57 I guess one possibility would be to have the call still considered "known" even
  58 though it has been transformed.  But this doesn't work, since we start doing
  59 LET optimizations that trash the arglist once the call has been transformed
  60 (and indeed we want to.)
  61
  62 Actually, I guess the overhead for providing type inference methods for the
  63 internal frobs isn't that great, since we can usually borrow the inference
  64 method for a Common Lisp function.  For example, in our AREF case:
  65     (aref x y)
  66 ==>
  67     (let ((\#:len (array-dimension x 0)))
  68       (%unchecked-aref x (%check-in-bounds y \#:len)))
  69
  70 Now in this case, if we made %UNCHECKED-AREF have the same derive-type method
  71 as AREF, then if we discovered something new about X's element type, we could
  72 derive a new type for the entire expression.
  73
  74 Actually, it seems that baring this detail at the ICR level is beneficial,
  75 since it admits the possibly of optimizing away the bounds check using type
  76 information.  If we discover X's dimensions, then \#:LEN becomes a constant that
  77 can be substituted.  Then %CHECK-IN-BOUNDS can notice that the bound is
  78 constant and check it against the type for Y.  If Y is known to be in range,
  79 then we can optimize away the bounds check.
  80
  81 Actually in this particular case, the best thing to do would be if we
  82 discovered the bound is constant, then replace the bounds check with an
  83 implicit type check.  This way all the type check optimization mechanisms would
  84 be brought into the act.
  85
  86 So we actually want to do the bounds-check expansion as soon as possible,
  87 rather than later than possible: it should be a source-transform, enabled by
  88 the fast-safe policy.
  89
  90 With multi-dimensional arrays we probably want to explicitly do the index
  91 computation: this way portions of the index computation can become loop
  92 invariants.  In a scan in row-major order, the inner loop wouldn't have to do
  93 any multiplication: it would only do an addition.  We would use normal
  94 fixnum arithmetic, counting on * to cleverly handle multiplication by a
  95 constant, and appropriate inline expansion.
  96
  97 Note that in a source transform, we can't make any assumptions the type of the
  98 array.  If it turns out to be a complex array without declared dimensions, then
  99 the calls to ARRAY-DIMENSION will have to turn into a VOP that can be affected.
 100 But if it is simple, then the VOP is unaffected, and if we know the bounds, it
 101 is constant.  Similarly, we would have %ARRAY-DATA and %ARRAY-DISPLACEMENT
 102 operations.  %ARRAY-DISPLACEMENT would optimize to 0 if we discover the array
 103 is simple.  [This is somewhat inefficient when the array isn't eventually
 104 discovered to be simple, since finding the data and finding the displacement
 105 duplicate each other.  We could make %ARRAY-DATA return both as MVs, and then
 106 optimize to (VALUES (%SIMPLE-ARRAY-DATA x) 0), but this would require
 107 optimization of trivial VALUES uses.]
 108
 109 Also need (THE (ARRAY * * * ...) x) to assert correct rank.
 110
 111 |\#
 112
 113 A bunch of functions have source transforms that convert them into the
 114 canonical form that later parts of the compiler want to see.  It is not legal
 115 to rely on the canonical form since source transforms can be inhibited by a
 116 Notinline declaration.  This shouldn't be a problem, since everyone should keep
 117 their hands off of Notinline calls.
 118
 119 Some transformations:
 120
 121 Endp  ==>  (NULL (THE LIST ...))
 122 (NOT xxx) or (NULL xxx) => (IF xxx NIL T)
 123
 124 (typep x '<simple type>) => (<simple predicate> x)
 125 (typep x '<complex type>) => ...composition of simpler operations...
 126 TYPEP of AND, OR and NOT types turned into conditionals over multiple TYPEP
 127 calls.  This makes hairy TYPEP calls more digestible to type constraint
 128 propagation, and also means that the TYPEP code generators don't have to deal
 129 with these cases.  [\#\#\# In the case of union types we may want to do something
 130 to preserve information for type constraint propagation.]
 131
 132
 133     (apply \#'foo a b c)
 134 ==>
 135     (multiple-value-call \#'foo (values a) (values b) (values-list c))
 136
 137 This way only MV-CALL needs to know how to do calls with unknown numbers of
 138 arguments.  It should be nearly as efficient as a special-case VMR-Convert
 139 method could be.
 140
 141
 142 Make-String => Make-Array
 143 N-arg predicates associated into two-arg versions.
 144 Associate N-arg arithmetic ops.
 145 Expand CxxxR and FIRST...nTH
 146 Zerop, Plusp, Minusp, 1+, 1-, Min, Max, Rem, Mod
 147 (Values x), (Identity x) => (Prog1 x)
 148
 149 All specialized aref functions => (aref (the xxx) ...)
 150
 151 Convert (ldb (byte ...) ...) into internal frob that takes size and position as
 152 separate args.  Other byte functions also...
 153
 154 Change for-value primitive predicates into (if <pred> t nil).  This isn't
 155 particularly useful during ICR phases, but makes life easy for VMR conversion.
 156
 157 This last can't be a source transformation, since a source transform can't tell
 158 where the form appears.  Instead, ICR conversion special-cases calls to known
 159 functions with the Predicate attribute by doing the conversion when the
 160 destination of the result isn't an IF.  It isn't critical that this never be
 161 done for predicates that we ultimately discover to deliver their value to an
 162 IF, since IF optimizations will flush unnecessary IFs in a predicate.
 163
 164 \f
 165 \section{Inline functions}
 166
 167 [\#\#\# Inline expansion is especially powerful in the presence of good lisp-level
 168 optimization ("partial evaluation").  Many "optimizations" usually done in Lisp
 169 compilers by special-case source-to-source transforms can be had simply by
 170 making the source of the general case function available for inline expansion.
 171 This is especially helpful in Common Lisp, which has many commonly used
 172 functions with simple special cases but bad general cases (list and sequence
 173 functions, for example.)
 174
 175 Inline expansion of recursive functions is allowed, and is not as silly as it
 176 sounds.  When expanded in a specific context, much of the overhead of the
 177 recursive calls may be eliminated (especially if there are many keyword
 178 arguments, etc.)
 179
 180 [Also have MAYBE-INLINE]
 181 ]
 182
 183 We only record a function's inline expansion in the global environment when the
 184 function is in the null lexical environment, since it the expansion must be
 185 represented as source.
 186
 187 We do inline expansion of functions locally defined by FLET or LABELS even when
 188 the environment is not null.  Since the appearances of the local function must
 189 be nested within the desired environment, it is possible to expand local
 190 functions inline even when they use the environment.  We just stash the source
 191 form and environments in the Functional for the local function.  When we
 192 convert a call to it, we just reconvert the source in the saved environment.
 193
 194 An interesting alternative to the inline/full-call dichotomy is "semi-inline"
 195 coding.  Whenever we have an inline expansion for a function, we can expand it
 196 only once per block compilation, and then use local call to call this copied
 197 version.  This should get most of the speed advantage of real inline coding
 198 with much less code bloat.  This is especially attractive for simple system
 199 functions such as Read-Char.
 200
 201 The main place where true inline expansion would still be worth doing is where
 202 large amounts of the function could be optimized away by constant folding or
 203 other optimizations that depend on the exact arguments to the call.
 204
 205
 206 \f
 207 \section{Compilation policy}
 208
 209 We want more sophisticated control of compilation safety than is offered in CL,
 210 so that we can emit only those type checks that are likely to discover
 211 something (i.e. external interfaces.)
 212
 213 \#|
 214
 215 \f
 216 \section{Notes}
 217
 218 Generalized back-end notion provides dynamic retargeting?  (for byte code)
 219
 220 The current node type annotations seem to be somewhat unsatisfactory, since we
 221 lose information when we do a THE on a continuation that already has uses, or
 222 when we convert a let where the actual result continuation has other uses.
 223
 224 But the case with THE isn't really all that bad, since the test of whether
 225 there are any uses happens before conversion of the argument, thus THE loses
 226 information only when there are uses outside of the declared form.  The LET
 227 case may not be a big deal either.
 228
 229 Note also that losing user assertions isn't really all that bad, since it won't
 230 damage system integrity.  At worst, it will cause a bug to go undetected.  More
 231 likely, it will just cause the error to be signaled in a different place (and
 232 possibly in a less informative way).  Of course, there is an efficiency hit for
 233 losing type information, but if it only happens in strange cases, then this
 234 isn't a big deal.
 235
 236 \f
 237 \chapter{Local call analysis}
 238
 239 All calls to local functions (known named functions and LETs) are resolved to
 240 the exact LAMBDA node which is to be called.  If the call is syntactically
 241 illegal, then we emit a warning and mark the reference as :notinline, forcing
 242 the call to be a full call.  We don't even think about converting APPLY calls;
 243 APPLY is not special-cased at all in ICR.  We also take care not to convert
 244 calls in the top-level component, which would join it to normal code.  Calls to
 245 functions with rest args and calls with non-constant keywords are also not
 246 converted.
 247
 248 We also convert MV-Calls that look like MULTIPLE-VALUE-BIND to local calls,
 249 since we know that they can be open-coded.  We replace the optional dispatch
 250 with a call to the last optional entry point, letting MV-Call magically default
 251 the unsupplied values to NIL.
 252
 253 When ICR optimizations discover a possible new local call, they explicitly
 254 invoke local call analysis on the code that needs to be reanalyzed.
 255
 256 [\#\#\# Let conversion.  What is means to be a let.  Argument type checking done
 257 by caller.  Significance of local call is that all callers are known, so
 258 special call conventions may be used.]
 259 A lambda called in only one place is called a "let" call, since a Let would
 260 turn into one.
 261
 262 In addition to enabling various ICR optimizations, the let/non-let distinction
 263 has important environment significance.  We treat the code in function and all
 264 of the lets called by that function as being in the same environment.  This
 265 allows exits from lets to be treated as local exits, and makes life easy for
 266 environment analysis.
 267
 268 Since we will let-convert any function with only one call, we must be careful
 269 about cleanups.  It is possible that a lexical exit from the let function may
 270 have to clean up dynamic bindings not lexically apparent at the exit point.  We
 271 handle this by annotating lets with any cleanup in effect at the call site.
 272 The cleanup for continuations with no immediately enclosing cleanup is the
 273 lambda that the continuation is in.  In this case, we look at the lambda to see
 274 if any cleanups need to be done.
 275
 276 Let conversion is disabled for entry-point functions, since otherwise we might
 277 convert the call from the XEP to the entry point into a let.  Then later on, we
 278 might want to convert a non-local reference into a local call, and not be able
 279 to, since once a function has been converted to a let, we can't convert it
 280 back.
 281
 282
 283 A function's return node may also be deleted if it is unreachable, which can
 284 happen if the function never returns normally.  Such functions are not lets.
 285
 286 \f
 287 \chapter{Find components}
 288
 289 This is a post-pass to ICR conversion that massages the flow graph into the
 290 shape subsequent phases expect.  Things done:
 291   Compute the depth-first ordering for the flow graph.
 292   Find the components (disconnected parts) of the flow graph.
 293
 294 This pass need only be redone when newly converted code has been added to the
 295 flow graph.  The reanalyze flag in the component structure should be set by
 296 people who mess things up.
 297
 298 We create the initial DFO using a variant of the basic algorithm.  The initial
 299 DFO computation breaks the ICR up into components, which are parts that can be
 300 compiled independently.  This is done to increase the efficiency of large block
 301 compilations.  In addition to improving locality of reference and reducing the
 302 size of flow analysis problems, this allows back-end data structures to be
 303 reclaimed after the compilation of each component.
 304
 305 ICR optimization can change the connectivity of the flow graph by discovering
 306 new calls or eliminating dead code.  Initial DFO determination splits up the
 307 flow graph into separate components, but does so conservatively, ensuring that
 308 parts that might become joined (due to local call conversion) are joined from
 309 the start.  Initial DFO computation also guarantees that all code which shares
 310 a lexical environment is in the same component so that environment analysis
 311 needs to operate only on a single component at a time.
 312
 313 [This can get a bit hairy, since code seemingly reachable from the
 314 environment entry may be reachable from a NLX into that environment.  Also,
 315 function references must be considered as links joining components even though
 316 the flow graph doesn't represent these.]
 317
 318 After initial DFO determination, components are neither split nor joined.  The
 319 standard DFO computation doesn't attempt to split components that have been
 320 disconnected.
 321
 322 \f
 323 \chapter{ICR optimize}
 324
 325 {\bf Somewhere describe basic ICR utilities: continuation-type,
 326 constant-continuation-p, etc.  Perhaps group by type in ICR description?}
 327
 328 We are conservative about doing variable-for-variable substitution in ICR
 329 optimization, since if we substitute a variable with a less restrictive type,
 330 then we may prevent use of a "good" representation within the scope of the
 331 inner binding.
 332
 333 Note that variable-variable substitutions aren't really crucial in ICR, since
 334 they don't create opportunities for new optimizations (unlike substitution of
 335 constants and functions).  A spurious variable-variable binding will show up as
 336 a Move operation in VMR.  This can be optimized away by reaching-definitions
 337 and also by targeting.  [\#\#\# But actually, some optimizers do see if operands
 338 are the same variable.]
 339
 340 \#|
 341
 342 The IF-IF optimization can be modeled as a value driven optimization, since
 343 adding a use definitely is cause for marking the continuation for
 344 reoptimization.  [When do we add uses?  Let conversion is the only obvious
 345 time.]  I guess IF-IF conversion could also be triggered by a non-immediate use
 346 of the test continuation becoming immediate, but to allow this to happen would
 347 require Delete-Block (or somebody) to mark block-starts as needing to be
 348 reoptimized when a predecessor changes.  It's not clear how important it is
 349 that IF-IF conversion happen under all possible circumstances, as long as it
 350 happens to the obvious cases.
 351
 352 [\#\#\# It isn't totally true that code flushing never enables other worthwhile
 353 optimizations.  Deleting a functional reference can cause a function to cease
 354 being an XEP, or even trigger let conversion.  It seems we still want to flush
 355 code during ICR optimize, but maybe we want to interleave it more intimately
 356 with the optimization pass.
 357
 358 Ref-flushing works just as well forward as backward, so it could be done in the
 359 forward pass.  Call flushing doesn't work so well, but we could scan the block
 360 backward looking for any new flushable stuff if we flushed a call on the
 361 forward pass.
 362
 363 When we delete a variable due to lack of references, we leave the variable
 364 in the lambda-list so that positional references still work.  The initial value
 365 continuation is flushed, though (replaced with NIL) allowing the initial value
 366 for to be deleted (modulo side-effects.)
 367
 368 Note that we can delete vars with no refs even when they have sets.  I guess
 369 when there are no refs, we should also flush all sets, allowing the value
 370 expressions to be flushed as well.
 371
 372 Squeeze out single-reference unset let variables by changing the dest of the
 373 initial value continuation to be the node that receives the ref.  This can be
 374 done regardless of what the initial value form is, since we aren't actually
 375 moving the evaluation.  Instead, we are in effect using the continuation's
 376 locations in place of the temporary variable.
 377
 378 Doing this is of course, a wild violation of stack discipline, since the ref
 379 might be inside a loop, etc.  But with the VMR back-end, we only need to
 380 preserve stack discipline for unknown-value continuations; this ICR
 381 transformation must be already be inhibited when the DEST of the REF is a
 382 multiple-values receiver (EXIT, RETURN or MV-COMBINATION), since we must
 383 preserve the single-value semantics of the let-binding in this case.
 384
 385 The REF and variable must be deleted as part of this operation, since the ICR
 386 would otherwise be left in an inconsistent state; we can't wait for the REF to
 387 be deleted due to bing unused, since we have grabbed the arg continuation and
 388 substituted it into the old DEST.
 389
 390 The big reason for doing this transformation is that in macros such as INCF and
 391 PSETQ, temporaries are squeezed out, and the new value expression is evaluated
 392 directly to the setter, allowing any result type assertion to be applied to the
 393 expression evaluation.  Unlike in the case of substitution, there is no point
 394 in inhibiting this transformation when the initial value type is weaker than
 395 the variable type.  Instead, we intersect the asserted type for the old REF's
 396 CONT with the type assertion on the initial value continuation.  Note that the
 397 variable's type has already been asserted on the initial-value continuation.
 398
 399 Of course, this transformation also simplifies the ICR even when it doesn't
 400 discover interesting type assertions, so it makes sense to do it whenever
 401 possible.  This reduces the demands placed on register allocation, etc.
 402
 403 |\#
 404
 405 There are three dead-code flushing rules:
 406  1] Refs with no DEST may be flushed.
 407  2] Known calls with no dest that are flushable may be flushed.  We null the
 408     DEST in all the args.
 409  3] If a lambda-var has no refs, then it may be deleted.  The flushed argument
 410     continuations have their DEST nulled.
 411
 412 These optimizations all enable one another.  We scan blocks backward, looking
 413 for nodes whose CONT has no DEST, then type-dispatching off of the node.  If we
 414 delete a ref, then we check to see if it is a lambda-var with no refs.  When we
 415 flush an argument, we mark the blocks for all uses of the CONT as needing to be
 416 reoptimized.
 417
 418 \f
 419 \section{Goals for ICR optimizations}
 420
 421 \#|
 422
 423 When an optimization is disabled, code should still be correct and not
 424 ridiculously inefficient.  Phases shouldn't be made mandatory when they have
 425 lots of non-required stuff jammed into them.
 426
 427 |\#
 428
 429 This pass is optional, but is desirable if anything is more important than
 430 compilation speed.
 431
 432 This phase is a grab-bag of optimizations that concern themselves with the flow
 433 of values through the code representation.  The main things done are type
 434 inference, constant folding and dead expression elimination.  This phase can be
 435 understood as a walk of the expression tree that propagates assertions down the
 436 tree and propagates derived information up the tree.  The main complication is
 437 that there isn't any expression tree, since ICR is flow-graph based.
 438
 439 We repeat this pass until we don't discover anything new.  This is a bit of
 440 feat, since we dispatch to arbitrary functions which may do arbitrary things,
 441 making it hard to tell if anything really happened.  Even if we solve this
 442 problem by requiring people to flag when they changed or by checking to see if
 443 they changed something, there are serious efficiency problems due to massive
 444 redundant computation, since in many cases the only way to tell if anything
 445 changed is to recompute the value and see if it is different from the old one.
 446
 447 We solve this problem by requiring that optimizations for a node only depend on
 448 the properties of the CONT and the continuations that have the node as their
 449 DEST.  If the continuations haven't changed since the last pass, then we don't
 450 attempt to re-optimize the node, since we know nothing interesting will happen.
 451
 452 We keep track of which continuations have changed by a REOPTIMIZE flag that is
 453 set whenever something about the continuation's value changes.
 454
 455 When doing the bottom up pass, we dispatch to type specific code that knows how
 456 to tell when a node needs to be reoptimized and does the optimization.  These
 457 node types are special-cased: COMBINATION, IF, RETURN, EXIT, SET.
 458
 459 The REOPTIMIZE flag in the COMBINATION-FUN is used to detect when the function
 460 information might have changed, so that we know when where are new assertions
 461 that could be propagated from the function type to the arguments.
 462
 463 When we discover something about a leaf, or substitute for leaf, we reoptimize
 464 the CONT for all the REF and SET nodes.
 465
 466 We have flags in each block that indicate when any nodes or continuations in
 467 the block need to be re-optimized, so we don't have to scan blocks where there
 468 is no chance of anything happening.
 469
 470 It is important for efficiency purposes that optimizers never say that they did
 471 something when they didn't, but this by itself doesn't guarantee timely
 472 termination.  I believe that with the type system implemented, type inference
 473 will converge in finite time, but as a practical matter, it can take far too
 474 long to discover not much.  For this reason, ICR optimization is terminated
 475 after three consecutive passes that don't add or delete code.  This premature
 476 termination only happens 2% of the time.
 477
 478 \f
 479 \section{Flow graph simplification}
 480
 481 Things done:
 482     Delete blocks with no predecessors.
 483     Merge blocks that can be merged.
 484     Convert local calls to Let calls.
 485     Eliminate degenerate IFs.
 486
 487 We take care not to merge blocks that are in different functions or have
 488 different cleanups.  This guarantees that non-local exits are always at block
 489 ends and that cleanup code never needs to be inserted within a block.
 490
 491 We eliminate IFs with identical consequent and alternative.  This would most
 492 likely happen if both the consequent and alternative were optimized away.
 493
 494 [Could also be done if the consequent and alternative were different blocks,
 495 but computed the same value.  This could be done by a sort of cross-jumping
 496 optimization that looked at the predecessors for a block and merged code shared
 497 between predecessors.  IFs with identical branches would eventually be left
 498 with nothing in their branches.]
 499
 500 We eliminate IF-IF constructs:
 501     (IF (IF A B C) D E) ==>
 502     (IF A (IF B D E) (IF C D E))
 503
 504 In reality, what we do is replicate blocks containing only an IF node where the
 505 predicate continuation is the block start.  We make one copy of the IF node for
 506 each use, leaving the consequent and alternative the same.  If you look at the
 507 flow graph representation, you will see that this is really the same thing as
 508 the above source to source transformation.
 509
 510 \f
 511 \section{Forward ICR optimizations}
 512
 513 In the forward pass, we scan the code in forward depth-first order.  We
 514 examine each call to a known function, and:
 515
 516 \begin{itemize}
 517 \item Eliminate any bindings for unused variables.
 518
 519 \item Do top-down type assertion propagation.  In local calls, we propagate
 520 asserted and derived types between the call and the called lambda.
 521
 522 \item
 523     Replace calls of foldable functions with constant arguments with the
 524     result.  We don't have to actually delete the call node, since Top-Down
 525     optimize will delete it now that its value is unused.
 526
 527 \item
 528    Run any Optimizer for the current function.  The optimizer does arbitrary
 529     transformations by hacking directly on the IR.  This is useful primarily
 530     for arithmetic simplification and similar things that may need to examine
 531     and modify calls other than the current call.  The optimizer is responsible
 532     for recording any changes that it makes.  An optimizer can inhibit further
 533     optimization of the node during the current pass by returning true.  This
 534     is useful when deleting the node.
 535
 536 \item
 537    Do ICR transformations, replacing a global function call with equivalent
 538     inline lisp code.
 539
 540 \item
 541     Do bottom-up type propagation/inferencing.  For some functions such as
 542     Coerce we will dispatch to a function to find the result type.  The
 543     Derive-Type function just returns a type structure, and we check if it is
 544     different from the old type in order to see if there was a change.
 545
 546 \item
 547     Eliminate IFs with predicates known to be true or false.
 548
 549 \item
 550     Substitute the value for unset let variables that are bound to constants,
 551     unset lambda variables or functionals.
 552
 553 \item
 554     Propagate types from local call args to var refs.
 555 \end{itemize}
 556
 557 We use type info from the function continuation to find result types for
 558 functions that don't have a derive-type method.
 559
 560
 561 ICR transformation:
 562
 563 ICR transformation does "source to source" transformations on known global
 564 functions, taking advantage of semantic information such as argument types and
 565 constant arguments.  Transformation is optional, but should be done if speed or
 566 space is more important than compilation speed.  Transformations which increase
 567 space should pass when space is more important than speed.
 568
 569 A transform is actually an inline function call where the function is computed
 570 at compile time.  The transform gets to peek at the continuations for the
 571 arguments, and computes a function using the information gained.  Transforms
 572 should be cautious about directly using the values of constant continuations,
 573 since the compiler must preserve eqlness of named constants, and it will have a
 574 hard time if transforms go around randomly copying constants.
 575
 576 The lambda that the transform computes replaces the original function variable
 577 reference as the function for the call.  This lets the compiler worry about
 578 evaluating each argument once in the right order.  We want to be careful to
 579 preserve type information when we do a transform, since it may be less than
 580 obvious what the transformed code does.
 581
 582 There can be any number of transforms for a function.  Each transform is
 583 associated with a function type that the call must be compatible with.  A
 584 transform is only invoked if the call has the right type.  This provides a way
 585 to deal with the common case of a transform that only applies when the
 586 arguments are of certain types and some arguments are not specified.  We always
 587 use the derived type when determining whether a transform is applicable.  Type
 588 check is responsible for setting the derived type to the intersection of the
 589 asserted and derived types.
 590
 591 If the code in the expansion has insufficient explicit or implicit argument
 592 type checking, then it should cause checks to be generated by making
 593 declarations.
 594
 595 A transformation may decide to pass if it doesn't like what it sees when it
 596 looks at the args.  The Give-Up function unwinds out of the transform and deals
 597 with complaining about inefficiency if speed is more important than brevity.
 598 The format args for the message are arguments to Give-Up.  If a transform can't
 599 be done, we just record the message where ICR finalize can find it.  note.  We
 600 can't complain immediately, since it might get transformed later on.
 601
 602 \f
 603 \section{Backward ICR optimizations}
 604
 605 In the backward pass, we scan each block in reverse order, and
 606 eliminate any effectless nodes with unused values.  In ICR this is the
 607 only way that code is deleted other than the elimination of unreachable blocks.
 608
 609 \f
 610 \chapter{Type checking}
 611
 612 [\#\#\# Somehow split this section up into three parts:
 613  -- Conceptual: how we know a check is necessary, and who is responsible for
 614     doing checks.
 615  -- Incremental: intersection of derived and asserted types, checking for
 616     non-subtype relationship.
 617  -- Check generation phase.
 618 ]
 619
 620
 621 We need to do a pretty good job of guessing when a type check will ultimately
 622 need to be done.  Generic arithmetic, for example: In the absence of
 623 declarations, we will use use the safe variant, but if we don't know this, we
 624 will generate a check for NUMBER anyway.  We need to look at the fast-safe
 625 templates and guess if any of them could apply.
 626
 627 We compute a function type from the VOP arguments
 628 and assertions on those arguments.  This can be used with Valid-Function-Use
 629 to see which templates do or might apply to a particular call.  If we guess
 630 that a safe implementation will be used, then we mark the continuation so as to
 631 force a safe implementation to be chosen.  [This will happen if ICR optimize
 632 doesn't run to completion, so the icr optimization after type check generation
 633 can discover new type information.  Since we won't redo type check at that
 634 point, there could be a call that has applicable unsafe templates, but isn't
 635 type checkable.]
 636
 637 [\#\#\# A better and more general optimization of structure type checks: in type
 638 check conversion, we look at the *original derived* type of the continuation:
 639 if the difference between the proven type and the asserted type is a simple
 640 type check, then check for the negation of the difference.  e.g. if we want a
 641 FOO and we know we've got (OR FOO NULL), then test for (NOT NULL).  This is a
 642 very important optimization for linked lists of structures, but can also apply
 643 in other situations.]
 644
 645 If after ICR phases, we have a continuation with check-type set in a context
 646 where it seems likely a check will be emitted, and the type is too
 647 hairy to be easily checked (i.e. no CHECK-xxx VOP), then we do a transformation
 648 on the ICR equivalent to:
 649   (... (the hair <foo>) ...)
 650 ==>
 651   (... (funcall \#'(lambda (\#:val)
 652                     (if (typep \#:val 'hair)
 653                         \#:val
 654                         (%type-check-error \#:val 'hair)))
 655                 <foo>)
 656        ...)
 657 This way, we guarantee that VMR conversion never has to emit type checks for
 658 hairy types.
 659
 660 [Actually, we need to do a MV-bind and several type checks when there is a MV
 661 continuation.  And some values types are just too hairy to check.  We really
 662 can't check any assertion for a non-fixed number of values, since there isn't
 663 any efficient way to bind arbitrary numbers of values.  (could be done with
 664 MV-call of a more-arg function, I guess...)
 665 ]
 666
 667 [Perhaps only use CHECK-xxx VOPs for types equivalent to a ptype?  Exceptions
 668 for CONS and SYMBOL?  Anyway, no point in going to trouble to implement and
 669 emit rarely used CHECK-xxx vops.]
 670
 671 One potential lose in converting a type check to explicit conditionals rather
 672 than to a CHECK-xxx VOP is that VMR code motion optimizations won't be able to
 673 do anything.  This shouldn't be much of an issue, though, since type constraint
 674 propagation has already done global optimization of type checks.
 675
 676
 677 This phase is optional, but should be done if anything is more important than
 678 compile speed.
 679
 680 Type check is responsible for reconciling the continuation asserted and derived
 681 types, emitting type checks if appropriate.  If the derived type is a subtype
 682 of the asserted type, then we don't need to do anything.
 683
 684 If there is no intersection between the asserted and derived types, then there
 685 is a manifest type error.  We print a warning message, indicating that
 686 something is almost surely wrong.  This will inhibit any transforms or
 687 generators that care about their argument types, yet also inhibits further
 688 error messages, since NIL is a subtype of every type.
 689
 690 If the intersection is not null, then we set the derived type to the
 691 intersection of the asserted and derived types and set the Type-Check flag in
 692 the continuation.  We always set the flag when we can't prove that the type
 693 assertion is satisfied, regardless of whether we will ultimately actually emit
 694 a type check or not.  This is so other phases such as type constraint
 695 propagation can use the Type-Check flag to detect an interesting type
 696 assertion, instead of having to duplicate much of the work in this phase.
 697 [\#\#\# 7 extremely random values for CONTINUATION-TYPE-CHECK.]
 698
 699 Type checks are generated on the fly during VMR conversion.  When VMR
 700 conversion generates the check, it prints an efficiency note if speed is
 701 important.  We don't flame now since type constraint progpagation may decide
 702 that the check is unnecessary.  [\#\#\# Not done now, maybe never.]
 703
 704 In local function call, it is the caller that is in effect responsible for
 705 checking argument types.  This happens in the same way as any other type check,
 706 since ICR optimize propagates the declared argument types to the type
 707 assertions for the argument continuations in all the calls.
 708
 709 Since the types of arguments to entry points are unknown at compile time, we
 710 want to do runtime checks to ensure that the incoming arguments are of the
 711 correct type.  This happens without any special effort on the part of type
 712 check, since the XEP is represented as a local call with unknown type
 713 arguments.  These arguments will be marked as needing to be checked.
 714
 715 \f
 716 \chapter{Constraint propagation}
 717
 718 \#|
 719 New lambda-var-slot:
 720
 721 constraints: a list of all the constraints on this var for either X or Y.
 722
 723 How to maintain consistency?  Does it really matter if there are constraints
 724 with deleted vars lying around?  Note that whatever mechanism we use for
 725 getting the constraints in the first place should tend to keep them up to date.
 726 Probably we would define optimizers for the interesting relations that look at
 727 their CONT's dest and annotate it if it is an IF.
 728
 729 But maybe it is more trouble then it is worth trying to build up the set of
 730 constraints during ICR optimize (maintaining consistency in the process).
 731 Since ICR optimize iterates a bunch of times before it converges, we would be
 732 wasting time recomputing the constraints, when nobody uses them till constraint
 733 propagation runs.
 734
 735 It seems that the only possible win is if we re-ran constraint propagation
 736 (which we might want to do.)  In that case, we wouldn't have to recompute all
 737 the constraints from scratch.  But it seems that we could do this just as well
 738 by having ICR optimize invalidate the affected parts of the constraint
 739 annotation, rather than trying to keep them up to date.  This also fits better
 740 with the optional nature of constraint propagation, since we don't want ICR
 741 optimize to commit to doing a lot of the work of constraint propagation.
 742
 743 For example, we might have a per-block flag indicating that something happened
 744 in that block since the last time constraint propagation ran.  We might have
 745 different flags to represent the distinction between discovering a new type
 746 assertion inside the block and discovering something new about an if
 747 predicate, since the latter would be cheaper to update and probably is more
 748 common.
 749
 750 It's fairly easy to see how we can build these sets of restrictions and
 751 propagate them using flow analysis, but actually using this information seems
 752 a bit more ad-hoc.
 753
 754 Probably the biggest thing we do is look at all the refs.  If have proven that
 755 the value is EQ (EQL for a number) to some other leaf (constant or lambda-var),
 756 then we can substitute for that reference.  In some cases, we will want to do
 757 special stuff depending on the DEST.  If the dest is an IF and we proved (not
 758 null), then we can substitute T.  And if the dest is some relation on the same
 759 two lambda-vars, then we want to see if we can show that relation is definitely
 760 true or false.
 761
 762 Otherwise, we can do our best to invert the set of restrictions into a type.
 763 Since types hold only constant info, we have to ignore any constraints between
 764 two vars.  We can make some use of negated type restrictions by using
 765 TYPE-DIFFERENCE to remove the type from the ref types.  If our inferred type is
 766 as good as the type assertion, then the continuation's type-check flag will be
 767 cleared.
 768
 769 It really isn't much of a problem that we don't infer union types on joins,
 770 since union types are relatively easy to derive without using flow information.
 771 The normal bottom-up type inference done by ICR optimize does this for us: it
 772 annotates everything with the union of all of the things it might possibly be.
 773 Then constraint propagation subtracts out those types that can't be in effect
 774 because of predicates or checks.
 775
 776
 777
 778 This phase is optional, but is desirable if anything is more important than
 779 compilation speed.  We use an algorithm similar to available expressions to
 780 propagate variable type information that has been discovered by implicit or
 781 explicit type tests, or by type inference.
 782
 783 We must do a pre-pass which locates set closure variables, since we cannot do
 784 flow analysis on such variables.  We set a flag in each set closure variable so
 785 that we can quickly tell that it is losing when we see it again.  Although this
 786 may seem to be wastefully redundant with environment analysis, the overlap
 787 isn't really that great, and the cost should be small compared to that of the
 788 flow analysis that we are preparing to do.  [Or we could punt on set
 789 variables...]
 790
 791 A type constraint is a structure that includes sset-element and has the type
 792 and variable.
 793 [\#\#\# Also a not-p flag indicating whether the sense is negated.]
 794   Each variable has a list of its type constraints.  We create a
 795 type constraint when we see a type test or check.  If there is already a
 796 constraint for the same variable and type, then we just re-use it.  If there is
 797 already a weaker constraint, then we generate both the weak constraints and the
 798 strong constraint so that the weak constraints won't be lost even if the strong
 799 one is unavailable.
 800
 801 We find all the distinct type constraints for each variable during the pre-pass
 802 over the lambda nesting.  Each constraint has a list of the weaker constraints
 803 so that we can easily generate them.
 804
 805 Every block generates all the type constraints in it, but a constraint is
 806 available in a successor only if it is available in all predecessors.  We
 807 determine the actual type constraint for a variable at a block by intersecting
 808 all the available type constraints for that variable.
 809
 810 This isn't maximally tense when there are constraints that are not
 811 hierarchically related, e.g. (or a b) (or b c).  If these constraints were
 812 available from two predecessors, then we could infer that we have an (or a b c)
 813 constraint, but the above algorithm would come up with none.  This probably
 814 isn't a big problem.
 815
 816 [\#\#\# Do we want to deal with (if (eq <var> '<foo>) ...) indicating singleton
 817 member type?]
 818
 819 We detect explicit type tests by looking at type test annotation in the IF
 820 node.  If there is a type check, the OUT sets are stored in the node, with
 821 different sets for the consequent and alternative.  Implicit type checks are
 822 located by finding Ref nodes whose Cont has the Type-Check flag set.  We don't
 823 actually represent the GEN sets, we just initialize OUT to it, and then form
 824 the union in place.
 825
 826 When we do the post-pass, we clear the Type-Check flags in the continuations
 827 for Refs when we discover that the available constraints satisfy the asserted
 828 type.  Any explicit uses of typep should be cleaned up by the ICR optimizer for
 829 typep.  We can also set the derived type for Refs to the intersection of the
 830 available type assertions.  If we discover anything, we should consider redoing
 831 ICR optimization, since better type information might enable more
 832 optimizations.
 833
 834
 835 \chapter{ICR finalize} % -*- Dictionary: design -*-
 836
 837 This pass looks for interesting things in the ICR so that we can forget about
 838 them.  Used and not defined things are flamed about.
 839
 840 We postpone these checks until now because the ICR optimizations may discover
 841 errors that are not initially obvious.  We also emit efficiency notes about
 842 optimizations that we were unable to do.  We can't emit the notes immediately,
 843 since we don't know for sure whether a repeated attempt at optimization will
 844 succeed.
 845
 846 We examine all references to unknown global function variables and update the
 847 approximate type accordingly.  We also record the names of the unknown
 848 functions so that they can be flamed about if they are never defined.  Unknown
 849 normal variables are flamed about on the fly during ICR conversion, so we
 850 ignore them here.
 851
 852 We check each newly defined global function for compatibility with previously
 853 recorded type information.  If there is no :defined or :declared type, then we
 854 check for compatibility with any approximate function type inferred from
 855 previous uses.
 856 \f
 857 \chapter{Environment analysis}
 858 \#|
 859
 860 A related change would be to annotate ICR with information about tail-recursion
 861 relations.  What we would do is add a slot to the node structure that points to
 862 the corresponding Tail-Info when a node is in a TR position.  This annotation
 863 would be made in a final ICR pass that runs after cleanup code is generated
 864 (part of environment analysis).  When true, the node is in a true TR position
 865 (modulo return-convention incompatibility).  When we determine return
 866 conventions, we null out the tail-p slots in XEP calls or known calls where we
 867 decided not to preserve tail-recursion.
 868
 869
 870 In this phase, we also check for changes in the dynamic binding environment
 871 that require cleanup code to be generated.  We just check for changes in the
 872 Continuation-Cleanup on local control transfers.  If it changes from
 873 an inner dynamic context to an outer one that is in the same environment, then
 874 we emit code to clean up the dynamic bindings between the old and new
 875 continuation.  We represent the result of cleanup detection to the back end by
 876 interposing a new block containing a call to a funny function.  Local exits
 877 from CATCH or UNWIND-PROTECT are detected in the same way.
 878
 879
 880 |\#
 881
 882 The primary activity in environment analysis is the annotation of ICR with
 883 environment structures describing where variables are allocated and what values
 884 the environment closes over.
 885
 886 Each lambda points to the environment where its variables are allocated, and
 887 the environments point back.  We always allocate the environment at the Bind
 888 node for the sole non-let lambda in the environment, so there is a close
 889 relationship between environments and functions.  Each "real function" (i.e.
 890 not a LET) has a corresponding environment.
 891
 892 We attempt to share the same environment among as many lambdas as possible so
 893 that unnecessary environment manipulation is not done.  During environment
 894 analysis the only optimization of this sort is realizing that a Let (a lambda
 895 with no Return node) cannot need its own environment, since there is no way
 896 that it can return and discover that its old values have been clobbered.
 897
 898 When the function is called, values from other environments may need to be made
 899 available in the function's environment.  These values are said to be "closed
 900 over".
 901
 902 Even if a value is not referenced in a given environment, it may need to be
 903 closed over in that environment so that it can be passed to a called function
 904 that does reference the value.  When we discover that a value must be closed
 905 over by a function, we must close over the value in all the environments where
 906 that function is referenced.  This applies to all references, not just local
 907 calls, since at other references we must have the values on hand so that we can
 908 build a closure.  This propagation must be applied recursively, since the value
 909 must also be available in *those* functions' callers.
 910
 911 If a closure reference is known to be "safe" (not an upward funarg), then the
 912 closure structure may be allocated on the stack.
 913
 914 Closure analysis deals only with closures over values, while Common Lisp
 915 requires closures over variables.  The difference only becomes significant when
 916 variables are set.  If a variable is not set, then we can freely make copies of
 917 it without keeping track of where they are.  When a variable is set, we must
 918 maintain a single value cell, or at least the illusion thereof.  We achieve
 919 this by creating a heap-allocated "value cell" structure for each set variable
 920 that is closed over.  The pointer to this value cell is passed around as the
 921 "value" corresponding to that variable.  References to the variable must
 922 explicitly indirect through the value cell.
 923
 924 When we are scanning over the lambdas in the component, we also check for bound
 925 but not referenced variables.
 926
 927 Environment analysis emits cleanup code for local exits and markers for
 928 non-local exits.
 929
 930 A non-local exit is a control transfer from one environment to another.  In a
 931 non-local exit, we must close over the continuation that we transfer to so that
 932 the exiting function can find its way back.  We indicate the need to close a
 933 continuation by placing the continuation structure in the closure and also
 934 pushing it on a list in the environment structure for the target of the exit.
 935 [\#\#\# To be safe, we would treat the continuation as a set closure variable so
 936 that we could invalidate it when we leave the dynamic extent of the exit point.
 937 Transferring control to a meaningless stack pointer would be apt to cause
 938 horrible death.]
 939
 940 Each local control transfer may require dynamic state such as special bindings
 941 to be undone.  We represent cleanup actions by funny function calls in a new
 942 block linked in as an implicit MV-PROG1.
 943