1 \chapter{ICR conversion} % -*- Dictionary: design -*-
5 \section{Canonical forms}
9 Would be useful to have a Freeze-Type proclamation. Its primary use would to
10 be say that the indicated type won't acquire any new subtypes in the future.
11 This allows better open-coding of structure type predicates, since the possible
12 types that would satisfy the predicate will be constant at compile time, and
13 thus can be compiled as a skip-chain of EQ tests.
15 Of course, this is only a big win when the subtypes are few: the most important
16 case is when there are none. If the closure of the subtypes is much larger
17 than the average number of supertypes of an inferior, then it is better to grab
18 the list of superiors out of the object's type, and test for membership in that
21 Should type-specific numeric equality be done by EQL rather than =? i.e.
22 should = on two fixnums become EQL and then convert to EQL/FIXNUM?
23 Currently we transform EQL into =, which is complicated, since we have to prove
24 the operands are the class of numeric type before we do it. Also, when EQL
25 sees one operand is a FIXNUM, it transforms to EQ, but the generator for EQ
26 isn't expecting numbers, so it doesn't use an immediate compare.
32 Array type tests are transformed to %array-typep, separation of the
33 implementation-dependent array-type handling. This way we can transform
35 (or (simple-string-p x)
36 (and (complex-array-p x)
38 (simple-string-p (%array-data x))))
40 In addition to the similar bit-vector-p, we also handle vectorp and any type
41 tests on which the dimension isn't wild.
42 [Note that we will want to expand into frobs compatible with those that
43 array references expand into so that the same optimizations will work on both.]
45 These changes combine to convert hairy type checks into hairy typep's, and then
46 convert hairyp typeps into simple typeps.
49 Do we really need non-VOP templates? It seems that we could get the desired
50 effect through implementation-dependent ICR transforms. The main risk would be
51 of obscuring the type semantics of the code. We could fairly easily retain all
52 the type information present at the time the tranform is run, but if we
53 discover new type information, then it won't be propagated unless the VM also
54 supplies type inference methods for its internal frobs (precluding the use of
55 %PRIMITIVE, since primitives don't have derive-type methods.)
57 I guess one possibility would be to have the call still considered "known" even
58 though it has been transformed. But this doesn't work, since we start doing
59 LET optimizations that trash the arglist once the call has been transformed
60 (and indeed we want to.)
62 Actually, I guess the overhead for providing type inference methods for the
63 internal frobs isn't that great, since we can usually borrow the inference
64 method for a Common Lisp function. For example, in our AREF case:
67 (let ((\#:len (array-dimension x 0)))
68 (%unchecked-aref x (%check-in-bounds y \#:len)))
70 Now in this case, if we made %UNCHECKED-AREF have the same derive-type method
71 as AREF, then if we discovered something new about X's element type, we could
72 derive a new type for the entire expression.
74 Actually, it seems that baring this detail at the ICR level is beneficial,
75 since it admits the possibly of optimizing away the bounds check using type
76 information. If we discover X's dimensions, then \#:LEN becomes a constant that
77 can be substituted. Then %CHECK-IN-BOUNDS can notice that the bound is
78 constant and check it against the type for Y. If Y is known to be in range,
79 then we can optimize away the bounds check.
81 Actually in this particular case, the best thing to do would be if we
82 discovered the bound is constant, then replace the bounds check with an
83 implicit type check. This way all the type check optimization mechanisms would
84 be brought into the act.
86 So we actually want to do the bounds-check expansion as soon as possible,
87 rather than later than possible: it should be a source-transform, enabled by
90 With multi-dimensional arrays we probably want to explicitly do the index
91 computation: this way portions of the index computation can become loop
92 invariants. In a scan in row-major order, the inner loop wouldn't have to do
93 any multiplication: it would only do an addition. We would use normal
94 fixnum arithmetic, counting on * to cleverly handle multiplication by a
95 constant, and appropriate inline expansion.
97 Note that in a source transform, we can't make any assumptions the type of the
98 array. If it turns out to be a complex array without declared dimensions, then
99 the calls to ARRAY-DIMENSION will have to turn into a VOP that can be affected.
100 But if it is simple, then the VOP is unaffected, and if we know the bounds, it
101 is constant. Similarly, we would have %ARRAY-DATA and %ARRAY-DISPLACEMENT
102 operations. %ARRAY-DISPLACEMENT would optimize to 0 if we discover the array
103 is simple. [This is somewhat inefficient when the array isn't eventually
104 discovered to be simple, since finding the data and finding the displacement
105 duplicate each other. We could make %ARRAY-DATA return both as MVs, and then
106 optimize to (VALUES (%SIMPLE-ARRAY-DATA x) 0), but this would require
107 optimization of trivial VALUES uses.]
109 Also need (THE (ARRAY * * * ...) x) to assert correct rank.
113 A bunch of functions have source transforms that convert them into the
114 canonical form that later parts of the compiler want to see. It is not legal
115 to rely on the canonical form since source transforms can be inhibited by a
116 Notinline declaration. This shouldn't be a problem, since everyone should keep
117 their hands off of Notinline calls.
119 Some transformations:
121 Endp ==> (NULL (THE LIST ...))
122 (NOT xxx) or (NULL xxx) => (IF xxx NIL T)
124 (typep x '<simple type>) => (<simple predicate> x)
125 (typep x '<complex type>) => ...composition of simpler operations...
126 TYPEP of AND, OR and NOT types turned into conditionals over multiple TYPEP
127 calls. This makes hairy TYPEP calls more digestible to type constraint
128 propagation, and also means that the TYPEP code generators don't have to deal
129 with these cases. [\#\#\# In the case of union types we may want to do something
130 to preserve information for type constraint propagation.]
135 (multiple-value-call \#'foo (values a) (values b) (values-list c))
137 This way only MV-CALL needs to know how to do calls with unknown numbers of
138 arguments. It should be nearly as efficient as a special-case VMR-Convert
142 Make-String => Make-Array
143 N-arg predicates associated into two-arg versions.
144 Associate N-arg arithmetic ops.
145 Expand CxxxR and FIRST...nTH
146 Zerop, Plusp, Minusp, 1+, 1-, Min, Max, Rem, Mod
147 (Values x), (Identity x) => (Prog1 x)
149 All specialized aref functions => (aref (the xxx) ...)
151 Convert (ldb (byte ...) ...) into internal frob that takes size and position as
152 separate args. Other byte functions also...
154 Change for-value primitive predicates into (if <pred> t nil). This isn't
155 particularly useful during ICR phases, but makes life easy for VMR conversion.
157 This last can't be a source transformation, since a source transform can't tell
158 where the form appears. Instead, ICR conversion special-cases calls to known
159 functions with the Predicate attribute by doing the conversion when the
160 destination of the result isn't an IF. It isn't critical that this never be
161 done for predicates that we ultimately discover to deliver their value to an
162 IF, since IF optimizations will flush unnecessary IFs in a predicate.
165 \section{Inline functions}
167 [\#\#\# Inline expansion is especially powerful in the presence of good lisp-level
168 optimization ("partial evaluation"). Many "optimizations" usually done in Lisp
169 compilers by special-case source-to-source transforms can be had simply by
170 making the source of the general case function available for inline expansion.
171 This is especially helpful in Common Lisp, which has many commonly used
172 functions with simple special cases but bad general cases (list and sequence
173 functions, for example.)
175 Inline expansion of recursive functions is allowed, and is not as silly as it
176 sounds. When expanded in a specific context, much of the overhead of the
177 recursive calls may be eliminated (especially if there are many keyword
180 [Also have MAYBE-INLINE]
183 We only record a function's inline expansion in the global environment when the
184 function is in the null lexical environment, since it the expansion must be
185 represented as source.
187 We do inline expansion of functions locally defined by FLET or LABELS even when
188 the environment is not null. Since the appearances of the local function must
189 be nested within the desired environment, it is possible to expand local
190 functions inline even when they use the environment. We just stash the source
191 form and environments in the Functional for the local function. When we
192 convert a call to it, we just reconvert the source in the saved environment.
194 An interesting alternative to the inline/full-call dichotomy is "semi-inline"
195 coding. Whenever we have an inline expansion for a function, we can expand it
196 only once per block compilation, and then use local call to call this copied
197 version. This should get most of the speed advantage of real inline coding
198 with much less code bloat. This is especially attractive for simple system
199 functions such as Read-Char.
201 The main place where true inline expansion would still be worth doing is where
202 large amounts of the function could be optimized away by constant folding or
203 other optimizations that depend on the exact arguments to the call.
207 \section{Compilation policy}
209 We want more sophisticated control of compilation safety than is offered in CL,
210 so that we can emit only those type checks that are likely to discover
211 something (i.e. external interfaces.)
218 Generalized back-end notion provides dynamic retargeting? (for byte code)
220 The current node type annotations seem to be somewhat unsatisfactory, since we
221 lose information when we do a THE on a continuation that already has uses, or
222 when we convert a let where the actual result continuation has other uses.
224 But the case with THE isn't really all that bad, since the test of whether
225 there are any uses happens before conversion of the argument, thus THE loses
226 information only when there are uses outside of the declared form. The LET
227 case may not be a big deal either.
229 Note also that losing user assertions isn't really all that bad, since it won't
230 damage system integrity. At worst, it will cause a bug to go undetected. More
231 likely, it will just cause the error to be signaled in a different place (and
232 possibly in a less informative way). Of course, there is an efficiency hit for
233 losing type information, but if it only happens in strange cases, then this
237 \chapter{Local call analysis}
239 All calls to local functions (known named functions and LETs) are resolved to
240 the exact LAMBDA node which is to be called. If the call is syntactically
241 illegal, then we emit a warning and mark the reference as :notinline, forcing
242 the call to be a full call. We don't even think about converting APPLY calls;
243 APPLY is not special-cased at all in ICR. We also take care not to convert
244 calls in the top-level component, which would join it to normal code. Calls to
245 functions with rest args and calls with non-constant keywords are also not
248 We also convert MV-Calls that look like MULTIPLE-VALUE-BIND to local calls,
249 since we know that they can be open-coded. We replace the optional dispatch
250 with a call to the last optional entry point, letting MV-Call magically default
251 the unsupplied values to NIL.
253 When ICR optimizations discover a possible new local call, they explicitly
254 invoke local call analysis on the code that needs to be reanalyzed.
256 [\#\#\# Let conversion. What is means to be a let. Argument type checking done
257 by caller. Significance of local call is that all callers are known, so
258 special call conventions may be used.]
259 A lambda called in only one place is called a "let" call, since a Let would
262 In addition to enabling various ICR optimizations, the let/non-let distinction
263 has important environment significance. We treat the code in function and all
264 of the lets called by that function as being in the same environment. This
265 allows exits from lets to be treated as local exits, and makes life easy for
266 environment analysis.
268 Since we will let-convert any function with only one call, we must be careful
269 about cleanups. It is possible that a lexical exit from the let function may
270 have to clean up dynamic bindings not lexically apparent at the exit point. We
271 handle this by annotating lets with any cleanup in effect at the call site.
272 The cleanup for continuations with no immediately enclosing cleanup is the
273 lambda that the continuation is in. In this case, we look at the lambda to see
274 if any cleanups need to be done.
276 Let conversion is disabled for entry-point functions, since otherwise we might
277 convert the call from the XEP to the entry point into a let. Then later on, we
278 might want to convert a non-local reference into a local call, and not be able
279 to, since once a function has been converted to a let, we can't convert it
283 A function's return node may also be deleted if it is unreachable, which can
284 happen if the function never returns normally. Such functions are not lets.
287 \chapter{Find components}
289 This is a post-pass to ICR conversion that massages the flow graph into the
290 shape subsequent phases expect. Things done:
291 Compute the depth-first ordering for the flow graph.
292 Find the components (disconnected parts) of the flow graph.
294 This pass need only be redone when newly converted code has been added to the
295 flow graph. The reanalyze flag in the component structure should be set by
296 people who mess things up.
298 We create the initial DFO using a variant of the basic algorithm. The initial
299 DFO computation breaks the ICR up into components, which are parts that can be
300 compiled independently. This is done to increase the efficiency of large block
301 compilations. In addition to improving locality of reference and reducing the
302 size of flow analysis problems, this allows back-end data structures to be
303 reclaimed after the compilation of each component.
305 ICR optimization can change the connectivity of the flow graph by discovering
306 new calls or eliminating dead code. Initial DFO determination splits up the
307 flow graph into separate components, but does so conservatively, ensuring that
308 parts that might become joined (due to local call conversion) are joined from
309 the start. Initial DFO computation also guarantees that all code which shares
310 a lexical environment is in the same component so that environment analysis
311 needs to operate only on a single component at a time.
313 [This can get a bit hairy, since code seemingly reachable from the
314 environment entry may be reachable from a NLX into that environment. Also,
315 function references must be considered as links joining components even though
316 the flow graph doesn't represent these.]
318 After initial DFO determination, components are neither split nor joined. The
319 standard DFO computation doesn't attempt to split components that have been
323 \chapter{ICR optimize}
325 {\bf Somewhere describe basic ICR utilities: continuation-type,
326 constant-continuation-p, etc. Perhaps group by type in ICR description?}
328 We are conservative about doing variable-for-variable substitution in ICR
329 optimization, since if we substitute a variable with a less restrictive type,
330 then we may prevent use of a "good" representation within the scope of the
333 Note that variable-variable substitutions aren't really crucial in ICR, since
334 they don't create opportunities for new optimizations (unlike substitution of
335 constants and functions). A spurious variable-variable binding will show up as
336 a Move operation in VMR. This can be optimized away by reaching-definitions
337 and also by targeting. [\#\#\# But actually, some optimizers do see if operands
338 are the same variable.]
342 The IF-IF optimization can be modeled as a value driven optimization, since
343 adding a use definitely is cause for marking the continuation for
344 reoptimization. [When do we add uses? Let conversion is the only obvious
345 time.] I guess IF-IF conversion could also be triggered by a non-immediate use
346 of the test continuation becoming immediate, but to allow this to happen would
347 require Delete-Block (or somebody) to mark block-starts as needing to be
348 reoptimized when a predecessor changes. It's not clear how important it is
349 that IF-IF conversion happen under all possible circumstances, as long as it
350 happens to the obvious cases.
352 [\#\#\# It isn't totally true that code flushing never enables other worthwhile
353 optimizations. Deleting a functional reference can cause a function to cease
354 being an XEP, or even trigger let conversion. It seems we still want to flush
355 code during ICR optimize, but maybe we want to interleave it more intimately
356 with the optimization pass.
358 Ref-flushing works just as well forward as backward, so it could be done in the
359 forward pass. Call flushing doesn't work so well, but we could scan the block
360 backward looking for any new flushable stuff if we flushed a call on the
363 When we delete a variable due to lack of references, we leave the variable
364 in the lambda-list so that positional references still work. The initial value
365 continuation is flushed, though (replaced with NIL) allowing the initial value
366 for to be deleted (modulo side-effects.)
368 Note that we can delete vars with no refs even when they have sets. I guess
369 when there are no refs, we should also flush all sets, allowing the value
370 expressions to be flushed as well.
372 Squeeze out single-reference unset let variables by changing the dest of the
373 initial value continuation to be the node that receives the ref. This can be
374 done regardless of what the initial value form is, since we aren't actually
375 moving the evaluation. Instead, we are in effect using the continuation's
376 locations in place of the temporary variable.
378 Doing this is of course, a wild violation of stack discipline, since the ref
379 might be inside a loop, etc. But with the VMR back-end, we only need to
380 preserve stack discipline for unknown-value continuations; this ICR
381 transformation must be already be inhibited when the DEST of the REF is a
382 multiple-values receiver (EXIT, RETURN or MV-COMBINATION), since we must
383 preserve the single-value semantics of the let-binding in this case.
385 The REF and variable must be deleted as part of this operation, since the ICR
386 would otherwise be left in an inconsistent state; we can't wait for the REF to
387 be deleted due to bing unused, since we have grabbed the arg continuation and
388 substituted it into the old DEST.
390 The big reason for doing this transformation is that in macros such as INCF and
391 PSETQ, temporaries are squeezed out, and the new value expression is evaluated
392 directly to the setter, allowing any result type assertion to be applied to the
393 expression evaluation. Unlike in the case of substitution, there is no point
394 in inhibiting this transformation when the initial value type is weaker than
395 the variable type. Instead, we intersect the asserted type for the old REF's
396 CONT with the type assertion on the initial value continuation. Note that the
397 variable's type has already been asserted on the initial-value continuation.
399 Of course, this transformation also simplifies the ICR even when it doesn't
400 discover interesting type assertions, so it makes sense to do it whenever
401 possible. This reduces the demands placed on register allocation, etc.
405 There are three dead-code flushing rules:
406 1] Refs with no DEST may be flushed.
407 2] Known calls with no dest that are flushable may be flushed. We null the
408 DEST in all the args.
409 3] If a lambda-var has no refs, then it may be deleted. The flushed argument
410 continuations have their DEST nulled.
412 These optimizations all enable one another. We scan blocks backward, looking
413 for nodes whose CONT has no DEST, then type-dispatching off of the node. If we
414 delete a ref, then we check to see if it is a lambda-var with no refs. When we
415 flush an argument, we mark the blocks for all uses of the CONT as needing to be
419 \section{Goals for ICR optimizations}
423 When an optimization is disabled, code should still be correct and not
424 ridiculously inefficient. Phases shouldn't be made mandatory when they have
425 lots of non-required stuff jammed into them.
429 This pass is optional, but is desirable if anything is more important than
432 This phase is a grab-bag of optimizations that concern themselves with the flow
433 of values through the code representation. The main things done are type
434 inference, constant folding and dead expression elimination. This phase can be
435 understood as a walk of the expression tree that propagates assertions down the
436 tree and propagates derived information up the tree. The main complication is
437 that there isn't any expression tree, since ICR is flow-graph based.
439 We repeat this pass until we don't discover anything new. This is a bit of
440 feat, since we dispatch to arbitrary functions which may do arbitrary things,
441 making it hard to tell if anything really happened. Even if we solve this
442 problem by requiring people to flag when they changed or by checking to see if
443 they changed something, there are serious efficiency problems due to massive
444 redundant computation, since in many cases the only way to tell if anything
445 changed is to recompute the value and see if it is different from the old one.
447 We solve this problem by requiring that optimizations for a node only depend on
448 the properties of the CONT and the continuations that have the node as their
449 DEST. If the continuations haven't changed since the last pass, then we don't
450 attempt to re-optimize the node, since we know nothing interesting will happen.
452 We keep track of which continuations have changed by a REOPTIMIZE flag that is
453 set whenever something about the continuation's value changes.
455 When doing the bottom up pass, we dispatch to type specific code that knows how
456 to tell when a node needs to be reoptimized and does the optimization. These
457 node types are special-cased: COMBINATION, IF, RETURN, EXIT, SET.
459 The REOPTIMIZE flag in the COMBINATION-FUN is used to detect when the function
460 information might have changed, so that we know when where are new assertions
461 that could be propagated from the function type to the arguments.
463 When we discover something about a leaf, or substitute for leaf, we reoptimize
464 the CONT for all the REF and SET nodes.
466 We have flags in each block that indicate when any nodes or continuations in
467 the block need to be re-optimized, so we don't have to scan blocks where there
468 is no chance of anything happening.
470 It is important for efficiency purposes that optimizers never say that they did
471 something when they didn't, but this by itself doesn't guarantee timely
472 termination. I believe that with the type system implemented, type inference
473 will converge in finite time, but as a practical matter, it can take far too
474 long to discover not much. For this reason, ICR optimization is terminated
475 after three consecutive passes that don't add or delete code. This premature
476 termination only happens 2% of the time.
479 \section{Flow graph simplification}
482 Delete blocks with no predecessors.
483 Merge blocks that can be merged.
484 Convert local calls to Let calls.
485 Eliminate degenerate IFs.
487 We take care not to merge blocks that are in different functions or have
488 different cleanups. This guarantees that non-local exits are always at block
489 ends and that cleanup code never needs to be inserted within a block.
491 We eliminate IFs with identical consequent and alternative. This would most
492 likely happen if both the consequent and alternative were optimized away.
494 [Could also be done if the consequent and alternative were different blocks,
495 but computed the same value. This could be done by a sort of cross-jumping
496 optimization that looked at the predecessors for a block and merged code shared
497 between predecessors. IFs with identical branches would eventually be left
498 with nothing in their branches.]
500 We eliminate IF-IF constructs:
501 (IF (IF A B C) D E) ==>
502 (IF A (IF B D E) (IF C D E))
504 In reality, what we do is replicate blocks containing only an IF node where the
505 predicate continuation is the block start. We make one copy of the IF node for
506 each use, leaving the consequent and alternative the same. If you look at the
507 flow graph representation, you will see that this is really the same thing as
508 the above source to source transformation.
511 \section{Forward ICR optimizations}
513 In the forward pass, we scan the code in forward depth-first order. We
514 examine each call to a known function, and:
517 \item Eliminate any bindings for unused variables.
519 \item Do top-down type assertion propagation. In local calls, we propagate
520 asserted and derived types between the call and the called lambda.
523 Replace calls of foldable functions with constant arguments with the
524 result. We don't have to actually delete the call node, since Top-Down
525 optimize will delete it now that its value is unused.
528 Run any Optimizer for the current function. The optimizer does arbitrary
529 transformations by hacking directly on the IR. This is useful primarily
530 for arithmetic simplification and similar things that may need to examine
531 and modify calls other than the current call. The optimizer is responsible
532 for recording any changes that it makes. An optimizer can inhibit further
533 optimization of the node during the current pass by returning true. This
534 is useful when deleting the node.
537 Do ICR transformations, replacing a global function call with equivalent
541 Do bottom-up type propagation/inferencing. For some functions such as
542 Coerce we will dispatch to a function to find the result type. The
543 Derive-Type function just returns a type structure, and we check if it is
544 different from the old type in order to see if there was a change.
547 Eliminate IFs with predicates known to be true or false.
550 Substitute the value for unset let variables that are bound to constants,
551 unset lambda variables or functionals.
554 Propagate types from local call args to var refs.
557 We use type info from the function continuation to find result types for
558 functions that don't have a derive-type method.
563 ICR transformation does "source to source" transformations on known global
564 functions, taking advantage of semantic information such as argument types and
565 constant arguments. Transformation is optional, but should be done if speed or
566 space is more important than compilation speed. Transformations which increase
567 space should pass when space is more important than speed.
569 A transform is actually an inline function call where the function is computed
570 at compile time. The transform gets to peek at the continuations for the
571 arguments, and computes a function using the information gained. Transforms
572 should be cautious about directly using the values of constant continuations,
573 since the compiler must preserve eqlness of named constants, and it will have a
574 hard time if transforms go around randomly copying constants.
576 The lambda that the transform computes replaces the original function variable
577 reference as the function for the call. This lets the compiler worry about
578 evaluating each argument once in the right order. We want to be careful to
579 preserve type information when we do a transform, since it may be less than
580 obvious what the transformed code does.
582 There can be any number of transforms for a function. Each transform is
583 associated with a function type that the call must be compatible with. A
584 transform is only invoked if the call has the right type. This provides a way
585 to deal with the common case of a transform that only applies when the
586 arguments are of certain types and some arguments are not specified. We always
587 use the derived type when determining whether a transform is applicable. Type
588 check is responsible for setting the derived type to the intersection of the
589 asserted and derived types.
591 If the code in the expansion has insufficient explicit or implicit argument
592 type checking, then it should cause checks to be generated by making
595 A transformation may decide to pass if it doesn't like what it sees when it
596 looks at the args. The Give-Up function unwinds out of the transform and deals
597 with complaining about inefficiency if speed is more important than brevity.
598 The format args for the message are arguments to Give-Up. If a transform can't
599 be done, we just record the message where ICR finalize can find it. note. We
600 can't complain immediately, since it might get transformed later on.
603 \section{Backward ICR optimizations}
605 In the backward pass, we scan each block in reverse order, and
606 eliminate any effectless nodes with unused values. In ICR this is the
607 only way that code is deleted other than the elimination of unreachable blocks.
610 \chapter{Type checking}
612 [\#\#\# Somehow split this section up into three parts:
613 -- Conceptual: how we know a check is necessary, and who is responsible for
615 -- Incremental: intersection of derived and asserted types, checking for
616 non-subtype relationship.
617 -- Check generation phase.
621 We need to do a pretty good job of guessing when a type check will ultimately
622 need to be done. Generic arithmetic, for example: In the absence of
623 declarations, we will use use the safe variant, but if we don't know this, we
624 will generate a check for NUMBER anyway. We need to look at the fast-safe
625 templates and guess if any of them could apply.
627 We compute a function type from the VOP arguments
628 and assertions on those arguments. This can be used with Valid-Function-Use
629 to see which templates do or might apply to a particular call. If we guess
630 that a safe implementation will be used, then we mark the continuation so as to
631 force a safe implementation to be chosen. [This will happen if ICR optimize
632 doesn't run to completion, so the icr optimization after type check generation
633 can discover new type information. Since we won't redo type check at that
634 point, there could be a call that has applicable unsafe templates, but isn't
637 [\#\#\# A better and more general optimization of structure type checks: in type
638 check conversion, we look at the *original derived* type of the continuation:
639 if the difference between the proven type and the asserted type is a simple
640 type check, then check for the negation of the difference. e.g. if we want a
641 FOO and we know we've got (OR FOO NULL), then test for (NOT NULL). This is a
642 very important optimization for linked lists of structures, but can also apply
643 in other situations.]
645 If after ICR phases, we have a continuation with check-type set in a context
646 where it seems likely a check will be emitted, and the type is too
647 hairy to be easily checked (i.e. no CHECK-xxx VOP), then we do a transformation
648 on the ICR equivalent to:
649 (... (the hair <foo>) ...)
651 (... (funcall \#'(lambda (\#:val)
652 (if (typep \#:val 'hair)
654 (%type-check-error \#:val 'hair)))
657 This way, we guarantee that VMR conversion never has to emit type checks for
660 [Actually, we need to do a MV-bind and several type checks when there is a MV
661 continuation. And some values types are just too hairy to check. We really
662 can't check any assertion for a non-fixed number of values, since there isn't
663 any efficient way to bind arbitrary numbers of values. (could be done with
664 MV-call of a more-arg function, I guess...)
667 [Perhaps only use CHECK-xxx VOPs for types equivalent to a ptype? Exceptions
668 for CONS and SYMBOL? Anyway, no point in going to trouble to implement and
669 emit rarely used CHECK-xxx vops.]
671 One potential lose in converting a type check to explicit conditionals rather
672 than to a CHECK-xxx VOP is that VMR code motion optimizations won't be able to
673 do anything. This shouldn't be much of an issue, though, since type constraint
674 propagation has already done global optimization of type checks.
677 This phase is optional, but should be done if anything is more important than
680 Type check is responsible for reconciling the continuation asserted and derived
681 types, emitting type checks if appropriate. If the derived type is a subtype
682 of the asserted type, then we don't need to do anything.
684 If there is no intersection between the asserted and derived types, then there
685 is a manifest type error. We print a warning message, indicating that
686 something is almost surely wrong. This will inhibit any transforms or
687 generators that care about their argument types, yet also inhibits further
688 error messages, since NIL is a subtype of every type.
690 If the intersection is not null, then we set the derived type to the
691 intersection of the asserted and derived types and set the Type-Check flag in
692 the continuation. We always set the flag when we can't prove that the type
693 assertion is satisfied, regardless of whether we will ultimately actually emit
694 a type check or not. This is so other phases such as type constraint
695 propagation can use the Type-Check flag to detect an interesting type
696 assertion, instead of having to duplicate much of the work in this phase.
697 [\#\#\# 7 extremely random values for CONTINUATION-TYPE-CHECK.]
699 Type checks are generated on the fly during VMR conversion. When VMR
700 conversion generates the check, it prints an efficiency note if speed is
701 important. We don't flame now since type constraint progpagation may decide
702 that the check is unnecessary. [\#\#\# Not done now, maybe never.]
704 In local function call, it is the caller that is in effect responsible for
705 checking argument types. This happens in the same way as any other type check,
706 since ICR optimize propagates the declared argument types to the type
707 assertions for the argument continuations in all the calls.
709 Since the types of arguments to entry points are unknown at compile time, we
710 want to do runtime checks to ensure that the incoming arguments are of the
711 correct type. This happens without any special effort on the part of type
712 check, since the XEP is represented as a local call with unknown type
713 arguments. These arguments will be marked as needing to be checked.
716 \chapter{Constraint propagation}
721 constraints: a list of all the constraints on this var for either X or Y.
723 How to maintain consistency? Does it really matter if there are constraints
724 with deleted vars lying around? Note that whatever mechanism we use for
725 getting the constraints in the first place should tend to keep them up to date.
726 Probably we would define optimizers for the interesting relations that look at
727 their CONT's dest and annotate it if it is an IF.
729 But maybe it is more trouble then it is worth trying to build up the set of
730 constraints during ICR optimize (maintaining consistency in the process).
731 Since ICR optimize iterates a bunch of times before it converges, we would be
732 wasting time recomputing the constraints, when nobody uses them till constraint
735 It seems that the only possible win is if we re-ran constraint propagation
736 (which we might want to do.) In that case, we wouldn't have to recompute all
737 the constraints from scratch. But it seems that we could do this just as well
738 by having ICR optimize invalidate the affected parts of the constraint
739 annotation, rather than trying to keep them up to date. This also fits better
740 with the optional nature of constraint propagation, since we don't want ICR
741 optimize to commit to doing a lot of the work of constraint propagation.
743 For example, we might have a per-block flag indicating that something happened
744 in that block since the last time constraint propagation ran. We might have
745 different flags to represent the distinction between discovering a new type
746 assertion inside the block and discovering something new about an if
747 predicate, since the latter would be cheaper to update and probably is more
750 It's fairly easy to see how we can build these sets of restrictions and
751 propagate them using flow analysis, but actually using this information seems
754 Probably the biggest thing we do is look at all the refs. If have proven that
755 the value is EQ (EQL for a number) to some other leaf (constant or lambda-var),
756 then we can substitute for that reference. In some cases, we will want to do
757 special stuff depending on the DEST. If the dest is an IF and we proved (not
758 null), then we can substitute T. And if the dest is some relation on the same
759 two lambda-vars, then we want to see if we can show that relation is definitely
762 Otherwise, we can do our best to invert the set of restrictions into a type.
763 Since types hold only constant info, we have to ignore any constraints between
764 two vars. We can make some use of negated type restrictions by using
765 TYPE-DIFFERENCE to remove the type from the ref types. If our inferred type is
766 as good as the type assertion, then the continuation's type-check flag will be
769 It really isn't much of a problem that we don't infer union types on joins,
770 since union types are relatively easy to derive without using flow information.
771 The normal bottom-up type inference done by ICR optimize does this for us: it
772 annotates everything with the union of all of the things it might possibly be.
773 Then constraint propagation subtracts out those types that can't be in effect
774 because of predicates or checks.
778 This phase is optional, but is desirable if anything is more important than
779 compilation speed. We use an algorithm similar to available expressions to
780 propagate variable type information that has been discovered by implicit or
781 explicit type tests, or by type inference.
783 We must do a pre-pass which locates set closure variables, since we cannot do
784 flow analysis on such variables. We set a flag in each set closure variable so
785 that we can quickly tell that it is losing when we see it again. Although this
786 may seem to be wastefully redundant with environment analysis, the overlap
787 isn't really that great, and the cost should be small compared to that of the
788 flow analysis that we are preparing to do. [Or we could punt on set
791 A type constraint is a structure that includes sset-element and has the type
793 [\#\#\# Also a not-p flag indicating whether the sense is negated.]
794 Each variable has a list of its type constraints. We create a
795 type constraint when we see a type test or check. If there is already a
796 constraint for the same variable and type, then we just re-use it. If there is
797 already a weaker constraint, then we generate both the weak constraints and the
798 strong constraint so that the weak constraints won't be lost even if the strong
801 We find all the distinct type constraints for each variable during the pre-pass
802 over the lambda nesting. Each constraint has a list of the weaker constraints
803 so that we can easily generate them.
805 Every block generates all the type constraints in it, but a constraint is
806 available in a successor only if it is available in all predecessors. We
807 determine the actual type constraint for a variable at a block by intersecting
808 all the available type constraints for that variable.
810 This isn't maximally tense when there are constraints that are not
811 hierarchically related, e.g. (or a b) (or b c). If these constraints were
812 available from two predecessors, then we could infer that we have an (or a b c)
813 constraint, but the above algorithm would come up with none. This probably
816 [\#\#\# Do we want to deal with (if (eq <var> '<foo>) ...) indicating singleton
819 We detect explicit type tests by looking at type test annotation in the IF
820 node. If there is a type check, the OUT sets are stored in the node, with
821 different sets for the consequent and alternative. Implicit type checks are
822 located by finding Ref nodes whose Cont has the Type-Check flag set. We don't
823 actually represent the GEN sets, we just initialize OUT to it, and then form
826 When we do the post-pass, we clear the Type-Check flags in the continuations
827 for Refs when we discover that the available constraints satisfy the asserted
828 type. Any explicit uses of typep should be cleaned up by the ICR optimizer for
829 typep. We can also set the derived type for Refs to the intersection of the
830 available type assertions. If we discover anything, we should consider redoing
831 ICR optimization, since better type information might enable more
835 \chapter{ICR finalize} % -*- Dictionary: design -*-
837 This pass looks for interesting things in the ICR so that we can forget about
838 them. Used and not defined things are flamed about.
840 We postpone these checks until now because the ICR optimizations may discover
841 errors that are not initially obvious. We also emit efficiency notes about
842 optimizations that we were unable to do. We can't emit the notes immediately,
843 since we don't know for sure whether a repeated attempt at optimization will
846 We examine all references to unknown global function variables and update the
847 approximate type accordingly. We also record the names of the unknown
848 functions so that they can be flamed about if they are never defined. Unknown
849 normal variables are flamed about on the fly during ICR conversion, so we
852 We check each newly defined global function for compatibility with previously
853 recorded type information. If there is no :defined or :declared type, then we
854 check for compatibility with any approximate function type inferred from
857 \chapter{Environment analysis}
860 A related change would be to annotate ICR with information about tail-recursion
861 relations. What we would do is add a slot to the node structure that points to
862 the corresponding Tail-Info when a node is in a TR position. This annotation
863 would be made in a final ICR pass that runs after cleanup code is generated
864 (part of environment analysis). When true, the node is in a true TR position
865 (modulo return-convention incompatibility). When we determine return
866 conventions, we null out the tail-p slots in XEP calls or known calls where we
867 decided not to preserve tail-recursion.
870 In this phase, we also check for changes in the dynamic binding environment
871 that require cleanup code to be generated. We just check for changes in the
872 Continuation-Cleanup on local control transfers. If it changes from
873 an inner dynamic context to an outer one that is in the same environment, then
874 we emit code to clean up the dynamic bindings between the old and new
875 continuation. We represent the result of cleanup detection to the back end by
876 interposing a new block containing a call to a funny function. Local exits
877 from CATCH or UNWIND-PROTECT are detected in the same way.
882 The primary activity in environment analysis is the annotation of ICR with
883 environment structures describing where variables are allocated and what values
884 the environment closes over.
886 Each lambda points to the environment where its variables are allocated, and
887 the environments point back. We always allocate the environment at the Bind
888 node for the sole non-let lambda in the environment, so there is a close
889 relationship between environments and functions. Each "real function" (i.e.
890 not a LET) has a corresponding environment.
892 We attempt to share the same environment among as many lambdas as possible so
893 that unnecessary environment manipulation is not done. During environment
894 analysis the only optimization of this sort is realizing that a Let (a lambda
895 with no Return node) cannot need its own environment, since there is no way
896 that it can return and discover that its old values have been clobbered.
898 When the function is called, values from other environments may need to be made
899 available in the function's environment. These values are said to be "closed
902 Even if a value is not referenced in a given environment, it may need to be
903 closed over in that environment so that it can be passed to a called function
904 that does reference the value. When we discover that a value must be closed
905 over by a function, we must close over the value in all the environments where
906 that function is referenced. This applies to all references, not just local
907 calls, since at other references we must have the values on hand so that we can
908 build a closure. This propagation must be applied recursively, since the value
909 must also be available in *those* functions' callers.
911 If a closure reference is known to be "safe" (not an upward funarg), then the
912 closure structure may be allocated on the stack.
914 Closure analysis deals only with closures over values, while Common Lisp
915 requires closures over variables. The difference only becomes significant when
916 variables are set. If a variable is not set, then we can freely make copies of
917 it without keeping track of where they are. When a variable is set, we must
918 maintain a single value cell, or at least the illusion thereof. We achieve
919 this by creating a heap-allocated "value cell" structure for each set variable
920 that is closed over. The pointer to this value cell is passed around as the
921 "value" corresponding to that variable. References to the variable must
922 explicitly indirect through the value cell.
924 When we are scanning over the lambdas in the component, we also check for bound
925 but not referenced variables.
927 Environment analysis emits cleanup code for local exits and markers for
930 A non-local exit is a control transfer from one environment to another. In a
931 non-local exit, we must close over the continuation that we transfer to so that
932 the exiting function can find its way back. We indicate the need to close a
933 continuation by placing the continuation structure in the closure and also
934 pushing it on a list in the environment structure for the target of the exit.
935 [\#\#\# To be safe, we would treat the continuation as a set closure variable so
936 that we could invalidate it when we leave the dynamic extent of the exit point.
937 Transferring control to a meaningless stack pointer would be apt to cause
940 Each local control transfer may require dynamic state such as special bindings
941 to be undone. We represent cleanup actions by funny function calls in a new
942 block linked in as an implicit MV-PROG1.