doc/cmucl/internals/interpreter.tex

   1 %                                       -*- Dictionary: design; Package: C -*-
   2
   3 May be worth having a byte-code representation for interpreted code.  This way,
   4 an entire system could be compiled into byte-code for debugging (the
   5 "check-out" compiler?).
   6
   7 Given our current inclination for using a stack machine to interpret IR1, it
   8 would be straightforward to layer a byte-code interpreter on top of this.
   9
  10
  11 Interpreter:
  12
  13 Instead of having no interpreter, or a more-or-less conventional interpreter,
  14 or byte-code interpreter, how about directly executing IR1?
  15
  16 We run through the IR1 passes, possibly skipping optional ones, until we get
  17 through environment analysis.  Then we run a post-pass that annotates IR1 with
  18 information about where values are kept, i.e. the stack slot.
  19
  20 We can lazily convert functions by having FUNCTION make an interpreted function
  21 object that holds the code (really a closure over the interpreter).  The first
  22 time that we try to call the function, we do the conversion and processing.
  23 Also, we can easily keep track of which interpreted functions we have expanded
  24 macros in, so that macro redefinition automatically invalidates the old
  25 expansion, causing lazy reconversion.
  26
  27 Probably the interpreter will want to represent MVs by a recognizable structure
  28 that is always heap-allocated.  This way, we can punt the stack issues involved
  29 in trying to spread MVs.  So a continuation value can always be kept in a
  30 single cell.
  31
  32 The compiler can have some special frobs for making the interpreter efficient,
  33 such as a call operation that extracts arguments from the stack
  34 slots designated by a continuation list.  Perhaps
  35     (values-mapcar fun . lists)
  36 <==>
  37     (values-list (mapcar fun . lists))
  38 This would be used with MV-CALL.
  39
  40
  41 This scheme seems to provide nearly all of the advantages of both the compiler
  42 and conventional interpretation.  The only significant disadvantage with
  43 respect to a conventional interpreter is that there is the one-time overhead of
  44 conversion, but doing this lazily should make this quite acceptable.
  45
  46 With respect to a conventional interpreter, we have major advantages:
  47  + Full syntax checking: safety comparable to compiled code.
  48  + Semantics similar to compiled code due to code sharing.  Similar diagnostic
  49    messages, etc.  Reduction of error-prone code duplication.
  50  + Potential for full type checking according to declarations (would require
  51    running IR1 optimize?)
  52  + Simplifies debugger interface, since interpreted code can look more like
  53    compiled code: source paths, edit definition, etc.
  54
  55 For all non-run-time symbol annotations (anything other than SYMBOL-FUNCTION
  56 and SYMBOL-VALUE), we use the compiler's global database.  MACRO-FUNCTION will
  57 use INFO, rather than vice-versa.
  58
  59 When doing the IR1 phases for the interpreter, we probably want to suppress
  60 optimizations that change user-visible function calls:
  61  -- Don't do local call conversion of any named functions (even lexical ones).
  62     This is so that a call will appear on the stack that looks like the call in
  63     the original source.  The keyword and optional argument transformations
  64     done by local call mangle things quite a bit.  Also, note local-call
  65     converting prevents unreferenced arguments from being deleted, which is
  66     another non-obvious transformation.
  67  -- Don't run source-transforms, IR1 transforms and IR1 optimizers.  This way,
  68     TRACE and BACKTRACE will show calls with the original arguments, rather
  69     than the "optimized" form, etc.  Also, for the interpreter it will
  70     actually be faster to call the original function (which is compiled) than
  71     to "inline expand" it.  Also, this allows implementation-dependent
  72     transforms to expand into %PRIMITIVE uses.
  73
  74 There are some problems with stepping, due to our non-syntactic IR1
  75 representation.  The source path information is the key that makes this
  76 conceivable.  We can skip over the stepping of a subform by quietly evaluating
  77 nodes whose source path lies within the form being skipped.
  78
  79 One problem with determining what value has been returned by a form.  With a
  80 function call, it is theoretically possible to precisely determine this, since
  81 if we complete evaluation of the arguments, then we arrive at the Combination
  82 node whose value is synonymous with the value of the form.  We can even detect
  83 this case, since the Node-Source will be EQ to the form.  And we can also
  84 detect when we unwind out of the evaluation, since we will leave the form
  85 without having ever reached this node.
  86
  87 But with macros and special-forms, there is no node whose value is the value of
  88 the form, and no node whose source is the macro call or special form.  We can
  89 still detect when we leave the form, but we can't be sure whether this was a
  90 normal evaluation result or an explicit RETURN-FROM.
  91
  92 But does this really matter?  It seems that we can print the value returned (if
  93 any), then just print the next form to step.  In the rare case where we did
  94 unwind, the user should be able to figure it out.
  95
  96 [We can look at this as a side-effect of CPS: there isn't any difference
  97 between a "normal" return and a non-local one.]
  98
  99 [Note that in any control transfer (normal or otherwise), the stepper may need
 100 to unwind out of an arbitrary number of levels of stepping.  This is because a
 101 form in a TR position may yield its to a node arbitrarily far our.]
 102
 103 Another problem is with deciding what form is being stepped.  When we start
 104 evaluating a node, we dive into code that is nested somewhere down inside that
 105 form.  So we actually have to do a loop of asking questions before we do any
 106 evaluation.  But what do we ask about?
 107
 108 If we ask about the outermost enclosing form that is a subform of the the last
 109 form that the user said to execute, then we might offer a form that isn't
 110 really evaluated, such as a LET binding list.
 111
 112 But once again, is this really a problem?  It is certainly different from a
 113 conventional stepper, but a pretty good argument could be made that it is
 114 superior.  Haven't you ever wanted to skip the evaluation of all the
 115 LET bindings, but not the body?  Wouldn't it be useful to be able to skip the
 116 DO step forms?
 117
 118 All of this assumes that nobody ever wants to step through the guts of a
 119 macroexpansion.  This seems reasonable, since steppers are for weenies, and
 120 weenies don't define macros (hence don't debug them).  But there are probably
 121 some weenies who don't know that they shouldn't be writing macros.
 122
 123 We could handle this by finding the "source paths" in the expansion of each
 124 macro by sticking some special frob in the source path marking the place where
 125 the expansion happened.  When we hit code again that is in the source, then we
 126 revert to the normal source path.  Something along these lines might be a good
 127 idea anyway (for compiler error messages, for example).
 128
 129 The source path hack isn't guaranteed to work quite so well in generated code,
 130 though, since macros return stuff that isn't freshly consed.  But we could
 131 probably arrange to win as long as any given expansion doesn't return two EQ
 132 forms.
 133
 134 It might be nice to have a command that skipped stepping of the form, but
 135 printed the results of each outermost enclosed evaluated subform, i.e. if you
 136 used this on the DO step-list, it would print the result of each new-value
 137 form.  I think this is implementable.  I guess what you would do is print each
 138 value delivered to a DEST whose source form is the current or an enclosing
 139 form.  Along with the value, you would print the source form for the node that
 140 is computing the value.
 141
 142 The stepper can also have a "back" command that "unskips" or "unsteps".  This
 143 would allow the evaluation of forms that are pure (modulo lexical variable
 144 setting) to be undone.  This is useful, since in stepping it is common that you
 145 skip a form that you shouldn't have, or get confused and want to restart at
 146 some earlier point.
 147
 148 What we would do is remember the current node and the values of all local
 149 variables.  heap before doing each step or skip action.  We can then back up
 150 the state of all lexical variables and the "program counter".  To make this
 151 work right with set closure variables, we would copy the cell's value, rather
 152 than the value cell itself.
 153
 154 [To be fair, note that this could easily be done with our current interpreter:
 155 the stepper could copy the environment alists.]
 156
 157 We can't back up the "program counter" when a control transfer leaves the
 158 current function, since this state is implicitly represented in the
 159 interpreter's state, and is discarded when we exit.  We probably want to ask
 160 for confirmation before leaving the function to give users a chance to "unskip"
 161 the forms in a TR position.
 162
 163 Another question is whether the conventional stepper is really a good thing to
 164 imitate...  How about an editor-based mouse-driven interface?  Instead of
 165 "skipping" and "stepping", you would just designate the next form that you
 166 wanted to stop at.  Instead of displaying return values, you replace the source
 167 text with the printed representation of the value.
 168
 169 It would show the "program counter" by highlighting the *innermost* form that
 170 we are about to evaluate, i.e. the source form for the node that we are stopped
 171 at.  It would probably also be useful to display the start of the form that was
 172 used to designate the next stopping point, although I guess this could be
 173 implied by the mouse position.
 174
 175
 176 Such an interface would be a little harder to implement than a dumb stepper,
 177 but it would be much easier to use.  [It would be impossible for an evalhook
 178 stepper to do this.]
 179
 180
 181 %PRIMITIVE usage:
 182
 183 Note: %PRIMITIVE can only be used in compiled code.  It is a trapdoor into the
 184 compiler, not a general syntax for accessing "sub-primitives".  It's main use
 185 is in implementation-dependent compiler transforms.  It saves us the effort of
 186 defining a "phony function" (that is not really defined), and also allows
 187 direct communication with the code generator through codegen-info arguments.
 188
 189 Some primitives may be exported from the VM so that %PRIMITIVE can be used to
 190 make it explicit that an escape routine or interpreter stub is assuming an
 191 operation is implemented by the compiler.