Micro-optimization: Avoid byte register writes on x86-64 in LOAD-TYPE.
The optimization guide for AMD's x86-64 processors recommends not
to write a partial register but instead to use MOVZX to write the
corresponding 32/64-bit register. Otherwise the instruction would have
an unnecessary dependency on the most recent write to the register,
reducing the available instruction level parallelism. On Intel's
processors this is not necessary but doesn't hurt.
To follow this recommendation, modify LOAD-TYPE to use MOVZX instead of
a byte MOV and adapt the VOPs that call it: FUN-SUBTYPE, SET-FDEFN-FUN,
and WIDETAG-OF. This additionally spares a temporary register in
FUN-SUBTYPE and allows to shorten all paths through WIDETAG-OF by one
instruction.
The effect on code size is small and mixed.