Complete SSE instruction definitions for x86-64
* New instruction formats:
- 2-byte instructions with GP/mem source and XMM destination.
- 1- and 2-byte instructions with XMM source and GP/mem destination.
- F3-escape instructions with GP/mem source and GP destination.
- 2-byte instructions with GP/mem source and GP destination.
* Complete support for SSE instruction sets:
- SSE3
- SSSE3
- SSE4.1
- SSE4.2
* Fix definitions of pblendvb, blendvps, and blendvpd: these require a third
operand, implicitly in XMM0.
* PEXTRW has a new 2-byte encoding in SSE4.1 which allows a memory address as
the destination operand. The new encoding is only used when dst is a memory
address; otherwise the old backward-compatible encoding is used.
* Fix 64-bit popcnt (F3 still comes before REX.W), and make it check operand
sizes, like the new CRC32.
* Adapted slightly from Jonathan Armond's version to fit Douglas Katzman's
F3-specific r, r/m instruction format.
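
For reference, the PEXTRW encoding choice and the POPCNT prefix order described
above can be sketched in a few lines of Python. This is only an illustration
and not the assembler's actual definitions: the helper names (_modrm,
encode_pextrw, encode_popcnt) and the operand representation are hypothetical,
only register-direct and plain [base] operands are handled, and the opcode
bytes are those listed in the Intel manual.

```python
# Illustrative sketch only: all names here are hypothetical, and only
# register-direct and plain [base] memory operands are handled.

def _modrm(mod, reg, rm):
    """Pack a ModRM byte from its three fields (low 3 bits of reg and rm)."""
    return (mod << 6) | ((reg & 7) << 3) | (rm & 7)


def encode_pextrw(dst, xmm_src, imm8):
    """Encode PEXTRW dst, xmm_src, imm8.

    dst is ("reg", n) for GP register n, or ("mem", n) for [base register n].
    Registers are limited to 0-7 so that no REX prefix is needed.
    """
    kind, n = dst
    if kind not in ("reg", "mem"):
        raise ValueError("dst must be a 'reg' or 'mem' operand")
    if not (0 <= n <= 7 and 0 <= xmm_src <= 7 and 0 <= imm8 <= 0xFF):
        raise ValueError("operand out of range for this sketch")
    if kind == "reg":
        # Old encoding, 66 0F C5 /r ib: ModRM.reg = GP dst, ModRM.rm = XMM src.
        return bytes([0x66, 0x0F, 0xC5, _modrm(3, n, xmm_src), imm8])
    if n in (4, 5):
        # With mod=00, rm=4 selects a SIB byte and rm=5 a disp32 form.
        raise ValueError("base register needs SIB or displacement")
    # SSE4.1 encoding, 66 0F 3A 15 /r ib: ModRM.reg = XMM src, ModRM.rm = dst.
    return bytes([0x66, 0x0F, 0x3A, 0x15, _modrm(0, xmm_src, n), imm8])


def encode_popcnt(dst, src):
    """Encode POPCNT dst, src for register operands given as (number, bits)."""
    (dreg, dbits), (sreg, sbits) = dst, src
    if dbits != sbits:
        raise ValueError("operand sizes must match")
    if dbits not in (32, 64):
        raise ValueError("only 32- and 64-bit operands in this sketch")
    if not (0 <= dreg <= 15 and 0 <= sreg <= 15):
        raise ValueError("register number out of range")
    w = 1 if dbits == 64 else 0
    rex = 0x40 | (w << 3) | ((dreg >> 3) << 2) | (sreg >> 3)
    out = [0xF3]                 # the F3 prefix is emitted first
    if rex != 0x40:
        out.append(rex)          # REX, when needed, follows F3
    out += [0x0F, 0xB8, _modrm(3, dreg, sreg)]
    return bytes(out)
```

With these helpers, encode_pextrw(("reg", 0), 1, 3) yields 66 0F C5 C1 03,
encode_pextrw(("mem", 0), 1, 3) yields 66 0F 3A 15 08 03, and
encode_popcnt((0, 64), (1, 64)) yields F3 48 0F B8 C1.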