use LEA Y, [X+X] instead of LEA Y, [X*2] where appropriate on x86-64
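
For illustration, a sketch of the two encodings in NASM syntax. The registers are stand-ins chosen for this example (rax for X, rcx for Y); the change itself names none. A scaled index with no base register can only be encoded with a 32-bit displacement, so the [X*2] form carries four extra zero bytes:

```asm
bits 64

; base + index, no displacement: 4 bytes
lea rcx, [rax+rax]          ; 48 8D 0C 00

; index*2 with no base: the SIB byte must be followed by a disp32,
; giving 8 bytes. `nosplit` is needed here only because NASM would
; otherwise rewrite [rax*2] into [rax+rax] on its own.
lea rcx, [nosplit rax*2]    ; 48 8D 0C 45 00 00 00 00
```

Both instructions compute rcx = rax * 2 and neither modifies flags. If X is rbp or r13, the base+index form needs a one-byte zero displacement, which still leaves it shorter than the scaled form.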