Improve basic block ordering for some loops.
Currently the compiler rotates most loops to minimize branching costs,
but for some loops this actually has the opposite effect (i.e., larger
and slower code being generated), namely for those that already have the
termination test at the end.
So, inhibit loop rotation if such a loop is detected. The decision is
based upon whether the loop's back edge starts with a conditional branch
(don't rotate) or an unconditional branch (do rotate). This heuristic is
correct in simple cases but may err, e.g. if a loop has multiple back
edges. As we didn't try before and don't try now to analyse complex
loops to aid the decision whether to rotate or not, or even how far,
hopefully any differences introduced here will cancel on average with
respect to their impact on performance and code size.
(Loops consisting of only one basic block -- which necessarily have the
termination test at the end -- were (obviously) treated correctly
already. But even small loops can easily consist of multiple blocks, for
example if the loop body contains code from inlining a function call.)