Hey Walter (et al., if you're out there):
With the GNU tools, optimizations on, and an ARM Cortex-M3, this goes a _lot_ faster when you precede it with
#define ASSEMBLY_WORKS
than when you don't.
Yet you say that an optimizer should eat up the C code and spit out assembly that's better than I can do.
How come the difference? Is it the tools? I know it's not because it's the World's Best ARM Assembly, because I've learned a bit since I did it and could probably speed it up -- or at least make it cleaner.
CFractional CFractional::operator + (CFractional y) const
{
#ifdef ASSEMBLY_WORKS
  int32_t a = _x;
  int32_t b = y._x;
  asm (
    "adds %[a], %[b]\n"                 // add
    "bvc .sat_add_vc\n"                 // check for overflow
    "ite mi\n"
    "ldrmi %[a], .sat_add_maxpos\n"     // set to max positive
    "ldrpl %[a], .sat_add_maxneg\n"     // set to max negative
    "b .sat_add_ret\n"
    ".sat_add_maxpos: .word 0x7fffffff\n"
    ".sat_add_maxneg: .word 0x80000001\n"
    ".sat_add_forbid: .word 0x80000000\n"
    ".sat_add_vc:\n"
    "bpl .sat_add_ret\n"
    "ldr %[b], .sat_add_forbid\n"
    "cmp %[a], %[b]\n"
    "it eq\n"
    "moveq %[a], %[b]\n"
    ".sat_add_ret:\n"
    : [a] "=r" (a), [b] "=r" (b)
    : "[a]" "r" (a), "[b]" "r" (b));

  return CFractional(a);
#else
  int32_t retval = _x + y._x;

  // Check for underflow and saturate if so
  if (_x < 0 && y._x < 0 && (retval >= 0 || retval < -INT32_MAX))
  {
    retval = -INT32_MAX;
  }

  // check for overflow and saturate if so
  if (_x > 0 && y._x > 0 && retval