I'm working on optimizing code for a C8051F120 that needs to run extremely fast. The following few lines are repeated over and over, as fast as possible. The looping is a straightforward djnz. The bulk of the CPU time is spent on these lines, which are repeated 8 times per loop (cut and paste, with only the binary bit mask changed).
; for reference
rLEVEL      equ     R0
mDATAOUT    DATA    64

; the code
row0:
    movx    A, @DPTR
    inc     DPTR
    subb    A, rLEVEL
    jc      row1
    orl     mDATAOUT, #00000001b
row1:
At row1 the next set of those 5 lines executes. Essentially, this takes the data byte at @DPTR, compares it to the current value of rLEVEL, and sets a bit in mDATAOUT if the byte is at or above that level (that is, if the subb does not borrow). It does this for 8 sequential bytes starting at DPTR, one for each bit of the bitfield (#00000001b through #10000000b).
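In case it makes the intent clearer, here is a rough C model of what one pass over the 8 blocks is meant to compute. It is only a sketch for discussion, not the code running on the chip: the function name pack_levels and its parameters are made up for illustration, and it assumes the carry flag is clear going into each subb (the snippet as posted does not clear it, so a borrow from one block would carry into the next comparison).

```c
#include <stdint.h>

/* Hypothetical reference model of one pass (row0 .. row7).
 * data  : the 8 sequential bytes read through DPTR
 * level : the value held in rLEVEL
 * out   : the value of mDATAOUT before the pass
 * Assumes carry is clear before each subb, so a bit is set
 * whenever data[bit] - level does not borrow. */
static uint8_t pack_levels(const uint8_t *data, uint8_t level, uint8_t out)
{
    for (int bit = 0; bit < 8; bit++) {
        if (data[bit] >= level) {
            /* same effect as: orl mDATAOUT, #(1 << bit) */
            out |= (uint8_t)(1u << bit);
        }
    }
    return out;
}
```

On the real part the loop is unrolled by hand, so each bit mask is an immediate constant rather than a computed shift.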
Can anyone see a way to optimize some cycles out of this process? For reference, this is a chip doing video output. Even single-cycle savings can be significant at the rate this block is iterated over.
Thanks to anyone that can help.
Alex McHale