8051 Speed Optimization

- alexmchale
  
posted
18 years ago

Wed, Sep 28, 2005 3:17 PM

I'm working on optimizing code for a C8051F120 that needs to run extremely fast. The following few lines are repeated over and over, as fast as possible. The looping is a straightforward djnz. The bulk of the CPU time is spent on the following lines, which are repeated 8 times per loop (cut and paste, with only the binary value changed).

; for reference
rLEVEL    equ   R0
mDATAOUT  DATA  64

; the code
row0:
    movx  A, @DPTR
    inc   DPTR
    subb  A, rLEVEL
    jc    row1
    orl   mDATAOUT, #00000001b
row1:

At row1 the next set of those 5 lines executes. Essentially, this takes the data byte at @DPTR, compares it to the current value of rLEVEL, and sets a bit in mDATAOUT if it is greater. It does this for eight sequential bytes starting at DPTR, one for each bit of the bitfield (#00000001b to #10000000b).
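As a reference for what the eight unrolled blocks are meant to compute, here is a minimal C sketch. It assumes the carry is clear before each subb, in which case a borrow happens only when the data byte is below the level, so the bit is set for "greater than or equal". The function name pack_row and its parameters are hypothetical and not part of the actual firmware.

```c
#include <stdint.h>

/* Hypothetical model of one pass over 8 bytes (row0..row7).
   Assumes carry is clear before each SUBB: a borrow (carry set)
   occurs only when data[i] < level, and JC then skips the ORL. */
uint8_t pack_row(const uint8_t data[8], uint8_t level, uint8_t data_out)
{
    for (int i = 0; i < 8; i++) {
        if (data[i] >= level) {               /* no borrow: fall through to ORL */
            data_out |= (uint8_t)(1u << i);   /* orl mDATAOUT, #(1 << i) */
        }
    }
    return data_out;
}
```

Because the assembly only ever ORs bits in, data_out is passed in and returned: bits that were already set stay set.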

Can anyone see a way to optimize some cycles out of this process? For reference, this is a chip doing video output. Even single-cycle savings can be big at the rate this block is being iterated over.

Thanks to anyone that can help.

Alex McHale

- Frieder Ferlemann
  
posted
18 years ago

Wed, Sep 28, 2005 3:35 PM

How about placing mDATAOUT in the bit-addressable area and then doing:

    subb  A,rLEVEL
    mov   C,mDATAOUT.0
row1:

Note that the subb you are using depends on the state of the carry, so your present code has some LSB strangeness :^) (You could possibly use an add instruction instead and then complement mDATAOUT after you have processed all 8 bits.)
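To make the carry dependence concrete, here is a small C model of the borrow flag after SUBB A,Rn, which computes A - Rn - C. The helper name subb_borrow is hypothetical. Since movx, inc DPTR, jc and orl on a direct byte leave the carry untouched, a row that was skipped by jc hands a set carry to the next subb.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the carry flag after SUBB A,Rn on an 8051:
   the subtraction is A - Rn - carry_in, and carry is set when it
   needs a borrow. */
bool subb_borrow(uint8_t a, uint8_t rn, bool carry_in)
{
    return (int)a - (int)rn - (carry_in ? 1 : 0) < 0;
}
```

With carry clear, a byte equal to the level gives no borrow, so the bit is set. With carry left set by the previous row, the same byte does borrow, so the effective threshold moves by one. A clr c before each subb would remove the dependence, at the cost of an extra instruction per row.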

Greetings,

Frieder

- Frieder Ferlemann
  
posted
18 years ago

Wed, Sep 28, 2005 3:37 PM

Ooops, I meant to write: mov mDATAOUT.0,C

- Tilmann Reh
  
posted
18 years ago

Thu, Sep 29, 2005 6:12 AM

snipped-for-privacy@gmail.com wrote:

You could also use CJNE for the comparison - it also sets carry, but is non-destructive, so you don't need to reload A from XRAM for each test:

    movx  a,@dptr
    cjne  a,ar0,$+3
    mov   mDataOut.0,c
    cjne  a,...
    mov   ...,c

Like Frieder's approach, this one also clears the output bit when the data value is lower than the threshold. If you want latching operation, you need to use conditional jumps again or take the previous state into account:

    movx  a,@dptr
    cjne  a,ar0,$+3
    orl   c,mDataOut.0
    mov   mDataOut.0,c
    cjne  a,...
    orl   c,...
    mov   ...,c

Then it's only slightly faster than your original code; however, the execution time is fixed (independent of the data), which might also be an advantage.
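The difference between the two variants can be sketched in C. This is only a model of the flag behaviour, assuming the standard 8051 rule that CJNE sets carry when A is below the compared operand and clears it otherwise. The function names are hypothetical. The $+3 target is simply the next instruction, so the jump itself does nothing and only the carry matters.

```c
#include <stdbool.h>
#include <stdint.h>

/* Carry after CJNE A,direct,rel: set when A < operand, else cleared. */
static bool cjne_carry(uint8_t a, uint8_t operand)
{
    return a < operand;
}

/* mov mDataOut.0,c : the bit is overwritten on every pass. */
bool bit_non_latching(uint8_t a, uint8_t level, bool old_bit)
{
    (void)old_bit;                 /* previous state is discarded */
    return cjne_carry(a, level);
}

/* orl c,mDataOut.0 ; mov mDataOut.0,c : once set, the bit stays set. */
bool bit_latching(uint8_t a, uint8_t level, bool old_bit)
{
    return cjne_carry(a, level) || old_bit;
}
```

Neither variant branches on the data, which is why the execution time does not depend on the values read.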

--
Dipl.-Ing. Tilmann Reh
http://www.autometer.de - Elektronik nach Maß.

- Tilmann Reh
  
posted
18 years ago

Thu, Sep 29, 2005 6:27 AM

Tilmann Reh wrote:

I forgot to mention: the polarity of the output bits is the opposite of that in your code. You'd have to negate the data before the comparison or CPL the output bits afterwards (if you can't switch the polarity some other way).
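A rough C sketch of the "complement afterwards" option, assuming the CJNE carry rule (carry set when the data byte is below the level) and the non-latching variant. The name pack_row_cjne is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: collect the CJNE carries for 8 bytes
   (bit set when data[i] < level), then complement the whole byte
   once so the polarity matches the original SUBB/JC/ORL code. */
uint8_t pack_row_cjne(const uint8_t data[8], uint8_t level)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        if (data[i] < level) {              /* carry set by CJNE */
            out |= (uint8_t)(1u << i);      /* mov mDataOut.i,c  */
        }
    }
    return (uint8_t)~out;                   /* one complement per 8 bits */
}
```

Complementing once per byte keeps the fix outside the eight per-bit blocks, so it adds to each loop iteration only once.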

--
Dipl.-Ing. Tilmann Reh
http://www.autometer.de - Elektronik nach Maß.