Re: Turn a divide into a multiply

Jim Granville · 2003-07-29T22:29:12+00:00

What are the DAC resolutions - 8 bits in the PCA, or more external ?I can see the above eqn would have rounding errors, but if you tuned both the step size, and the 1ms time, a better slope-fit would result.Work out your numeric limits, and what the tolerances are, and you geta handle on the number of bits of precision needed.The 8051 gives 16 bit results in both MUL and DIV[REM], so you mightget enough from that with some carefull scaling. Divide routines will give a remainder, but it is not always preserved.I recall we changed our Mod51 libraries to keep the remainder intact,usefull for formatted writes - allowed N MOD 10 and N DIV 10 in asingle call.-jg

J

Jonathan Kirwan 22 years ago

Oh. I guess I didn't understand. I'm not sure I do, yet.

You have a 1ms timer event. You need to go from a starting DAC value to an ending DAC value over X milliseconds, where X is not a constant. Yes? And that each millisecond, you post a new DAC value to the DAC, on your way there. And then I gathered that the method you were using before is "bresenham-like." So in that case, your division would only be required exactly once, not each millisecond step. Now, I thought I understood this to be a pain because even that once, taking place between two millisecond events, ate up 1/2 your time and you couldn't afford even that expense even if it was only once.

Now, let's say you want to proceed from 200 to 3000 in 10ms using the brute force modification (which I'm surprised you needed exposed to you in this way, since it was a straight forward change) to Bresenham. [I'm assuming that sometimes you might want to transition in 10ms, this way, or at least that it isn't out of the question.] So now, your code decides to proceed to calculate the next DAC value from the starting DAC value. This takes 140 iterations of the loop to get to this first value, roughly. (Going to the second value will take about twice that, or 280 iterations of the loop.) Are you telling me that this takes 30us?

If so, that's great. But I'm just a little curious.

Jon

Vote

J

Jonathan Kirwan 22 years ago

Ah hah!! Just as I suspected. Okay. All is clear, now.

Jon

Vote

J

Jonathan Kirwan 22 years ago

670us on a 18MHz RD2 '51 running in X2 mode means 670*3 = 2010 instruction cycles. That seems a little excessive.

Actually, it's rather standard to avoid all divisions. This is done by dividing up the unit-circle into 8 octants and folding. That way, the slope is always less that one and the dependent variable advances at a rate > might want to transition in 10ms, this way, or at least that it

I think it's the 'for()' looping on the mostly false 'if (x!=xprec)' that is doing it. The while() should either execute zero or one time.

Anyway, you should look more at your division routine. Consider writing your own.

Jon

Vote

B

bogax 22 years ago

This strikes me as an odd statement, since the whole point of Bresenham's algorithm (which comes (approximately) from an era when LSI meant having a whole four bit counter in one package) is that it avoids division, ie if it uses division, it's not Bresenham's

(and this thread is not the only place I've noticed this anomaly)

what is it evolving language? (I suppose you think a byte is 8 bits ;)

And as an aside/question, isn't there a term applied to these stepwise ratiometric integer calculations (other than Bresenham style or what ever)? (there is but it escapes me)

bogax

Vote

M

Morris Dovey 22 years ago

Neil...

I've never used the 8051, but have synthesized division using multiplication. Can you specify number of bits for divisor and dividend for me?

Is there a multiply operation that will handle divisor * quotient as a single operation? What are timings for multiplication operations? What are maximum multiplier/multiplicand/product sizes?

Is there a hard time requirement for each division or is it necessary only to achive a sufficiently small average?

Morris Dovey West Des Moines, Iowa USA C links at http://www.iedu.com/c

Vote

J

Jonathan Kirwan 22 years ago

Digital differential analysis and recurrences.

Jon

Vote

N

nospam 22 years ago

The OP doesn't know the difference between Bresenham and incremental algorithms. I told him he needed to about 20 posts ago in this subthread.

Vote

J

Jonathan Kirwan 22 years ago

In assembler, of course. That is, if you want to keep your old algorithm.

Do you know how it's done? Assume that your 32-bit numerator is placed in two 16-bit words labeled 'a' for the high word and 'q' for the low word. Also, assume that 'd' is your 16-bit divisor.

I'm going to talk about this as if you were writing in assembly. Some of the semantics simply aren't present in C.

First off, you need to check for overflow. This is done this way, upon entering your division routine:

if (a >= d) Overflow ();

This is an unsigned comparison, by the way.

If the upper half of your 32-bit value is at or above your divisor, your quotient will exceed your ability to store into 16 bits. So that test quickly eliminates this consideration. It also pretty much excludes 'd' being 0, because with an unsigned comparison, if d=0 then you are sure that 'a' must be at least = to 'd', if not greater.

So, now that that is out of the way, you need to perform some subtractions. But since different processors handle their carry in different ways for subtraction (for example, the MSP430 sets or clears the carry in the opposite way that the x86 processor does for subtractions and comparisons), it's handy to set up a value which is the two's complement of 'd':

nd= 1 + ~d;

Now, in assembly on some processors, this might just be "neg." In C, with unsigned ints, the exact way is to complement and then increment, as shown above.

Now, loop just 16 times:

for (i= 0; i < 16; ++i) {

and the body where the good stuff gets done,

carry= LeftShift (&a, &q); if (!carry) carry= Add (a, nd, &r); else Add (a, nd, &r); if (carry) { ++q; a= r; }

Or, the above could be re-written:

if (LeftShift (&a, &q)) Add (a, nd, &r); ++q; a= r; } else if (Add (a, nd, &r)) { ++q; a= r; }

The LeftShift() function shifts the two parameter values, assuming that the first is the high-order 16 bits and the second is the low-order, shifting in a zero into the least significant bit of the second parameter, and shifting out the most significant bit of the first parameter into the return value.

The Add() function adds the first two values together, giving the third and returning a '1' if a carry occurred.

...and then the end of loop:

}

That's about it.

In assembly, it's very easy to capture the carry out from the left shift operation. In C, not so easy. Also, in assembly, it is very easy to capture the carry out from the Add() operation of (a+nd)-->r.

I simply cannot imagine why that is taking 670us on your CPU. At 670us, this is about 42us per loop or 126 instruction cycles.

I count 38 cycles per loop, just doing a quick hand coding of the main loop shown above, but in assembly code. That means 608 cycles in the body and the loop conditional. There is some set up, I suppose. But you could inline this code to avoid the worst of it. There's 3 cycles to init the loop counter. I'm sure there are some other cycles you could toss in there. But it should come out to about 210us per division operation, with your 18MHz X2-type RD2 '51 device.

I'll post my code example if you want it.

Jon

Vote

J

Jonathan Kirwan 22 years ago

Sorry, that should be 33 cycles per loop, not 38. So this means about 185us per 32x16 division operation, or thereabouts.

Jon

Vote

4

42Bastian Schick 22 years ago

I jump in late, so maybe re-telling something one of the 40 posts before did, but

If # of milliseconds is not growing endlessly, why not have a table of 1/(# of ms) ? =>

step = abs(DACstart-DACend)*reztab[# of milliseconds]

Getting this table is kinda tricky as you must ignore overflows. But you can start and see what your compiler (gcc e.g. often uses reciprokal) gives you for certain values.

42Bastian Do not email to bastian42@yahoo.com, it's a spam-only account :-) Use @epost.de instead !

Vote

J

Jonathan Kirwan 22 years ago

Zapped it into a simulator tonight, shaved another cycle, and just tested the 32x16 divide routine using the algorithm I posted here in C. Applied a variety of inputs and it always beats 150us on your 18MHz system and always produces the right results. Typically runs in about 144us. Could be faster with a-priori information on your inputs, too.

Jon

Vote

Re: Turn a divide into a multiply

Join the Discussion

Didn't find your answer?