Oh. I guess I didn't understand. I'm not sure I do, yet.
You have a 1ms timer event. You need to go from a starting DAC value to an ending DAC value over X milliseconds, where X is not a constant. Yes? And that each millisecond, you post a new DAC value to the DAC, on your way there. And then I gathered that the method you were using before is "bresenham-like." So in that case, your division would only be required exactly once, not each millisecond step. Now, I thought I understood this to be a pain because even that once, taking place between two millisecond events, ate up 1/2 your time and you couldn't afford even that expense even if it was only once.
Now, let's say you want to proceed from 200 to 3000 in 10ms using the brute force modification (which I'm surprised you needed exposed to you in this way, since it was a straight forward change) to Bresenham. [I'm assuming that sometimes you might want to transition in 10ms, this way, or at least that it isn't out of the question.] So now, your code decides to proceed to calculate the next DAC value from the starting DAC value. This takes 140 iterations of the loop to get to this first value, roughly. (Going to the second value will take about twice that, or 280 iterations of the loop.) Are you telling me that this takes 30us?
If so, that's great. But I'm just a little curious.
670us on a 18MHz RD2 '51 running in X2 mode means 670*3 = 2010 instruction cycles. That seems a little excessive.
Actually, it's rather standard to avoid all divisions. This is done by dividing up the unit-circle into 8 octants and folding. That way, the slope is always less that one and the dependent variable advances at a rate > might want to transition in 10ms, this way, or at least that it
I think it's the 'for()' looping on the mostly false 'if (x!=xprec)' that is doing it. The while() should either execute zero or one time.
Anyway, you should look more at your division routine. Consider writing your own.
This strikes me as an odd statement, since the whole point of Bresenham's algorithm (which comes (approximately) from an era when LSI meant having a whole four bit counter in one package) is that it avoids division, ie if it uses division, it's not Bresenham's
(and this thread is not the only place I've noticed this anomaly)
what is it evolving language? (I suppose you think a byte is 8 bits ;)
And as an aside/question, isn't there a term applied to these stepwise ratiometric integer calculations (other than Bresenham style or what ever)? (there is but it escapes me)
I've never used the 8051, but have synthesized division using multiplication. Can you specify number of bits for divisor and dividend for me?
Is there a multiply operation that will handle divisor * quotient as a single operation? What are timings for multiplication operations? What are maximum multiplier/multiplicand/product sizes?
Is there a hard time requirement for each division or is it necessary only to achive a sufficiently small average?
In assembler, of course. That is, if you want to keep your old algorithm.
Do you know how it's done? Assume that your 32-bit numerator is placed in two 16-bit words labeled 'a' for the high word and 'q' for the low word. Also, assume that 'd' is your 16-bit divisor.
I'm going to talk about this as if you were writing in assembly. Some of the semantics simply aren't present in C.
First off, you need to check for overflow. This is done this way, upon entering your division routine:
if (a >= d) Overflow ();
This is an unsigned comparison, by the way.
If the upper half of your 32-bit value is at or above your divisor, your quotient will exceed your ability to store into 16 bits. So that test quickly eliminates this consideration. It also pretty much excludes 'd' being 0, because with an unsigned comparison, if d=0 then you are sure that 'a' must be at least = to 'd', if not greater.
So, now that that is out of the way, you need to perform some subtractions. But since different processors handle their carry in different ways for subtraction (for example, the MSP430 sets or clears the carry in the opposite way that the x86 processor does for subtractions and comparisons), it's handy to set up a value which is the two's complement of 'd':
nd= 1 + ~d;
Now, in assembly on some processors, this might just be "neg." In C, with unsigned ints, the exact way is to complement and then increment, as shown above.
The LeftShift() function shifts the two parameter values, assuming that the first is the high-order 16 bits and the second is the low-order, shifting in a zero into the least significant bit of the second parameter, and shifting out the most significant bit of the first parameter into the return value.
The Add() function adds the first two values together, giving the third and returning a '1' if a carry occurred.
...and then the end of loop:
}
That's about it.
In assembly, it's very easy to capture the carry out from the left shift operation. In C, not so easy. Also, in assembly, it is very easy to capture the carry out from the Add() operation of (a+nd)-->r.
I simply cannot imagine why that is taking 670us on your CPU. At 670us, this is about 42us per loop or 126 instruction cycles.
I count 38 cycles per loop, just doing a quick hand coding of the main loop shown above, but in assembly code. That means 608 cycles in the body and the loop conditional. There is some set up, I suppose. But you could inline this code to avoid the worst of it. There's 3 cycles to init the loop counter. I'm sure there are some other cycles you could toss in there. But it should come out to about 210us per division operation, with your 18MHz X2-type RD2 '51 device.
I jump in late, so maybe re-telling something one of the 40 posts before did, but
If # of milliseconds is not growing endlessly, why not have a table of 1/(# of ms) ? =>
step = abs(DACstart-DACend)*reztab[# of milliseconds]
Getting this table is kinda tricky as you must ignore overflows. But you can start and see what your compiler (gcc e.g. often uses reciprokal) gives you for certain values.
--
42Bastian
Do not email to bastian42@yahoo.com, it's a spam-only account :-)
Zapped it into a simulator tonight, shaved another cycle, and just tested the 32x16 divide routine using the algorithm I posted here in C. Applied a variety of inputs and it always beats 150us on your 18MHz system and always produces the right results. Typically runs in about 144us. Could be faster with a-priori information on your inputs, too.
ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here.
All logos and trade names are the property of their respective owners.