Division algorithm optimisation

- S
- SM
  
  Contact options for registered users
posted
19 years ago

Fri, Oct 22, 2004 12:50 PM

Hi,

I'd like to ask how I can optimise a binary division algorithm if the divisor has only half bits as compared to the dividend (ie. 128bit/64bit,

64bit/32bit, 32bit/16bit, 16bit/8bit). I remember vaguely that you divide the higher bits of the dividend and somehow use the result in the division of the dividend lower bits, but don't remember the necessary details. I'm asking this because I'm developing an integral division algorithm for an 8bit controller and as I deal only with the above cases, I'd like to speed it up a bit.

thanks

- J
- Jonathan Kirwan
  
  Contact options for registered users
Vote on answer
posted
19 years ago

Fri, Oct 22, 2004 5:02 PM

I do this routinely and have, recently. How many different divisions are you developing (64/32, 32/16, 16/8, 8/4, etc.), what are their fractional bits in their results (none is common for this, but it's worth asking), and what is the processor? (I can probably help suggest something, but the details are important.)

Jon

- B
- Ben Jackson
  
  Contact options for registered users
Vote on answer
posted
19 years ago

Fri, Oct 22, 2004 5:55 PM

If this is for a PIC you might be interested in some macros I wrote which generate arbitrary division and multiplication functions:

formatting link

If you can puzzle out what it's doing it takes shortcuts for signed vs unsigned, slightly shorter (not just half) divisors etc.

--
Ben Jackson

http://www.ben.com/

- H
- Hans-Bernhard Broeker
  
  Contact options for registered users
Vote on answer
posted
19 years ago

Fri, Oct 22, 2004 8:31 PM

It's quite impossible to discuss that meaningfully without knowing what operations you have available as building blocks of such an algorithm.

Do you have a 16/8 division operator? An 8/8 one? None at all? Essentially, you'ld be using the same technique that you should have been taught in primary school for dividing large numbers on paper, where the "native" operations are 1*1 decimal digit multiplication,

1+1 addition, 1-1 subtraction and 2/1 division.

And then there's the more serious hacks, like doing 16/8 division with a compile-time fixed divisor by a 16*8 multiply and a shift. E.g. integer x/9 is the same as (x*57)>>9.

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

- S
- SM
  
  Contact options for registered users
Vote on answer
posted
19 years ago

Fri, Oct 22, 2004 8:34 PM

you

in

64/32, 32/16 and 16/8. I'd like to, if possible, keep it in one generic form, though.

is the

None.

If it matters, I need an unsigned version and a signed one, with the remainder having the same sign as the dividend. For now I just do the unsigned division and correct the signs later.

Probably one of the ST7 family controllers. For now I develop the code in C.

- S
- SM
  
  Contact options for registered users
Vote on answer
posted
19 years ago

Fri, Oct 22, 2004 8:34 PM

Thanks, I'll check it out

- S
- SM
  
  Contact options for registered users
Vote on answer
posted
19 years ago

Fri, Oct 22, 2004 8:45 PM

I program using C, so there is a 16/8 operator, but it's emulated (there's no div in the ST7 instruction set). I believe the 'pen-and-paper' technique using shifts and subtractions would be faster than using this C operator somehow for 64/32 and 32/16.

Sure, I use that, it's just that it's not optimised for eg. 16/8, it's as generic as 16/16. And AFAIR there's a technique for an optimisation in this field, not only when dividing by constants.

- J
- Jonathan Kirwan
  
  Contact options for registered users
Vote on answer
posted
19 years ago

Fri, Oct 22, 2004 11:37 PM

I don't use C for something like this. And it certainly defeats your earlier point, which you said was "I'd like to speed it up a bit." Are you prepared to read some materials and write this in assembly? (An optimal algorithm in written in C is NOT the same as an optimal algorithm in assembly, because the available semantics are different in each case.) Also, do you want a rounded result? (In other words, is 1/2 going to return 1 or 0?)

I also prefer to have routines that normalize both the numerator and denominator before applying the division algorithm and provide an exponent to reflect the net of the shifting process. This way, I'm guaranteed to retain maximum precision and can use this as part of a floating point routine, later. But I gather you aren't interested in that, at all.

Here's an example algorithm in MSP430 assembler, for example:

; -------------------------------------------------------------------------- ; DIV32u16uQR 32u/16u --> 16u R 16u ; -------------------------------------------------------------------------- ; ; This routine provides division of an unsigned, 32-bit dividend by an ; unsigned, 16-bit divisor and produces an unsigned, 16-bit quotient and ; an unsigned, 16-bit remainder. Essentially, this is ( R14:R15 / R13 ). ; ; No normalization takes place in this routine and no exponent is produced. ; ; Inputs: ; ; R13 unsigned 16-bit divisor ; R14 unsigned high-order 16 bits of the dividend ; R15 low-order 16 bits of the dividend ; ; Outputs: ; ; R14 unsigned 16-bit remainder ; R15 unsigned 16-bit quotient ; ; Scratches: ; ; R12 counter

Div32u16uQR mov #16, R12 ; set up the shift counter Div32u16uQR_0 rla R15 rlc R14 jc Div32u16uQR_1 sub R13, R14 jc Div32u16uQR_2 add R13, R14 dec R12 ; done? jnz Div32u16uQR_0 ; no -- continue ret ; yes -- return Div32u16uQR_1 sub R13, R14 Div32u16uQR_2 inc R15 dec R12 ; done? jnz Div32u16uQR_0 ; no -- continue ret ; yes -- return

A nice feature here is that it *does* produce the remainder. That is handy if you want to round the result, for example. Another version provides a little more speed, pairing bits into a restoring version. At the expense of some bookkeeping code.

; -------------------------------------------------------------------------- ; DIV32u16uQR 32u/16u --> 16u R 16u ; -------------------------------------------------------------------------- ; ; This routine provides division of an unsigned, 32-bit dividend by an ; unsigned, 16-bit divisor and produces an unsigned, 16-bit quotient and ; an unsigned, 16-bit remainder. ; ; No normalization takes place in this routine and no exponent is produced. ; ; Inputs: ; ; R13 unsigned 16-bit divisor ; R14 unsigned high-order 16 bits of the dividend ; R15 low-order 16 bits of the dividend ; ; Outputs: ; ; R14 unsigned 16-bit remainder ; R15 unsigned 16-bit quotient ; ; Scratches: ; ; R12 counter

Div32u16uQR mov #8, R12 ; set up the shift counter Div32u16uQR_0 rla R15 rlc R14 jc Div32u16uQR_3 sub R13, R14 jc Div32u16uQR_4 add R13, R14 ; restore rla R15 rlc R14 jnc Div32u16uQR_5 Div32u16uQR_1 sub R13, R14 Div32u16uQR_2 inc R15 dec R12 ; done? jnz Div32u16uQR_0 ; no -- continue ret ; yes -- return Div32u16uQR_3 sub R13, R14 Div32u16uQR_4 inc R15 rla R15 rlc R14 jc Div32u16uQR_1 Div32u16uQR_5 sub R13, R14 jc Div32u16uQR_2 add R13, R14 ; restore dec R12 ; done? jnz Div32u16uQR_0 ; no -- continue ret ; yes -- return

There's actually quite a lot of information out there on integer division, because this is and has been a long going concern for those designing computers and micros -- not just software people. You'll find a lot of good information in the VHDL and Verilog areas (the so called "sequential dividers" as opposed to "combinatorial dividers".)

If you cannot extract the algorithmic logic from the above and make sense of it, feel free to ask. It's not hard to see, if you lay this out on paper and work an example. I just don't want to have to write out a tutorial here, when it's likely that there are already many pages on the web on this subject that go into the gory details and I figure you can do some work, here. But once you "get it," it is almost 2nd nature and you can adapt the above to produce various fixed point versions or enhance it to support floating point (which will almost always be faster and better than the libraries you are 'given' -- a true, recurring fact that happens for reasons I cannot really fathom, unless those writing the FP libraries are more interested in just putting out boilerplate than thinking a little.)

Good luck, Jon

- J
- Jan-Hinnerk Reichert
  
  Contact options for registered users
Vote on answer
posted
19 years ago

Sat, Oct 23, 2004 8:46 AM

--------------------------------------------------------------------------

Hi Jon,

I can't see the reason for this jump. If I'm not mistaken, it is only taken if the result doesn't fit into 16-bit.

- C
- CBFalconer
  
  Contact options for registered users
Vote on answer
posted
19 years ago

Sun, Oct 24, 2004 4:36 AM

As long as we are bandying old assembly code, here is a thoroughly tested version of mine from 30 years ago. It runs on an 8080, z80, z180, etc. and doesn't require the added z80 registers etc. The break up provides useful routines for other operations. The head comments show net registers disturbed. It is extremely simple.

I later built a faster version around a 24/16 bit module, with the identical external appearance.

; ; 2's complement (BC), leave (B) in (A) ; A,B,C c2bc: dcx b ; " " ; 1's complement (DE), leave (B) in (A) ; A,D,E c1bc: mov a,c cma mov c,a mov a,b cma mov b,a ret ; ; Integer (unsigned pos.) divide (DEHL)/(BC)=>(DE) ; Remainder appears in (HL) ; Carry for overflow, when registers unchanged ; Divisor, remainder and quotient range 0 to 65535 ; Dividend range 0 to 4295*10^6 (approx.) ; F,D,E,H,L idiv: push psw mov a,e; Check for overflow sub c mov a,d sbb b jc idiv1; No overflow pop psw; Restore (A) stc; Mark overflow ret idiv1: push b call c2bc; Change (BC) sign xchg; Do arithmetic in (HL) mvi a,-16; Iteration count idiv2: push psw; Save iteration count dad h; Left shift (HLDE) rar; Save Carry out xchg dad h xchg jnc idiv3; No Carry into L inx h idiv3: ral; Regain Carry from H jc idiv4; Yes, generate quotient bit mov a,l add c; Test for quotient bit mov a,h adc b jnc idiv5; No bit idiv4: dad b; Subtract inx d; Insert quotient bit idiv5: pop psw; Get iteration count inr a jm idiv2; Not done pop b; Restore BC pop psw; Restore A ora a; Clear any Carry, no overflow ret

--
Chuck F (cbfalconer@yahoo.com) (cbfalconer@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
     USE worldnet address!