# Division algorithm optimisation

#### Do you have a question? Post it now! No Registration Necessary

Translate This Thread From English to

•  Subject
• Author
• Posted on
Hi,

I'd like to ask how I can optimise a binary division algorithm if the
divisor has only half bits as compared to the dividend (ie. 128bit/64bit,
64bit/32bit, 32bit/16bit, 16bit/8bit). I remember vaguely that you divide
the higher bits of the dividend and somehow use the result in the division
of the dividend lower bits, but don't remember the necessary details. I'm
asking this because I'm developing an integral division algorithm for an
8bit controller and as I deal only with the above cases, I'd like to speed
it up a bit.

thanks

Re: Division algorithm optimisation

I do this routinely and have, recently.  How many different divisions are you
developing (64/32, 32/16, 16/8, 8/4, etc.), what are their fractional bits in
their results (none is common for this, but it's worth asking), and what is the
processor?  (I can probably help suggest something, but the details are
important.)

Jon

Re: Division algorithm optimisation

you
in

64/32, 32/16 and 16/8. I'd like to, if possible, keep it in one generic
form, though.

is the

None.

If it matters, I need an unsigned version and a signed one, with the
remainder having the same sign as the dividend. For now I just do the
unsigned division and correct the signs later.

Probably one of the ST7 family controllers. For now I develop the code in C.

Re: Division algorithm optimisation

I don't use C for something like this.  And it certainly defeats your earlier
point, which you said was "I'd like to speed it up a bit."  Are you prepared to
read some materials and write this in assembly?  (An optimal algorithm in
written in C is NOT the same as an optimal algorithm in assembly, because the
available semantics are different in each case.)  Also, do you want a rounded
result?  (In other words, is 1/2 going to return 1 or 0?)

I also prefer to have routines that normalize both the numerator and denominator
before applying the division algorithm and provide an exponent to reflect the
net of the shifting process.  This way, I'm guaranteed to retain maximum
precision and can use this as part of a floating point routine, later.  But I
gather you aren't interested in that, at all.

Here's an example algorithm in MSP430 assembler, for example:

;   --------------------------------------------------------------------------
;   DIV32u16uQR     32u/16u --> 16u R 16u
;   --------------------------------------------------------------------------
;
;   This routine provides division of an unsigned, 32-bit dividend by an
;   unsigned, 16-bit divisor and produces an unsigned, 16-bit quotient and
;   an unsigned, 16-bit remainder.  Essentially, this is ( R14:R15 / R13 ).
;
;   No normalization takes place in this routine and no exponent is produced.
;
;   Inputs:
;
;       R13     unsigned 16-bit divisor
;       R14     unsigned high-order 16 bits of the dividend
;       R15     low-order 16 bits of the dividend
;
;   Outputs:
;
;       R14     unsigned 16-bit remainder
;       R15     unsigned 16-bit quotient
;
;   Scratches:
;
;       R12     counter

Div32u16uQR         mov     #16, R12        ; set up the shift counter
Div32u16uQR_0       rla     R15
rlc     R14
jc      Div32u16uQR_1
sub     R13, R14
jc      Div32u16uQR_2
dec     R12             ; done?
jnz     Div32u16uQR_0   ; no -- continue
ret                     ; yes -- return
Div32u16uQR_1       sub     R13, R14
Div32u16uQR_2       inc     R15
dec     R12             ; done?
jnz     Div32u16uQR_0   ; no -- continue
ret                     ; yes -- return

A nice feature here is that it *does* produce the remainder.  That is handy if
you want to round the result, for example.  Another version provides a little
more speed, pairing bits into a restoring version.  At the expense of some
bookkeeping code.

;   --------------------------------------------------------------------------
;   DIV32u16uQR     32u/16u --> 16u R 16u
;   --------------------------------------------------------------------------
;
;   This routine provides division of an unsigned, 32-bit dividend by an
;   unsigned, 16-bit divisor and produces an unsigned, 16-bit quotient and
;   an unsigned, 16-bit remainder.
;
;   No normalization takes place in this routine and no exponent is produced.
;
;   Inputs:
;
;       R13     unsigned 16-bit divisor
;       R14     unsigned high-order 16 bits of the dividend
;       R15     low-order 16 bits of the dividend
;
;   Outputs:
;
;       R14     unsigned 16-bit remainder
;       R15     unsigned 16-bit quotient
;
;   Scratches:
;
;       R12     counter

Div32u16uQR         mov     #8, R12         ; set up the shift counter
Div32u16uQR_0       rla     R15
rlc     R14
jc      Div32u16uQR_3
sub     R13, R14
jc      Div32u16uQR_4
rla     R15
rlc     R14
jnc     Div32u16uQR_5
Div32u16uQR_1       sub     R13, R14
Div32u16uQR_2       inc     R15
dec     R12             ; done?
jnz     Div32u16uQR_0   ; no -- continue
ret                     ; yes -- return
Div32u16uQR_3       sub     R13, R14
Div32u16uQR_4       inc     R15
rla     R15
rlc     R14
jc      Div32u16uQR_1
Div32u16uQR_5       sub     R13, R14
jc      Div32u16uQR_2
dec     R12             ; done?
jnz     Div32u16uQR_0   ; no -- continue
ret                     ; yes -- return

There's actually quite a lot of information out there on integer division,
because this is and has been a long going concern for those designing computers
and micros -- not just software people.  You'll find a lot of good information
in the VHDL and Verilog areas (the so called "sequential dividers" as opposed to
"combinatorial dividers".)

If you cannot extract the algorithmic logic from the above and make sense of it,
feel free to ask.  It's not hard to see, if you lay this out on paper and work
an example.  I just don't want to have to write out a tutorial here, when it's
likely that there are already many pages on the web on this subject that go into
the gory details and I figure you can do some work, here.  But once you "get
it," it is almost 2nd nature and you can adapt the above to produce various
fixed point versions or enhance it to support floating point (which will almost
always be faster and better than the libraries you are 'given' -- a true,
recurring fact that happens for reasons I cannot really fathom, unless those
writing the FP libraries are more interested in just putting out boilerplate
than thinking a little.)

Good luck,
Jon

Re: Division algorithm optimisation

--------------------------------------------------------------------------

Hi Jon,

I can't see the reason for this jump. If I'm not mistaken, it is only
taken if the result doesn't fit into 16-bit.

Re: Division algorithm optimisation

If this is for a PIC you might be interested in some macros I wrote which
generate arbitrary division and multiplication functions:

http://ben.com/vcc/18fdivmul.zip

If you can puzzle out what it's doing it takes shortcuts for signed vs
unsigned, slightly shorter (not just half) divisors etc.

--
Ben Jackson
We've slightly trimmed the long signature. Click to see the full one.
Re: Division algorithm optimisation

Thanks, I'll check it out

Re: Division algorithm optimisation

It's quite impossible to discuss that meaningfully without knowing
what operations you have available as building blocks of such an
algorithm.

Do you have a 16/8 division operator?  An 8/8 one?  None at all?
Essentially, you'ld be using the same technique that you should have
been taught in primary school for dividing large numbers on paper,
where the "native" operations are 1*1 decimal digit multiplication,
1+1 addition, 1-1 subtraction and 2/1 division.

And then there's the more serious hacks, like doing 16/8 division with
a compile-time fixed divisor by a 16*8 multiply and a shift.  E.g.
integer x/9 is the same as (x*57)>>9.

--
Hans-Bernhard Broeker ( snipped-for-privacy@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Re: Division algorithm optimisation

I program using C, so there is a 16/8 operator, but it's emulated (there's
no div in the ST7 instruction set). I believe the 'pen-and-paper' technique
using shifts and subtractions would be faster than using this C operator
somehow for 64/32 and 32/16.

Sure, I use that, it's just that it's not optimised for eg. 16/8, it's as
generic as 16/16. And AFAIR there's a technique for an optimisation in this
field, not only when dividing by constants.

Re: Division algorithm optimisation

As long as we are bandying old assembly code, here is a thoroughly
tested version of mine from 30 years ago.  It runs on an 8080, z80,
z180, etc. and doesn't require the added z80 registers etc.  The
break up provides useful routines for other operations.  The head
comments show net registers disturbed.  It is extremely simple.

I later built a faster version around a 24/16 bit module, with the
identical external appearance.

;
; 2's complement (BC), leave (B) in (A)
; A,B,C
c2bc:   dcx     b
;       "       "
; 1's complement (DE), leave (B) in (A)
; A,D,E
c1bc:   mov     a,c
cma
mov     c,a
mov     a,b
cma
mov     b,a
ret
;
; Integer (unsigned pos.) divide (DEHL)/(BC)=>(DE)
; Remainder appears in (HL)
; Carry for overflow, when registers unchanged
; Divisor, remainder and quotient range 0 to 65535
; Dividend range 0 to 4295*10^6 (approx.)
; F,D,E,H,L
idiv:   push    psw
mov     a,e;            Check for overflow
sub     c
mov     a,d
sbb     b
jc      idiv1;          No overflow
pop     psw;            Restore (A)
stc;                    Mark overflow
ret
idiv1:  push    b
call    c2bc;           Change (BC) sign
xchg;                   Do arithmetic in (HL)
mvi     a,-16;          Iteration count
idiv2:  push    psw;            Save iteration count
rar;                    Save Carry out
xchg
xchg
jnc     idiv3;          No Carry into L
inx     h
idiv3:  ral;                    Regain Carry from H
jc      idiv4;          Yes, generate quotient bit
mov     a,l
add     c;              Test for quotient bit
mov     a,h
jnc     idiv5;          No bit
inx     d;              Insert quotient bit
idiv5:  pop     psw;            Get iteration count
inr     a
jm      idiv2;          Not done
pop     b;              Restore BC
pop     psw;            Restore A
ora     a;              Clear any Carry, no overflow
ret

--
Chuck F ( snipped-for-privacy@yahoo.com) ( snipped-for-privacy@worldnet.att.net)
Available for consulting/temporary embedded and systems.
We've slightly trimmed the long signature. Click to see the full one.