Division algorithm optimisation

Do you have a question? Post it now! No Registration Necessary

Translate This Thread From English to

Threaded View
Hi,

I'd like to ask how I can optimise a binary division algorithm if the
divisor has only half bits as compared to the dividend (ie. 128bit/64bit,
64bit/32bit, 32bit/16bit, 16bit/8bit). I remember vaguely that you divide
the higher bits of the dividend and somehow use the result in the division
of the dividend lower bits, but don't remember the necessary details. I'm
asking this because I'm developing an integral division algorithm for an
8bit controller and as I deal only with the above cases, I'd like to speed
it up a bit.

thanks



Re: Division algorithm optimisation

Quoted text here. Click to load it

I do this routinely and have, recently.  How many different divisions are you
developing (64/32, 32/16, 16/8, 8/4, etc.), what are their fractional bits in
their results (none is common for this, but it's worth asking), and what is the
processor?  (I can probably help suggest something, but the details are
important.)

Jon

Re: Division algorithm optimisation
Quoted text here. Click to load it
you
in

64/32, 32/16 and 16/8. I'd like to, if possible, keep it in one generic
form, though.

Quoted text here. Click to load it
is the

None.

If it matters, I need an unsigned version and a signed one, with the
remainder having the same sign as the dividend. For now I just do the
unsigned division and correct the signs later.

Quoted text here. Click to load it

Probably one of the ST7 family controllers. For now I develop the code in C.



Re: Division algorithm optimisation

Quoted text here. Click to load it

I don't use C for something like this.  And it certainly defeats your earlier
point, which you said was "I'd like to speed it up a bit."  Are you prepared to
read some materials and write this in assembly?  (An optimal algorithm in
written in C is NOT the same as an optimal algorithm in assembly, because the
available semantics are different in each case.)  Also, do you want a rounded
result?  (In other words, is 1/2 going to return 1 or 0?)

I also prefer to have routines that normalize both the numerator and denominator
before applying the division algorithm and provide an exponent to reflect the
net of the shifting process.  This way, I'm guaranteed to retain maximum
precision and can use this as part of a floating point routine, later.  But I
gather you aren't interested in that, at all.

Here's an example algorithm in MSP430 assembler, for example:

;   --------------------------------------------------------------------------
;   DIV32u16uQR     32u/16u --> 16u R 16u
;   --------------------------------------------------------------------------
;
;   This routine provides division of an unsigned, 32-bit dividend by an
;   unsigned, 16-bit divisor and produces an unsigned, 16-bit quotient and
;   an unsigned, 16-bit remainder.  Essentially, this is ( R14:R15 / R13 ).
;
;   No normalization takes place in this routine and no exponent is produced.
;
;   Inputs:
;
;       R13     unsigned 16-bit divisor
;       R14     unsigned high-order 16 bits of the dividend
;       R15     low-order 16 bits of the dividend
;
;   Outputs:
;
;       R14     unsigned 16-bit remainder
;       R15     unsigned 16-bit quotient
;
;   Scratches:
;
;       R12     counter

Div32u16uQR         mov     #16, R12        ; set up the shift counter
Div32u16uQR_0       rla     R15
                    rlc     R14
                    jc      Div32u16uQR_1
                    sub     R13, R14
                    jc      Div32u16uQR_2
                    add     R13, R14
                    dec     R12             ; done?
                    jnz     Div32u16uQR_0   ; no -- continue
                    ret                     ; yes -- return
Div32u16uQR_1       sub     R13, R14
Div32u16uQR_2       inc     R15
                    dec     R12             ; done?
                    jnz     Div32u16uQR_0   ; no -- continue
                    ret                     ; yes -- return

A nice feature here is that it *does* produce the remainder.  That is handy if
you want to round the result, for example.  Another version provides a little
more speed, pairing bits into a restoring version.  At the expense of some
bookkeeping code.

;   --------------------------------------------------------------------------
;   DIV32u16uQR     32u/16u --> 16u R 16u
;   --------------------------------------------------------------------------
;
;   This routine provides division of an unsigned, 32-bit dividend by an
;   unsigned, 16-bit divisor and produces an unsigned, 16-bit quotient and
;   an unsigned, 16-bit remainder.
;
;   No normalization takes place in this routine and no exponent is produced.
;
;   Inputs:
;
;       R13     unsigned 16-bit divisor
;       R14     unsigned high-order 16 bits of the dividend
;       R15     low-order 16 bits of the dividend
;
;   Outputs:
;
;       R14     unsigned 16-bit remainder
;       R15     unsigned 16-bit quotient
;
;   Scratches:
;
;       R12     counter

Div32u16uQR         mov     #8, R12         ; set up the shift counter
Div32u16uQR_0       rla     R15
                    rlc     R14
                    jc      Div32u16uQR_3
                    sub     R13, R14
                    jc      Div32u16uQR_4
                    add     R13, R14        ; restore
                    rla     R15
                    rlc     R14
                    jnc     Div32u16uQR_5
Div32u16uQR_1       sub     R13, R14
Div32u16uQR_2       inc     R15
                    dec     R12             ; done?
                    jnz     Div32u16uQR_0   ; no -- continue
                    ret                     ; yes -- return
Div32u16uQR_3       sub     R13, R14
Div32u16uQR_4       inc     R15
                    rla     R15
                    rlc     R14
                    jc      Div32u16uQR_1
Div32u16uQR_5       sub     R13, R14
                    jc      Div32u16uQR_2
                    add     R13, R14        ; restore
                    dec     R12             ; done?
                    jnz     Div32u16uQR_0   ; no -- continue
                    ret                     ; yes -- return

There's actually quite a lot of information out there on integer division,
because this is and has been a long going concern for those designing computers
and micros -- not just software people.  You'll find a lot of good information
in the VHDL and Verilog areas (the so called "sequential dividers" as opposed to
"combinatorial dividers".)

If you cannot extract the algorithmic logic from the above and make sense of it,
feel free to ask.  It's not hard to see, if you lay this out on paper and work
an example.  I just don't want to have to write out a tutorial here, when it's
likely that there are already many pages on the web on this subject that go into
the gory details and I figure you can do some work, here.  But once you "get
it," it is almost 2nd nature and you can adapt the above to produce various
fixed point versions or enhance it to support floating point (which will almost
always be faster and better than the libraries you are 'given' -- a true,
recurring fact that happens for reasons I cannot really fathom, unless those
writing the FP libraries are more interested in just putting out boilerplate
than thinking a little.)

Good luck,
Jon

Re: Division algorithm optimisation

--------------------------------------------------------------------------
Quoted text here. Click to load it

Hi Jon,

I can't see the reason for this jump. If I'm not mistaken, it is only
taken if the result doesn't fit into 16-bit.

Quoted text here. Click to load it



Re: Division algorithm optimisation
Quoted text here. Click to load it

If this is for a PIC you might be interested in some macros I wrote which
generate arbitrary division and multiplication functions:

    http://ben.com/vcc/18fdivmul.zip

If you can puzzle out what it's doing it takes shortcuts for signed vs
unsigned, slightly shorter (not just half) divisors etc.

--
Ben Jackson
We've slightly trimmed the long signature. Click to see the full one.
Re: Division algorithm optimisation
Quoted text here. Click to load it

Thanks, I'll check it out



Re: Division algorithm optimisation
Quoted text here. Click to load it


It's quite impossible to discuss that meaningfully without knowing
what operations you have available as building blocks of such an
algorithm.

Do you have a 16/8 division operator?  An 8/8 one?  None at all?
Essentially, you'ld be using the same technique that you should have
been taught in primary school for dividing large numbers on paper,
where the "native" operations are 1*1 decimal digit multiplication,
1+1 addition, 1-1 subtraction and 2/1 division.

And then there's the more serious hacks, like doing 16/8 division with
a compile-time fixed divisor by a 16*8 multiply and a shift.  E.g.
integer x/9 is the same as (x*57)>>9.

--
Hans-Bernhard Broeker ( snipped-for-privacy@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Re: Division algorithm optimisation
Quoted text here. Click to load it

I program using C, so there is a 16/8 operator, but it's emulated (there's
no div in the ST7 instruction set). I believe the 'pen-and-paper' technique
using shifts and subtractions would be faster than using this C operator
somehow for 64/32 and 32/16.

Quoted text here. Click to load it

Sure, I use that, it's just that it's not optimised for eg. 16/8, it's as
generic as 16/16. And AFAIR there's a technique for an optimisation in this
field, not only when dividing by constants.



Re: Division algorithm optimisation
Quoted text here. Click to load it

As long as we are bandying old assembly code, here is a thoroughly
tested version of mine from 30 years ago.  It runs on an 8080, z80,
z180, etc. and doesn't require the added z80 registers etc.  The
break up provides useful routines for other operations.  The head
comments show net registers disturbed.  It is extremely simple.

I later built a faster version around a 24/16 bit module, with the
identical external appearance.

;
; 2's complement (BC), leave (B) in (A)
; A,B,C
c2bc:   dcx     b
;       "       "
; 1's complement (DE), leave (B) in (A)
; A,D,E
c1bc:   mov     a,c
        cma
        mov     c,a
        mov     a,b
        cma
        mov     b,a
        ret
;
; Integer (unsigned pos.) divide (DEHL)/(BC)=>(DE)
; Remainder appears in (HL)
; Carry for overflow, when registers unchanged
; Divisor, remainder and quotient range 0 to 65535
; Dividend range 0 to 4295*10^6 (approx.)
; F,D,E,H,L
idiv:   push    psw
        mov     a,e;            Check for overflow
        sub     c
        mov     a,d
        sbb     b  
        jc      idiv1;          No overflow
        pop     psw;            Restore (A)
        stc;                    Mark overflow
        ret
idiv1:  push    b
        call    c2bc;           Change (BC) sign
        xchg;                   Do arithmetic in (HL)
        mvi     a,-16;          Iteration count
idiv2:  push    psw;            Save iteration count
        dad     h;              Left shift (HLDE)
        rar;                    Save Carry out
        xchg
        dad     h
        xchg
        jnc     idiv3;          No Carry into L
        inx     h
idiv3:  ral;                    Regain Carry from H
        jc      idiv4;          Yes, generate quotient bit
        mov     a,l
        add     c;              Test for quotient bit
        mov     a,h
        adc     b
        jnc     idiv5;          No bit
idiv4:  dad     b;              Subtract
        inx     d;              Insert quotient bit
idiv5:  pop     psw;            Get iteration count
        inr     a
        jm      idiv2;          Not done
        pop     b;              Restore BC
        pop     psw;            Restore A
        ora     a;              Clear any Carry, no overflow
        ret

--
Chuck F ( snipped-for-privacy@yahoo.com) ( snipped-for-privacy@worldnet.att.net)
   Available for consulting/temporary embedded and systems.
We've slightly trimmed the long signature. Click to see the full one.

Site Timeline