#### Do you have a question? Post it now! No Registration Necessary

**posted on**

October 22, 2004, 12:50 pm

Hi,

I'd like to ask how I can optimise a binary division algorithm if the

divisor has only half bits as compared to the dividend (ie. 128bit/64bit,

64bit/32bit, 32bit/16bit, 16bit/8bit). I remember vaguely that you divide

the higher bits of the dividend and somehow use the result in the division

of the dividend lower bits, but don't remember the necessary details. I'm

asking this because I'm developing an integral division algorithm for an

8bit controller and as I deal only with the above cases, I'd like to speed

it up a bit.

thanks

I'd like to ask how I can optimise a binary division algorithm if the

divisor has only half bits as compared to the dividend (ie. 128bit/64bit,

64bit/32bit, 32bit/16bit, 16bit/8bit). I remember vaguely that you divide

the higher bits of the dividend and somehow use the result in the division

of the dividend lower bits, but don't remember the necessary details. I'm

asking this because I'm developing an integral division algorithm for an

8bit controller and as I deal only with the above cases, I'd like to speed

it up a bit.

thanks

Re: Division algorithm optimisation

I do this routinely and have, recently. How many different divisions are you

developing (64/32, 32/16, 16/8, 8/4, etc.), what are their fractional bits in

their results (none is common for this, but it's worth asking), and what is the

processor? (I can probably help suggest something, but the details are

important.)

Jon

Re: Division algorithm optimisation

you

in

64/32, 32/16 and 16/8. I'd like to, if possible, keep it in one generic

form, though.

is the

None.

If it matters, I need an unsigned version and a signed one, with the

remainder having the same sign as the dividend. For now I just do the

unsigned division and correct the signs later.

Probably one of the ST7 family controllers. For now I develop the code in C.

Re: Division algorithm optimisation

I don't use C for something like this. And it certainly defeats your earlier

point, which you said was "I'd like to speed it up a bit." Are you prepared to

read some materials and write this in assembly? (An optimal algorithm in

written in C is NOT the same as an optimal algorithm in assembly, because the

available semantics are different in each case.) Also, do you want a rounded

result? (In other words, is 1/2 going to return 1 or 0?)

I also prefer to have routines that normalize both the numerator and denominator

before applying the division algorithm and provide an exponent to reflect the

net of the shifting process. This way, I'm guaranteed to retain maximum

precision and can use this as part of a floating point routine, later. But I

gather you aren't interested in that, at all.

Here's an example algorithm in MSP430 assembler, for example:

; --------------------------------------------------------------------------

; DIV32u16uQR 32u/16u --> 16u R 16u

; --------------------------------------------------------------------------

;

; This routine provides division of an unsigned, 32-bit dividend by an

; unsigned, 16-bit divisor and produces an unsigned, 16-bit quotient and

; an unsigned, 16-bit remainder. Essentially, this is ( R14:R15 / R13 ).

;

; No normalization takes place in this routine and no exponent is produced.

;

; Inputs:

;

; R13 unsigned 16-bit divisor

; R14 unsigned high-order 16 bits of the dividend

; R15 low-order 16 bits of the dividend

;

; Outputs:

;

; R14 unsigned 16-bit remainder

; R15 unsigned 16-bit quotient

;

; Scratches:

;

; R12 counter

Div32u16uQR mov #16, R12 ; set up the shift counter

Div32u16uQR_0 rla R15

rlc R14

jc Div32u16uQR_1

sub R13, R14

jc Div32u16uQR_2

add R13, R14

dec R12 ; done?

jnz Div32u16uQR_0 ; no -- continue

ret ; yes -- return

Div32u16uQR_1 sub R13, R14

Div32u16uQR_2 inc R15

dec R12 ; done?

jnz Div32u16uQR_0 ; no -- continue

ret ; yes -- return

A nice feature here is that it

***does***produce the remainder. That is handy if

you want to round the result, for example. Another version provides a little

more speed, pairing bits into a restoring version. At the expense of some

bookkeeping code.

; --------------------------------------------------------------------------

; DIV32u16uQR 32u/16u --> 16u R 16u

; --------------------------------------------------------------------------

;

; This routine provides division of an unsigned, 32-bit dividend by an

; unsigned, 16-bit divisor and produces an unsigned, 16-bit quotient and

; an unsigned, 16-bit remainder.

;

; No normalization takes place in this routine and no exponent is produced.

;

; Inputs:

;

; R13 unsigned 16-bit divisor

; R14 unsigned high-order 16 bits of the dividend

; R15 low-order 16 bits of the dividend

;

; Outputs:

;

; R14 unsigned 16-bit remainder

; R15 unsigned 16-bit quotient

;

; Scratches:

;

; R12 counter

Div32u16uQR mov #8, R12 ; set up the shift counter

Div32u16uQR_0 rla R15

rlc R14

jc Div32u16uQR_3

sub R13, R14

jc Div32u16uQR_4

add R13, R14 ; restore

rla R15

rlc R14

jnc Div32u16uQR_5

Div32u16uQR_1 sub R13, R14

Div32u16uQR_2 inc R15

dec R12 ; done?

jnz Div32u16uQR_0 ; no -- continue

ret ; yes -- return

Div32u16uQR_3 sub R13, R14

Div32u16uQR_4 inc R15

rla R15

rlc R14

jc Div32u16uQR_1

Div32u16uQR_5 sub R13, R14

jc Div32u16uQR_2

add R13, R14 ; restore

dec R12 ; done?

jnz Div32u16uQR_0 ; no -- continue

ret ; yes -- return

There's actually quite a lot of information out there on integer division,

because this is and has been a long going concern for those designing computers

and micros -- not just software people. You'll find a lot of good information

in the VHDL and Verilog areas (the so called "sequential dividers" as opposed to

"combinatorial dividers".)

If you cannot extract the algorithmic logic from the above and make sense of it,

feel free to ask. It's not hard to see, if you lay this out on paper and work

an example. I just don't want to have to write out a tutorial here, when it's

likely that there are already many pages on the web on this subject that go into

the gory details and I figure you can do some work, here. But once you "get

it," it is almost 2nd nature and you can adapt the above to produce various

fixed point versions or enhance it to support floating point (which will almost

always be faster and better than the libraries you are 'given' -- a true,

recurring fact that happens for reasons I cannot really fathom, unless those

writing the FP libraries are more interested in just putting out boilerplate

than thinking a little.)

Good luck,

Jon

Re: Division algorithm optimisation

If this is for a PIC you might be interested in some macros I wrote which

generate arbitrary division and multiplication functions:

http://ben.com/vcc/18fdivmul.zip

If you can puzzle out what it's doing it takes shortcuts for signed vs

unsigned, slightly shorter (not just half) divisors etc.

--

Ben Jackson

Ben Jackson

We've slightly trimmed the long signature. Click to see the full one.

Re: Division algorithm optimisation

It's quite impossible to discuss that meaningfully without knowing

what operations you have available as building blocks of such an

algorithm.

Do you have a 16/8 division operator? An 8/8 one? None at all?

Essentially, you'ld be using the same technique that you should have

been taught in primary school for dividing large numbers on paper,

where the "native" operations are 1*1 decimal digit multiplication,

1+1 addition, 1-1 subtraction and 2/1 division.

And then there's the more serious hacks, like doing 16/8 division with

a compile-time fixed divisor by a 16*8 multiply and a shift. E.g.

integer x/9 is the same as (x*57)>>9.

--

Hans-Bernhard Broeker ( snipped-for-privacy@physik.rwth-aachen.de)

Even if all the snow were burnt, ashes would remain.

Hans-Bernhard Broeker ( snipped-for-privacy@physik.rwth-aachen.de)

Even if all the snow were burnt, ashes would remain.

Re: Division algorithm optimisation

I program using C, so there is a 16/8 operator, but it's emulated (there's

no div in the ST7 instruction set). I believe the 'pen-and-paper' technique

using shifts and subtractions would be faster than using this C operator

somehow for 64/32 and 32/16.

Sure, I use that, it's just that it's not optimised for eg. 16/8, it's as

generic as 16/16. And AFAIR there's a technique for an optimisation in this

field, not only when dividing by constants.

Re: Division algorithm optimisation

As long as we are bandying old assembly code, here is a thoroughly

tested version of mine from 30 years ago. It runs on an 8080, z80,

z180, etc. and doesn't require the added z80 registers etc. The

break up provides useful routines for other operations. The head

comments show net registers disturbed. It is extremely simple.

I later built a faster version around a 24/16 bit module, with the

identical external appearance.

;

; 2's complement (BC), leave (B) in (A)

; A,B,C

c2bc: dcx b

; " "

; 1's complement (DE), leave (B) in (A)

; A,D,E

c1bc: mov a,c

cma

mov c,a

mov a,b

cma

mov b,a

ret

;

; Integer (unsigned pos.) divide (DEHL)/(BC)=>(DE)

; Remainder appears in (HL)

; Carry for overflow, when registers unchanged

; Divisor, remainder and quotient range 0 to 65535

; Dividend range 0 to 4295*10^6 (approx.)

; F,D,E,H,L

idiv: push psw

mov a,e; Check for overflow

sub c

mov a,d

sbb b

jc idiv1; No overflow

pop psw; Restore (A)

stc; Mark overflow

ret

idiv1: push b

call c2bc; Change (BC) sign

xchg; Do arithmetic in (HL)

mvi a,-16; Iteration count

idiv2: push psw; Save iteration count

dad h; Left shift (HLDE)

rar; Save Carry out

xchg

dad h

xchg

jnc idiv3; No Carry into L

inx h

idiv3: ral; Regain Carry from H

jc idiv4; Yes, generate quotient bit

mov a,l

add c; Test for quotient bit

mov a,h

adc b

jnc idiv5; No bit

idiv4: dad b; Subtract

inx d; Insert quotient bit

idiv5: pop psw; Get iteration count

inr a

jm idiv2; Not done

pop b; Restore BC

pop psw; Restore A

ora a; Clear any Carry, no overflow

ret

--

Chuck F ( snipped-for-privacy@yahoo.com) ( snipped-for-privacy@worldnet.att.net)

Available for consulting/temporary embedded and systems.

Chuck F ( snipped-for-privacy@yahoo.com) ( snipped-for-privacy@worldnet.att.net)

Available for consulting/temporary embedded and systems.

We've slightly trimmed the long signature. Click to see the full one.

#### Site Timeline

- » Reverse current into a lithium battery
- — Next thread in » Embedded Programming

- » Microcontroller with 7V supply and I/O tolerance ?
- — Previous thread in » Embedded Programming

- » [AVR] What assembler to use?
- — Newest thread in » Embedded Programming

- » Serial EEPROM or Serial Flash?
- — Last Updated thread in » Embedded Programming

- » Does the Full Bridge Phase Shift Zero Voltage Switched Converter with Current Doubler Sync...
- — The site's Newest Thread. Posted in » Electronics Design