Integer Square Root Algorithm for Processor with MUL, DIV

David T. Ashley · 2008-05-01T17:06:42+00:00

What is the best integer square root algorithm for a processor with MUL and DIV (integer multiplication and division) built-in?For one without MUL and DIV (or where the instructions operate on shorter operands than would be required to form the square in question)?Thanks for all ideas and thoughts.I will check TOACP, of course.The application is a low-cost microcontroller with a 3-axis accelerometer. To get the magnitude of the acceleration, we'd want to use sqrt(x^2+y^2+z^2). The squaring operations are cheap (MUL). The addition is cheap (ADD, ADC). The square root extraction is what I'm not sure how best to do.The algorithm that comes to mind is just to do a binary search by squaring potential solutions and comparing them to the quantity whose square root is to be extracted.For example, if the sum of the squares is 16 bits or less, and since the processor has a 8 x 8 = 16 MUL instruction, it should be possible to iterate to the solution in no more than 16 square/compare cycles.But maybe there is a better way?And it isn't clear what to do if the sum of the squares is a bit larger.Thanks for all.-- David T. Ashley ( (Consulting Home Page) (Personal Home Page) (GPL Publications and Projects)

M

Mark Borgerson 18 years ago

MUL and

orter

meter.

e addition

ure how

uaring

root is

the

to iterate

s

I think that algorithm is going to depend a lot on whether the processor has a 32-bit barrel shifter. Those aren't all that common on 16-bit and smaller embedded processors. IIRC, you do get them when you get up the the 32-bit processors such as the ARM. Would the=20 shift and add operations end up as a single instruction on the ARM7?

Mark Borgerson

Vote

P

Pubkeybreaker 18 years ago

Newton-Raphson for isqrt can be done with only integer arithemtic.....

Vote

N

Nils 18 years ago

Pubkeybreaker schrieb:

Please show me... I have to take roots of 64 bit integers on a 32 bit CPU and I currently do the two bit per iteration method. I can do an educated guess via tables, and the significant bit-doubling of the Newton-Raphosn method would be great.

However, my attempts at doing reciprocal square-roots in fixed-point always failed and never reached the exact result.

But maybe i'm just to dumb to do it..

Vote

V

Virgil 18 years ago

If it is using integer multiplication and division shouldn't it be

Rn+1 = (Rn + N/Rn) / 2

or even

Rn+1 = (Rn*Rn + N) / (2 * Rn)

Vote

D

Didi 18 years ago

But if you need even only one division per calculated root this will in most if not all cases make the algorithm slower than mine (which uses just as many mutiplications as there are bits in the result, I posted an example earlier to this thread -

formatting link

)

Dimiter

------------------------------------------------------ Dimiter Popoff Transgalactic Instruments

formatting link

------------------------------------------------------

formatting link

Original message:

formatting link

Vote

V

Virgil 18 years ago

The formula that I objected to involved a multiplication by a non-integer in a supposedly integer only operation.

Admittedly a multiplication by 0.5 is easy enough as a binary right shift, but that is not what indicated.

Vote

C

Christopher Henrich 18 years ago

Here's an algorithm for finding r = sqrt(x^2 + y^2).

Assume x > y > 0. Let

x_1 = (4 x^2 + 3)x / (4 x^2 + y^2)

y_1 = y^3/(4 x^2 + y^2).

Then x_1^2 + y_1^2 = x^2 + y^2. If the transformation x,y) \mapsto (x_1, y_1) is iterated, then the point converges *cubically* to (r, 0).

I read this in a magazine many years ago.

Adapting it to your problem, I would first find r = sqrt(x^2 + y^2), then s = sqrt(r^2 + z^2).

I would use scaled integer arithmetic, that is, represent x etc. by j_x/D etc. where j_x is an integer and I choose D to be some convenient common denominator like 65536.

Christopher J. Henrich chenrich@monmouth.com htp://www.mathinteract.com

Vote

P

Paul E. Bennett 18 years ago

If you had focused on the first part;

A classical Forth puzzle is: What does the following do? : ???? -1 TUCK DO 2 + DUP +LOOP 2/ ; It is a standard method of calculating square root by summing the odd integers. It works for all unsigned integers from 0 to FFFFFFFF (or FFFF).

You would have realised that the definition ???? is actually a sqrt by the method stated. Even in the more spread out style used by many of us these days this would only have been 5 lines at most.

******************************************************************** Paul E. Bennett............... Forth based HIDECS Consultancy Mob: +44 (0)7811-639972 Tel: +44 (0)1235-811095 Going Forth Safely ..... EBA. www.electric-boat-association.org.uk.. ********************************************************************

Vote

D

Didi 18 years ago

Are you claiming that your unreadable 5 line algorithm which uses summing somehow (I am not able to read that line and from what I see I can say it is not worth learning how to read it) is in any way superior to my 12 line example which finishes in a guaranteed

16 times through the loop for any 32 bit integer - and can be read by any programmer? I might agree it is if the purpose is to waste CPU and programmer time.

Dimiter

------------------------------------------------------ Dimiter Popoff Transgalactic Instruments

formatting link

------------------------------------------------------

formatting link

Original message:

formatting link

Vote

C

CBFalconer 18 years ago

... snip ...

Which points out that the mathematics is more important than the coding. Three or four lines of C can easily produce the same effect.

[mail]: Chuck F (cbfalconer at maineline dot net) [page]: Try the download section. ** Posted from http://www.teranews.com **

Vote

J

James Waldby 18 years ago

[...]

It probably would be a little better to find r = sqrt(x^2 + z^2), then s = sqrt(r^2 + y^2), if x > y > z. On a processor with fast multiply and divide relative to comparison and swap operations it would make no difference, but on small cpu's the extra ops to put x,y,z in order would cost much less on average than the potentially avoided iteration steps.

Vote

4

42Bastian Schick 18 years ago

I guess this will benefit from the ff1 (Coldfire ISA A+ AFAIK) or cntlzw (PowerPC) opcodes, which return the position of the first 1 bit, and could be used to setup the loop counter (ff1 / 2).

42Bastian Do not email to bastian42@yahoo.com, it's a spam-only account :-) Use @monlynx.de instead !

Vote

D

Didi 18 years ago

It will indeed - it is really good to have the ff1 opcode back (as in the 020, there was a bfff1 there). I am a frequent cntlzw user on the PPC, but have not used it on this algorithm yet (never thought of it, I did it first for the CPU32 - or was it the HC11 - where there was no such opcode and have not changed it since). I may do so next time I dig in that vicinity.

Dimiter

------------------------------------------------------ Dimiter Popoff Transgalactic Instruments

formatting link

------------------------------------------------------

formatting link

Original message:

formatting link

Vote

P

Peter 18 years ago

"Paul E. Bennett" wrote

That's exactly what I posted, but this algorithm is not efficient for large numbers.

Vote

P

Paul Carpenter 18 years ago

and

shorter

accelerometer.

addition

how

squaring

is

iterate

I am glad someone else said that. as often I look at algorithms and then determine what results are needed and if some of the final satges are actually needed!

Too often the mathematicians and straight coders are looking for the perfect mousetrap where a sufficient one will do. By this I mean

Are our input or output ranges limited anyway (for _most_ light reading applications negative light is not possible!)

Do we need to process using the result in a way that a partial result will give the same relationship, i.e. could we just use the sum of squares or is the data being passed to a PC or higher computational device better suited to do the intensive calculations.

What is out native (8/16/32/64bit) and software (emulated floating point) ranges and limitations on what we are doing.

How often are we doing this (once a day, down to millions a second)

How fast do we have to do this (closely associated with previous and total system and application requirements)

How accurate is the response needed and implications of any inaccuracies (e.g. image processing will calculating image position results to 10 decimal places help us draw a circle on a fixed pixel array - integer).

-- Paul Carpenter | snipped-for-privacy@pcserviceselectronics.co.uk PC Services Timing Diagram Font GNU H8 - compiler & Renesas H8/H8S/H8 Tiny For those web sites you hate

Vote

L

lowcost 18 years ago

Vladimir Vassilevsky ha scritto: > How accurate do you want to be?

good question.

if you can tolerate +3 lsb or +7 % (that's better) accuracy, you can

convert polynomial to exponent (approx), to halve exponent, convert exponent to polynomial (approx).

none MUL_DIV: INC_DEC_SHIFT_ROT only.

best accuracy is at input values = 2^even. worst accuracy is at input values = 2^odd.

regards

Vote

E

Everett M. Greene 18 years ago

You can use the common binary form of the square root process and be accurate to 1 LSB for any value. The process involves only shifts and conditional subtraction, a variation of division. With some pre- and post-processing, it works for floating point values as well.

Vote

Integer Square Root Algorithm for Processor with MUL, DIV

Join the Discussion

Didn't find your answer?