Integer Square Root Algorithm for Processor with MUL, DIV

MUL and

orter

meter.

e addition

ure how

uaring

root is

the

to iterate

s

I think that algorithm is going to depend a lot on whether the processor has a 32-bit barrel shifter. Those aren't all that common on 16-bit and smaller embedded processors. IIRC, you do get them when you get up the the 32-bit processors such as the ARM. Would the=20 shift and add operations end up as a single instruction on the ARM7?

Mark Borgerson

Reply to
Mark Borgerson
Loading thread data ...

Newton-Raphson for isqrt can be done with only integer arithemtic.....

Reply to
Pubkeybreaker

Pubkeybreaker schrieb:

Please show me... I have to take roots of 64 bit integers on a 32 bit CPU and I currently do the two bit per iteration method. I can do an educated guess via tables, and the significant bit-doubling of the Newton-Raphosn method would be great.

However, my attempts at doing reciprocal square-roots in fixed-point always failed and never reached the exact result.

But maybe i'm just to dumb to do it..

Reply to
Nils

If it is using integer multiplication and division shouldn't it be

Rn+1 = (Rn + N/Rn) / 2

or even

Rn+1 = (Rn*Rn + N) / (2 * Rn)

Reply to
Virgil

But if you need even only one division per calculated root this will in most if not all cases make the algorithm slower than mine (which uses just as many mutiplications as there are bits in the result, I posted an example earlier to this thread -

formatting link
)

Dimiter

------------------------------------------------------ Dimiter Popoff Transgalactic Instruments

formatting link

------------------------------------------------------

formatting link

Original message:

formatting link

Reply to
Didi

The formula that I objected to involved a multiplication by a non-integer in a supposedly integer only operation.

Admittedly a multiplication by 0.5 is easy enough as a binary right shift, but that is not what indicated.

Reply to
Virgil

Here's an algorithm for finding r = sqrt(x^2 + y^2).

Assume x > y > 0. Let

x_1 = (4 x^2 + 3)x / (4 x^2 + y^2)

y_1 = y^3/(4 x^2 + y^2).

Then x_1^2 + y_1^2 = x^2 + y^2. If the transformation x,y) \mapsto (x_1, y_1) is iterated, then the point converges *cubically* to (r, 0).

I read this in a magazine many years ago.

Adapting it to your problem, I would first find r = sqrt(x^2 + y^2), then s = sqrt(r^2 + z^2).

I would use scaled integer arithmetic, that is, represent x etc. by j_x/D etc. where j_x is an integer and I choose D to be some convenient common denominator like 65536.

--
Christopher J. Henrich
chenrich@monmouth.com
htp://www.mathinteract.com
Reply to
Christopher Henrich

If you had focused on the first part;

A classical Forth puzzle is: What does the following do? : ???? -1 TUCK DO 2 + DUP +LOOP 2/ ; It is a standard method of calculating square root by summing the odd integers. It works for all unsigned integers from 0 to FFFFFFFF (or FFFF).

You would have realised that the definition ???? is actually a sqrt by the method stated. Even in the more spread out style used by many of us these days this would only have been 5 lines at most.

--
********************************************************************
Paul E. Bennett...............
Forth based HIDECS Consultancy
Mob: +44 (0)7811-639972
Tel: +44 (0)1235-811095
Going Forth Safely ..... EBA. www.electric-boat-association.org.uk..
********************************************************************
Reply to
Paul E. Bennett

Are you claiming that your unreadable 5 line algorithm which uses summing somehow (I am not able to read that line and from what I see I can say it is not worth learning how to read it) is in any way superior to my 12 line example which finishes in a guaranteed

16 times through the loop for any 32 bit integer - and can be read by any programmer? I might agree it is if the purpose is to waste CPU and programmer time.

Dimiter

------------------------------------------------------ Dimiter Popoff Transgalactic Instruments

formatting link

------------------------------------------------------

formatting link

Original message:

formatting link

Reply to
Didi

... snip ...

Which points out that the mathematics is more important than the coding. Three or four lines of C can easily produce the same effect.

--
 [mail]: Chuck F (cbfalconer at maineline dot net) 
 [page]: 
            Try the download section.


** Posted from http://www.teranews.com **
Reply to
CBFalconer

[...]

It probably would be a little better to find r = sqrt(x^2 + z^2), then s = sqrt(r^2 + y^2), if x > y > z. On a processor with fast multiply and divide relative to comparison and swap operations it would make no difference, but on small cpu's the extra ops to put x,y,z in order would cost much less on average than the potentially avoided iteration steps.

Reply to
James Waldby

I guess this will benefit from the ff1 (Coldfire ISA A+ AFAIK) or cntlzw (PowerPC) opcodes, which return the position of the first 1 bit, and could be used to setup the loop counter (ff1 / 2).

--
42Bastian
Do not email to bastian42@yahoo.com, it's a spam-only account :-)
Use @monlynx.de instead !
Reply to
42Bastian Schick

It will indeed - it is really good to have the ff1 opcode back (as in the 020, there was a bfff1 there). I am a frequent cntlzw user on the PPC, but have not used it on this algorithm yet (never thought of it, I did it first for the CPU32 - or was it the HC11 - where there was no such opcode and have not changed it since). I may do so next time I dig in that vicinity.

Dimiter

------------------------------------------------------ Dimiter Popoff Transgalactic Instruments

formatting link

------------------------------------------------------

formatting link

Original message:

formatting link

Reply to
Didi

"Paul E. Bennett" wrote

That's exactly what I posted, but this algorithm is not efficient for large numbers.

Reply to
Peter

and

shorter

accelerometer.

addition

how

squaring

is

iterate

I am glad someone else said that. as often I look at algorithms and then determine what results are needed and if some of the final satges are actually needed!

Too often the mathematicians and straight coders are looking for the perfect mousetrap where a sufficient one will do. By this I mean

Are our input or output ranges limited anyway (for _most_ light reading applications negative light is not possible!)

Do we need to process using the result in a way that a partial result will give the same relationship, i.e. could we just use the sum of squares or is the data being passed to a PC or higher computational device better suited to do the intensive calculations.

What is out native (8/16/32/64bit) and software (emulated floating point) ranges and limitations on what we are doing.

How often are we doing this (once a day, down to millions a second)

How fast do we have to do this (closely associated with previous and total system and application requirements)

How accurate is the response needed and implications of any inaccuracies (e.g. image processing will calculating image position results to 10 decimal places help us draw a circle on a fixed pixel array - integer).

-- Paul Carpenter | snipped-for-privacy@pcserviceselectronics.co.uk PC Services Timing Diagram Font GNU H8 - compiler & Renesas H8/H8S/H8 Tiny For those web sites you hate

Reply to
Paul Carpenter

Vladimir Vassilevsky ha scritto: > How accurate do you want to be?

good question.

if you can tolerate +3 lsb or +7 % (that's better) accuracy, you can

convert polynomial to exponent (approx), to halve exponent, convert exponent to polynomial (approx).

none MUL_DIV: INC_DEC_SHIFT_ROT only.

best accuracy is at input values = 2^even. worst accuracy is at input values = 2^odd.

regards

Reply to
lowcost

You can use the common binary form of the square root process and be accurate to 1 LSB for any value. The process involves only shifts and conditional subtraction, a variation of division. With some pre- and post-processing, it works for floating point values as well.

Reply to
Everett M. Greene

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.