# Integer Square Root Algorithm for Processor with MUL, DIV

• posted

What is the best integer square root algorithm for a processor with MUL and DIV (integer multiplication and division) built-in?

For one without MUL and DIV (or where the instructions operate on shorter operands than would be required to form the square in question)?

Thanks for all ideas and thoughts.

I will check TOACP, of course.

The application is a low-cost microcontroller with a 3-axis accelerometer. To get the magnitude of the acceleration, we'd want to use sqrt(x^2+y^2+z^2). The squaring operations are cheap (MUL). The addition is cheap (ADD, ADC). The square root extraction is what I'm not sure how best to do.

The algorithm that comes to mind is just to do a binary search by squaring potential solutions and comparing them to the quantity whose square root is to be extracted.

For example, if the sum of the squares is 16 bits or less, and since the processor has a 8 x 8 = 16 MUL instruction, it should be possible to iterate to the solution in no more than 16 square/compare cycles.

But maybe there is a better way?

And it isn't clear what to do if the sum of the squares is a bit larger.

Thanks for all.

```--
David T. Ashley              (dta@e3ft.com)
• posted

There's never such a thing as "the best" way to do something that's been described so vaguely. Different circumstances render different aspects important, and thus change what "good" actually means.

There's one exception to the above rule: not doing something at all, if you can get away, is always the best way of doing it. So: do you really

*need* that sqrt() taken on your microcontroller?

Failing that you can still use use good old Newton-Raphson iteration to locate the zero of (x^2 - input).

x |--> x - (x^2 - input) / (2*x) = (x + input/x) / 2

• posted

Here is the simplest (but not the very fastest) sqrt algorithm for the CPU with cheap multiplication: //------------------------- s16 sqrt_s16(s32 x) { s32 tmp_result; s16 root, tmp_root, tmp_add;

root = 0; tmp_add = 0x4000;

while(tmp_add) { tmp_root = root + tmp_add; tmp_result = ((s32)tmp_root)*tmp_root; if(tmp_result >= 1; }

return root; } //-----------------------

DSP and Mixed Signal Design Consultant

• posted

Take a look here:

A zillion monkeys can't all be wrong.

** Posted from
**
• posted

d

tion

w
s

rate

=A0 =A0 =A0 =A0(GPL Publicatio= ns and Projects)

Approximate the square root by the lowest half of the bit representation of x^2 + y^2 (or any other easy, convenient approximation), and then use Newton-Raphson.

• posted

Jack Crenshaw had a series of columns on numerical methods in "Embedded Systems" a few years ago. Has a good book on the subject, as well.

(The past tense isn't meant to imply that his current columns aren't great too... ;-)

As far as the columns, the one on integer square roots is at

```--
Rich Webb     Norfolk, VA```
• posted

How accurate do you want to be?

let X = min(a,b)/max(a,b)

Then you can use a polynomial approximation:

sqrt(a^2 + b^2) ~ max(a,b)*(1 + P(X))

The simplest approximation of the 1st order (somewhat 10% accurate):

sqrt(a^2 + b^2) ~ max(a,b) + 0.318 * min(a,b)

In the 3-dimensional case, you have to do this procedure twice:

sqrt(x^2 + y^2 + z^2) = sqrt(max(x,y,z)^2 + sqrt(mid(x,y,z)^2 + min(x,y,z)^2))^2

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

• posted

For no MUL and DIV search for Dann Corbit's postings. Dann posted one a few years ago with N/2 interations. If you can't find it contact me off line.

The following square root routine was originally done for the 6805. It is probably not the fastest

The basic loop spits out one bit of result per pass. This routine takes a 16 bit argument and returns an eight bit integer and 8 bit fraction.

Walter Banks

============================================================ #pragma option v; #pragma has MUL;

long SQRT (long l) /* Square root V1.0.

Hayward/Banks.

This routine generates a square root to eight bit resolution taking a 16 bit argument normalizing the argument and returning a 16 bit real number that consists of 8 bits integer and 8 bit fractional parts. The MS8bits are the integer part.

i i i i i i i i . f f f f f f f f | | | | | | | | | | | | | | | | 128 |32 | 8 4 2 1 .5 | | | | | | .00390625 64 16 .25 | | | | --.0078125 .125- | | ----.015625 .0625-- ------.03125

These routines may be adapted for any purpose. No warranty is implied or given as to their usability for any purpose.

(c) Byte Craft Limited Waterloo, Ontario Canada N2J 4E4 (519) 888-6911

Walter Banks/Gordon Hayward */

{ unsigned int n,j,k; unsigned long t; n = 0; while( (l & 0xc000) == 0) { n++; l =1) { j = j | k; t = j * j; if (t > l) j = j & (~k); } t = j; return (t > 8; frac_part = result & 0xff; }

• posted

Actually you can do this in 8 multiplies. This is how I usually implement square root - I have probably reinvented it some 15+ years ago for myself. You just start with \$80, square and compare to the number you want the root of, and do successive approximation down to bit 0 (i.e. second time with \$40+result so far, then \$20 etc.).

Dimiter

------------------------------------------------------ Dimiter Popoff Transgalactic Instruments

------------------------------------------------------

Original message:

• posted

If the values of x, y, z don't change too much from sample to sample you could use the previous value as the starting point for Newton- Raphson iteration.

For a different application (not involving acceleration or square roots), I performed only a single Newton-Raphson iteration per sample. For slow moving inputs, this gave perfect results, and for fast moving inputs, I got a free low-pass filter :)

• posted

Depends on needs. Probably your best bet is Newton approximation:

Rn+1 = 0.5 * (Rn + N/Rn)

and repeat until the error is negligible. Start with R == 1.0.

```--
[mail]: Chuck F (cbfalconer at maineline dot net)
[page]: ```
• posted

d

tion

w
s

rate

(I posted this also from TerraNews, but sometimes those posts take a few days to show up):

Take a look here:

A zillion monkeys can't all be wrong.

• posted

See:-

Nice and simple method that works quite well.

```--
********************************************************************
Paul E. Bennett...............```
• posted

Well my algorithm takes less lines and is a lot easier to read than those in HLL which were suggested, but I hope it counts in spite of that :-). Here is it excerpted from a post I have made many years ago, calculates a 16 bit squre root of a 32 bit number. The example is in CPU32 (or CF) assembly.

*
• calc. the square root of d1.l; return it in d1.l
• destroys d2,d3,d4
• sqrtd1 moveq #15,d2 counter clr.l d3 colect result here set loop,* bset d2,d3 try this move.w d3,d4 extra copy result so far mulu.w d4,d4 square cmp.l d1,d4 higher? bls keepbit branch not bclr d2,d3 else bit back down keepbit dbra d2,loop process all bits move.l d3,d1 update in d1 rts
*

Dimiter

------------------------------------------------------ Dimiter Popoff Transgalactic Instruments

------------------------------------------------------

Original message:

• posted

One easy way is to add up the number of odd integers which add up to the number (or just below it).

For example, using this algorithm, the square root of 20 is

1+3+5+7

i.e. 4 numbers

Not so efficient for large integers of course!

"David T. Ashley" wrote

• posted

nd

.

ition

ow

g

is

erate

I have Mr. Crenshaw's book "Math Toolkit for REAL-TIME Programming" "Square Root" runs from pages 43-87 and he discusses a boatload of methods. Unfortunately, I have misplaced the CD. Looking for it now.

• posted

First make sure you really need to find the square root. For many uses, you could just use the calculated square of the magnitude.

• posted

and

er

er.

ddition

how

ing

t is

e

iterate

A benchmark shows this to be the fastest on modern hardware, but I do not know about embedded CPUs. I guess they should be tested.

unsigned long isqrt26(unsigned long input) { unsigned long s =3D 0; unsigned long s21 =3D 1073741824;

if (s21

• posted

UL and

rter

eter.

re how

aring

oot is

the

o iterate

#include

// This one is REALLY bad for large inputs. unsigned long isqrt00(unsigned long x) { unsigned long s =3D 1; unsigned long remainder; if (x =3D s) { remainder -=3D s; s +=3D 2; } return (s - 1) >> 1; } // Integer Square Root function // Uses bisection to find square root

unsigned long isqrt01(unsigned long x) { unsigned long r, s, t; r =3D 0; for (t =3D 0x40000000; t; t >>=3D 2) { s =3D r + t; if (s >=3D 1; } return (r); }

#define LSIZE (sizeof(unsigned long)*CHAR_BIT)

/* Jos came up with this one... */ unsigned long isqrt06(unsigned long n) {

unsigned long l =3D 0, c =3D 0, y, k;

for (k =3D LSIZE >> 1; k--; n =3D 0; i--) if (y < xsq) { oldx =3D x; x -=3D 1 > 1; } } }

// Integer Square Root function // Uses bisection to find square root unsigned long isqrt05(unsigned long n) { unsigned long h, p =3D 0, q =3D 1, r =3D n;

while (q >=3D 1;

if (r >=3D h) { p +=3D q; r -=3D h; } }

return p; }

// Binomial Theorem -- O( 1/2 log2 N ) unsigned long isqrt07(unsigned long N) { unsigned long l2, u, v, u2, v2, uv2, n; if (2 > N) return (N); u =3D N; l2 =3D 0; while (u >>=3D 1) l2++; l2 >>=3D 1; u =3D 1L 24] > 22] =3D 0x4000000) return (sqq_table[n >> 20] > 18] =3D 0x100000) if (n >=3D 0x400000) return (sqq_table[n >> 16] > 14] =3D 0x40000) return (sqq_table[n >> 12] > 10] =3D 0x100) if (n >=3D 0x1000) if (n >=3D 0x4000) return (sqq_table[n >> 8]); else return (sqq_table[n >> 6] >> 1); else if (n >=3D 0x400) return (sqq_table[n >> 4] >> 2); else return (sqq_table[n >> 2] >> 3); else if (n >=3D 0x10) if (n >=3D 0x40) return (sqq_table[n] >> 4); else return (sqq_table[n > 4] > 6] =3D 0x10000) if (n >=3D 0x10000000) if (n >=3D 0x40000000) result =3D sq_table[n >> 16]; else result =3D sq_table[n >> 14] >> 1; else if (n >=3D 0x1000000) result =3D sq_table[n >> 12] >> 2; else result =3D sq_table[n >> 8] >> 4;

else return (sq_table[n] >> 8);

if (((result + 1) * (result + 1)) 2^16 // can be removed to get almost 2 times speed up, but accuracy will suffer. // however worst case will be out by 1.

return (result);

}

/* Fast integer square root */ unsigned long isqrt10(unsigned long x) { unsigned long squaredbit, remainder, root;

if (x =3D=3D 0) return 0;

/* Load the binary constant 01 00 00 ... 00, where the number of zero * bits to the right of the single one bit is even, and the one bit is as * far left as is consistant with that condition.) */ squaredbit =3D (long) ((((unsigned long) ~0L) >> 1) & ~(((unsigned long) ~0L) >> 2)); /* This portable load replaces the loop that used to be here, and was * donated by snipped-for-privacy@xmission.com */

/* Form bits of the answer. */ remainder =3D x; root =3D 0; while (squaredbit > 0) { if (remainder >=3D (squaredbit | root)) { remainder -=3D (squaredbit | root); root >>=3D 1; root |=3D squaredbit; } else { root >>=3D 1; } squaredbit >>=3D 2; } return root; }

#define Ns ((sizeof(unsigned long)) ((Ns - i) r) * r++; */

unsigned long isqrt02(unsigned long x) { unsigned long r, nr, m;

r =3D 0; m =3D 0x40000000; do { nr =3D r + m; if (nr >=3D 1; m >>=3D 2; } while (m !=3D 0);

if (x > r) r++; return (r); }

unsigned long isqrt14(unsigned long input) { unsigned long s; unsigned long s21; long j; s =3D 0; for (j =3D 15; j >=3D 0; --j) { s21 =3D (s =3D 0x40000000) { if (x >=3D 65535UL * 65535UL) return 65535; xn =3D sqq_table[x >> 24] > 22] =3D 0x4000000) xn =3D sqq_table[x >> 20] > 18] =3D 0x100000) if (x >=3D 0x400000) xn =3D sqq_table[x >> 16] > 14] =3D 0x40000) xn =3D sqq_table[x >> 12] > 10] =3D 0x100) { if (x >=3D 0x1000) if (x >=3D 0x4000) xn =3D (sqq_table[x >> 8] >> 0) + 1; else xn =3D (sqq_table[x >> 6] >> 1) + 1; else if (x >=3D 0x400) xn =3D (sqq_table[x >> 4] >> 2) + 1; else xn =3D (sqq_table[x >> 2] >> 3) + 1;

goto adj; } else return sqq_table[x] >> 4;

/* Run two iterations of the standard convergence formula */

xn =3D (xn + 1 + x / xn) / 2; nr1: xn =3D (xn + 1 + x / xn) / 2; adj:

if (xn * xn > x) /* Correct rounding if necessary */ xn--;

return xn; }

/* From:

*/ /* Let us examine an implementation of the second method. By Jim Ulery

*/ unsigned short isqrt36(unsigned long val) { unsigned long temp, g =3D 0, b =3D 0x8000, bshft =3D 15; do { if (val >=3D (temp =3D (((g =3D 1); return g; }

/* From: