Suitable Integer Square Root Algorithm for 32-64-Bit Integers on Inexpensive Microcontroller?

Hi,

I have the need in a microcontroller application to calculate floor(sqrt(x)) where x is approximately in the range of 2^32 - 2^64.

What are the algorithms I should look at?

These small microcontrollers are characterized by having a 8-bit times 8-bit to give 16-bit result integer unsigned multiply instruction, and a similar

16/8 division instruction to give a quotient and a remainder. Mulitplications of larger integers have to be done by multiplying the bytes and adding.

I'm aware of these algorithms:

a)Adding consecutive odd integers and counting how many you have to add, i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.

b)Trial squaring, testing setting each bit to "1", i.e.

formatting link

c)Another method that seems to work (don't know the name of it), here is source code:

formatting link

What other options should I investigate?

Thanks, Datesfat

Reply to
Datesfat Chicks
Loading thread data ...

formatting link

formatting link

Here is some C code for integer square roots using many different algorithms:

formatting link

Worth a look:

formatting link

Consider also:

formatting link

Reply to
Dann Corbit

x = y^2 - (2^16)^2

Google is your friend :> There are many sqrt algorithms with different tradeoffs (speed/size).

I would, instead, ask that you might want to examine what *else* you know about "x". I.e., you've already constrained the range (why? does that give you any other information that can be exploited?); are there any other characteristics about how "x" is synthesized that you can exploit to divide and conquer?

E.g. if x = y * z, then knowing sqrt(y) and/or sqrt(z) simplifies your problem.

formatting link

formatting link

Reply to
D Yuniskis

x))

bit

r

es

The one I have been using last 15-20 years is probably your B, at least sounds like what I made up back then - successive approximation for each bit of the result. If multiplication is fast on the core you will be using it it is hard to beat.

Dimiter

Reply to
Didi

Thanks for the insight.

For small operands, I agree with your analysis.

However, for larger operands, the Babylonian method:

formatting link

looks very promising.

The issue is that for the cost of one division (the /2 and addition is virtually free) you get far more than one bit of additional information about the square root.

For larger operands, I don't believe that (b) will be the right answer any longer.

But I'm not sure yet ...

Thanks for all, Datesfat

Reply to
Datesfat Chicks

That quite certainly can't work for the input range you're targeting. You would be looking at up to four billion iterations if x was 2^64

A whole lot better already. It becomes even better if you update the square of your current result differentially, applying the binomial formula. As you update the result-to-be ('res' in the following), by adding a 1-bit at position 'k':

x --> x + 2^k

it's square, which eventually should be equal to the input number, updates too:

x^2 --> x^2 + 2*(2^k)*x + (2^k)^2 = x^2 + x

Reply to
Hans-Bernhard Bröker

Newton-Raphson or bisection, depending upon how slow division is. N-R is asymptotically faster (the number of correct digits doubles at each step), but each step requires a division.

Bisection is essentially your option b), but it can be computed much more efficiently than the algorithm given. You don't need arbitrary multiplies, only shifts, as:

(a+2^k)^2 = a^2 + 2.a.2^k + (2^k)^2 = a^2 + a.2^(k+1) + 2^2k

IOW:

uint32_t isqrt(uint64_t x) { uint64_t a = 0; uint64_t a2 = 0; int k;

for (k = 31; k >= 0; k--) {

uint64_t t = a2 + (a

Reply to
Nobody

Homework?

For microcontroller purpose, a 10-bit accurate sqrt approximation should probably suffice. I can hardly imagine anything practical that would require more then 14 significant bits. So, normalize your number and then do polynomial or piecewise linear approximation.

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

formatting link

Reply to
Vladimir Vassilevsky

The identity you provided above is so obvious that I'm beating myself on the head with a hammer right now for not spotting that as an alternative to multiplying again on each iteration.

Ouch ouch ouch.

Thanks sincerely.

As you know, division of large integers is fairly expensive on a little microcontroller. The "bisection" insight you provided is definitely something I will investigate.

Datesfat

Reply to
Datesfat Chicks

It might be better to solve g(x)=0 where g(x) = 1/x^2 - y (the solution is obviously 1/sqrt(y) ) and then multiply by y. The reason is that NR iteration in this case does not involve any division except for a bit shift:

x' = x - g(x)/g'(x) = x - (1/x^2 - y) / (-2 / x^3) = x + x/2 - y*x^3/2 = x*(3 - y*x^2)/2

In the same way we can compute reciprocals using only multiplication (use h(x) = 1/x - y ).

dave

Reply to
David Rusin

The trick I have seen for finding a good starting value is to find the first '1' which you can do with table lookups on each byte and generate 2^(N/2) which is just bit shifts.*

Hence if the highest '1' is at bit 47, then 2^47

Reply to
Steve C

One thing I often do is simulate microcontroller algorithms on a PC. I would find out rather rapidly for the 2^32 case whether it always converged within three iterations. I don't know how long it would take, but definitely under an hour and probably much less.

However, some of the integers I'm interested in may be around 2^64. I believe that would be a problem even for a PC. So some formal analysis would be necessary.

Datesfat

Reply to
Datesfat Chicks

x))

bit

r

es

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D

From this site:

formatting link

(Copied from site, untested) Here a C-routine for integer-square-rooting for numbers between 0 and

65536:

int sqrt (int x) { int base, i, y ; base =3D 128 ; y =3D 0 ; for (i =3D 1; i x ) { y -=3D base ; // base should not have been added, so we substract again } base >> 1 ; // shift 1 digit to the right =3D divide by 2 } return y ; }

Enrico

Reply to
Enrico

Is this a homework question?

You might want to consider a 32 bit MCU with Floating Point Unit.

formatting link

--Vinnie

--------------------------------------- Posted through

formatting link

Reply to
vinnie

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.