Suitable Integer Square Root Algorithm for 32-64-Bit Integers on Inexpensive Microcontroller?

- D
- Datesfat Chicks
  
  Contact options for registered users
posted
14 years ago

Thu, Feb 4, 2010 7:22 PM

Hi,

I have the need in a microcontroller application to calculate floor(sqrt(x)) where x is approximately in the range of 2^32 - 2^64.

What are the algorithms I should look at?

These small microcontrollers are characterized by having a 8-bit times 8-bit to give 16-bit result integer unsigned multiply instruction, and a similar

16/8 division instruction to give a quotient and a remainder. Mulitplications of larger integers have to be done by multiplying the bytes and adding.

I'm aware of these algorithms:

a)Adding consecutive odd integers and counting how many you have to add, i.e. 1 + 3 = 4, 4 + 5 = 9, 9 + 7 = 16, 16 + 9 = 25, etc.

b)Trial squaring, testing setting each bit to "1", i.e.

formatting link

c)Another method that seems to work (don't know the name of it), here is source code:

formatting link

What other options should I investigate?

Thanks, Datesfat

- D
- Dann Corbit
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Thu, Feb 4, 2010 7:32 PM

formatting link

Here is some C code for integer square roots using many different algorithms:

formatting link

Worth a look:

formatting link

Consider also:

formatting link

- D
- D Yuniskis
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Thu, Feb 4, 2010 7:51 PM

x = y^2 - (2^16)^2

Google is your friend :> There are many sqrt algorithms with different tradeoffs (speed/size).

I would, instead, ask that you might want to examine what *else* you know about "x". I.e., you've already constrained the range (why? does that give you any other information that can be exploited?); are there any other characteristics about how "x" is synthesized that you can exploit to divide and conquer?

E.g. if x = y * z, then knowing sqrt(y) and/or sqrt(z) simplifies your problem.

formatting link

- D
- Didi
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Thu, Feb 4, 2010 8:19 PM

x))

bit

r

es

The one I have been using last 15-20 years is probably your B, at least sounds like what I made up back then - successive approximation for each bit of the result. If multiplication is fast on the core you will be using it it is hard to beat.

Dimiter

- D
- Datesfat Chicks
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Thu, Feb 4, 2010 8:33 PM

Thanks for the insight.

For small operands, I agree with your analysis.

However, for larger operands, the Babylonian method:

formatting link

looks very promising.

The issue is that for the cost of one division (the /2 and addition is virtually free) you get far more than one bit of additional information about the square root.

For larger operands, I don't believe that (b) will be the right answer any longer.

But I'm not sure yet ...

Thanks for all, Datesfat

- H
- Hans-Bernhard Bröker
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Thu, Feb 4, 2010 9:32 PM

That quite certainly can't work for the input range you're targeting. You would be looking at up to four billion iterations if x was 2^64

A whole lot better already. It becomes even better if you update the square of your current result differentially, applying the binomial formula. As you update the result-to-be ('res' in the following), by adding a 1-bit at position 'k':

x --> x + 2^k

it's square, which eventually should be equal to the input number, updates too:

x^2 --> x^2 + 2*(2^k)*x + (2^k)^2 = x^2 + x

- N
- Nobody
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Fri, Feb 5, 2010 1:38 AM

Newton-Raphson or bisection, depending upon how slow division is. N-R is asymptotically faster (the number of correct digits doubles at each step), but each step requires a division.

Bisection is essentially your option b), but it can be computed much more efficiently than the algorithm given. You don't need arbitrary multiplies, only shifts, as:

(a+2^k)^2 = a^2 + 2.a.2^k + (2^k)^2 = a^2 + a.2^(k+1) + 2^2k

IOW:

uint32_t isqrt(uint64_t x) { uint64_t a = 0; uint64_t a2 = 0; int k;

for (k = 31; k >= 0; k--) {

uint64_t t = a2 + (a

- V
- Vladimir Vassilevsky
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Fri, Feb 5, 2010 1:58 AM

Homework?

For microcontroller purpose, a 10-bit accurate sqrt approximation should probably suffice. I can hardly imagine anything practical that would require more then 14 significant bits. So, normalize your number and then do polynomial or piecewise linear approximation.

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

formatting link

- D
- Datesfat Chicks
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Fri, Feb 5, 2010 2:26 AM

The identity you provided above is so obvious that I'm beating myself on the head with a hammer right now for not spotting that as an alternative to multiplying again on each iteration.

Ouch ouch ouch.

Thanks sincerely.

As you know, division of large integers is fairly expensive on a little microcontroller. The "bisection" insight you provided is definitely something I will investigate.

Datesfat

- D
- David Rusin
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Fri, Feb 5, 2010 8:07 PM

It might be better to solve g(x)=0 where g(x) = 1/x^2 - y (the solution is obviously 1/sqrt(y) ) and then multiply by y. The reason is that NR iteration in this case does not involve any division except for a bit shift:

x' = x - g(x)/g'(x) = x - (1/x^2 - y) / (-2 / x^3) = x + x/2 - y*x^3/2 = x*(3 - y*x^2)/2

In the same way we can compute reciprocals using only multiplication (use h(x) = 1/x - y ).

dave

- S
- Steve C
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Fri, Feb 5, 2010 11:07 PM

The trick I have seen for finding a good starting value is to find the first '1' which you can do with table lookups on each byte and generate 2^(N/2) which is just bit shifts.*

Hence if the highest '1' is at bit 47, then 2^47

- D
- Datesfat Chicks
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Sun, Feb 7, 2010 5:23 AM

One thing I often do is simulate microcontroller algorithms on a PC. I would find out rather rapidly for the 2^32 case whether it always converged within three iterations. I don't know how long it would take, but definitely under an hour and probably much less.

However, some of the integers I'm interested in may be around 2^64. I believe that would be a problem even for a PC. So some formal analysis would be necessary.

Datesfat

- E
- Enrico
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Sun, Feb 7, 2010 6:26 AM

x))

bit

r

es

=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D

From this site:

formatting link

(Copied from site, untested) Here a C-routine for integer-square-rooting for numbers between 0 and

65536:

int sqrt (int x) { int base, i, y ; base =3D 128 ; y =3D 0 ; for (i =3D 1; i x ) { y -=3D base ; // base should not have been added, so we substract again } base >> 1 ; // shift 1 digit to the right =3D divide by 2 } return y ; }

Enrico

- V
- vinnie
  
  Contact options for registered users
Vote on answer
posted
14 years ago

Tue, Feb 16, 2010 9:35 PM

Is this a homework question?

You might want to consider a 32 bit MCU with Floating Point Unit.

formatting link

--Vinnie

--------------------------------------- Posted through

formatting link