How would my SoC calculate an integer square root?

Do you have a question? Post it now! No Registration Necessary

Translate This Thread From English to

Threaded View
Hello everyone,

I am working with a Nordic nRF52 SoC.  The CPU subsystem of the nRF52 is an
 ARM Cortex-M4.  My development system is a Linux box running GCC.  My embe
dded systems toolchain is gcc-arm-none-eabi-4_9-2015q1, as recommended by N
ordic.

I have written a fair amount of code for this device, but now I have to exp
lore a bit deeper.  I have an algorithm that may require me to calculate a  
lot of square roots, and quickly.  However, the numbers in question are all
 small integers.

Even though the ARM Cortex-M4 has an FPU, it shouldn't need to get involved
 in this square root calculation.  Optimized algorithms for various math fu
nctions exist.  A nice discussion of the optimization of the integer square
 root algorithm, written by Dr. Jack Crenshaw, is found here:

http://www.embedded.com/electronics-blogs/programmer-s-toolbox/4219659/Inte
ger-Square-Roots

I am interested in understanding exactly what code will be compiled if I #i
nclude <math.h> in my program, and then call sqrt() with an int32 argument.
  It stands to reason that optimized code, like that shown in the blog abov
e, is what should be used.  How can I determine this?  

I am poking around in the gcc-arm-none-eabi-4_9-2015q1 folder for clues.  T
he only math-related file that I can find there is called tgmath.h, and it  
just #defines aliases which point back to math.h.  I am looking for source  
code.

Thanks for any advice you can share.


Re: How would my SoC calculate an integer square root?
On 05/01/17 10:27, snipped-for-privacy@itu.edu wrote:
Quoted text here. Click to load it

sqrt() takes a double-precision floating point number, so
your int32 will be promoted to double before sqrt is called,
and you'll get a double back.


Re: How would my SoC calculate an integer square root?
On Wednesday, January 4, 2017 at 3:46:43 PM UTC-8, Clifford Heath wrote:
Quoted text here. Click to load it
I #include <math.h> in my program,
Quoted text here. Click to load it

Thanks for your reply, Clifford.  That's actually not an encouraging answer
 though!  If that is the case, I think I will write the square root functio
n that was shown in Listing 4 of Jack Crenshaw's blog.  The floating-point  
function will spend time calculating digits of precision that I do not need
 for my application.

Re: How would my SoC calculate an integer square root?
Den torsdag den 5. januar 2017 kl. 01.17.19 UTC+1 skrev snipped-for-privacy@itu.edu:
Quoted text here. Click to load it
f I #include <math.h> in my program,
Quoted text here. Click to load it
er though!  If that is the case, I think I will write the square root funct
ion that was shown in Listing 4 of Jack Crenshaw's blog.  The floating-poin
t function will spend time calculating digits of precision that I do not ne
ed for my application.


the fpu instruction VSQRT.F32 takes 14 cycles if floats are enough that is  
probably hard to beat


Re: How would my SoC calculate an integer square root?
On 05/01/17 01:28, snipped-for-privacy@gmail.com wrote:
Quoted text here. Click to load it

As long as "-ffast-math" and at least -O1 are enabled, as well as
appropriate cpu flags, the "sqrtf" function should generate a VSQRT.F32
automatically.  Note that "sqrtf" is not the same as "sqrt".

Re: How would my SoC calculate an integer square root?
On Wednesday, January 4, 2017 at 4:28:53 PM UTC-8, snipped-for-privacy@gmail.com wrote:

Quoted text here. Click to load it

My information about the performance of FPU's is clearly a few years behind the times.  That's actually a very impressive result, and you're right, it would be quite hard to beat.

Re: How would my SoC calculate an integer square root?
On Thu, 05 Jan 2017 15:13:42 -0800, jladasky wrote:

Quoted text here. Click to load it

Keep in mind, as David pointed out, that you want to make sure it's a  
single-precision floating point: the FPU on the M4 doesn't do double-
precision math, so that would still go the horridly slow software route  
(although a smart, FPU-aware algorithm might get an initial guess from  
the FPU).

--  

Tim Wescott
Wescott Design Services
We've slightly trimmed the long signature. Click to see the full one.
Re: How would my SoC calculate an integer square root?
On Thu, 05 Jan 2017 17:44:36 -0600, Tim Wescott

Quoted text here. Click to load it

If I understand correctly, the OP wants a square root algorithm
accepting an integer and returning an integer (not a float).  

That algorithm will give precise values for sqrt of 1, 4, 9, 16,  25
and so on. The sqrt for other values contains a more or less large
error.  

The discussion about single vs double precision floats is not relevant
in this environment.

The OP said that the argument is a small integer and as long as the
argument  is 6 decimal digits or less, it can be represented exactly
in single precision float. Thus the result will have up to 4 decimal
digits in integer part and a few decimal digits in the fractional
part. Just convert the single precision value to integer to get away
the fractional part, if truly an integer result is required.

It should be noted that conversion from float to integer can take a
long time on some architectures (if no barrel shifter available), so
it may be a good idea to keep integer and float calculations separate
and avoid mixing them in the same calculations.


Re: How would my SoC calculate an integer square root?
On 06/01/17 06:54, snipped-for-privacy@downunder.com wrote:
Quoted text here. Click to load it

Yes.


Yes.  The OP has not said if he wants round-down, round-to-nearest, or
does not care.

Quoted text here. Click to load it

It is very relevant.  An easy way to implement his requirements
(ignoring range checks) is:

int sqrt_int(int x) {
    return sqrtf(x);
}

But that requires the compiler flags that have been discussed, and
"sqrtf" instead of "sqrt", in order to compile to a VSQRT instruction
and a couple of conversion instructions.

Quoted text here. Click to load it

It should work fine - as long as doubles are avoided (to avoid the
slowdown).

Quoted text here. Click to load it

The M4F has float to integer conversion instructions (as do all cpus
with floating point hardware that I have ever seen).  Again, flags like
-ffast-math and -O1 are required to be sure that these are used without
library calls to handle the possibility of "weird" floats.

Your point is important if doubles are involved, or on an architecture
without hardware floating point.  Conversions back and forth between
integers and floating point formats can be relevant for the cost of the
operations.  However, they are not /that/ expensive - they are typically
not worse than general arithmetic like addition and multiplication, and
cheaper than division.  They may also give the compiler more
optimisation opportunities than floating point values (especially if you
have forgotten -ffast-math).



(And the ARM architecture has a barrel shifter.)

Re: How would my SoC calculate an integer square root?
On Fri, 06 Jan 2017 09:12:44 +0100, David Brown

Quoted text here. Click to load it

The time used to discuss what compiler flags to use could have better
used coding such simple function in assembler :-).


Re: How would my SoC calculate an integer square root?
On 06/01/17 11:21, snipped-for-privacy@downunder.com wrote:
Quoted text here. Click to load it

No, it could not - it is better to learn how to use your tools (such as  
the compiler) properly, so that you can re-use the information to  
improve more code later, rather than to learn a single inconvenient  
work-around.

Understanding the point of -ffast-math and optimisation flags, and about  
float vs. double, will improve /all/ floating point code written by  
/everyone/ who reads this thread and learned from it.  A simple inline  
assembly function (which is, in fact, slightly fiddly if you want to get  
the best from it) would merely help one person in one single situation.

Give a man a fish, and all that.


Re: How would my SoC calculate an integer square root?
On Fri, 6 Jan 2017 16:50:02 +0100, David Brown

Quoted text here. Click to load it

I did not refer to in-line assembly in particular, but also to
separate compiled assembler code. Of course, you need to know the
parameter passing mechanism to separate compiled/assembled units and
for functions how to return results. In most languages, parameters are
passed on the stack, some push the first parameter first, while some
push the last parameter last.  

Quoted text here. Click to load it

Understanding the parameter passing mechanism in general is a useful
skill.



Re: How would my SoC calculate an integer square root?
On 06/01/17 18:08, snipped-for-privacy@downunder.com wrote:
Quoted text here. Click to load it

Separately compiled assembly usually involves quite a bit more messing  
around than inline assembly.  You have to handle a variety of assembler  
directives to get it to work.  You need to understand more, in order to  
follow the calling convention properly (including preserving registers  
if and only if necessary).  And the result is a lot less efficient for  
short sections of code, because of the function call overhead.  Inline  
assembly is shorter, simpler, and can work together with the optimiser -  
though you still need to know what you are doing.

Quoted text here. Click to load it

The calling convention varies a lot between platforms.  On targets with  
plenty of registers, parameters are mostly passed in registers, with the  
stack used when there are too many registers or for things like structs.

Quoted text here. Click to load it

It /can/ be a useful skill, yes.  But there is no such thing as  
"parameter passing mechanism in general" as it varies between platforms.  
  And the skill is not nearly as helpful as the skill of understanding  
how to use your compiler.  Learning some of the basics of how compiler  
flags work and how compiler optimisation works is /vastly/ more useful  
than trying to write your own assembly functions.  It is also much  
easier, much safer (if you get it wrong you are likely to get slower  
code, but still equally correct code), much more portable, and leads to  
writing clearer and more maintainable code instead of cryptic assembly.

(I strongly advocate learning to /read/ assembly on the platforms you  
work with, in order to understand the results of your compilation.  But  
for most developers, if you are writing more assembly than a very  
occasional single-line inline assembly function, you should probably  
re-consider your development strategy.)



Re: How would my SoC calculate an integer square root?
On 06/01/17 00:44, Tim Wescott wrote:
Quoted text here. Click to load it

Not all FPU's give accurate square root results from their hardware
instructions.  The ARM documentation does not give details, so (unless
someone else knows better) we can assume that it /does/ give accurate
values, but not necessarily following IEEE specifications precisely.  I
have seen other FPU's equivalent to the M4F's where the square root
instruction is explicitly an approximation.  It would be good enough for
many uses, but for a highly accurate result you would use it as the
estimate and then run a couple of rounds of Newton-Raphson.

Quoted text here. Click to load it

A smart FPU-aware algorithm /might/ do that, but it is very unlikely in
practice.  It would not actually save many cycles - you can already get
a good initial estimate by simply taking the exponent from the floating
point number and halving it.  By the time you have handled the
conversions between double and floating point and vice versa, and
considering the cost of the rest of the double-precision software
floating point operations needed in the square root, I can't see you
saving more than a couple of percent off the total time.  It is not
worth building a special version of the library just for this one case -
other double-precision maths cannot, in general, get any benefit from
using the single-precision hardware.


Re: How would my SoC calculate an integer square root?
On Fri, 06 Jan 2017 08:46:11 +0100, David Brown

Quoted text here. Click to load it

If those cycles are the same as the actual clock oscillator cycles,
the 32 MHz oscillator would allow more than 2 million VSQRT
instructions each second.  

The OP estimated a thousand square roots each second, which would
consume less than  0.1 % of the clock cycles. No big deal and not a
reason for optimization.

Quoted text here. Click to load it

Not following the IEEE usually means it doesn't handle (extremely
small) denormalized values, might not generate proper +/- infinity
values f(for huge values) or properly handling NaN (Not a number)
cases.  

Quoted text here. Click to load it

Floating point values are nearly always approximations anyway. Only
some special cases (small integers) are exact, but you can't represent
exactly values like 0.1 or 0.01.  

To calculate the sqrt on a hardware with basic floating point
instruction but not hardware SQRT instruction, I have used the
following approach:
* normalize the value between 1..4 so that the exponent is even
* divide the exponent by two to get the result exponent
* using a polynomical with the mantissa bits to calculate the result
mantissa
* 4th order polynomicals give sufficient results for float and 8th
order for double.
* I have seen implementations (such as VAX) with 3rd order for single
and 6th for double degree.



Re: How would my SoC calculate an integer square root?
On 06/01/17 11:50, snipped-for-privacy@downunder.com wrote:
Quoted text here. Click to load it

He also made it clear that he wants to do other things during the time  
than just square roots, as well as wanting to minimise the time in order  
to reduce power consumption.

Quoted text here. Click to load it

Mostly true.  It is also not uncommon for rounding to be done in a  
different way, or to lose a LSB here or there compared to IEEE  
references.  Usually these details don't matter.  But if you compile  
without -ffast-math (or the relevant individual flags), the compiler  
will do its best to get IEEE conformance /exactly/ right - even if that  
means a library subroutine to handle the final details.

Quoted text here. Click to load it

Polynomials like this are sometimes used.  Usually the "best" choice of  
algorithm here depends on the relative speeds of division and  
multiplication.  Early floating point units (such as the VAX) were  
notoriously slow at division, so an algorithm that uses polynomials  
(addition and multiplication) is preferred.



Re: How would my SoC calculate an integer square root?
On Fri, 6 Jan 2017 16:55:18 +0100, David Brown

Quoted text here. Click to load it

He just has 99.9 % of the total CPU cycles available for the "other"
tasks.



Re: How would my SoC calculate an integer square root?
snipped-for-privacy@downunder.com writes:

Quoted text here. Click to load it

It looks like you did not read to the end of his sentence. It will be
consume 100% of the cycles if it only wakes up to do that one thing.

Modern MCUs can achieve very low power operation by sleeping most of the
time. Power consumption can then be nearly proportional to the wake duty
cycle. So going from "0.1%" to "0.2%" could be the difference between a
2 year and a 4 year life when powered by a coin cell say.


--  

John Devereux

Re: How would my SoC calculate an integer square root?
AT Thursday 05 January 2017 08:17, snipped-for-privacy@itu.edu wrote:

Quoted text here. Click to load it

One of the rules of optimization is: Only do it after you have numbers to  
compare. You are jumping to conclusions without any data. First benchmark  
and measure, then optimize, measure again and compare.

--  
Reinhardt


Re: How would my SoC calculate an integer square root?
On 1/4/2017 8:22 PM, Reinhardt Behm wrote:
Quoted text here. Click to load it

The first rule of optimizing is, "don't" until you find you clearly need  
it.

--  

Rick C

Site Timeline