#### Fixed Point Arithmetic is a PITA!

**posted on**

January 16, 2021, 3:07 am

I was working on doing various calculations in fixed point math, but the sources provide equations that are intended to be done in floating point. There are squares and coefficients with 10^-6 and other items that make using fixed point a PITA. So I'm going to use floating point in the FPGA. It's actually not so hard. Many years ago I worked on a machine that did 32-bit floating point math in hardware, so I know the basics. It's a bit odd that addition takes more steps than multiplication, as the exponents must be the same value, so the lesser exponent must be adjusted up and its mantissa adjusted down before the add. Multiplication can be done without this adjustment. Both must be normalized after the operation, but the result of the multiply only ever needs adjusting by one bit position, while the addition may need normalizing by the length of the mantissa.
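The align-then-normalize asymmetry described above can be sketched in a toy format. This is purely a hypothetical illustration (16-bit mantissa, unbiased exponent), not the poster's actual FPGA representation:

```c
#include <stdint.h>

/* Toy binary floating-point: value = mant * 2^exp, with mant kept
   normalized in [2^15, 2^16). Hypothetical format for illustration only. */
typedef struct { uint32_t mant; int exp; } toyfp;

static toyfp toy_normalize(toyfp a) {
    if (a.mant == 0) { a.exp = 0; return a; }
    while (a.mant >= (1u << 16)) { a.mant >>= 1; a.exp++; }
    while (a.mant <  (1u << 15)) { a.mant <<= 1; a.exp--; }
    return a;
}

/* Addition: the exponents must match first, so the lesser operand's
   mantissa is shifted down; the sum may then need normalizing by many
   bit positions. */
static toyfp toy_add(toyfp a, toyfp b) {
    if (a.exp < b.exp) { toyfp t = a; a = b; b = t; }
    int d = a.exp - b.exp;
    b.mant = (d >= 32) ? 0 : b.mant >> d;     /* align to larger exponent */
    return toy_normalize((toyfp){ a.mant + b.mant, a.exp });
}

/* Multiplication: no pre-alignment; exponents simply add, and after
   dropping the low 16 bits of the double-width product the result is
   off by at most one bit position. */
static toyfp toy_mul(toyfp a, toyfp b) {
    uint32_t m = (uint32_t)(((uint64_t)a.mant * b.mant) >> 16);
    return toy_normalize((toyfp){ m, a.exp + b.exp + 16 });
}
```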

The FPGA I'm working with has various limitations in the DSP MAC units that make them a bit harder to use than I would like. The adder can be set for addition or subtraction, but not switched in real time. So I'm ditching the accumulator and only using the multiplier, with an add/sub chain in the FPGA fabric. With that, the floating point ops can be done in multiple cycles each.

In the end this will be easier than trying to adjust all the calculations and coefficients to keep the fixed point calculations in a range that preserves significant bits. In particular, a flow sensor that requires a square root calculation, and involves a lot of detailed coefficients for various compensation factors, could have been very problematic in fixed point.

Fortunately the clock speed is not all that fast, so it won't require the full speed of the DSP block, and external logic can be used to implement all the operations of floating point. The calculations aren't all that involved and there are many clock cycles available, so it won't be hard to configure this. I'm a bit disappointed the DSP block wasn't more useful. I found that in many configurations the accumulator output did not include the MSB of the actual register; rather, it either duplicated the next-to-MSB or was set to zero depending on the signed/unsigned setting of the output. Why calculate it and then not make it available? The user can always choose to use it or not on their own, but if you don't give the choice...?


--

Rick C.

- Get 1,000 miles of free Supercharging


Re: Fixed Point Arithmetic is a PITA!

On 2021/01/15 7:07 p.m., Rick C wrote:

I'd be more inclined to read your article if there wasn't a Tesla ad at the bottom. Any way you can turn that off?

I find those "get something free" ads annoying, as they are designed to catch one's eye, and I am trying to read the article...

John

--

Free, get a new tax free life, see the mysteries of the orient, (see

what I mean?)


Re: Fixed Point Arithmetic is a PITA!

On Saturday, January 16, 2021 at 12:38:42 AM UTC-5, John Robertson wrote:

I don't know what to tell you. If you read posts from the bottom up, my entire post will likely not make sense to you. I'm just sayin'...

All I can say is... Wow! But I'm sure you won't read this post either.


--

Rick C.

+ Get 1,000 miles of free Supercharging


Re: Fixed Point Arithmetic is a PITA!

There's no law you have to use one or the other; if only one part of the math is challenging to implement in fixed point, then just do that one in floating point.

It's probably not worth the trouble when you have clock cycles/power to burn, but on e.g. a more resource-constrained device, if you only use floating point in one place, the compiler/optimizer will only pull in what's needed to do that one calculation and not the whole floating point library/core.

Do you know the fast inverse square root?

<https://ieeexplore.ieee.org/document/8073975>

Re: Fixed Point Arithmetic is a PITA!

What do you need the subtraction for? Just add negative values. To make a negative value, simply invert all bits (1's complement) and then add 1 (2's complement).

Alternatively, set the ALU to subtraction and use similar tricks for addition, as in the very old CADET (Can't Add, Does Not Even Try :-) computer.
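A minimal sketch of the invert-and-add-one trick in C (the function name is illustrative; note that negating the most negative value overflows in any width):

```c
#include <stdint.h>

/* Subtraction using only an adder: form -b as (~b + 1), i.e. one's
   complement plus one, then add. This is exactly the trick described
   above, which is also how two's-complement ALUs implement subtract. */
static int16_t sub_via_add(int16_t a, int16_t b) {
    return (int16_t)(a + (int16_t)(~b + 1));  /* a + (-b) */
}
```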

Are you designing airplane or turbine wings that require a high sampling rate flow measurement? In other applications the sample rate requirement is not that high, so even a simple 8-bit uC would do.

Conversions between int and float are costly, so try to avoid them. Of course, if the hardware has FindFirstBitSet instructions and multi-position arithmetic shifts with the shift count in a register, the conversion can be fast.

Since there appears to be quite a lot of bits available, use some fixed point arithmetic,

https://en.wikipedia.org/wiki/Fixed-point_arithmetic

i.e. some bits are allocated to the integer part and some to fractional bits. Addition does not require normalization, and in multiplication you need to discard a fixed number of bits from the double-word result (possibly adding 1 to the last retained bit if the discarded part was large, i.e. rounding).
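The scheme above can be sketched as a Q16.16 format; the 16/16 split is an arbitrary illustrative choice, not one mandated by the thread:

```c
#include <stdint.h>

/* Q16.16 fixed point: 16 integer bits, 16 fractional bits, stored in
   an int32_t holding value * 2^16. Illustrative split only. */
typedef int32_t q16_16;
#define Q16_ONE (1 << 16)

static q16_16 q_from_double(double d) {
    return (q16_16)(d * Q16_ONE + (d >= 0 ? 0.5 : -0.5));  /* round to nearest */
}
static double q_to_double(q16_16 q) { return (double)q / Q16_ONE; }

/* Addition needs no normalization at all. */
static q16_16 q_add(q16_16 a, q16_16 b) { return a + b; }

/* Multiplication: take the double-width product, then discard the low
   16 bits; adding half an LSB first makes the discard round to nearest,
   as described in the post above. */
static q16_16 q_mul(q16_16 a, q16_16 b) {
    int64_t p = (int64_t)a * (int64_t)b;
    return (q16_16)((p + (1 << 15)) >> 16);
}
```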

Re: Fixed Point Arithmetic is a PITA!

What was so critical about core memory temperatures? Later core memories did not need such preheating.

In very old computers using mercury-filled tubes as acoustic delay lines, as recirculating shift registers, both the absolute temperature and equal temperature between mercury tubes were critical. The speed of sound in mercury varies with temperature; in order to get the serial memory words out of two or more shift registers at the same time, the tubes had to be at the same temperature.

---

Apparently in the 1950s/60s it was popular to have an indirect bit in every memory word. If a memory address load contained the indirect bit set, the low bits of the memory address were used for a new memory address access, which again could contain the indirect bit set, and so on. Pointing the address part at itself and setting the indirect bit caused a single-instruction infinite loop :-).

This was possible on the IBM 1620 but also on Honeywell DDP-x16 series computers. Since a core memory read is actually a ReadModifyWrite, each memory reference warms the core at that location. Using self-referencing pointers and letting them run too long would permanently destroy that core memory location on the DDP. Some DDP models had hardware protection against self-referencing indirect access.

Did the IBM 1620 have such problems with self-referencing indirect pointers?

Re: Fixed Point Arithmetic is a PITA!

On 1/16/21 3:03 PM, snipped-for-privacy@downunder.com wrote:

Apparently the core material was temperature sensitive, so it was heated to a bit above room temperature (40C?).

I only used a 1620 briefly, using FORTRAN, so I am not familiar with the architecture details. It was replaced with an IBM 360 shortly after I started programming, which is what I learned my early programming on (FORTRAN and assembler).

My research advisor had used a 1620 (Model I, apparently) extensively. They had even replaced the lookup tables for the arithmetic ops to create new instructions to assist in x-ray crystallography. He had also used an IBM 650 and 7094. He was not impressed with the 360 floating point.


Re: Fixed Point Arithmetic is a PITA!

On 16.1.21 23.03, snipped-for-privacy@downunder.com wrote:

This is what made the Elliott 803 so sensitive to temperature and to the three-phase clock frequency in the computer. The registers were delay lines and most of the logic was memory cores.

There was usually also another bit, own-page/zero-page. If the own page was selected, the top bits of the address register were used as left by the instruction fetch, giving an address on the same page as the instruction. If the zero page was selected, the top address bits were cleared, giving an address on the page starting at zero.

IIRC, the things that overheated were not the cores but the core line drivers. The cores are arranged in an x-y matrix, and each matrix plane represents one bit. There is a third line through each core, common to all cores in a bit plane. The line is usually called the Z line, and it is used for reading out the results and setting the proper bit value on write. All three groups of drivers have to be fast and at the same time handle substantial voltages and currents, which put the chips at the limits of the technology available at the time.

Indirect addressing was very common in the (mini)computers of the era: HP 21 series, Digital PDP-8, Data General Nova, Honeywell DDP-116 and others.

On the PDP-11, there was a bit in the addressing mode for selecting indirect addressing, but the pointer did not have an indirect bit, as the addressing is byte-based instead of based on 16-bit words.


--

-TV


Re: Fixed Point Arithmetic is a PITA!

On 16/01/2021 08:28, bitrex wrote:

Provided that you keep track of the magnitudes, an integer or fixed point implementation is much faster. I have worked on problems where the scaling was critical even in REAL*4 floating point; otherwise some of the terms would overflow or denormalise during the calculations.

The first approach should always be to look for a normalisation of the problem scaling that allows a pure integer implementation.

Another one that is handy (and missing from that review), depending on how fast your hardware divide is, is the Pade approximation for sqrt:

sqrt(1+x) = 1 + 2x/(4+x)

For -0.5 < x < 1 it is better than 1%.

At the extremes it gives sqrt(0.5) = 0.7142 and sqrt(2) = 1.4.

Tweaks to the coefficients can get it closer still, or use it over a narrower range of 0.25 either side of x=0. Sqrt(1.21) = 1.09976.

That might be good enough already. A single NR loop will get you more than enough digits for most measured sensor data.

The next highest order one is even more accurate and might well be amenable to FPGA use since it is very 2^n based:

sqrt(1+x) = 1 + x*(4+x)/(8+4x)    (0.7083' and 1.416' at the extremes)
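The two approximations above, plus the single Newton-Raphson refinement mentioned, can be sketched directly; the function names are mine:

```c
#include <math.h>

/* Pade approximations for sqrt from the post above; x is the offset
   from 1, i.e. these approximate sqrt(1+x) near x = 0. */
static double sqrt_pade1(double x) {      /* sqrt(1+x) ~= 1 + 2x/(4+x)      */
    return 1.0 + 2.0 * x / (4.0 + x);
}
static double sqrt_pade2(double x) {      /* sqrt(1+x) ~= 1 + x(4+x)/(8+4x) */
    return 1.0 + x * (4.0 + x) / (8.0 + 4.0 * x);
}

/* One Newton-Raphson step on the higher-order estimate: for y ~ sqrt(a),
   y' = (y + a/y)/2 roughly doubles the number of correct digits. */
static double sqrt_refined(double a) {
    double y = sqrt_pade2(a - 1.0);
    return 0.5 * (y + a / y);
}
```

At x = 1 the first form gives exactly 1 + 2/5 = 1.4 and the second 1 + 5/12 = 1.41667, matching the extremes quoted above.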


--

Regards,

Martin Brown


Re: Fixed Point Arithmetic is a PITA!

On Sat, 16 Jan 2021 10:43:18 +0000, Martin Brown wrote:

Some famous person said "If you really have to use floating point, you don't understand the problem."

I wrote a math library for the 68K, called Zform, in signed format 32.32. That's as good as float for practical applications, and is a lot faster. Z-to-Int conversion time is zero. Add/sub are fast because there's no need to normalize. It was saturating too, in all cases, so it would cheerfully divide by zero or add two huge numbers. Great for control loops.
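A sketch of what a signed 32.32 saturating add might look like. Zform's actual internals aren't shown in the thread, so this is only an illustration of the idea (value stored as real * 2^32 in a 64-bit word):

```c
#include <stdint.h>

/* Signed 32.32 fixed point: the int64_t holds value * 2^32.
   Illustrative sketch, not the actual Zform implementation. */
typedef int64_t q32_32;

static q32_32 q32_from_int(int32_t v) {
    return (q32_32)v * ((int64_t)1 << 32);   /* place v in the high word  */
}
static int32_t q32_to_int(q32_32 q) {
    return (int32_t)(q >> 32);               /* "zero cost": high word only */
}

/* Saturating add: do a wraparound add in unsigned arithmetic, then clip
   if the sign of the result contradicts the signs of the operands. */
static q32_32 q32_sat_add(q32_32 a, q32_32 b) {
    q32_32 r = (q32_32)((uint64_t)a + (uint64_t)b);
    if (a >= 0 && b >= 0 && r < 0) return INT64_MAX;  /* positive overflow */
    if (a < 0 && b < 0 && r >= 0) return INT64_MIN;   /* negative overflow */
    return r;
}
```

No normalization step appears anywhere, which is the speed advantage over floating point that the post is describing.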


--

John Larkin Highland Technology, Inc

The best designs are necessarily accidental.


Re: Fixed Point Arithmetic is a PITA!

That's behind the paywall. Is it this trick from the game Quake 3?

From my notes, don't ask me where the numbers come from.

The 1/sqrt(x) trick from Quake 3 comes up when normalizing vectors:

```c
float InvSqrt(float x) {
    float xhalf = 0.5f * x;
    int i = *(int *)&x;             /* reinterpret the float's bits as an int */
    i = 0x5f3759df - (i >> 1);      /* the magic first guess                  */
    x = *(float *)&i;               /* back to float                          */
    x = x * (1.5f - xhalf * x * x); /* one Newton-Raphson refinement step     */
    return x;
}
```

(The pointer casts violate strict aliasing in modern C; copying the bits with memcpy is the portable equivalent.)

cheers, Gerhard

Re: Fixed Point Arithmetic is a PITA!

On 1/16/2021 6:18 AM, Gerhard Hoffmann wrote:

I think it was from Quake 1, circa 1996, when a lot of consumer PCs still didn't have a 3D-capable dedicated GPU; the Pentium was also relatively new, and the SSE/MMX instruction sets hadn't been introduced yet, so no help there either, initially.

The application was calculating per-pixel surface normals, for, e.g., taking dot products with the ambient lighting vectors and shading each pixel appropriately. It had to be done on the general-purpose CPU if no GPU was available, and since it had to be computed for every pixel of every frame, it had to be fast.

By the time Quake 3 came out, most PCs had some fashion of onboard pipelined GPU for pixel and vertex "shader" computations, so the trick was redundant for PC graphics.


Re: Fixed Point Arithmetic is a PITA!

On Friday, January 15, 2021 at 7:08:01 PM UTC-8, snipped-for-privacy@gmail.com wrote:

Not just your calculations; astronomy, physics, statistics just do NOT always fit the fixed point (or even floating point) standard of your machine. I really miss VAX/VMS with G_float and H_float... but even that goes belly-up sometimes.

Grit your teeth, and normalize everything, before you try to use _Numerical Recipes in XXX_, too.
