# Fixed Point Arithmetic

• posted

I don't know why I thought it would be easy. The arithmetic is not so hard by itself. But changing all the equations to normalize the variables is not so easy. Then there is the issue of needing to ensure the results don't grow out of range, and I hadn't even given thought to the need for saturating arithmetic in some cases.

I'm designing a fixed point math engine to do moderately simple calculations. I'm trying to avoid a large barrel shifter, so the data format is generally Q1.17. The first problem is getting the data into that format when it generally is in a range of 0 to 167772 or less. The formulas for using the various data are calibrated for whatever the intended units are, which results in changes to every coefficient and constant. It's a big job just mapping it all out.
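The normalization step described above can be sketched as a bit-accurate Python model (a software model, not HDL). It assumes that the top of the raw range, 167772, should map to 1.0 in unsigned Q1.17, and that the scaling is done with one constant multiply followed by a fixed 17-bit shift. The names `raw_to_q17` and `SCALE_Q17` are hypothetical, chosen only for illustration.

```python
Q17_FRAC_BITS = 17
Q17_ONE = 1 << Q17_FRAC_BITS          # 1.0 in unsigned Q1.17 (131072)
RAW_FULL_SCALE = 167772               # top of the raw range quoted in the post

# Coefficient 2**17 / RAW_FULL_SCALE (about 0.78125), itself stored in Q1.17.
# It rounds to 102400, which fits an 18-bit multiplier input.
SCALE_Q17 = round(Q17_ONE * Q17_ONE / RAW_FULL_SCALE)


def raw_to_q17(raw: int) -> int:
    """Map a raw count in 0..RAW_FULL_SCALE onto 0..1.0 in unsigned Q1.17."""
    product = raw * SCALE_Q17                        # 18 x 18 -> up to 36 bits
    rounded = product + (1 << (Q17_FRAC_BITS - 1))   # round to nearest
    return rounded >> Q17_FRAC_BITS                  # fixed shift, no barrel shifter
```

Every coefficient in the downstream formulas would need the same kind of rescaling, which is the mapping job the post describes.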

The saturating arithmetic seems to be required when subtracting offsets. Most of the sensors have a built in offset that needs to be calibrated out, resulting in a nominal zero result as a baseline value. With a modest amount of noise the zero wraps around. I think it is less of a problem at the high end as plenty of margin can be provided between the max values handled and the upper bound of the data format. Q1.17 provides a range of 0 to <2 and the goal will be to keep values in the range 0 to 1.0 as practical.
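To make the wrap-around concrete, here is a small Python model of an 18-bit unsigned subtraction, with and without clamping at zero. The function names and the sample values are hypothetical.

```python
WIDTH = 18
MASK = (1 << WIDTH) - 1


def sub_wrapping(reading: int, offset: int) -> int:
    """Plain 18-bit unsigned subtraction: a negative result wraps to a huge value."""
    return (reading - offset) & MASK


def sub_saturating(reading: int, offset: int) -> int:
    """Unsigned subtraction clamped at zero instead of wrapping."""
    diff = reading - offset
    return 0 if diff < 0 else diff & MASK
```

With a reading of 1000 and a calibrated offset of 1003, `sub_wrapping` returns 262141, which reads as almost 2.0 in unsigned Q1.17, while `sub_saturating` returns 0.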

• posted

If your algorithm is entirely "feed forward" (think of an FIR filter) then it's easy. Just compute the max numeric range at each step, and keep enough bits to never allow overflow. About 90% of the FPGA/ASIC designs I've done fit here.
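One way to "compute the max numeric range" for a feed-forward stage is to bound the worst-case sum of products. The sketch below assumes signed integer samples and coefficients; `fir_output_bits` is a hypothetical helper, and the bound is conservative (it may be one bit wider than strictly necessary).

```python
def fir_output_bits(input_bits: int, coeffs: list) -> int:
    """Signed width wide enough to hold +/- the worst-case sum(coeff * sample)."""
    max_input = 1 << (input_bits - 1)                 # magnitude of most negative input
    worst_case = max_input * sum(abs(c) for c in coeffs)
    return worst_case.bit_length() + 1                # magnitude bits plus a sign bit
```

For 18-bit samples and coefficients `[1, 2, 1]` this gives 21 bits, so carrying 21 bits through that stage can never overflow.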

If your algorithm has feedback (e.g. an IIR filter), then it gets a bit more complicated. You need to put a rounding/truncation/saturation bit-reduction stage in somewhere. Carry enough bits for as long as you can, then explicitly add this bit-reduction stage.

The design (and placement in the processing chain) of this bit-reduction block is very system dependent. For me, it's often best to punt some of this tuning to software instead of trying to get it perfect the first time in hardware. At the bit-reduction stage, include the ability for software to adjust the range by a few bits. You shouldn't need anything approaching full IEEE floating point; I've only ever needed 2-3 bits of range control. Feed the software a saturation/overflow indicator from this bit-reduction block to help with tuning.
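A bit-reduction stage of the kind described might look like the following Python model. It is only a sketch under assumptions: the input is a wide signed value, `shift` stands in for the software-adjustable range control, and the returned flag is the saturation indicator fed back to software. The name `reduce_bits` and the 18-bit default are illustrative.

```python
def reduce_bits(acc: int, shift: int, out_bits: int = 18):
    """Round, shift and saturate a wide signed value down to out_bits.

    Returns (result, saturated); `shift` is the software-adjustable range control.
    """
    if shift > 0:
        acc = (acc + (1 << (shift - 1))) >> shift   # round half up, then drop LSBs
    hi = (1 << (out_bits - 1)) - 1                  # most positive output code
    lo = -(1 << (out_bits - 1))                     # most negative output code
    if acc > hi:
        return hi, True
    if acc < lo:
        return lo, True
    return acc, False
```

If software sees the flag set with `shift = 2`, it can retry with a larger shift: `reduce_bits(1 << 20, 2)` saturates, while `reduce_bits(1 << 20, 4)` returns 65536 with the flag clear.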

If you're entirely hardware, without software in the loop - your problem becomes even a bit harder.

In general, however, with today's FPGAs, it's usually much cheaper to err on the side of just adding more bits of resolution to give yourself more margin.

Or you can truly punt (like far too many folks dipping their toes in FPGA design) and decide you need floating point everywhere, then horribly overdesign and consume resources left and right! (And still have rounding/truncation/saturation issues, but just not be aware of them at the outset.)

Regards, Mark

• posted

There are no IIR sections. No filtering really... yet. Some of the inputs are from ADCs built into the FPGA with large integrators, otherwise known as counters. Simple, but effective at eliminating noise, so no need for additional filtering.

The calculations are more along the line of compensating for offset and scale factor, a few require calculations. Actually, one is an integrator, turning flow rate into volume, but the range of that one is determined and overflow avoided.
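For an integrator like the flow-rate-to-volume one, the range can be fixed in advance by sizing the accumulator for the largest number of samples it will ever sum. The sketch below assumes unsigned samples; `accumulator_bits` and the sample counts are hypothetical, not taken from the design.

```python
def accumulator_bits(sample_bits: int, max_samples: int) -> int:
    """Unsigned width that can hold max_samples full-scale samples summed."""
    max_sample = (1 << sample_bits) - 1
    return (max_sample * max_samples).bit_length()
```

For example, summing up to 1000 full-scale 18-bit samples needs a 28-bit accumulator.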

Yeah, the reason for using the FPGA is the thinking that the "hardware" design will not be as hard to get through approvals as software. lol Not my idea, but I'm willing to help.

The requirements are not stringent, but the multipliers have 18 bits. Extending that uses a lot more resources and becomes more complex.

Floating point would seem to be overkill. There's not enough calculations to worry with numerical error if using floating point. Still, I'm not looking for that... I think. It would be interesting... I wish you hadn't mentioned it. lol

```
--
Rick C.

+ Get 1,000 miles of free Supercharging
```
• posted

If you are trying to get accurate results near zero despite noise and variation, you want signed types. Saturating at zero is, mathematically, a rather arbitrary point even though it can be convenient in the implementation. Wrapping at zero is far worse. So use signed Q1.17 (-1.0 to just under +1.0) or, if you need more range, signed Q2.16 (-2.0 to just under +2.0). In theory, you could have an asymmetric type with a range of -0.5 to +1.5, but that's more complicated.
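A Python model of the signed alternative, assuming an 18-bit two's-complement word with 17 fraction bits (the range quoted above). Noise around the calibrated zero then shows up as small negative numbers instead of wrapping or being clamped. The names are hypothetical.

```python
FRAC_BITS = 17
Q_MAX = (1 << FRAC_BITS) - 1      # just under +1.0 in signed Q1.17
Q_MIN = -(1 << FRAC_BITS)         # exactly -1.0


def sub_signed_sat(reading: int, offset: int) -> int:
    """Signed Q1.17 subtraction, clamped only at the format limits."""
    diff = reading - offset
    return max(Q_MIN, min(Q_MAX, diff))


def to_float(q: int) -> float:
    """Interpret a signed Q1.17 code as a real number."""
    return q / (1 << FRAC_BITS)
```

Here a reading of 1000 with an offset of 1003 yields -3, a tiny negative value that later averaging can still cancel out.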

• posted

If you are using VHDL, which I think you are, there is a good, synthesizable (standardized, I believe) fixed-point library which handles saturation and other modes and makes tracking the radix points easier. I used it once years ago and was happy with it. I haven't done any fixed-point in several years since I mostly do finite fields now. If you are using something like the DSP48 blocks in a Xilinx, remember there are hardware structures to handle saturation and rounding which let you operate at full speed. If this is the project you've mentioned before, though, with very slow processing rates, you oughtn't to be using hardware. I know you mentioned some rationale below, but nonetheless, anything that doesn't need to do gigaoperations per second is ten times easier to design in C using floating-point.

• posted

Should not be using hardware??? What a curious thing to say. So the calculations should be done by the user in their heads? Radical! I'll have to get that into the requirements.

Yes, it is a block similar to the DSP48, but I see no mechanism to provide saturating arithmetic. It is an 18x18 multiplier (or two depending on configuration) and a three input adder/subtractor (configuration choice, not real time, darn it). One of the three adder inputs can be the output to make it an accumulator.
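Since the block itself offers no saturation, the clamp has to be added around the multiplier. A minimal Python model, assuming unsigned Q1.17 operands: the 18x18 product is Q2.34, so a fixed 17-bit shift restores the format and anything that reaches 2.0 is clamped. `mul_q17` is a hypothetical name.

```python
FRAC_BITS = 17
WIDTH = 18
MAX_Q17 = (1 << WIDTH) - 1        # largest unsigned Q1.17 code, just under 2.0


def mul_q17(a: int, b: int) -> int:
    """Multiply two unsigned Q1.17 values, returning a saturated Q1.17 result."""
    product = a * b                                            # Q2.34, up to 36 bits
    result = (product + (1 << (FRAC_BITS - 1))) >> FRAC_BITS   # round, drop 17 LSBs
    return min(result, MAX_Q17)                                # clamp if it reached 2.0
```

Keeping operands in the 0 to 1.0 range, as the original post intends, means the clamp should rarely trigger.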

Rick C.
• posted

I looked again at the DSP48E1, and it seems like it doesn't really saturate for you, but there is a fast pattern detector which will assert a flag for saturation.
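The general idea behind flagging saturation with a pattern check can be modeled as follows: a wide two's-complement value fits in a narrower signed field only if every bit from the field's sign bit upward is identical. This is a generic sketch, not a model of how the DSP48E1 pattern detector is actually configured; the 48-bit accumulator width and the name `overflows` are assumptions.

```python
ACC_BITS = 48


def overflows(acc: int, out_bits: int = 18) -> bool:
    """True if a 48-bit two's-complement value does not fit in out_bits signed."""
    acc &= (1 << ACC_BITS) - 1                        # view as a raw 48-bit pattern
    upper = acc >> (out_bits - 1)                     # field sign bit and everything above
    all_ones = (1 << (ACC_BITS - out_bits + 1)) - 1   # pattern for a valid negative value
    return upper not in (0, all_ones)
```

The flag alone does not clamp anything; separate logic would still have to substitute the maximum or minimum code when it is set.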

You know the rule, though: if you can do it in software, do it in software.

• posted

Depending on the issues behind "can".

Actually I am very much not a proponent behind "can". They were doing the software on an 8 bit Arduino. I think this will work just fine.

Rick C.
• posted

It does look a little as if you may have jumped from the frying pan into the fire. I wouldn't choose an 8 bit Arduino or a small FPGA as the hardware for doing wide dynamic range low speed maths.

[...] numbers you are using in this project. You can get decent free or paid-for C or C++ tools with good debugging support (some certified for SIL level if you need it). I've generally applied this rough rule of thumb for planning projects:

assembler is 10x the dev cost of C

VHDL is 10x the cost of assembler

It assumes that the hardware is suitable for the job.

I've very rarely ended up with an FPGA deployed without a micro on the board as well. The hardware cost is often trivial: you can buy TI MSP430s with 16k code memory for under \$0.50. If your (small) FPGA needs a boot flash, it's often cost effective to use a micro with a big on-board flash as an intelligent FPGA boot memory.

I suspect that it's far too late to change things now ......

MK

• posted

It's not too late to change things. I just don't want to go into a conversation of why this was chosen. It would result in a bunch of additional posts and not be productive.

Rick C.
