#### Hardware floating point?

**Posted by** Tim Wescott on January 25, 2017, 4:59 am

So, just doing a brief search, it looks like Altera is touting a floating

point slice in at least one of their lines.

Is this really a thing, or are they wrapping some more familiar fixed-

point processing with IP to make it floating point?

And, anything else you know.

TIA.


--

Tim Wescott

Wescott Design Services


Re: Hardware floating point?

I'm not sure what you are asking. What do you think floating point is

exactly? The core of floating point is just fixed point arithmetic with

an extra bit (uh, rereading this I need to make clear this is the

British "bit" meaning part :) to express the exponent of a binary

multiplier. To perform addition or subtraction on floating point

numbers the mantissas need to be aligned, meaning the bits must be

lined up so they all have equal weight. This requires adjusting one of

the exponents so the two are equal while shifting that mantissa to match.

Then the addition can be done on the mantissa and the result adjusted

so the msb of the mantissa is in the correct alignment.

Multiplication is actually easier in that normalization is not required,

but exponents are added and the result is adjusted for correct alignment

of the mantissa.

So the heart of a floating point operation is a fixed point ALU with

barrel shifters before and after.
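The add path described above (align, add, renormalize) can be sketched in software. A minimal Python model, purely illustrative of the datapath: unsigned 24-bit mantissas, no rounding, no special cases:

```python
def fp_add(m1, e1, m2, e2):
    """Add two positive floats given as (mantissa, exponent) pairs,
    value = m * 2**e.  Mantissas are normalized 24-bit integers."""
    # Align: shift the smaller-exponent mantissa right until the
    # exponents match (the first barrel-shifter step).
    if e1 < e2:
        m1, e1, m2, e2 = m2, e2, m1, e1
    m2 >>= (e1 - e2)
    # Fixed-point add on the aligned mantissas.
    m = m1 + m2
    e = e1
    # Renormalize: keep the mantissa within 24 bits (the second
    # barrel-shifter step), adjusting the exponent to compensate.
    while m >= (1 << 24):
        m >>= 1
        e += 1
    return m, e

# 1.5 (mantissa 3<<22, exp -23) + 0.5 (mantissa 1<<23, exp -24) = 2.0
m, e = fp_add(3 << 22, -23, 1 << 23, -24)
```

The two shifts around the integer add are exactly the "barrel shifters before and after" of the post.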

--

Rick C


Re: Hardware floating point?

Not sure what your point is. The principles are the same in software or

hardware. I was describing hardware I have worked on. ST-100 from Star

Technologies. I became very intimate with the inner workings.

The only complications are from the various error and special case

handling of the IEEE-754 format. I doubt the FPGA is implementing that,

but possibly. The basics are still the same. Adds use a barrel shifter

to denormalize the mantissa so the exponents are equal, an integer adder

and a normalization barrel shifter to produce the result. Multiplies

use a multiplier for the mantissas and an adder for the exponents (with

adjustment for exponent bias) followed by a simple shifter to normalize

the result.

Both add and multiply are about the same level of complexity, as a barrel

shifter is almost as much logic as the multiplier.

Other than the special case handling of IEEE-754, what do you think I am

missing?
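The multiply path described here (multiply mantissas, add biased exponents, one normalizing shift) can likewise be sketched in software. A toy Python model with single-precision parameters, truncating where real hardware would round, and ignoring special cases:

```python
BIAS = 127  # IEEE-754 single-precision exponent bias

def fp_mul(m1, be1, m2, be2):
    """Multiply two positive floats given as (mantissa, biased exponent).
    Mantissas are 24-bit integers with the implicit leading 1 explicit,
    so 1.0 <= m / 2**23 < 2.0."""
    # Integer multiply of the mantissas: a 24x24 -> 48-bit product.
    m = m1 * m2
    # Add the exponents; subtract the bias once so the result
    # stays in biased form.
    be = be1 + be2 - BIAS
    # The product of two normalized mantissas lies in [1.0, 4.0),
    # so at most one normalizing shift is needed.
    if m >= (1 << 47):
        m >>= 1
        be += 1
    # Drop the low 23 bits to return to a 24-bit mantissa (truncation).
    return m >> 23, be

# 1.5 * 2.0 = 3.0: mantissa 0xC00000 (1.5) with biased exp 127 (2**0),
# times mantissa 0x800000 (1.0) with biased exp 128 (2**1).
m, be = fp_mul(0xC00000, 127, 0x800000, 128)
```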

--

Rick C


Re: Hardware floating point?

Altera claims it IS IEEE-754 compliant, but it is surprisingly hard to find more detailed facts. And we all know how FPGA marketing works, so a bit of doubt is very understandable...

The best I could find is this:

http://www.bogdan-pasca.org/resources/publications/2015_langhammer_pasca_fp_dsp_block_architecture_for_fpgas.pdf

In short: It appears that infinities and NaNs are supported, however sub-normals are treated as 0 and only one rounding mode is supported...
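What "sub-normals are treated as 0" means can be illustrated with a bit-level model. The Python below (a hypothetical `flush_to_zero` helper, not anything from an Altera datasheet) detects a single-precision subnormal from its bit pattern and flushes it:

```python
import struct

def flush_to_zero(x):
    """Model a flush-to-zero FPU: any single-precision subnormal
    (biased exponent field == 0, nonzero mantissa field) becomes 0.0."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    exponent_field = (bits >> 23) & 0xFF
    mantissa_field = bits & 0x7FFFFF
    if exponent_field == 0 and mantissa_field != 0:
        return 0.0
    return x

smallest_normal = 2.0 ** -126   # exponent field 1: survives
subnormal = 2.0 ** -127         # exponent field 0: flushed to 0.0
```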

Somewhere there is a video which shows that using the floating-point DSPs cuts the LE-usage by about 90%, so if you need floating point, I think Arria/Stratix 10 are really the best way to go...

Regards,

Thomas

www.entner-electronics.com - Home of EEBlaster and JPEG-Codec

Re: Hardware floating point?

On 1/25/2017 9:15 PM, snipped-for-privacy@gmail.com wrote:

That video may be for the *entire* floating point unit in the fabric.

Most FPGAs have dedicated integer multipliers which can be used for both

the multiplier and the barrel shifters in a floating point ALU. The

adders and random logic would need to be in the fabric, but will be

*much* smaller.
--

Rick C


Re: Hardware floating point?


Xilinx and Altera both support "DSP blocks" that do a multiply and add

(they say multiply and accumulate, but it's more versatile than that).

According to the above paper, Altera has paired up their DSP blocks and

added logic to each pair so that they become a floating-point arithmetic

block. Personally I think that for most "regular" DSP uses you're going

to know the range of the incoming data and will, therefore, only need

fixed-point -- but it looks like they're chasing the "FPGA as a

supercomputer" market (hence, the purchase by Intel), and for that you

need floating point just as a selling point.

--

Tim Wescott

Control systems, embedded software and circuit design


Re: Hardware floating point?

On 1/26/2017 11:19 AM, Tim Wescott wrote:

If you look around I think you will find many uses for floating point in

the DSP market. It's not just a selling gimmick. I don't think the

many floating point DSP devices are sold because they look good in the

product's spec sheet.

Heck back in the day when DSP was done on mainframes the hot rods of

computing were all floating point. Cray-1, ST-100...


--

Rick C


Re: Hardware floating point?

On Thursday, January 26, 2017 at 1:59:09 PM UTC-5, rickman wrote:

I am attempting to design a 40-bit single and 80-bit double hardware-

expressed form of an n-bit floating point "unum" (universal number)

engine, as per the design by John Gustafson:

http://ubiquity.acm.org/article.cfm?id30%01758

I intend an FPU, and 4x vector FPU for SIMD:

https://github.com/RickCHodgin/libsf/blob/master/arxoda/core/fpu/fpu.png

https://github.com/RickCHodgin/libsf/blob/master/arxoda/core/fpu_vector/fpu_vector.png

In my Arxoda CPU (design still in progress):

https://github.com/RickCHodgin/libsf/blob/master/arxoda/core/overall_design.png

Thank you,

Rick C. Hodgin


Re: Hardware floating point?

On Thursday, January 26, 2017 at 12:59:09 PM UTC-6, rickman wrote:

There was a rule of thumb in voice compression that floating point DSP took a third fewer operations than fixed point DSP. Plus probably faster code development not having to keep track of the scaling.
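The scaling bookkeeping mentioned above is easy to illustrate. A toy Python sketch of Q15 fixed point (hypothetical helper names): every multiply yields a Q30 raw product that the programmer must remember to shift back down, a step floating point hardware performs automatically:

```python
# Q15 fixed point: value = raw / 2**15, with raw held in 16 bits.
def q15(x):
    """Convert a real value to its Q15 raw integer."""
    return round(x * (1 << 15))

def q15_mul(a, b):
    """Multiply two Q15 values. The raw product is in Q30 format, so
    it must be shifted back down by 15 -- and the programmer has to
    track where the binary point sits after every such operation."""
    return (a * b) >> 15

a = q15(0.5)
b = q15(0.25)
c = q15_mul(a, b)   # Q15 raw value for 0.125
```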

Jim Brakefield


Re: Hardware floating point?

It just all works better with dedicated hardware. Finding the leading one

for normalization is somewhat slow in the FPGA and is something that

benefits from dedicated hardware. Using a DSP48 (if we're talking about

Xilinx) for a barrel shifter is fairly fast, but requires 3 cycles of

latency, can only shift up to 18 bits, and is overkill for the task.

You're using a full multiplier as a shifter; a dedicated shifter would be

smaller and faster.

All this stuff adds latency. When I pull up CoreGen and ask for the basic

FP adder, I get something that uses only 2 DSP48s but has 12 cycles of

latency. And there is a lot of fabric routing, so timing is not very

deterministic.

Re: Hardware floating point?

I'm not sure how much you know about multipliers and shifters.

Multipliers are not magical. Multiplexers *are* big. A multiplier has

N stages with a one bit adder at every bit position. A barrel

multiplexer has nearly as many bit positions (you typically don't need

all the possible outputs), but uses a bit less logic at each position.

Each bit position still needs a full 4 input LUT. Not tons of

difference in complexity.

The multipliers I've seen have selectable latency down to 1 clock.

Rolling a barrel shifter will generate many layers of logic that will

need to be pipelined as well to reach high speeds, likely many more

layers for the same speeds.

What do you get if you design a floating point adder in the fabric? I

can only imagine it will be *much* larger and slower.

--

Rick C


Re: Hardware floating point?

On 27/01/17 05:39, rickman wrote:

A 32-bit barrel shifter can be made with 5 steps, each step being a set

of 32 two-input multiplexers. Dedicated hardware for that will be

/much/ smaller and more efficient than using LUTs or a full multiplier.

Normalisation of FP results also requires a "find first 1" operation.

Again, dedicated hardware is going to be a lot smaller and more

efficient than using LUTs.

So a DSP block that has dedicated FP support is going to be smaller and

faster than using integer DSP blocks with LUTs to do the same job.

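The "find first 1" step has the same log2(n) structure as the barrel shifter: a binary search over halves of the word, one stage per bit of the result. A Python sketch of that structure (a software stand-in for the hardware; the function name is illustrative):

```python
def count_leading_zeros(x, width=32):
    """Hardware-style "find first 1" as a binary search: test whether
    the top half of the word is empty, record the zeros, shift the
    live half up, and repeat -- 5 stages for a 32-bit word."""
    if x == 0:
        return width
    n = 0
    half = width // 2
    while half:
        # If the top `half` bits are all zero, the leading 1 is in the
        # bottom half: count those zeros and shift it up into view.
        if x < (1 << (width - half)):
            n += half
            x <<= half
        half //= 2
    return n
```

In fabric each stage is a wide-OR test plus a multiplexer, which is why David's point stands: dedicated hardware for it is much cheaper than building the same tree out of LUTs.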

Re: Hardware floating point?

If I understand, you can do a barrel shifter with log2(n) complexity, hence

your 5 steps, but you will have the combinational delays of 5 muxes, which

could limit your maximum clock frequency. A brute force approach will use

more resources but will probably allow a higher clock frequency.

Re: Hardware floating point?

On 27/01/17 16:12, Benjamin Couillard wrote:

The "brute force" method would be 1 layer of 32 32-input multiplexers.

And how do you implement a 32-input multiplexer in gates? You basically

have 5 layers of 2-input multiplexers.

If the depth of the multiplexer is high enough, you might use tri-state

gates but I suspect that in this case you'd implement it with normal logic.


Re: Hardware floating point?

On 1/27/2017 11:33 AM, David Brown wrote:

A barrel shifter is simpler than that. I believe that, in a method

somewhat parallel to computing an FFT, the terms in a barrel shifter can

be shared to allow this. (pseudo VHDL)

    -- 32-bit left barrel shifter: five conditional stages shifting
    -- by 1, 2, 4, 8 and 16, one stage per bit of the shift count.
    function barrel_shift (indata : unsigned(31 downto 0);
                           sel    : unsigned(4 downto 0))
             return unsigned is
       variable a, b, c, d, e : unsigned(31 downto 0);
    begin
       a := indata(30 downto 0) & '0'              when sel(0) else indata;
       b := a(29 downto 0) & "00"                  when sel(1) else a;
       c := b(27 downto 0) & "0000"                when sel(2) else b;
       d := c(23 downto 0) & "00000000"            when sel(3) else c;
       e := d(15 downto 0) & "0000000000000000"    when sel(4) else d;
       return e;
    end;


--

Rick C


Re: Hardware floating point?

On 1/27/2017 3:17 AM, David Brown wrote:

Yes, I stand corrected. Still, it is hardly a "waste" of multipliers to

use them for multiplexers.

Find first 1 can be done using a carry chain which is quite fast. It is

the same function as used in Gray code operations.

Who said it wouldn't be? I say exactly that below. My point was just

that floating point isn't too hard to wrap your head around and not so

horribly different from fixed point. You just need to stick a few

functions onto a fixed point multiplier/adder.

I was responding to:

"Is this really a thing, or are they wrapping some more familiar fixed-

point processing with IP to make it floating point?"

The differences between fixed and floating point operations require a few

functions beyond the basic integer operations which we have been

discussing. Floating point is not magic or incredibly hard to do. It

has not been included on FPGAs up until now because the primary market

is integer based.

Some 15 years ago I discussed the need for hard IP in FPGAs and was told

by certain Xilinx employees that it isn't practical to include hard IP

because of the proliferation of combinations and wasted resources that

result. The trouble is the ratio of silicon area required for hard IP

vs. FPGA fabric gets worse with each larger generation. So as we see

now FPGAs are including all manner of function blocks... like other

devices.

What I don't get is why FPGAs are so special that they are the last hold

out of becoming system on chip devices.


--

Rick C

