Hardware floating point?

So, just doing a brief search, it looks like Altera is touting a floating  
point slice in at least one of their lines.

Is this really a thing, or are they wrapping some more familiar fixed-
point processing with IP to make it floating point?

And, anything else you know.

TIA.

--  

Tim Wescott
Wescott Design Services
Re: Hardware floating point?
On 1/24/2017 11:59 PM, Tim Wescott wrote:

I'm not sure what you are asking.  What do you think floating point is
exactly?  The core of floating point is just fixed point arithmetic with
an extra bit (uh, rereading this I need to make clear this is the
British "bit" meaning part :) to express the exponent of a binary
multiplier.  To perform addition or subtraction on floating point
numbers, the mantissas need to be aligned, meaning the bits must be
lined up so they have equal weight.  This requires adjusting one of
the exponents so the two are equal while shifting that mantissa to match.
Then the addition can be done on the mantissas and the result adjusted
so the msb of the mantissa is in the correct position.

Multiplication is actually easier in that normalization is not required,  
but exponents are added and the result is adjusted for correct alignment  
of the mantissa.

So the heart of a floating point operation is a fixed point ALU with  
barrel shifters before and after.
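
In VHDL terms, a minimal sketch of that datapath might look like this,
assuming positive, normalized IEEE-754 singles and ignoring rounding and
the special cases (the function and variable names are just for
illustration):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

function fp_add (a, b : std_logic_vector(31 downto 0))
      return std_logic_vector is
   variable ea, eb : unsigned(7 downto 0);
   variable ma, mb : unsigned(24 downto 0);  -- carry + hidden 1 + 23 bits
   variable sum    : unsigned(24 downto 0);
begin
   ea := unsigned(a(30 downto 23));
   eb := unsigned(b(30 downto 23));
   ma := "01" & unsigned(a(22 downto 0));    -- restore the hidden 1
   mb := "01" & unsigned(b(22 downto 0));
   -- align: barrel shift the smaller operand's mantissa right
   if ea >= eb then
      mb := shift_right(mb, to_integer(ea - eb));
   else
      ma := shift_right(ma, to_integer(eb - ea));
      ea := eb;
   end if;
   sum := ma + mb;                           -- plain fixed point add
   -- normalize: a carry out of the hidden-1 position bumps the exponent
   if sum(24) = '1' then
      sum := shift_right(sum, 1);
      ea  := ea + 1;
   end if;
   return '0' & std_logic_vector(ea) & std_logic_vector(sum(22 downto 0));
end function;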

--  

Rick C

Re: Hardware floating point?

I think you oversimplify FP.  It works a lot better with dedicated hardware.  

Re: Hardware floating point?
On 1/25/2017 5:07 PM, Kevin Neilson wrote:

Not sure what your point is.  The principles are the same in software or  
hardware.  I was describing hardware I have worked on.  ST-100 from Star  
Technologies.  I became very intimate with the inner workings.

The only complications are from the various error and special case
handling of the IEEE-754 format.  I doubt the FPGA is implementing that,
but possibly.  The basics are still the same.  Adds use a barrel shifter
to denormalize one mantissa so the exponents are equal, an integer adder,
and a normalization barrel shifter to produce the result.  Multiplies
use a multiplier for the mantissas and an adder for the exponents (with
adjustment for exponent bias), followed by a simple shifter to normalize
the result.
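
For the multiply path, a sketch along the same lines (same caveats:
normalized singles, truncation instead of rounding, no special cases,
illustrative names):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

function fp_mul (a, b : std_logic_vector(31 downto 0))
      return std_logic_vector is
   variable ma, mb : unsigned(23 downto 0);  -- hidden 1 + 23 bits
   variable prod   : unsigned(47 downto 0);
   variable e      : unsigned(8 downto 0);   -- spare bit for the carry
begin
   ma := '1' & unsigned(a(22 downto 0));
   mb := '1' & unsigned(b(22 downto 0));
   prod := ma * mb;                          -- 1.f * 1.f lands in [1, 4)
   -- add the exponents and subtract the bias
   e := ('0' & unsigned(a(30 downto 23)))
      + ('0' & unsigned(b(30 downto 23))) - 127;
   if prod(47) = '1' then                    -- product in [2, 4): normalize
      prod := shift_right(prod, 1);
      e := e + 1;
   end if;
   return (a(31) xor b(31)) & std_logic_vector(e(7 downto 0))
          & std_logic_vector(prod(45 downto 23));
end function;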

Both add and multiply are about the same level of complexity, since a
barrel shifter is almost as much logic as a multiplier.

Other than the special case handling of IEEE-754, what do you think I am  
missing?

--  

Rick C

Re: Hardware floating point?


Altera claims it IS IEEE-754 compliant, but it is surprisingly hard to find any more detailed facts.  And we all know how FPGA marketing works, so a bit of doubt is very understandable...

The best I could find is this:
http://www.bogdan-pasca.org/resources/publications/2015_langhammer_pasca_fp_dsp_block_architecture_for_fpgas.pdf

In short: It appears that infinities and NaNs are supported; however, subnormals are treated as 0 and only one rounding mode is supported...

Somewhere there is a video which shows that using the floating-point DSPs cuts the LE usage by about 90%, so if you need floating point, I think Arria/Stratix 10 are really the best way to go...

Regards,

Thomas

www.entner-electronics.com - Home of EEBlaster and JPEG-Codec

Re: Hardware floating point?
On 1/25/2017 9:15 PM, snipped-for-privacy@gmail.com wrote:

That video may be for the *entire* floating point unit in the fabric.  
Most FPGAs have dedicated integer multipliers which can be used for both  
the multiplier and the barrel shifters in a floating point ALU.  The  
adders and random logic would need to be in the fabric, but will be  
*much* smaller.
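
The multiplier-as-shifter trick, roughly (a sketch sized to an 18-bit
operand, the common DSP-block width; the function name is mine):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Multiplying by a one-hot 2**k shifts x left by k in one hard multiplier.
function shift_via_multiplier (x : unsigned(17 downto 0);
                               k : natural range 0 to 17)
      return unsigned is
   variable one_hot : unsigned(17 downto 0) := (others => '0');
begin
   one_hot(k) := '1';            -- the constant 2**k
   return x * one_hot;           -- 36-bit product = x shifted left by k
end function;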

--  

Rick C

Re: Hardware floating point?
On Thu, 26 Jan 2017 01:10:14 -0500, rickman wrote:


Xilinx and Altera both support "DSP blocks" that do a multiply and add  
(they say multiply and accumulate, but it's more versatile than that).

According to the above paper, Altera has paired up their DSP blocks and  
added logic to each pair so that they become a floating-point arithmetic  
block.  Personally I think that for most "regular" DSP uses you're going  
to know the range of the incoming data and will, therefore, only need
fixed-point -- but it looks like they're chasing the "FPGA as a  
supercomputer" market (hence, the purchase by Intel), and for that you  
need floating point just as a selling point.

--  
Tim Wescott
Control systems, embedded software and circuit design
Re: Hardware floating point?
On 1/26/2017 11:19 AM, Tim Wescott wrote:

If you look around I think you will find many uses for floating point in
the DSP market.  It's not just a selling gimmick.  I don't think the
many floating point DSP devices sell just because they look good on a
product's spec sheet.

Heck, back in the day when DSP was done on mainframes, the hot rods of
computing were all floating point: Cray-1, ST-100...

--  

Rick C

Re: Hardware floating point?
On Thursday, January 26, 2017 at 1:59:09 PM UTC-5, rickman wrote:

I am attempting to design a 40-bit single and 80-bit double hardware-
expressed form of an n-bit floating point "unum" (universal number)
engine, as per the design by John Gustafson:

    http://ubiquity.acm.org/article.cfm?id30%01758

I intend an FPU, and a 4x vector FPU for SIMD:

    https://github.com/RickCHodgin/libsf/blob/master/arxoda/core/fpu/fpu.png
    https://github.com/RickCHodgin/libsf/blob/master/arxoda/core/fpu_vector/fpu_vector.png

In my Arxoda CPU (design still in progress):

    https://github.com/RickCHodgin/libsf/blob/master/arxoda/core/overall_design.png

Thank you,
Rick C. Hodgin

Re: Hardware floating point?
On Thursday, January 26, 2017 at 12:59:09 PM UTC-6, rickman wrote:

]> If you look around I think you will find many uses for floating point in
]> the DSP market.

There was a rule of thumb in voice compression that floating point DSP took a third fewer operations than fixed point DSP.  Plus probably faster code development, not having to keep track of the scaling.

Jim Brakefield

Re: Hardware floating point?

It just all works better with dedicated hardware.  Finding the leading
one for normalization is somewhat slow in the FPGA and is something that
benefits from dedicated hardware.  Using a DSP48 (if we're talking about
Xilinx) for a barrel shifter is fairly fast, but requires 3 cycles of
latency, can only shift up to 18 bits, and is overkill for the task.
You're using a full multiplier as a shifter; a dedicated shifter would
be smaller and faster.  All this stuff adds latency.  When I pull up
CoreGen and ask for the basic FP adder, I get something that uses only
2 DSP48s but has 12 cycles of latency.  And there is a lot of fabric
routing, so timing is not very deterministic.

Re: Hardware floating point?
On 1/26/2017 9:38 PM, Kevin Neilson wrote:

I'm not sure how much you know about multipliers and shifters.  
Multipliers are not magical.  Multiplexers *are* big.  A multiplier has  
N stages with a one bit adder at every bit position.  A barrel  
multiplexer has nearly as many bit positions (you typically don't need  
all the possible outputs), but uses a bit less logic at each position.  
Each bit position still needs a full 4 input LUT.  Not tons of  
difference in complexity.

The multipliers I've seen have selectable latency down to 1 clock.
Rolling your own barrel shifter will generate many layers of logic that
will also need to be pipelined to reach high speeds, likely many more
layers for the same speed.

What do you get if you design a floating point adder in the fabric?  I  
can only imagine it will be *much* larger and slower.

--  

Rick C

Re: Hardware floating point?
On 27/01/17 05:39, rickman wrote:

A 32-bit barrel shifter can be made with 5 steps, each step being a set
of 32 two-input multiplexers.  Dedicated hardware for that will be
/much/ smaller and more efficient than using LUTs or a full multiplier.

Normalisation of FP results also requires a "find first 1" operation.
Again, dedicated hardware is going to be a lot smaller and more
efficient than using LUTs.
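
For reference, a behavioural sketch of "find first 1" - synthesis turns
this priority chain into the sort of structure a dedicated block
hard-wires:

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Count leading zeros, i.e. how far to left-shift when normalising.
function count_leading_zeros (x : unsigned(31 downto 0)) return natural is
begin
   for i in 31 downto 0 loop
      if x(i) = '1' then
         return 31 - i;
      end if;
   end loop;
   return 32;                    -- x was all zeros
end function;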

So a DSP block that has dedicated FP support is going to be smaller and
faster than using integer DSP blocks with LUTs to do the same job.


Re: Hardware floating point?


If I understand, you can do a barrel shifter with log2(n) complexity,
hence your 5 steps, but you will have the combinational delays of 5
muxes, which could limit your maximum clock frequency.  A brute force
approach will use more resources but will probably allow a higher clock
frequency.


Re: Hardware floating point?
On 27/01/17 16:12, Benjamin Couillard wrote:


The "brute force" method would be 1 layer of 32 32-input multiplexers.
And how do you implement a 32-input multiplexer in gates?  You basically
have 5 layers of 2-input multiplexers.

If the depth of the multiplexer is high enough, you might use tri-state
gates but I suspect that in this case you'd implement it with normal logic.

Re: Hardware floating point?
On 1/27/2017 11:33 AM, David Brown wrote:

A barrel shifter is simpler than that.  I believe that, in a manner
somewhat parallel to the way an FFT shares terms between butterflies,
the stages in a barrel shifter can be shared to allow this.  (Roughly,
in VHDL-2008:)


function shift_left_32 (indata : unsigned(31 downto 0);
                        sel    : unsigned(4 downto 0))
      return unsigned is
   variable a, b, c, d, e : unsigned(31 downto 0);
begin
   -- each stage shifts by a power of two when its select bit is set
   a := indata(30 downto 0) & '0' when sel(0) = '1' else indata;
   b := a(29 downto 0) & "00"     when sel(1) = '1' else a;
   c := b(27 downto 0) & "0000"   when sel(2) = '1' else b;
   d := c(23 downto 0) & x"00"    when sel(3) = '1' else c;
   e := d(15 downto 0) & x"0000"  when sel(4) = '1' else d;

   return e;
end function;
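
Each stage is just a row of 2:1 muxes picking between its input and a
fixed power-of-two shift of it, so the five stages compose any shift
from 0 to 31.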

--  

Rick C

Re: Hardware floating point?

Yeah, you're right.

Re: Hardware floating point?
On 1/27/2017 10:12 AM, Benjamin Couillard wrote:


Technically N log(N).

--  

Rick C

Re: Hardware floating point?

Yep, true.  Thanks for the clarification.

Re: Hardware floating point?
On 1/27/2017 3:17 AM, David Brown wrote:

Yes, I stand corrected.  Still, it is hardly a "waste" of multipliers to  
use them for multiplexers.



Find first 1 can be done using a carry chain, which is quite fast.  It is
the same function as used in Gray code operations.
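
A sketch of the trick (the "+ 1" rides the fast carry chain; to find the
leading one for normalization you would bit-reverse the word first):

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Two's complement identity: x AND (-x) isolates the lowest set bit.
function isolate_lowest_one (x : unsigned) return unsigned is
begin
   return x and ((not x) + 1);
end function;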



Who said it wouldn't be?  I say exactly that below.  My point was just  
that floating point isn't too hard to wrap your head around and not so  
horribly different from fixed point.  You just need to stick a few  
functions onto a fixed point multiplier/adder.

I was responding to:

"Is this really a thing, or are they wrapping some more familiar fixed-
point processing with IP to make it floating point?"

The difference between fixed and floating point operations requires a
few functions beyond the basic integer operations which we have been
discussing.  Floating point is not magic or incredibly hard to do.  It
has not been included on FPGAs up until now because the primary market
is integer based.

Some 15 years ago I discussed the need for hard IP in FPGAs and was told
by certain Xilinx employees that it isn't practical to include hard IP
because of the proliferation of combinations and wasted resources that
result.  The trouble is that the ratio of silicon area required for hard
IP vs. FPGA fabric gets worse with each larger generation.  So, as we see
now, FPGAs are including all manner of function blocks... like other
devices.

What I don't get is why FPGAs are so special that they are the last
holdout against becoming system-on-chip devices.



--  

Rick C
