Xilinx floating point core 1.0

kl31n 20 years ago

I'm having a hard time understanding what's wrong with the Xilinx floating-point core included in the latest IP update for LogiCORE.

My design requires me to acquire data from an ADC and then, after some processing, to divide a pair of floating-point numbers every 200ns.

The performance of a single core isn't high enough, so I implemented a block that feeds several dividers (built with the Xilinx core) and then reserializes the results.

The design works fine as long as I pass numbers with a period down to 260ns; at lower periods the results get weird: the mantissa is correct, but the exponent is always stuck at 00111111, whatever it's supposed to be.
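One way to check a symptom like this in simulation output is to split each result word into its fields and compare them against the expected quotient. The following is only a minimal sketch: it assumes the core is configured for IEEE 754 single precision (1 sign bit, 8 exponent bits, 23 mantissa bits), which the post does not state, and the helper names are hypothetical.

```python
import struct

def split_float32(word: int):
    """Split a 32-bit word into (sign, biased exponent, mantissa).

    Assumes IEEE 754 single-precision layout; the core's actual
    configuration is not given in the thread.
    """
    sign = (word >> 31) & 0x1
    exponent = (word >> 23) & 0xFF
    mantissa = word & 0x7FFFFF
    return sign, exponent, mantissa

def float_to_word(x: float) -> int:
    """Return the 32-bit single-precision encoding of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def compare_fields(expected: float, observed_word: int):
    """Report which fields of the observed word match the expected value."""
    e = split_float32(float_to_word(expected))
    o = split_float32(observed_word)
    return {
        "sign_ok": e[0] == o[0],
        "exponent_ok": e[1] == o[1],
        "mantissa_ok": e[2] == o[2],
    }

# Expected quotient 6.0 encodes as 0x40C00000. A word with the same
# mantissa but the exponent field forced to 0b00111111 is 0x1FC00000.
print(compare_fields(6.0, 0x1FC00000))
```

For that example word the sign and mantissa match while the exponent does not, which is the pattern described above.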

If anybody can offer some insight or even suggest a way of debugging, it would be much appreciated because at the moment I don't have any idea of what could be wrong.

Thanks in advance,

kl31n

m 20 years ago

Before thinking about debugging, have you done any timing verification? Look at your timing reports first. Even with the slowest divider (30-cycle latency) and device (around 200 MHz), it shouldn't take more than 150 ns. Are you aware that the divider has a relatively large latency (apparently 30 cycles maximum)? How did you configure your dividers, and what is your reported latency? Again, read your timing reports and pay attention to how you're feeding the dividers.

Mike Treseler 20 years ago

I would write a testbench and run a sim.

-- Mike Treseler

kl31n 20 years ago

On Tue, 13 Dec 2005 16:40:06 GMT, mk wrote:

The timing report tells me that I can run my block at 157 MHz, and I'm running it at 100 MHz, so the latency has to be 330 ns (the core, being in basic mode to take advantage of the handshaking, has a 33-cycle latency).

By parallelizing to n dividers I expected to achieve an effective latency of ceil(33 / n) * T, but I cannot get under 260ns and this with whatsoever the number of dividers.
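The expectation above can be checked with a small occupancy model. This is only a sketch under stated assumptions: each divider is treated as busy for the full 33 cycles after accepting an operand pair (no internal pipelining), operands are dispatched round-robin, and the function name and parameters are hypothetical rather than taken from the actual design.

```python
def round_robin_ok(latency_cycles, n_dividers, input_period_cycles, n_samples=100):
    """Return True if round-robin dispatch never hits a busy divider.

    Assumes each divider is occupied for latency_cycles after it accepts
    an input and that inputs arrive every input_period_cycles.
    """
    free_at = [0] * n_dividers  # cycle at which each divider is free again
    for i in range(n_samples):
        t = i * input_period_cycles
        unit = i % n_dividers
        if free_at[unit] > t:
            return False  # the input would arrive while the unit is busy
        free_at[unit] = t + latency_cycles
    return True

# With T = 10 ns (100 MHz), a 200 ns input period is 20 cycles.
print(round_robin_ok(33, 2, 20))  # two dividers, 200 ns period
print(round_robin_ok(33, 1, 26))  # one divider, 260 ns period
```

Under these assumptions two dividers keep up with a 20-cycle period (the model returns True), while a single divider cannot keep up even with a 26-cycle period (False). That agrees with ceil(33 / n) * T as a bound on the input period, not on the latency of any one result.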

Thanks,

kl31n

kl31n 20 years ago

That's what I did, but I don't have access to the Xilinx core's internals to really see what's going on and what I could do to fix the problem. The behavioural model always works as expected, but the post-translate simulation does not work correctly with data entering at a period shorter than 260 ns.

Thanks,

kl31n

Robin Bruce 20 years ago

Hi kl31n,

I'm a little confused about your post. You're saying you thought you could effectively reduce the latency by having multiple floating-point units? You could certainly increase the bandwidth by having multiple units, but the latency will remain 33 cycles. If latency is not a problem, then so long as you meet bandwidth requirements you'll be OK. With new data appearing at a rate of 5 MHz, and being consumed at a rate of 3.03n MHz, where n is the number of FPUs, n of 2 should be more than sufficient. I'm confused by your comment: "I cannot get under 260ns and this with whatsoever the number of dividers."
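The bandwidth argument can be written out as a short calculation. This is a sketch using the numbers from the thread (100 MHz clock, 33 cycles per divide, one input every 200 ns); the function name is hypothetical, and integer arithmetic is used to avoid floating-point rounding at exact boundaries.

```python
def dividers_needed(input_period_ns: int, clock_mhz: int, cycles_per_divide: int) -> int:
    """Smallest n such that n dividers consume data as fast as it arrives.

    One divider finishes a divide every cycles_per_divide / clock_mhz
    microseconds, i.e. cycles_per_divide * 1000 / clock_mhz nanoseconds.
    """
    busy_ns_times_mhz = cycles_per_divide * 1000   # busy time in ns, scaled by clock_mhz
    budget = clock_mhz * input_period_ns           # input period in ns, same scaling
    return -(-busy_ns_times_mhz // budget)         # integer ceiling division

print(dividers_needed(200, 100, 33))  # 5 MHz input vs 3.03 MHz per unit
print(dividers_needed(260, 100, 33))
```

Both calls give 2, so by this estimate the 260 ns limit is not explained by a shortage of dividers.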

If it doesn't work with 2 dividers, then you must have a problem that simply adding FPUs will never solve.

Sorry if this is exactly what you're saying, but the way you used the word latency worried me somewhat.

I'm guessing there's no way you could use a speed-optimised divide core? That way you could at least issue your inputs every 200ns, then pick up your outputs on the other side every 200ns, provided that the latency of the FPU is acceptable.

Cheers,

Robin

rhnlogic 20 years ago

Parallelizing units only shortens the time needed to complete multiple operations; each individual operation still has the same latency as with a single unit, no matter how many units are in parallel.

For instance, one divider would take 66 cycles for 2 divides (at an average rate of 1 divide per 33 cycles), but 2 (or even 500) dividers would take 33 cycles for 2 divides (for an average rate of 16.5 cycles per divide). But in either case you don't get the first divide result in less than 33 cycles.
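The numbers in that example follow from a one-line formula, sketched here under the assumption that all operands are available at time zero and that each unit handles one divide at a time (the function name is hypothetical):

```python
import math

def batch_cycles(n_divides: int, n_units: int, latency_cycles: int = 33) -> int:
    """Cycles until the last of n_divides results is ready.

    Assumes non-pipelined units and all operands ready at cycle 0, so the
    work proceeds in ceil(n_divides / n_units) rounds of latency_cycles each.
    """
    return math.ceil(n_divides / n_units) * latency_cycles

print(batch_cycles(2, 1))    # 66
print(batch_cycles(2, 2))    # 33
print(batch_cycles(2, 500))  # 33
```

Extra units beyond the number of pending divides do not help, and the first result is never ready before `latency_cycles` have elapsed.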

If anything, parallelizing units might actually increase the maximum single-operation latency, due to the wire load, muxing, buffering, or pipelining required to fan out operands and control signals to the multiple pipes and to fan in the results.

And running the clock faster than worst-case timing allows might only result in garbage.

IMHO. YMMV.

rhn A.T nicholson d.O.t C-o-M

Mike Treseler 20 years ago

Maybe that's just the way it is. You could open a case with Xilinx or use a different core or write your own code.

-- Mike Treseler

Ben Jones 20 years ago

Hi kl31n,

What exactly is your performance requirement? How often do you need to start a new divide operation? How long can you afford to wait for the result?

The Xilinx speed-optimized floating-point divider will run at well over 250MHz and allows you to initiate a new divide on every cycle (i.e. every 4ns) if you wish.

I'm afraid I don't understand what you mean by "pass numbers with a period down to 260ns". Could you explain your circuit to me?

-Ben-
