Virtex 4 not meeting timing constraints

Hi,

I have a design for a Virtex 4 SX35-10 that is not meeting my timing constraints. The only constraint, set in the UCF file, is a clock period of 4.75 ns (about 210 MHz). Synthesis gives the following:

Timing Summary:
---------------
Speed Grade: -10

Minimum period: 7.680ns (Maximum Frequency: 130.213MHz)
Minimum input arrival time before clock: 1.890ns
Maximum output required time after clock: 5.810ns
Maximum combinational path delay: 0.000ns

Running a post-map static timing analysis gives the following as the first error (place and route fails):

Source: uut1/overlapadd1/fifo1/BU2/U0/ss/memblk/fifo_generator_v2_2_fifo_generator_v2_2_xst_1_coreinst/fifo_generator_v2_2_fifo_generator_v2_2_xst_1_blkmemdp_v6_2_xst/bm/mem/arch_v2/prim/4/b1/chk0/col/0/b2/mextd/arch_v2/c1/ram1/v2/d4096/by4/newSim8/RAMB16 (RAM)
Destination: uut1/overlapadd1/f2_data_in_sig_0_BRB2 (FF)
Requirement: 4.750ns
Data Path Delay: 5.522ns (Levels of Logic = 1)
Clock Path Skew: 0.000ns
Source Clock: fast_clk rising at 0.000ns
Destination Clock: fast_clk rising at 4.750ns
Clock Uncertainty: 0.060ns
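A quick sanity check of the numbers in this report: the sketch below recomputes the setup slack for the failing path from the reported requirement, data path delay, skew and uncertainty. It assumes the slack form that ISE timing reports print for setup checks, requirement - (data path - clock path skew + uncertainty); the function names are illustrative only and are not part of any Xilinx tool.

```python
def setup_slack_ns(requirement, data_path, clock_skew, uncertainty):
    """Setup slack in ns (assumed ISE-style formula): positive meets timing, negative fails."""
    return requirement - (data_path - clock_skew + uncertainty)


def period_to_mhz(period_ns):
    """Convert a clock period in ns to a frequency in MHz."""
    return 1000.0 / period_ns


# Numbers copied from the failing-path report above.
slack = setup_slack_ns(requirement=4.750, data_path=5.522,
                       clock_skew=0.000, uncertainty=0.060)
print(f"slack: {slack:.3f} ns")                   # slack: -0.832 ns
print(f"target: {period_to_mhz(4.750):.1f} MHz")  # target: 210.5 MHz
# Shortest period this path alone could meet with zero skew: data path + uncertainty.
print(f"path limit: {period_to_mhz(5.522 + 0.060):.1f} MHz")  # path limit: 179.1 MHz
```

So this single path misses by roughly 0.83 ns and, as placed, would only close at around 179 MHz.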

Does the post-map report include estimates of routing delays? Can I constrain XST to produce better results, and if so, how? Is 210 MHz too fast for this speed grade? Running XST with higher effort does not seem to help.

Thanks

Scott Bekker


I'm not sure whether the post-map report includes routing delay estimates. You said place and route failed; why? Were there unrouted nets, or did timing fail?

As for getting better results from XST: there is a switch to optimize for speed or area (in the XST synthesis options, I think), which should be set to speed. But to get significant gains, you need to understand what the failing path is and how it fits into your design. Can you relate the source and destination names in that timing report back to the corresponding RAM and FF in your source code? A typical design usually has many paths that are never really exercised at full clock speed, or maybe not at all. If you are lucky, the failing path is in this category. You might read up on the multicycle path and timing ignore (TIG) constraints.

If the failing path really needs to run that fast, you can use tricks like pipelining to break a large, slow operation into several smaller, faster ones. Hmm, I just noticed the failing path has only one level of logic, so pipelining probably won't help.

As for whether 210 MHz is too fast: it depends entirely on the particulars of your design. A small state machine is probably no problem; a 64-bit, non-pipelined, single-cycle accumulator is probably too slow.

It looks like the source of your failing path is the output of a FIFO's block RAM (the RAMB16 in the path name). IIRC, the clock-to-data-out delay of a block RAM is significantly larger than that of a flip-flop. If that's the case, maybe you can pull a trick like making the FIFO output twice as wide and feeding it as a 2-cycle path into a flip-flop-based mux that can run at full clock speed. In other words, if you transfer twice as much data per read, you can take two clocks to do it, so the read path effectively only has to run at 105 MHz.
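A behavioral sketch of the double-width trick, assuming (hypothetically) that each wide FIFO word packs two consecutive samples as a tuple. The FIFO is read only on every other fast-clock cycle, and a toggle flip-flop drives a 2:1 mux that emits one sample per cycle. This models cycle-level behavior only, not delays; in a real design the 2-cycle read path would also need a matching multicycle constraint. The names `pack_pairs` and `serialize` are made up for illustration.

```python
def pack_pairs(samples):
    """Pack consecutive samples into 2-wide words, as a double-width FIFO would hold them."""
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    return [(samples[i], samples[i + 1]) for i in range(0, len(samples), 2)]


def serialize(wide_words):
    """Emit one narrow sample per fast clock from a FIFO read every other clock.

    Returns (samples_out, fifo_reads). Because each wide word is held for two
    cycles, the FIFO read path gets two fast-clock periods instead of one.
    """
    out = []
    reads = 0
    hold = None                               # register holding the current wide word
    for cycle in range(2 * len(wide_words)):
        phase = cycle % 2                     # toggle flip-flop: 0, 1, 0, 1, ...
        if phase == 0:
            hold = wide_words[cycle // 2]     # wide FIFO read, every other cycle only
            reads += 1
        out.append(hold[phase])               # 2:1 mux selects the low or high half
    return out, reads


samples = [10, 11, 12, 13, 14, 15]
out, reads = serialize(pack_pairs(samples))
print(out, reads)  # [10, 11, 12, 13, 14, 15] 3
```

Six samples come out in six fast cycles from only three FIFO reads, so at a 210 MHz fast clock the FIFO side only has to sustain 105 MHz.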

-Jeff

Jeff Cunningham


210 MHz is apparently too fast for YOUR DESIGN in this speed grade. A Virtex-4 of any speed grade is capable of considerably faster clocking, but you need to be somewhat careful in the design. I am currently working on a floating-point FFT design for an XC4VSX55-10 that is clocked at 400 MHz.

If you look at the .twr timing report instead of the one that comes up in the GUI, it gives more detail on the failing path, including an element-by-element breakdown of the path and the location of each element. Since there is only one level of logic, I am guessing that this failing path is sourced by a block RAM that does not have its output register enabled, that the destination has a LUT in front of the flip-flop, and that the flip-flop is probably not located immediately adjacent to the BRAM. You'll want to increase, or at least modify, the pipelining to improve performance, and turn on the BRAM's output register (the clock-to-out of the BRAM is rather long without it).
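A cycle-level sketch of what enabling the BRAM output register does, using plain Python lists as stand-ins for per-cycle signal values (all names and RAM contents are hypothetical). The data stream arrives one clock later, so any control signal that travels alongside it, such as a valid flag, has to be delayed by the same amount. The benefit, which this model does not capture, is that the next path starts from a fast flip-flop clock-to-out instead of the long BRAM array access.

```python
def delay(stream, cycles, fill=None):
    """Shift a per-cycle stream later by `cycles` clocks, keeping its length (a register chain)."""
    return ([fill] * cycles + list(stream))[:len(stream)]


def bram_dout(mem, addrs, output_register):
    """Per-cycle value on a BRAM data output for a sequence of read addresses.

    Without the output register, the data for the address captured on a clock
    edge appears (after the long array clock-to-out) in that same cycle. With
    the register, the same data shows up one cycle later, straight from a flip-flop.
    """
    raw = [mem[a] for a in addrs]
    return delay(raw, 1) if output_register else raw


mem = {0: 0xA, 1: 0xB, 2: 0xC, 3: 0xD}   # hypothetical RAM contents
addrs = [0, 1, 2, 3]
rd_valid = [1, 1, 1, 1]                  # hypothetical per-read valid flag

dout = bram_dout(mem, addrs, output_register=True)
valid = delay(rd_valid, 1, fill=0)       # keep the flag aligned with the data
print(dout)   # [None, 10, 11, 12]
print(valid)  # [0, 1, 1, 1]
```

Forgetting to delay the control signals to match is a common side effect of adding a pipeline register like this one.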

Ray Andraka

Thanks for the help, Ray. I added the register after the block RAM, and that fixed that timing error. I then had more timing errors in a CoreGen FFT core; the design was running significantly slower than the data sheet specified. After a lot of playing around with tool settings, I finally found the problem: CoreGen showed the correct device at the bottom of the main GUI page, but the device setting in the options was set to Spartan-3. I corrected the setting, and now my design makes timing with default settings for all implementation tools. I think there is probably room for improvement as well.

Thanks again.

Scott

Scott Bekker


Glad to have been of help. As I indicated, with some diligence you can get the slow speed grade V4SX (-10) to run at 400 MHz, which is the maximum clock rate of the BRAMs and DSP48s when fully pipelined. The fabric, with the exception of the carry chains, can run considerably faster. The carry chains are limited to about 10 bits at 400 MHz, which is a shame.
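A carry chain limit of about 10 bits at 400 MHz is why wide adders are often split into short registered segments (a carry-pipelined adder). The sketch below is a behavioral model of that technique, not a description of Ray's design; the 64-bit width echoes the accumulator example earlier in the thread, and the 10-bit segment size is Ray's rough figure, both used only as illustrative defaults. A 64-bit add becomes seven stages (six of 10 bits and one of 4), trading extra cycles of latency for one result per clock with no carry chain longer than a segment.

```python
def carry_pipelined_add(a, b, width=64, chunk=10):
    """Add two unsigned `width`-bit values in segments of at most `chunk` bits.

    Each loop iteration stands for one pipeline stage: it adds one segment of
    the operands plus the carry registered by the previous stage, so no stage
    contains a carry chain longer than `chunk` bits. In hardware the operand
    segments are skewed through delay registers so each lines up with its
    carry; evaluating the stages in order here gives the same sum.
    Returns (sum modulo 2**width, carry out, number of stages).
    """
    result = 0
    carry = 0
    stages = 0
    for lsb in range(0, width, chunk):
        bits = min(chunk, width - lsb)
        mask = (1 << bits) - 1
        s = ((a >> lsb) & mask) + ((b >> lsb) & mask) + carry
        result |= (s & mask) << lsb
        carry = s >> bits          # 0 or 1, registered for the next stage
        stages += 1
    return result, carry, stages


a, b = 0x0123_4567_89AB_CDEF, 0xFEDC_BA98_7654_3210
total, cout, stages = carry_pipelined_add(a, b)
print(hex(total), cout, stages)  # 0xffffffffffffffff 0 7
```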

Ray Andraka

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.