Please HELP: timing problems on Virtex-4FX

Hello,

I am working with the ML410 board and the V4FX60 FPGA. For the past week and a half, I've been having problems meeting timing and I just can't figure out why. In my design, I need to use DDR2 and the APU controller as well as the FPU core. I also need two OPB peripherals: the RS232 and the SystemACE. These are the clocking requirements for my design:

CPU: 266 MHz
PLB: 100 MHz
OPB: 100 MHz
DDR2: 100 MHz controller (on PLB), 200 MHz memory (DDR2-400)
FCB: 266 MHz (fabric co-processor bus)
FPU: 133 MHz (on FCB)

All of these are within the documented limits for the cores and this speed grade FPGA (-11). These are the steps I follow to get to this design:

  1. Start with BSB wizard and generate a 300 MHz CPU / 100 MHz PLB design with all the above mentioned IPs, except the FCB and the FPU.
  2. This design does synthesize, map, place and route, although PAR complains that timing constraints are not met for the DDR2 clocks. I talked with Xilinx about this and they assured me this is OK (they know about this issue). Everything works fine at this point, although I am not too happy with Xilinx's lack of explanation for this timing issue.
  3. Next, I modify the design to generate a CLKFX from one of the two DCMs to make the 266 MHz CPU clock. I also uncheck the CPMC405SYNCBYPASS option in the PPC405 configuration so that non-integer CPU-PLB ratios are allowed. I update the frequency in the software settings.
  4. Finally, I bring in the FCB and the APU-FPU cores. I tie the FCB to the processor clock, and use a CLKFX output on one of the DCMs to generate the 133 MHz clock for the APU-FPU core. I enable APU support in the PPC405 processor core.
  5. When I build this system, 3 timing constraints are not met: 1 for the DDR2 (seen previously and, according to Xilinx, nothing to worry about) and 2 new ones. The 2 new failing constraints are the 266 MHz CPU clock and the 133 MHz FPU clock. I tried executing my software on this faulty hardware and it works sometimes, but not always. I do get DCM lock though.

This is how I have my two DCMs configured:

dcm_0 (DFS low, DLL low)
--------------------------------------
RST: net_gnd
CLKIN: dcm_clk_s (from sys_clk_pin, 100 MHz)
CLKFB: sys_clk_s (100 MHz)
CLK0: sys_clk_s (PLB and OPB clock, 100 MHz, BUFG)
CLKDV: ddr2_cal_clk (50 MHz, BUFG)
CLK2X: dcm_0CLK2X (cascade to next DCM, 200 MHz, BUFG)
CLKFX: apu_fpu_clk_s (FPU clock, 133 MHz, BUFG)
LOCKED: dcm_0_lock (to next DCM)

dcm_1 (DFS high, DLL high)
--------------------------------------
RST: dcm_0_lock (from previous DCM)
CLKIN: dcm_0CLK2X (from previous DCM, 200 MHz)
CLKFB: clk_200mhz_s (200 MHz)
CLK0: clk_200mhz_s (DDR2 clock, 200 MHz, BUFG)
CLK90: ddr2_dev_clk_90_s (DDR2 clock, 200 MHz, BUFG)
CLKFX: proc_clk_s (CPU clock, 266 MHz, BUFG)
LOCKED: dcm_1_lock (to reset_block)
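As a sanity check on the two DCM listings, the nominal output frequencies can be worked out from the inputs. This is only a minimal sketch: the `dcm_outputs` helper is hypothetical, and the CLKFX multiply/divide pair of 4/3 is an assumption chosen to give the stated 133 MHz and 266 MHz (the actual M and D values for this design are not listed here).

```python
from fractions import Fraction


def dcm_outputs(clkin_mhz, fx_mult, fx_div, clkdv_div=2):
    """Return the nominal output frequencies (MHz) of one DCM (hypothetical helper)."""
    clkin = Fraction(clkin_mhz)
    return {
        "CLK0": clkin,                       # same frequency as CLKIN
        "CLK2X": clkin * 2,                  # doubled clock
        "CLKDV": clkin / clkdv_div,          # divided clock
        "CLKFX": clkin * fx_mult / fx_div,   # synthesised clock, M/D
    }


# dcm_0 is fed by the 100 MHz board clock; M/D = 4/3 is an assumption.
dcm_0 = dcm_outputs(100, fx_mult=4, fx_div=3)
# dcm_1 is fed by dcm_0's CLK2X output (200 MHz); M/D = 4/3 is again assumed.
dcm_1 = dcm_outputs(dcm_0["CLK2X"], fx_mult=4, fx_div=3)

for name, dcm in (("dcm_0", dcm_0), ("dcm_1", dcm_1)):
    for port, freq in dcm.items():
        print(f"{name}.{port}: {float(freq):.2f} MHz")
```

Under these assumptions the FPU clock comes out at 133.33 MHz and the CPU clock at 266.67 MHz, so the two are an exact 2:1 ratio in frequency even though they are generated by different DCMs.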

Like I said, I can't think of why this would not meet timing! Is it harder to route a 266 MHz CPU clock than a 300 MHz one? What can I do to mitigate this? I have tried various options in synthesis, map, and PAR, as well as the Xplorer script. But I think the problem is probably something in my DCM configuration, because this should work: I am not asking for a very high frequency.

Any help would be very greatly appreciated (I would hate to go back to a working 200 MHz design because of this).

Thanks

Dmitriy Bekker

Dima

Hi Dmitriy,

Thanks for the detailed information. Perhaps you could also post some of the failing paths from the timing report? Otherwise we're all a bit in the dark.

Cascading DCMs in the way you've described does tend to increase jitter on the resulting clock nets, which can often eat into your timing budget. A few questions:

(1) How full is your device?
(2) Which version of EDK are you using; and, in particular:
(3) ...which version of the apu-fpu core?

Cheers,

-Ben-

Ben Jones

Hi Ben,

Certainly. These are the failing paths from the timing report:

Failing path 1: 266 MHz CPU clock (proc_clk_s)
Constraint: TS_dcm_1_dcm_1_CLKFX_BUF = PERIOD TIMEGRP "dcm_1_dcm_1_CLKFX_BUF" TS_dcm_0_dcm_0_CLK2X_BUF / 1.33333333 HIGH 50%

-----------------------------------------------------------------------------
Check    Worst Case Slack    Best Case Achievable    Timing Errors    Timing Score
-----------------------------------------------------------------------------
SETUP    -0.703ns            5.859ns                 24               4342
HOLD      0.380ns                                    0                0
-----------------------------------------------------------------------------

Failing path 2: 133 MHz FPU clock (apu_fpu_clk_s)
Constraint: TS_dcm_0_dcm_0_CLKFX_BUF = PERIOD TIMEGRP "dcm_0_dcm_0_CLKFX_BUF" TS_sys_clk_pin / 1.33333333 HIGH 50%

-----------------------------------------------------------------------------
Check    Worst Case Slack    Best Case Achievable    Timing Errors    Timing Score
-----------------------------------------------------------------------------
SETUP    -0.698ns            8.896ns                 28               8584
HOLD      0.013ns                                    0                0
-----------------------------------------------------------------------------

Failing path 3: 200 MHz DDR2 clock (clk_200mhz_s)
Constraint: TS_dcm_1_dcm_1_CLK0_BUF = PERIOD TIMEGRP "dcm_1_dcm_1_CLK0_BUF" TS_dcm_0_dcm_0_CLK2X_BUF HIGH 50%

-----------------------------------------------------------------------------
Check    Worst Case Slack    Best Case Achievable    Timing Errors    Timing Score
-----------------------------------------------------------------------------
SETUP    -0.596ns            6.192ns                 177              30754
HOLD      0.387ns                                    0                0
-----------------------------------------------------------------------------

The cascading DCMs are mainly for DDR2 clock and its clk_90 counterpart. This is the way they come out of BSB wizard. But path 3 does fail from the start (DDR2 clock). It comes out failing from the BSB wizard without me changing anything in the design.

(1) The design is not full at all. I am only using 20% of the slices now.
(2) I am using the latest version of EDK (9.1i with SP1).
(3) I am using the latest apu-fpu core (v3.0).

I found it odd that even without the apu-fpu core, my CPU net fails when I try to clock it at 266 MHz instead of 300 MHz. All I did there was change the CLKFX multiplier and divider values (and uncheck CPMC405SYNCBYPASS)! Is there something I am missing here?

I am also using the MGT protector core, which runs off proc_clk_s. Would that impact the timing? I will try without it. I did notice it LOCed one of my DCMs.

Thanks

Dmitriy

Dima

Hi Dmitriy,

Well, those are the failing timespecs, certainly! More useful would be the paths themselves, from the TRACE report (.twr file), telling you the source and destination nets of the worst N paths in each timing group (where N is by default 3, I think). The clock uncertainty and skew are also reported there.

Looking again at your clocking architecture, I'm not sure that the DCM configuration you're using will guarantee a rising-edge alignment between the 266MHz and 133MHz clocks - which is a requirement for the apu_fpu core.

I haven't used the DDR2 core in the latest release but I understand you've been advised that this warning can be safely ignored? Doesn't sound ideal though. :-\

Excellent, thanks. The reason I wanted to check was that the required clocking configuration changed considerably (as you might or might not have noticed) between v2.1 and v3.0 of the FPU core.

Yes, that's weird. I was about to ask what other logic you had running off proc_clk_s, but...

Good idea. Otherwise, the only things that are running off proc_clk_s are (a) the FCB bus interface, which is mostly just wires, and (b) the CPU itself, which just has minimum clock pulse widths (1.818ns in the configuration you're using, I believe).

Hope you get to the bottom of this soon...

Cheers,

-Ben-

Ben Jones

Hi Ben,

Ahh yes, of course. I have pasted the failing paths right below this message. Unfortunately, I couldn't attach the log file directly, so please excuse the poor formatting. I reran the build, this time without the MGT protector core. Also, I implemented the workaround in AR 24326, which suppresses the DDR2 failing constraint. This note explains why it is safe to suppress the DDR2 timing failure. I am a lot more comfortable with this explanation than with simply ignoring it. Nevertheless, the design still fails, with proc_clk_s and apu_fpu_clk_s not meeting timing.

I also implemented a very basic design with only a few components: CPU, PLB BRAM, and OPB RS232. I changed the CPU frequency to 266.67 MHz by only modifying CLKFX on dcm_0 (this one only has a single DCM since no DDR2). I changed CLKFX to be M=8 and D=3 (instead of the original M=3, D=1 for 300 MHz). Even this basic design still fails timing! I have pasted the failing path for this design as well (see below).
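One way to compare the 266.67 MHz (M=8, D=3) and 300 MHz (M=3, D=1) cases against the 100 MHz bus clock is to look at how close a launching edge on one clock can fall to the next capturing edge on the other. The sketch below is only an idealised model: both clocks are assumed to start in phase, skew and jitter are ignored, and the function names are made up for illustration. It is not a description of how the timing tools actually derive their requirements.

```python
from fractions import Fraction
from math import gcd


def period_ns(freq_mhz):
    """Clock period in ns as an exact fraction."""
    return Fraction(1000) / Fraction(freq_mhz)


def hyperperiod(p, q):
    """Smallest time after which two in-phase clocks line up again."""
    num = p.numerator * q.numerator // gcd(p.numerator, q.numerator)
    den = gcd(p.denominator, q.denominator)
    return Fraction(num, den)


def tightest_requirement(src_mhz, dst_mhz):
    """Smallest gap from any source rising edge to the next destination rising edge."""
    p, q = period_ns(src_mhz), period_ns(dst_mhz)
    span = hyperperiod(p, q)
    best = None
    t = Fraction(0)
    while t < span:
        capture = (t // q + 1) * q   # first destination edge strictly after t
        gap = capture - t
        best = gap if best is None else min(best, gap)
        t += p
    return best


plb = 100                      # MHz, bus clock
cpu_266 = Fraction(100 * 8, 3)  # MHz, CLKFX with M=8, D=3
cpu_300 = Fraction(100 * 3, 1)  # MHz, CLKFX with M=3, D=1

print(f"100 -> 266.67 MHz: {float(tightest_requirement(plb, cpu_266)):.3f} ns")
print(f"100 -> 300.00 MHz: {float(tightest_requirement(plb, cpu_300)):.3f} ns")
```

Under these assumptions the tightest window is 1.250 ns for the 100 to 266.67 MHz crossing, against a full 3.333 ns CPU period in the 3:1 case. That matches the 1.250 ns requirement (source clock rising at 10.000 ns, destination clock rising at 11.250 ns) reported on the reset paths in the trace output pasted in this post.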

Do you have a suggestion on how I should configure the DCMs for a 266-133 CPU-FPU ratio? Even a working basic design, without any memory (like DDR2) would help. These frequencies are within the limits and I am sure Xilinx must have tested this exact configuration to find the maximum allowable frequency (which is 275 MHz for the CPU -11 grade and 137.5 MHz for the FPU). If you have any suggestions here, I will try them. I am running out of ideas. I opened up a webcase today as well to see if I can get help from that avenue.

The failing paths are pasted in below. Thanks for your help.

- Dmitriy

================================================================================
DESIGN 1 with DDR2 (apu_fpu_clk_s)
================================================================================
Timing constraint: TS_system_i_dcm_0_dcm_0_CLKFX_BUF = PERIOD TIMEGRP "system_i_dcm_0_dcm_0_CLKFX_BUF" TS_sys_clk_pin / 1.33333333 HIGH 50%;

 423953 items analyzed, 50 timing errors detected. (48 setup errors, 2 hold errors)
 Minimum period is 9.606ns.

--------------------------------------------------------------------------------
Slack:                  -1.053ns (requirement - (data path - clock path skew + uncertainty))
  Source:               system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[31].r (FF)
  Destination:          system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_2 (FF)
  Requirement:          3.750ns
  Data Path Delay:      4.227ns (Levels of Logic = 7)
  Clock Path Skew:      -0.053ns
  Source Clock:         system_i/proc_clk_s rising at 3.750ns
  Destination Clock:    system_i/apu_fpu_clk_s rising at 7.500ns
  Clock Uncertainty:    0.523ns

  Clock Uncertainty:          0.523ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.000ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.311ns
    Phase Error (PE):           0.367ns

  Maximum Data Path: system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[31].r to system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_2
  Location             Delay type         Delay(ns)  Physical Resource
                                                     Logical Resource(s)
  -------------------------------------------------  -------------------
  SLICE_X28Y200.YQ     Tcko                   0.307  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/instruction(1)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[31].r
  SLICE_X34Y189.F2     net (fanout=12)        1.127  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/instruction(0)
  SLICE_X34Y189.COUT   Topcyf                 0.491  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc0_subchain(1)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_dec_prod_0_and00001
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc0_cch[2].cmux
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc0_cch[1].cmux
  SLICE_X34Y190.CIN    net (fanout=1)         0.000  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc0_subchain(1)
  SLICE_X34Y190.COUT   Tbyp                   0.076  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc1_subchain(1)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc0_cch[0].cmux
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc1_cch[1].cmux
  SLICE_X34Y191.CIN    net (fanout=1)         0.000  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc1_subchain(1)
  SLICE_X34Y191.COUT   Tbyp                   0.076  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc2_subchain(1)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc1_cch[0].cmux
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc2_cch[1].cmux
  SLICE_X34Y192.CIN    net (fanout=1)         0.000  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc2_subchain(1)
  SLICE_X34Y192.XB     Tcinxb                 0.339  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_dec_chain(3)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc2_cch[0].cmux
  SLICE_X37Y193.G4     net (fanout=6)         0.724  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_dec_chain(2)
  SLICE_X37Y193.Y      Tilo                   0.165  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_0_mux0001_map12
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_2_mux000125_sw0
  SLICE_X36Y196.G4     net (fanout=1)         0.438  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/n4862
  SLICE_X36Y196.Y      Tilo                   0.166  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state(2)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_2_mux000125
  SLICE_X36Y196.F4     net (fanout=1)         0.135  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_2_mux0001_map11
  SLICE_X36Y196.CLK    Tfck                   0.183  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state(2)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_2_mux000176
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_2
  -------------------------------------------------  ---------------------------
  Total                                     4.227ns  (1.803ns logic, 2.424ns route)
                                                     (42.7% logic, 57.3% route)
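The Clock Uncertainty figure in these entries can be reproduced from the jitter terms the report lists. The following is a small sketch, assuming the printed expression is ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 with the phase error PE added as the final term; the function name is made up for illustration and the numbers are copied from the report entries in this post.

```python
from math import sqrt


def clock_uncertainty(tsj, tij, dj, pe):
    """((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE, all values in ns."""
    return (sqrt(tsj ** 2 + tij ** 2) + dj) / 2 + pe


# (TSJ, TIJ, DJ, PE, value printed in the report)
reported = {
    "proc_clk_s -> apu_fpu_clk_s": (0.000, 0.000, 0.311, 0.367, 0.523),
    "sys_clk_s -> proc_clk_s": (0.000, 0.000, 0.236, 0.231, 0.350),
    "apu_fpu_clk_s -> proc_clk_s": (0.000, 0.000, 0.311, 0.184, 0.340),
}

for crossing, (tsj, tij, dj, pe, printed) in reported.items():
    value = clock_uncertainty(tsj, tij, dj, pe)
    print(f"{crossing}: computed {value:.4f} ns, report says {printed:.3f} ns")
```

The computed values agree with the printed ones to within about 1 ps, which is the rounding of the printed terms. With the system and input jitter both reported as zero, the uncertainty here comes entirely from the discrete jitter and phase error terms.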

--------------------------------------------------------------------------------
Slack:                  -1.037ns (requirement - (data path - clock path skew + uncertainty))
  Source:               system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[31].r (FF)
  Destination:          system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_3 (FF)
  Requirement:          3.750ns
  Data Path Delay:      4.211ns (Levels of Logic = 6)
  Clock Path Skew:      -0.053ns
  Source Clock:         system_i/proc_clk_s rising at 3.750ns
  Destination Clock:    system_i/apu_fpu_clk_s rising at 7.500ns
  Clock Uncertainty:    0.523ns

  Clock Uncertainty:          0.523ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.000ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.311ns
    Phase Error (PE):           0.367ns

  Maximum Data Path: system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[31].r to system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_3
  Location             Delay type         Delay(ns)  Physical Resource
                                                     Logical Resource(s)
  -------------------------------------------------  -------------------
  SLICE_X28Y200.YQ     Tcko                   0.307  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/instruction(1)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[31].r
  SLICE_X34Y189.F2     net (fanout=12)        1.127  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/instruction(0)
  SLICE_X34Y189.COUT   Topcyf                 0.491  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc0_subchain(1)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_dec_prod_0_and00001
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc0_cch[2].cmux
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc0_cch[1].cmux
  SLICE_X34Y190.CIN    net (fanout=1)         0.000  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc0_subchain(1)
  SLICE_X34Y190.COUT   Tbyp                   0.076  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc1_subchain(1)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc0_cch[0].cmux
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc1_cch[1].cmux
  SLICE_X34Y191.CIN    net (fanout=1)         0.000  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc1_subchain(1)
  SLICE_X34Y191.COUT   Tbyp                   0.076  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc2_subchain(1)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc1_cch[0].cmux
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc2_cch[1].cmux
  SLICE_X34Y192.CIN    net (fanout=1)         0.000  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc2_subchain(1)
  SLICE_X34Y192.XB     Tcinxb                 0.339  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_dec_chain(3)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc2_cch[0].cmux
  SLICE_X36Y197.G3     net (fanout=6)         0.819  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_dec_chain(2)
  SLICE_X36Y197.XMUX   Tif5x                  0.477  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_3_mux0001_map1
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_3_mux0001171
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_3_mux000117_f5
  SLICE_X37Y197.G4     net (fanout=1)         0.303  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_3_mux0001_map7
  SLICE_X37Y197.CLK    Tgck                   0.196  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state(3)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_3_mux000126
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_3
  -------------------------------------------------  ---------------------------
  Total                                     4.211ns  (1.962ns logic, 2.249ns route)
                                                     (46.6% logic, 53.4% route)

--------------------------------------------------------------------------------
Slack:                  -1.015ns (requirement - (data path - clock path skew + uncertainty))
  Source:               system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[1].r (FF)
  Destination:          system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_2 (FF)
  Requirement:          3.750ns
  Data Path Delay:      4.189ns (Levels of Logic = 6)
  Clock Path Skew:      -0.053ns
  Source Clock:         system_i/proc_clk_s rising at 3.750ns
  Destination Clock:    system_i/apu_fpu_clk_s rising at 7.500ns
  Clock Uncertainty:    0.523ns

  Clock Uncertainty:          0.523ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.000ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.311ns
    Phase Error (PE):           0.367ns

  Maximum Data Path: system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[1].r to system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_2
  Location             Delay type         Delay(ns)  Physical Resource
                                                     Logical Resource(s)
  -------------------------------------------------  -------------------
  SLICE_X28Y201.YQ     Tcko                   0.307  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/instruction(31)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[1].r
  SLICE_X34Y190.F1     net (fanout=5)         1.165  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/instruction(30)
  SLICE_X34Y190.COUT   Topcyf                 0.491  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc1_subchain(1)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_dec_prod_2_and00001
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc0_cch[0].cmux
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc1_cch[1].cmux
  SLICE_X34Y191.CIN    net (fanout=1)         0.000  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc1_subchain(1)
  SLICE_X34Y191.COUT   Tbyp                   0.076  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc2_subchain(1)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc1_cch[0].cmux
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc2_cch[1].cmux
  SLICE_X34Y192.CIN    net (fanout=1)         0.000  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc2_subchain(1)
  SLICE_X34Y192.XB     Tcinxb                 0.339  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_dec_chain(3)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_idc2_cch[0].cmux
  SLICE_X37Y193.G4     net (fanout=6)         0.724  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_dec_chain(2)
  SLICE_X37Y193.Y      Tilo                   0.165  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_0_mux0001_map12
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_2_mux000125_sw0
  SLICE_X36Y196.G4     net (fanout=1)         0.438  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/n4862
  SLICE_X36Y196.Y      Tilo                   0.166  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state(2)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_2_mux000125
  SLICE_X36Y196.F4     net (fanout=1)         0.135  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_2_mux0001_map11
  SLICE_X36Y196.CLK    Tfck                   0.183  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state(2)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_2_mux000176
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_2
  -------------------------------------------------  ---------------------------
  Total                                     4.189ns  (1.727ns logic, 2.462ns route)
                                                     (41.2% logic, 58.8% route)

--------------------------------------------------------------------------------
Hold Violations: TS_system_i_dcm_0_dcm_0_CLKFX_BUF = PERIOD TIMEGRP "system_i_dcm_0_dcm_0_CLKFX_BUF" TS_sys_clk_pin / 1.33333333 HIGH 50%;

--------------------------------------------------------------------------------
Hold Violation:         -0.083ns (requirement - (clock path skew + uncertainty - data path))
  Source:               system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[30].r (FF)
  Destination:          system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_id_class.fpu_maj (FF)
  Requirement:          0.000ns
  Data Path Delay:      0.454ns (Levels of Logic = 1)
  Positive Clock Path Skew: 0.014ns
  Source Clock:         system_i/proc_clk_s rising at 3.750ns
  Destination Clock:    system_i/apu_fpu_clk_s rising at 7.500ns
  Clock Uncertainty:    0.523ns

  Clock Uncertainty:          0.523ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.000ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.311ns
    Phase Error (PE):           0.367ns

  Maximum Data Path: system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[30].r to system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_id_class.fpu_maj
  Location             Delay type         Delay(ns)  Physical Resource
                                                     Logical Resource(s)
  -------------------------------------------------  -------------------
  SLICE_X28Y200.XQ     Tcko                   0.283  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/instruction(1)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[30].r
  SLICE_X29Y200.F4     net (fanout=3)         0.275  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/instruction(1)
  SLICE_X29Y200.CLK    Tckf (-Th)             0.104  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_id_class.fpu_maj
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_if_fpu_maj1
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_id_class.fpu_maj
  -------------------------------------------------  ---------------------------
  Total                                     0.454ns  (0.179ns logic, 0.275ns route)
                                                     (39.4% logic, 60.6% route)

--------------------------------------------------------------------------------
Hold Violation:         -0.039ns (requirement - (clock path skew + uncertainty - data path))
  Source:               system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[26].r (FF)
  Destination:          system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_id_class.fpu_maj (FF)
  Requirement:          0.000ns
  Data Path Delay:      0.515ns (Levels of Logic = 1)
  Positive Clock Path Skew: 0.031ns
  Source Clock:         system_i/proc_clk_s rising at 3.750ns
  Destination Clock:    system_i/apu_fpu_clk_s rising at 7.500ns
  Clock Uncertainty:    0.523ns

  Clock Uncertainty:          0.523ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.000ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.311ns
    Phase Error (PE):           0.367ns

  Maximum Data Path: system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[26].r to system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_id_class.fpu_maj
  Location             Delay type         Delay(ns)  Physical Resource
                                                     Logical Resource(s)
  -------------------------------------------------  -------------------
  SLICE_X30Y200.XQ     Tcko                   0.283  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/instruction(5)
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[26].r
  SLICE_X29Y200.F3     net (fanout=9)         0.336  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/instruction(5)
  SLICE_X29Y200.CLK    Tckf (-Th)             0.104  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_id_class.fpu_maj
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_if_fpu_maj1
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_id_class.fpu_maj
  -------------------------------------------------  ---------------------------
  Total                                     0.515ns  (0.179ns logic, 0.336ns route)
                                                     (34.8% logic, 65.2% route)

--------------------------------------------------------------------------------
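The slack figures in this section follow directly from the expressions the report prints next to them. As a rough cross-check (function names are made up for illustration, and the inputs are the worst setup entry and the worst hold entry for apu_fpu_clk_s as listed in this post):

```python
def setup_slack(requirement, data_path, clock_skew, uncertainty):
    """requirement - (data path - clock path skew + uncertainty), in ns."""
    return requirement - (data_path - clock_skew + uncertainty)


def hold_slack(requirement, data_path, clock_skew, uncertainty):
    """requirement - (clock path skew + uncertainty - data path), in ns."""
    return requirement - (clock_skew + uncertainty - data_path)


# Worst setup path: rs_fir_l1[31].r -> fpu_fetch_state_2
worst_setup = setup_slack(3.750, 4.227, -0.053, 0.523)
# Worst hold path: rs_fir_l1[30].r -> fpu_fetch_id_class.fpu_maj
worst_hold = hold_slack(0.000, 0.454, 0.014, 0.523)

print(f"setup slack: {worst_setup:.3f} ns")  # -1.053
print(f"hold slack:  {worst_hold:.3f} ns")   # -0.083
```

In both cases the 0.523 ns clock uncertainty on the proc_clk_s to apu_fpu_clk_s crossing is a large share of the shortfall: it is about half of the setup deficit, and the hold paths would pass without it.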

================================================================================
DESIGN 1 with DDR2 (proc_clk_s)
================================================================================
Timing constraint: TS_system_i_dcm_1_dcm_1_CLKFX_BUF = PERIOD TIMEGRP "system_i_dcm_1_dcm_1_CLKFX_BUF" TS_system_i_dcm_0_dcm_0_CLK2X_BUF / 1.33333333 HIGH 50%;

 1029 items analyzed, 34 timing errors detected. (34 setup errors, 0 hold errors)
 Minimum period is 5.958ns.

--------------------------------------------------------------------------------
Slack:                  -0.736ns (requirement - (data path - clock path skew + uncertainty))
  Source:               system_i/reset_block/reset_block/Rstc405resetcore (FF)
  Destination:          system_i/ppc405_0/ppc405_0/PPC405_ADV_i (CPU)
  Requirement:          1.250ns
  Data Path Delay:      1.528ns (Levels of Logic = 0)
  Clock Path Skew:      -0.108ns
  Source Clock:         system_i/sys_clk_s rising at 10.000ns
  Destination Clock:    system_i/proc_clk_s rising at 11.250ns
  Clock Uncertainty:    0.350ns

  Clock Uncertainty:          0.350ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.000ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.236ns
    Phase Error (PE):           0.231ns

  Maximum Data Path: system_i/reset_block/reset_block/Rstc405resetcore to system_i/ppc405_0/ppc405_0/PPC405_ADV_i
  Location                          Delay type        Delay(ns)  Physical Resource
                                                                 Logical Resource(s)
  -------------------------------------------------------------  -------------------
  SLICE_X19Y169.XQ                  Tcko                  0.291  system_i/RSTC405RESETCORE
                                                                 system_i/reset_block/reset_block/Rstc405resetcore
  PPC405_ADV_X0Y1.RSTC405RESETCORE  net (fanout=1)        0.437  system_i/RSTC405RESETCORE
  PPC405_ADV_X0Y1.CPMC405CLOCK      Tppcdck_RSTCORE       0.800  system_i/ppc405_0/ppc405_0/PPC405_ADV_i
                                                                 system_i/ppc405_0/ppc405_0/PPC405_ADV_i
  -------------------------------------------------------------  ---------------------------
  Total                                                 1.528ns  (1.091ns logic, 0.437ns route)
                                                                 (71.4% logic, 28.6% route)

--------------------------------------------------------------------------------
Slack:                  -0.701ns (requirement - (data path - clock path skew + uncertainty))
  Source:               system_i/reset_block/reset_block/Rstc405resetchip (FF)
  Destination:          system_i/ppc405_0/ppc405_0/PPC405_ADV_i (CPU)
  Requirement:          1.250ns
  Data Path Delay:      1.499ns (Levels of Logic = 0)
  Clock Path Skew:      -0.102ns
  Source Clock:         system_i/sys_clk_s rising at 10.000ns
  Destination Clock:    system_i/proc_clk_s rising at 11.250ns
  Clock Uncertainty:    0.350ns

  Clock Uncertainty:          0.350ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.000ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.236ns
    Phase Error (PE):           0.231ns

  Maximum Data Path: system_i/reset_block/reset_block/Rstc405resetchip to system_i/ppc405_0/ppc405_0/PPC405_ADV_i
  Location                          Delay type        Delay(ns)  Physical Resource
                                                                 Logical Resource(s)
  -------------------------------------------------------------  -------------------
  SLICE_X19Y172.YQ                  Tcko                  0.291  system_i/RSTC405RESETCHIP
                                                                 system_i/reset_block/reset_block/Rstc405resetchip
  PPC405_ADV_X0Y1.RSTC405RESETCHIP  net (fanout=1)        0.558  system_i/RSTC405RESETCHIP
  PPC405_ADV_X0Y1.CPMC405CLOCK      Tppcdck_RSTCHIP       0.650  system_i/ppc405_0/ppc405_0/PPC405_ADV_i
                                                                 system_i/ppc405_0/ppc405_0/PPC405_ADV_i
  -------------------------------------------------------------  ---------------------------
  Total                                                 1.499ns  (0.941ns logic, 0.558ns route)
                                                                 (62.8% logic, 37.2% route)

--------------------------------------------------------------------------------
Slack:                  -0.664ns (requirement - (data path - clock path skew + uncertainty))
  Source:               system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_ldst_store_phase2 (FF)
  Destination:          system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_ardr_l1[29].r (FF)
  Requirement:          3.750ns
  Data Path Delay:      3.818ns (Levels of Logic = 2)
  Clock Path Skew:      -0.256ns
  Source Clock:         system_i/apu_fpu_clk_s rising at 0.000ns
  Destination Clock:    system_i/proc_clk_s rising at 3.750ns
  Clock Uncertainty:    0.340ns

  Clock Uncertainty:          0.340ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.000ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.311ns
    Phase Error (PE):           0.184ns

  Maximum Data Path: system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_ldst_store_phase2 to system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_ardr_l1[29].r
  Location             Delay type         Delay(ns)  Physical Resource
                                                     Logical Resource(s)
  -------------------------------------------------  -------------------
  SLICE_X29Y204.YQ     Tcko                   0.291  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_ldst_store_phase2
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_ldst_store_phase2
  SLICE_X29Y204.F1     net (fanout=36)        0.678  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_ldst_store_phase2
  SLICE_X29Y204.X      Tilo                   0.165  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_ldst_store_phase2
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_ldst_use_direct1
  SLICE_X51Y202.BX     net (fanout=32)        2.259  system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_ldst_use_direct
  SLICE_X51Y202.CLK    Tdick                  0.425  system_i/fcb_v10_0_FCMAPURESULT
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_ldst_st_sngl.sdm_m5.mm[29].cmux
                                                     system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_ardr_l1[29].r
  -------------------------------------------------  ---------------------------
  Total                                     3.818ns  (0.881ns logic, 2.937ns route)
                                                     (23.1% logic, 76.9% route)

--------------------------------------------------------------------------------
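For the two reset paths into the PPC405 block in this section, it may be worth checking how much of the shortfall better placement could recover at all. The sketch below reuses the slack expression from the report but sets the routing delay to zero, keeping only the logic delays (Tcko plus the Tppcdck_* setup time). It assumes the reported clock skew and uncertainty would stay the same, which is a simplification, and the function name is made up for illustration.

```python
def slack_with_zero_routing(requirement, logic_delay, clock_skew, uncertainty):
    """Setup slack if the net delay were 0 ns (an idealised best case), in ns."""
    return requirement - (logic_delay - clock_skew + uncertainty)


# (requirement, Tcko + Tppcdck_*, clock path skew, clock uncertainty) from the report
reset_paths = {
    "Rstc405resetcore": (1.250, 0.291 + 0.800, -0.108, 0.350),
    "Rstc405resetchip": (1.250, 0.291 + 0.650, -0.102, 0.350),
}

for name, args in reset_paths.items():
    print(f"{name}: {slack_with_zero_routing(*args):.3f} ns")
```

Both values come out negative, which suggests that with a 1.250 ns requirement these paths could not meet timing through routing improvements alone; the requirement itself, set by the relationship between sys_clk_s and proc_clk_s, looks like the limiting factor.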

================================================================================
DESIGN 2 with no DDR2 (proc_clk_s)
================================================================================
Timing constraint: TS_system_i_dcm_0_dcm_0_CLKFX_BUF = PERIOD TIMEGRP "system_i_dcm_0_dcm_0_CLKFX_BUF" TS_sys_clk_pin / 2.66666667 HIGH 50%;

 3 items analyzed, 3 timing errors detected. (3 setup errors, 0 hold errors)
 Minimum period is 5.438ns.

--------------------------------------------------------------------------------
Slack:                  -0.562ns (requirement - (data path - clock path skew + uncertainty))
  Source:               system_i/reset_block/reset_block/Rstc405resetcore (FF)
  Destination:          system_i/ppc405_0/ppc405_0/PPC405_ADV_i (CPU)
  Requirement:          1.250ns
  Data Path Delay:      1.514ns (Levels of Logic = 0)
  Clock Path Skew:      -0.040ns
  Source Clock:         system_i/plb_bram_if_cntlr_1_port_BRAM_Clk rising at 10.000ns
  Destination Clock:    system_i/proc_clk_s rising at 11.250ns
  Clock Uncertainty:    0.258ns

  Clock Uncertainty:          0.258ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.000ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.235ns
    Phase Error (PE):           0.140ns

  Maximum Data Path: system_i/reset_block/reset_block/Rstc405resetcore to system_i/ppc405_0/ppc405_0/PPC405_ADV_i
  Location                          Delay type        Delay(ns)  Physical Resource
                                                                 Logical Resource(s)
  -------------------------------------------------------------  -------------------
  SLICE_X18Y74.XQ                   Tcko                  0.307  system_i/RSTC405RESETCORE
                                                                 system_i/reset_block/reset_block/Rstc405resetcore
  PPC405_ADV_X0Y0.RSTC405RESETCORE  net (fanout=1)        0.407  system_i/RSTC405RESETCORE
  PPC405_ADV_X0Y0.CPMC405CLOCK      Tppcdck_RSTCORE       0.800  system_i/ppc405_0/ppc405_0/PPC405_ADV_i
                                                                 system_i/ppc405_0/ppc405_0/PPC405_ADV_i
  -------------------------------------------------------------  ---------------------------
  Total                                                 1.514ns  (1.107ns logic, 0.407ns route)
                                                                 (73.1% logic, 26.9% route)

--------------------------------------------------------------------------------
Slack:                  -0.426ns (requirement - (data path - clock path skew + uncertainty))
  Source:               system_i/reset_block/reset_block/Rstc405resetsys (FF)
  Destination:          system_i/ppc405_0/ppc405_0/PPC405_ADV_i (CPU)
  Requirement:          1.250ns
  Data Path Delay:      1.378ns (Levels of Logic = 0)
  Clock Path Skew:      -0.040ns
  Source Clock:         system_i/plb_bram_if_cntlr_1_port_BRAM_Clk rising at 10.000ns
  Destination Clock:    system_i/proc_clk_s rising at 11.250ns
  Clock Uncertainty:    0.258ns

  Clock Uncertainty:            0.258ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.000ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.235ns
    Phase Error (PE):           0.140ns

  Maximum Data Path: system_i/reset_block/reset_block/Rstc405resetsys to system_i/ppc405_0/ppc405_0/PPC405_ADV_i
    Location                          Delay type         Delay(ns)  Physical Resource
                                                                    Logical Resource(s)
    --------------------------------------------------------------  -------------------
    SLICE_X19Y74.YQ                   Tcko                  0.291   system_i/RSTC405RESETSYS
                                                                    system_i/reset_block/reset_block/Rstc405resetsys
    PPC405_ADV_X0Y0.RSTC405RESETSYS   net (fanout=1)        0.437   system_i/RSTC405RESETSYS
    PPC405_ADV_X0Y0.CPMC405CLOCK      Tppcdck_RSTSYS        0.650   system_i/ppc405_0/ppc405_0/PPC405_ADV_i
                                                                    system_i/ppc405_0/ppc405_0/PPC405_ADV_i
    --------------------------------------------------------------  ---------------------------
    Total                                                   1.378ns (0.941ns logic, 0.437ns route)
                                                                    (68.3% logic, 31.7% route)

--------------------------------------------------------------------------------
Slack:                  -0.426ns (requirement - (data path - clock path skew + uncertainty))
  Source:               system_i/reset_block/reset_block/Rstc405resetchip (FF)
  Destination:          system_i/ppc405_0/ppc405_0/PPC405_ADV_i (CPU)
  Requirement:          1.250ns
  Data Path Delay:      1.378ns (Levels of Logic = 0)
  Clock Path Skew:      -0.040ns
  Source Clock:         system_i/plb_bram_if_cntlr_1_port_BRAM_Clk rising at 10.000ns
  Destination Clock:    system_i/proc_clk_s rising at 11.250ns
  Clock Uncertainty:    0.258ns

  Clock Uncertainty:            0.258ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter (TSJ):  0.000ns
    Total Input Jitter (TIJ):   0.000ns
    Discrete Jitter (DJ):       0.235ns
    Phase Error (PE):           0.140ns

  Maximum Data Path: system_i/reset_block/reset_block/Rstc405resetchip to system_i/ppc405_0/ppc405_0/PPC405_ADV_i
    Location                          Delay type         Delay(ns)  Physical Resource
                                                                    Logical Resource(s)
    --------------------------------------------------------------  -------------------
    SLICE_X19Y75.XQ                   Tcko                  0.291   system_i/RSTC405RESETCHIP
                                                                    system_i/reset_block/reset_block/Rstc405resetchip
    PPC405_ADV_X0Y0.RSTC405RESETCHIP  net (fanout=1)        0.437   system_i/RSTC405RESETCHIP
    PPC405_ADV_X0Y0.CPMC405CLOCK      Tppcdck_RSTCHIP       0.650   system_i/ppc405_0/ppc405_0/PPC405_ADV_i
                                                                    system_i/ppc405_0/ppc405_0/PPC405_ADV_i
    --------------------------------------------------------------  ---------------------------
    Total                                                   1.378ns (0.941ns logic, 0.437ns route)
                                                                    (68.3% logic, 31.7% route)

--------------------------------------------------------------------------------
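
For anyone checking the DESIGN 2 numbers by hand, here is a rough Python sketch. It assumes the two clocks are phase-aligned at t = 0 and that the destination clock period is exactly 10 ns x 3/8 = 3.75 ns (my reading of the "/ 2.66666667" constraint). The function names are made up for illustration. It shows where the 1.250ns requirement comes from (the closest source-to-destination edge pair) and reproduces the reported slacks from the formula printed in the report.

```python
from fractions import Fraction
from math import floor

def tightest_requirement(src_period, dst_period):
    """Smallest gap from a source rising edge to the next destination rising edge.

    Assumes both clocks rise together at t = 0; periods are exact Fractions in ns.
    """
    gaps = []
    k = 0
    while True:
        t = k * src_period
        # Stop once the two clocks line up again (one full common period covered).
        if k > 0 and (t / dst_period).denominator == 1:
            break
        next_dst_edge = (floor(t / dst_period) + 1) * dst_period
        gaps.append(next_dst_edge - t)
        k += 1
    return min(gaps)

def slack(requirement, data_path, clock_skew, uncertainty):
    """Slack as printed in the report: requirement - (data path - skew + uncertainty)."""
    return requirement - (data_path - clock_skew + uncertainty)

plb_period = Fraction(10)           # 100 MHz source clock
cpu_period = Fraction(10) * 3 / 8   # 266.67 MHz destination clock (assumed exact)

print("tightest requirement (ns):", float(tightest_requirement(plb_period, cpu_period)))
print("Rstc405resetcore slack (ns):", round(slack(1.250, 1.514, -0.040, 0.258), 3))
print("Rstc405resetsys  slack (ns):", round(slack(1.250, 1.378, -0.040, 0.258), 3))
```

The source edge at 10 ns is followed by a destination edge at 11.25 ns, which matches the "rising at" times in the report, so these reset paths only get 1.25 ns rather than a full CPU clock period.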

Reply to
Dima

Hi Dmitriy,

Thanks, that's very useful. It's clear that the failing paths are inside the apu_fpu core.

Well, according to the report you attached, the timing analyser believes that the clocks are properly aligned, and that's good enough for me. You can see, though, that the "clock uncertainty" is pretty high at >0.5ns, which is more than 10% of the cycle budget. Still, this alone doesn't explain why the design is failing timing.
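
To put numbers on that, here is a small Python sketch of the uncertainty formula the timing analyser prints, ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE. The jitter values are the ones from the DESIGN 2 report earlier in the thread. The 3.75 ns "cycle budget" is an assumption on my part (the period of the 266.67 MHz clock), and the function names are only illustrative.

```python
from math import sqrt

def clock_uncertainty(tsj, tij, dj, pe):
    """Clock uncertainty in ns: ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE."""
    return (sqrt(tsj ** 2 + tij ** 2) + dj) / 2 + pe

def share_of_budget(uncertainty, budget):
    """Fraction of a timing budget consumed by clock uncertainty."""
    return uncertainty / budget

# Values from the DESIGN 2 report: TSJ = 0, TIJ = 0, DJ = 0.235 ns, PE = 0.140 ns.
u = clock_uncertainty(0.0, 0.0, 0.235, 0.140)
print(f"uncertainty: {u:.4f} ns")  # the report shows this rounded as 0.258ns

# Assumed budget: one 3.75 ns period of the 266.67 MHz clock.
print(f"0.5 ns of a 3.75 ns cycle: {share_of_budget(0.5, 3.75):.1%}")
```

With no system or input jitter entered, the whole figure comes from the DCM's discrete jitter and phase error, so it is a fixed cost of the clocking scheme rather than something placement can recover.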

Not this exact configuration, but it certainly runs at 275 CPU : 137.5 FPU in V4FX-11. I know this, because I'm the guy who designed, implemented and tested this core. :)

Looking at the details of the report, there are a couple of things I wouldn't expect. Firstly there's a massive net delay up front which suggests that the fetch-stage instruction register is getting placed sub-optimally. Secondly, there's at least one extra level of logic more than there should be on this critical path. The fact that the offending net names are suffixed with "_mapNN" suggests that the mapper has rearranged some of this logic and managed to make it worse. I've seen similar happenings before.

Therefore... what options are you passing to MAP? In case you didn't know, there is a .opt file in your EDK project's /etc directory that specifies this. By default, it does *not* use timing-driven mapping, which is highly recommended. There are a few other options which may help or hinder you here: -global_opt and -logic_opt are the ones that spring to mind. Just add (or remove) the options in the .opt file under the "Program map" heading, one option per line.

I know there is a script called "Xplorer" which will try out these different options for you automatically. Seems to me like an admission that the tools don't work properly, but it's not just FPGA tools that exhibit this sort of unpredictable speed-optimization-may-slow-things-down problem - C compilers are just as bad! But if you get desperate, you could try that.
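
If you would rather work through the combinations by hand, a few lines of Python will list them. This is only a sketch of the bookkeeping, not how Xplorer itself works. It treats the three MAP options named above as simple on/off toggles, which is an assumption: some of them may need an argument depending on your tool version, so check `map -h` before pasting anything into the .opt file.

```python
from itertools import product

# Option names as mentioned in the post; exact argument syntax is NOT verified here.
MAP_TOGGLES = ["-timing", "-global_opt", "-logic_opt"]

def option_sets(toggles):
    """Return every on/off combination of the given options as a list of lists."""
    combos = []
    for mask in product([False, True], repeat=len(toggles)):
        combos.append([opt for opt, enabled in zip(toggles, mask) if enabled])
    return combos

# Print one candidate option set per implementation run.
for run, opts in enumerate(option_sets(MAP_TOGGLES)):
    print(f"run {run}: {' '.join(opts) or '(defaults)'}")
```

Three toggles give eight runs, so it is worth recording the worst slack from each run's timing report next to the option set that produced it.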

Once again, I hope you can get this sorted soon...

Cheers,

-Ben-

Reply to
Ben Jones
