I am working with the ML410 board and the V4FX60 FPGA. For the past week and a half, I've been having problems meeting timing and I just can't figure out why. In my design, I need to use DDR2 and the APU controller as well as the FPU core. I also need two OPB peripherals: the RS232 and the SystemACE. These are the clocking requirements for my design:
All of these are within the documented limits for the cores and this speed grade FPGA (-11). These are the steps I follow to get to this design:
Start with BSB wizard and generate a 300 MHz CPU / 100 MHz PLB design with all the above mentioned IPs, except the FCB and the FPU.
This design does synthesize, map, place and route, although PAR complains that timing constraints are not met for the DDR2 clocks. I talked with Xilinx about this and they assured me this is OK (they know about this issue). Everything works fine at this point, although I am not too happy with Xilinx's lack of explanation for this timing issue.
Next, I modify the design to generate a CLKFX from one of the two DCMs to make the 266 MHz CPU clock. I also uncheck the CPMC405SYNCBYPASS option in the PPC405 configuration so that non-integer CPU-PLB ratios are allowed. I update the frequency in the software settings.
Finally, I bring in the FCB and the APU-FPU cores. I tie the FCB to the processor clock, and use a CLKFX output on one of the DCMs to generate the 133 MHz clock for the APU-FPU core. I enable APU support in the PPC405 processor core.
When I build this system, I get 3 timing constraints not met: 1 for the DDR2 (as seen previously, but according to Xilinx nothing to worry about) and 2 new ones. The 2 new failing constraints are the CPU 266 MHz clock and the FPU 133 MHz clock. I tried executing my software on this faulty hardware and it works sometimes, but not always. I do get DCM lock though.
Like I said, I can't think of why this would not meet timing! Is it harder to route a 266 MHz CPU clock than a 300 MHz design? What can I do to mitigate this? I tried various options in synthesis, map, PAR, and the Xplorer script. But I think it probably is something in my configuration of the DCMs, because it should work as I am not asking for a very high frequency.
Any help would be very greatly appreciated (I would hate to go back to a working 200 MHz design because of this).
Thanks for the detailed information. Perhaps you could also post some of the failing paths from the timing report? Otherwise we're all a bit in the dark.
Cascading DCMs in the way you've described does tend to increase jitter on the resulting clock nets, which can often eat into your timing budget. A few questions:
(1) How full is your device? (2) Which version of EDK are you using; and, in particular: (3) ...which version of the apu-fpu core?
The cascading DCMs are mainly for DDR2 clock and its clk_90 counterpart. This is the way they come out of BSB wizard. But path 3 does fail from the start (DDR2 clock). It comes out failing from the BSB wizard without me changing anything in the design.
(1) The design is not full at all. I am only using 20% of the slices now. (2) I am using the latest version of EDK (9.1i with SP1) (3) I am using the latest apu-fpu core (v3.0)
I found it odd that even without the apu-fpu core, my CPU net fails when I try to clock it at 266 MHz instead of 300 MHz. All I did there was change the CLKFX multiplier and divider values (and uncheck CPMC405SYNCBYPASS)! Is there something I am missing here?
I am also using the MGT protector core, which runs off proc_clk_s. Would that impact the timing? I will try without it. I did notice it LOCed one of my DCMs.
Well, those are the failing timespecs, certainly! More useful would be the paths themselves, from the TRACE report (.twr file), telling you the source and destination nets of the worst N paths in each timing group (where N is by default 3, I think). The clock uncertainty and skew are also reported there.
Looking again at your clocking architecture, I'm not sure that the DCM configuration you're using will guarantee a rising-edge alignment between the 266MHz and 133MHz clocks - which is a requirement for the apu_fpu core.
I haven't used the DDR2 core in the latest release but I understand you've been advised that this warning can be safely ignored? Doesn't sound ideal though. :-\
Excellent, thanks. The reason I wanted to check was that the required clocking configuration changed considerably (as you might or might not have noticed) between v2.1 and v3.0 of the FPU core.
Yes, that's weird. I was about to ask what other logic you had running off proc_clk_s, but...
Good idea. Otherwise, the only things that are running off proc_clk_s are (a) the FCB bus interface, which is mostly just wires, and (b) the CPU itself, which just has minimum clock pulse widths (1.818ns in the configuration you're using, I believe).
Ahh yes, of course. I have pasted the failing paths right below this message. Unfortunately, I couldn't attach the log file directly, so please excuse the poor formatting. I reran the build this time without the MGT protector core. Also, I implemented the workaround in AR 24326 which suppresses the DDR2 failing constraint. This note explains why it is safe to suppress the DDR2 timing failure. I am a lot more comfortable with this explanation than with simply ignoring it. Nevertheless, the design still fails with proc_clk_s and apu_fpu_clk_s not meeting timing.
I also implemented a very basic design with only a few components: CPU, PLB BRAM, and OPB RS232. I changed the CPU frequency to 266.67 MHz by only modifying CLKFX on dcm_0 (this one only has a single DCM since no DDR2). I changed CLKFX to be M=8 and D=3 (instead of the original M=3, D=1 for 300 MHz). Even this basic design still fails timing! I have pasted the failing path for this design as well (see below).
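As a quick sanity check on the CLKFX settings mentioned above, here is a minimal Python sketch of the multiply/divide arithmetic (output = input x M / D). The 100 MHz DCM input is an assumption inferred from the M=3, D=1 setting giving 300 MHz, and the M=4, D=3 pair for the 133 MHz FPU clock is purely illustrative (it matches the "/ 1.33333333" ratio in the timing constraint, but the actual DCM attributes are not shown in this thread). The function name is hypothetical.

```python
from fractions import Fraction

CLKIN_MHZ = 100  # assumed DCM input clock (inferred: M=3, D=1 gives 300 MHz)


def clkfx_mhz(clkin_mhz, m, d):
    """Return the CLKFX output frequency in MHz: f_in * M / D."""
    return Fraction(clkin_mhz) * m / d


# (M, D) pairs: original 300 MHz setting, the 266.67 MHz setting from the
# post, and an illustrative (assumed) pair for a 133.33 MHz FPU clock.
for m, d in [(3, 1), (8, 3), (4, 3)]:
    f = clkfx_mhz(CLKIN_MHZ, m, d)
    period_ns = 1000 / float(f)
    print(f"M={m} D={d}: {float(f):.2f} MHz, period {period_ns:.3f} ns")
```

The 8/3 setting gives a 3.750 ns period, which is the figure the timing analyser has to meet for proc_clk_s.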
Do you have a suggestion on how I should configure the DCMs for a 266-133 CPU-FPU ratio? Even a working basic design, without any memory (like DDR2) would help. These frequencies are within the limits and I am sure Xilinx must have tested this exact configuration to find the maximum allowable frequency (which is 275 MHz for the CPU -11 grade and 137.5 MHz for the FPU). If you have any suggestions here, I will try them. I am running out of ideas. I opened up a webcase today as well to see if I can get help from that avenue.
The failing paths are pasted in below. Thanks for your help.
- Dmitriy
================================================================================
DESIGN 1 with DDR2 (apu_fpu_clk_s)
================================================================================
Timing constraint: TS_system_i_dcm_0_dcm_0_CLKFX_BUF = PERIOD TIMEGRP "system_i_dcm_0_dcm_0_CLKFX_BUF" TS_sys_clk_pin / 1.33333333 HIGH 50%;
423953 items analyzed, 50 timing errors detected. (48 setup errors, 2 hold errors)
Minimum period is 9.606ns.

Total System Jitter (TSJ): 0.000ns
Total Input Jitter (TIJ): 0.000ns
Discrete Jitter (DJ): 0.311ns
Phase Error (PE): 0.367ns
Maximum Data Path: system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[31].r to system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_2

Total System Jitter (TSJ): 0.000ns
Total Input Jitter (TIJ): 0.000ns
Discrete Jitter (DJ): 0.311ns
Phase Error (PE): 0.367ns
Maximum Data Path: system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[31].r to system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_3

Total System Jitter (TSJ): 0.000ns
Total Input Jitter (TIJ): 0.000ns
Discrete Jitter (DJ): 0.311ns
Phase Error (PE): 0.367ns
Maximum Data Path: system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[1].r to system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_state_2
Total 4.189ns (1.727ns logic, 2.462ns route) (41.2% logic, 58.8% route)

--------------------------------------------------------------------------------
Hold Violations: TS_system_i_dcm_0_dcm_0_CLKFX_BUF = PERIOD TIMEGRP "system_i_dcm_0_dcm_0_CLKFX_BUF" TS_sys_clk_pin / 1.33333333 HIGH 50%;
--------------------------------------------------------------------------------
Hold Violation: -0.083ns (requirement - (clock path skew + uncertainty - data path))
Source: system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[30].r (FF)
Destination: system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_id_class.fpu_maj (FF)
Requirement: 0.000ns
Data Path Delay: 0.454ns (Levels of Logic = 1)
Positive Clock Path Skew: 0.014ns
Source Clock: system_i/proc_clk_s rising at 3.750ns
Destination Clock: system_i/apu_fpu_clk_s rising at 7.500ns
Clock Uncertainty: 0.523ns
Total System Jitter (TSJ): 0.000ns
Total Input Jitter (TIJ): 0.000ns
Discrete Jitter (DJ): 0.311ns
Phase Error (PE): 0.367ns
Maximum Data Path: system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[30].r to system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_id_class.fpu_maj

Total System Jitter (TSJ): 0.000ns
Total Input Jitter (TIJ): 0.000ns
Discrete Jitter (DJ): 0.311ns
Phase Error (PE): 0.367ns
Maximum Data Path: system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_fir_l1[26].r to system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_fetch_id_class.fpu_maj

Total System Jitter (TSJ): 0.000ns
Total Input Jitter (TIJ): 0.000ns
Discrete Jitter (DJ): 0.236ns
Phase Error (PE): 0.231ns
Maximum Data Path: system_i/reset_block/reset_block/Rstc405resetcore to system_i/ppc405_0/ppc405_0/PPC405_ADV_i

Total System Jitter (TSJ): 0.000ns
Total Input Jitter (TIJ): 0.000ns
Discrete Jitter (DJ): 0.236ns
Phase Error (PE): 0.231ns
Maximum Data Path: system_i/reset_block/reset_block/Rstc405resetchip to system_i/ppc405_0/ppc405_0/PPC405_ADV_i

Total System Jitter (TSJ): 0.000ns
Total Input Jitter (TIJ): 0.000ns
Discrete Jitter (DJ): 0.311ns
Phase Error (PE): 0.184ns
Maximum Data Path: system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/fpu_ldst_store_phase2 to system_i/apu_fpu_0/apu_fpu_0/gen_apu_fpu_sp_full.netlist/rs_ardr_l1[29].r

================================================================================
DESIGN 2 with no DDR2 (proc_clk_s)
================================================================================
Timing constraint: TS_system_i_dcm_0_dcm_0_CLKFX_BUF = PERIOD TIMEGRP "system_i_dcm_0_dcm_0_CLKFX_BUF" TS_sys_clk_pin / 2.66666667 HIGH 50%;
3 items analyzed, 3 timing errors detected. (3 setup errors, 0 hold errors)
Minimum period is 5.438ns.

Total System Jitter (TSJ): 0.000ns
Total Input Jitter (TIJ): 0.000ns
Discrete Jitter (DJ): 0.235ns
Phase Error (PE): 0.140ns
Maximum Data Path: system_i/reset_block/reset_block/Rstc405resetcore to system_i/ppc405_0/ppc405_0/PPC405_ADV_i

Total System Jitter (TSJ): 0.000ns
Total Input Jitter (TIJ): 0.000ns
Discrete Jitter (DJ): 0.235ns
Phase Error (PE): 0.140ns
Maximum Data Path: system_i/reset_block/reset_block/Rstc405resetsys to system_i/ppc405_0/ppc405_0/PPC405_ADV_i

Total System Jitter (TSJ): 0.000ns
Total Input Jitter (TIJ): 0.000ns
Discrete Jitter (DJ): 0.235ns
Phase Error (PE): 0.140ns
Maximum Data Path: system_i/reset_block/reset_block/Rstc405resetchip to system_i/ppc405_0/ppc405_0/PPC405_ADV_i
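
To put the two "Minimum period" figures from the report in perspective, this small Python sketch converts them into the required period and the achievable frequency. It assumes TS_sys_clk_pin is a 10 ns (100 MHz) constraint, which is inferred from the DCM settings rather than stated in the report. The function names and the dictionary labels are hypothetical.

```python
SYS_CLK_PERIOD_NS = 10.0  # assumed: TS_sys_clk_pin constrains a 100 MHz clock


def required_period_ns(ratio):
    """Period demanded by a 'TS_sys_clk_pin / ratio' PERIOD constraint."""
    return SYS_CLK_PERIOD_NS / ratio


def fmax_mhz(min_period_ns):
    """Highest frequency the placed-and-routed design can actually run at."""
    return 1000.0 / min_period_ns


# (constraint ratio, reported minimum period in ns), taken from the report
reports = {
    "apu_fpu_clk_s (design 1)": (1.33333333, 9.606),
    "proc_clk_s (design 2)": (2.66666667, 5.438),
}

for name, (ratio, min_period) in reports.items():
    need = required_period_ns(ratio)
    print(f"{name}: need {need:.3f} ns, got {min_period:.3f} ns "
          f"(short by {min_period - need:.3f} ns, "
          f"fmax about {fmax_mhz(min_period):.1f} MHz)")
```

Under that assumption, both constraints miss by well over a nanosecond, so this is not a marginal failure that a different placer seed is likely to fix on its own.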
Thanks, that's very useful. It's clear that the failing paths are inside the apu_fpu core.
Well, according to the report you attached, the timing analyser believes that the clocks are properly aligned, and that's good enough for me. You can see, though, that the "clock uncertainty" is pretty high at >0.5ns - that's more than 10% of the cycle budget. Still, this alone doesn't explain why the design is failing timing.
Not this exact configuration, but it certainly runs at 275 CPU : 137.5 FPU in V4FX-11. I know this, because I'm the guy who designed, implemented and tested this core. :)
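The uncertainty and hold-slack numbers can be re-derived from the terms in the report. The Python sketch below uses the hold-slack expression exactly as the report prints it, plus the clock-uncertainty formula that TRACE normally prints next to the jitter terms, ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE. That formula is cut off in the pasted report, so treat it as an assumption. The function names are hypothetical.

```python
import math


def clock_uncertainty_ns(tsj, tij, dj, pe):
    """Assumed TRACE formula: (sqrt(TSJ^2 + TIJ^2) + DJ) / 2 + PE."""
    return (math.sqrt(tsj ** 2 + tij ** 2) + dj) / 2 + pe


def hold_slack_ns(requirement, skew, uncertainty, data_path):
    """Hold slack as printed in the report:
    requirement - (clock path skew + uncertainty - data path)."""
    return requirement - (skew + uncertainty - data_path)


# Values copied from the hold-violation entry in the report
unc = clock_uncertainty_ns(tsj=0.0, tij=0.0, dj=0.311, pe=0.367)
slack = hold_slack_ns(requirement=0.0, skew=0.014,
                      uncertainty=0.523, data_path=0.454)

CPU_PERIOD_NS = 3.750  # proc_clk_s period from the report
print(f"clock uncertainty: {unc:.4f} ns")
print(f"hold slack:        {slack:.3f} ns")
print(f"uncertainty as share of CPU period: {100 * unc / CPU_PERIOD_NS:.1f}%")
```

With the report's DJ and PE values this reproduces the 0.523ns uncertainty to within rounding, and the hold expression gives the reported -0.083ns violation.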
Looking at the details of the report, there are a couple of things I wouldn't expect. Firstly there's a massive net delay up front which suggests that the fetch-stage instruction register is getting placed sub-optimally. Secondly, there's at least one extra level of logic more than there should be on this critical path. The fact that the offending net names are suffixed with "_mapNN" suggests that the mapper has rearranged some of this logic and managed to make it worse. I've seen similar happenings before.
Therefore... what options are you passing to MAP? In case you didn't know, there is a .opt file in your EDK project's /etc directory that specifies this. By default, it does *not* use timing-driven mapping, which is highly recommended. There are a few other options which may help or hinder you here: -global_opt and -logic_opt are the ones that spring to mind. Just add (or remove) the options in the .opt file under the "Program map" heading, one option per line.
I know there is a script called "Xplorer" which will try out these different options for you automatically. Seems to me like an admission that the tools don't work properly, but it's not just FPGA tools that exhibit this sort of unpredictable speed-optimization-may-slow-things-down problem - C compilers are just as bad! But if you get desperate, you could try that.
Once again, I hope you can get this sorted soon...