Writing PCI constraints in Altera

Hi, I am fairly new to FPGAs. I am trying to write the constraints for the PCI module on an Altera Stratix device. I am using Quartus II for all synthesis and P&R. The PCI spec says I need to ensure a setup time of 7 ns for all pins. The PCI clock itself runs at 33 MHz. I want to know the following:

1) Is it okay if I just constrain the PCI clk of my design to 50 MHz (30 ns for the 33 MHz clock, less 10 ns to ensure that the setup time is met)? I realise this will be overkill for the internal logic but may save me some effort.

2) The other way I can think of is to constrain the PCI clk to 33 MHz and specify an external delay of 7 or 8 ns on all the PCI signals. While setting the PCI clk to 33 MHz I also ticked the option to include external delays in the frequency calculation. Is this the correct approach, or do I need to set up the tco?

Thanks in advance.

Regards,
Tushit

Hi Tushit,

You can get an idea of the type of constraints required by downloading the Altera PCI Megacore and studying the constraint files that ship with it. To download the PCI Megacore:

  1. Open
    formatting link
  2. Type in PCI in the IP Megasearch box.
  3. Click on the Try OpenCorePlus for PCI Compiler, 32 bit Master/Target.
  4. Download the Free Evaluation.
  5. Install it into a directory.
  6. cd to pci_compiler-v3.0.0\pci_mt32\const_files
  7. There are three constraint file scripts in this directory:
     a. mt32_66_30_ep1c12f324c7_q40.tcl
     b. mt32_66_30_ep1s40f1020c6_q40.tcl
     c. mt32_stratixii.tcl

Study the constraints created in mt32_stratixii.tcl, in particular the procs set_pci_timing and constraint_file.

That should help answer your questions.

- Subroto Datta Altera Corp.


If you need a setup time of 7 ns, simply add a TSU_REQUIREMENT of 7 ns. You can add this assignment individually to each of your pins (using the Assignment Editor), use wildcards to group pins together, or simply add a global Tsu requirement (in the "Timing Settings" dialog). Using Tcl (if you are a command-line type of guy), simply do:

    set_global_assignment -name TSU_REQUIREMENT 7ns

or

    set_instance_assignment -to * -name TSU_REQUIREMENT 7ns

(Use "quartus_sh --qhelp" for more info on Tcl)

On your first question: not really. Increasing the clock frequency is not going to help you with your I/O timing. If anything, it will make it worse. You need an I/O timing constraint to get the fitter to optimize the I/O path(s).

On your second question: this will also work, especially in V4.0, where we added support for the new INPUT_MAX_DELAY constraint (a great improvement over the EXTERNAL_DELAY feature in V3.0). In your case, though, it seems like a simple Tsu requirement is all you need (at least in terms of timing-constraining the design).

And yes, the Tsu requirement will only optimize your input path. For the output path, you need to specify a TCO_REQUIREMENT using the same methodology (or OUTPUT_MAX_DELAY in V4.0).

As Subroto indicated, studying the PCI core provided by Altera is a good way to learn how to do it.

-David Karchmer Altera Corp.


Hi Tushit,

As Subroto said, the best thing to do is to study Altera's PCI core to get all the constraints right.

Here's a quick summary of the constraints for 33 MHz PCI:

- 7 ns Tsu constraint on all inputs

- 11 ns Tco constraint on the outputs

- 33 MHz constraint on the PCI clock

- 0 ns Th constraint on the inputs

Don't forget the Th (hold-time) constraint, since the PCI spec needs it.

The Tsu and Tco constraints can instead be converted to clock path constraints with the INPUT_MAX_DELAY constraint, as David said, but it is easier to just set them as Tsu and Tco, since then you don't have to work out precisely what INPUT_MAX_DELAY value to set.
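
The four constraints summarized above could be written as a Quartus II Tcl script along the following lines. This is only a sketch: it reuses the TSU_REQUIREMENT syntax David showed and assumes that TH_REQUIREMENT, TCO_REQUIREMENT and FMAX_REQUIREMENT are set the same way. The project name pci_top and the script name are hypothetical, and the `-to *` wildcard would normally be narrowed to the actual PCI pins.

```tcl
# Sketch only. Run with: quartus_sh -t pci_constraints.tcl  (script name is hypothetical)
package require ::quartus::project

# "pci_top" is a hypothetical project name; substitute your own.
project_open pci_top

# 33 MHz PCI clock. A global Fmax is assumed here for brevity;
# it applies to every clock in the design, not just the PCI clock.
set_global_assignment -name FMAX_REQUIREMENT "33 MHz"

# Input paths: 7 ns setup and 0 ns hold, as listed in the summary above.
set_instance_assignment -to * -name TSU_REQUIREMENT 7ns
set_instance_assignment -to * -name TH_REQUIREMENT 0ns

# Output paths: 11 ns clock-to-output.
set_instance_assignment -to * -name TCO_REQUIREMENT 11ns

# Write the assignments back to the project settings and close.
export_assignments
project_close
```

The constraint scripts shipped with the PCI MegaCore remain the authoritative reference for the exact assignments and pin lists.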

Vaughn


Hi, thanks for all the help. I wrote the constraints as you described, but I am not able to meet the setup time requirement. The PCI design was originally done for an ASIC, and changing it would be a big project in itself. My setup time on some paths is 11-12 ns. This is because of a lot of combinational logic in the data path between pin and register. Is it possible to add delay to the clock path only for the register which has the setup time violation? This would mean trading off frequency for setup time. Does Quartus do this for me through any optimization option? I did see a tsu-freq trade-off, but that is the opposite of what I need. Thanks again for all the help. Regards, Tushit


Hi Tushit,

It sounds like you have too many levels of logic on your set-up path. That is definitely the most difficult set of paths in PCI.

Quartus does not have an option to automatically delay the clock to a register. There are (tricky) ways to do it by hand, but I wouldn't recommend going down that route.

Which device and speed grade are you using? Which synthesis tool? Knowing what you're using will help me give more focused answers.

Altera's PCI cores have 2 or 3 levels of logic on the Tsu critical paths. The most critical paths are those involving trdy and irdy in most cases, since those high-fanout signals are harder to localize. So the most important thing to meeting PCI timing is to get a small number of levels of logic on those paths. If you are using Quartus Integrated Synthesis and finding it is not doing a good job on that path, you can put lcell buffers in your HDL to tell the mapper where you want the lcell boundaries. In most circuits this isn't necessary, but PCI is a case where synthesis can fall short.

Another, simpler option, is to turn on physical synthesis and see if it improves your results. Physical synthesis knows what the placement is, so it can make better informed decisions about what should be a logic cell than the front-end synthesis.
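
From a Tcl script, turning this option on might look like the sketch below. It assumes the global assignment name PHYSICAL_SYNTHESIS_COMBO_LOGIC, which is not confirmed anywhere in this thread; check `quartus_sh --qhelp` or the settings dialog of your Quartus II version before relying on it.

```tcl
# Sketch only: assumes a project is already open (for example in the Quartus II Tcl console).
# Assumed assignment name; enables physical synthesis for combinational logic.
set_global_assignment -name PHYSICAL_SYNTHESIS_COMBO_LOGIC ON
```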

The good news is that if you get the levels of logic down to a reasonable level, the fitter should do the rest automatically for you, so long as you're using Quartus II 4.0 or later. We meet 66 MHz, 64-bit PCI with no place & route constraints in Stratix, so 33 MHz is easy for the fitter.

Hope this helps. Let me know how it turns out!

Vaughn Altera


Hi, you are right: trdy, irdy, cben and framen are the problem areas. I am using Quartus to do the synthesis and P&R. I looked at the timing analysis report, and the report for the delay in the data path looks like this (I have edited it slightly to make it readable):

------------------------------------------------------------------
Info: 1: + IC(0.000 ns) + CELL(0.976 ns) = 0.976 ns; Loc. = Pin_AT6; PIN Node = 'cben[3]'
Info: 2: + IC(2.595 ns) + CELL(0.213 ns) = 3.784 ns; Loc. = LC_X92_Y16_N1; COMB Node = '
Info: 3: + IC(0.364 ns) + CELL(0.213 ns) = 4.361 ns; Loc. = LC_X92_Y16_N3; COMB Node = '
Info: 4: + IC(0.139 ns) + CELL(0.087 ns) = 4.587 ns; Loc. = LC_X92_Y16_N4; COMB Node = '
Info: 5: + IC(0.351 ns) + CELL(0.087 ns) = 5.025 ns; Loc. = LC_X92_Y16_N9; COMB Node = '
Info: 6: + IC(1.121 ns) + CELL(0.332 ns) = 6.478 ns; Loc. = LC_X91_Y19_N8; COMB Node = '
Info: 7: + IC(0.139 ns) + CELL(0.087 ns) = 6.704 ns; Loc. = LC_X91_Y19_N9; COMB Node = '
Info: 8: + IC(0.352 ns) + CELL(0.087 ns) = 7.143 ns; Loc. = LC_X91_Y19_N3; COMB Node = '
Info: 9: + IC(2.143 ns) + CELL(0.213 ns) = 9.499 ns; Loc. = LC_X82_Y31_N6; COMB Node = '
Info: 10: + IC(0.340 ns) + CELL(0.087 ns) = 9.926 ns; Loc. = LC_X82_Y31_N9; COMB Node = '
Info: 11: + IC(1.658 ns) + CELL(0.087 ns) = 11.671 ns; Loc. = LC_X88_Y27_N8; COMB Node = '
Info: 12: + IC(1.527 ns) + CELL(0.087 ns) = 13.285 ns; Loc. = LC_X82_Y31_N2; COMB Node = '
Info: 13: + IC(1.641 ns) + CELL(0.087 ns) = 15.013 ns; Loc. = LC_X81_Y26_N0; COMB Node = '
Info: 14: + IC(0.139 ns) + CELL(0.087 ns) = 15.239 ns; Loc. = LC_X81_Y26_N1; COMB Node = '
Info: 15: + IC(0.593 ns) + CELL(0.087 ns) = 15.919 ns; Loc. = LC_X82_Y26_N5; COMB Node = '
Info: 16: + IC(0.366 ns) + CELL(0.213 ns) = 16.498 ns; Loc. = LC_X82_Y26_N1; COMB Node = '
Info: 17: + IC(0.918 ns) + CELL(0.364 ns) = 17.780 ns; Loc. = LC_X85_Y26_N2; REG Node = '
Info: Total cell delay = 3.394 ns
Info: Total interconnect delay = 14.386 ns
---------------------------------------------------------------------------

The delay in the clock path is about 4 ns, and this gives a tsu of 13 ns or so. The path goes through a lot of combinational nodes (I think 17!!). Will it help to do a manual fitting?
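
The arithmetic behind that estimate can be sketched in plain Tcl. The delay figures are copied from the report, the clock delay is the "about 4 ns" quoted above, and the 7 ns requirement is the PCI figure from earlier in the thread. The register's own micro setup time is ignored, which is an assumption of this sketch.

```tcl
# Figures from the timing report; the clock path delay is approximate.
set data_delay_ns   17.780
set ic_delay_ns     14.386
set clock_delay_ns   4.0
set tsu_required_ns  7.0

# Approximate pin-to-register setup time (register micro-tsu ignored).
set tsu_ns   [expr {$data_delay_ns - $clock_delay_ns}]

# Negative slack means the requirement is missed.
set slack_ns [expr {$tsu_required_ns - $tsu_ns}]

# Share of the data path delay spent in interconnect rather than in cells.
set ic_share [expr {100.0 * $ic_delay_ns / $data_delay_ns}]

puts [format "tsu   = %.2f ns" $tsu_ns]
puts [format "slack = %.2f ns" $slack_ns]
puts [format "interconnect share = %.1f %%" $ic_share]
```

With these inputs the setup time comes out at roughly 13.8 ns, about 6.8 ns short of the requirement, with roughly 81% of the data path delay in interconnect.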

To check whether the routing delays could be reduced, I emptied the device and ran synthesis and P&R with only the PCI module. I assumed this would give a better P&R fit, but I still got a similar slack for tsu. My device utilization with the full design is 75% of a Stratix EP1S80, C6 speed grade. With only PCI this goes down to ~20%.

I also tried the physical synthesis of combo logic option but this didn't help.

Someone suggested reducing the fanout of the signals by duplicating them, but I assume Quartus must be doing that for me. I know Xilinx has a "max fanout" setting, though I couldn't find it in Quartus. If I need to do this manually, how do I do it?

If all else fails I will have to look into redesigning the combo logic manually. Thanks and regards Tushit


To set the Max Fanout use the Quartus II Assignment Editor. The steps are as follows:

  1. Click on Assignments->Assignment Editor
  2. Click on the Logic Options Button in the top right.
  3. Double-click on an empty cell in the To column. You can either type in the name of the instance whose fan-out you want to restrict, or click the arrow button, which brings up the Node Finder. Select the name in the Node Finder and click OK.
  4. In the Assignment Name field, select Maximum Fan-Out from the drop-down list.
  5. In the Value column, type in the fan-out number.

Alternatively if you know the name of the instance whose Fan Out you want to restrict from the timing report, right click on the name in the timing report and select Locate to Assignment Editor. This will open up the Assignment Editor and populate the To column for you. Then follow steps 2, 4 and 5 above.
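
The same assignment can probably be made from Tcl instead of the Assignment Editor. A minimal sketch, assuming the Maximum Fan-Out logic option corresponds to the assignment name MAX_FANOUT; the node name and the limit of 20 are purely hypothetical:

```tcl
# Sketch only: assumes a project is already open.
# "pci_core|some_high_fanout_node" is a hypothetical node name; take the real one
# from the timing report or the Node Finder.
set_instance_assignment -name MAX_FANOUT 20 -to "pci_core|some_high_fanout_node"
```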

- Subroto Datta Altera Corp.


[... snip ...]

Hi Tushit,

I don't think you'll have much luck with manual placement and routing, or emptying the device of other logic. The problem is simply too many logic levels on the Tsu critical path.

Maximum fanout constraints aren't going to be much help here either, since in the PCI cores I've seen the high-fanout signals are trdy and irdy, and since those are sourced by IOs you can't duplicate them.

You'll have to redesign the Tsu-critical logic, or guide the technology mapper to a better solution for Tsu by adding lcell buffers to your HDL.

Regards,

Vaughn Altera


Hi Vaughn, Subroto, thanks for all your help. I am giving up on meeting the setup time, since the project is an FPGA prototype of an ASIC design and will not go to a customer. As long as the PCI works on some PC with reasonable reliability we will be happy, and the design does seem to work okay even though the setup time misses by about 7 ns. I think this may be because the PCI slot of my PC also supports 66 MHz PCI, so the motherboard and the PCI chip on it may have lower tco and propagation delay than the PCI spec requires, giving me extra margin for the tsu. Thank you once again. Regards, Tushit
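
The margin argument can be made concrete with the usual 33 MHz PCI timing budget. The sketch below is plain Tcl. The 7 ns and 11 ns figures appear earlier in the thread; the 10 ns propagation and 2 ns skew allowances are the conventional budget values and are an assumption here, as is the approximate 13.78 ns achieved tsu derived from the timing report.

```tcl
# 33 MHz PCI budget: Tcyc = Tval + Tprop + Tsu + Tskew
set t_cycle_ns 30.0
set t_val_ns   11.0   ;# driver clock-to-output (the Tco figure in the thread)
set t_prop_ns  10.0   ;# bus propagation allowance (assumed budget value)
set t_skew_ns   2.0   ;# clock skew allowance (assumed budget value)

# Setup time the spec leaves for the receiver.
set t_su_spec_ns [expr {$t_cycle_ns - $t_val_ns - $t_prop_ns - $t_skew_ns}]

# Approximate setup time actually achieved (from the timing report).
set t_su_actual_ns 13.78

# What the host's Tval + Tprop must stay under for this design to work,
# compared with what the spec budget allows.
set host_limit_ns  [expr {$t_cycle_ns - $t_skew_ns - $t_su_actual_ns}]
set host_budget_ns [expr {$t_val_ns + $t_prop_ns}]

puts [format "spec tsu            = %.2f ns" $t_su_spec_ns]
puts [format "host Tval+Tprop max = %.2f ns (spec allows %.2f ns)" $host_limit_ns $host_budget_ns]
```

Under these assumptions the host's clock-to-output plus propagation delay would need to stay below roughly 14.2 ns instead of the 21 ns the budget allows, which is plausible for a 66 MHz-capable slot but is not guaranteed by the spec.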
