CLK input does NOT use a clk pin (Altera Stratix II)

Hi All!

I have a project that uses an Altera Stratix II 2S180 as an ASIC prototype. The ASIC has many interfaces and therefore many clocks, and some of those clocks are not routed to the FPGA's dedicated clock pins. For example, the PCI clock is routed to a normal I/O pin.

Because the FPGA and the board are expensive, the boss does not want to make a new board. After reading through the 2S180 datasheet and thinking about it a lot, I found this is a very hard problem because: 1) The global buffer tree's delay is very long, about 5 ns. 2) From pad to core, a normal I/O has about 1 ns of delay. 3) I can't use a PLL to compensate for the I/O delay or the global buffer delay, since the PLL's input must be a clock input pin or a global buffer. 4) Inserting LCELLs into the datapath of the input signals will make my Tco bad.

How can I deal with this? Is anyone from Altera here?

Reply to
huangjie

It sounds like you just hit the classic ASIC-to-FPGA conversion problem of too many clocks. We have done a lot of this kind of work, and generally it is best to plan FPGA use into the IP from the start to make the conversion path easy.

One obvious thing to try is to reduce the number of clocks. ASIC designs often use gated clocks because they make for smaller logic than locally clock-enabled flip-flops. This often creates designs with large numbers of clocks, which does not sit well with most FPGA fabrics. Xilinx does have some tool support for locally routed clocks to cover this situation, but I am not sure whether Altera offers this facility yet.

Consider whether you can alter your IP to use clock enables instead of generated gated clocks. Alternatively, if your board has multiple FPGAs, look at partitioning to minimise the number of clocks or to improve their distribution across the FPGA resources available. A multiple-FPGA platform is often superior to a single large FPGA for ASIC prototyping.

John Adair Enterpoint Ltd. - Home of Broaddown1. The ASIC Prototyping Platform.

Reply to
John Adair

Thank you for your reply! But the board was built before I joined the company, and the boss does not want to make a new board.

The ASIC has so many clocks because it has many interfaces, not because of gated clocks.

Reply to
huangjie

How fast are the clocks that are not on the dedicated clock pins? If they are slow enough, you can sample them with a faster clock to generate an enable signal on the edge you want, and run your internal logic on the faster clock using that enable. The code would be different for your FPGA vs. your ASIC though:

FPGA:
  process (fastclk)
  begin
    if RISING_EDGE(fastclk) then
      if enable = '1' then
        ...

ASIC:
  process (pinclk)
  begin
    if RISING_EDGE(pinclk) then
      ...

(It's for situations like this that I wish VHDL had a pre-processor like C) It might be tricky at PCI speeds, but if this is a prototyping system, you may be able to slow down your PCI clock.
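A minimal sketch of how the enable signal could be generated, assuming fastclk is several times faster than pinclk. The entity name and the three-stage shift register are assumptions made for illustration; only fastclk, pinclk and enable come from the snippets above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity for illustration only.
entity clk_enable_gen is
  port (
    fastclk : in  std_logic;  -- fast internal clock on a global network
    pinclk  : in  std_logic;  -- slower external clock arriving on a normal I/O pin
    enable  : out std_logic   -- one fastclk-cycle pulse per rising edge of pinclk
  );
end entity clk_enable_gen;

architecture rtl of clk_enable_gen is
  -- Stages 0 and 1 synchronize pinclk into the fastclk domain;
  -- stage 2 holds the previous synchronized sample.
  signal sync : std_logic_vector(2 downto 0) := (others => '0');
begin
  process (fastclk)
  begin
    if rising_edge(fastclk) then
      sync <= sync(1 downto 0) & pinclk;
    end if;
  end process;

  -- Rising edge detected: previous sample low, current sample high.
  enable <= '1' when sync(2) = '0' and sync(1) = '1' else '0';
end architecture rtl;
```

The pulse appears two to three fastclk cycles after the actual edge at the pin, so any data sampled with it has to stay stable for that long.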

Regards, John

Reply to
JustJohn

Unfortunately, the clocks are not slow enough: for example, one runs at 125 MHz and PCI at 33 MHz. Since they interface to other devices, they can't be slowed down.

Reply to
huangjie

So clock everything at 125 MHz and use clock enables. Then use FIFOs or the infamous double latch to transfer between the 33 MHz and 125 MHz clock domains.

Simon

Reply to
Simon Peacock

Thanks for your suggestion! But first, how do I use "the infamous double latch"? Second, my ASIC does not have only one 125 MHz clock; instead it has 5 more, all of them driven by external chips, with no frequency or phase relationship between them.

Reply to
huangjie

A/ Forget the ASIC.. Design the FPGA.. then work out how to translate that into an ASIC. The two are so totally different that if you try to design for both you will ultimately fail.

B/ The double latch.....

clk_transfer : process (rst, clk) is
begin
  if (rst = reset_active_c) then
    tmp <= (others => '0');
    data_out <= (others => '0');
  elsif rising_edge(clk) then
    tmp1 ...
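For reference, a complete two-register synchronizer in the same style might look like the sketch below. It is not the rest of the code above: the entity name, the WIDTH generic, the port list and the single intermediate signal tmp are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity for illustration only.
entity double_latch is
  generic (
    WIDTH          : positive  := 1;
    reset_active_c : std_logic := '1'   -- assumed active-high reset
  );
  port (
    rst      : in  std_logic;
    clk      : in  std_logic;  -- clock of the receiving domain
    data_in  : in  std_logic_vector(WIDTH - 1 downto 0);  -- from the other domain
    data_out : out std_logic_vector(WIDTH - 1 downto 0)   -- safe to use in the clk domain
  );
end entity double_latch;

architecture rtl of double_latch is
  signal tmp : std_logic_vector(WIDTH - 1 downto 0);
begin
  clk_transfer : process (rst, clk) is
  begin
    if rst = reset_active_c then
      tmp      <= (others => '0');
      data_out <= (others => '0');
    elsif rising_edge(clk) then
      tmp      <= data_in;  -- first stage: may go metastable
      data_out <= tmp;      -- second stage: gives the first a full cycle to settle
    end if;
  end process clk_transfer;
end architecture rtl;
```

This is only safe as written for single-bit signals, or for buses where at most one bit changes at a time (for example Gray-coded pointers). Wider data normally goes through a FIFO, as mentioned earlier in the thread.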

Reply to
Simon Peacock

I understand your idea now, and I know why yours works but mine can't: your slow clock is slow, and mine is very fast. How can I treat 125 MHz clocks as if they were 2 MHz? How fast would my "reference" for 125 MHz have to be? Perhaps I can use a group of phase-shifted clocks to generate the clock enable signals. Thank you again!

Reply to
huangjie

There are several possible solutions.

  1. Stratix II clocks don't have to come from dedicated clock inputs to reach the global clock networks. The dedicated clock inputs can reach the global clock networks without using any regular routing, so they result in less clock delay to your registers, and that is useful if you need a fast Tco to another chip. However, any I/O can reach dedicated global clock networks by using regular routing to get to the global network drive point. A clock constructed this way will have extra delay to reach each register, but the skew within the clock domain will still be fine.

This will happen automatically when you compile in Quartus II -- no need to do anything.

If you have 16 or fewer clocks, you are done. 33 MHz PCI has a loose enough Tco that you should comfortably meet it even with the larger clock delay that results from not using a dedicated clock pin.

  2. Quartus II only promotes non-PLL clocks to "chip-wide global networks" by default. There are 16 of these. If you have more than 16 clocks in your design, you probably want to use the 32 regional (1/4 chip) global networks as well. You can tell Quartus II to put a clock on a regional network by using the assignment editor to make a

"global signal = regional clock"

assignment to the clock signal. Since regional clocks can only reach 1/4 of the chip, you should make these assignments carefully -- ensure that all fanouts of the clock can be placed in the quadrant of the chip near the I/O driving the clock. Generally you should use up all 16 chip-wide global clocks first, and then use the regional clocks for the lower fanout clocks, or clocks that need faster Tco on registers driving output I/Os (regional clocks have lower delay).

If you have a clock that fits in 1/2 the chip, but not in 1/4 of the chip, use "global signal = dual regional clock" to combine two regional clock networks into one 1/2 chip-wide network for that clock signal. This burns two of your 32 regional clocks though.

  3. You can use locally routed clocks. Such clocks use general routing, and have higher skew than the dedicated (chip-wide global or regional) clock networks. However, they have low delay if the clock fanout is low, and hence can be good for Tco to an output I/O. To minimize the skew on such networks, you should make the assignment:

"maximum clock arrival skew = 0"

to the clock signal. This will tell the fitter to optimize this signal for low-skew. The skew we achieve is generally quite reasonable on such clocks (~300 - 600 ps, with higher fanout clocks near the upper end of the range), but it still isn't as good as that of a global clock. Hence I'd recommend the global clock approaches (#1 and #2) first. If you need more than 48 clocks (a lot!) use this technique to make low-skew locally routed clocks for the lowest fanout clocks.

  4. You could redesign your circuit to use fewer clocks, as other posters have suggested, but I suspect from your description that that is not necessary, and Stratix II in fact has plenty of clocks for what you need.

Regards,

Vaughn Betz Altera [v b e t z (at) altera.com]

Reply to
Vaughn Betz

Just to correct you: just because 125 MHz is the reference, that doesn't mean it can't be an ungated clock too! You don't have to multiply the reference up; you just run the entire device at 125 MHz and "ignore" the other clocks.

However .. the suggestion by Vaughn is also good.. lots of clocks (if you can use them)

Simon

Reply to
Simon Peacock

Thanks to Betz and Simon. To Simon: my design has several 125 MHz clocks, not just one, with no phase or frequency relationship between them, so which one should be the reference? To Betz: my trouble is NOT too many clocks, but too many interface clocks that are not connected to dedicated clock pins. Some clocks are slow (e.g. 33 MHz PCI), but some are very fast (125 MHz). I know I can use a global clock, but how do I calculate the delay of the global clock? The interface has a valid data window of about 4 ns, so by how many ns should I shift the global clock? My problem is the skew between the inside and the outside of the chip, not the skew inside the chip.

Reply to
huangjie

To figure out the delay of a global clock that isn't driven by a dedicated global clock pin you should put your design (or at least a skeleton of it) in Quartus II and compile it. Then you can get a full timing report to see where you stand. Make sure you set your timing constraints correctly. If you are running out of global clocks, start trying out some of the assignments I mentioned in my previous post.

Working out complex delays that include routing from the datasheet isn't going to work well, so it is much better to move on to Quartus.

In terms of meeting your I/O timing: if your clock has higher delay, you likely have to delay the data to meet your setup/hold window. Quartus will do that automatically for you. Just set the appropriate Tsu and Th constraints on your input pins. 4 ns is a pretty wide window, so you should meet timing when you compile using the default Quartus settings. For maximum safety, you should both optimize & check timing for both the slow timing corner (slow transistors, low V, high T) and the fast timing corner (fast transistors, high V, low T). To do this, turn on "optimize fast corner timing" in the fitter settings, and turn on "Report combined fast & slow timing" in the Timing Settings->More Settings dialog.
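The setup/hold trade-off above can be written as a rough pair of inequalities. This is only a sketch, and the symbols are assumptions introduced here rather than Quartus terminology: $D_{clk}$ is the delay from the clock pin to the capture register, $D_{data}$ the delay from the data pin to that register, $t_{su}$ and $t_{h}$ the register's own setup and hold times, and $T_{su,ext}$ and $T_{h,ext}$ how long the external device keeps the data valid before and after the clock edge, measured at the pins.

```latex
\begin{align}
D_{data}^{\max} &\le D_{clk}^{\min} + T_{su,ext} - t_{su} \\
D_{data}^{\min} &\ge D_{clk}^{\max} + t_{h} - T_{h,ext}
\end{align}
```

With a 4 ns valid window, $T_{su,ext} + T_{h,ext} = 4$ ns. A larger clock delay shifts the allowed range for the data delay rather than shrinking it. What does shrink the range is the spread between the minimum and maximum delays, which is why checking both the slow and the fast corner matters.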

*If* there is a problem, you can refine your clocking strategy to reduce the delay to the registers capturing the input data. The Quartus assignment editor lets you set different fanouts of a clock to different types of global resources. Set the "from" node to the clock node, and the "to" node to the appropriate clock fanouts. So you could make the registers that capture input data on your 125 MHz clock use local routing ("global signal = off"), while letting the rest of the clock net be routed on a chip-wide global network and hence have low skew. If you do this, you have a potential hold time problem from the "low-delay-clock" capture registers to the "higher-delay-clock" other registers in the clock domain. If the Quartus timing analyzer flags such a hold violation, you can set Assignments->Settings->Fitter Settings->Optimize Hold Time to "All Paths" and re-compile: now Quartus will insert datapath delay on these register to register transfers to fix the hold violation for you.

So basically there are algorithms and options in Quartus to make exactly this kind of system work. Most likely everything will work out fine if you simply make all the timing assignments and compile. If there are problems, there are many controls and additional optimization algorithms that can be turned on to help you close timing.

Regards,

Vaughn Betz [v b e t z (at) altera.com]

Reply to
Vaughn Betz
