Manual Partitioning to Multiple FPGAs

Hi, I have a design which does not fit on my Altera Stratix device. I need to split it across 2 Stratix devices. Is it possible to do this manually? I can't afford partitioning software. The clock frequency for the design after fitting will be around 30 MHz, and I can run the design at a speed slower than that achieved after fitting. So can I safely operate the design at, say, 20 MHz if Quartus were to ensure a speed of 30 MHz on a single larger FPGA? Slowing the FPGA by 10 MHz would mean I have an extra 100 ns delay, which will be used up by the interconnect delay between the 2 FPGAs (due to rise time/fall time of IO pins). Assuming this approach works, approximately how much extra delay should I leave for the interconnect delays? Are there any other issues I should be aware of?

Thanks,
Tushit
Reply to
tushit

Apart from the obvious organisational problem - how to make the split - there may be some tricky issues about clock synchronisation. You need to be sure that setup AND HOLD times are met in both devices.

Sorry? 30MHz is 33ns period, 20MHz is 50ns period; sounds like only 17ns extra, to me.
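The period arithmetic can be checked in a few lines. This is only a sketch; the helper name period_ns is made up for illustration.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


fitted = period_ns(30.0)  # period at the speed the fitter reports
slowed = period_ns(20.0)  # period at the reduced operating speed
extra = slowed - fitted   # time gained per cycle by slowing down

print(f"30 MHz period: {fitted:.1f} ns")
print(f"20 MHz period: {slowed:.1f} ns")
print(f"Extra per cycle: {extra:.1f} ns")
```

So dropping from 30 MHz to 20 MHz buys roughly 17 ns per cycle, not 100 ns.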

Remember that the propagation delay of a typical output driver is dominated by the capacitance it is driving. Propagation delay across a typical PCB is around 2ns per foot, so that should be OK unless your FPGAs are a long way apart or you let your PCB autorouter do silly things. Data sheet specs for FPGA output drivers usually tell you how the delay increases as a function of capacitance, so it should all be fairly predictable.

If you are splitting one FPGA into two, it seems likely that the signals from one FPGA to the other will drive only one FPGA input in most cases. Therefore the capacitive slowdown should be modest.

Be ready to add pipeline stages in the design, to cope with the very large propagation delays of FPGA I/O pad structures. But 30MHz should be easy to achieve across the boundary.
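As a rough way to see whether a chip-to-chip path fits in one clock period, the pieces can be added up as below. This is a sketch under stated assumptions: the path is registered at both ends, the 2 ns per foot figure is the rule of thumb quoted above, and every other number is a placeholder, not a Stratix data sheet value. The class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BoundaryPath:
    """One register-to-register path that crosses between two FPGAs."""
    t_co_ns: float          # clock-to-pad delay of the driving FPGA (placeholder)
    load_derate_ns: float   # extra driver delay from capacitive load (placeholder)
    trace_len_ft: float     # PCB trace length in feet
    t_su_ns: float          # pad-to-register setup time at the receiver (placeholder)
    clk_skew_ns: float      # worst-case clock skew between the two FPGAs (placeholder)
    ns_per_ft: float = 2.0  # PCB propagation rule of thumb

    def flight_ns(self) -> float:
        return self.trace_len_ft * self.ns_per_ft

    def required_ns(self) -> float:
        # Everything that must fit inside one clock period.
        return (self.t_co_ns + self.load_derate_ns + self.flight_ns()
                + self.t_su_ns + self.clk_skew_ns)

    def setup_slack_ns(self, freq_mhz: float) -> float:
        return 1000.0 / freq_mhz - self.required_ns()


path = BoundaryPath(t_co_ns=6.0, load_derate_ns=1.0, trace_len_ft=0.5,
                    t_su_ns=4.0, clk_skew_ns=0.5)

for f in (20.0, 30.0):
    print(f"{f:.0f} MHz: setup slack {path.setup_slack_ns(f):.1f} ns")
```

Substitute the worst-case figures from the device data sheet and the board layout; if the slack goes negative, that path needs a pipeline stage or a slower clock.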

Key suggestion: DON'T supply one FPGA's clock from an output on the other FPGA. Instead, be sure to supply BOTH FPGAs' clocks from the same source. The worst-case skew between the two FPGAs' clock buffer delays should be very much smaller than the propagation delays of each FPGA's output pads; if this is true, you will have no problems with hold time.
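The hold-time argument can be written as a simple inequality: the earliest the data can change at the receiver (minimum clock-to-pad delay plus minimum board delay) must exceed the receiver's hold time plus the worst-case clock skew. Below is a minimal sketch of that check; the function name and all numbers are illustrative assumptions, not data sheet values.

```python
def hold_slack_ns(t_co_min_ns: float, flight_min_ns: float,
                  t_hold_ns: float, clk_skew_ns: float) -> float:
    """Hold slack at the receiving FPGA; negative means a hold violation.

    clk_skew_ns is how much later the receiver's clock edge can arrive
    than the driver's clock edge, worst case.
    """
    return t_co_min_ns + flight_min_ns - t_hold_ns - clk_skew_ns


# Both FPGAs clocked from one source: skew is small next to the pad delay.
common_clock = hold_slack_ns(t_co_min_ns=2.0, flight_min_ns=0.5,
                             t_hold_ns=0.5, clk_skew_ns=0.5)

# Receiver clock arrives several ns late, as might happen if it is
# routed out through the other FPGA's output pad (assumed figure).
late_clock = hold_slack_ns(t_co_min_ns=2.0, flight_min_ns=0.5,
                           t_hold_ns=0.5, clk_skew_ns=5.0)

print(f"common clock source: {common_clock:+.1f} ns")
print(f"late receiver clock: {late_clock:+.1f} ns")
```

Note that slowing the clock does not help here: the clock period does not appear in the hold check at all.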

-- Jonathan Bromley, Consultant

Reply to
Jonathan Bromley

I'd redesign it for a split, but better yet, get a bigger part (if possible), even if it means porting it to a different vendor. Cutting up a design into multiple pieces, even with tools that promise to do it, is dicey. You have timing issues and added real estate, as well as potential problems with scaling (if applicable). In addition, the design becomes bounded by and dependent on the partition.

Reply to
fabbl

You may have some unused resources, like block RAM and multiplier/DSP blocks, that could be used for logic.

Consider trying other synthesizers.

-- Mike Treseler

Reply to
Mike Treseler

I wonder about this because I'm looking into moving data across device boundaries for a project. The approach I am favoring at the moment is to have a source-synchronous bus + control + clock leave device A and enter device B. The output clock would be generated via the DDR method within the IOB. It would seem to me that, assuming careful PCB layout, this method might be preferable to having an external clock generator feed devices A and B. Am I missing something? I can see that with proper DCM configuration it truly doesn't matter which way you go (or it shouldn't)?

-- Martin Euredjian
Reply to
Martin Euredjian

Hi Tushit,

In Quartus II 4.0 try setting the following Logic Options:

a) Auto Packed Registers: set to either Minimize Area or Minimize Area with Chains. This is set in the Assignment Settings->Fitter Settings->More Settings dialog.

b) Optimization Technique: set to Area. This is set in the Assignment Settings->Analysis and Synthesis settings.

- Subroto Datta Altera Corp.

Reply to
Subroto Datta

I think one important idea is to use an approach that you are comfortable with. What is "best" probably depends upon details that haven't been specified yet.

What are you going to use for the main clock on device B? Can you run the whole chip off the source-synchronous clock, or does it need to run off a normal clock, in sync with device A? If the latter, then you need a FIFO or something similar to get across the clock boundary.

How fast are you running? Can you afford pipeline delays? Will it work if you put a pipeline stage at the output IOBs and another at the input IOBs? If things are slow enough so that works, it avoids the clock re-sync tangle.
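The cost of registering at both IOBs is latency. A small cycle-by-cycle model makes that visible. This is a sketch only: the function name is made up, and it assumes both devices share one clock and that all registers reset to 0.

```python
from collections import deque


def through_io_registers(samples, stages=2, reset_value=0):
    """Model data passing through `stages` registers on a common clock.

    With stages=2, the first register stands for the output IOB flop on
    device A and the second for the input IOB flop on device B.
    Returns what device B's logic sees in each cycle.
    """
    pipe = deque([reset_value] * stages, maxlen=stages)
    seen = []
    for s in samples:
        seen.append(pipe[0])  # value held by the last register this cycle
        pipe.append(s)        # clock edge: shift in new data, drop the oldest
    return seen


sent = [1, 2, 3, 4]
print(through_io_registers(sent))  # [0, 0, 1, 2]
```

Device B sees each value two cycles after device A produced it, so any handshake or feedback loop across the boundary has to tolerate that delay in each direction.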

Reply to
Hal Murray
