Question about Bottom-Up Incremental Compilation Methodology in Quartus II

- X.Y.

Thu, Jul 26, 2007 2:01 AM

I am new to the Bottom-Up Incremental Compilation methodology in Quartus II and have a question about it. I exported a partition from a subproject and imported it into the top-level design successfully. However, I can import a partition only once, and in my project I need to import the same partition multiple times. When I insert two blocks into the top-level design and import the same partition into both, Quartus II reports Error: "Found conflicting placement requirements for Partitions preserving Placement". Maybe I made a mistake in some setting. Could someone tell me what I should do to solve the problem?

P.S. I am using Quartus II 6.0 and a Cyclone II EP2C35F672C8. Thanks a lot!

- Ben Twijnstra

Thu, Jul 26, 2007 9:30 PM

Hi X.Y.

Currently, Quartus uses absolute placement for imported blocks. So every block that you import is forced onto the same location, which generates the error you describe.

At the moment, the only thing you can do is run a separate fit for each instance, each constrained to its own non-overlapping LogicLock region (to make sure that you don't get conflicting location assignments), and import each result separately into your top-level project.
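To illustrate why identical absolute placements collide while separate LogicLock regions do not, here is a minimal Python sketch. The `Region` class, the coordinates, and the helper names are all hypothetical; the code only models rectangles on a grid and is not how Quartus II actually checks placement.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Region:
    """Axis-aligned rectangle on a hypothetical device grid (inclusive corners)."""
    name: str
    x0: int
    y0: int
    x1: int
    y1: int


def overlaps(a: Region, b: Region) -> bool:
    """Return True if the two rectangles share at least one grid location."""
    return a.x0 <= b.x1 and b.x0 <= a.x1 and a.y0 <= b.y1 and b.y0 <= a.y1


def conflicting_pairs(regions):
    """List the name pairs whose placements would collide."""
    return [(a.name, b.name) for a, b in combinations(regions, 2) if overlaps(a, b)]


# Importing the same placed partition twice: both instances claim the same locations.
same_import = [Region("inst_a", 1, 1, 10, 10), Region("inst_b", 1, 1, 10, 10)]

# One fit per instance, each constrained to its own non-overlapping region.
separate_fits = [Region("inst_a", 1, 1, 10, 10), Region("inst_b", 11, 1, 20, 10)]

print(conflicting_pairs(same_import))    # [('inst_a', 'inst_b')]
print(conflicting_pairs(separate_fits))  # []
```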

I do know, though, that Altera is working on a solution to this issue.

Best regards,

Ben

- Subroto Datta

Thu, Jul 26, 2007 9:37 PM

Hi X.Y,

The Incremental Compilation flow currently does not allow the imported .qxp to be "stamped" onto different instances. This is coming. One workaround is to have a different HDL file and name for each instance. Admittedly, this is not ideal, but in many cases it is an easy solution. (If you're making changes in the top-level file, it's painful to repeat them in multiple files. But if the changes are in the HDL files beneath that entity, then it all works smoothly after the initial set-up.)

One flow I have used often, mainly because it works and is easy, is the pseudo-bottom-up flow. This basically involves putting partitions on the hierarchies that are at the same level as the one(s) you are interested in and setting them to Empty (so they have no logic, but nothing gets removed). I then work on the partitions I want with quick compiles. When I get what I want, I set that partition to Post-Fit and either set the other partitions to Source or delete them altogether (making everything else one big partition). It's quick and easy, with no need to create sub-projects, make sure their layout fits into the top level, and so on. Also, in Q7.1 you can export a .qxp from sub-partitions, so you can always save off your results. This works with multiple instances of the same thing, since they are now different instances (with different locations).

What is your end goal in using the Incremental Compilation flow? Are you trying to reduce compile times, trying to preserve performance, or something else?

- Subroto Datta Altera Corp.

- X.Y.

Fri, Jul 27, 2007 8:25 AM

Thanks for your reply! My end goal is to preserve performance. In our project, I use one Cyclone II FPGA to process four groups of image signals, which come from four cameras. The processing algorithm is the same for all four groups. As a result, I plan to build a subproject that implements the processing of one of the four signals and export it as a partition. Then I will build a top-level project and import it four times. Of course, I will make four different pin assignments for the four partitions. It appears that LogicLock can do this as well. Am I right?

- Subroto Datta

Fri, Jul 27, 2007 10:34 PM

I would recommend against using LogicLock for preserving performance (which is done through back-annotation of location assignments). LogicLock is excellent for floorplanning, but can have issues with these back-annotated assignments. That portion of the LogicLock flow is really meant to be replaced by the Incremental Compilation flow.

One thing I want to make sure of: does your design fail timing when run flat? Also, is it large portions of your design or just a small sub-section that continually fails timing? I'm assuming it doesn't meet timing when put together, and that it's not just a single block, as the strategy for these flows can be slightly different.

Do your four equal blocks connect to each other? Is there some central, common logic? Do they connect to pins? The problem I've seen with what you're trying to do is a good placement of a single block isn't good everywhere. For example, let's say you put them into the four quadrants of the device. In the lower-level project you optimize one for the top-left corner, so the connections it makes to pins are all placed along the top and left sides, and the connections it makes to internal logic are on the bottom and right sides. Now, if you try to keep that placement but move it to an instantiation in the bottom-right, your pin and logic connections are reversed, and if these paths are at all critical, they can fail timing.
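The pin side of this example can be put into numbers with a small Python sketch. It assumes a hypothetical 20x20 grid split into 10x10 quadrants with pins along the device edges; the grid size, coordinates, and function names are made up for illustration and do not describe any real device.

```python
DEVICE_SIZE = 20  # hypothetical 20x20 grid; each quadrant is 10x10


def distance_to_nearest_edge(x, y, size=DEVICE_SIZE):
    """Grid steps from cell (x, y) to the closest device edge (assumed pin location)."""
    return min(x, y, size - 1 - x, size - 1 - y)


def place(local_xy, quadrant_origin):
    """Translate a quadrant-local cell to absolute device coordinates."""
    return (local_xy[0] + quadrant_origin[0], local_xy[1] + quadrant_origin[1])


# Pin-facing logic, optimized for the top-left quadrant (origin at the device corner).
pin_logic_local = (1, 1)

top_left = place(pin_logic_local, (0, 0))
bottom_right = place(pin_logic_local, (10, 10))  # same relative placement, moved

print(distance_to_nearest_edge(*top_left))      # 1
print(distance_to_nearest_edge(*bottom_right))  # 8
```

In this toy model the same relative placement that sat one step from the pins in the top-left quadrant ends up eight steps from any edge in the bottom-right quadrant.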

Just to go over the pseudo-bottom up flow again, take your top-level design and:

1) Put a partition on all four instances, and on anything else you want to partition.
2) Floorplan the partitions, most likely into quadrants. (This step is optional.)
3) Set three of the four to Empty and let the fitter work on the fourth one (say, the top-left region).
4) Set the top-left partition to Post-Fit, set a second partition to Source (or Post-Synthesis), and fit it.
5) Repeat for the third and fourth partitions.
6) If any of them still doesn't make timing, you can go back and refit that one while leaving the rest at Post-Fit.
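The sequence of partition settings in the steps above can be sketched in Python. This only models the bookkeeping: the instance names and the `pass_settings` helper are hypothetical, and the real settings are made in the Quartus II project, not by this code.

```python
def pass_settings(partitions):
    """Yield, for each fitting pass, the netlist type of every partition.

    On pass i, partition i is compiled from Source, partitions already fitted
    are kept at Post-Fit, and partitions not yet reached are left Empty.
    """
    for i, current in enumerate(partitions):
        settings = {}
        for j, name in enumerate(partitions):
            if j < i:
                settings[name] = "Post-Fit"
            elif j == i:
                settings[name] = "Source"
            else:
                settings[name] = "Empty"
        yield current, settings


# Hypothetical instance names, one per quadrant.
quadrants = ["top_left", "top_right", "bottom_left", "bottom_right"]

for fitted, settings in pass_settings(quadrants):
    print(f"fit {fitted}: {settings}")
```

Step 6 then amounts to setting one failing partition back to Source while the others stay at Post-Fit.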

The nice thing about this flow is that each region is aware of pin locations, as well as of any logic that is not set to Empty. So if there is some central block of logic, the fitter can optimize placement to connect to that. If the pin assignments have a different layout for each of the four instances, the fitter can optimize for that too.

Hope this helps, Subroto Datta Altera Corp.

- X.Y.

Mon, Jul 30, 2007 1:11 PM

Hi, Subroto,

Thanks for your reply. I have tried the pseudo-bottom-up flow you recommended, and it works well! Once I fit a partition, I can find its floorplan region in the Timing Closure Floorplan. In the end, there are four regions for the four partitions.

Here are the answers to your questions:

1) My design doesn't meet timing when run flat.
2) In my design, the four equal blocks connect to each other.
3) There is some central, common logic, and the blocks connect to pins.

Besides, there are some questions I want to ask you:

1) What do you mean by "The problem I've seen with what you're trying to do is a good placement of a single block isn't good everywhere"?
2) In your method, we need to put all the design partitions in a single Quartus project; what differs is the compilation flow you described. It is not a real bottom-up flow (which would involve sub-projects), so you call it a pseudo-bottom-up flow. Am I right?
3) In your first reply you said, "Also, in Q7.1 you can export a .qxp from sub partitions, so you can always save off your results. This works with multiple instances of the same thing, since they now have different instances (and locations)." Do you mean that we can import one sub-partition multiple times in Q7.1, but not in Q6.0?

Best regards.

Yours sincerely, X. Y.

- Subroto Datta

Mon, Jul 30, 2007 8:00 PM

1) One problem is that the memory/DSP blocks are not always laid out uniformly in the FPGA. For example, if you have an MRAM at the top of one instance, it may no longer be at the top of another instance, and that instance would require a completely different fit. But issues can also be more subtle. For example, since the instances all talk to each other, the instance in the top-left quadrant will want to place logic in its bottom-right corner to talk to the quadrant that is diagonal from it. If you tried to have identical placement for the entity in the bottom-right quadrant, it would now have that logic in its bottom-right corner, where it has to go across the entire quadrant to get to its destination and could fail timing.

2) Correct. This flow doesn't require any sub-projects to be created, which is why I call it pseudo-bottom-up.
3) No, you still can't import a sub-partition multiple times (at least, you can't import it and keep the placement). But in this flow you can export a .qxp for the top-left instance, one for the bottom-left, one for the bottom-right, and one for the top-right. So now you've got four .qxp files representing the four quadrant instances (and with different, better placement for each one).
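The diagonal-neighbor point from answer 1 can be made concrete with a rough Python sketch. It assumes a hypothetical 20x20 grid split into 10x10 quadrants and uses Manhattan distance as a crude stand-in for routing delay; none of the names or numbers come from a real device.

```python
def manhattan(a, b):
    """Crude routing-distance estimate between two grid cells."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def absolute(local_xy, origin):
    """Translate a quadrant-local cell to device coordinates."""
    return (local_xy[0] + origin[0], local_xy[1] + origin[1])


# Hypothetical 20x20 device split into 10x10 quadrants; y grows downward.
TOP_LEFT_ORIGIN = (0, 0)
BOTTOM_RIGHT_ORIGIN = (10, 10)

# Interface logic as placed for the top-left instance: the quadrant-local
# cell closest to the device center, next to its diagonal neighbor.
interface_local = (9, 9)

tl_cell = absolute(interface_local, TOP_LEFT_ORIGIN)        # (9, 9)
br_reused = absolute(interface_local, BOTTOM_RIGHT_ORIGIN)  # (19, 19)
br_refit = absolute((0, 0), BOTTOM_RIGHT_ORIGIN)            # (10, 10)

print(manhattan(tl_cell, br_reused))  # 20: identical placement reused
print(manhattan(tl_cell, br_refit))   # 2: placement fitted for this quadrant
```

Under these assumptions, reusing the top-left placement in the bottom-right quadrant puts the interface logic ten times farther from its partner than a placement fitted for that quadrant, which is the motivation for exporting one .qxp per quadrant.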

Does the instance meet timing when run by itself? If not, how far off is it?

Hope this helps, Subroto Datta Altera Corp.

- X.Y.

Tue, Jul 31, 2007 3:06 AM

Hi, Subroto,

Thanks for your help. The instance meets timing when run by itself. However, I have run into a new problem while trying to optimize my design (I mean the instance). In my old instance, I used the same clock for image capture (storage), image display, and image processing. This clock, named "pclk", has a frequency of 24 MHz, which is slow. The frequency for image capture and display cannot be changed because of the requirements of other devices, so I want to increase the frequency of image processing, which involves SRAM reading, SRAM writing, and data processing. I use a PLL to generate a 72 MHz clock. This is where the problem lies. The SRAM will also be read during image capture and written during display. That means the clock, the address bus, and the data bus must be switched between the image capture/display state and the image processing state. I actually use two blocks: one for image capture/display and another for image processing. I use a BUS MUX to switch the address and data buses, and an LPM MUX to switch between the two clocks of different frequencies. Unfortunately, the instance does not meet timing. The Timing Analyzer Summary reports: Clock setup: 'pclk' has a slack of -4.152 ns, and Clock hold: 'pclk' has a slack of -4.216 ns. Could you be kind enough to tell me how to solve this problem?
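For reference, the numbers in this post can be put side by side with a small Python calculation. Only the two frequencies and the two slack values come from the post; the `period_ns` helper is made up for illustration, and the calculation does not diagnose the cause of the violations.

```python
def period_ns(freq_mhz):
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


pclk_period = period_ns(24)  # about 41.667 ns
fast_period = period_ns(72)  # about 13.889 ns

setup_slack_ns = -4.152  # reported for 'pclk'
hold_slack_ns = -4.216   # reported for 'pclk'

# Negative slack is the amount by which a timing requirement is missed.
print(f"pclk period: {pclk_period:.3f} ns, 72 MHz period: {fast_period:.3f} ns")
print(f"setup missed by {-setup_slack_ns:.3f} ns, hold missed by {-hold_slack_ns:.3f} ns")
```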

Best Regards, Yours sincerely, X.Y.