Picking the best synthesis result before implementation

Out of curiosity, I wrote a script to explore different options in the Vivado software (2014.4), especially the synthesis options under SYNTH_DESIGN, like FSM_extraction, MAX_BRAM etc. The script stops after synthesis, just enough to get the timing estimate. I explore everything except the directive, because it seems that if you use the directive, you cannot manually set the options.
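A sweep like the one described could be driven by generating one synth_design command per option combination. The following is only a rough sketch in Python, not the script used here. The option values listed, the top-level name `top` and the placeholder part name `MY_PART` are assumptions for illustration; check `synth_design -help` in your Vivado version for what it actually accepts.

```python
from itertools import product

# Assumed option values for illustration only; verify them against
# `synth_design -help` for the Vivado version in use.
OPTIONS = {
    "-fsm_extraction": ["auto", "one_hot", "sequential"],
    "-max_bram": ["-1", "0"],
    "-flatten_hierarchy": ["rebuilt", "full", "none"],
}

def synth_commands(top, part, options=OPTIONS):
    """Return one synth_design Tcl command string per option combination."""
    names = list(options)
    commands = []
    for values in product(*(options[n] for n in names)):
        flags = " ".join(f"{n} {v}" for n, v in zip(names, values))
        commands.append(f"synth_design -top {top} -part {part} {flags}")
    return commands

# "top" and "MY_PART" are hypothetical placeholders.
cmds = synth_commands("top", "MY_PART")
print(len(cmds))
print(cmds[0])
```

Each generated string would then be written into a Tcl script, followed by a timing report, so that every run can be compared afterwards.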

My goal is to see if it will give me a better result before I move on to implementation. However, out of the 50 different results, I see that a lot of the estimated worst slacks and timing scores are the same. About 40% of the results report the same values. I ran it on 3 sample designs and saw the same pattern each time.
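To see exactly which runs collapse onto the same numbers, the results can be grouped by their reported timing values. This is a minimal sketch, assuming each run has already been reduced to a name, a worst slack and a total slack; the tuple layout and the numbers are made up for illustration and are not measurements from these runs.

```python
from collections import defaultdict

def group_by_timing(results):
    """Map each (worst slack, total slack) pair to the runs that reported it."""
    groups = defaultdict(list)
    for name, wns_ns, tns_ns in results:
        groups[(wns_ns, tns_ns)].append(name)
    return dict(groups)

# Made-up example data: (run name, worst slack in ns, total slack in ns).
runs = [
    ("run01", -0.312, -45.2),
    ("run02", -0.312, -45.2),
    ("run03", -0.150, -12.0),
    ("run04", -0.312, -45.2),
]

groups = group_by_timing(runs)
largest = max(groups.values(), key=len)
print(largest)
```

Runs that land in the same group are candidates for having produced the same netlist, so the next thing to compare within a group would be something other than timing, such as utilization.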

So my question is, is there a way to tell which synthesis result is better? What should I look at in the report?

James07

Did you also differentiate by resource usage? Same timing result and lower usage would count as better, but sometimes different settings will, after optimisation, yield the same result.
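The "same timing, lower usage" rule can be written as a two-level sort. A small sketch, assuming each run is a dictionary with a worst-slack field and a LUT count; the field names and values are hypothetical.

```python
def rank_runs(runs):
    """Sort runs best-first: higher worst slack wins, fewer LUTs breaks ties."""
    return sorted(runs, key=lambda r: (-r["wns_ns"], r["luts"]))

# Made-up example data for illustration.
runs = [
    {"name": "a", "wns_ns": -0.312, "luts": 21000},
    {"name": "b", "wns_ns": -0.312, "luts": 19500},
    {"name": "c", "wns_ns": -0.450, "luts": 18000},
]

ranked = [r["name"] for r in rank_runs(runs)]
print(ranked)
```

Here "b" ranks ahead of "a" because the timing is identical and it uses fewer LUTs, while "c" comes last despite being the smallest, because its slack is worse.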

It's also worth trying ISE, with both the old and new VHDL parser (though switching parsers is more likely to dance round bugs than improve synth results).

While Vivado is relatively new, ISE has been heavily tuned across the years and I wouldn't be surprised to find it sometimes gives better results.

If you try it, I'd be interested to see your conclusions.

-- Brian

Brian Drummond

You can't use the old parser on 6 or 7 series parts. It's OK to use the newer parser for older parts, but the use_new_parser switch is ignored for 6 or 7 series. So in effect there's only one XST implementation to try if you are using 7-series parts. ISE does allow you to use SmartXplorer to investigate different canned sets of options, though. I usually find that you need to individually tune the settings to get the best results.

--
Gabor
GaborSzakacs

  1. Lower area utilization with similar timing results would be considered good. However, it is even better to look at the individual utilization of resources like LUTs, BRAM and DSP blocks. You may want to choose a synthesis result that allows you to add more features to your design in the future. Such features may require BRAM or DSP in different proportions. So, it might be good to look at the synthesis results, especially area, with respect to expected feature changes in the future.
  2. Power is another factor that you may consider when deciding which is a better synthesis result. If you have two synthesis results, where one uses a lot of LUTs while the other uses a lot of DSP blocks, it is very likely that the one with DSP blocks will dissipate less dynamic power. This is because DSP blocks are optimized hard IP blocks on the device.
  3. Have you analysed your results with respect to pin assignment? If pin assignment is critical to how your FPGA will be placed on the board, you may want to look at the synthesis results from that perspective. With no pin assignment constraint, the tool automatically assigns pins to the design. The pin assignment constraint is not applied by the tool during a "synthesis-only" run. But the default pin assignment and corresponding synthesis results can be analyzed with respect to your planned pin assignment.
  4. If a large percentage of synthesis runs give similar results, it also means that the tool is not finding many opportunities to perform various optimizations. It could be because your design is already very well architected, or it could be that it needs to be re-architected if you are aiming for certain specific performance measures. As the designer, you know better which is the case with the design.
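The first point, about leaving room for future features, can be checked mechanically once the per-resource counts are pulled from the utilization report. A minimal sketch follows; the device capacities, the usage figures and the future feature's requirements are all made-up assumptions.

```python
def headroom(used, capacity):
    """Return the unused fraction of each resource type."""
    return {k: 1.0 - used[k] / capacity[k] for k in capacity}

def fits_future(used, capacity, extra):
    """True if the planned extra resources would still fit on the device."""
    return all(used[k] + extra.get(k, 0) <= capacity[k] for k in capacity)

# Hypothetical device and two hypothetical synthesis results.
capacity = {"lut": 40000, "bram": 100, "dsp": 200}
run_a = {"lut": 30000, "bram": 20, "dsp": 150}  # LUT-heavy result
run_b = {"lut": 20000, "bram": 20, "dsp": 190}  # DSP-heavy result
future = {"dsp": 30}  # assumed needs of a planned feature

print(headroom(run_a, capacity))
print(fits_future(run_a, capacity, future), fits_future(run_b, capacity, future))
```

In this example the DSP-heavy result has the smaller LUT count, yet only the LUT-heavy result leaves enough DSP blocks for the planned feature, which is why looking at total area alone can mislead.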
Sharad

As far as I can tell, the resource usage is almost the same. I am taking another look. At first glance, for the 40% I mentioned, they look almost the same, which is also partly why I can't tell these clone troopers apart.

Yes, I intend to try it on ISE. The latest (and last!) ISE version, 14.7, works on one of the older V7 devices. I will try that and see what the result is, although I am not sure whether it gives estimated timing scores after synthesis. Need to look into it.

kt8128

good. However, it will be even better to take a look at the individual utilization of resources like LUTs, BRAM and DSP blocks. You may want to choose a synthesis result that allows you to add more features to your design in the future. Such features may require BRAM or DSP in different proportions. So, it might be good to see the synthesis results, especially area, with respect to expected feature changes in the future.

better synthesis result. If you have two synthesis results, where one uses a lot of LUTs while the other uses a lot of DSP blocks, it is very likely that the one with DSP blocks will dissipate less dynamic power. This is because DSP blocks are optimized hard IP blocks on the device.

assignment is critical to how your FPGA will be placed on the board, you may want to see the synthesis results with that perspective. Under no pin assignment constraint, the tool automatically assigns pins to the design. Pin assignment constraint is not applied by the tool during "synthesis-only" run. But the default pin assignment and corresponding synthesis results can be analyzed with respect to your planned pin assignment.

This is a good point. No, I haven't got to that step. Based on what I understand of the Vivado flow, that happens during the place_design phase. Hmm... so perhaps the next step is to take that 40% of the results and continue running them until the end of place_design, then check the timing estimates. I guess the later it is in the flow, the more accurate it becomes.

so means that the tool is not finding many opportunities to perform various optimizations. It could be because your design is already very well architected or it could be that it needs to be re-architected if you are aiming for certain specific performance measures. As the designer, you know better which is the case with the design.

I wouldn't say it is already well-architected. Sometimes my hands are tied and I can't change the code. So I am exploring ways to work the tools to my advantage. Thanks for the helpful comments.

kt8128

My experience is that the timing numbers from synthesis are totally bogus. You need to do a place and route if you want to compare timing data. Even then you can get noticeable improvements in timing by running more than one route with different settings. So the connection back to your synthesis parameters is hard to explore without a lot of work. Using one pass of place and route may show synthesis option A to be the best by 4%, but when you explore the routing options you may find synthesis option B is now 7% better.

I think this problem space is very chaotic with small changes in initial conditions giving large changes in results.

I worked on a project once where the timing analysis tools were broken saying the project met timing when it didn't. The design would fail on the bench until we hit it with cold spray. I tried using manual placement to improve the routing, but everything I did to improve this feature made some other feature worse or even unroutable.

We automated a process of tweaking the initial seed parameter to get multiple runs each night. The next day we would test those runs on the bench with a chip warmer. Eventually we found a good design and shipped it. Ever since then I have treated the entire compile-place-route process like an exploration of the Mandelbrot set.

Is there a particular problem you are having with the results? Is the design larger than you need? If you haven't done a place-route I guess it can't be that it is too slow. If you are just trying to "optimize" I suggest you don't bother and just move on to the place and route. See what sorts of results you get before you spend time trying to optimize a design that may be perfectly good.

There is a rule about optimization. It says *don't* unless you have to. Optimizing for "this" can make it harder to get "that" working or at very least result in spending a lot of time on something that isn't important in the end.

--

Rick
rickman

Yes, I understand that and have seen that myself. That is partly why I am struggling to define what a "good" synthesis result is, with meeting timing as the end goal. For example, let's say synthesis set "A" meets timing in 10% of runs across various P&R settings, while synthesis set "B" meets it in only 5%. *Something* has got to account for that difference.
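The comparison I have in mind is a pass rate per synthesis set over many P&R runs. A minimal sketch, assuming the post-route worst slack of each run has been collected into a list; the numbers are invented to reproduce the 10% versus 5% example.

```python
def pass_rate(wns_list):
    """Fraction of place-and-route runs with non-negative worst slack."""
    return sum(1 for w in wns_list if w >= 0.0) / len(wns_list)

# Invented post-route worst slacks (ns) for 20 P&R runs per synthesis set.
set_a = [0.05, 0.01] + [-0.2] * 18   # 2 of 20 runs meet timing
set_b = [0.02] + [-0.3] * 19         # 1 of 20 runs meets timing

print(pass_rate(set_a), pass_rate(set_b))
```

With only 20 runs per set, a gap of one passing run is well within what chance alone could produce, so a difference like this would need many more runs before it says anything about the synthesis settings.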

This is hilarious!

I have done place-route a couple of times and it takes around 8 hours (1 hour for synthesis). I tried different directives as well and it gave me a variety of results.

I understand that how I am approaching this may not be practical in the grand scheme of things. BUT I got curious when I read in the V design methodology that if you get -300ps after post-synthesis, you can definitely meet timing. I also vaguely remember an illustration showing synthesis has a 10x effect on end results. I wonder how these estimates were made, and by whom.

kt8128

I think there is little about your synthesis result that can be easily measured in a meaningful way to predict the timing result of routing. That is what I mean about it being "chaotic". It is much like predicting the weather more than a week out. You can see general trends, but hard to predict any details with any accuracy. So the weather man just doesn't try.

In FPGAs the synthesis result has no insight into routing so they just measure the logic delays and then add a standard factor for routing. Routing can be impacted by the logic partitioning in ways that are hard to predict. I'd be willing to speculate it is a bit like the way they proved in general the task of predicting the run time of a computer algorithm will take as much run time as the algorithm itself. So the best way to estimate run time is to run the task. Best way to estimate routing result is to run routing. Routing is often half the total path time, so without good info on that there is no decent guess to timing.
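To put rough numbers on that, here is a small sketch of a "logic delay plus standard routing factor" estimate. The fixed-factor model is a simplification of what I described, not what any particular tool does, and the clock period, logic delay and factors are made up.

```python
def estimated_slack(period_ns, logic_ns, routing_factor):
    """Slack if routing delay is modelled as a fixed multiple of logic delay."""
    return period_ns - logic_ns * (1.0 + routing_factor)

period_ns = 5.0   # assumed clock period
logic_ns = 2.4    # assumed logic delay on the critical path

# Synthesis-style guess: routing equals logic, i.e. half the total path time.
guess = estimated_slack(period_ns, logic_ns, 1.0)
# A possible real outcome: routing turns out 25% longer than the logic delay.
actual = estimated_slack(period_ns, logic_ns, 1.25)

print(round(guess, 3), round(actual, 3))
```

A modest change in the routing share is enough to turn a small positive slack into a clearly negative one, which is why the post-synthesis number says so little on its own.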

This was also some time ago, using the Altera Max+II tools when Quartus was the "current" tool. Trouble was, Altera didn't support the older devices with the new Quartus tool. We were adding features to an existing product, so we didn't have the luxury of using the new tools with new parts. Eventually they relented and did support the older parts with Quartus, but it was well after our project was done. I expect we weren't the only customer to want support for older products.

Must be a large project. The project we were on would load up multiple runs on many CPUs overnight. This would give us many trials to sort through the next day. Best if this is done on a design that has passed all logic checks and even runs in the board with a reduced clock or cold spray.

I'm not sure what a "10x effect" means. But sure, a bad synthesis will give you a bad timing result. On large projects it is hard to deal with timing issues sometimes. You might try breaking the project down to smaller pieces to see if they will meet timing separately. Perhaps you will find a given module that is a problem and you can focus on code changes to improve the synthesis? I don't think you can do tons just using tweaks to tool parameters.

Are your modules partitioned in a way that lets each one be checked for timing without lots of paths that cross?

--

Rick
rickman

It does. If you can't see what you want in the summary, read the .syr (Synth report) file.

-- Brian

Brian Drummond

It is possible that you are giving the tools test cases that are too simple. Try giving them something complicated, like big designs with a great deal of interconnectivity that also need to be fast, and see how they fare then.

Aleksandar Kuktin
