Remapping from Virtex-II to Virtex-4

Has anyone done any "what-if" remapping of Virtex-II designs to Virtex-4? I wanted to see how the new technology performed, mainly to judge whether it was worth the trouble to upgrade some existing designs. We did this quite successfully some years back, stepping from Virtex-E to Virtex-II. The main obstacle then was the new Block RAM size, going from 4 kbit to 18 kbit apiece.

If we left our Unisim and CoreLib components untouched we wasted 3/4 of the RAM, but if the number of Block RAMs in the chip was sufficient, all we had to do was update the LOC constraints for pins and DCMs in the .ucf file. ISE even managed to retarget the Virtex-E DLLs to Virtex-II DCMs. Brilliant!
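The "3/4 wasted" figure follows from simple bit counting. A minimal sketch of that arithmetic, assuming only the 16 kbit data portion of each 18 kbit Virtex-II Block RAM is counted (parity bits ignored); `repacked_bram_count` is a hypothetical helper showing the idealized alternative of packing contents densely, which real width/depth/port constraints may not allow:

```python
# Back-of-envelope Block RAM arithmetic for a Virtex-E -> Virtex-II move.
# Assumption: only the 16 kbit data portion of each 18 kbit Virtex-II
# Block RAM is counted; the 2 kbit of parity bits is ignored.

VIRTEX_E_BRAM_BITS = 4 * 1024          # 4 kbit per Virtex-E Block RAM
VIRTEX_II_BRAM_DATA_BITS = 16 * 1024   # data bits per 18 kbit Virtex-II Block RAM


def wasted_fraction(used_bits: int, available_bits: int) -> float:
    """Fraction of each target Block RAM left unused by a 1:1 instance mapping."""
    return 1 - used_bits / available_bits


def repacked_bram_count(num_source_brams: int,
                        source_bits: int = VIRTEX_E_BRAM_BITS,
                        target_bits: int = VIRTEX_II_BRAM_DATA_BITS) -> int:
    """Hypothetical: target Block RAMs needed if contents were packed densely.

    Idealized integer ceiling; assumes memories can be merged freely.
    """
    total_bits = num_source_brams * source_bits
    return (total_bits + target_bits - 1) // target_bits


if __name__ == "__main__":
    # Untouched Unisim/CoreLib instances: one 18 kbit RAM per 4 kbit RAM.
    waste = wasted_fraction(VIRTEX_E_BRAM_BITS, VIRTEX_II_BRAM_DATA_BITS)
    print(f"wasted per RAM: {waste:.0%}")
    print("RAMs needed for 10 source RAMs if repacked:", repacked_bram_count(10))
```

Leaving instances untouched keeps the 1:1 count, so the question is only whether the target chip has enough Block RAMs.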

So I hoped it would be even better this time, since the Block RAMs are the same size, but there seems to be more to this than meets the eye. After resynthesizing for XC4VLX instead of XC2V, I commented out all the LOC constraints in the .ucf and had a go. But alas, MAP stops with a fatal error complaining about SLICEL and SLICEM component types. I suspect this has to do with some of our CoreLib components, since they are the only place where there might be RLOC constraints in the EDIF, but before I go and regenerate all of them I am curious to know whether there is an easier way.

I am not out to squeeze the full performance out of the XC4VLX right now, but would like a "ball-park" figure of what might be expected in terms of utilization and speed before we commit to a full-scale conversion. That is why I don't want to spend too much time on it.

Regards, /Lars

Lars

One problem you have is that in Virtex-4 only half of the slices can support LUTs used as memory; in V2 all slices could be used. We have seen similar things in Spartan-3, particularly if you have used elements such as 32x1 RAM. Alternatively, you may have tried to use a memory-type LUT where there isn't one, due to an RPM or constraint that simply isn't valid.
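For a ball-park estimate, the "only half of the slices" point can be turned into a quick capacity check. A minimal sketch, assuming the design's count of slices used as LUT RAM/SRL is already known (for example from a V2 map report); the family table encodes only the all/half split described here, the slice counts are parameters rather than datasheet figures, and placement (RPMs, RLOCs) is ignored entirely:

```python
# Ball-park check: do the design's LUT-RAM/SRL slices fit in the slices
# that can actually hold them? Placement (RPMs, RLOCs) is ignored.

MEMORY_CAPABLE_SHARE = {
    "virtex2": (1, 1),   # every slice can be used as LUT RAM / SRL16
    "virtex4": (1, 2),   # only SLICEM, half of the slices
    "spartan3": (1, 2),  # only SLICEM, half of the slices
}


def memory_capable_slices(total_slices: int, family: str) -> int:
    """Number of slices that can implement LUT memory on the given family."""
    try:
        num, den = MEMORY_CAPABLE_SHARE[family]
    except KeyError:
        raise ValueError(f"unknown family: {family!r}") from None
    return total_slices * num // den


def lut_memory_fits(memory_slices_needed: int, total_slices: int, family: str) -> bool:
    """True if the LUT-memory slice count fits the memory-capable slices."""
    return memory_slices_needed <= memory_capable_slices(total_slices, family)


if __name__ == "__main__":
    # Hypothetical numbers: 600 memory slices in a 1000-slice device.
    print(lut_memory_fits(600, 1000, "virtex2"))
    print(lut_memory_fits(600, 1000, "virtex4"))
```

Passing this check is necessary but not sufficient: an RPM can still force a memory element into a non-memory slice.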

John Adair Enterpoint Ltd. - Home of MINI-CAN. The Spartan-3 CAN Bus Development Board.


John Adair

Aha! I knew that, but the access to that particular memory cell in my decaying brain was not operating at the time. That would make it hard to re-target CoreLib components I suppose...

Thanks for setting me straight! /Lars

Lars

For the most part, a Virtex-II design can pretty much be dropped into a Virtex-4. You hit on one of the places where you will have trouble: the SLICEM/SLICEL thing. The V4 CLB structure is substantially similar to the V2 structure, except that only even columns have the logic for LUT RAM. Thus if you have an RPM with SRL16s or RAM16s placed in it, those have to go in even columns. There is also a bug in the mapper that causes problems if an RPM macro with memory elements straddles a BRAM or DSP column: it thinks that any memory elements to the right of the BRAM/DSP column are in the wrong type of column, even if they aren't. The workaround is to break the RPM up into smaller sub-RPMs that fit between the BRAM/DSP columns.
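Both rules can be checked before running MAP. A minimal sketch, assuming a simplified model in which each RPM element has an RLOC X offset in slice columns, memory elements must land in even absolute columns, and the BRAM/DSP column positions are supplied as a hypothetical `boundaries` list (slice-column X values with a hard column immediately to their left); `RpmElement`, `misplaced_memory` and `split_at_hard_columns` are illustrative names, not tool APIs:

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RpmElement:
    name: str
    x: int                 # RLOC X offset (slice columns) from the RPM origin
    y: int                 # RLOC Y offset
    uses_lut_memory: bool  # True for SRL16 / RAM16-style elements


def misplaced_memory(elements: List[RpmElement], origin_x: int) -> List[str]:
    """Memory elements that would land in an odd (non-memory) slice column."""
    return [e.name for e in elements
            if e.uses_lut_memory and (origin_x + e.x) % 2 != 0]


def split_at_hard_columns(elements: List[RpmElement], origin_x: int,
                          boundaries: List[int]) -> List[List[RpmElement]]:
    """Group elements into sub-RPMs, one per region between BRAM/DSP columns.

    `boundaries` is a sorted list of absolute slice-column X values that have a
    BRAM or DSP column immediately to their left (hypothetical floorplan input).
    """
    regions: Dict[int, List[RpmElement]] = {}
    for e in elements:
        region = bisect_right(boundaries, origin_x + e.x)
        regions.setdefault(region, []).append(e)
    return [regions[k] for k in sorted(regions)]


if __name__ == "__main__":
    rpm = [RpmElement("srl_a", 0, 0, True), RpmElement("ff_b", 1, 0, False),
           RpmElement("srl_c", 2, 0, True), RpmElement("ram_d", 4, 0, True)]
    print(misplaced_memory(rpm, origin_x=1))   # odd origin shifts every SRL/RAM
    for sub in split_at_hard_columns(rpm, origin_x=0, boundaries=[4]):
        print([e.name for e in sub])
```

Each resulting sub-RPM still has to be anchored so that its own memory elements fall in even columns.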

The other place you will have difficulty is if you have instantiated MULT18X18 primitives in the design, as these have to be converted to DSP48s. With only one register, as in the MULT18X18S, you will be disappointed with the performance, but a 1:1 replacement will work.

OK, so paying attention to these two issues will get your design into a Virtex-4, but you won't reap the full benefit. You'll find the fabric carry chains are no faster than in a V2 of the same speed grade (and in some cases are actually slower). Also, the clock-to-output times of a BRAM without the added output register, and of an unpipelined multiplier, are no faster. To get the promised performance, you need to turn on the pipelining in these elements, so that the multiplier has a 3-clock pipeline (input, middle and output registers) and the BRAM a 2-clock pipeline (with the added output register).
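Turning on those registers changes latency, which any surrounding logic ported from V2 has to absorb. A minimal cycle-level sketch of that effect, assuming a simplified model where only the number of register stages matters (the function is applied before the first register, which gives the same latency as registering inputs first); `Pipeline` is a hypothetical helper and does not model DSP48 or Block RAM internals:

```python
from collections import deque
from typing import Callable, Deque, Optional


class Pipeline:
    """Cycle-level latency model: fn's result passes through `stages` registers."""

    def __init__(self, fn: Callable[..., int], stages: int) -> None:
        if stages < 1:
            raise ValueError("need at least one register stage")
        self.fn = fn
        self.regs: Deque[Optional[int]] = deque([None] * stages, maxlen=stages)

    def clock(self, *inputs: Optional[int]) -> Optional[int]:
        """One rising edge; pass None to insert a bubble. Returns the output register."""
        value = None if inputs[0] is None else self.fn(*inputs)
        self.regs.appendleft(value)   # shift; the oldest value falls off the right
        return self.regs[-1]


if __name__ == "__main__":
    # Fully pipelined multiplier: input, middle and output registers.
    mult = Pipeline(lambda a, b: a * b, stages=3)
    print([mult.clock(a, b) for a, b in [(2, 3), (4, 5), (6, 7), (None, None), (None, None)]])

    # Block RAM read with the added output register: 2-clock pipeline.
    mem = [10, 11, 12, 13]
    bram = Pipeline(lambda addr: mem[addr], stages=2)
    print([bram.clock(addr) for addr in [0, 2, None, None]])
```

With `stages=1` the same class mimics a single-register multiplier: the throughput is identical, only the result arrives earlier, which is why the deeper pipeline mainly buys clock rate.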

The big gains in V4 for signal-processing work come from the DSP48 slice's adder, which is quite a bit faster than the fabric carry chains. Unfortunately, using it basically means a clean-sheet redesign, because you also need to use its pipeline registers to get the speed.

So in short, you can put your V2 design into V4 without a lot of effort, but you will likely be disappointed when it doesn't run any faster. In order to get the speed advantages, you need to redesign to the architecture.

Ray Andraka

Thank you Ray, that was a good summary! Seems like we have our work cut out for us if we want the full potential, and I believe I have to rethink the usefulness of my original intent of a quick "ball-park figure"... /Lars

Lars

Hi

One more thing. As I noticed a moment ago, with the Virtex-4 DCM:

- you can't use/set CLK_FEEDBACK="2X"

- only "1X" or "NONE" are allowed
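Since ISE retargeted DCM settings automatically last time, a quick scan of the DCM attributes before remapping can catch this early. A minimal sketch, assuming the CLK_FEEDBACK values have already been collected into a dictionary keyed by instance name (the instance names below are hypothetical); the Virtex-4 set comes from the note above, while the Virtex-II set including "2X" is an assumption about the older family:

```python
from typing import Dict, FrozenSet, List

# Allowed CLK_FEEDBACK values per family. Virtex-4 per the note above;
# the Virtex-II set (including "2X") is an assumption.
ALLOWED_CLK_FEEDBACK: Dict[str, FrozenSet[str]] = {
    "virtex2": frozenset({"1X", "2X", "NONE"}),
    "virtex4": frozenset({"1X", "NONE"}),
}


def bad_feedback_settings(dcms: Dict[str, str], family: str) -> List[str]:
    """Names of DCM instances whose CLK_FEEDBACK value is not allowed on `family`."""
    allowed = ALLOWED_CLK_FEEDBACK[family]
    return sorted(name for name, value in dcms.items() if value.upper() not in allowed)


if __name__ == "__main__":
    # Hypothetical instance names and settings gathered from a design.
    design = {"dcm_sys": "1X", "dcm_ddr": "2X"}
    print(bad_feedback_settings(design, "virtex2"))
    print(bad_feedback_settings(design, "virtex4"))
```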

regards

Jerzy Gbur

jerzy.gbur

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.