Is it possible that a Virtex II device performs below its spec?

- MM

Mon, Feb 2, 2004 8:15 PM

I have a design that is supposed to work in an XC2V2000-5 at 50 MHz. The timing analyzer reports the clock period to be below 19 ns. In practice, however, only one device out of three works at this speed. The other two were happy when I slowed the clock to 45 MHz (I didn't try any intermediate frequencies). The design basically consists of a 3rd-party IP core, for which I don't have the source (I believe it was designed in schematic), some state machines, a bus interface and some Coregen memories. The bus runs at a slower clock, but it is fully decoupled from the IP core (through the memories). The IP core is a fully synchronous design according to its author. The clock comes directly from an external crystal oscillator. I tried looking at unconstrained paths in the timing analyzer, but couldn't see anything suspicious...
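For reference, the figures in this post can be turned into periods and margins as follows. This is only a sketch of the arithmetic: the post says the reported period is "below 19ns", so 19.0 ns is used here as an assumed worst case, and the function names are made up for illustration.

```python
def period_ns(freq_mhz: float) -> float:
    """Clock period in nanoseconds for a frequency given in MHz."""
    return 1000.0 / freq_mhz


def fmax_mhz(period: float) -> float:
    """Clock frequency in MHz for a period given in nanoseconds."""
    return 1000.0 / period


required_ns = period_ns(50.0)   # period the design must meet at 50 MHz
reported_ns = 19.0              # assumed worst case; the post only says "below 19ns"
slack_ns = required_ns - reported_ns
relaxed_ns = period_ns(45.0)    # period at which all three boards work

print(f"required period : {required_ns:.2f} ns")
print(f"reported period : {reported_ns:.2f} ns ({fmax_mhz(reported_ns):.1f} MHz)")
print(f"reported slack  : {slack_ns:.2f} ns")
print(f"period at 45 MHz: {relaxed_ns:.2f} ns")
```

Under that assumption the analyzer claims at least 1 ns of slack at 50 MHz, yet two boards only work once the period is stretched to about 22.2 ns, so whatever is failing lies outside what the timing report covers.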

Any ideas as to where to look?

Thanks, /Mikhail

--
To reply directly:
matusov at square peg ca
(join the domain name in one word and add a dot before "ca")

- Brannon King

Mon, Feb 2, 2004 8:38 PM

I would suggest running your internal clock signal out to a pad and probing it. That was how I determined I needed to run DCI on some of my input pads instead of the default TTL. Sometimes the DCM would lock to the wrong frequency, but it would always get it correct when I slowed the frequency down. It ended up being reflective noise on the line which the DCI cleared up. As for running below specs, as far as I understand that should only happen when the incoming power or temperature are out of spec.

- MM

Mon, Feb 2, 2004 8:58 PM

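Brannon,

> I would suggest running your internal clock signal out to a pad and probing it. [...] Sometimes the DCM would lock to the wrong frequency, but it would always get it correct when I slowed the frequency down.

I am not using the DCM at all... The internal clock is what comes in and it is only 50 MHz.

> As for running below specs, as far as I understand that should only happen when the incoming power or temperature are out of spec.

That doesn't seem to be the case. The temperature is room or slightly higher, and the core voltage I measured at 1.506V...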

/Mikhail


- Austin Lesea

Mon, Feb 2, 2004 9:43 PM

MM,

The most likely cause is that your design does not, in fact, meet timing. This can come about in many ways, most usually through a bug in the constraints or in the design itself.

Sorry that it isn't some bizarre unheard of problem, but I can only guess based on the thousands of cases that come through.

Common problems: use of the wrong clock edge; the design did not specify global resources, so clocks are being routed using general interconnect; unconstrained paths leading to inefficient placement by the tools; multi-cycle constraints confusing the tool and leading to no constraints at all.

Austin

- Ray Andraka

Mon, Feb 2, 2004 10:45 PM

--

--Ray Andraka, P.E. President, the Andraka Consulting Group, Inc.

401/884-7930 Fax 401/884-7950 email snipped-for-privacy@andraka.com


"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759

- MM

Mon, Feb 2, 2004 11:01 PM

Austin,

I agree. However, it is not one of the basic things such as clock not using global resource. Here are the relevant parts of the PAR report:

+----------------------------+----------+--------+------------+-------------+
| Clock Net                  | Resource | Fanout |Max Skew(ns)|Max Delay(ns)|
+----------------------------+----------+--------+------------+-------------+
| clk50_bufgp                | Global   | 2655   | 0.314      | 1.330       |
+----------------------------+----------+--------+------------+-------------+
| lclk_bufgp                 | Global   | 600    | 0.300      | 1.316       |
+----------------------------+----------+--------+------------+-------------+

------------------------------------------------------------------------------
  Constraint                                | Requested  | Actual     | Logic
                                            |            |            | Levels
------------------------------------------------------------------------------

- John Retta

Tue, Feb 3, 2004 3:17 PM

Here are a couple of things to try:
[1] Take a look at the -u report ... keep adding constraints until unconstrained paths drop to zero.
[2] Take the gate-level netlist into the simulator, and see if there are problems with simulation.
[3] Make an effort to ensure that all I/O connecting to the core are registered.

Observations:
[1] Trace does a terrific job with synchronous paths, and (answering your original question) part problems are typically really design problems (but it sounds like you already accept this .... just looking for some ideas).
[2] On the marginal boards, hit the chips with a shot of cold spray to see if they start to operate at 50 MHz.
[3] The symptom that one board works but two don't is a bit of a puzzler. That indicates the problem may not be in time domain crossings but rather in synchronous paths which do not meet timing, where device-specific process variations have an effect. Or it could also mean there is something marginal at the PWB level .... GND scheme, decoupling, marginal voltages that push two units under threshold. (Check VCC levels ... sorry to state the obvious.)
[4] The key might be to isolate the block that is really failing. Is it really the core? Something like a "signature" analysis on the outputs of a block, for periods that result in identical processing, is helpful, i.e. do the outputs of block 1 across an identical data set differ between the "good" chip and the "bad" devices?
[5] If you can over-constrain your clock frequency for the entire design, or just the core, then try place-and-route with a modular approach; that might give you margin on your synchronous paths.
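The "signature" comparison in observation [4] could be done on the host side roughly like this. It is a minimal sketch, not anything from the original design: it assumes the block outputs have been captured as lists of integer words, and the block size, word width and function names are all hypothetical. CRC-32 stands in for whatever signature is convenient.

```python
import zlib
from typing import List, Sequence


def block_signatures(words: Sequence[int], block_size: int = 256,
                     word_bytes: int = 4) -> List[int]:
    """Return one CRC-32 signature per block of captured output words."""
    signatures = []
    for start in range(0, len(words), block_size):
        chunk = words[start:start + block_size]
        raw = b"".join(w.to_bytes(word_bytes, "little") for w in chunk)
        signatures.append(zlib.crc32(raw))
    return signatures


def differing_blocks(good: Sequence[int], bad: Sequence[int],
                     block_size: int = 256) -> List[int]:
    """Indices of blocks whose signatures differ between two captures."""
    if len(good) != len(bad):
        raise ValueError("captures must have the same length")
    sig_good = block_signatures(good, block_size)
    sig_bad = block_signatures(bad, block_size)
    return [i for i, (a, b) in enumerate(zip(sig_good, sig_bad)) if a != b]


# Example: identical data set processed on a "good" and a "bad" device,
# with a single LSB flipped in word 700 of the bad capture.
good_capture = list(range(1000))
bad_capture = list(good_capture)
bad_capture[700] ^= 1
print(differing_blocks(good_capture, bad_capture))  # word 700 is in block 2
```

Comparing signatures per block rather than per run narrows a failure down to a window of the data set, which helps when the error shows up only once in many cycles.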

Anyway .... good luck. Regards, John Retta Owner and Designer Retta Technical Consulting Inc.

303-926-0068

email : snipped-for-privacy@rtc-inc.com

- Symon

Tue, Feb 3, 2004 5:43 PM

MM, Just a thought, did you check the decoupling of the 1.5V supply? Those MLCCs all look the same; if the board got fitted with 10pF rather than 10nF, it might affect the performance! You could also try reducing the 19ns constraint, but I'd still be worried about the thing failing when the timing says it should pass. Good luck, please let us know how you get on! Syms.
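To put a rough number on the 10pF versus 10nF mix-up, the impedance of an ideal capacitor can be compared at the 50 MHz clock frequency. This is a simplified sketch: it ignores ESR, ESL and the rest of the power distribution network, and the function name is made up for illustration.

```python
import math


def cap_impedance_ohms(capacitance_f: float, freq_hz: float) -> float:
    """Magnitude of an ideal capacitor's impedance: |Z| = 1 / (2*pi*f*C)."""
    return 1.0 / (2.0 * math.pi * freq_hz * capacitance_f)


clock_hz = 50e6
z_10nf = cap_impedance_ohms(10e-9, clock_hz)    # intended part
z_10pf = cap_impedance_ohms(10e-12, clock_hz)   # wrongly fitted part

print(f"10 nF at 50 MHz: {z_10nf:.3f} ohm")
print(f"10 pF at 50 MHz: {z_10pf:.1f} ohm")
print(f"ratio          : {z_10pf / z_10nf:.0f}x")
```

With these ideal-component assumptions the wrong part presents about a thousand times the impedance, so it would do essentially nothing as a decoupling capacitor at this frequency.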

- B. Joshua Rosen

Tue, Feb 3, 2004 5:56 PM

It could be a reset path; the timing analyzer doesn't check them unless you add the following to your UCF file:

ENABLE= reg_sr_q;

- MM

Tue, Feb 3, 2004 7:47 PM

I don't think there are any async resets in the core but I will try it... Thanks.

/Mikhail


- MM

Tue, Feb 3, 2004 8:16 PM

Thanks to everyone who replied. I think John's comments cover everyone else's ideas, so I will answer here.

> [1] Take a look at the -u report ... keep adding constraints until unconstrained paths drop to zero.

I will try this.

> [2] Take gate level netlist into simulator, and see if there are problems with simulation.

I can't see anything wrong in the simulator...

> [3] Make an effort to ensure that all I/O connecting to core are registered.

They are.

> part problems are typically more design problems

Correct.

> [2] On the marginal boards, hit with shot of cold spray ...

This is problematic, as the thing doesn't fail completely when it fails; rather, it generates erroneous data once in a while...

> Or it could also mean there is something marginal at the PWB level ....

Electrical problems are not likely. I have designed quite a few FPGA boards, many in production. This one is not much different from what I did before and it is well decoupled, etc. The voltages are all fine...

> [4] Key might be to isolate block that is really failing. Is it really core?

I think it is the core; however, I can't say that for sure. The board is a decoder of some sort. It acts as PCI bus master, takes data from the host memory and puts it into an onboard buffer. Then the core takes it from that memory, decodes it and puts it into the output buffer memory. Finally, data from the output buffer is DMA'ed into the host memory. What I see is that sometimes data in the board output buffer is slightly corrupted (usually in the LSB of one word in 1000). If I simply read my buffer in a loop, the data is always the same; it fails only when run through the decoder. It doesn't fail every time; it can go fine for over 100 cycles sometimes...

> [5] If you can over constrain your clock frequency for the entire design, or just the core ...

Well, I have several versions, one of them constrained to below 18 ns and still failing. It must be some other unconstrained path or perhaps a different kind of error, but then why does it work at 45 MHz?...
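The corruption described in this post (an occasional wrong LSB in roughly one word per 1000) is the kind of error a word-by-word XOR against a known-good decode would pin down. A minimal host-side sketch, assuming both buffers are available as lists of integers; the 32-bit word width and the function name are assumptions, not details from the post.

```python
from typing import List, Sequence, Tuple


def bit_errors(expected: Sequence[int], actual: Sequence[int],
               width: int = 32) -> List[Tuple[int, List[int]]]:
    """Return (word index, list of flipped bit positions) for each bad word."""
    if len(expected) != len(actual):
        raise ValueError("buffers must have the same length")
    mask = (1 << width) - 1
    errors = []
    for index, (exp, act) in enumerate(zip(expected, actual)):
        diff = (exp ^ act) & mask
        if diff:
            bits = [b for b in range(width) if (diff >> b) & 1]
            errors.append((index, bits))
    return errors


# Example: a reference buffer and a copy with the LSB of word 417 flipped.
reference = [0x1234] * 1000
from_board = list(reference)
from_board[417] ^= 1
print(bit_errors(reference, from_board))
```

Logging the word index and bit position over many runs would show whether the errors cluster on particular bits or addresses, which in turn hints at which path inside the core is marginal.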

/Mikhail


- Ray Andraka

Wed, Feb 4, 2004 12:09 AM

Sounds like you need to narrow the problem a bit more. Apparently, writing into the buffer is OK if you can read back data all day without errors. What happens if you DMA a known test pattern into the host memory? Do you get errors there? How about if you bypass the 3rd party core, does data transfer OK then?

If the buffer memory is external to the FPGA, I'd look really closely at the signal integrity and timing at the RAM interface. Also verify with FPGA Editor that the I/O, particularly all of the I/O to the RAM, are in fact registered in the IOB. Check that you have the appropriate pin slew rates, delays, drive strength etc. on all the pins connecting to the RAM as well as your host. You need to somehow verify that the problem is occurring in the FPGA and not in the DMA transfer. I'd suggest putting in a test pattern generator or reading from internal memory and checking the DMA'd data to make sure it isn't getting garbled in the process due to either bus timing or bus collisions.
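A test pattern for this kind of check needs to be reproducible on both sides, so the host can regenerate it and compare. Below is a software model of one possibility, a 16-bit Fibonacci LFSR, together with a checker. It is only a sketch: the tap positions (16, 14, 13, 11) are a commonly used choice rather than anything from this design, and the same shift register would have to be built in the FPGA for the comparison to mean anything.

```python
from typing import Iterator, List, Sequence


def lfsr16(seed: int = 0xACE1) -> Iterator[int]:
    """Yield successive states of a 16-bit Fibonacci LFSR (taps 16,14,13,11)."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    while True:
        yield state
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)


def make_pattern(count: int, seed: int = 0xACE1) -> List[int]:
    """First `count` words of the pattern."""
    gen = lfsr16(seed)
    return [next(gen) for _ in range(count)]


def check_pattern(received: Sequence[int], seed: int = 0xACE1) -> List[int]:
    """Indices at which the received words differ from the expected pattern."""
    expected = make_pattern(len(received), seed)
    return [i for i, (e, r) in enumerate(zip(expected, received)) if e != r]


# Example: a clean transfer and one with a single corrupted word.
pattern = make_pattern(1000)
corrupted = list(pattern)
corrupted[123] ^= 1
print(check_pattern(pattern), check_pattern(corrupted))
```

A pseudo-random pattern exercises more bit transitions than a simple counter, while still letting the host identify exactly which words were garbled.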


- Jim Granville

Wed, Feb 4, 2004 12:40 AM

Sounds like a good idea; you just need to keep the device cold, e.g. ice in an aluminium cup, or a Peltier cooler, or the whole shebang in the freezer... (though local cooling is better, as it focuses on the device). You can also heat it and check whether the error rate degrades further.

-jg

- MM

Wed, Feb 4, 2004 5:03 AM

Ray,

> Sounds like you need to narrow the problem a bit more. Apparently, writing into the buffer is OK if you can read back data all day without errors. What happens if you DMA a known test pattern into the host memory? Do you get errors there?

I have done a lot of tests with regard to the memories and I am pretty sure that part works. However, not everything is easy for me to try. Bypassing the core sounds like a good idea, but I can't do it in the existing design. Input and output data formats are different and that would require quite a bit of redesign. What I verified was access to the input/output buffers from the host side. DMA is irrelevant because when an error happens the content of the board and host buffers is always the same. It doesn't matter whether the buffer is read with single PCI target reads or if the DMA is used. And, yes, you can read this buffer all day long with the same result. Everything seems to point towards the block which actually puts data in the buffer, i.e. the core...

> If the buffer memory is external to the FPGA, I'd look really closely at the signal integrity and timing at the RAM interface.

The memories are internal. The only external part is a PCI bridge.

/Mikhail


- Mike Treseler

Thu, Feb 5, 2004 5:52 PM

Synthesize the core all by itself, and see if that works. Next time, get source code, or write your own.

How is the data synchronized from buffer to core?

How is the output buffer synchronized to the cpu?

Smells like a synchronization problem.

-- Mike Treseler

- MM

Thu, Feb 5, 2004 7:18 PM

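> Synthesize the core all by itself, and see if that works.

I can't really test it without the rest of the design. In simulation all seems fine.

> Next time, get source code, or write your own.

Not always our choice... Besides, it is truly a big and complex core...

> How is the data synchronized from buffer to core?

The core is designed to work with BRAM. It puts out read enable when it needs data. The clock is common for the core and the read side of the buffer.

> How is the output buffer synchronized to the cpu?

It sits on the PCI controller's local bus. A state machine in the FPGA programs a DMA channel in the PCI controller and it starts reading the buffer...

> Smells like a synchronization problem.

It sure does...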

/Mikhail


- William Wallace

Fri, Feb 6, 2004 7:24 AM

Do a post PAR simulation. Note that Xilinx uses the same times for min/typ/max in their SDF files. (Do a search in their web database for ways to get min SDF timing.) Do min and max timing simulation.

What are the two clock rates? Are you using dual ported FIFOs? Are they getting full?

Trust but verify (do post PAR timing sim).

Get a scope out and look for reflections on your signals, especially clocks.

Take the device that works and use a hair dryer to warm it, see if it fails.

Look at using a DCM.

Are you gating any clocks?

- MM

Fri, Feb 6, 2004 2:23 PM

> Do a post PAR simulation.

I did, although not the min/max... The problem with this is that I can only simulate a few data frame cycles, as it takes a very long time. Surely the problem would have to manifest itself during the first frame, but I didn't see it... Perhaps I need to repeat the whole thing...

> What are the two clock rates?

The clock rates are 38 MHz for the local bus and 50 MHz for the core and the state machine that controls it.

> Get a scope out and look for reflections on your signals, especially clocks.

As soon as I get my hands on the hardware I will. At the moment my management is satisfied with the thing working reliably at 45 MHz and all the hardware went to the software guys...

> Look at using a DCM.

Why?

> Are you gating any clocks?

No.

/Mikhail


- William Wallace

Sat, Feb 7, 2004 7:57 AM

You know, we are all groping in the dark without having access to your design specifications, board design, and test benches.

DCMs. Without seeing your design, I can only speculate how or if a DCM will help in your design. If you've used DCMs before, you probably considered it already. If you haven't, browse application notes from Xilinx and see if it will help in your situation. One possible reason is to simply generate a PLL.

Your original question (is it possible...): Where did you get your parts? Disty? Gray Market (dumpster diver)?
Try to meet 100 MHz timing. Look at the long paths. Fix these. E.g., move combinatorial logic from the Q side of a flip flop to the D side of the flip flop. If you have a good version of Synplify, it can do some of this for you.

Do you have a clue where the failure is occurring? Off chip interface? Boundary between clock domains? Recommendation: Divide and conquer. E.g., run the dual port FIFO flags to pins and monitor those. If it repeatedly fails without any anomalies there, you know that is not your problem. I am groping in the dark not having your specification or implementation. But divide and conquer works best.
If the simulations are lengthy, by the time you read this, you could have run one long simulation.

If the problem is the different clock domains, it will be hard to find these problems. Xilinx parts have very small setup/hold times, and it is actually hard to hit them in simulation, even if you try to get a setup and hold violation. Try modeling some random jitter on one of your clock sources during the simulation, or sliding the frequency.
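To get a feel for how rarely a signal crossing between unrelated clocks lands inside a small setup/hold window, the idea can be modeled numerically before touching the HDL testbench. This is a rough sketch only: the 38 MHz and 50 MHz rates come from the thread, but the jitter figure and the setup/hold values are placeholders rather than datasheet numbers, and the function names are hypothetical.

```python
import random
from typing import List, Sequence


def clock_edges(freq_mhz: float, duration_ns: float,
                jitter_sigma_ns: float = 0.0, seed: int = 0) -> List[float]:
    """Rising-edge times of a clock with Gaussian jitter on every edge."""
    rng = random.Random(seed)
    period = 1000.0 / freq_mhz
    count = int(duration_ns / period)
    return [i * period + rng.gauss(0.0, jitter_sigma_ns) for i in range(count)]


def window_hits(data_edges: Sequence[float], capture_edges: Sequence[float],
                setup_ns: float, hold_ns: float) -> int:
    """Count data changes that fall inside a capture edge's setup/hold window."""
    hits = 0
    for t in data_edges:
        if any(c - setup_ns <= t <= c + hold_ns for c in capture_edges):
            hits += 1
    return hits


# Data launched on the 38 MHz bus clock, captured on the 50 MHz core clock.
bus = clock_edges(38.0, 10_000.0, jitter_sigma_ns=0.05, seed=1)
core = clock_edges(50.0, 10_000.0, jitter_sigma_ns=0.05, seed=2)
hits = window_hits(bus, core, setup_ns=0.2, hold_ns=0.1)  # placeholder window
print(f"{hits} of {len(bus)} launches fall inside the capture window")
```

The narrower the window, the longer the simulated time needed before a hit occurs, which is one reason such failures can stay hidden in a simulation that covers only a few data frames.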
Do you have a self-checking test bench?

If you think it is between the clock domains, study your implementation of any status signals you are passing between the two domains.
Work with the software guys to develop test cases to narrow down a scenario that makes the failure occur more often.

Have you put offset specifications in your UCF file?
Are you doing any fixed point multiplication? Do you have multi-cycle paths? Are you sure all of your signals are synchronous to the clocks they are sampled on?

Anyway, these are all pretty generic, obvious things to look at. Only you and your software guys can divide and conquer.