Best Async FIFO Implementation

Hi all,

Does there exist a best implementation of Asynchronous FIFO?

Any suggestions will be appreciated! Best regards, Davy

Reply to
Davy

All members of the Virtex-4 family from Xilinx have a hard-coded (full-custom) FIFO controller in each of their BlockRAMs. It accepts different clocks for read and write (called "asynchronous operation") at any frequency up to 500 MHz. Capacity is 18 Kbits, the width is 4 to 36 bits, and the depth ranges accordingly from 4K down to 512 addresses (depth and width can easily be expanded with additional BlockRAMs). There are EMPTY and FULL flags, and also ALMOST EMPTY and ALMOST FULL flags, both fully programmable (with 1-address granularity).

I designed the crucial asynchronous empty arbitration logic, and it works perfectly: we tested it by writing data into the FIFO at ~200 MHz and reading it out at ~500 MHz, and the asynchronous empty-detect logic had worked flawlessly for all of those >10^14 operations when we stopped the test after a week. Probably no real FIFO application will ever go empty 200 million times a second... The high performance is due to very fast and compact full-custom logic, and our long experience in analyzing and dealing with the effects of metastability.
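As a quick sanity check of the operation count quoted above, the arithmetic can be sketched as follows. It assumes exactly 200 MHz and exactly seven days of run time, whereas the post gives both figures only approximately.

```python
# Rough check of the quoted operation count.
# Assumptions (not stated exactly in the post): 200 MHz write clock, 7 full days.
write_rate_hz = 200e6
seconds_per_week = 7 * 24 * 60 * 60   # 604800 seconds

operations = write_rate_hz * seconds_per_week
print(f"{operations:.3e}")            # a little over 1e14
```

Under those assumptions the count comes out at roughly 1.2 x 10^14, which matches the "more than 10^14" order of magnitude.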

Peter Alfke, Xilinx Applications (posting from home)

Reply to
Peter Alfke

I guess it depends on what you're looking for. At minimum, it should *work* ... Then the rest is a compromise of resources/speed/feature(like almost empty/full flags,...)/...(reliability?)

Sylvain

Reply to
Sylvain Munaut

For simulation, are the Xilinx FIFO models any faster than before? Just recently I had to write fully-synchronous FIFO models to accelerate the simulations and achieved 100X (one hundred times) improvement.

RAUL

Reply to
raul

Simulating asynchronous clocking must be very difficult and time consuming (I dare not use the word "impossible" for fear of being flamed). How do you cover all clock phase relationships, down to the femtosecond level? Synchronizers operate with that kind of timing resolution.

Peter Alfke, speaking for himself.

Reply to
Peter Alfke

Event-based simulation allows you to have very fine resolutions. Just make sure that all your signals crossing clock domains are flopped and that there are no Clock-to-Q delays involved in your model. I have run the fast FIFO models in ModelSim PE 6.1a and Veritak 1.75A and they have identical behavior to the Xilinx models.

Reply to
raul

Raul, this may just reveal my ignorance, but anyhow:

How do you model metastability, which needs sub-femtosecond resolution? How do you model that an asynchronous FIFO generates its EMPTY flag in time, even under the most adverse timing conditions between the two incoming clocks? Those have been things that kept me awake at night :-(

Peter Alfke

Reply to
Peter Alfke

Kim, thank you for that clarification. That means I was right in considering any simulation of metastability-causing asynchronous clocking impossible. There is no substitute for creativity, circuit analysis, some deep thinking, and experimentation. All of that we have done to verify the metastable behavior of our flip-flops, and to verify the behavior of our asynchronous FIFO in Virtex-4. Obviously, one can always simulate the effect that a given metastable delay has on the rest of the circuitry, but one cannot simulate the origin of the metastable delay.

Peter Alfke, Xilinx Applications

Reply to
Peter Alfke

Hi,

There is no need to simulate metastability. The RTL simulations are functional. All conditions of empty and full have been verified with directed and random behavior over long simulations with clocks sliding past each other. The FIFOs are as asymmetrical as 128 bits in and 16 bits out, with clocks as different as 37.125 MHz and 100 MHz.
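A minimal sketch of what such a functional, zero-delay model with "sliding" clocks can look like is given below. It is an illustration only, not the poster's code: the class and function names are made up, the FIFO is symmetric rather than 128 bits in and 16 bits out, pointers cross domains as Gray code through two modelled flops, and which of the two quoted frequencies drives which port is an assumption (both assignments are run).

```python
from fractions import Fraction

def bin2gray(b):
    return b ^ (b >> 1)

def gray2bin(g):
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b

class AsyncFifoModel:
    """Hypothetical functional dual-clock FIFO; no metastability is modelled."""

    def __init__(self, addr_bits):
        self.depth = 1 << addr_bits
        self.mask = 2 * self.depth - 1   # pointers carry one extra wrap bit
        self.mem = [None] * self.depth
        self.wptr = 0                    # binary, write-clock domain
        self.rptr = 0                    # binary, read-clock domain
        self.rptr_sync = [0, 0]          # Gray rptr, flopped twice on write clock
        self.wptr_sync = [0, 0]          # Gray wptr, flopped twice on read clock

    def write_edge(self, data=None):
        """One write-clock edge; data=None means no write request."""
        rptr_seen = gray2bin(self.rptr_sync[1])   # flop output before this edge
        self.rptr_sync = [bin2gray(self.rptr), self.rptr_sync[0]]
        full = ((self.wptr - rptr_seen) & self.mask) == self.depth
        if data is None or full:
            return False
        self.mem[self.wptr % self.depth] = data
        self.wptr = (self.wptr + 1) & self.mask
        return True

    def read_edge(self):
        """One read-clock edge; returns a word, or None when EMPTY."""
        wptr_seen = gray2bin(self.wptr_sync[1])
        self.wptr_sync = [bin2gray(self.wptr), self.wptr_sync[0]]
        if wptr_seen == self.rptr:
            return None
        data = self.mem[self.rptr % self.depth]
        self.rptr = (self.rptr + 1) & self.mask
        return data

def run(wr_mhz, rd_mhz, num_words=500, addr_bits=3):
    """Interleave the edges of two free-running clocks until all words arrive."""
    fifo = AsyncFifoModel(addr_bits)
    wr_period = 1 / Fraction(wr_mhz)     # microseconds, exact rational
    rd_period = 1 / Fraction(rd_mhz)
    t_wr = t_rd = Fraction(0)
    sent, received = 0, []
    while len(received) < num_words:
        if t_wr <= t_rd:                 # next event is a write-clock edge
            if fifo.write_edge(sent if sent < num_words else None):
                sent += 1
            t_wr += wr_period
        else:                            # next event is a read-clock edge
            word = fifo.read_edge()
            if word is not None:
                received.append(word)
            t_rd += rd_period
    return received

# Slow writer / fast reader exercises EMPTY; the swapped case exercises FULL.
assert run("37.125", "100") == list(range(500))
assert run("100", "37.125") == list(range(500))
```

Because the two periods are not integer multiples of each other, the relative phase of the edges drifts on every cycle, which is the "sliding" effect described above. Each side sees a stale copy of the other pointer, so FULL and EMPTY are pessimistic, but data is never lost or reordered.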

The simulations have been proven correct in the lab on Virtex-2 Xilinx FPGAs running for several hours with real data.

ModelSim PE's code profiler said that time was being spent mostly in the Xilinx FIFOs.

RAUL

Reply to
raul

Usually in RTL simulations you don't even want to model things like that. The most important thing is to get fast simulation times for the whole design. And at least in the past, Xilinx models were overly complex for pure RTL simulations, so users usually had to write their own simulation models to get the speed.

The correctness of the async fifos must come from the design, reviews etc. It's impossible to simulate all the cases.

Of course, with netlist simulations timing-accurate models are needed, but that is a small part of simulations. That is usually done to check timing constraints and synthesis bugs (if formal verification tools are not part of the user's toolset). Asynchronous portions are almost impossible to simulate. Nowadays there are also formal tools that check clock domain crossing correctness etc. Those tools can even inject errors during simulation that could be caused by metastability (the places are found by the formal portion).
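The idea of injecting metastability effects into a functional simulation can be sketched roughly as below. This is an assumption about the general approach, not a description of any particular tool: a hypothetical `JitterSync` class models a two-flop synchronizer whose first flop, whenever its input has just changed, randomly captures either the old or the new value. The output then settles one cycle early or late, but never takes an in-between value.

```python
import random

class JitterSync:
    """Hypothetical 2-flop synchronizer model with random one-cycle uncertainty."""

    def __init__(self, rng, init=0):
        self.rng = rng
        self.prev_d = init
        self.stage1 = init
        self.stage2 = init

    def clock(self, d):
        """One receiving-clock edge; returns the synchronized output."""
        self.stage2 = self.stage1
        changed = d != self.prev_d
        self.prev_d = d
        # If the input changed since the last edge, it may have changed inside
        # the setup/hold window: the first flop resolves to old or new at random.
        if not (changed and self.rng.random() < 0.5):
            self.stage1 = d
        return self.stage2

def first_high(seed):
    """Index of the first edge at which a 0 -> 1 step (at edge 3) shows up."""
    sync = JitterSync(random.Random(seed))
    outs = [sync.clock(0 if i < 3 else 1) for i in range(8)]
    return outs.index(1)

latencies = {first_high(seed) for seed in range(20)}
print(sorted(latencies))   # both the one-cycle and the two-cycle delay occur
```

Running the surrounding logic against such a model checks that it tolerates the extra cycle of uncertainty; it says nothing about how often real flip-flops go metastable.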

--Kim

Reply to
Kim Enkovaara

Hi, Davy -

You may want to browse a number of papers on my web page for coding guidelines and coding styles related to multi-clock design and asynchronous FIFO design.

At the web page:

formatting link

Look for the San Jose SNUG 2001 paper: Synthesis and Scripting Techniques for Designing Multi-Asynchronous Clock Designs

Look for the San Jose SNUG 2002 paper: Simulation and Synthesis Techniques for Asynchronous FIFO Design

Look for the second San Jose SNUG 2002 paper (co-authored with Peter Alfke of Xilinx): Simulation and Synthesis Techniques for Asynchronous FIFO Design with Asynchronous Pointer Comparisons

Peter likes the second FIFO style better but the asynchronous nature of the design does not lend itself well to timing analysis and DFT.

I prefer the more synchronous style of the first FIFO paper.

I hope to have another FIFO paper on my web page soon that uses Peter's clever quadrant-based full-empty detection with a more synchronous coding style.
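For readers unfamiliar with the idea, a behavioral sketch of quadrant-based full/empty detection follows. It is a simplified illustration and not the code from the paper: the pointers are plain binary here, all names are hypothetical, and the direction latch is evaluated after every pointer move instead of asynchronously. The address space is split into four quadrants; the latch is set when the write pointer is one quadrant behind the read pointer (possibly going full) and cleared when it is one quadrant ahead (possibly going empty). Equal pointers then mean FULL or EMPTY depending on the latch.

```python
import random

class QuadrantFlags:
    """Hypothetical pointer/flag model; depth must be at least 4."""

    def __init__(self, addr_bits):
        assert addr_bits >= 2
        self.depth = 1 << addr_bits
        self.quarter = self.depth // 4
        self.wptr = 0
        self.rptr = 0
        self.going_full = False          # direction latch, cleared at reset

    def _update_direction(self):
        wq = self.wptr // self.quarter   # quadrant of each pointer (0..3)
        rq = self.rptr // self.quarter
        if (wq + 1) % 4 == rq:           # write pointer one quadrant behind
            self.going_full = True
        elif (rq + 1) % 4 == wq:         # write pointer one quadrant ahead
            self.going_full = False
        # otherwise the latch holds its previous value

    @property
    def full(self):
        return self.wptr == self.rptr and self.going_full

    @property
    def empty(self):
        return self.wptr == self.rptr and not self.going_full

    def write(self):
        if self.full:
            return False
        self.wptr = (self.wptr + 1) % self.depth
        self._update_direction()
        return True

    def read(self):
        if self.empty:
            return False
        self.rptr = (self.rptr + 1) % self.depth
        self._update_direction()
        return True

# Compare the flags against a reference occupancy counter under random traffic.
rng = random.Random(1)
fifo, count = QuadrantFlags(addr_bits=3), 0
for _ in range(10000):
    if rng.random() < 0.5:
        count += fifo.write()
    else:
        count -= fifo.read()
    assert fifo.full == (count == fifo.depth)
    assert fifo.empty == (count == 0)
```

The appeal of the scheme is that the pointers need no extra wrap bit: the quadrant relationship carries the information about which pointer is chasing which.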

We spend hours covering multi-clock and Async FIFO design in my Advanced Verilog Class. These are non-trivial topics that are poorly covered in undergraduate training. I have had engineers email me to tell me that their manager told them to run all clock-crossing signals through a pair of flip-flops and everything should work! WRONG!
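One commonly cited reason why a pair of flip-flops per signal is not enough for multi-bit values can be illustrated with a small enumeration (an illustration, not taken from the papers). The function name is hypothetical. The sketch assumes that each bit that changes during a pointer increment is independently captured as either its old or its new value, and lists everything the receiving register could end up holding.

```python
from itertools import product

def bin2gray(b):
    return b ^ (b >> 1)

def possible_samples(old, new, width):
    """Values a register may capture if every changing bit resolves independently."""
    changing = [i for i in range(width) if ((old ^ new) >> i) & 1]
    results = set()
    for choice in product((0, 1), repeat=len(changing)):
        value = old
        for bit, take_new in zip(changing, choice):
            if take_new:
                value ^= 1 << bit        # this bit already shows its new value
        results.add(value)
    return results

# Binary count 7 -> 8 flips all four bits (0111 -> 1000).
binary = possible_samples(7, 8, 4)
# The same increment in Gray code flips one bit (0100 -> 1100).
gray = possible_samples(bin2gray(7), bin2gray(8), 4)

print(sorted(binary))   # every value from 0 to 15 is possible
print(sorted(gray))     # [4, 12]: only the old or the new value
```

With a binary counter the synchronized value can be completely unrelated to either the old or the new count, while a Gray-coded counter can only ever be seen as one of the two legitimate values.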

Regards - Cliff Cummings Verilog & SystemVerilog Guru

formatting link

Reply to
cliffc
