Tool to help detect race conditions with asynchronous inputs?

Hello,

Do you know of any tool that would help detect race conditions due to asynchronous inputs?

I had a design with asynchronous inputs. I inspected the RTL code to ensure that the asynchronous inputs would only be used when they are stable according to the specification. Unfortunately I missed a line where an asynchronous input releases a synchronous reset. Synthesis generated a race condition, which led to a malfunction of the design. After finding the problem it was very easy to see the failure in the netlist. But it seems very hard to detect the problem without knowing in advance that it will happen, because neither timing analysis (maybe not properly done), nor equivalence checking, nor gate-level simulation failed.

Are there tools that would help in such cases? I don't like the idea of spending hours and days inspecting netlists for every asynchronous input I use, just to ensure that this failure won't happen a second time.

I know that it would be best to avoid asynchronous inputs by inserting registers, but I have some designs with hard area constraints and other designs with timing constraints that do not permit the use of registers on all inputs.

bye Thomas

Thomas Stanka

One clean approach is to run all external async inputs through the standard 2-FF synchronizer. Then they are synchronous and normal tools will work.

The key tool for that is a pair of eyeballs. Scan the source code and make sure that the only place an async input goes is into the synchronizers.
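As a rough illustration of the 2-FF synchronizer mentioned above, here is a minimal cycle-level model in Python. It is only a sketch: the class and method names are invented for this example, and it models the two-clock-edge latency only, not metastability or real timing.

```python
class TwoFlopSynchronizer:
    """Cycle-level model of two registers in series on the destination clock.

    Hypothetical helper for illustration; it does not model metastability.
    """

    def __init__(self):
        self.ff1 = 0  # first stage: the one that may go metastable in hardware
        self.ff2 = 0  # second stage: the only output the design should use

    def clock_edge(self, async_in):
        # Both flops update on the same edge, so ff2 takes the OLD ff1 value.
        self.ff2, self.ff1 = self.ff1, async_in
        return self.ff2


sync = TwoFlopSynchronizer()
outputs = [sync.clock_edge(x) for x in [1, 1, 1, 0, 0]]
print(outputs)  # [0, 1, 1, 1, 0]
```

The input change is visible at the output only after the second clock edge, which is the latency cost that makes this approach hard to use when no clock cycle can be spared.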

--
The suespammers.org mail server is located in California.  So are all my
other mailboxes.  Please do not send unsolicited bulk e-mail or unsolicited
Hal Murray

No. That would be a difficult problem even if you were given all the gate and route delay ranges.

You found the source of one race condition. There are no doubt others that will introduce themselves over time, temperature, state and input variations.

I don't either. Consider doing whatever is necessary to synchronize all the inputs to the system clock.

That's it.

That is an engineering problem. There are always alternatives.

-- Mike Treseler

Mike Treseler

Thinking about this some more...

What does that actually mean? If a signal is asynchronous (relative to some other clock/signal) how/when can it be stable?

Hal Murray

Stable phase could mean the signal has already been properly synchronized.

But a "stable" signal transition could also occur exactly at the active clock edge.

--Mike Treseler

Mike Treseler

I used to use a tool like this when I was at Agilent. It was written in-house (Hi Mark!).

[snip]

Note that a synthesis tool will sometimes *create* races or glitches (e.g. when a ff used in a synchroniser gets replicated due to fanout - yes, this happened in real-world designs). Such problems *cannot* be caught by inspecting the RTL source; the only way is to look at the post-synth netlist. We ended up using the post-PAR back-annotated VHDL netlist (although I guess Verilog would do just as well).

If you try, you could probably write such a tool in a few days, assuming you already know how to program in a text processing language such as Perl. (I suppose you could use C if you must.) It is a fairly simple matter to trace all signals in the netlist back (via combinatorial logic) to either the output of something that is clocked (which Mark called a synchronous element, e.g. ff, bram, SRL), or to a pin.
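The back-tracing step described above can be sketched in a few lines. This is a minimal Python illustration, not the in-house tool: the `NETLIST` dictionary, its cell names and the `trace_sources` function are all assumptions made for the example, and a real tool would first have to parse the VHDL or Verilog netlist into some such structure.

```python
from collections import deque

# Hypothetical toy netlist: cell name -> (kind, data inputs).
# kind is "pin", "comb" (combinatorial logic) or "sync" (ff, bram, SRL, ...).
NETLIST = {
    "async_in": ("pin", []),
    "ff_a": ("sync", ["async_in"]),
    "and1": ("comb", ["ff_a", "async_in"]),
    "inv1": ("comb", ["and1"]),
    "ff_b": ("sync", ["inv1"]),
}


def trace_sources(netlist, cell):
    """Trace the inputs of `cell` back through combinatorial logic only.

    Returns the set of pins and synchronous elements where the trace stops.
    """
    sources = set()
    seen = set()
    todo = deque(netlist[cell][1])
    while todo:
        name = todo.popleft()
        if name in seen:
            continue
        seen.add(name)
        kind, inputs = netlist[name]
        if kind == "comb":
            todo.extend(inputs)  # keep walking back through the logic cone
        else:
            sources.add(name)    # stop at a pin or a clocked element
    return sources


print(sorted(trace_sources(NETLIST, "ff_b")))  # ['async_in', 'ff_a']
```

In this toy example `ff_b` is reached both by the synchronised copy `ff_a` and directly by the raw pin `async_in`, which is exactly the kind of path that is easy to miss by eye.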

By the time all the feature creep had ended, the tool we used checked for:

  1. Any clock gating (i.e. if the clock input of any synchronous element is driven by combinatorial logic).
  2. A list of all clocks used. You'd be surprised how often extra clocks turn up, particularly in code written by less experienced engineers.
  3. Glitches, which we defined as a synchronous element with data input(s) that could be traced back to more than one source in another clock domain (including pins).
  4. Races, which we defined as a synchronous element which feeds more than one synchronous element in a different clock domain.
[Note: I don't think this is quite the same as the classic definition of glitch and race, but it was ok for our purposes.]
  5. Any use of async set or reset. It would trace all of these back to their ultimate source. (Ideally, this would just be a single pin called "reset" or something similar.)
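Checks 1 to 4 from the list above could look roughly like the following Python sketch. Everything here is an assumption made for illustration: the `CELLS` layout, the cell names, and the simplification that a clock domain is identified by the name of the net driving the clock input (so a gated clock counts as its own domain). Tracing of async set/reset is left out.

```python
# Hypothetical toy netlist. "d" lists data inputs, "clk" names the clock net.
CELLS = {
    "clk_a": {"kind": "pin"},
    "clk_b": {"kind": "pin"},
    "en":    {"kind": "pin"},
    "ff1":   {"kind": "sync", "clk": "clk_a", "d": ["en"]},
    "ff2":   {"kind": "sync", "clk": "clk_a", "d": ["ff1"]},
    "or1":   {"kind": "comb", "d": ["ff1", "ff2"]},
    "ff3":   {"kind": "sync", "clk": "clk_b", "d": ["or1"]},
    "ff4":   {"kind": "sync", "clk": "clk_b", "d": ["ff1"]},
    "gate":  {"kind": "comb", "d": ["clk_b", "ff4"]},
    "ff5":   {"kind": "sync", "clk": "gate", "d": ["ff4"]},
}


def sources(cells, nets):
    """Trace nets back through combinatorial cells to pins or sync elements."""
    found, seen, todo = set(), set(), list(nets)
    while todo:
        name = todo.pop()
        if name in seen:
            continue
        seen.add(name)
        if cells[name]["kind"] == "comb":
            todo.extend(cells[name]["d"])
        else:
            found.add(name)
    return found


def domain(cells, name):
    """Domain label: the clock net for sync elements, the pin name for pins."""
    cell = cells[name]
    return cell["clk"] if cell["kind"] == "sync" else name


def check(cells):
    syncs = [n for n, c in cells.items() if c["kind"] == "sync"]
    # 1. Clock gating: clock input driven by combinatorial logic.
    gated = sorted(n for n in syncs if cells[cells[n]["clk"]]["kind"] == "comb")
    # 2. List of all clock nets in use.
    clocks = sorted({cells[n]["clk"] for n in syncs})
    glitches, fanout = [], {}
    for n in syncs:
        foreign = {s for s in sources(cells, cells[n]["d"])
                   if domain(cells, s) != domain(cells, n)}
        # 3. Glitch: data traces back to more than one foreign-domain source.
        if len(foreign) > 1:
            glitches.append(n)
        for s in foreign:
            fanout.setdefault(s, set()).add(n)
    # 4. Race: one source feeds more than one element in another domain.
    races = sorted(s for s, dests in fanout.items() if len(dests) > 1)
    return {"gated": gated, "clocks": clocks,
            "glitches": sorted(glitches), "races": races}


report = check(CELLS)
print(report)
```

On this toy netlist the report flags `ff5` as clock-gated, lists `clk_a`, `clk_b` and `gate` as clocks, reports `ff3` as a glitch candidate (fed by both `ff1` and `ff2` from the other domain) and `ff1` as a race candidate (it feeds both `ff3` and `ff4` in the other domain).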

We had the problem of integrating large chunks of design written at multiple sites, and this tool saved lots of time by finding problems that couldn't be found in simulation and would only show up in the lab intermittently (e.g. it crashes once every 500 boots). Indeed, it found several problems before we even had an inkling a problem existed!

The majority of our problems were due to cross-clock domain paths inside a single FPGA, but the same issues could apply to signals coming from pins. Prior to the creation of this tool, I estimated about half the debug time on some projects was due to improperly handled cross clock domain signals. Many of the bugs were in "proven" legacy code that had been "working fine" for years. There weren't that many bugs, it's just that they took a long time to find compared with straightforward functional bugs.

Regards, Allan.

Allan Herriman

Hi,

Mike Treseler wrote:

:). Indeed there was a possible second race condition for a very unusual input combination, but I ensured that there were no other race conditions by inspecting every path from asynchronous inputs to registers, even over temperature and voltage. This job was very nasty and seems very error-prone when there are more than about 10 paths to inspect. So I wonder whether tools already exist that help with this job.

Impossible for this design due to hard area constraints.

Tell me who your employer is; it seems very good to have a job where an FPGA designer has the possibility to turn down projects with hard constraints *g*.

Besides the hard area criterion, where a design has to fit into a given FPGA with no possibility of getting a bigger FPGA to place the FFs necessary to synchronise every input, there are other designs with timing criteria that do not allow synchronising by inserting two FFs between two clock domains. Whenever your design allows you no clock cycle to respond to requests, you have to deal with asynchronicity using other techniques such as handshaking (if possible).

bye Thomas

Thomas Stanka

[Big snip of feature list.]

This seems like a great candidate for a FPGA related open-source project.

Hal Murray

Agreed.

Allan.

Allan Herriman

[snip]

Thanks for this list. I will try to set up a script to help me check for the items on this list. I will post once the script is running stably.

I agree with you that it is very hard and time-consuming to debug errors caused by race conditions :). Especially if you can't find the source of the problems by RTL inspection.

bye Thomas

Thomas Stanka

Yes, thanks Allan, for the excellent posting. I like the idea of having a way to verify the "known-good" designs that are not well documented.

I would note that for *new* designs, all of these defects can be prevented with the right set of design rules.

-- Mike Treseler

Mike Treseler
