I'm sorry but I can't picture the timing from your description. You
have two circuits with one input in common. You then ask "Will the AND
gate implementation in an FPGA do that?" I don't understand what you
are asking.
Are you saying that enable is not asserted *until* at least 150 ns after
interrupt_check rising edge? Or that enable is *held* for 150 ns after
the interrupt_check rising edge?
Is the idea that interrupt_detect is not considered by other logic until
interrupt_ack is asserted? Or is interrupt_detect supposed to be
conditioned by enable rather than interrupt_request?
AND gates in an FPGA work like real AND gates. The LUTs are designed
not to glitch when an input changes but the output should remain the same.
LUTs may glitch when multiple inputs change and some combination of
values during the change would cause the output to change. However
this is usually due to the skew between inputs. Also in an AND gate
of any size that fits in one LUT, any combination of other inputs
would not make the output go high, and therefore you should not get
a glitch on the outputs.
On the other hand, FPGAs are not really designed to do asynchronous
sequential logic well. What you're trying to do is typically done
using a high-speed free-running clock to sample the input signals
and then make decisions synchronously.
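
For what it's worth, here is a rough behavioral sketch of that sampling approach, written in Python rather than HDL. It models the usual two-flop synchronizer followed by a rising-edge detector. The class name EdgeSynchronizer and the sample stream are made up for illustration, and a model like this cannot show metastability; it only shows the latency and the one-cycle pulse you get out.

```python
class EdgeSynchronizer:
    """Two-flop synchronizer followed by a rising-edge detector."""

    def __init__(self):
        self.ff1 = 0   # first flop: the one that may go metastable in hardware
        self.ff2 = 0   # second flop: gives ff1 one clock period to settle
        self.prev = 0  # previous value of ff2, used for edge detection

    def clock(self, async_in):
        """Advance one clock edge; return 1 for one cycle on a rising edge."""
        # All registers update together from the values held before the edge.
        self.prev, self.ff2, self.ff1 = self.ff2, self.ff1, async_in
        return int(self.ff2 == 1 and self.prev == 0)


def run(samples):
    """Feed one input sample per clock edge and collect the pulse output."""
    sync = EdgeSynchronizer()
    return [sync.clock(s) for s in samples]


# Hypothetical input as seen at successive clock edges.
pulses = run([0, 1, 1, 1, 0, 0, 1, 1])
print(pulses)
```

Each rising edge on the input shows up as a single-cycle pulse two clock edges after the first sample that saw it high, which is the price paid for making the decision synchronously.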
One input in common? Well crap, I screwed up the Verilog. Try this:
always @(posedge interrupt_check) interrupt_detect
You then ask "Will the AND gate implementation in an FPGA do that?"
If an actual AND gate has an input that's 0, the output will be 0
regardless of the other input. If the other input is 0, is 1, is a clock,
or even if it's somewhere in-between because the driver has gone
metastable, the output of the AND gate will be 0. My question was, can
I depend on that from a synthesized AND gate in an FPGA?
The former; enable is not asserted until at least 150ns after
interrupt_check rising edge.
Both, I think. Well, interrupt_detect is not used by any other logic,
only interrupt_ack. interrupt_detect is internal to just what you see
there while interrupt_ack is the signal that goes on to the rest of the
To see more of the context of this question, I'm working on bus
arbitration circuitry for the Unibus and QBUS. I'm writing up what I'm
doing in a short paper that you can find at the URL below. The paper
isn't done yet but it's starting to get closer.
I've read this before, that FPGAs are not really suitable for
asynchronous logic. And yet, if the gates are glitch-free then I'm not
seeing the problem with doing what I'm suggesting here. Converting the
input signals to synchronous seems like a bunch of extra work for
something that ought to be fairly straightforward.
In the larger picture, I do have a desire someday to play around with
asynchronous designs. Not this project with the QBUS but a future
project, possibly even implementing an entire processor asynchronously.
Being able to use FPGAs would sure be easier than having to get out the
Aren't there usually handy latches in the input buffer
structure? Better safe than frustrated...
I'd also like to prototype a fully asynchronous processor in
an FPGA. The Microsemi (ex Actel) Igloo/ProASIC3 parts have no
LUTs. An element of the fine grained fabric can either be a
latch or the equivalent of a LUT3. But, you may have to
hand-wire the input delays if timing is really critical?
It seems to me that the 2 wire 4 state logic should be fastest,
because only one of the wires needs to make a single transition
to indicate the next data phase.
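
One encoding with that property is level-encoded dual-rail (LEDR); whether that is exactly what is meant by "2 wire 4 state" here is an assumption on my part. A minimal Python sketch, with function names chosen only for illustration: one wire carries the data value, the other carries the value XORed with an alternating phase, so each new data item flips exactly one wire.

```python
def ledr_encode(bits):
    """Encode a bit sequence as (data, parity) wire pairs, LEDR style."""
    words = []
    phase = 0  # alternates between 0 and 1 with every data item
    for bit in bits:
        words.append((bit, bit ^ phase))
        phase ^= 1
    return words


def ledr_decode(words):
    """The data wire carries the value directly."""
    return [data for data, _parity in words]


def wires_changed(prev, cur):
    """Number of wires that differ between two consecutive code words."""
    return (prev[0] != cur[0]) + (prev[1] != cur[1])


message = [0, 0, 1, 1, 0, 1]
words = ledr_encode(message)
transitions = [wires_changed(a, b) for a, b in zip(words, words[1:])]
print(words)
print(transitions)
```

The receiver detects a new data phase by watching data XOR parity toggle, with no return-to-zero spacer between items.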
On-chip RAM would seem to be a problem though - any ideas?
That's what I thought. This makes sense.
The answer is that in the case of Xilinx parts it is well known that the
LUTs are glitchless for any one input changing. That is the real
question you seem to be asking.
I believe this is also true for other manufacturers, but I've never
explicitly asked. The Lattice devices are derived from the Xilinx
designs through a license bought from Lucent a long time ago. So their
fundamental LUT design is the same and should work the same. The Altera
parts are different in some ways, but I expect the aspect of the LUTs
that makes them glitchless is the same.
This comes from using transmission gates as the logic elements in the
multiplexer that selects the output from the LUT. The logic controlling
the pass transistors is timed to break before make and the capacitance
on the output line is enough to hold a value until the next transmission
gate is turned on. So if both driven levels are the same there is no
glitch.
I'll pass on reading the paper just now, but keep posting your progress.
I'd like to catch up on this effort at some point.
I used to have an LSI-11, but at some point someone convinced me to toss
it out along with the 8 inch floppy drives, etc.
It's not so much that LUTs can't be used for asynchronous designs,
rather the tools don't lend themselves to asynchronous design analysis.
In fact, the tools can optimize away some of the logic unless you know
how to prevent that.
If you want to code at the level of LUTs, then you can do whatever you
want with FPGAs. You can even use HDL, it's just a lot harder than
synchronous design, a LOT harder.
This doesn't seem to be documented in any data sheet or app note that
I've found, but you can get verbal confirmation. Here's a link where a
Xilinx representative confirms it in a forum conversation, twice.
The reason they are glitch free is because of the use of pass
transistors as the muxing elements. A 4 input LUT has 16 memory
elements and a number of series pass transistors. Only one path through
the pass transistors is turned on at a time. The logic is timed so they
are break before make. This leaves the output of the switches in a high
impedance state briefly during a change and the parasitic capacitance is
enough to hold the logic state. It's that simple.
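
A toy Python model of that argument, for anyone who wants to poke at it. It only captures the idea described above: during the break no path drives the output node, and the node either keeps its charge (pass-transistor mux) or, in a hypothetical AND-OR style mux added here purely for contrast, falls to 0. The function name and the two example truth tables are my own.

```python
def transition(cells, old_code, new_code, holds_charge):
    """Output values seen across one break-before-make input change."""
    before = cells[old_code]
    # Break: no path is selected. A floating node keeps its charge; the
    # hypothetical AND-OR mux used for contrast would output 0 instead.
    during = before if holds_charge else 0
    # Make: exactly one path turns on and drives the node.
    after = cells[new_code]
    return [before, during, after]


# 16 memory cells of a 4-input LUT programmed as a 4-input OR.
OR4 = [0] + [1] * 15

# One input changes (code 0b0001 -> 0b0011); both cells hold a 1.
pass_transistor = transition(OR4, 0b0001, 0b0011, holds_charge=True)
and_or_mux = transition(OR4, 0b0001, 0b0011, holds_charge=False)
print(pass_transistor, and_or_mux)
```

With the charge-holding node the output reads 1 throughout; only the contrast case dips to 0 in the middle.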
I did a Unibus design back in the days of one-time programmable
(fusible-link) PALs. I remember it was a bitch without using a
clock, but I got away without one by using at least one delay-line.
One thing I recall is that boards that actually plugged into the
bus had six connector sections, where the A and B sections were
not used. The pinout of the other sections was in an internal
DEC document I got through some sort of third-hand source. If you
look at the spec in DEC's external documentation, it only describes
the A and B sections. However you quickly see that these can only
be used to bring bus signals in and out of a chassis, not to go from
board to board. That's because they only have a single pin for
each daisy-chain signal. The Qbus was much better documented.
Oh, that's interesting. Part of my plan for doing async design is to
develop my own tools as well. I thought I couldn't use FPGAs to realize
my designs but if that's not the case that'd be really useful.
At some point, then, I'll have to learn about how one codes at the level
of LUTs. That project is for the future though.
Part of my interest in async is the idea that the design can be
Yeah, the dual-rail encoding seems the best match for normal digital
logic. I've seen references to one-of-four as well but I don't
understand why it's better.
One of my thoughts is that there are and will always be synchronous
parts and systems that this would need to interface to. I want to
make sure that the async dev tools do a good job of handling the
transition between the two worlds.
I am not current on async designs, but I believe at least some async
designs are self timed. This actually puts more rigorous requirements
on delays as the timing path has to be slower than the logic path. But
someone pointed me to info on a method of using the logic to provide the
timing signal. It makes the logic much larger though. Not sure what it
does to the real-world, useful timing.
I don't think you'll be able to do any of that in FPGAs, at least not soon.
I believe the reason they started using sync RAMs in FPGAs was more
about the user interface than it was the technology. Users were abusing
the timing of async RAMs, so they gave them a sync interface which is
harder to abuse. I think distributed RAM is still async on read and
that should be good enough since even async RAM is really synchronous on
writes, just not edge driven.
One way is to write behavioral HDL code and apply attributes to keep
signals as wires and to not combine with other logic. In some FPGA
families you can instantiate LUTs. I haven't done any of this in
decades, so I don't know anything about it anymore. Someone was posting
about this not too long ago, maybe earlier this year.
I had to instantiate LUTs recently (as a last resort) and it was pretty straightforward. But why would you want to do this? Just resync your async signals to a clock. Writing your own tools is probably quixotic.
It's in XAPP024.pdf, which doesn't seem to be on Xilinx's web site.
This is for the XC3000 series. I understand that more recent series
(about 10 generations now!) behave in a similar manner, but you won't
find any documentation saying that it's so.
Here's something written by Peter Alfke in a thread from 2001:
"Here is what I wrote ten years ago (you can find it, among other
places, in the 1994 data book, page 9-5):
"Function Generator Avoids Glitches
Note that there can never be a decoding glitch when only one select input
changes. Even a non-overlapping decoder cannot generate a glitch problem,
since the node capacitance would retain the previous logic level...
When more than one input changes "simultaneously", the user should
analyze the logic output for any intermediate code. If any such code
produces a different result, the user must assume that such a glitch
might occur, and must make the system design immune to it...
If none of the address codes contained in the "simultaneously" changing
inputs produces a different output, the user can be sure that there will
be no glitch...."
This still applies today.
Peter Alfke, Xilinx Applications
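
The analysis described in that quote can be sketched in a few lines of Python. This is only an illustration under a simplifying assumption: the "simultaneously" changing inputs are taken to arrive one at a time in some unknown order, and a glitch is flagged if any order makes the output change more often than a clean transition would. The function name may_glitch and the example truth tables are mine, not from the data book.

```python
from itertools import permutations


def may_glitch(cells, old, new, width=4):
    """True if some arrival order of the changing inputs causes extra
    output transitions between input code `old` and input code `new`."""
    changing = [b for b in range(width) if ((old ^ new) >> b) & 1]
    expected = int(cells[old] != cells[new])  # 0 or 1 clean transitions
    for order in permutations(changing):
        code, value, changes = old, cells[old], 0
        for bit in order:
            code ^= 1 << bit          # this input arrives next
            if cells[code] != value:  # output moved at this intermediate code
                value = cells[code]
                changes += 1
        if changes > expected:
            return True
    return False


AND4 = [0] * 15 + [1]                                   # 4-input AND
XOR2 = [(i & 1) ^ ((i >> 1) & 1) for i in range(16)]    # XOR of inputs 0, 1

print(may_glitch(AND4, 0b0000, 0b1110))  # input 0 held low, others change
print(may_glitch(AND4, 0b0111, 0b1011))  # one input rises as another falls
print(may_glitch(XOR2, 0b00, 0b11))      # both XOR inputs change
print(may_glitch(XOR2, 0b00, 0b01))      # single input change
```

The first case is the AND gate with one input held at 0: no intermediate code can make the output go high, so no order of arrival produces a glitch.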
There are two general ways I know to provide async timing. One is to
have timing circuits run in parallel with the logic circuits and this
implies the timing requirements you mention here.
The other way to provide timing is to have the logic signals themselves
come with their own, inherent validity signal. Delay insensitive (or
quasi-delay insensitive) is the name for this idea and the dual-rail
encoding that I referred to is one way to implement that.
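
To make the dual-rail idea concrete, here is a small Python sketch. Each bit travels on two wires, (1, 0) for a 1 and (0, 1) for a 0, with (0, 0) as the "no data yet" spacer, so validity is carried by the signal itself. The AND function below is written so that its output stays at the spacer until both inputs are valid. The names are illustrative, and this combinational model ignores the hysteresis that real implementations get from C-elements.

```python
NULL = (0, 0)  # spacer: no data present on the rails


def encode(bit):
    """Dual-rail code word as (true_rail, false_rail)."""
    return (1, 0) if bit else (0, 1)


def is_valid(word):
    """Data is present when exactly one rail is high."""
    return word in ((1, 0), (0, 1))


def decode(word):
    if not is_valid(word):
        raise ValueError("no valid data on the rails")
    return word[0]


def dr_and(a, b):
    """Dual-rail AND whose output is NULL until both inputs are valid."""
    a_t, a_f = a
    b_t, b_f = b
    out_t = a_t & b_t
    # Every term needs one rail from each input, so a NULL input blocks it.
    out_f = (a_f & b_f) | (a_f & b_t) | (a_t & b_f)
    return (out_t, out_f)


results = {(x, y): decode(dr_and(encode(x), encode(y)))
           for x in (0, 1) for y in (0, 1)}
print(results)
print(dr_and(encode(0), NULL))
```

Three product terms for the false rail instead of a single OR is a small example of why this style costs more logic than the single-rail equivalent.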
It definitely results in larger logic circuits but proponents argue that
that's balanced by no longer needing all the complication of carefully
tuned H-trees for clock distribution. Obviously FPGAs have the clock
distribution networks already so using an FPGA to implement async design
doesn't see any benefit on that side of things. I'm interested in FPGAs
for this only because they're a really convenient source of programmable
logic. I can swing programming an FPGA; I can't manage doing my own
custom VLSI. FPGAs could let me experiment with the idea which is
really my intermediate goal.