Low-power FPGAs?

(snip)

Asynchronous design takes different kinds of logic. Ordinary FFs would be useless, so standard FPGAs would not be such a good fit. Synchronous design can spend a lot of logic just keeping things synchronized. Also, it means that the whole system is clocked at the rate of the slowest subsystem.

Extra logic and extra design work are then needed to make up for the speed lost to the synchronous requirement.

(snip)

I don't see why you couldn't prototype in an FPGA. It might take a larger FPGA than a synchronous design would, and all those FFs would be wasted. Also, the design tools tend to assume synchronous logic, putting more work on the designer.

For the PDP-10 line, the KA-10 was asynchronous (self-timed) while the KI-10 and KL-10 were synchronous, all with pretty much the same instruction set.

-- glen

Reply to
glen herrmannsfeldt

If current peaks are smaller, then the PSU can be smaller, and smaller can be more efficient. Also, if current doesn't die down instantaneously but decays due to L & C, then in theory circuits which idle between clocks would draw more power than a circuit which gathers all the idle time at the end and just sits there, since there are no capacitors being charged and discharged on every clock. (Just a theory!)

All the rest are valid points, all made in the thesis too... And you did hit the nail on the head with traffic cops. That's the big chunk of true async HDL: sync runs at the speed of the slowest device, while async goes as fast as possible, so there are traffic lights on every corner. The other big chunk is the variable delay used to set minimum times for signal transport or for work to happen. That's why FPGAs don't work for this: no async delays, so you would be forced to use constant delays, and that breaks the async rules.
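
Just to put toy numbers on the 'speed of the slowest device' point, here is a little Python sketch. The stage delays are invented and a real design would pipeline things, so treat it purely as an illustration:

# Invented stage delays (ns) for some made-up datapath.
stage_delays = [3.0, 7.5, 2.0, 4.5]

# Synchronous: every stage waits out a clock period long enough for the
# slowest stage (FF setup and clk-to-q ignored to keep it simple).
clk_period = max(stage_delays)
sync_latency = clk_period * len(stage_delays)

# Self-timed: each stage hands off as soon as its own matched delay has
# elapsed, so the end-to-end latency is just the sum of the stage delays.
async_latency = sum(stage_delays)

print(f"sync : {sync_latency:.1f} ns end to end (clock period {clk_period} ns)")
print(f"async: {async_latency:.1f} ns end to end")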

Also, I believe that a flip-flop that is clocked but does nothing still consumes more power than a flip-flop that isn't clocked at all. CMOS power is all about changes, not static conditions, and on average an async system doing the same task as a sync system has longer static periods.

Then there aren't any clock buffers in an async system; there's no need for them, and in large systems they can form a significant chunk of the power.
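
As a rough back-of-envelope (all numbers below are invented, using the usual P = a*C*V^2*f approximation and ignoring leakage and short-circuit current), the clock network toggles every cycle at activity 1, so even a modest clock capacitance can rival the whole datapath:

# Made-up figures, only to show the clock tree's share of dynamic power.
V, f = 1.2, 100e6                  # supply (V) and clock (Hz)
clk_C,  clk_a  = 2e-9, 1.0         # clock tree: 2 nF, switches every cycle
data_C, data_a = 10e-9, 0.1        # data nodes: 10 nF, 10% activity

p_clk  = clk_a  * clk_C  * V**2 * f
p_data = data_a * data_C * V**2 * f
print(f"clock tree: {p_clk*1e3:.0f} mW   data nodes: {p_data*1e3:.0f} mW")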

You're right about different architectures too... As a sync designer I synchronise everything; it's the mindset. But to design async you need to think async, and it's not easy: all the rules are thrown out for a new set. Async is all about just-in-time delivery.

And, of course, there are some async designs that draw more power than sync designs, but as time goes by they will get better. You also have a silicon overhead for all the extra 'timing' circuits, 20% maybe. And in reality, today, silicon is more expensive than power, for the manufacturer that is.

Simon

Reply to
Simon Peacock

These current spikes are never seen by the PSU. They are so short that you have to have decoupling capacitors on your board (and very close to the chips) to smooth them out so they don't cause havoc with all the chips on the board.
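
A quick back-of-envelope (numbers invented) on why it is the local bypass caps, rather than the supply, that source these spikes:

# A 1 A spike lasting 2 ns is only 2 nC of charge; pulled from 100 nF of
# nearby decoupling, the rail droops by dV = Q / C, and the regulator
# only ever sees the slow average current.
I_spike, t_spike = 1.0, 2e-9       # amps, seconds (made up)
C_decouple       = 100e-9          # farads of local bypass capacitance

charge = I_spike * t_spike
droop  = charge / C_decouple
print(f"charge = {charge*1e9:.1f} nC, rail droop = {droop*1e3:.1f} mV")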

But this is apples and oranges: both have leaves and need to be peeled before you make juice. Async may not have clock trees, and a clocked idle FF may use more power than one with no clock, but an async circuit has to propagate the clock through a delay circuit that is guaranteed to take more time than the logic circuit. That certainly takes power, multiplied by the number of such circuits. So which is more, apples or oranges?

I am not clear on that. I agree that the rules are very different, but I am not sure the rules have all been discovered yet. If all the circuits are in a straight line, then each circuit is triggered by the one before it and you have no timing problems. If you have any parallel paths, you need a way to align the data at the end, or you have to use a different sort of structure. Having all the different circuits idle except the one being clocked is wasteful of the silicon. You should be able to keep as much of the silicon as possible busy with the instruction being executed.

But it is the chip user that pays for the silicon *and* the power. So they determine the relative importance.

Are there any clear, concise guidelines written up anywhere for how to design async circuits? I wouldn't mind reading about that in my spare time (if I can find any).

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Reply to
rickman

I think the PSU does see the short spikes... just averaged out over a longer period.

I don't know if there's much more than PhD theses about at the moment. Philips had something, but it might be in-house, and I'm sure there's tons of stuff if you ask the right person. The stuff I have is getting old, too old really, but it is still relevant as the field doesn't seem to have moved very far in the last 8 years.

When you talk about keeping all the silicon busy, that's a fairy tale. When did your last design use every bit of silicon on every clock? I don't believe they do; no piece of complex silicon uses itself 100% of the time. My own designs work in bursts. I even built a simple FIFO using shift registers which swap over every second cycle, so they're not used all the time, even less than 50%. Other parts only use one clock in 8. Quite inefficient really, but I only have one 16 MHz clock so that's the way it is. I don't know how much async logic would help for designs like this, mainly because the input is sync and the output too, but that's just one app.

"... but an async circuit has to propagate the clock ..." You are thinking sync again.. Async circuits don't have a clock.. so nothing to propagate they rely on hand shakes, delays and just in time. Clocks are forbidden except in cross over designs (that the sync designer hasn't got his head around yet!)

There are tricks to aligning parallel paths.. known by sorcerers and magicians everywhere but not witch doctors.

And yes, chip users do pay for power and silicon. But as long as they pay, the suppliers don't complain, so the suppliers decide what's best for themselves, and more chips per wafer is a big plus on the balance sheet. Of course this rule doesn't apply so much to Intel, but they have the bonus of Windows, where you can never have too much power.

Simon

Reply to
Simon Peacock

Philips have released some devices over the years.

Now obsolete, but quite impressive given the process/power results, is an OTP Telephony 80C51 variant:

formatting link

-jg

Reply to
Jim Granville

Context! The point was that async logic spreads the spikes out while sync logic lumps them together. Actually that is not really true except for the IO currents. All the logic elements in a chip switch as the inputs change. With the different delays between FFs there are a lot of individual switching currents. Even sync logic uses async combinatorial logic, but the FFs are slaved to a common clock.

Regardless, whether you group all the switching current spikes together or spread them out, the PSU sees them the same way due to their high speed and the slowness of the PSU. These switching currents are far faster than a switching PSU or an LDO could possibly respond to!

I didn't say 100%. I just said you need to keep it as busy as possible. If an async circuit has five stages with different delays, you have to wait for all five stages to complete before starting a new data flow. Although I guess if you used handshaking between each stage, rather than just a self-timed clock, you could then keep each stage busy, but only at the rate of the slowest stage. But doesn't that sound familiar?

Actually, I think FIFOs were among the earliest async logic components. Back in TTL days a FIFO was actually a series of registers with a handshake circuit connecting adjacent registers. If you wrote a word to an empty FIFO you had to wait for the data to ripple through before you could read it at the other end.
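
Here is a rough behavioural model of that ripple-through behaviour (the depth and the 'ripple step' counting are arbitrary; it only shows the word walking down to the read end, not real timing):

# Each stage passes its word forward whenever the next stage is empty.
EMPTY = None

def ripple(stages):
    # Move words toward the output (last element) where possible.
    for i in range(len(stages) - 1, 0, -1):
        if stages[i] is EMPTY and stages[i - 1] is not EMPTY:
            stages[i], stages[i - 1] = stages[i - 1], EMPTY

fifo = [EMPTY] * 4
fifo[0] = "word0"                  # write one word into an empty FIFO
steps = 0
while fifo[-1] is EMPTY:           # wait for it to reach the read end
    ripple(fifo)
    steps += 1
print(f"readable after {steps} ripple steps: {fifo}")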

Yes, both async and sync sequential circuits have a clock. In async the clock just passes between adjacent stages. You are calling it a handshake, but this is used as a clock on FFs somewhere. Else how do you trigger the FFs?

Hmmmm... well this could go on all day. I still stand by my point that most of the claims of how async circuits are better don't hold water. They may be different, but not necessarily better. I'm not sure anyone has given a single way in which async circuits are *clearly* better.

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Reply to
rickman

I just thought of one way async circuits are a PITA for the board designer to use. Sync circuits are typically designed to run correctly over temp. The chip builder provides a spec for temperature. Then your circuit speed is set by the clock.

An async circuit is designed to work over a temp range, but is self-timed. So the board designer has to test over temperature to make sure the circuit speed will be fast enough. I was just looking at the data sheet for the P87CL888 and realized that it would be very hard to verify the speed of the software vs. your system requirements. The board designer would have to verify the system at worst-case temperature, voltage *and* process... again, the self-timed clocking is not an advantage when your system requirements have to be met at the slowest speed.

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Reply to
rickman

The word clock implies a global clock, or at least a clock that goes to every flip-flop (storage element) in a large section of the chip. The handshake signals in async design are local, rather than global in nature (with, among other things, the benefit of greatly reduced EMI).
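
The basic primitive behind those local handshakes is the Muller C-element, whose output changes only when both inputs agree; it is also how parallel paths get joined. A behavioural sketch in Python (just the truth-table behaviour, not tied to any particular cell library):

# C-element: output follows the inputs when they agree, otherwise holds.
def c_element(a, b, prev_out):
    if a == b:
        return a              # both inputs agree: follow them
    return prev_out           # inputs disagree: hold the previous value

out = 0
for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
    out = c_element(a, b, out)
    print(f"a={a} b={b} -> out={out}")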

Take a look at the book "Asynchronous Circuit Design" written by Chris J. Myers and published by Wiley in 2001. Chapter 9 gives some examples that clearly contradict your comments. One example (RAPPID at Intel) gives a simultaneous 3:1 improvement in speed, a 50% improvement in power, and a much larger input voltage range over the synchronous design using the same fab process, at the expense of 22% more chip area. Other examples showed similar results.

--

Phil
Reply to
Phil Short

I'm missing something. Why test the board, as compared to reading the worst-case numbers off the data sheet and seeing if they are fast enough?

It's the same problem as checking setup/hold times, just turned inside out.

Is the info not in the data sheet?

--
The suespammers.org mail server is located in California.  So are all my
other mailboxes.  Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's.  I hate spam.
Reply to
Hal Murray

So why hasn't async technology grabbed a bigger chunk of the market?

--
The suespammers.org mail server is located in California.  So are all my
other mailboxes.  Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's.  I hate spam.
Reply to
Hal Murray

I would assume that async technology is being held back, in part, because of the lack of widespread availability of design tools, of designers familiar with the techniques involved, of good production test and characterization tools, and so forth. A chicken-and-egg situation is how I would put it.

--

Phil
Reply to
Phil Short

A good summary, and it also has to do with 'path of least resistance': it was easier/cheaper to simply shrink to gain speed. That is starting to no longer be true, and at the same time ASYNC tools are getting better [hence the Philips/ARM announcement, which is really a Tool Chain one].

-jg

Reply to
Jim Granville

I believe this is the perfect answer to the problem. Why? Because no one demands it! If enough people wanted it, and more importantly laid cash out, Xilinx would build a self-timed, fully async FPGA. But I pity the support staff, and of course the Mentor and Synplicity tools get thrown out... for another set, of course, so there's another $50k to spend, and extra pity for their support staff.

So the reality is, it's probably too expensive to change until the current sync technology has reached its limit; the tools and design know-how are already paid for. Of course there can always be another Microsoft who comes up with the perfect 'toy' that everyone wants (and thinks they need) that uses async technology and completely flips the industry over in a matter of years. :-) If that happens, it's most likely to be an ultra-fast, ultra-big, ultra-low-power 'something' for computers or a PlayStation. Maybe a graphics chip that draws almost no power and does a billion polygons a millisecond.

Simon

Reply to
Simon Peacock

Because the data sheet won't tell you how fast your software will run. Trying to measure the speed of software is very difficult considering all the permutations of paths that it can take. In DSP work this becomes very critical since it is often very much real time. But DSP algorithms are often less complex to analyze than control programs or other tasks that embedded micros are running. I have never seen anyone try to count clock cycles (or ns for async circuits) for each instruction and analyze a program of any complexity. The best they normally do is measure it in a simulator or on the bench. A sync circuit would have the advantage of always having the same timing regardless of temp, voltage and process. The async circuit will vary with those parameters and will need to be verified.

Perhaps a prorating figure will be provided to say that if you meet timing with a 20% margin at 25C and worst-case Vdd, then at 70C it will run OK with the worst process. But my understanding is that process variations can be even wider than 20%. I seem to recall a conversation with Xilinx suggesting that you allow 50% or more between max and min delays.
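
Crudely, with made-up numbers based on the figures above (not vendor data), a 20% bench margin does not cover a 50% process/voltage/temperature spread:

# Normalised execution time measured on the bench at 25C, typical part.
measured_time = 1.00
margin        = 0.20               # 20% headroom against the deadline
deadline      = measured_time * (1 + margin)

pvt_spread    = 0.50               # worst corner could be 50% slower
worst_case    = measured_time * (1 + pvt_spread)

print(f"deadline {deadline:.2f}, worst case {worst_case:.2f}:",
      "fails" if worst_case > deadline else "ok")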

Async circuits don't remove the issues of meeting a "clock" timing. They just push the problem to the system level when you have to meet real time requirements.

What info would you like them to spec?

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Reply to
rickman

I never said "global" and I don't see why you would infer that when we were talking about the async circuits.

I don't have a copy of that book. Those sound like great results, but there are a lot of other variables, and only a handful of examples don't prove the method. The Philips async 8051 (which was discontinued after only a couple of years) doesn't seem to have any special advantages. It is (was) not cheaper than sync chips, it was not lower power (2-5 mA at 4 MIPS), and the lack of predictable speed would be a major issue in my book.
--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Reply to
rickman

Or maybe the tools are not being developed because there are no clear advantages to async circuits?

Please explain to me in simple terms where the speed, size and power advantages come from. I still have not seen it.

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Reply to
rickman

If you have a few spare years it might be possible to explain it. The problem is I don't believe there is anybody here who can explain it. Maybe it's like RDRAM: great in theory, but in practice it has moved so slowly that advances in sync logic passed it by.

The part I've read is "for the same function, async circuits draw less power, consume 20% more silicon, and run 15% faster." The key here is "for the same function". I've said before that a piece of silicon with the same geometry and the same number of transistors running at the same clock rate draws the same power; it doesn't really matter what it's doing. So if you have 90% of the silicon working for a sync circuit, and 90% working for an async circuit, there is no saving. The saving is that async doesn't run all the time: no clocks, no nothing. This is where async gets its gains. I'm sorry I don't have formulas or detailed numbers, but I don't usually design async logic :-)
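
To put invented numbers on the 'doesn't run all the time' part (purely illustrative, nothing measured):

# Same job done two ways: 1000 FFs over 100 clock periods, but the data
# is only 'live' in any given FF for 10 of those periods.
ffs, cycles, live_cycles = 1000, 100, 10

sync_clock_events = ffs * cycles         # every FF sees every clock edge
async_handshakes  = ffs * live_cycles    # a stage fires only when data moves
print(f"sync clock events: {sync_clock_events}")
print(f"async handshakes : {async_handshakes}")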

Simon.

Reply to
Simon Peacock

I have always held the idea that if a person cannot explain something clearly, then they likely don't really understand it themselves. At least that was always my problem. :) Back in college I had a roommate who asked why the tides bulged on *both* sides of the earth and not just on the moon side. I kept trying to explain it and finally realized that I didn't really know how to explain it because I didn't understand what was pulling the tide on the opposite side. Eventually I figured out that it was centrifugal force.

I still believe there are *no* things that are hard to understand, only things that are not well understood. And of course, in this case, things that are not really accurate...

I understand what you are saying, but in a real-world circuit, async devices don't just stop running of their own accord to save power. Consider what is initiating the async circuit. Either it is running in a feedback mode, triggering itself when it completes each pass, like a CPU; or it is triggered from an external event, like a clock! In both cases the async circuit runs all the time. In fact an async CPU won't be executing NOPs (and even NOPs require circuits to draw power); it will be running code in a loop if nothing else. It can shut down by executing code to go into a low-power state waiting for an external or timer interrupt, but so can a sync CPU.

I'm not trying to be a PITA, but no one here has really given this much thought. I keep reading a lot of stuff that is very generalized and does not really describe async vs. sync circuits once you dig a bit.

There are differences, such as the clocking method. But they are apples and oranges, and until you squeeze them a bit you won't get any juice. What I mean is that which one works better depends on how well the details can be optimized.

Rick "rickman" Collins

snipped-for-privacy@XYarius.com
Ignore the reply address. To email me use the above address with the XY removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL formatting link
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Reply to
rickman

Rick, I agree with you on this whole ASYNC thing. No one in this thread has offered any explanation as to why ASYNC should outperform SYNC circuits, or addressed your concerns. They just offer quotes from academics whose research grants depend on it, or suggest that it's too complex to explain. As you say, after many years of research, the dearth of commercial applications is pretty damning. On the other hand, 4 billion years of natural selection can't be wrong; I'm pretty sure the logic circuit in my head is asynchronous. At least that's what it's telling me now! Brains run on about 20 Watts. Cheers, Syms.

Reply to
Symon

Performance of a sync device depends on the clock rate, which depends on the worst-case delays through combinatorial logic and routing. For example, if the clock period of a design is determined by the delay through a multiplier array, the time between the completion of a simple addition and the next clock edge could be quite long. Performance of an async device depends, in some sense, on average (rather than maximum) delays, and so the result can be available considerably sooner.
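
With invented delays and an invented operation mix, just to put numbers on the worst-case versus average point (a real design would pipeline the multiplier, so this is only an illustration):

# The sync clock must cover the slowest operation; a self-timed datapath
# signals completion after the delay of whichever operation actually ran.
op_delay = {"add": 2.0, "shift": 1.5, "mul": 9.0}   # ns, made up
mix      = {"add": 0.6, "shift": 0.3, "mul": 0.1}   # fraction of each op

sync_time_per_op  = max(op_delay.values())          # clock set by the mul
async_time_per_op = sum(op_delay[op] * mix[op] for op in mix)

print(f"sync : {sync_time_per_op:.2f} ns per operation")
print(f"async: {async_time_per_op:.2f} ns per operation (mix average)")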

Not very damning at all. There are many examples in which superior technologies have failed in the marketplace, with VHS versus Beta being the standard example, GaAs vs. Si another, and BeOS yet another. Lack of success can be due to a lot of factors unrelated to the technology or product itself: bad marketing, bad timing, network effects, etc. Factors other than technological merit are quite often the reason that products, technologies, and companies succeed or fail, and using failure as evidence of a lack of technological merit is totally fallacious logic.

--

Phil
Reply to
Phil Short
