I had a chance to think about this further, and I think the localized variation in path delays actually hurts the async device more than it does the sync device.
The idea behind the sync clocked design is to deal with all the issues that make the logic delay so variable. Instead of trying to match delays path by path, the entire issue is lumped into the clock domain: the clock period just has to be larger than the worst-case delay through the logic, plus an additional margin for clock skew. Minimizing that skew is the whole purpose of the clock tree, so the skew term is typically very small and only needs to be added to the worst-case logic delay to get the minimum clock period.
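As a back-of-the-envelope sketch (all numbers here are hypothetical, just to make the constraint concrete), the sync timing budget is a single sum, worst-case logic delay plus residual skew:

```python
# Sync design: the clock period only has to cover the single worst-case
# logic path plus whatever skew the clock tree failed to remove.
# All values are hypothetical illustration numbers, not real silicon data.
worst_case_logic_delay_ns = 2.00   # slowest path through the logic (assumed)
residual_clock_skew_ns = 0.05      # skew left over after the clock tree (assumed)

min_clock_period_ns = worst_case_logic_delay_ns + residual_clock_skew_ns
print(min_clock_period_ns)
```

The point is that per-path variation never shows up as a separate term: it is already folded into the single worst-case number.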
The async processor must match its clock (handshake) delay to the logic delay and always keep the clock delay slightly longer. Similar components always show statistical variation in timing; even at the 3 sigma point, with millions of transistors on a die you have to account for the few paths that come out fast or slow. The worst case is a fast clock path paired with a slow logic path, so this skewing must be budgeted at both the logic and the clock level. In the end you have to allow for the deviation in both directions, which means the margin is effectively doubled.
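A similar sketch (again with hypothetical numbers, using a 3 sigma design point as in the paragraph above) shows why the async margin effectively doubles: the matched delay line can come out fast at the same time the logic comes out slow, so the required margin is the sum of both deviations:

```python
# Async design: the handshake delay line must still be longer than the
# logic path when the delay line is fast and the logic is slow.
# All values are hypothetical illustration numbers.
nominal_delay_ns = 2.00   # delay line is nominally matched to the logic (assumed)
sigma_ns = 0.05           # per-path standard deviation of the delay (assumed)
k = 3                     # design to the 3 sigma point

fast_delay_line_ns = nominal_delay_ns - k * sigma_ns  # delay line at -3 sigma
slow_logic_ns = nominal_delay_ns + k * sigma_ns       # logic path at +3 sigma

# Margin needed so the delay line still exceeds the logic in the worst case:
# this works out to 2 * k * sigma -- the deviation counted in both directions.
required_margin_ns = slow_logic_ns - fast_delay_line_ns
print(required_margin_ns)
```

Compare this with the sync case, where the same variation only has to be absorbed once, inside the single worst-case delay number.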
So the async design likely has to carry larger margins in the handshake path, and the result is a slower maximum speed than a comparable sync design.