If clocks are too slow, then switch to asynchronous?

Hello,

If the limit has been reached for generating clock signals, should we switch to asynchronous circuit design?

For now the CPU makes multiple internal cycles per external clock tick. (That's what the CPU multiplier is for.)
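A back-of-envelope sketch in C of that multiplier arithmetic (the bus clock and multiplier values are invented placeholders, not any particular CPU's numbers):

#include <stdio.h>

int main(void) {
    double bus_mhz    = 200.0; /* hypothetical external bus clock */
    double multiplier = 15.0;  /* hypothetical CPU clock multiplier */
    /* the core clock is the external clock multiplied up on-chip,
       so the core runs 'multiplier' cycles per external tick */
    printf("core clock: %.0f MHz (%.0f core cycles per bus tick)\n",
           bus_mhz * multiplier, multiplier);
    return 0;
}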

How long can that remain a solution?

Bye, Skybuck.

Reply to
Skybuck

To the contrary, actually. Asynchronous reception means there usually has to be a clock at a power-of-2 multiple of the bit rate.

Rene

--
Ing.Buero R.Tschaggelar - http://www.ibrtses.com
& commercial newsgroups - http://www.talkto.net
Reply to
Rene Tschaggelar

Asynchronous designs are *way* harder to do. It is much harder to automate the process.

When you go through a register, the setup and hold times can be checked at the input and then the timing of the output can be assumed for further checking. In the asynchronous case, you have to follow through all the logic paths and figure the delays at each step. If there are many parts and many paths the number of computations gets huge.
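To make the scale of that concrete, here is a toy path-delay walk in C (not a real timing tool; the netlist and gate delays are invented). A registered design checks each register-to-register hop once against the clock period; the asynchronous case has to trace arrival times through every path in the gate network, and the naive recursion below re-walks shared subpaths, which is exactly why the computation blows up on big netlists:

#include <stdio.h>

#define GATES 5
/* per-gate delays in picoseconds (made-up numbers) */
static const int delay_ps[GATES] = {35, 50, 40, 60, 45};
/* fanin[i] lists the gates feeding gate i; -1 means primary input */
static const int fanin[GATES][2] = {
    {-1, -1}, {0, -1}, {0, 1}, {1, 2}, {2, 3}
};

/* worst-case arrival time at gate g, found by walking all paths;
   shared fanin is re-walked every time, as a naive check would */
static int arrival(int g) {
    int worst = 0;
    for (int i = 0; i < 2; i++)
        if (fanin[g][i] >= 0) {
            int t = arrival(fanin[g][i]);
            if (t > worst) worst = t;
        }
    return worst + delay_ps[g];
}

int main(void) {
    printf("worst-case path to the output: %d ps\n", arrival(GATES - 1));
    return 0;
}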

Some people are going with a very fast clock and declocking sections of the chip when they are not needed. This way they can lower the average power to prevent overheating without lowering performance in most cases. They include a bit of logic that slows things down if the CPU gets too hot.

There is a new direction where the grain size of the declocking is made very small. This gets most of the reduction in power that an asynchronous design could do without making the design so much harder. I predict that the next step on this path will be the local monitoring of temperature.

Reply to
MooseFET

Huh?

Asynchronous CPUs should not need a clock.

It's like dominoes: each completed operation signals the next.

Bye, Skybuck.

Reply to
Skybuck

Oops, I was thinking about a UART and SPI. What speed are you talking about? I know synchronous circuits at 3 Gbit/s. Beyond that?

Rene

--
Ing.Buero R.Tschaggelar - http://www.ibrtses.com
& commercial newsgroups - http://www.talkto.net
Reply to
Rene Tschaggelar

formatting link

Torben

Reply to
Torben Ægidius

For the most part, the clock rate of CPUs has stopped progressing because of power dissipation issues, not because we cannot make the clock signal go faster. Secondarily, the wire wall (wires are getting slower as gates are getting faster) means that more clock cycles are necessary to talk to remote parts of the chip. And finally, the memory wall means that even if we sped up the clock rates, {Donning Nomex} little performance drops to the bottom line due to the vast latencies of a main memory read.

So, in effect, that limit has only been reached under the assumption that power dissipation is limited (to about 100 W). If somebody comes up with a scheme whereby 1 kW of power can be removed from a chip of 13 mm**3, and it costs about $10 in volume, then the clock rate race will be "ON" again.

But even if the second paragraph becomes true, there is good reason to believe that more performance can be placed on a die via multiprocessors than through ever faster/bigger CPUs with more cache/predictors/function-units that deliver ever less advancement per unit area or per unit power (the marginal performance per watt is often negative right now as these things are added/extended).

Basically, as long as the input clock can be detected with less than a handful of picoseconds of (short-term) jitter, the PLL multipliers can multiply that frequency up to at least 10 GHz (maybe as high as 30 GHz) with adequate end-point jitter control. The Cray {1, XMP, YMP, 2, ...} computers kept refrigerator-sized boxes within a fraction of a nanosecond of uncontrolled skew. All it takes is the power needed to run the clock distribution network and a determined engineering staff to distribute the clocks.

It is that power that contributes to the lack of clock scaling you see today.

Mitch Alsup
No longer at AMD.

Reply to
MitchAlsup

I'm not even sure that's true. The $10 cost would be dwarfed by the cost of the 1 kW of power (plus air-conditioning, ...). Yes, there would surely be some interest in such monsters, but such a renewed "clock rate race" would probably stay confined to a fairly small market compared to what we saw at the end of the last century.

Stefan

Reply to
Stefan Monnier

In last year's issue of the IEEE Journal of Solid-State Circuits there was an asynchronous flip-flop and logic design style. As an example they used a multiplier. I was shocked by the overhead required for the asynchronous handshaking. Comparing this with the aggressive 11 FO4 SOI design (if I am not mistaken) of the Cell's SPEs, synchronous design gets us much further. The on-chip clock generation is, I think, in principle no harder than the handshaking of an asynchronous circuit.
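For readers who have not seen it, the handshaking overhead in question is roughly the four-phase request/acknowledge protocol sketched below (the delays are invented, and real async design styles differ in the details):

#include <stdio.h>

int main(void) {
    int t = 0;          /* time, in arbitrary control-logic delays */
    const int step = 1; /* hypothetical per-event delay */
    for (int word = 0; word < 2; word++) {
        t += step; printf("t=%d: sender raises REQ (data valid)\n", t);
        t += step; printf("t=%d: receiver raises ACK (data latched)\n", t);
        t += step; printf("t=%d: sender drops REQ\n", t);
        t += step; printf("t=%d: receiver drops ACK (ready again)\n", t);
    }
    /* four control events per transfer, versus one clock edge per
       stage in a synchronous pipeline: that is the overhead paid */
    return 0;
}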

Andreas

Reply to
acd

Hmmm. My brief review of the subject a couple of years back led me to the perception that one of the reasons for going asynchronous is that it can result in lower power operation for comparable performance. I also came away with the perception that asynchronous isn't common because it isn't common; i.e., little design experience, inadequate tools, formidable design challenges.

You've proposed two walls: a power wall and a memory wall. The memory wall has been pounded to the end of the earth and I'd rather not go there again. If you could beat the power wall with asynchronous operation, I'll bet there's a market.

Robert.

Reply to
Robert Myers

Asynchronous is the way forward. There are various synchronisation mechanisms. Recently I was reading an article on Sun Labs about a processor called FleetZero which they have made.

Reply to
sirinath

Hi,

Is there any possibility of a Ph.D. studentship position there?

Regards, Sum

Reply to
sirinath

I disagree. I don't see it as a path to any major breakthroughs. I think it is tuning to a local maximum.

Google has been having trouble posting. Before I go further I will post this.

Reply to
MooseFET

It seems to be working so I will say more.

You can get about as much reduction in power by using a fine-grained declocking of the chip. Declocking allows all the normal design methods to be used and reduces the trouble of following the propagation delays through all paths.

Asynchronous design only reduces the number of transistors and the power consumption by a nearly fixed percentage. It doesn't make the growth in each follow a slower curve. To break the growth off the curve it is on, we need a technology that goes away from using a logic gate for each logic operation.

To explain what I mean by this, take the case of an AND logic gate implemented with a relay. The coil is connected to one signal and the NO contact is connected to the other. The NC contact is perhaps grounded and the COM is the output. This makes a logic gate that does the needed function. If you need to implement (A and B), (A and C), and (A and D), you would be tempted to put in three relays and need about three times the power. You could, however, use a relay that has three sets of contacts and require less than three times the power. A silicon version of this sort of thing would allow us to break off the current power growth curve.
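A toy model of that relay argument in C, where "power" is just a count of energized coils (the values mean nothing beyond the illustration):

#include <stdio.h>

int main(void) {
    int A = 1, B = 1, C = 0, D = 1; /* arbitrary input values */

    /* naive: three separate relays, each with its own coil on A */
    int coils_naive = 3 * A;

    /* shared: one relay whose single coil drives three contact sets */
    int coils_shared = 1 * A;
    int AB = A & B, AC = A & C, AD = A & D; /* same three outputs */

    printf("outputs: A&B=%d A&C=%d A&D=%d\n", AB, AC, AD);
    printf("coils energized: naive=%d, shared=%d\n",
           coils_naive, coils_shared);
    return 0;
}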

Reply to
MooseFET

How does async design compare to latch-based skew-tolerant design? With skew-tolerant design, you care only about the propagation delay through the latch, and don't care so much about the clock. This lets you borrow time from a shallow pipeline stage for use in a deep stage. Anyway, it seems that this methodology has many of the advantages of async design, but without its problems: mainly, you don't have to worry about glitches.
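A back-of-envelope model of that time borrowing (stage delays and clock period are hypothetical, and real skew-tolerant design has more constraints than this sketch shows):

#include <stdio.h>

int main(void) {
    const double period = 1.0;              /* ns, hypothetical */
    const double stage[] = {1.3, 0.6, 1.1}; /* logic delay per stage */
    double borrow = 0.0; /* time a stage steals from its successor */

    for (int i = 0; i < 3; i++) {
        double need = borrow + stage[i];
        borrow = (need > period) ? need - period : 0.0;
        printf("stage %d: total %.1f ns, borrows %.1f ns forward\n",
               i, need, borrow);
    }
    /* the pipeline still works as long as no stage borrows more than
       the window the transparent latches leave open (~half a period) */
    printf("OK if every borrow stays under %.2f ns\n", period / 2);
    return 0;
}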

Another question I have is logic size: yes, with async design you do not have a large global clock network, but the async design elements tend to be larger (to avoid hazards). It would be interesting to see a comparison between the best clocked logic and the best async logic, both with scan chains or whatever is used to check for silicon defects.

--
/*  jhallen@world.std.com AB1GO */                        /* Joseph H. Allen */
int a[1817];main(z,p,q,r){for(p=80;q+p-80;p-=2*a[p])for(z=9;z--;)q=3&(r=time(0)
+r*57)/7,q=q?q-1?q-2?1-p%79?-1:0:p%79-77?1:0:p<1659?79:0:p>158?-79:0,q?!a[p+q*2
]?a[p+=a[p+=q]=q]=q:0:0;for(;q++-1817;)printf(q%79?"%c":"%c\n"," #"[!a[q-1]]);}
Reply to
Joseph H Allen

Quite true. But in some cases you can make the async circuit smaller, as you don't need to optimize for rare worst-case delays.

An example: The simplest adder is a ripple-carry adder, but that can in the worst case take O(N) time to settle (where N is the number of bits). Hence, sync designs tend to use carry-lookahead or carry-select adders that have a worst-case propagation of O(log(N)), but are considerably larger than ripple-carry adders. However, a ripple-carry adder has an average settling time of only O(log(N)) (the longest carry chain for random operands is about log2(N) bits), so an async ripple-carry adder that signals completion when the carries settle can be faster (on average) than a sync carry-lookahead or carry-select adder. And smaller too.
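A quick Monte Carlo check of that average-case claim in C (the trial count and crude RNG use are arbitrary; this is a sketch, not a proof):

#include <stdio.h>
#include <stdlib.h>

/* length of the longest carry ripple in a + b: a carry is born at a
   generate position (both bits 1), extended by propagate positions
   (bits differ), and killed where both bits are 0 */
static int longest_chain(unsigned long long a, unsigned long long b) {
    int best = 0, run = 0;
    for (int i = 0; i < 64; i++) {
        int ai = (int)((a >> i) & 1), bi = (int)((b >> i) & 1);
        if (ai & bi)      run = 1;                 /* generate  */
        else if (ai ^ bi) run = run ? run + 1 : 0; /* propagate */
        else              run = 0;                 /* kill      */
        if (run > best) best = run;
    }
    return best;
}

int main(void) {
    const int TRIALS = 100000;
    long long total = 0;
    int max_seen = 0;
    srand(1);
    for (int t = 0; t < TRIALS; t++) {
        unsigned long long a = 0, b = 0;
        for (int i = 0; i < 64; i++) { /* build random operands */
            a = (a << 1) | (unsigned)(rand() & 1);
            b = (b << 1) | (unsigned)(rand() & 1);
        }
        int c = longest_chain(a, b);
        total += c;
        if (c > max_seen) max_seen = c;
    }
    printf("64-bit adds: avg longest chain %.2f bits, max seen %d, "
           "worst case 64\n", (double)total / TRIALS, max_seen);
    return 0;
}

The average comes out close to log2(64) = 6 bits, which is why completion detection usually fires long before the worst case.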

Torben

Reply to
Torben Ægidius
