Low-power FPGAs?

.. But the tools ARE being developed - that's where this sub-thread started. Sure, they are not what FPGA users would call mainstream yet (and probably will not be for FPGA design), but Philips are one of the more cautious companies, and they are also in a position of having made real silicon.

If you look at Figs. 49 through 52 in the Philips 87C888 data sheet, you get some idea. The MIPS/Watt values are very good, especially on what was a relatively old process. See also how MIPS/Watt scales with Vcc.

Async is not going to displace sync designs in all areas, but it does illuminate design pathways to lower power. One of those is varying Vcc.

Presently FPGAs spec only ONE Vcc, but a recent thread covered an emerging potential for wider variation in Vcc. This is somewhat innate in the silicon; it just needs the mindset and the specs to change in order to use it.

-jg

Reply to
Jim Granville

It seems to me that you want something that is impossible. On one hand, you seem to want an explanation that is reduced to bite-sized slogans. On the other hand, you want to argue with the simplified explanation, picking on details that are too complicated to fit into one sentence. So which do you want - a simple slogan, or a detailed, nuanced discussion? You can't have both. You've had the first, and decided to argue with it. So you must really want the second, in which case you should really take the time to read what much more qualified persons have written at length in books and formal papers.

One obvious source of juice is the difference between the longest and shortest combinatorial delays (i.e. flip-flop output delay plus routing delays plus LUT delays plus flip-flop setup time plus (perhaps) clock skew). The clock period in a sync design is determined by the maximum delay. However, the device still has to wait for an entire clock period even during a cycle when all relevant combinatorial delays are much less than the maximum. This would not be the case in an async design, where the performance of a circuit over a period of time is more likely to be a multiple of the average combinatorial delay than a multiple of the maximum combinatorial delay.
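To put rough numbers on that (a back-of-envelope model of my own, not from any paper; m is the sync timing margin, h the per-operation handshake overhead):

% N operations with actual path delays t_1 .. t_N, worst case t_max:
\[ T_{\mathrm{sync}} = N\,(1+m)\,t_{\max}
   \qquad
   T_{\mathrm{async}} \approx \sum_{i=1}^{N} (t_i + h) = N\,(\bar{t} + h) \]
% e.g. t_max = 10 ns, average 6 ns, h = 1 ns, m = 0.1 gives
% T_sync = 11N ns vs. T_async = 7N ns - roughly 1.5x the throughput.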

EMI reduction due to spreading the switching current spikes over time comes for free in an async design, rather than requiring special clock chips.

--

Phil
Reply to
Phil Short

I'm assuming we have a good data sheet that lists the worst case times for each instruction. That's generally true for simple sync CPUs. It gets more complicated with high performance CPUs.

If the program is simple, you can trace the flow. For example, with a DSP system you know how many times you go around the loop in a filter or FFT.

For a sync system, you can count cycles. For an async system, you could probably write some software to do the equivalent sort of bookkeeping.

What do people do for complicated systems? I'd probably toss a counter into the wait loop and figure out what fraction of the CPU was idle. Maybe make a histogram and see how far out the tail goes. Round up more if the cost of failure is higher.
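Something like the following, perhaps (a minimal sketch only; the tick hook, the all-idle calibration count, and the 16-bucket histogram are my inventions, not any vendor's API):

/* Sketch of the idle-fraction measurement described above. */
#include <stdint.h>

#define NBUCKETS 16

static volatile uint32_t idle_spins;     /* bumped in the wait loop   */
static uint32_t histogram[NBUCKETS];     /* distribution of busy time */

/* The idle loop: spin and count until the next interrupt. */
void idle_loop(void)
{
    for (;;)
        idle_spins++;
}

/* Called once per tick (e.g. from a timer ISR): log how busy the
 * last interval was, as a fraction of a calibrated all-idle count. */
void tick(uint32_t spins_when_fully_idle)
{
    uint32_t busy = spins_when_fully_idle - idle_spins;
    uint32_t bucket = (busy * NBUCKETS) / (spins_when_fully_idle + 1);
    if (bucket >= NBUCKETS)
        bucket = NBUCKETS - 1;
    histogram[bucket]++;                 /* the tail shows worst cases */
    idle_spins = 0;
}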

With an async system, I'd expect you could do the same sort of thing. Maybe use timers rather than a spin/poll loop to keep with the low-power philosophy. But now the problem is that you have to correct for temp, voltage, and process. Temp and voltage you can measure. You can probably measure process by running some calibration code. But you still have to add a fudge factor for the software. How important is the software uncertainty relative to the hardware uncertainty?
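For the process measurement, I am imagining something like this (a sketch under stated assumptions: read_crystal_us and the worst-case figure are hypothetical, not from any data sheet):

/* Time a fixed burst of work against an independent timebase (e.g. a
 * crystal-driven timer) and derive a speed ratio relative to the
 * assumed data-sheet worst case. */
#include <stdint.h>

extern uint32_t read_crystal_us(void);  /* assumed: free-running us timer */

#define CAL_ITERATIONS 100000UL
#define WORST_CASE_US  5000UL           /* assumed data-sheet worst case  */

/* Returns speed margin in percent: 100 = exactly worst-case silicon,
 * 150 = this chip/voltage/temperature combination runs 1.5x faster. */
uint32_t calibrate_speed(void)
{
    volatile uint32_t dummy = 0;
    uint32_t start = read_crystal_us();
    for (uint32_t i = 0; i < CAL_ITERATIONS; i++)
        dummy += i;                     /* fixed, representative workload */
    uint32_t elapsed = read_crystal_us() - start;
    if (elapsed == 0)
        elapsed = 1;
    return (WORST_CASE_US * 100UL) / elapsed;
}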

--
The suespammers.org mail server is located in California.  So are all my
other mailboxes.  Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's.  I hate spam.
Reply to
Hal Murray

It strikes me that the hardware vendor would characterize and bin the devices in some way that would be meaningful, and provide design tools that would aid in this. For a fixed-function device (e.g. UART or FIFO) the binning process at the semiconductor vendor would take care of the speed issue in just the same manner as is done today (from the viewpoint of someone incorporating the part in their design). For an instruction-programmable part (i.e. CPU) this would also be the same as today (parts speed-graded by the manufacturer; the user doesn't have any meaningful way to compare parts). And who knows if there will ever be async programmable logic parts, but if there are, again the manufacturer speed-grades the parts and has to provide layout and (worst-case) static timing tools to the designer (no more 'magic' than the current FPGA/CPLD situation).

--

Phil
Reply to
Phil Short

However, sync circuits cope with this by just waiting more cycles for the result to appear. The async circuit maybe squeezes the last little bit of performance out, but at the expense of a whole load of handshaking stuff.

Well, that's your opinion. My opinion is that the market is rarely wrong, especially when the technology has been around for decades, and it's an error of judgement to cherry-pick one or two examples from the past where marginally better technology failed, in order to disprove this. The exception proving the rule and all that. If async stuff were really 3 times faster and used 50% of the power, as you quoted in a previous post, we'd most likely see a whole lot more of it. Just my opinion! Best, Syms.

Reply to
Symon

"Not ... mainstream" is putting it mildly...

What are you comparing it to? My copy of the data sheet is dated 2002 and says it supersedes the 2000 version. Most of the chips I would say run at about the same MIPS/W are also 4 years old: MSP430, PIC16, AVR...

Varying Vcc reduces power for *ALL* chips. I know for a fact that most of the PIC MCUs are designed for a range of Vcc, often 2.7 to 5.5 volts. In fact the power varies with the square of the voltage since both the current and the voltage change.
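The standard CMOS dynamic-power relation makes the point; the worked numbers below are my own illustration:

% Dynamic power of CMOS switching a load C at frequency f:
\[ P_{\mathrm{dyn}} = C\,V^{2} f \]
% At fixed f, dropping Vcc from 5.0 V to 2.7 V scales power by
% (2.7/5.0)^2 = 0.29, about a 3.4x saving. Scale f down along with
% V and the saving approaches cubic in V.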

This is not at all "innate" in silicon. The chips just need to be designed for a range of Vcc rather than optimized for the best Vcc as FPGAs are.

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Reply to
rickman

Actually async circuits also have to leave margin. The "handshake" is timed by delays in the silicon which must be given some margin over the slowest path through that section of the combinatorial logic. So just like you would set your system clock speed a bit slower than the worst case combinatorial delay path, they must do the same, just on a lower level.
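In symbols (my notation, not from any async text), the matched delay that times each handshake must satisfy

\[ t_{\mathrm{delay}} \ge (1+m)\, t_{\mathrm{comb,max}} \]

for some margin m, so every async stage still pays a local worst-case cost, just as a sync design pays a global one.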

When the external system has real time requirements, then the async chip must meet that speed requirement at its slowest. Then running faster is of no benefit. You just end up in your idle loop running more cycles and burning more power.

Actually his examples don't really show anything. Beta vs. VHS was a marketing issue because Sony wanted unreasonable licensing fees, and I don't think there *IS* any marketing on async logic. GaAs vs. Si is not an issue of one being better; each has its advantages and each is used where appropriate.

This whole discussion is getting too long. If there are any facts I would like to hear them.

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Reply to
rickman

No Phil, I am not asking for "bite-sized slogans". I am asking for a simple explanation of how this technology works better. Is it really so complex that it can't be explained?

Yes, but I explained how, in a real-time system, this only moves the problem from the clock domain to the system domain. Your chip can run at faster speeds when it is cooler or when it is simply a faster chip (process), but that won't be of any value since you have to design your system to the worst-case chip delay.

If you are talking about the different paths within the chip, I still don't agree that there is a significant difference. Sync logic is balanced so that the different circuits have about the same delay, so that the clock speed can be optimized. So there is not much waste between the separate circuits. Within a given circuit the async logic still has to wait for the longest delay, since it has no way of knowing what speed the logic will run at. Remember, the async handshake is really a delayed clock and must be delayed more than the worst path through the combinatorial logic.

Yes, this is one advantage that async circuits have. But it is certainly not enough to warrant the efforts required for async design.

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Reply to
rickman

They don't have to be designed for multiple voltages. You can design them for the primary target and then characterize them (aka fill in the blanks in the data sheet) at other voltages.

Might be an interesting market opportunity. Similar to what Xilinx is doing with only testing to meet a specific design. Just run the tests at a different voltage (after figuring out how fast the chip should go, and making another set of speed files for the tools, and ...).

--
The suespammers.org mail server is located in California.  So are all my
other mailboxes.  Please do not send unsolicited bulk e-mail or unsolicited
commercial e-mail to my suespammers.org address or any of my other addresses.
These are my opinions, not necessarily my employer's.  I hate spam.
Reply to
Hal Murray

Ok, I think we are closing the loop here. Doesn't this make it clear that when designing with an async part, you still have to allow for the worst case timing in your system design? Doesn't that eliminate any speed advantage async parts might have by running faster when they can?

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Reply to
rickman

Yes, it is too complex to explain by someone who is not a professional writer over a channel with multiple-hour round-trip delays, no visual feedback and no decent way to draw diagrams to explain points.

I wasn't talking about the system domain. You are correct in hinting that there are significant problems interfacing between sync and async chips, such that the literature suggests that most of the speed benefits of async design may be lost in this case. It seems that it would be better if everything in the system is async, rather than a mix. Just another problem in gaining more widespread acceptance.

Every circuit that I have ever designed has had a lot of waste. This includes both microprocessor-based systems and FPGA designs. Ignoring large combinatorial arithmetic elements, the control logic alone has had a large amount of non-fixable variation in delay. In any given circuit there have been places where accidents (or deliberate choices) of state machine design allow the output of one register to be directly connected to the input of another register, and other places in the same logic block where there may be half a dozen or a dozen layers of logic, with no way to allow signals in the latter path to take multiple clocks before they are needed. All of these _differences_ in delay can add up.

In fact, there are some (all?) async methodologies where there are no flip-flops at all; the state information ends up being 'kept' in the delay elements. In any case, the absence of any sensible way to use async design methods with FPGAs has prevented me from taking the time to totally wrap my head around the detailed design methods. They are so different from sync design methodologies that I suspect you have misconceptions about how things would work in an async design, with no way for me to determine what these misconceptions are in these usenet interactions.

If you are curious to know more about async design, it would be best to go back to more fundamental sources than usenet. I personally am always very suspicious of condensed or regurgitated descriptions, and prefer to go back to the most basic readable source. It takes more time, but then I am better able to understand what the real situation is, and to separate urban legend from fact.

As you note in another post, this thread has gone on too long.

Best,

--

Phil
Reply to
Phil Short

At the risk of pouring petrol on the embers of this discussion, the following might be of interest:

ARM And Philips' Handshake Solutions Collaborate To Develop Clockless Processor

formatting link

Regards,

John

Reply to
John Williams

Yes, that was the trigger for this thread, first noted by Symon on 28 Oct. Some silicon should appear in 2005, and then 'like process' comparisons can be made. It could be that this will be used for async versions of the ARM Cortex (
formatting link
), which would make 'like core' comparisons harder :)

-jg

Reply to
Jim Granville

Oh no, I've triggered thread-recursion - prepare for a usenet meltdown! :)

John

Reply to
John Williams

To update this, for a topical example of SoC design that varies both CLK and VCC, I see this:

Synopsys, UMC, join ARM, National for low-power SoC demo

formatting link ?articleID=52500027

It states:

"The demonstrator is set to use adaptive voltage scaling as well as frequency scaling. The system is expected to make use of the lowest voltage and frequency required to meet software deadlines while maintaining user quality."

So, sometime in 2005, we might see a 'like process' comparison between the above and this Async alternative

formatting link

and maybe the FPGA vendors will start to follow this?

-jg

Reply to
Jim Granville

I seriously doubt that FPGA vendors are looking at async design as anything other than a far-out possibility. The big problem is not any of the technical issues, but one of tools, support and training. This is not unlike the problems of using hydrogen-powered cars: there is no support mechanism for distributing the H2, no repair shops that can work on it, emergency services are not prepared to deal with it... Just having chips and a tool to support async design would only be a part of the problem. It would be a *MAJOR* paradigm shift, which will not come until some huge pressure is forced upon them.

In other words, don't hold your breath...

--

Rick "rickman" Collins

rick.collins@XYarius.com
Ignore the reply address. To email me use the above address with the XY
removed.

Arius - A Signal Processing Solutions Company
Specializing in DSP and FPGA design      URL http://www.arius.com
4 King Ave                               301-682-7772 Voice
Frederick, MD 21701-3110                 301-682-7666 FAX
Reply to
rickman

The new example I gave was NOT async; it was intended to show how some of the advantages of Async design (Vcc and Clk scaling) can be used in sync designs. In Async, the 'clock' scales with Vcc automatically; in Sync you need to adjust both Vcc and Clk, and probably add some HW support so you can (using their words) "make use of the lowest voltage and frequency required to meet software deadlines while maintaining user quality". i.e. it would be nice to have some deadline-margin feedback, for better system behaviour than 'slow it down until it crashes' :)
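Roughly the control loop I have in mind, as C-flavoured pseudocode (set_vcc_mv, set_clk_khz, deadline_margin_us and all the step sizes are invented for illustration, not the UMC/ARM/Natsemi interface):

/* Hypothetical adaptive Vcc/Clk governor: step voltage and frequency
 * down while software deadlines are still met with margin, and back
 * up when the margin shrinks. */
#include <stdint.h>

extern void    set_vcc_mv(uint32_t mv);     /* assumed regulator hook   */
extern void    set_clk_khz(uint32_t khz);   /* assumed PLL/divider hook */
extern int32_t deadline_margin_us(void);    /* slack left last frame    */

#define MARGIN_LO_US  100   /* below this, speed back up      */
#define MARGIN_HI_US  500   /* above this, slow and lower more */

static uint32_t vcc_mv  = 1200;
static uint32_t clk_khz = 200000;

void governor_step(void)
{
    int32_t margin = deadline_margin_us();

    if (margin > MARGIN_HI_US && vcc_mv > 900) {
        /* plenty of slack: lower Clk first, then Vcc is safe to drop */
        clk_khz -= 5000;  set_clk_khz(clk_khz);
        vcc_mv  -= 25;    set_vcc_mv(vcc_mv);
    } else if (margin < MARGIN_LO_US && vcc_mv < 1200) {
        /* margin shrinking: raise Vcc first, then Clk can follow */
        vcc_mv  += 25;    set_vcc_mv(vcc_mv);
        clk_khz += 5000;  set_clk_khz(clk_khz);
    }
}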

What UMC/ARM/Natsemi are doing COULD be applied to FPGAs with not much effort.

To show how the FPGA vendors are following the power problems, look at

formatting link
The detail is more disappointing than the headline, but it IS a simple first step. i.e. calling a device with all power but the IOs removed, the IOs tri-stated, and needing a re-config on startup 'quiescent' is a bit of a stretch. Smarter design would allow the IO state to be preserved; maybe that will come in newer families.

-jg

Reply to
Jim Granville
