disappointing 550 MHz performance of V5 DSP slices

airtom 20 years ago

Hello, can anyone give an explanation for the disappointing 550 MHz performance of the V5 DSP slices? Couldn't we hope for 1 GHz multipliers with 65 nm technology?

By the way, why aren't the multipliers pipelined to increase performance? Is there any chance of seeing pipelined multipliers in Virtex-6?

Ben Jones 20 years ago

You can hope for what you like! :) 1GHz multipliers would be very hard to use if the logic fabric and memories can't keep up. If you can't feed them with input data they will spend a lot of time doing nothing... do you have an application in mind that requires 1GHz multipliers? How would you propose to engineer them into your wider FPGA design?

The DSP48E is already pipelined. Obviously, adding extra (bypassable) pipeline stages within the multiplier would increase the maximum clock speed, but it also adds latency and silicon area, and increases power consumption. Everything's a tradeoff...
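The tradeoff described above can be sketched with a toy timing model. This is only an illustration: the 4.0 ns logic delay and 0.25 ns per-register overhead are made-up numbers, not DSP48E data, and the model assumes the multiplier logic splits evenly across stages.

```python
def stage_delay_ns(logic_delay_ns, stages, reg_overhead_ns):
    """Delay of one stage if the logic splits evenly across the stages."""
    return logic_delay_ns / stages + reg_overhead_ns


def fmax_mhz(logic_delay_ns, stages, reg_overhead_ns):
    """Maximum clock rate in MHz, set by the delay of a single stage."""
    return 1000.0 / stage_delay_ns(logic_delay_ns, stages, reg_overhead_ns)


def latency_ns(logic_delay_ns, stages, reg_overhead_ns):
    """Total input-to-output latency: every stage costs one clock period."""
    return stages * stage_delay_ns(logic_delay_ns, stages, reg_overhead_ns)


# Hypothetical figures, chosen only to show the trend.
LOGIC_DELAY_NS = 4.0
REG_OVERHEAD_NS = 0.25

for stages in (1, 2, 4, 8):
    f = fmax_mhz(LOGIC_DELAY_NS, stages, REG_OVERHEAD_NS)
    t = latency_ns(LOGIC_DELAY_NS, stages, REG_OVERHEAD_NS)
    print(f"{stages} stage(s): fmax = {f:.0f} MHz, latency = {t:.2f} ns")
```

In this model each extra stage raises the clock rate by less than the one before, because the register overhead does not shrink, while latency (and, in silicon, area and power) keeps growing.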

Note that there are many enhancements in the DSP48E over the Virtex-4 DSP48 block, not the least of which being a larger multiplier (25x18). So a direct MHz-to-MHz comparison with the previous generation is not entirely fair.

Cheers,

-Ben-

airtom 20 years ago

For some problems, with pipelining techniques, the logic fabric can keep up. As for memories, I am sure you can design memories that are 2x faster.

Scientific computation

Agreed

Agreed. I am just noticing that Moore's law is not being respected where DSP frequency is concerned, and I wanted to know the reason for this (is it a technological problem, or a strategic choice to satisfy the maximum number of customers?). Cheers, Thom

Stephen Craven 20 years ago

Tom,

Moore's Law actually refers to transistor density, not clock frequencies. Interconnect delay at these deep submicron sizes has significantly reduced clock frequency scaling - just look at how Pentium frequencies have stopped increasing.

Xilinx could significantly pipeline the multiplier to get a much higher clock rate, but, as mentioned previously, keeping it fed with data from the configurable fabric would be difficult at higher frequencies.

MathStar has a field programmable array with 1GHz DSP performance, but it's not bit-level configurable.

Stephen

Falk Brunner 20 years ago

Stephen Craven wrote:

Guys, guys. Can you ever get enough? I guess not! :-( First, clock frequency is not necessarily processing power. Second, 550 MHz isn't a piece of cake, even if there are a few other fast(er) competitors. Third, I think that all those fancy Pentium/Athlon/whatever CPUs are heavily pipelined. Fourth, blablablablablablablablablablablablablabla

And I am not a follower of that stock market philosophy of ever (exponentially) rising profit, or, here in the electronics world, ever (exponentially) rising processing power/clock frequency. This "law" was surprisingly valid for a long time, and it can still be fulfilled, even with the great challenges of deep sub-micron technology. But there is an end to all things. Remember, the transfer function of a diode is also exponential over many decades, until it ends up in smoke.

Regards Falk

P.S. Reminds me again of some old basic rules for programmers. 1st, your CPU is always too slow. 2nd, you always have too little RAM. 3rd, the compiler is lousy.

Austin Lesea 20 years ago

All,

A recent Intel presentation at an IEEE Workshop admitted that clock frequency has max'd out, and now has to go down (not up) in order to not create heat.

We have known that for years now. So has AMD.

The only choice is "multi."

Intel proposes a future with more than 200 x86 cores on one die, with a "communications fabric" and many memories. All on one die. Small software problem to be solved by the need to have it solved....

One attendee of the conference (not me!) quipped, "sounds like you are describing an FPGA..."

Boy did the presenter get mad! To be compared to a lowly FPGA! He was spitting venom back at the attendee. "There is no comparison! FPGAs are fine grained, and this is not!"

Sounds like if that is the only difference, the FPGA wins. Again.

Oh, and I can't wait for Intel to stub their toes on that "communications fabric" (left as an exercise for the student). Or the software.

I think that we are all disappointed: there is no high-K gate dielectric, so the supply voltage can't scale anymore, and variation in threshold voltages is worsening, because not only can you count the layers in the gate dielectric on your fingers, but you can count the ions that got implanted, too. Not only does the source-drain leak when off, but gates leak now (at 65nm and below).

A new fab costing $2B. No clear path for lithography.

Good thing we make a standard product, and can afford to keep developing it. ASSP vendors will have to consolidate, and reduce their offerings. Real tough times ahead for some business models. Only place to get the latest IP will be from FPGA vendors....

The future is ~500 MHz, more stuff, and voltages slowly decreasing to 0.8 volts.

But, we can still get twice as many transistors per unit area, all the way down to 22nm. And that increases throughput and processing power.

45nm, 35nm, and then 22nm. Life in the old horse yet. 2B 6 input LUTs in V8? 100 Mb of BRAM? 2,000 DSP processors? Crystal ball is getting very hazy.... Auntie Em! Auntie Em! She is holding her head! (apologies to the Wizard of Oz movie).

After that, we really are looking for that disruptive technology with which we can make a new FPGA.

Now if I could only get those unobtainium wafers to yield....

Austin

MikeShepherd564 20 years ago

I agree with the sentiment. Most of us aren't pushing the speed of our devices. We use FPGAs for other reasons.

Of course, there are always "leading edge" applications that need the highest performance, but it's a small part of the market and a minority interest.

Unfortunately, speed is the usual news from manufacturers: "This will run at 800MHz now and 950MHz by the end of next quarter...".

Yawn!

Maybe the speed freaks can form their own newsgroup ("FPGA overclockers"?) Stand by for photos of water-cooled chips, lit with blue LEDs.

Peter Alfke 20 years ago

Look at other maturing areas:

Commercial aviation has hardly gotten faster since the 747 arrived more than 30 years ago. Automobiles are hardly getting faster, except for at the lunatic fringe.

The 100 m dash has improved by a few measly percent since Jesse Owens in 1936, 70 years ago! Baseball records improve mainly through chemistry...

But, as Austin wrote, we are still trying. It is now easier to make circuits smaller, make bigger chips, improve yield, and thus lower cost, than it is to make circuits faster.

Peter Alfke

pbdelete 20 years ago

Maybe we'll see "Xilinx inside" within 20 years ;)

Maybe machines with FPGAs interconnected in a giant "web of interconnects" will be the future, with parallel computing as the only way to harness that capability.

One could even take processed silicon wafers, map out the faulty chips, and interconnect the rest to get that functionality in one go.

Falk Brunner 20 years ago

snipped-for-privacy@btinternet.com wrote:

STRIKE!

You made my day!

(OK, it's evening now, but nevertheless)

Regards Falk

Falk Brunner 20 years ago

Peter Alfke wrote:

Same marketing problem in the car industry. Hence the change to slogans like

" . . you won't arrive faster, but much more relaxed."

Not too bad. Maybe it's time for the software to catch up.

Regards Falk

JJ 20 years ago

Over on Tom's Hardware they have been having a good time overclocking a very affordable $130 Pentium D 805 (dual core) from 2.66 GHz to something like 4.1 GHz, bypassing the $1K CPUs from Intel & AMD.

But the speed gain, which varied from so-so to 2x, was achieved at very great cost in extra heat output; the numbers are quite large. Typically the power starts at 95 W but goes up to 200 W or so on the CPU for the extra performance, which needed water cooling, and they also cranked the voltage way up for the last 10% stretch.
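As a rough cross-check of those figures, the usual first-order rule for CMOS dynamic power is P ≈ C·V²·f. The sketch below applies it to the approximate numbers quoted in the post (95 W at 2.66 GHz, about 200 W at 4.1 GHz). It assumes constant switched capacitance and ignores leakage, which grows with voltage and temperature, so the result is an estimate, not a measurement.

```python
import math


def power_at_same_voltage(p0_w, f0_ghz, f1_ghz):
    """Power if only the clock changes: P scales linearly with f."""
    return p0_w * (f1_ghz / f0_ghz)


def implied_voltage_ratio(p0_w, p1_w, f0_ghz, f1_ghz):
    """V1/V0 implied by P ~ C * V^2 * f with C held constant."""
    return math.sqrt((p1_w / p0_w) / (f1_ghz / f0_ghz))


# Approximate figures as quoted in the post.
P0_W, P1_W = 95.0, 200.0
F0_GHZ, F1_GHZ = 2.66, 4.1

freq_only_w = power_at_same_voltage(P0_W, F0_GHZ, F1_GHZ)
v_ratio = implied_voltage_ratio(P0_W, P1_W, F0_GHZ, F1_GHZ)

print(f"frequency alone would give about {freq_only_w:.0f} W")
print(f"the rest implies a supply voltage ratio of about {v_ratio:.2f}")
```

Under these assumptions the clock increase alone accounts for well under 200 W, and the remainder corresponds to a voltage increase of somewhat under 20%, which fits the remark about cranking the voltage up.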

The sweet spot, I think, would be to stay near the 3.6 GHz limit of air cooling with the giant Zalman coolers and near the nominal voltage; the extra prize points are just not worth the hassle.

Now I said all that in 3 paras, they took 45 pages.

Still, my money is on parallel slower CPUs too, but what else would a Transputer person say? All this stuff about 200 x86s on a chip doesn't seem so difficult considering Moore's law applied to density over 25 years. Since almost all software is sequential today and most of the software to exploit 200 cores will have to be rewritten anyway, the holdover of the x86 ISA is really looking plain silly; just what will compatibility mean when old code runs on only 1 of 200 cores?

If you design a cpu for massive scaling with support for fine grain concurrency you are better off with a design that does it well and you get many more than 200 on the same chip. Now if the array is programmed in a language that is 50/50 HDL and traditional Cxx, the difference between Transputer arrays and FPGA arrays is only a matter of granularity, they both execute concurrent processes.

Personally, I think it will be a long time, though, before Intel rediscovers how to make lots of CPUs cooperate the way it was already done 20 years ago.

John Jakson transputer guy FPGAs & Transputers 2 sides of the same coin

Jan Panteltje 20 years ago

On a sunny day (17 May 2006 11:17:54 -0700) it happened "JJ" wrote:

AMD came out with four core today.

Kees van Reeuwijk 20 years ago

Except that there is no way to compile standard software to an FPGA, or even to compile freshly created software in anything near a normal programming language. I hope you don't expect that the bulk of C/C++/Java/C# programmers will learn VHDL or Verilog.

Of course programming a 200-core x86 processor in C/C++/Java/C# is a software engineering nightmare too, but with enough coding discipline there is at least a slim chance that you can get a team of ordinary programmers to produce working software for it.

It is very hard to predict what a viable mainstream architecture will look like in ten years, but unless a lot of work is done to create better compilers for them, it surely isn't going to be an FPGA. I wouldn't bet on the 200-core x86 either, but that's because I'm an optimist.

In an emergency, I would prefer a 10000-core Transputer to either of them, even if it would mean resurrecting Occam, but I hope someone can come up with something more imaginative.

Jim Granville 20 years ago

.. and for the OP worrying about poor DSP speed, this today from Freescale:

formatting link

One of these could allow a design to click-down a V5 size :) On the V5 price scale, this is tolerable....

10.5 MBytes of included RAM solves one big problem, and 4 cores at 1 GHz each sound useful. I think they say (almost as an afterthought) they have two PowerQUICC cores?

-jg

JJ 20 years ago

Yes indeed, I just wonder when the x86 folks are going to say a word about || programming the darn thing. After 2 x86 cores, I suspect most end users are not going to be running enough apps to make any real difference, and the folks that will be able to use bigger n multi cores are not enough to justify the push to much higher n cores.

A Transputer PE uses 1 BlockRAM & 500-odd LUTs/FFs and delivers about 100 32b MIPS using a 300 MHz clock. The largest FPGAs can hold >500 BlockRAMs and perhaps 200 functional PEs, leaving the rest for MMUs; it would get pretty hot, I imagine. But darn, not enough I/Os to make it work.
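The arithmetic behind that estimate can be written out as a back-of-envelope sketch. All inputs are the rough per-PE figures from the post (1 BlockRAM, about 500 LUTs/FFs, about 100 MIPS at 300 MHz, 200 PEs, 500 BlockRAMs taken as the lower bound of ">500"); the function name and the linear scaling are assumptions for illustration, since a real array would lose throughput to communication and memory stalls.

```python
# Rough per-PE figures as quoted in the post.
PE_BLOCKRAMS = 1
PE_LUTS = 500
PE_MIPS = 100
PE_CLOCK_MHZ = 300

NUM_PES = 200
FPGA_BLOCKRAMS = 500  # lower bound of the ">500" mentioned in the post


def array_estimate(num_pes):
    """Resource use and ideal throughput of an array, assuming linear scaling."""
    return {
        "luts": num_pes * PE_LUTS,
        "pe_blockrams": num_pes * PE_BLOCKRAMS,
        "spare_blockrams": FPGA_BLOCKRAMS - num_pes * PE_BLOCKRAMS,
        "aggregate_mips": num_pes * PE_MIPS,
        "instr_per_clock_per_pe": PE_MIPS / PE_CLOCK_MHZ,
    }


estimate = array_estimate(NUM_PES)
for name, value in estimate.items():
    print(f"{name}: {value}")
```

On these numbers each PE retires roughly one instruction every three clocks, and the array's ideal peak comes from PE count rather than clock rate.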

I am glad to see V5 now has 400MHz DDR I/Os for full RLDRAM2 performance, perhaps V6 will keep up with the next bump to 533MHz for RLDRAM3.

Ultimately, both FPGAs and massively parallel CPU core arrays are going to end up in the same place, I/O starved at about the same I/O pin frequencies, with free computing on the inside, either fine, medium or coarse grained. FPGAs already have tools to manage fine grained concurrency, Transputers used to, and the latter who knows when.

John Jakson transputer guy

JJ 20 years ago

I'd say the guy has no clue about massive concurrency except in the x86 sense. I'd take the comparison as a compliment, not an insult.

How about a hybrid of a C++ subset with a Verilog subset, i.e. a process as the object language, where processes with interconnects look both like HDL module instance hierarchies (and can be synthesized to hardware) and also look like C classes with event driven ports and OO methods and data. It would need a runtime not much different from an event wheel simulation engine, some of which could be right in the CPU scheduler if it's hosted on a Transputer.

The real irony is that while the Transputer has been gone for, what, 10 years, a 200 node machine could probably be built in an FPGA right on the edge of thermal and memory issues, but lots of smaller FPGAs give many more I/O pins and better heat spreading.

The other thing I have been saying for some time is that the sequential CPU has a Memory Wall problem, with maybe 1000 clocks per missed cache event, while a modern latency-hiding Transputer can have relatively SRAM-like memory performance using RLDRAM. You replace the Memory Wall with a Thread Wall, which is not really a wall for CSP people.
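The latency-hiding idea can be illustrated with a common textbook simplification: a thread runs for some cycles, then stalls on memory, and other threads fill the stall. The sketch below is not a model of any actual Transputer design. The 1000-clock miss penalty comes from the post, while the 100 cycles of work between misses is a made-up number.

```python
def utilization(threads, run_cycles, stall_cycles):
    """Fraction of cycles doing useful work when threads cover each other's stalls."""
    return min(1.0, threads * run_cycles / (run_cycles + stall_cycles))


def threads_to_hide(run_cycles, stall_cycles):
    """Smallest thread count that keeps the core busy (integer ceiling division)."""
    return -(-(run_cycles + stall_cycles) // run_cycles)


MISS_PENALTY = 1000  # clocks per cache miss, as quoted in the post
RUN_CYCLES = 100     # hypothetical cycles of work between misses

single = utilization(1, RUN_CYCLES, MISS_PENALTY)
needed = threads_to_hide(RUN_CYCLES, MISS_PENALTY)

print(f"one thread: {single:.1%} of cycles are useful")
print(f"threads needed to hide the stall: {needed}")
```

This is the "Thread Wall" trade: the stall is hidden only if the program can supply that many independent threads, which is the natural case for CSP-style code and the hard case for sequential software.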

John Jakson transputer guy

Jan Panteltje 20 years ago

On a sunny day (Thu, 18 May 2006 07:54:53 +1200) it happened Jim Granville wrote:

I am not sure, QUICC I, II, is there a 3? Extra security. The AMD announcement made me think about Sony Cell, if that still is worth it. I have studied the Cell a bit in depth, IBM uses it in servers, but it seems to me AMD is eating parts of the Cell intended market this way....

PowerQUICC is a PowerPC based system? This AMD move may threaten that too. And Intel is dead already. Bit off-topic though (FPGA).

$180 is a bit much for a set-top box processor; ASICs for H.264 decoding exist and are coming on the market.

Where will it go?

Eric Smith 20 years ago

I don't think so. If they had, it would be featured on their web site somewhere.

What they did announce today is the Turion 64 X2 dual core mobile processor.

Yesterday they announced energy efficient desktop processors (single and dual core).

Jan Panteltje 20 years ago

On a sunny day (17 May 2006 14:30:35 -0700) it happened Eric Smith wrote:

formatting link

Run it through a translator; it is in German.
