The next step: A way to produce flexible gallium arsenide wafers in quantity has been found

The next step: A way to produce flexible gallium arsenide wafers in quantity has been found:

formatting link

Intel is considering producing CPUs on the stuff.

Now the THz processor? Multicore is dead ;-)

Reply to
Jan Panteltje


Not till someone figures out how to make P-channel GaAs FETs that are worth anything. Hole mobility in GaAs is pitiful. Building a modern processor out of NMOS would make for rather interesting power dissipation densities--i.e. the whole thing would turn to lava.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal
ElectroOptical Innovations
55 Orchard Rd
Briarcliff Manor NY 10510
845-480-2058
hobbs at electrooptical dot net
http://electrooptical.net
Reply to
Phil Hobbs


Quit recycling these press-release "discoveries"... not 0.1% of them ever amount to anything.

And multicore is surely the future, for reasons beyond device physics. The big reason is that programmers have demonstrated their inability to manage a single core system.

John

Reply to
John Larkin

On a sunny day (Wed, 26 May 2010 07:27:59 -0700) it happened John Larkin wrote in :


Well, it is an important discovery, and 35 % efficient solar cells are big news too. You are free not to read these posts.

Well, you seem unable to specify those reasons. Multicore is a desperate jump to stay ahead of the competition.

As I pointed out before, a normal problem, as in solving something on a Turing-style system, is very difficult to break up into parallel threads. That leaves serial processing. The extra cores, IF they are usable at all, can then be used as signal processors, for example for encoding, decoding, filtering, what not, wherever there is a case of some sort of streaming. But for vector operations there is dedicated hardware that can work better than a general-purpose core, for example graphics cards with very powerful GPUs, and FPGA solutions. Beyond a few cores (Sony's Cell, in the PS3, had 6 usable by programs) programming these becomes a horror, with very little increase in performance as reward.

Exactly 100% wrong: no problem running thousands of little threads or tasks on a single core, as most of the time these things wake up very rarely. There is a trade-off point where adding more tasks no longer makes sense, but I have yet to see the application that reaches that point. Just the fact that *you* cannot do thread programming does not mean it is not done very well every day; I have no problem with it in Linux.
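That claim can be sketched concretely; a minimal Java example (task count and payload are made up for illustration), where a single worker thread, one core's worth of execution, drains thousands of tiny tasks:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// A minimal sketch: one worker thread serially drains thousands of tiny
// tasks. Since each task does almost nothing, a single core keeps up
// without trouble.
public class SingleCoreManyTasks {
    public static void main(String[] args) throws Exception {
        ExecutorService oneCore = Executors.newSingleThreadExecutor();
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 0; i < 5000; i++) {
            tasks.add(() -> 1);  // a tiny unit of work
        }
        int done = 0;
        for (Future<Integer> f : oneCore.invokeAll(tasks)) {
            done += f.get();     // every task completed
        }
        oneCore.shutdown();
        System.out.println(done);  // prints 5000
    }
}
```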

Your viewpoint is well known; I was stirring you a bit with that 'dead multicore' remark, but it is true nonetheless. If I had the choice between a single-core 300 GHz x86-architecture CPU and a 300-core 1 GHz one, I would always choose the 300 GHz. Dividing a few linear programs over 300 cores is not only impossible, it would bring you nothing but bottlenecks if even partly attempted.

Whatever Intel, or for all I care the President of the world, may claim, this is the fact. It is YOU who have been misled by all that marketing. Larrabee flopped as proof of that; Nvidia's dedicated hardware has won. I do not give a shit what any professor hired by a big semi manufacturer spouts about multicore. Do a + b on 100 cores, and we will talk again! LOL. Reality!

Spelsjeker has detected an error, and will now reboot your computer.

Reply to
Jan Panteltje


It would end the 'Use a PIC' posts if Microchip switched to that process. ;-)

--
Anyone wanting to run for any political office in the US should have to
have a DD214, and an honorable discharge.
Reply to
Michael A. Terrell


Easy. Do a 50 GHz, billion-transistor CPU in RTL.

John

Reply to
John Larkin


With a static power of what? What a stupid idea! ...Jim Thompson

--
| James E.Thompson, CTO                            |    mens     |
| Analog Innovations, Inc.                         |     et      |
| Analog/Mixed-Signal ASIC's and Discrete Systems  |    manus    |
| Phoenix, Arizona  85048    Skype: Contacts Only  |             |
| Voice:(480)460-2350  Fax: Available upon request |  Brass Rat  |
| E-mail Icon at http://www.analog-innovations.com |    1962     |
             
      The only thing bipartisan in this country is hypocrisy
Reply to
Jim Thompson


And a fundamental error... how does RTL work with MOS? ...Jim Thompson

Reply to
Jim Thompson


You might be able to do something with a 3D stack, putting GaAs on top of Si. The problem there is that you really need the lower metal layers (fine pitch, short lines) to connect the transistors in a gate. My old colleague John Bowers of UCSB and his group figured out how to get InP on silicon, which would be another approach.

Cheers

Phil Hobbs

Reply to
Phil Hobbs


So far, the defect density of compound semiconductors kind of makes the point moot. I'm still astonished that anybody can run a 100M device silicon IC through scores of process steps and mask layers and get anything to work.

Too bad nobody is making opamps out of pHEMTs. I recall that some people used to make (bad) opamps out of all-NPN transistors.

John

Reply to
John Larkin


There are several reasons for multicore. One is the speed-of-light limitation: signals in lines with repeaters travel at about c/10, so even at 3 GHz you can't have synchronous operation over more than a few millimeters' radius. Another is power dissipation: since standby power is going through the ceiling, due to tunnelling through the gate insulator and the inability to completely turn off a very small FET, you need to be able to turn off major sections of a chip most of the time, and multicore makes that much easier. Another is redundancy: if you make a 100-core chip and advertise it as an 80-core, your yield goes up.

There are other reasons, but those are the biggies.

Cheers

Phil Hobbs

--
Dr Philip C D Hobbs
Principal
ElectroOptical Innovations
55 Orchard Rd
Briarcliff Manor NY 10510
845-480-2058
hobbs at electrooptical dot net
http://electrooptical.net
Reply to
Phil Hobbs

Software for 8-core systems is my job. There are two problems:

1) Very few developers understand multithreading to a useful degree. Usually they know what a semaphore is but they don't know how to manage lock contention. The only fix for this is demanding higher standards.

2) Most tasks CAN be broken up. The problem is that classic threading tools have gobs of overhead, and that limits multithreading to tasks with a low rate of interaction or repetition. Only very recent SDKs have addressed this using lightweight task queues and executors.
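A minimal Java sketch of the contention point in 1): rather than having every thread serialize on one shared counter, give each thread private state and combine once at the end. Class and variable names here are illustrative, not from any particular SDK.

```java
import java.util.concurrent.atomic.AtomicLong;

// Contention sketch: the AtomicLong is hammered by every thread on every
// increment; the per-thread tallies never synchronize until the final
// combine. Both paths produce the same total.
public class Contention {
    static final int THREADS = 4;
    static final int INCREMENTS = 100000;

    public static void main(String[] args) throws InterruptedException {
        final AtomicLong shared = new AtomicLong();  // contended path
        final long[] local = new long[THREADS];      // uncontended path
        Thread[] workers = new Thread[THREADS];
        for (int t = 0; t < THREADS; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < INCREMENTS; i++) {
                    shared.incrementAndGet();  // synchronizes every time
                    local[id]++;               // private, no contention
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        long combined = 0;
        for (long v : local) combined += v;  // combine once, after join()
        System.out.println(shared.get() + " " + combined);  // 400000 400000
    }
}
```

Both totals come out to 400000; the difference is how often the threads have to agree with each other along the way.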

Apple has some good docs on solving this:

formatting link

0903.pdf

Java 1.5 and beyond have a standardized system for lightweight tasks too. The Sun implementation could be better but it's easy enough to swap in a custom one.
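A minimal sketch of that java.util.concurrent framework; the task payload (squaring small integers) is a made-up stand-in for real work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Lightweight tasks via the standard executor framework: submit many
// small Callables to a fixed pool and collect results through Futures.
public class LightweightTasks {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        List<Callable<Integer>> tasks = new ArrayList<>();
        for (int i = 1; i <= 100; i++) {
            final int n = i;
            tasks.add(() -> n * n);  // cheap task; the pool is reused
        }
        int sum = 0;
        for (Future<Integer> f : pool.invokeAll(tasks)) {
            sum += f.get();
        }
        pool.shutdown();
        System.out.println(sum);  // 1^2 + 2^2 + ... + 100^2 = 338350
    }
}
```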

--
I won't see Google Groups replies because I must filter them as spam
Reply to
Kevin McMurtrie

has been found:

formatting link

Perhaps you do not mean "bendable"...

Reply to
Robert Baer

On a sunny day (Thu, 27 May 2010 00:00:40 -0700) it happened Kevin McMurtrie wrote in :

I see it this way: if I have a render session that takes 300 minutes on a 1 GHz CPU (as an example), then, if memory speed keeps pace and all that, it will run in 3 minutes on a 300 GHz one. I had many of those long render sessions running; usually started late at night, ready in the morning.

But if somebody came along with a 300-core 1 GHz machine I would see no advantage. On the contrary: that person would spend the next 300 *days* rewriting all the software, trying to take advantage of those extra cores, perhaps getting a 10x (more likely 3x, or even less) speed improvement. By that time (300 days later) I would have rendered thousands of productions WITHOUT ever having to modify a single program or script, or even recompile. The extra speed would, however, allow me to use much better effects and incredibly nice features, and do it all in real time at full resolution, resulting in a better end result.

Reply to
Jan Panteltje

On a sunny day (Thu, 27 May 2010 01:03:01 -0700) it happened Robert Baer wrote in :


OK, if you like.

Reply to
Jan Panteltje
*Cough*

How long do you figure the original software took to write? 300 days? If they had designed it for 300-core operation from the get-go, they wouldn't have had that problem.

Sounds like a failure of management to me :)

Tim

-- Deep Friar: a very philosophical monk. Website:

formatting link

Reply to
Tim Williams

On a sunny day (Thu, 27 May 2010 06:55:46 -0500) it happened "Tim Williams" wrote in :

You are an idiot; I can hardly decrypt your rant. The original software was written when there WERE no multicores. And I wrote large parts of it, AND it cannot be split up into more than, say, 6 threads if you wanted to. But OK, I guess somebody could use a core for each pixel, plus do single-instruction multiple-data perhaps. Will it be YOU who writes all that? Intel has a job for you!

64-bit x86 is still around, and one of the reasons AMD was successful with it is that it would run EXISTING code; not all of it, though even a recompile was easy. But multicore is a totally different beast.

I'd love to see a 300 GHz gallium arsenide x86 :-) I would buy one.

Reply to
Jan Panteltje

Strange, it's perfect American English.

Sounds like you aren't trying hard enough. Design constraints chosen early on, like algorithmic methods, can severely impact the final solution. Drawing pixels, sure, put a core on each and let them chug. Embarrassingly parallel applications are trivial to split up. If there's some higher level structure to it that prevents multicore execution, that would be the thing to look at.

And yes, it may mean rewriting the whole damn program. Which was my point: it may be necessary to reinvent the entire program in order to accommodate new design constraints as early as possible.

GPUs have been doing it for decades.

No. SIMD is an instruction thing, not a core thing. Not at all parallel. SIMD tends to be cache limited, like any other instruction set, running on any other core. The only way to beat the bottleneck is with more cores running on more cache lines.

FWIW, as maybe you've noticed here before, I've done a bit of x86 assembly. I'm not unfamiliar with some of the enhanced instructions that have been added, like SIMD. However, I've never used anything newer than 8086 directly in assembly.

More and more, especially with APIs and objects and abstraction and RAM-limited bandwidth, assembly is less and less practical. Only compiler designers really need to know about it. The days of hand-assembled inner loops went away some time after 2000.

If Cray were still around, I bet they would actually be crazy enough to make John's GaAs RTL monster.

Tim

--
Deep Friar: a very philosophical monk.
Website: http://webpages.charter.net/dawill/tmoranwms
Reply to
Tim Williams

On a sunny day (Thu, 27 May 2010 12:00:04 -0500) it happened "Tim Williams" wrote in :

Let's see, let's use something simple, right? An H.264 encoder. Now, in some simple words, how would you split that up over 300 cores? I think you have no clue what you are on about. Honestly, have you ever written a video stream manipulation tool? Perhaps like this:

formatting link
And that then is a 'filter' (I do not like the word filter, as filters 'remove' something and this adds something) that has to fit the API of yet another stream processor (transcode, in this case), which has to handle about every codec available. You can split things up a bit, maybe over 5 or 6 cores, but after that splitting becomes next to impossible. What does help is dedicated hardware for things like this, say using a graphics card GPU for encoding. Multicore? 300? Forget it.

Blabber. Write it, publish it if you have the guts.

Insanity tries to invent the impossible.

Many have gone before you, and none are remembered. It is like a plane with half a wing missing: great design, but it will NOT fly.

Actually it *is* parallel. I have some nice code here that does decryption that way, but it is only possible if you need to do the same operation on everything. What happens is that in one, say 32-bit-wide, instruction you do a logical operation on 32 single-bit streams.
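That bit-sliced trick can be sketched in a few lines of Java; the data and key words here are made-up values, not from any real cipher.

```java
// One 32-bit XOR applies the same one-bit logical operation to 32
// independent bit streams at once; XOR-ing with the key again undoes it.
public class BitSlice {
    public static void main(String[] args) {
        int data = 0xAC35AC35;       // 32 one-bit "streams", made up
        int key  = 0x66666666;       // made-up key word
        int enc  = data ^ key;       // 32 single-bit XORs in one op
        int dec  = enc ^ key;        // same operation inverts it
        System.out.println(dec == data);  // prints true
    }
}
```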

Dunno what you are saying here. The whole thing is limited by cache: if you can execute in cache without reloading from memory, it will be faster. Of course, moving data from one core to another via whatever infrastructure you have is a big bottleneck. Many architectures exist; none is perfect.

SIMD is not an instruction, it is a way of processing, although there are specific instruction sets, like MMX in x86, that use it.

Well, you should really try the difference between ffmpeg with asm optimisation enabled (the default) and compiled C-only (for processors where the time-critical parts written in asm are not available). It is shocking: all of a sudden it hogs the system. I have tried compiling it both ways on x86, and VERY quickly went back to the asm version. FYI, ffmpeg is sort of the Linux Swiss codec knife.

If these wafers work, maybe it could be done. I dunno much about semiconductor manufacturing, so maybe it needs totally different processes, and at that speed, for sure, VERY short connections. But with GaAs some expertise exists; maybe they can easily do optical on-chip too?

Reply to
Jan Panteltje

Well there's your problem: using streams. So not only is your problem an early design constraint, it's a fundamental construct of your operating system. Pipes and streams, such ludicrosity.

Now, if you download the whole damn file and work on it with random access, you can split it into 300, 3000, however many pieces you want, as fine as the block level, maybe even frame by frame.

It's my understanding that most video formats have frame-to-frame coherence, with a total refresh every couple of seconds maybe (hence those awful, awful videos where the error builds up and not a damn thing looks right, then WHAM, in comes a refresh and everything looks OK again, for a while). So the most you could reasonably work with, in that case, is a block. Still, if there's a block every 2 seconds, a two-hour movie has some 3,600 blocks, and you can assign each block to a separate core. That movie might be ~4 GB, which might transfer in a few seconds over the computer's bus. More than likely, it would take longer to send the data to all the cores than it would take for all of them to do their computations.
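The block-splitting described above can be sketched in Java; the per-block "encode" here is a dummy checksum standing in for real per-block codec work, and all the sizes are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Split the input into independent chunks (standing in for GOP-sized
// video blocks) and hand each chunk to the pool; each block needs no
// data from any other block.
public class ChunkedWork {
    static int encode(int[] chunk) {   // placeholder per-block work
        int sum = 0;
        for (int v : chunk) sum += v;
        return sum;
    }

    public static void main(String[] args) throws Exception {
        int[] input = new int[1000];
        for (int i = 0; i < input.length; i++) input[i] = i;
        int chunkSize = 100;
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Callable<Integer>> jobs = new ArrayList<>();
        for (int start = 0; start < input.length; start += chunkSize) {
            final int[] chunk = Arrays.copyOfRange(input, start,
                    Math.min(start + chunkSize, input.length));
            jobs.add(() -> encode(chunk));  // each block is independent
        }
        int total = 0;
        for (Future<Integer> f : pool.invokeAll(jobs)) total += f.get();
        pool.shutdown();
        System.out.println(total);  // 0 + 1 + ... + 999 = 499500
    }
}
```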

Obviously, there is little point in >1k cores in such a bandwidth-limited application, which only takes a few minutes on ordinary systems anyway; so obviously, you use such systems to solve much more complex problems, like quantum mechanics. Folding@home work units range in size from a few megs to hundreds, and they all take about the same processing time (a day or two on modern processors). Such activities are clearly not storage-bandwidth-limited and would gain a lot from a multicore approach. Which is, after all, how they are implemented, from the ground up.

Tim

--
Deep Friar: a very philosophical monk.
Website: http://webpages.charter.net/dawill/tmoranwms
Reply to
Tim Williams
