50 cores


So they finally figured out what to do with Larrabee.

--
Muzaffer Kal

DSPIA INC.
ASIC/FPGA Design Services

http://www.dspia.com
Reply to
Muzaffer Kal

Eh? A little blather concerning "a new class of chips" and a hint that it might not be related to x86 technology... where does the 50 cores come in? And what is this "new" technology? Programmed in pure Mandarin?

Reply to
Robert Baer

Two gigabuck-level Intel mistakes: Launching the Itanic and dumping ARM.

John

Reply to
John Larkin

A version of it that doesn't require accepting spam from WSJ

Strangely there is nothing about the 32-core chip on Intel's own website that I can see. It looks like a no-news press release that has been widely reported but is mostly fluff with no substance. Odd, too, that if the chips contain 32 cores, it must be a dimer with only 50 cores used.

Regards, Martin Brown

Reply to
Martin Brown

nVidia - 512 cores. I suspect that 512 simple cores will out-compute 50 complex cores.

--
Dirk

http://www.transcendence.me.uk/ - Transcendence UK
http://www.blogtalkradio.com/onetribe - Occult Talk Show
Reply to
Dirk Bruere at NeoPax


For good statistics and historical data try here:

formatting link

Reply to
JosephKK

50 seems an odd number. I would expect a power of 2 or a power of 3 number of cores.

The power of 2 number is just because things tend to be doubled and doubled etc.

The power of 3 number is because if you imagine a hypercube-like arrangement where each side is a bus for communication directly between cores, it makes sense to have 3 processors on a bus, because while A and B are talking, C can't be having a conversation with either. This would allow the array of cores to get information quickly between themselves. It assumes that they each have a cache that the transfers work to keep in sync.

At some point, adding more of the same cores stops working as well as adding some special purpose hardware to a fraction of the cores.

Not every core needs to be able to do floating point at all. Some would be able to profit from a complex-number ALU or perhaps a 3D ALU.

Chances are, one core would get stuck with the disk I/O etc.; that core would profit from having fast interrupt times. The others less so.

Reply to
MooseFET

Maybe they did 64 and only got 50 to work?

Eventually we'll have a CPU as every device driver, and a CPU for every program thread, with real execution protection. No more buffer overflow exploits, no more crashed OSs, no more memory leaks.

John

Reply to
John Larkin

Instead we will have races, deadlocks, data coherency issues, state save/restore problems, unpredictable arbitration, and version hell. Thanks, but no thanks. Development for a system with one core is a heck of a lot simpler.
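One of the hazards listed above, deadlock, has a classic cure worth sketching: acquire shared resources in a single global order so two cores can never hold each other's lock. A minimal illustration (the `Account`/`transfer` names are invented for the example; ordering by `id()` stands in for any fixed global ordering):

```python
import threading

class Account:
    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

def transfer(src, dst, amount):
    # Always take the two locks in one fixed global order. If each
    # thread locked "its own" account first, two opposing transfers
    # could each hold one lock and wait forever for the other.
    first, second = sorted((src, dst), key=id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount

a, b = Account(1000), Account(1000)
t1 = threading.Thread(target=lambda: [transfer(a, b, 1) for _ in range(10000)])
t2 = threading.Thread(target=lambda: [transfer(b, a, 1) for _ in range(10000)])
t1.start(); t2.start(); t1.join(); t2.join()

total = a.balance + b.balance  # money is conserved: 2000
```

The point of the poster's complaint stands: the discipline has to be followed by every thread in the system, and nothing in the hardware enforces it.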

Vladimir Vassilevsky DSP and Mixed Signal Design Consultant

formatting link

Reply to
Vladimir Vassilevsky

That would be different somehow?

John

Reply to
John Larkin

Every new core or master on the bus adds a dimension to the problems.

VLV

Reply to
Vladimir Vassilevsky

From what limited technical info has leaked out, it seems the chips are 32-core, so heaven knows where this 50-core number comes from. Perhaps a pair of the prototype 32-core chips have 50 working cores between them.

The whole thing appears to be PR fluff for Wall Street and investors to drool over - there is very little about it on their website.

And replace them with horrendous memory contention and cache coherency problems - great!

Mickeysoft can barely cope with programming on 4 cores. I see plenty of *interesting* race condition faults in XL2007. The easiest to provoke is programmatically drawing moderately large graphs and altering the axes. It is quite easy for the code that modifies the axis to run before the axis object has been instantiated on a multicore box.
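The race described here, a modifier running before the object it modifies exists, has a standard fix: make the consumer wait on an explicit ready signal instead of assuming instantiation has finished. A small sketch (the `chart`/`draw_graph`/`set_axis_range` names are invented for illustration, not Excel's actual API):

```python
import threading, time

chart = {"axis": None}
axis_ready = threading.Event()

def draw_graph():
    # Simulate slow instantiation of the axis object on another core.
    time.sleep(0.05)
    chart["axis"] = {"min": 0, "max": 100}
    axis_ready.set()

def set_axis_range(lo, hi):
    # Without this wait, the code that modifies the axis could run
    # before the axis object exists -- exactly the XL2007 race above.
    axis_ready.wait()
    chart["axis"]["min"], chart["axis"]["max"] = lo, hi

t = threading.Thread(target=draw_graph)
t.start()
set_axis_range(-10, 10)
t.join()
```

On a single core the sleep usually hides the bug; on a multicore box the two paths genuinely overlap, which is why the fault only shows up there.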

Regards, Martin Brown

Reply to
Martin Brown

The trend seems to be to have a bunch of cores around a shared L2 cache. The usual hardware cache coherency stuff can be done, easier than in lots of other situations. Add a pile of bulletproof hardware semaphores. All single-clock synchronous logic. Paradise.
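A hardware semaphore of the kind proposed here is essentially an atomic test-and-set register. A toy model in software (the internal `Lock` merely stands in for the single-clock atomicity the hardware would guarantee; the class and names are invented for the sketch):

```python
import threading

class HardwareSemaphore:
    """Models an atomic test-and-set register. The internal Lock is a
    stand-in for the atomicity a hardware semaphore would provide."""
    def __init__(self):
        self._flag = False
        self._atomic = threading.Lock()

    def test_and_set(self):
        # Atomically read the flag and set it; return the old value.
        with self._atomic:
            was = self._flag
            self._flag = True
            return was

    def clear(self):
        self._flag = False

sem = HardwareSemaphore()
counter = 0

def worker(n):
    global counter
    for _ in range(n):
        while sem.test_and_set():   # spin until we win the semaphore
            pass
        counter += 1                # critical section
        sem.clear()

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
```

Only the thread that reads back `False` enters the critical section, so the four workers' increments never collide.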

It's obvious by now that the software boys are never going to get it right. It's time for multitasking to be managed by hardware.

John

Reply to
John Larkin

Synchronous logic. Hardware semaphores. Absolute hardware protections. OS that shares nothing with apps. Bulletproof.

That's because they are trying to run Windows on more cores, and worse, often trying to distribute one computational problem among multiple cores. That is insane.

John

Reply to
John Larkin

Absolute hardware protection can be done on one CPU with a segmented architecture and a viciously defensive TLB. Even better if you use a Harvard architecture, which for obvious reasons prevents data execution.

If your multi-CPUs share a common flat address space as is currently in vogue any protection your separate physical cores offer is largely illusory. You would be better off with virtual CPUs and a tiny hypervisor with slightly paranoid behaviour watching over them.

Hardware contention is trivial for two CPUs, requires slight thought for three (but is still trivial), and can go wrong for four. Contention issues for N CPUs scale as N(N-1); this becomes non-trivial for N > 4 - at least if you want to gain some performance from adding the extra CPUs.

Distributing some computational problems across multiple cores is the only way to get them done fast enough. That is how just about all the ray tracing engines in fancy graphics cards do it. SIMD.
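The SIMD-style decomposition mentioned here, the same operation applied to different slices of the data, is the easy case of multicore programming precisely because the slices don't interact. A minimal sketch using a thread pool (strictly this is SPMD rather than lockstep hardware SIMD; `parallel_dot` and its parameters are invented for the example):

```python
from concurrent.futures import ThreadPoolExecutor

def dot_chunk(args):
    a, b, lo, hi = args
    return sum(a[i] * b[i] for i in range(lo, hi))

def parallel_dot(a, b, cores=4):
    # Split the index range into one contiguous slice per core; each
    # worker runs the same code on its own slice, then the partial
    # sums are combined. No slice touches another slice's data.
    n = len(a)
    bounds = [(a, b, i * n // cores, (i + 1) * n // cores)
              for i in range(cores)]
    with ThreadPoolExecutor(max_workers=cores) as pool:
        return sum(pool.map(dot_chunk, bounds))

result = parallel_dot(list(range(1000)), [2] * 1000)  # 2 * (0+1+...+999)
```

The hard problems the following paragraph alludes to appear only when the pieces of the split-up task are not independent like this.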

Where it gets tricky is when you split up a complex task and do not understand what you are doing. All too common in software these days and also pretty common in hardware.

Regards, Martin Brown

Reply to
Martin Brown

I was thinking that each would have an MMU (a real MMU, with serious privilege categories, not an Intel toy) that was controlled by the OS CPU, not by the local one. DRAM is cheap, so dump virtual memory and make the world a better place.

Let a video card do that. PCs don't need speed all that much anymore. My major speed problem is that Windows has so much overhead, and everything slows down by *seconds* whenever Windows gets a little confused. Multiple CPUs would fix that and, in real life, be a lot faster. Ordinary people don't do computational fluid dynamics sort of stuff.

John

Reply to
John Larkin

If you are sharing the same RAM chips between multiple cores, you are still going to end up with a single (physical) address space.

Execution prevention, as well as read-only data pages, was done by TLBs in mid-1970s minicomputers, so this is not really anything new.

Of course, in a multi-core system each core must have its own TLB and a trusted method to set up these TLBs.

Having separate TLBs for each core is not so bad, since even now, some architectures have the TaskId as part of the virtual address, thus, a full TLB reload is not required during task switching.
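The TaskId (ASID) idea above can be shown with a toy lookup table: tag every TLB entry with the owning task, so entries for several tasks coexist and a task switch needs no flush. A sketch (the `Tlb` class and field names are invented for illustration):

```python
class Tlb:
    """Toy TLB keyed by (task_id, virtual_page). Because the task id
    is part of the lookup key, translations for several tasks live in
    the TLB at once and a task switch requires no full reload."""
    def __init__(self):
        self.entries = {}

    def map(self, task_id, vpage, pframe):
        self.entries[(task_id, vpage)] = pframe

    def translate(self, task_id, vpage):
        # Returns the physical frame, or None to signal a TLB miss.
        return self.entries.get((task_id, vpage))

tlb = Tlb()
tlb.map(task_id=1, vpage=0x40, pframe=0x1234)
tlb.map(task_id=2, vpage=0x40, pframe=0x9ABC)  # same vpage, other task
```

Note the same virtual page translates differently per task, and a task never hits an entry tagged for another task, which is the protection point being made.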

Reply to
Paul Keinanen

Multiple cores will be able to do all of those things and more. There will be a large shared memory space to allow great gobs of data to be handed back and forth. This will be where one CPU can step on the output of another as it is being handed off to the 3rd and 4th. When running the multi-core version of Windows-9, there will still be crashes and the computer will still be just fast enough to run Freecell.

Thinking about doing something like a sort on a multicore machine with caches on each core has started me thinking about a bit of code I wrote a long time ago. It sorted files up in the megabyte size range when RAM was restricted to 48K of free space. The trick to making it go fast is to sort chunks that will fit into memory and then do a merge operation on the sorted chunks. I nested the merge operation partly within the sort to save one level of read-process-write.
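The chunk-then-merge scheme described here is classic external sorting, and the same shape maps onto per-core caches: each core sorts a cache-sized run, then the runs are k-way merged. A compact sketch (`external_sort` and `chunk_size` are names invented for the example; `heapq.merge` does the k-way merge):

```python
import heapq

def external_sort(records, chunk_size):
    """Sort data too big for working memory: sort chunk_size-record
    runs individually (each run fits in RAM / a core's cache), then
    do one k-way merge of the sorted runs."""
    runs = [sorted(records[i:i + chunk_size])
            for i in range(0, len(records), chunk_size)]
    return list(heapq.merge(*runs))

data = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0]
out = external_sort(data, chunk_size=3)  # four runs of <= 3, merged
```

Nesting the merge inside the sort, as the poster describes, saves writing the runs out and reading them back in, one whole pass over the data.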

Reply to
MooseFET

Right. And if you dump virtual addressing, you don't need a gigantic number of mapping registers.

John

Reply to
John Larkin

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.