a dozen CPUs on a chip

I don't know why you think I am a Windows fan. We are for the time being stuck with a situation where backwards compatibility with old Intel x86 code and Doze (and before that with 8080 code) has determined the future because the alternatives were generally too painful to contemplate.

The desktop PC industry is stuck in that legacy straitjacket. Games machines and home entertainment devices are less constrained.

The Z8000 sank without trace despite Olivetti adopting it. TI's 99k had a similar fate (I think), or did military usage keep it in production?

Computing hardware is now pretty good. Would that software design and development processes were anything like as robust as hardware ones.

I would like to see thread-level hardware support appear in CPUs, but I see no reason why the 4-ring process privilege structure that served the VAX so well should be inadequate for the humble desktop PC. A multitasking OS can be robust if it is designed correctly with that objective.

Your attempt at irony falls flat.

Regards, Martin Brown

Reply to
Martin Brown

In most systems most tasks are idle - but occasionally you will need to do something that takes a *lot* of CPU power. E.g. media transcoding, recompiling the Linux kernel, FPGA synthesis, video editing, ray tracing.

Your "lots of little cores" scheme is not much use in this scenario, since all the near-idle tasks could have been done in a single multitasking core anyway, and the (likely single) heavy duty task runs

*really slowly* on one of your little cores.

What is actually needed is a single core that is just about as fast as possible. Only when it runs up against practical limits is it then worth going to 2-core, 4-core etc. And you can then rewrite the "heavy duty task" to split itself over multiple cores. But the most efficient scheme would still be to have all the lightweight tasks on one of the cores, and the single heavy duty program spread over the remainder.
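
A minimal C sketch of that split, assuming a Linux box with the GNU affinity calls (the core numbers and the thread body are placeholders, not anything from the posts): the lightweight housekeeping stays on core 0, the one heavy job gets the rest.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <unistd.h>

static void *heavy_task(void *arg)
{
    (void)arg;
    /* transcoding, synthesis, ray tracing ... the one big job */
    return NULL;
}

int main(void)
{
    /* Keep this process (and its many near-idle tasks) on core 0. */
    cpu_set_t light;
    CPU_ZERO(&light);
    CPU_SET(0, &light);
    sched_setaffinity(0, sizeof(light), &light);

    /* Let the single heavy job use all the remaining cores. */
    long ncpu = sysconf(_SC_NPROCESSORS_ONLN);
    cpu_set_t rest;
    CPU_ZERO(&rest);
    for (long c = 1; c < ncpu; c++)
        CPU_SET((int)c, &rest);

    pthread_t t;
    pthread_create(&t, NULL, heavy_task, NULL);
    pthread_setaffinity_np(t, sizeof(rest), &rest);
    pthread_join(t, NULL);
    return 0;
}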

And this is what we have.

--

John Devereux
Reply to
John Devereux


Three, actually. One for the 6800, one for the PDP-11, one for the LSI-11. All were pre-emptive multitaskers with rather general I/O architectures. The worst was the 6800: it only had two 8-bit accumulators and couldn't even push/pop the index register. I wrote that one in longhand, in two weeks, in Juneau, Alaska, and mailed it back to the folks back home, in sections, so they could type it in and run it. It had one bug.

We did a color graphics display system for the PDP-11 version, back when color graphics was a novelty, to display stuff sloshing through pipelines (real pipelines, with valves and pumps.)

Nowadays my embedded products just use simple state machines; haven't really needed an RTOS in years. We may do an embedded Linux thing soon, because we want to embed a mini-ITX processor board and do a couple of gigabit Ethernet links. Two processors could be interesting here.

As I said, transistors are free.


Nice song, but he died of a heroin overdose, like so many other artistic types.

We are having a fierce heat wave. It's 65F already, barely past noon.

John

Reply to
John Larkin

Of course this is what we have. But what will we have 10 or 20 years out?

The GHz race is slowing down to a standstill; everybody is going to more cores to get more MIPS per chip. The new PR race will be for number of cores, not GHz.

FTTH is coming; soon a good hunk of the population will have gigabits per second pouring into their houses.

Nanometer geometries are happening, but still with UV lithography. So yields are going to suffer, and yields on a single-core CPU with a few billion transistors won't be great.

Heat sinks probably won't get much better.

So, things will stay the same?

John

Reply to
John Larkin

On May 14, 12:27 pm, John Larkin wrote: [....]


I'll jump in here to say that I've written one for the 8051 and one for a PDP-8.

The PDP-8 one was many years back. The PDP-8 doesn't know what a push or pop or even a call or return is. This meant a lot of strange rules had to be applied to the tasks.

[....]

It is really too bad that the small PICs don't have some way to have memory accessed from the outside via a port or some sort of DMA. There have been several times I have thought that it would be nice to unload the keyboard scanning, LCD controlling, and ADC rattling from the main CPU in something.

Reply to
MooseFET


The 8 was sort of a dog; it probably ruined the minds of a generation of programmers. Rick Merrill wrote the FOCAL language interpreter for it, a complete editor, user program buffer, runtime system, floating point, formatted print, all that in 4k 12-bit words. He synthesized a lot of "modern" stack constructs to work around the 8's klunkiness. I've met Rick, very nice guy.

What you need is multiple processors!

John

Reply to
John Larkin

He also gets a very nasty I/O contention bottleneck problem thrown in for free. If the prime objective is to waste as much power and silicon as possible with no useful payback, then it would be a near-perfect design!

In this particular case, for desktop PCs we have reached a point where the available hardware is more than adequate for most people's needs. Word processing and basic image manipulation don't really tax modern kit - even video editing is easy on a PC now.

It can run HDTV or games at insane frame rates - we may see more 3D-capable display hardware for gamers, which will double the bandwidth requirement, and that is probably the endgame - at least until some killer consumer app requiring an order of magnitude more CPU comes along.

In case you haven't noticed, PCs have now become mass consumer items, sold just like TVs, fridges and ovens. And they do not have to obey Moore's law (even if bleeding-edge kit continues to get faster).

Ever more memory is guaranteed until 4GB is standard, and there may then be a clean break when 64-bit OSes come of age, but the hardware is now good enough to satisfy most users' needs as it is.

So far there are no killer apps that force a migration to 64-bit OSes. I am tempted, for running some chess puzzles - having 64-bit registers makes for very fast 8x8 bitboard implementations.
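
For illustration only (the post gives no code), this is the kind of thing meant by an 8x8 bitboard: the whole board lives in one 64-bit word, so move generation becomes a handful of shifts and masks instead of loops over ranks and files.

#include <stdint.h>

#define FILE_A 0x0101010101010101ULL   /* all squares on the a-file */
#define FILE_H 0x8080808080808080ULL   /* all squares on the h-file */

/* Squares attacked by a king on square sq (0..63, a1 = 0). */
static uint64_t king_attacks(int sq)
{
    uint64_t k     = 1ULL << sq;
    uint64_t left  = (k >> 1) & ~FILE_H;   /* step left, no board wrap  */
    uint64_t right = (k << 1) & ~FILE_A;   /* step right, no board wrap */
    uint64_t row   = k | left | right;
    return (row | (row << 8) | (row >> 8)) & ~k;   /* add the ranks above and below */
}

On a 32-bit machine each of those 64-bit operations splits into register pairs, which is roughly why native 64-bit registers make such implementations fast.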

I doubt that. Having more than 4 cores for a general purpose mainframe has been the downfall of major players in the past. I expect 4 or maybe at a pinch 8 cores to be the practical limit in normal domestic PCs. (and don't be an early adopter of N>4 unless you enjoy lots of strife)

Unless you have software that is designed to run efficiently on multicore CPUs you just end up with very expensive wasted resources.

If there is a new PR race I am more inclined to think it will be along the lines of effective MIPS/W, with CPUs that shut down or throttle back when demand is low rather than galloping through the idle task at 3G instructions per second. Portable devices capable of lasting many days on a single recharge while streaming video, for instance. ARM already have some potent offerings in this market segment.

But what will they do with it? You can quickly fill a TB disk, but you can only watch HDTV at realtime human compatible speeds.

Single core has reached its limits for now. Dual core and quad core will certainly improve things for those of us that hammer PCs and gamers (but the latter also need exotic dedicated display coprocessor hardware).

They probably will not evolve so rapidly as in the recent past. Memory getting cheaper means it will soon be possible to populate the entire 4GB address space of 32-bit Doze (minus gaps). Disks the same. We will see faster solid-state magnetic and flash drives, and maybe other memory technologies make it to market that are still toys in the lab.

I don't think things will stay the same, but I don't think your idea of a processor for every task is even remotely where things are headed.

Regards, Martin Brown

Reply to
Martin Brown


No, I agree there will be a move to more cores.

You seemed to be saying that the software complexity / reliability problem could be solved by putting every process on a separate core. I don't see the number of cores as being relevant to this. Current designs already have available mechanisms for isolating processes as much as desired. And there will always be a need for processes to communicate, so the problems of synchronisation, deadlocks and resource contention remain whether or not you have a process-per-core.
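
A toy C illustration of that point (mine, not Devereux's): the classic lock-ordering deadlock, which behaves exactly the same whether the two tasks share one core or each own a core of their own.

#include <pthread.h>

static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

static void *task1(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_a);
    pthread_mutex_lock(&lock_b);   /* blocks forever if task2 holds lock_b */
    pthread_mutex_unlock(&lock_b);
    pthread_mutex_unlock(&lock_a);
    return NULL;
}

static void *task2(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock_b);
    pthread_mutex_lock(&lock_a);   /* blocks forever if task1 holds lock_a */
    pthread_mutex_unlock(&lock_a);
    pthread_mutex_unlock(&lock_b);
    return NULL;
}

int main(void)
{
    pthread_t a, b;
    pthread_create(&a, NULL, task1, NULL);
    pthread_create(&b, NULL, task2, NULL);
    pthread_join(a, NULL);   /* with unlucky timing this never returns */
    pthread_join(b, NULL);
    return 0;
}

The cure is a software convention (always take lock_a before lock_b); extra cores or hardware isolation do not supply it for free.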

Essentially, a massively multicore chip will be under-utilised if the software only runs one task per core - unless the cores are so underpowered that they cannot run the few computationally intensive tasks well.

--

John Devereux
Reply to
John Devereux


The 8 used no more transistors than absolutely needed. It was, for quite a while, the most common computer in the world. I always use the CDP1802 as the low end for my comparisons, so I say that the PDP-8 was a nice machine.

[....]

In a lot of cases a fast serial bus linking multiple chips would be good. The keyboard has a lot of leads on it. It would be nice to reduce it to only two signals.

Reply to
MooseFET


"As desired" is the key phrase. Most of the industry seems to think that means "pretty well, usually, it only crashes once a week or so". I want it to mean "absolutely, it can't lock up." Programmers need to be protected against themselves, ans the only real protection is hardware.

And there will always be a need for processes to communicate, so the problems of synchronisation, deadlocks and resource contention remain.

So *design* all that into the hardware. There's certainly less resource contention if there are lots of CPUs available, each with some local memory and code cache.

Nanometer transistors are fast and free. Quit worrying about keeping them busy so we can get past using 50-year-old concepts in a multitask, gigabit world. The majority of guys here are adamant that things will never change, a pretty radical position for engineers to take. Newsgroups seem to attract that type.

At this moment, my PC is running 389 threads, on one CPU.

John

Reply to
John Larkin

Actually they are not; those 80-core chips will be difficult to make (yield). Look at all the stuff IBM had to throw away with Cell; that is why the Sony PS3 has one peripheral processor less enabled in the Cell. The 100% chips were too expensive.

Unix has indeed a very long history, and has pretty much matured. The techniques work.

mm you keep sticking that in everybody's mouth, but when I asked how you would spread a monolithic, resource-sucking application over 'n' CPUs you remained silent. And that is one issue. The other one you conveniently forget is that, if each core has its own memory, there is the overhead of moving data... sync, etc.

You _STILL_ do not see what is happening. Let me try to explain, and here I will use the program 'top', something you will get familiar with soon if you are going to use Linux.

'top' - just type it in a terminal (xterm, rxvt) - is you, on top of the world, looking at what happens on your system. Let's do it:

Tasks: 103 total,   4 running,  99 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.7% us,  6.3% sy, 40.5% ni, 49.2% id,  0.7% wa,  0.3% hi,  1.3% si
Mem:    385964k total,   380468k used,     5496k free,    53952k buffers
Swap:   499992k total,      536k used,   499456k free,   147596k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
22033 root      35  19 13684 9304 1504 R 21.5  2.4  12:16.19 mcamip
22034 root      35  19 24424  11m 1392 S 20.2  3.1  11:11.65 ffmpeg
 3137 root      14  -1 77396  25m 2904 S  1.7  6.8  26:12.98 X
 2940 root      18   0  3052 1468  980 S  0.3  0.4   0:42.65 sh
11984 root      15   0  2228 1120  832 R  0.3  0.3   0:00.08 top
31154 root      20   5 43864 2432 1836 R  0.3  0.6   0:04.40 xdipo
    1 root      15   0  1908  648  552 S  0.0  0.2   0:00.58 init
    2 root      34  19     0    0    0 S  0.0  0.0   0:00.03 ksoftirqd/0
    3 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    4 root      10  -5     0    0    0 S  0.0  0.0   0:00.11 events/0
    5 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 khelper
    6 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
   29 root      10  -5     0    0    0 S  0.0  0.0   0:00.04 kblockd/0
   30 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kacpid
   93 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kseriod
  114 root      15   0     0    0    0 S  0.0  0.0   0:01.94 pdflush
  115 root      10  -5     0    0    0 S  0.0  0.0   0:01.35 kswapd0
  116 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
  117 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 jfsIO
  118 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 jfsCommit
  119 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 jfsSync
  120 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 xfslogd/0
  121 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 xfsdatad/0
  288 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 kpsmoused
  292 root      10  -5     0    0    0 S  0.0  0.0   0:00.03 reiserfs/0
  367 root      20  -4  2092  844  340 S  0.0  0.2   0:01.23 udevd

Now I stopped the cut and paste here because, as you can see, all the lower processes use ZERO CPU cycles (so they sleep); only mcamip and the ffmpeg h264 encoder use some CPU cycles. So would it make sense to assign all those other 102 processes each to a separate CPU core? NO, because it would only create more overhead: the different cores would use their own memory, and now you have to move data all the time between cores, while doing it with task switching uses the same memory.

Even the most demanding multimedia (and somebody wrote today that with multimedia the PC box is like a dishwasher, just a consumer application) would at most use 4 or 5 demanding CPU-intensive threads... The rest can and should be done in one core, task switching. This is actually what Sony does in the PS3.

And in cases where you may think "Hey, I will run a webserver and everybody who connects gets his own CPU (Apache thread)", well, with 80 cores you would very soon get the message: no more cores available, try later :-) So you need the task switcher anyway.

So, as you intend to go and use Linux, the following commands you will need to type and understand: top, ps, netstat, route, ifconfig, vmstat.

Feel free to ask me about it, there are actually thousands more.

Reply to
panteltje

The whole reason for multicores is that single processor performance has plateaued. You can't run a much higher clock speed or your chip will turn to lava due to the high Vdd required. Transistor scaling has stopped--as the litho dimensions shrink, you get lots more, __slower_and_leakier__ transistors. Sub-65 nm transistors stink.

Going to higher and higher uniprocessor complexity (deeper pipelines, more speculative execution, and so on) has bogged down because at some point, you have to know so much in advance about what code path your program will take that there's no point in running it.

So uniprocessor performance isn't getting much better with time, and that isn't going to change soon. What else is there to do but build multicores? You can get amazing performance on some things with a big SMP, and if they aren't being used, turn 'em off to save power.

As far as dedicating a core for this and a core for that, last time I looked my desktop machine had over 1100 threads running, on a dual-processor Xeon box with XP. Multicores will be multitasking for awhile.

Also don't forget the old saw that "Intel giveth and Microsoft taketh away." I have every confidence that MS will find a way to make a 64-core machine feel like an 8088.

Cheers,

Phil Hobbs

Reply to
Phil Hobbs

Famous quotes from Bill Gates:

Nobody needs more than 640K

The Internet isn't going anywhere

Famous quote from Ken Olsen:

Nobody needs a computer at home.

So, speculate.

John

Reply to
John Larkin

If you want 250 cores, build 300 and use the 250 that work. So a chip can have 50 defects and you can still sell it. Or build one giant CPU on the same silicon and toss it if it has a single defect.
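
A rough back-of-envelope in C, with a made-up 90% per-core survival rate (the figures are assumptions, not anything claimed in the post), just to show why the redundancy argument works.

#include <math.h>
#include <stdio.h>

/* Probability that at least 'need' of 'built' cores work, if each core
 * independently survives fabrication with probability p. */
static double at_least(int built, int need, double p)
{
    double sum = 0.0;
    for (int k = need; k <= built; k++) {
        double logc = lgamma(built + 1.0) - lgamma(k + 1.0)
                    - lgamma(built - k + 1.0);          /* log C(built, k) */
        sum += exp(logc + k * log(p) + (built - k) * log(1.0 - p));
    }
    return sum;
}

int main(void)
{
    double p = 0.90;   /* assumed per-core survival rate */
    printf("sell-250-of-300 chip yield:        %f\n", at_least(300, 250, p));
    printf("monolithic yield (all 300 good):   %g\n", pow(p, 300));
    return 0;
}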

I think AMD is already selling multicore processors that have one bad core. Xilinx sells FPGAs with bad cells, but that work for a given compiled design.

I already suggested that a few of the cpu's could be floating-point monster number crunchers, and most could be dumber, slower integer machines. A TCP/IP stack doesn't need much floating point power.

They'd surround a shared cache. They wouldn't bother the common cache when they execute out of local cache, or when they use the small local stack and variable RAMs. That makes the shared cache much more efficient, since it's not being invalidated by a lot of unnecessary traffic.

One thing I've always thought that CPUs should have is hardware task switching, a register that declares which task or thread the core is running. That would instantly remap everything... the registers, the memory mapping, everything. That would make context switching have zero overhead, and allow full hardware protection. But nowadays, one might just as well have multiple cores. That would be faster, and avoids some cache efficiency and pipeline issues.
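
A toy software model of that "task register" idea - purely a sketch of the concept, not any real CPU's mechanism: one register bank and memory map per task, so a context switch is a single register write.

#include <stdint.h>

#define MAX_TASKS 16
#define NREGS     16

struct regbank {
    uint32_t r[NREGS];
    uint32_t pc, sp, psr;
    uint32_t mmu_base;      /* each task carries its own memory map */
};

static struct regbank bank[MAX_TASKS];   /* imagined to live on-chip */
static unsigned task_id;                 /* the proposed "task register" */

/* The entire context switch: write the task ID and everything -- the
 * registers, the mapping -- is remapped implicitly. */
static void hw_switch_to(unsigned next)
{
    task_id = next;
}

/* What software sees as "register n" is really bank[task_id].r[n]. */
static uint32_t read_reg(unsigned n)
{
    return bank[task_id].r[n];
}

int main(void)
{
    bank[0].r[3] = 111;      /* task 0's private view of r3 */
    bank[1].r[3] = 222;      /* task 1's private view of r3 */
    hw_switch_to(1);
    return read_reg(3) == 222 ? 0 : 1;
}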

It is interesting that 100% of the responses to my posts have been destructive, and none additive. I sure hope you guys don't actually work that way.

John

Reply to
John Larkin

In that case, my Holy Grail is revised to "have one of the cores run the OS only, and it shall be 100% hardware protected from all the others."

Or just have 4096 cores?

John

Reply to
John Larkin

For some reason, this made me think of The Connection Machine:

formatting link

ISTR reading about it, and the bottleneck was that nobody knew how to write software for it. Somewhere in the article somebody said, "Hey, we've got hundreds of undergrads who will work for free..." Which, I presume, didn't help much. ;-)

Cheers! Rich

Reply to
Rich Grise

** Do not think so..each CPU "needs" to be fed; the amount of cache (for each one) makes no difference in the amount of instructions and data that each one demands - only the timing. Double the "cores" means double the needs...
Reply to
Robert Baer

That was clearly false even when he said it, although at the time memory came in small chunks and populating all 640k was *very* expensive.

Let's turn the question around for a moment. What is a home user going to do on a PC that cannot be done on a system already capable of decoding HDTV broadcasts and rendering a virtual world at TV frame rates?

Home entertainment systems and PCs will likely converge as digital media become increasingly dominant.

Did Bill Gates *actually* say this and in what context? Reference please.

In most places where this exact statement occurs the meaning was "The Internet is here to stay". I think he did say something silly about the Internet in the pre-WWW era, but I don't think those are his words.

OTOH he did get a $1 license fee off IBM for every IBM PC sold (IBM failed to anticipate the size of the market) and the rest is history.

I already have and in this thread too but you were too busy cut & pasting adverts for newly launched multi-core research chips to notice.

You are beginning to take on the air of a netkook, harping on how a CPU per thread will solve all problems and bring sweetness and light to the world.

Regards, Martin Brown

Reply to
Martin Brown

On a sunny day (Thu, 15 May 2008 14:24:57 -0700) it happened John Larkin wrote in :

Well, it depends how much you get from a wafer... and you still need to test all 300; testing is complicated even with single cores, takes time to configure the chip... Intel needs you!

Yes I have heard about 3 cores, have not used or seen any though.

If I understand this right (but it is no answer to my question), then that is what we already have. For example the Cell

formatting link
has one Power processor core and 8 SPUs. This year, quote: 'In May 2008, IBM introduced the high-performance double-precision floating-point version of the Cell processor, the PowerXCell 8i[16], at the 65 nm feature size.' Why do I carry on about Cell? At least I know something about that processor. But also notice the size of the chip in the picture in the above link. Now imagine 80 cores, and add some FPU real estate too... Even at 45 nm an 80-core chip would be huge = expensive.

Well, like I said, Intel needs you :-)

mm There are many architectures, I dunno all of them.

I noticed Mr Brown's remark to you, and I second it:

You are beginning to take on the air of a netkook harping on how a CPU per thread will solve all problems and bring sweetness and light to the world.

Some other aspects: you likely remember the 'transputer'; it died.

When you write a threaded program, the thread can access the same memory (say variables) as the main routine. If however the thread were to run on a different CPU with its own memory, then you'd have to move data (copy memory). Look at the Element Interconnect Bus in the Cell link above. I think the English expression is 'bottleneck' for such data paths between the different cores. What it in reality boils down to is that one will have to make very conscious choices about what to run on what core, and when and how to transfer data between the different units, something that could perhaps not be done by a compiler or OS without guidance from a designer.
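
A small C sketch of that difference, assuming POSIX threads (the names and the message struct are made up for illustration): a thread in the same address space just reads the variable, while a core with private memory needs an explicit copy pushed across the interconnect.

#include <pthread.h>
#include <stdio.h>

static long counter;                    /* shared with any thread we create */

static void *worker(void *arg)
{
    (void)arg;
    printf("worker sees %ld\n", counter);   /* same memory, no copy needed */
    return NULL;
}

/* With private per-core memory the same information has to be moved: */
struct message { long counter_copy; };

static void send_to_other_core(struct message *m)
{
    m->counter_copy = counter;   /* explicit copy across the interconnect */
    /* ... plus whatever transfer and synchronisation the bus requires ... */
}

int main(void)
{
    counter = 42;

    pthread_t t;
    pthread_create(&t, NULL, worker, NULL);
    pthread_join(t, NULL);

    struct message m;
    send_to_other_core(&m);
    printf("copied value: %ld\n", m.counter_copy);
    return 0;
}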

So, to put it simply, many cores (> 10, 80, whatever) may have no practical advantage, even if some of those cores are only FPUs; pipelines need to be filled, prefetches done, the whole thing needs to be synchronised. It _REALLY_ is _VERY_ complicated, especially for a 'general purpose' application and computer. For a media box, demodulation, decryption, decoding and graphics can simply in many cases be assigned to a few cores, but after 6 ???

And those PCs will be media boxes, things like VOIP are small things that can run in the background too.

I am already way out here, sure, but the guys do HD editing on an Intel dual core, fast enough, real time.

You do not see any washing machines with 200 HP engines either... So unless some new application surfaces, 10 cores should be enough? I put the question mark because of Bill Gates' "640kB is enough for everybody". But he never had vision! He even gambled wrong on Blu-Ray. Good salesman though.

Reply to
Jan Panteltje

Unless and until there is software to efficiently exploit large processor clusters for general purpose use it doesn't matter.

Neither does the core kernel for an operating system. Your model serves only to waste silicon real estate and electrical power to no good end.

Want to bet?

The fastest way to bring your "uncrashable" independent-CPUs-with-shared-common-memory model to its knees would be to set a few small tasks running flat out in several cores, allocating and deallocating memory at random and hitting it with read/writes at worst-case strides for the cache. The OS would still run but its performance would be dire.
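
A rough C sketch of that hostile workload - the block sizes and the stride are my own choices, not anything specified above - showing how little code it takes to saturate a shared cache and memory controller.

#include <stdint.h>
#include <stdlib.h>

#define STRIDE (4096 + 64)   /* deliberately cache- and TLB-unfriendly step */

static void thrash_once(void)
{
    size_t n = 1u + (size_t)(rand() % (1 << 20));   /* random block size */
    uint8_t *p = malloc(n);
    if (!p)
        return;
    for (size_t i = 0; i < n; i += STRIDE)
        ((volatile uint8_t *)p)[i] = (uint8_t)i;    /* mostly cache misses */
    free(p);
}

int main(void)
{
    /* Run one copy of this loop per core and the shared cache and memory
     * controller, not the CPUs themselves, become the limiting resource. */
    for (int i = 0; i < 1000000; i++)
        thrash_once();
    return 0;
}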

There are CPUs coming along (in production?) with hardware support for threads and context switching. And that does make good sense. Some extra hardware support for memory allocation and garbage collection might also be handy but is not mainstream.

But they create all sorts of other I/O bandwidth bottlenecks that you conveniently gloss over in your hazy rose-tinted view.

That is because your idea would not work as you intend and you are completely deaf to any criticism.

The research work at Intel is on speculative multi-threading and other methods to allow multicore hardware to deliver real world performance increases in the future - a short review online at:

formatting link

And this is a very long way from your naive CPU per thread world view.

Regards, Martin Brown

Reply to
Martin Brown
