Re: Intel details future Larrabee graphics chip

The Cell BE IS the current future.

VERY powerful. Ten times that of a PC in MANY areas. It will improve too.

Reply to
UltimatePatriot

For small N this can be made to work very nicely.

Existing cache hardware on Pentiums still isn't quite good enough. Try probing its memory with large power-of-two strides and you fall over a performance limitation caused by the cheap and cheerful way it uses the lower address bits for cache associativity. See Steven Johnson's post in the FFT Timing thread.
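To make that concrete, here is a minimal microbenchmark sketch of the kind of probe being described: walk a large buffer with a power-of-two stride versus a nearby non-power-of-two stride. The buffer size, strides, and iteration count are arbitrary choices for illustration, and the exact timings will vary by machine; the point is that the power-of-two stride tends to hammer a small number of cache sets.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define BUF_BYTES (64UL * 1024 * 1024)   /* 64 MB working buffer */

/* Walk the buffer with the given stride and time it. With a large
 * power-of-two stride the low address bits never change, so every
 * access can land in the same few cache sets (associativity
 * conflicts); an odd stride of similar size spreads accesses out. */
static double walk(volatile char *buf, size_t stride, size_t touches)
{
    clock_t t0 = clock();
    size_t i = 0;

    for (size_t n = 0; n < touches; n++) {
        buf[i] += 1;
        i = (i + stride) % BUF_BYTES;
    }
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    char *buf = malloc(BUF_BYTES);
    size_t touches = 1UL << 24;

    if (!buf)
        return 1;
    printf("stride 4096 (power of two): %.3f s\n", walk(buf, 4096, touches));
    printf("stride 4160 (4096 + 64)   : %.3f s\n", walk(buf, 4160, touches));
    free(buf);
    return 0;
}
```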

If it is anything like the development of OS/2 you get to see very bright guys reinvent things from scratch that were already known in the mini and mainframe world (sometimes with the same bugs and quirks as the first iteration of big iron code suffered from).

NT 3.51 was a particularly good vintage. After that bloatware set in.

CPU cycles are cheap and getting cheaper, and human cycles are expensive and getting more expensive. But that also says we should be using better tools and languages to manage the hardware.

Unfortunately time to market advantage tends to produce less than robust applications with pretty interfaces and fragile internals. You can after all send out code patches over the Internet all too easily ;-)

Since people buy the stuff (I would not wish Vista on my worst enemy by the way) even with all its faults the market rules, and market forces are never wrong...

Most of what you are claiming as advantages of separate CPUs can be achieved just as easily with hardware support for protected user memory and security privilege rings. It is more likely that virtualisation of single, dual or quad cores will become common in domestic PCs.

There was a Pentium exploit documented against some brands of Unix, e.g.:

formatting link

Loads of physical CPUs just creates a different set of complexity problems. And they are a pig to program efficiently.

Regards, Martin Brown

Reply to
Martin Brown

Yes. Everybody thought they could write from scratch a better (whatever) than the other groups had already developed, and in a few weeks yet. There were "two inch pipes full of piss flowing in both directions" between graphics groups.

Code reuse is not popular among people who live to write code.

NT followed the classic methodology: code fast, build the OS, test/test/test looking for bugs. I think there were 2000 known bugs in the first developer's release. There must have been ballpark 100K bugs created and fixed during development.

Intel was criminally negligent in not providing better hardware protections, and Microsoft a co-criminal in not using what little was available. Microsoft has never seen data that it didn't want to execute. I ran PDP-11 timeshare systems that couldn't be crashed by hostile users, and ran for months between power failures.

So program them inefficiently. Stop thinking of CPU cycles as precious resources, and start thinking that users matter more. I have personally spent far more time recovering from Windows crashes and stupidities than I've spent waiting for compute-bound stuff to run.

If the OS runs alone on one CPU, totally hardware protected from all other processes, totally in control, that's not complex.

As transistors get smaller and cheaper, and cores multiply into the hundreds, the limiting resource will become power dissipation. So if every process gets its own CPU, and idle CPUs power down, and there's no context switching overhead, the multi-CPU system is net better off.

What else are we gonna do with 1024 cores? We'll probably see it on Linux first.

John

Reply to
John Larkin

I was doing/learning all this stuff 30 years ago. We even developed a loosely coupled multi-uP system where each module had a comms processor, an apps processor and an OS processor. Back then all these problems had already been analysed to death, and solutions found (if they existed). The future of Intel/MS R&D ought to be reading IEEE papers from the 60s/70s.

--
Dirk

http://www.transcendence.me.uk/ - Transcendence UK
http://www.theconsensus.org/ - A UK political party
http://www.onetribe.me.uk/wordpress/?cat=5 - Our podcasts on weird stuff
Reply to
Dirk Bruere at NeoPax

One point:

RCU can scale to thousands of cores; Linux uses that algorithm in its kernel today.
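For readers who haven't met it, the shape of that algorithm is roughly as follows. This is a minimal sketch of the classic kernel RCU pattern (kernel context, not userspace); the RCU primitives named here are the real Linux ones, but `struct foo`, `gptr` and the field `a` are invented for illustration, and a real updater would also hold a lock to serialize with other updaters.

```c
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/errno.h>

struct foo {
	int a;
};

static struct foo __rcu *gptr;

/* Reader: lockless, so it scales across many cores -- it takes no
 * lock and dirties no shared cache line. */
static int read_a(void)
{
	struct foo *p;
	int val = -1;

	rcu_read_lock();
	p = rcu_dereference(gptr);
	if (p)
		val = p->a;
	rcu_read_unlock();
	return val;
}

/* Updater: publish a new version, wait for pre-existing readers to
 * drain, then free the old version. */
static int update_a(int new_val)
{
	struct foo *newp, *oldp;

	newp = kmalloc(sizeof(*newp), GFP_KERNEL);
	if (!newp)
		return -ENOMEM;
	newp->a = new_val;

	oldp = rcu_dereference_protected(gptr, 1); /* caller asserts update-side protection */
	rcu_assign_pointer(gptr, newp);
	synchronize_rcu();	/* wait out readers that might still see oldp */
	kfree(oldp);
	return 0;
}
```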

Reply to
Chris M. Thomasson

This won't bother *nix-class OSes. They have already been scaled past 10 thousand cores. Other OSes are on their own.

Reply to
JosephKK

At that point you should integrate them directly into the display. Then you could get to giga-core systems.

That reminds me of an article/paper I once read about Cache Only Memory Architecture (COMA). Only they never did seem to be able to get it to work, though.

Reply to
JosephKK

OK. How do you deal with I/O devices, user input and hot swap?

Reply to
JosephKK

I/O and user interface, just like now: device drivers and GUI's. Just run them on separate CPUs, and have hardware control over anything that could crash the system, specifically global memory mapping. There have been OS's that, for example, pre-qualified the rights of DMA controllers so even a rogue driver couldn't punch holes in memory at random.
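As an illustration of what "pre-qualified" DMA rights could look like, here is a hypothetical sketch: before a driver may program a DMA controller, the OS checks the requested transfer against the regions that driver was granted at load time. All of the names here (dma_grant, dma_request, dma_allowed) are invented for illustration; no real OS API is implied.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A region of physical memory a driver was granted at load time. */
struct dma_grant { uintptr_t base; size_t len; };

/* A transfer the driver is now asking the DMA controller to do. */
struct dma_request { uintptr_t addr; size_t len; };

/* Return true only if the requested transfer fits entirely inside
 * one of the driver's grants; a rogue driver's request to punch a
 * hole elsewhere in memory is simply refused. */
static bool dma_allowed(const struct dma_grant *grants, size_t ngrants,
                        const struct dma_request *req)
{
    for (size_t i = 0; i < ngrants; i++) {
        uintptr_t lo = grants[i].base;
        uintptr_t hi = lo + grants[i].len;

        /* Written to avoid overflow in the addr + len comparison. */
        if (req->addr >= lo && req->addr <= hi &&
            req->len <= hi - req->addr)
            return true;
    }
    return false;
}
```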

But hot swap? What do you mean? All the CPUs are on one chip.

John

Reply to
John Larkin

Why would it? The design could also use hundreds or thousands of dedicated I/O controllers. If you want to talk about real bottlenecks, look at memory and data-bus limitations.

Reply to
JosephKK

A lot of hardware sorts of stuff, like TCP/IP stack accelerators, could be done in a dedicated cpu. Sort of like using a PIC to blink an LED. Part of the channel-controller thing was driven by not wanting to burden an expensive CPU with scut work and interrupts and context-switching overhead. All that stops mattering when cpu's are free. Of course, disk controllers and graphics processors would still be needed, but simpler ones and fewer of them.
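You can already approximate this on an ordinary SMP box today by dedicating a core to the scut work. A hedged sketch, using the standard GNU/Linux affinity call: pin a worker thread to one CPU and let it sit in its own loop. The core number (3) and the idea of running a protocol stack there are purely illustrative.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Pin the calling thread to a single CPU. Returns 0 on success,
 * an error number otherwise (pthread_setaffinity_np convention). */
static int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main(void)
{
    int err = pin_self_to_cpu(3);   /* dedicate core 3 to this worker */

    if (err != 0) {
        fprintf(stderr, "could not pin to core 3: error %d\n", err);
        return 1;
    }
    /* ... the dedicated polling / protocol-stack loop would run here ... */
    printf("worker pinned to core 3\n");
    return 0;
}
```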

Multicore is especially interesting for embedded systems, where there are likely a modest number of processes and no dynamic add/drop of tasks. The most critical ones, like an important servo loop, could be dedicated and brutally simple. Freescale is already going multicore on embedded chips, and I think others are, too. The RTOS boys are *not* going to like this.

John

Reply to
John Larkin

mmhmm.

Bandwidth per flop is headed toward zero.

Robert.

Reply to
Robert Myers


What bottlenecks? Most PC's have speed to burn. What they don't have is security, reliability, or simplicity. But more cpu's, each with a little local ram, surrounding a shared cache, have got to be more efficient than a single CPU thrashing between 60 or so processes.

Or maybe things will never change, just like they never changed in past years.

John

Reply to
John Larkin

Yeah, to people with broadband. Back when XP SP2 came out I was still on dial-up; MS sent me a CD for free. Consider costs like that before spouting.

Why virtualize them? I can have them physically. Of course M$ PC-style software still cannot use them efficiently. Nor can they use 64-bit effectively, and they usually make poor use of SSE, SSE2, etc.

Mostly due to MS-DOS-and-follow-ons style groupthink. We have a generation of programmers that never learned partitioning properly.

Reply to
JosephKK

I have run compute-bound stuff on a PC that took hours (about 5 1/2) to run, and I wrote it myself. It was clean and efficient, just compute-bound. I tried it on a recent machine; it took about 10 minutes. Yet the general performance of the typical PC application on the typical PC seems to have shown no improvement for the past 10 years. What do you think is the cause?

We have already seen it on Linux, in the form of parallel supercomputers. With more cores as well.

Reply to
JosephKK

A given program will run far faster on modern iron. But modern apps have mostly factored increased cpu speed and memory into their designs, and bloated up to match.

formatting link

"First, some words about the meaning of "kernel". Operating Systems can be written so that most services are moved outside the OS core and implemented as processes.This OS core then becomes a lot smaller, and we call it a kernel. When this kernel only provides the basic services, such as basic memory management ant multithreading, it is called a microkernel or even nanokernel for the super-small ones. To stress the difference between the

Unix-type of OS, the Unix-like core is called a monolithic kernel. A monolithic kernel provides full process management, device drivers,file systems, network access etc. I will here use the word kernel in the broad sense, meaning the part of the OS supervising the machine."

Most popular os's (Win, Linux, Unix) are big-kernel designs, to reduce inter-process overhead. That makes them complex, buggy, and paradoxically slow.

John

Reply to
John Larkin

On a sunny day (Sat, 09 Aug 2008 10:20:40 -0700) it happened John Larkin wrote in :

Just to rain a bit on your parade: in the *Linux* kernel, many years ago, the concept of 'modules' was introduced. Now device drivers are 'modules', and are, although closely connected and in the same source package, _not_ a real part of the kernel. (I am no Linux kernel expert, but it is absolutely possible to write a device driver as a module, and then, while the system is running, load that module and unload it again.) I sort of have the feeling that your knowledge of Linux, and the Linux kernel, is very academic, John, and you should really compile a kernel and play with Linux a bit to get the feel of it.
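For anyone who has never seen one, the skeleton of such a module is tiny. This is a minimal "hello" module sketch using the standard kernel module macros; the file and message names are of course arbitrary. Built against the kernel headers, it can be loaded with insmod and removed with rmmod while the system keeps running.

```c
/* hello.c -- minimal loadable kernel module sketch */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Minimal example module");

static int __init hello_init(void)
{
	printk(KERN_INFO "hello: module loaded\n");
	return 0;	/* nonzero would make insmod fail */
}

static void __exit hello_exit(void)
{
	printk(KERN_INFO "hello: module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);
```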

Unix has been around for decades and has been more and more perfected; Linux and BSD are incarnations of it.

There was some old saying that went like this (correct me, hopefully somebody knows it more precisely): "Those who criticise Unix are bound to re-invent it."

Reply to
Jan Panteltje

Er, the discussion that John quoted above referred not to what is compiled with the kernel but to what executes in the same protection domain that the kernel does (as it is my impression Linux modules do). Perhaps John is not the one who needs to develop a deeper understanding here.

- bill

Reply to
Bill Todd

On a sunny day (Sun, 10 Aug 2008 05:58:13 -0400) it happened Bill Todd wrote in :

He mentioned 'monolithic', and with modules the Linux kernel is _not_ monolithic. You can load a device driver as a module (after you configured it to be a module before compilation; the kernel config often gives you a choice), and then that module will even be dynamically loaded, including other modules it depends on, and unloaded too if no longer used (that device). This keeps memory usage low, and prevents you from needing to reboot if you add a new driver.

As to 'protection domain': be aware that even if you were to run device drivers on a different core (one for each device???), you will still have to move the data from one core to the other for processing, and how protected do you think that data is? It is all illusion: 'More cores will solve everything.'

I wonder how many here actually use Linux, have compiled a kernel, written modules and applications, and can even write in C. I'd rather have a discussion with them than the generalised bloating about systems they never even had hands-on experience with. In that case sci.electronics.design becomes like sci.physics, a bunch of idiots with even more idiotic theories causing so much noise that the real stuff is obscured, and your chance to learn something is zero.

This is my personal rant. I am a Linux user, have written many applications for it, and did some work on drivers too. Academic bullshit I know about too: in my first year of Information Technology I found an error in the textbook and reported it; professors do not always like to be corrected, I learned that. There was a project that you could join, about in-depth study of operating systems, and, since I actually wrote one, I applied for the project, and was promptly rejected. Where did those guys go? Microsoft??????

I will listen to John Larkin's theory about how safe multicore systems are after he writes a demo, or even shows someone else's, that cannot be corrupted. Utopia does not exist.

Reply to
Jan Panteltje

That's a great point! It just seems that the approach could possibly be beneficial to all sorts of applications. Could you help me out here and give some examples of a couple of applications that simply could not tolerate the approach at any level? When I say any level, I mean allocations starting at the lowest common denominator from its origin... this being trying the thread-local heap, then the core-local heap, and so on and so forth...

I see problems. Well, with mega-core systems, the per-core memory is going to be limited indeed! It's analogous to programming a Cell with its dedicated per-SPE memory, something like 256 KB. When the local allocation on an SPE is exhausted, well, DMA to the global memory is going to need to be used. I know this works because I have played around with algorithms using the IBM Cell Simulator.
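The SPU-side pattern looks roughly like this. A rough sketch from memory of the Cell SDK's spu_mfcio.h interface (mfc_get/mfc_put plus tag-group waiting); the chunk size, tag number, effective address and the XOR "work" are illustrative only, and exact signatures may differ between SDK versions.

```c
#include <spu_mfcio.h>

#define CHUNK 16384	/* bytes per DMA; must respect MFC alignment rules */

/* Local-store buffer the MFC transfers into and out of. */
static char ls_buf[CHUNK] __attribute__((aligned(128)));

/* Pull a block of main memory into local store, work on it in place,
 * and push it back. 'ea' is the 64-bit effective address of the block
 * in main (global) memory. */
static void process_chunk(unsigned long long ea)
{
	unsigned int tag = 1;

	mfc_get(ls_buf, ea, CHUNK, tag, 0, 0);	/* main memory -> local store */
	mfc_write_tag_mask(1 << tag);
	mfc_read_tag_status_all();		/* block until the transfer completes */

	for (int i = 0; i < CHUNK; i++)		/* trivial stand-in for real work */
		ls_buf[i] ^= 0x5a;

	mfc_put(ls_buf, ea, CHUNK, tag, 0, 0);	/* local store -> main memory */
	mfc_write_tag_mask(1 << tag);
	mfc_read_tag_status_all();
}
```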

formatting link

formatting link

programming the Cell is VERY FUN!!!!

Reply to
Chris M. Thomasson
