Re: The Age of Crappy Concurrency: Erlang, Tilera, Intel, AMD, IBM, Freescale, etc...

In article , "Del Cecchi" writes: |> |> If it is such a great idea, why can't he convince some venture |> capitalists it is so?

Now, THAT's unfair. Persuasiveness and inventiveness very rarely occur in the same person.

However, let's stick to the technical aspects, as we are better at those - well, I am :-) To describe the author of that as an idiot is unfair on the vast majority of idiots. This is actually SO bad that I find it amusing. Let's select some of the points he has highlighted.

I'll get right to the point. If your multicore CPU or concurrent programming language or operating system does not support fine-grain, instruction-level parallelism in a MIMD (multiple instruction, multiple data) environment, it is crap.

OK, I'll buy that. I have been saying something similar for years, after all, except that I probably don't mean what he does. I will post a SEPARATE message on the points I agree with him.

Fast, fine-grain, instruction-level parallelism using MIMD. What is the point of parallelism otherwise? ...

Exposure of ignorance, plus refusal to learn. 'Nuff said?

Easy software composition. This means a graphical interface for non-algorithmic programming, among other things. It also means plug-compatible components. Just drag'm and drop'm. No more (text-based) computer languages, por favor!

Ditto. That has been a hot research topic for 30+ years and, like the semantic analysis of natural languages, the more that you learn, the less of the problem that you even understand. It's actually a deep psychological problem as well as an IT one.

Deterministic timing of events at the elementary operation level. Deterministic temporal order is a must for reliability. This is possible only by adopting a convention whereby all instructions (elementary operations) are parallel reactive processes and have equal durations based on a virtual system-wide clock.

Ditto. Mathematicians can prove both that it is inherently not scalable and that it isn't enough, and engineers have confirmed that the mathematics matches reality.

Automatic resolution of data dependencies. This eliminates blind code and otherwise hidden side effects of code modification. It is an essential ingredient for reliability. It can only be done using a reactive synchronous software model.

Ditto. Both mathematicians and engineers can witness that it helps with only one cause of such problems, that there are other known approaches to reliability, and that his pet technique doesn't provide it anyway.

Impregnable security. This is possible only in a system that enforces deterministic timing.

Ditto. Both mathematicians and engineers can witness it is twaddle.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

I said that I would post the points I agreed with. Well, excluding trivial ones, there are two:

If your multicore CPU or concurrent programming language or operating system does not support fine-grain, instruction-level parallelism in a MIMD (multiple instruction, multiple data) environment, it is crap.

He's right. The current hardware, operating systems and languages make it artificially difficult to do this, though there are a few signs of slight improvement. It should be possible for an ordinary, unprivileged application to manage, schedule, synchronise, suspend, restart, communicate with and otherwise control all of the cores/microthreads that it has been provided with. WITHOUT needing to ask the operating system to do it.

Basically, unless it becomes possible for ordinary application developers to experiment with new, very low-level paradigms, we are unlikely to see much progress. Better support would not solve the problem, but would enable research and development.
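[To give a flavour of what "without needing to ask the operating system" means in practice, here is a toy C11 sketch - all names (mailbox, post_task, run_one) are invented for illustration, and a single producer and single consumer are assumed - of one core handing a small task to another purely in user space: an atomic slot plus a spin-wait, with no system call on the fast path. It only gestures at the direction; the real point above is that the hand-off would need architectural support to get anywhere near the cost of a few instructions.]

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    typedef void (*task_fn)(void *);

    /* One-slot mailbox: fn == NULL means "empty".
     * Single producer, single consumer assumed. */
    struct mailbox {
        _Atomic(task_fn) fn;
        void *arg;
    };

    /* Producer: publish a task without entering the kernel. */
    static inline bool post_task(struct mailbox *mb, task_fn fn, void *arg)
    {
        if (atomic_load_explicit(&mb->fn, memory_order_acquire) != NULL)
            return false;                    /* consumer still busy */
        mb->arg = arg;
        atomic_store_explicit(&mb->fn, fn, memory_order_release);
        return true;
    }

    /* Consumer: spin until a task appears, run it, mark the slot empty. */
    static inline void run_one(struct mailbox *mb)
    {
        task_fn fn;
        while ((fn = atomic_load_explicit(&mb->fn, memory_order_acquire)) == NULL)
            ;                                /* burn cycles rather than block in the OS */
        fn(mb->arg);
        atomic_store_explicit(&mb->fn, NULL, memory_order_release);
    }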

However, DESIGNING architectures and programming languages to make that possible is damn hard - you can make everything deterministic and defined, and run like a drain - or you can leave everything unspecified, and make the design unusable. Designing a worthwhile compromise needs a great team, and the lack of salesdroids riding on their backs.

Implicit parallelism at the design level, ...

He's right there, too. You can't bolt parallelism onto a serial design and expect either good RAS or good performance. But we've known that for at least three decades.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

In article , snipped-for-privacy@cse.ucsc.edu (Eugene Miya) writes:
|> In article ,
|> snipped-for-privacy@netscape.net wrote:
|> > But, that's impossible. Since the idiots who invented MIMD,
|> > where idiots with atomic clocks, not people with computers.
|>
|> No person "invented" MIMD. Mike Flynn at Stanford described the idea of
|> multiple independent streams before any existed during the time of early
|> time sharing. Mike has far from an idiot and perhaps one of the
|> brightest guys in the field when he retired. All he did was partition
|> the language (English).

MIMD processing has been used for large scientific calculations for some centuries. All the computers do is implement the old manual techniques, just a little faster and on a larger scale :-)

Regards, Nick Maclaren.

Reply to
Nick Maclaren

No person "invented" MIMD. Mike Flynn at Stanford described the idea of multiple independent streams before any existed during the time of early time sharing. Mike has far from an idiot and perhaps one of the brightest guys in the field when he retired. All he did was partition the language (English).

Erlang: boy, that's a name I've not heard in a while.

Reply to
Eugene Miya

ahahaha... Eugene, you are responding to a stupid usenet robot. zzbunker has been messing around on the newsgroups for years. Read zzbunker's autistic-sounding response again. It makes no sense, does it? Almost all of its replies have the word "since" in them, no originality. Text-based or language-based AI, also known as symbolic AI, is the worst possible way to research AI, but zzbunker's creators never seem to get the point. Every once in a while, it can be very funny. ahahaha... AHAHAHA... ahahaha...

Reply to
Traveler

In article , snipped-for-privacy@cse.ucsc.edu (Eugene Miya) writes:
|>
|> Ah Mr. Maclaren..... Going to be in Reno?

I doubt it, I am afraid.

|> The arguable adjective in your response is "large."
|> Your key second word is scale. As Mike published his basic ideas
|> in 1966 thru the early 70s I would say that "large prior to the mid-20th
|> century" were really pretty small scale for that. The argument for
|> things like trig tables as big scale was that their algorithms are
|> comparatively simple. They blow up in fairly easily understood known ways.

Yes and no. A fairly reliable source once told me that some of those calculations were done by hundreds of inter-relating groups/people, which scale wasn't reached in practical computing until the late 1970s, perhaps even the early 1980s. And it wasn't just tables we were discussing, which I agree are embarrassingly parallel. But I failed to track down an authoritative reference.

One big difference is that the manual MIMD could rely on a skilled human to do the overall control and error recovery, whereas designing for computers needs to produce a precise algorithm.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

Ah Mr. Maclaren..... Going to be in Reno?

The arguable adjective in your response is "large." Your key second word is scale. As Mike published his basic ideas in 1966 thru the early 70s I would say that "large prior to the mid-20th century" were really pretty small scale for that. The argument for things like trig tables as big scale was that their algorithms are comparatively simple. They blow up in fairly easily understood known ways.

When I took a job in the building which had the ILLIAC wing, I worked with ex-ILLIAC IV guys (still here) who used MIMD and "concurrent" specifically rather than "parallel". I think they were splitting hairs. Half the time. The other half, now, I agree with them. Those guys, by and large, still only really did SIMD computing. Their problem with singularities was and is still quite small (otherwise vector processors would not have been as successful either). I think the ICL DAP guys would go along with that.

Really embarrassingly, easily decomposed MIMD can luckily happen on the web, say in places where power is cheap, and those special places where people are willing to pay for it are still fairly small scale. Scale here might be say 8-9 orders of magnitude (not my figure).

I think Mike quit at the right time. I recall after I gave a seminar for him on performance measurement that he smiled when I cited Muybridge's work which took place on another part of the Stanford campus. He told that class of students to go by and see the monument to that work. Stanford was really lucky to have him.

People are only confusing a description with a prescription and natural languages are mostly just conventions. The users define a language not the language lawyers.

Reply to
Eugene Miya

You must have half or more of comp.arch rolling on the floor laughing, because you could, say, substitute my name for his, but neither of you openly posts like most of the rest of the c.a. respondents with our real names.

We'll remember that the next time c.a. gets together for dinner.

Reply to
Eugene Miya

Pity. I'm only there 2 separate days. I have to come back down to the Bay Area and convince the founding chair to come up for 1 day.

Oh yes, that kind of thing started in the small scale with guys like Kepler (merely an era place holder) and on to the 100s (of correspondents) by the 1800s. You need a reference?

There's (as you realize) quite a number of click work experiments in addition to SETI@home/proteinfolding@home, such as a colleague's Martian crater-counting work. The width and independence of the instruction stream. You will get IBM people like Lynn who are happy with large-scale job shopping, but that doesn't help the high-end fine-grain data interested types (Dennis/Arvind). None of that helps some of the automated decomposition problem (just had to do a Wikipedia session).

Reply to
Eugene Miya

[...]
[...]

Humm... I can offer an abstraction over some very low-level distributed multi-threading paradigms; however, it's obviously not that popular, because I have only managed to license it to a mere handful of interested clients... Luckily for me, those that actually used it ended up liking it rather quickly.

Oh well...

Reply to
Chris Thomasson

In article , "Chris Thomasson" writes: |> |> > Basically, unless it becomes possible for ordinary application |> > developers to experiment with new, very low-level paradigms, we are |> > unlikely to see much progress. Better support would not solve the |> > problem, but would enable research and development. |> |> Humm... I can offer an abstraction over some very low-level distributed |> multi-threading paradigms, however, its obviously not that popular because I |> have only managed to license it to a mere handful of interested clients... |> Luckily for me, those that actually used it, ended up liking it rather |> quickly.

At a guess, I am talking about a lower level than you are. I really am talking about enabling application-controlled parallelism on the scale of 5-10 machine instructions. And I don't mean overhead - I mean that you can parallelise 20 instructions and get them to run in the time of 15.

My analysis, and that of some other people, is that you need to get down there to make use of the accessible parallelism in a lot of existing, serial code. Now, there is another viewpoint that doing that is a stupid idea, and the only sane approach is to redesign, which may well be correct.

I think that it could be done - but not starting from here. But I do believe that, even starting from here, we could get down to a fairly small scale. What we CAN'T do is to do that in UNPRIVILEGED code using existing hardware architectures and operating system facilities. And it is that which I feel needs attention.
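[As a rough illustration of the gap - not a serious benchmark, and the numbers will vary wildly from system to system - the cheapest parallelism an unprivileged program can ask the OS for today is a thread, and even an empty fork/join costs orders of magnitude more than the 20 instructions one would like to split across cores. A plain C sketch using pthreads and clock_gettime:]

    #include <pthread.h>
    #include <stdio.h>
    #include <time.h>

    static void *empty(void *arg) { return arg; }

    static double elapsed_ns(struct timespec a, struct timespec b)
    {
        return (b.tv_sec - a.tv_sec) * 1e9 + (b.tv_nsec - a.tv_nsec);
    }

    int main(void)
    {
        struct timespec t0, t1;
        volatile long x = 0;

        /* Roughly 20 instructions of "work" we would like to parallelise. */
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (int i = 0; i < 20; i++)
            x += i;
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("20 adds:            %8.0f ns\n", elapsed_ns(t0, t1));

        /* The cheapest OS-level parallelism available to unprivileged code. */
        pthread_t tid;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        pthread_create(&tid, NULL, empty, NULL);
        pthread_join(tid, NULL);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        printf("thread fork + join: %8.0f ns\n", elapsed_ns(t0, t1));
        return 0;
    }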

Regards, Nick Maclaren.

Reply to
Nick Maclaren

Okay.

In the case of traversing dynamic linked data structures, I would like to batch up and parallelize a plurality of loads to the list anchor(s) and/or the "next" pointer(s), and perhaps prefetch a number of nodes ahead to "attempt" a possible reduction of memory latency. I guess you could bind the group affinity to a group of processors, and only operate on its local resources; think NUMA here. Something like Tilera allowing you to segregate groups of processors from running certain code. I would like to be able to pragmatically compose the cache-coherency semantics of a multi-core chip at OS boot (e.g., 64+ cores) into small groups of 2-4 processors, which are adjacent to each other, with ccNUMA semantics wrt the group's per-core memory, and connect the groups together using a relaxed NUMA model in the sense that a group of processors is a remote entity.
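[For the first part of that - overlapping the pointer-chasing latency - here is a minimal sketch of what can already be expressed today with a compiler builtin, assuming GCC/Clang's __builtin_prefetch and a made-up node layout. Whether it actually helps depends entirely on node size, allocation pattern and how much work there is per node:]

    #include <stddef.h>

    struct node {
        struct node *next;
        long payload;
    };

    /* Walk a singly linked list and issue a software prefetch for the node
     * after the current one, so its cache miss overlaps with the work done
     * on the current node.  Purely a sketch; prefetching NULL is harmless. */
    long sum_list(const struct node *head)
    {
        long total = 0;
        for (const struct node *n = head; n != NULL; n = n->next) {
            __builtin_prefetch(n->next, 0, 1);  /* read access, low temporal locality */
            total += n->payload;                /* the "work" overlapped with the fetch */
        }
        return total;
    }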

I guess you could focus FPGA/chip-programming to create groups of cores and dramatically weaken, or strengthen the cache-coherency on a per-group basis.

It seems extremely complicated to get a practical design.

Reply to
Chris Thomasson

[...]

When are the chip vendors going to be making their own memory? Anybody trying to integrate a fairly large amount of physical memory and multi-core chip arrays into a single entity? Programming model is distributed NUMA, simple?

:^0

Reply to
Chris Thomasson

Would the parallel/pipelined computer (all female, it appears) system at Los Alamos that Feynman describes in his book apply?

Jan

Reply to
Jan Vorbrüggen

In article , Jan Vorbrüggen writes:
|>
|> > Yes and no. A fairly reliable source once told me that some of those
|> > calculations were done by hundreds of inter-relating groups/people,
|> > which scale wasn't reached in practical computing until the late
|> > 1970s, perhaps even early 1980s. And it wasn't just tables we were
|> > discussing, which I agree are embarassingly parallel. But I failed
|> > to track down an authoritative reference.
|>
|> Would the parallel/pipelined computer (all female, it appears) system at Los
|> Alamos that Feynman describes in his book apply?

Yup :-) Did he describe the actual calculations? If so, that would be an excellent reference.

Regards, Nick Maclaren.

Reply to
Nick Maclaren


It seemed pretty clear to me from the description, including the way they handled computational errors on the fly, that this was iterating difference equations to solve differential equations numerically - that is, the usual stuff you would expect.

Jan

Reply to
Jan Vorbrüggen
