Re: The Manifest Destiny of Computer Architectures

ACM killed off SIGAPL about 5-6 years ago. Sorry to see it go.

Have a look at the DARPA HPCS languages, notably Chapel, Fortress, and X10. I'm not entirely sure of their respective statuses, but they were an attempt in the HPC arena to raise the level of abstraction.

-scooter

Reply to
Scott Michel

No, they aren't. Sorry. They raise the level above that of the mainstream 1970s and 1980s languages, but not above that of later ones such as, say, modern Fortran.

What they do try to do is raise the level of abstraction of the parallelism. I found Chapel and Fortress a bit unambitious, and wasn't convinced that any of them were likely to be genuinely and widely useful. However, I mean to take another look when (!) I get time. May I also draw your attention to BSP?

Despite a lot of effort over the years, nobody has ever thought of a good way of abstracting parallelism in programming languages. All viable ones have chosen a single model and stuck with it, though sometimes (as with MPI) one programming model can be easily and efficiently implemented on another as well.

Regards, Nick Maclaren.

Reply to
nmm1
+---------------
| For example, there are people starting to think about genuinely
| unreliable computation, of the sort where you just have to live
| with ALL paths being unreliable. After all, we all use such a
| computer every day ....
+---------------

Yes, there are such people, those in the Computational Complexity branch of Theoretical Computer Science who are working on bounded-error probabilistic classes, both classical & in quantum computing:

Bounded-error probabilistic polynomial time (BPP)
In computational complexity theory, bounded-error probabilistic polynomial time (BPP) is the class of decision problems solvable by a probabilistic Turing machine in polynomial time, with an error probability of at most 1/3 for all instances. ...

BQP (bounded-error quantum polynomial time)
In computational complexity theory, BQP is the class of decision problems solvable by a quantum computer in polynomial time, with an error probability of at most 1/3 for all instances. It is the quantum analogue of the complexity class BPP. ...

Though the math seems to be way ahead of the hardware currently... ;-}

-Rob

-----
Rob Warnock
627 26th Avenue
San Mateo, CA 94403
Reply to
Rob Warnock

That is a model for describing parallelism of the message-passing variety (including the use of Von Neumann shared data), and is in no reasonable sense an abstraction for use in programming languages.

BSP is. Unfortunately, it is not a good one, though I teach and recommend that people consider it :-(

Regards, Nick Maclaren.

Reply to
nmm1

CSP?

Reply to
Bakul Shah

I have not seen anything as elegant as CSP & Dijkstra's guarded commands, and they have been around for 35+ years.

But perhaps we mean different things? I am talking about naturally parallel problems. Here is an example (the first such problem I was given in an OS class ages ago): S students, each has to read B books in any order, the school library has C[i] copies of the ith book. Model this with S student processes and a librarian process! As you can see this is an allegory of a resource allocation problem.
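
Here is a rough Go sketch of that library problem -- all the names (request, librarian, student) are invented for illustration, and it is only meant to show how naturally CSP-style processes and channels express the resource allocation, not to be the canonical solution:

    package main

    import (
        "fmt"
        "sync"
    )

    // A request either borrows or returns one copy of book id.
    type request struct {
        id      int
        give    bool      // true = return a copy, false = borrow one
        granted chan bool // borrow replies arrive here
    }

    // The librarian process owns the copy counts; every state change
    // goes through the reqs channel, so no locks are needed.
    func librarian(copies []int, reqs chan request) {
        waiting := make([][]request, len(copies)) // blocked borrowers, per book
        for r := range reqs {
            switch {
            case r.give:
                copies[r.id]++
                if q := waiting[r.id]; len(q) > 0 { // wake one waiter
                    copies[r.id]--
                    q[0].granted <- true
                    waiting[r.id] = q[1:]
                }
            case copies[r.id] > 0:
                copies[r.id]--
                r.granted <- true
            default:
                waiting[r.id] = append(waiting[r.id], r)
            }
        }
    }

    // Each student process reads every book once (index order, for brevity).
    func student(name, nbooks int, reqs chan request, wg *sync.WaitGroup) {
        defer wg.Done()
        for b := 0; b < nbooks; b++ {
            granted := make(chan bool)
            reqs <- request{id: b, granted: granted}
            <-granted // blocks until a copy is free
            fmt.Printf("student %d reads book %d\n", name, b)
            reqs <- request{id: b, give: true}
        }
    }

    func main() {
        const S, B = 4, 3
        copies := []int{1, 2, 1} // C[i] copies of the ith book
        reqs := make(chan request)
        go librarian(copies, reqs)

        var wg sync.WaitGroup
        for s := 0; s < S; s++ {
            wg.Add(1)
            go student(s, B, reqs, &wg)
        }
        wg.Wait()
        close(reqs)
    }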

It is easy to see how to parallelize an APL expression like "F/(V1 G V2)", where scalar functions F & G take two args. [In Scheme: (vector-fold F (vector-map G V1 V2))]. You'd have to know the properties of F & G to do it right but potentially this can be compiled to run on N parallel cores and these N pieces will have to use message passing. I would like to be able to express such decomposition in the language itself.
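
For instance, here is a rough Go sketch of one such decomposition, with F and G replaced by + and * (so it is just a dot product) purely for illustration. The split is only legal because + is associative, which is exactly the sort of property of F you would have to know:

    package main

    import (
        "fmt"
        "runtime"
    )

    // F/(V1 G V2) with F=+ and G=*: map G over chunks in N goroutines,
    // fold F locally in each, then fold F once more over the partial
    // results passed back on a channel (the messages between the pieces).
    func dot(v1, v2 []float64) float64 {
        if len(v1) == 0 {
            return 0
        }
        n := runtime.NumCPU()
        if n > len(v1) {
            n = len(v1)
        }
        partial := make(chan float64, n)
        chunk := (len(v1) + n - 1) / n

        for w := 0; w < n; w++ {
            lo, hi := w*chunk, (w+1)*chunk
            if hi > len(v1) {
                hi = len(v1)
            }
            go func(lo, hi int) {
                s := 0.0
                for i := lo; i < hi; i++ {
                    s += v1[i] * v2[i] // G, then a local F-fold
                }
                partial <- s
            }(lo, hi)
        }

        total := 0.0
        for w := 0; w < n; w++ {
            total += <-partial // final F-fold over the N pieces
        }
        return total
    }

    func main() {
        v1 := []float64{1, 2, 3, 4}
        v2 := []float64{5, 6, 7, 8}
        fmt.Println(dot(v1, v2)) // 70
    }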

So you will have to elaborate why and how CSP is not a reasonable abstraction for parallelism. Erlang, Occam & Go use it! Go's channels and `goroutines' are easy to use.

Reply to
Bakul Shah

Well, measure theory is also extremely elegant, and has been around for longer, but is not a usable abstraction for programming.

Such problems are almost never interesting in practice, and very often not in theory. Programming is about mapping a mathematical abstraction of an actual problem into an operational description for a particular agent.

Perhaps the oldest and best established abstraction for programming languages is procedures, but array (SIMD) notation and operations are also ancient, and are inherently parallel. However, 50 years of experience demonstrates that they are good only for some kinds of problem and types of agent.

Regards, Nick Maclaren.

Reply to
nmm1

Hmmm, one additional tidbit. Some boards reflowed at the same time have been stored in a lab environment. These boards in question were stored in my basement for six months. The lab env. boards show no sign of the whiskers. Conditions in my basement are not bad at all, but it is likely more humid down there than in the lab. So, I guess this means don't store lead-free boards in humid conditions.

Jon

Reply to
Jon Elson

IMHO this is the wrong solution. Actually it is not a solution at all. You really should get in touch with someone who has experience in this field in order to solve the problem at the root.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
Reply to
Nico Coesel

That's not really all that surprising though, is it? Hardware that exhibits programmable parallelism has taken many different forms over the years, especially with many different scales of granularity of the parallelisable sequential operations and inter-processor communications. The entire issue of parallelism is essentially orthogonal to the sequential Turing/von Neumann model of computation that is at the heart of most programming languages. It's not obvious (to me) that a single language could reasonably describe a problem and have it map efficiently across "classical" cross-bar shared memory systems (including barrel processors), NUMA shared memory, distributed shared memory, clusters, and clouds (the latter just an example of the dynamic-resource-count vs. known-at-compile-time axis), all of which incorporate both sequential and vector (and GPU-style) resources.

Which is not to say that such a thing can't exist. My expectation is that it will wind up being something very functional in shape that relaxes as many restrictions on order-of-execution as possible (including order of argument evaluation), sitting on top of a dynamic execution environment that can compile and re-compile code and shift it around in the system to match the data that is observed at run-time.

That is: the language can't assume a Turing model, but rather a more mathematical or declarative one. The compiler has to choose where sequential execution can be applied, and where that isn't appropriate.

Needless to say, we're not there yet, but I expect to see it in the next dozen or so years.

Cheers,

--
Andrew
Reply to
Andrew Reilly

Yes, but programs tend to follow the mathematics of matrix algebra.

A language that allowed for parallel processing of matrix operations, independent of the underlying hardware, should help.

Note that both the PL/I and the Fortran array assignment semantics complicate parallel processing. In the case of overlap, where elements changed in the destination can later be used in the source, PL/I requires that the new value be used (as if processed sequentially), whereas Fortran requires that the old value be used (so a temporary array may be needed). The Fortran FORALL conveniently doesn't help much.
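
For illustration (in Go rather than either of those languages), here is the difference for an overlapping shift like A(2:5) = A(1:4): a naive left-to-right in-place loop re-reads destination elements it has already written, i.e. the new values, while copying the source first uses only the old values and is why the temporary is needed:

    package main

    import "fmt"

    func main() {
        a := []int{1, 2, 3, 4, 5}
        b := []int{1, 2, 3, 4, 5}

        // In-place, element by element: earlier writes are re-read as
        // sources, so the new values propagate down the array.
        for i := 1; i < 5; i++ {
            a[i] = a[i-1]
        }
        fmt.Println(a) // [1 1 1 1 1]

        // Copy the source into a temporary first: every source element
        // is an old value.
        tmp := append([]int(nil), b[0:4]...)
        copy(b[1:5], tmp)
        fmt.Println(b) // [1 1 2 3 4]
    }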

A construct that allowed the compiler (and parallel processor) to do the operations in any order, together with a promise that no aliasing occurs and that no destination array elements are used in the source, would, it seems to me, help.

Maybe even an assignment construct that allowed for a group of assignments (presumably array assignments) to be executed, allowing the compiler to do them in any order, again guaranteeing no aliasing and no element reuse.

Well, part of it is that we aren't so good at thinking of problems that way. We (people) like to think things through one step at a time, and von Neumann allows for that.

In nuclear physics there is a constant describing the number of years until viable nuclear fusion power plants can be built. It is a constant in that it seems to always be (about) that many years in the future. (I believe it is about 20 or 30 years, but I can't find a reference.)

I wonder if this dozen years is also a constant. People have been working on parallel programming for years, yet usable programming languages are always in the future.

-- glen

Reply to
glen herrmannsfeldt

Your original statement was:

> Despite a lot of effort over the years, nobody has ever thought of
> a good way of abstracting parallelism in programming languages.

I gave some counterexamples, but instead of responding to those, you bring in some random assertion. If you'd used Erlang or Go and had actual criticisms, that would at least make this discussion interesting. Ah well.

Reply to
Bakul Shah

You have to understand this is a REALLY small business. I have an old Philips pick & place machine in my basement, and reflow the boards in a toaster oven, with a thermocouple reading temperature of the boards. I can't afford to have a $3000 a day consultant come in, and they'd just laugh when they saw my equipment.

I could go to an all lead-free process, but these boards have already been made with plain FR-4 and tin-lead finish. As for getting tin/lead parts, that is really difficult for a number of the components.

And, I STILL don't know why this ONE specific part is the ONLY one to show this problem. I use a bunch of other parts from Xilinx with no whiskers, as well as from a dozen other manufacturers.

Jon

Reply to
Jon Elson

Spoken like someone who would know the difference between covariant and contravariant and wouldn't blink at a Christoffel symbol.

This is the "crystalline" memory structure that has so obsessed me. All of the most powerful mathematical disciplines would at one time have fit pretty well into this paradigm.

As Andy Glew commented, after talking to some CFD people, maybe the most natural structure is not objects like vectors and tensors, but something far more general. Trees (graphs) are important, and they can express a much more general class of objects than multidimensional arrays. The generality has an enormous price, of course.

At least one and possibly more generations will have to die off. At one time, science and technology progressed slowly enough that the tenure of senior scientists and engineers was not an obvious obstacle to progress. Now it is.

Robert.

Reply to
Robert Myers

I've read the language descriptions of Erlang and Go and think that both are heading in the right direction, in terms of practical coarse-grain parallelism, but I doubt that there is a compiler (for any language) that can turn, say, a large GEMM or FFT problem expressed entirely as independent agents or goroutines (or futures) into cache-aware vector code that runs nicely on a small-ish number of cores, if that's what you happen to have available. It isn't really a question of language at all: as you say, Erlang, Go and a few others already have quite reasonable syntaxes for independent operation. The problem is one of compilation competence: the ability to decide/adapt/guess how to turn vast collections of nominally independent operations into efficient, arbitrarily sequential operations, rather than putting each potentially-parallel operation into its own thread and letting the operating system's scheduler muddle through it at run-time.
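
To make that concrete, here is a toy Go sketch (names and granularity invented for illustration): the same matrix multiply written once with a goroutine per output element, which is roughly what a literal "every independent operation is its own agent" translation gives you, and once batched by row so each task is a long sequential loop. Both compute the same thing; neither is the cache-blocked code you would actually want, and getting from the first form to something like the second (or better) automatically is the compilation-competence problem:

    package main

    import (
        "fmt"
        "sync"
    )

    // One goroutine per element of C: the naive "every independent
    // operation is its own task" translation.
    func matmulPerElement(a, b, c [][]float64, n int) {
        var wg sync.WaitGroup
        for i := 0; i < n; i++ {
            for j := 0; j < n; j++ {
                wg.Add(1)
                go func(i, j int) {
                    defer wg.Done()
                    s := 0.0
                    for k := 0; k < n; k++ {
                        s += a[i][k] * b[k][j]
                    }
                    c[i][j] = s
                }(i, j)
            }
        }
        wg.Wait()
    }

    // One goroutine per row: the same work batched so each task is a
    // long sequential loop (still not cache-blocked, just less silly).
    func matmulPerRow(a, b, c [][]float64, n int) {
        var wg sync.WaitGroup
        for i := 0; i < n; i++ {
            wg.Add(1)
            go func(i int) {
                defer wg.Done()
                for j := 0; j < n; j++ {
                    s := 0.0
                    for k := 0; k < n; k++ {
                        s += a[i][k] * b[k][j]
                    }
                    c[i][j] = s
                }
            }(i)
        }
        wg.Wait()
    }

    func newMat(n int, f func(i, j int) float64) [][]float64 {
        m := make([][]float64, n)
        for i := range m {
            m[i] = make([]float64, n)
            for j := range m[i] {
                m[i][j] = f(i, j)
            }
        }
        return m
    }

    func main() {
        const n = 4
        a := newMat(n, func(i, j int) float64 { return float64(i + j) })
        b := newMat(n, func(i, j int) float64 { return float64(i - j) })
        c1 := newMat(n, func(int, int) float64 { return 0 })
        c2 := newMat(n, func(int, int) float64 { return 0 })
        matmulPerElement(a, b, c1, n)
        matmulPerRow(a, b, c2, n)
        fmt.Println(c1)
        fmt.Println(c2) // identical values, very different scheduling overhead
    }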

Cheers,

--
Andrew
Reply to
Andrew Reilly

Now I am not sure what Nick meant by "abstracting parallelism". I asked but he didn't clarify. I thought he meant "expressing the essential properties of parallelism" (and here I think CSP/guarded commands do an excellent job). I think what you are talking about is "placement" -- mapping an N-parallel algorithm onto a smaller number of cores and, in general, making optimum use of available resources. But these are implementation issues, not abstraction ones. I agree that compilers have a long way to go.

Once easy-to-use parallel languages become widely available and we gain some experience, I am hoping that a) better implementations will follow, b) we will find ways to extract parallelism in the language itself, and c) they will lead to much *simpler* h/w structures. It seems to me a lot of the h/w complexity is due to wanting to dynamically extract parallelism.

Reply to
Bakul Shah

There are several. APL is one (very limited) classic, but there were quite a lot of engineering languages in the 1960s and 1970s, and then there are Matlab and modern Fortran. None of those specify parallel processing, but all allow it.

Now, all of those abstract only dense, rectangular matrix algebra; sparse is a lot trickier, and others are worse still.

That is the problem. I disagree that parallelism is orthogonal to the serial Von Neumann model, because there are several forms of parallelism that don't match that and some that essentially require it (in both cases, even just compared with functional).

Regards, Nick Maclaren.

Reply to
nmm1

What is currently called coarse-grain parallelism isn't really an abstraction in terms of a programming interface. All of the forms I have seen are merely APIs for a collection of communicating sequential processes, built on the few current communication techniques. The theoretical work isn't one either; it is a mathematical model for reasoning about such systems. At BEST, it is the assembler of parallelism.

A relevant abstraction would be comparable to procedures or arrays, both of which predate computing but have been incorporated very successfully as a programming model. 'Futures' are a possibility, but aren't new and didn't get very far in their previous incarnations.
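
As an aside, a future is easy enough to encode with today's primitives; here is a minimal, single-use Go sketch, purely illustrative, which is rather the point -- it is plumbing, not a new abstraction:

    package main

    import (
        "fmt"
        "time"
    )

    // A future separates starting a computation from demanding its result.
    // Encoded with a buffered channel; the force function may be called
    // once (a real version would cache the value for repeated demands).
    func future(f func() int) func() int {
        ch := make(chan int, 1)
        go func() { ch <- f() }()         // start the computation now
        return func() int { return <-ch } // force: block until the value exists
    }

    func main() {
        answer := future(func() int {
            time.Sleep(100 * time.Millisecond) // stand-in for real work
            return 6 * 7
        })
        fmt.Println("other work proceeds while the future runs...")
        fmt.Println(answer()) // 42
    }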

I don't think that ANY form of communicating sequential processes can be made into even a decent programming model, as it is far too close to the (current) hardware and far too far away from the mathematical formulation of real problems.

Someone (maybe you?) mentioned networks and trees. I fully agree, and regard them as VERY badly handled by any current programming language I know of, serial or otherwise. I have some ideas on how to improve matters, but the extra generality makes the problem vastly more complicated.

Regards, Nick Maclaren.

Reply to
nmm1

In that case you'll have to experiment yourself or have the boards produced by an assembly house. Soldering lead-free is more difficult to get right because the temperature tolerances are much narrower.

--
Failure does not prove something is impossible, failure simply
indicates you are not using the right tools...
nico@nctdevpuntnl (punt=.)
--------------------------------------------------------------
Reply to
Nico Coesel

You mention humidity, but not temperature. Is there a possibility that the temperature fell below 10C during that storage period?


- Brian

Reply to
Brian Drummond
