Language feature selection

More like the whole crop of interpreted languages now being used. I tend to think of C as something of our generation. A high percentage (but not all) of the compiler projects I have done have been C compilers.

w..

Reply to
Walter Banks

Most of my time now is working on both tools and ISAs. There have been some really significant changes in approaches to compiling both for heterogeneous parallel environments and for execution environments that have hundreds to thousands of processors in them.

We are likely, sooner rather than later, to see some major shifts in tool sets. I am currently working on a reference design for one of these that has several hundred execution units.

w..

Reply to
Walter Banks

But they are (largely) *static* environments (?). The toolchain doesn't have to decide when to bring another processor on-line... or, when it can retire a running processor and migrate its workload to some OTHER processor, etc. Or, which aspects of an application should be bound to specific processors (nearness of related I/Os) and which aspects should AVOID particular processors (because they are in insecure locations).

[Simulations of my first workload scheduler immediately brought every processor online and kept them there! D'uh!]

I've found it "trying" to come up with even a suitable set of criteria by which to constrain these choices. E.g., "performance" can be evaluated in a variety of ways: throughput, response time, power consumption, redundancy, etc. Just coming up with *a* set of criteria is a challenge. And, if the user can bias this at run-time, it becomes even more challenging! (perhaps userX might be willing to suffer slower response times for reduced power consumption?)
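
A minimal sketch (Python; every name and weight below is hypothetical) of what I mean by folding competing criteria into one user-biasable score:

# Each metric normalised to [0,1], higher is better; the weights
# carry the current user's bias.
def composite_score(metrics, weights):
    return sum(weights[k] * metrics[k] for k in weights)

# userX trades slower response for reduced power consumption:
userX = {"throughput": 0.2, "response_time": 0.1,
         "power": 0.6, "redundancy": 0.1}
candidate = {"throughput": 0.7, "response_time": 0.4,
             "power": 0.9, "redundancy": 0.5}
print(composite_score(candidate, userX))   # ~0.77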

[As I get older, I am encountering more applications where The Right Answer is really elusive and often not available "at compile time". *Or*, even DESIGN TIME! (I'm still at a loss to formulate a test suite to score the performance of the different speech synthesizer implementations I've created -- let alone their "costs"! sqrt(3) = 1.732 is a "better" answer than sqrt(3) = 1.7; but, how do you decide which pronunciation of which utterance is "better" -- and, how do you weight the performances of the limitless number of POSSIBLE utterances to come up with a composite score??)]

I'd be interested in seeing what directions you took when you have something to share! And, the assumptions you made along the way.

Time for my evening jaunt... cripes, still 90 degrees -- this won't be fun. :<

Reply to
Don Y

The more advanced toolchains are doing similar things. They instrument themselves, determine what the code+data is *actually* doing at runtime, and optimise the **** out of that.

That's as opposed to what the compiler can merely guess the code is doing, where it has to make pessimising assumptions.

And such techniques also work with C. For 18-year-old (gulp) results, google for "hplb dynamo".

And don't forget that some related techniques are implemented in a processor's hardware microarchitecture.

You are correct in presuming that you can't do an optimal job at compile time, since the information isn't there - and can't be there.

The bonus of avoiding premature optimisation in the toolchain is that the same runtime optimisation techniques also work with different processors.
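
A toy illustration in Python (nothing like Dynamo's sophistication; the names and threshold are made up): observe what the code actually does at runtime, then optimise the proven-hot cases instead of guessing up front.

from collections import Counter

def hot_path_cache(fn, threshold=100):
    # Memoise only the arguments observed to be hot at runtime.
    seen, cache = Counter(), {}
    def inner(x):
        if x in cache:
            return cache[x]
        seen[x] += 1
        result = fn(x)
        if seen[x] >= threshold:    # proven hot: now worth the memory
            cache[x] = result
        return result
    return inner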

There are disadvantages, of course. TANSTAAFL.

Reply to
Tom Gardner

It is not a static environment. The compiler DOES decide which processor (the compiler has heterogeneous processor support) is suitable for each particular part of the application. Most of the application distribution IS determined at compile time.

The compiler tool work is an evolution of the named address space work we did in Japan in the early 90's (ISO/IEC TR 18037): from named address spaces, to named processor spaces, to compiler-allocated named processor spaces.

w..

Reply to
Walter Banks

Oh ok, but those languages are generally sugared-up re-inventions of Lisp, which is even older than C, and which the cognoscenti have been using all along ;-). E. W. Dijkstra in his Turing Award lecture back in 1972 had already observed:

With a few very basic principles at its foundation, it [LISP] has shown a remarkable stability. Besides that, LISP has been the carrier for a considerable number of in a sense our most sophisticated computer applications. LISP has jokingly been described as "the most intelligent way to misuse a computer". I think that description a great compliment because it transmits the full flavour of liberation: it has assisted a number of our most gifted fellow humans in thinking previously impossible thoughts.

By all means give the interpreters a try if you haven't. They make programming more productive along several axes, at the cost of some hardware resources (cpu and memory) that are generally plentiful with today's computers.

Yes, I was less surprised that good stuff was being done in interpreted languages than that it's now relatively rare for even their expert users to have ever used C for anything.

Reply to
Paul Rubin

A plug for array operators: as in NumPy, IDL/PV-Wave, APL and Julia. That is: array and vector operators baked into the language. I've found that programming at this level yields shorter programs with less debugging: you wind up making your data structures and algorithms use the fewest operators possible/practical. Loops and subscript expressions are mostly gone! In theory, such renditions of algorithms can be optimized and/or parallelized by the compiler to a greater extent than for normal code.

There is a theoretical vantage point for this style of programming (which I call "programming in the large" as opposed to "programming in the small"): the fundamental data structure is, say, an eight-dimensional array (possibly non-contiguous). A scalar is such an array with the extent in each dimension equal to one. A vector has one dimension with the extent greater than one, etc. Of course, the array element can be something other than a scalar (integer, float, etc.), such as a record or list or hash table.
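
For instance, a small NumPy sketch (illustrative names only): no loops, no subscript expressions, and the scalars behave as degenerate arrays broadcast across every element.

import numpy as np

t = np.linspace(0.0, 1.0, 1000)        # 1-D sample grid
signal = np.sin(2 * np.pi * 50 * t)    # whole-array operators, no loop
power = (signal ** 2).mean()           # broadcast exponent; loop-free reduction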

Jim Brakefield

Reply to
jim.brakefield

In a language that allows operator overloading, programmers can define their own array operators.
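
For example, a minimal Python sketch (hypothetical wrapper class; a real one would check shapes, handle scalars, and so on):

class Vec:
    def __init__(self, xs):
        self.xs = list(xs)
    def __add__(self, other):           # elementwise '+'
        return Vec(a + b for a, b in zip(self.xs, other.xs))
    def __repr__(self):
        return "Vec(%r)" % self.xs

print(Vec([1, 2]) + Vec([3, 4]))        # Vec([4, 6])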

I agree that array operators are useful, but only for relatively simple cases such as the basic arithmetic operations on arrays. However, taking it to the APL extreme with complex vector/matrix restructurings (outer products, lamination, ...) can create code that is hard for others to understand.

You may call it that, but I hope you know that most people understand these large/small terms differently; see

formatting link

--
Niklas Holsti 
Tidorum Ltd 
niklas holsti tidorum fi 
       .      @       .
Reply to
Niklas Holsti

It is our generation that is obsessed with optimizing execution and data space. Some of the programs I have seen developed by these people tend to use algorithms that trade our sense of optimization for application performance. (VR applications, for example)

Related to that, we have almost always treated processors as a scarce resource. When I changed that mindset on some of the massively parallel systems I have been working on -- treating processors as just another resource that needs to be managed, like memory -- I suddenly saw huge leaps in application performance.

w..

Reply to
Walter Banks

I don't see how that can be applied to anything but a generic environment.

E.g., I track resources associated with each node (especially "unique" I/O's). Then, based on the *events* encountered in the environment, decide which of those resources NEED to be brought on-line and how resources can then be redistributed to meet the current workload. (and, conversely, when I can "shed" resources -- to conserve *power* and/or reduce communication overhead)

So, for example, if it's "daytime", the node that supports the video camera that monitors folks approaching the front door is powered up -- because I want to notice "visitors" in order to ANNOUNCE them (I don't accept visitors "after hours"). In this case, there is a NEED for that particular set of I/O's (i.e., the cameras facing the back yard are not capable of watching the front door approach!) along with a need for some additional compute resources (real-time image analysis). There is a COST associated with this: the power required to run that node and those hardware resources.

*Where* the code that analyzes the imagery executes is determined by the locations of the resources (CPU+memory) required to perform that task along with the communication costs to/from that application's physical location and the I/O's that it requires along with the clients with which it interacts.

If/when a visitor is "detected", then a means of informing the occupants of its identity is required. The most obvious solution being to power up another node proximate to an occupant and dispatch a live video feed to the *display* served by that node. An occupant can elect to interact with the visitor ("intercom") *or* direct the system to interact with them on their behalf (i.e., so they don't have to disclose their presence to the visitor: "Who are you? Whaddya want?"). Of course, this means tasks dedicated to synthesizing the required prompts need to be brought on-line and a channel opened by which that audio can be fed to the visitor and the visitor's reply captured and relayed to the occupant.

If the house is unoccupied, that video feed might, instead, be spooled to a media tank. Or, pushed over an internet/phone connection to the occupant(s) at a remote location. The audio prompts can be triggered from a "house_unoccupied()" script and responses similarly captured/dispatched.

When the visitor departs, all of this mechanism can be taken down to conserve power.

Later that night, that idle (cold!) node might be deliberately powered up, its camera left OFF and the CPU+memory assigned to "off-line/batch" processing of commercial detection in some OTA video broadcast captured earlier in the day. Or, the resources used to refine the speech recognizer's training set for UserA based on the stored audio for the voice commands issued during that day.

It's not possible to come up with an "ideal" resource (re)allocation strategy -- even having detailed knowledge of the *current* workload. I don't see how a tool can know these usage patterns or even possibilities at compile/build time!

[How does it know the cost of migrating taskA to node4 to accommodate taskM's MORE EFFICIENT use of node4's hardware resources (I/O's) in order to factor that into its decision as to whether node27 should, instead, be powered up and taskM spawned there (incurring higher communication costs to PROXY stubs on node4 to twiddle those I/O's)? How does it know the communication costs for taskM's interactions with those proxies? etc.]

[[I do this with a combination of crude metrics and heuristics "learned" over time -- by the system observing itself and how well it meets its performance requirements and deadlines with particular (task,node) bindings. And, I never know if I've got the *ideal* configuration for any set of nodes and tasks...]]
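
(For flavour, a crude Python sketch of the kind of heuristic score I mean -- every field and weight here is hypothetical, and the inputs are observed/learned at runtime rather than known to any compiler:)

from dataclasses import dataclass, field

@dataclass
class Task:
    state_bytes: int                    # state that must migrate with the task
    traffic: dict = field(default_factory=dict)  # peer/IO name -> bytes/sec

@dataclass
class Node:
    watts: float
    local_io: set = field(default_factory=set)   # I/O's physically on this node

def placement_cost(task, node, link_bw, power_weight):
    migrate = task.state_bytes / link_bw          # cost of moving the task here
    remote = sum(bps for peer, bps in task.traffic.items()
                 if peer not in node.local_io) / link_bw  # proxy stub chatter
    return migrate + remote + power_weight * node.watts
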
Reply to
Don Y

I don't see how that can happen -- unless you can force all possible uses of the code to occur while that analysis is being undertaken.

E.g., if it never sees you hit the brake pedal in your vehicle, it's likely to spend a lot of effort optimizing ignition timing, thinking "brakes" are never/seldom used! :>

I contend that there are huge classes of "programs" (this is c.a.E, right?) that would fail miserably with such an approach. You're not processing payroll where you can tweak umpteen gazillion iterations of the same loop. Rather, you are at the mercy of the events that transpire in the environment AND some externally sourced notion of relative "values" (neglecting timeliness for the moment).

That would depend on the nature of those optimizations and their potential consequences. Replacing an ADD with a SHIFT (assuming the SHIFT was more economical) isn't going to "color" the result significantly.

And, an application ("program") can behave differently in different execution environments -- optimizations intended to exploit cache would be wasted on a system without cache; applications hosted in a paged memory management environment will benefit from different optimizations than the same application running in a flat/unpaged environment; etc.

Reply to
Don Y

That would be counterproductive, since you want to optimise the common case. (Just like hardware and software caches do).

For a surprising example of where an experimental lab investigation became vaguely practical, see

formatting link
formatting link

If by "embedded" you mean "tiny", then I agree these techniques are not currently very useful.

OTOH I have used them successfully in HA soft realtime telecom call processing systems. Those can reasonably be regarded as "large" embedded systems.

If you have hard realtime embedded systems, all caches are problematic since they are by definition statistical in nature.

IIRC the i960 processor had a crude mechanism for freezing its caches, to avoid such issues.

Reply to
Tom Gardner

]> You may call it that, but I hope you know that most people understand
]> these large/small terms differently; see

Was not aware of this definition. Tend to consider this type of "programming in the large" as solving an organizational problem and an architectural problem.

Another form of programming in the large is characterized by provisioning a complete 64-bit computer: e.g. one with a complete 64-bit address space.

]> I agree that array operators are useful, but only for relatively simple
]> cases such as the basic arithmetic operations on arrays. However, taking
]> it to the APL extreme with complex vector/matrix restructurings (outer
]> products, lamination, ...) can create code that is hard for others to
]> understand.

My experience was with scientific programming, so yes this argument has some validity. Am still convinced that it is a useful exercise: push low-level details down into the operators and data structures, and consider the various ways of doing this so that the high-level operators (that need to be written) emerge.

Also, I consider programming well with "array" operators to require greater experience, know-how and good judgement than doing low-level coding.

Jim Brakefield

Reply to
jim.brakefield

We've been at the von Neumann bottleneck for close to 20 years now for most architectures. I haven't seen anything CPU bound in a very long time, but I'm not in that market space.

With CUDA, anything is increasingly possible.

--
Les Cargill
Reply to
Les Cargill

FORTRAN had complex number support from the beginning.

Array support was added more recently. IMHO Fortran is still a viable option for solving mathematical problems after the recent updates.

Reply to
upsidedown

On Saturday, March 11, 2017 at 4:54:54 PM UTC-6, snipped-for-privacy@downunder.com wrote:

]> FORTRAN had complex number support from the beginning.

Fortran IV did not have a complex number type?

Had expected Fortran 90 to compete well against C/C++. What happened?

Julia uses 1-origin subscripts, same as Fortran. It can be difficult to convert a Fortran program to C/C++/etc.: e.g., preserving correctness while converting from 1-origin to 0-origin subscripts.
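
A tiny Python sketch of why (hypothetical recurrence): the loop bounds and every subscript expression have to shift together.

def prefix_sums(b):
    # 0-origin translation of the 1-origin Fortran recurrence:
    #   A(1) = B(1)
    #   DO I = 2, N
    #     A(I) = A(I-1) + B(I)
    n = len(b)
    a = [0.0] * n
    a[0] = b[0]                # Fortran A(1) = B(1)
    for i in range(1, n):      # Fortran I = 2 .. N
        a[i] = a[i-1] + b[i]   # miss one shift and it's an off-by-one bug
    return a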

Jim Brakefield

Reply to
jim.brakefield

Yes. Strong, static typing.

Jacob

--
"Good enough for physics" -- Ridcully
Reply to
Jacob Sparre Andersen

FORTRAN IV definitely had a COMPLEX data type.

The opposite conversion, from stupid old C/C++ to Fortran 77 and later, is trivial:

integer A (0:3, 0:4, 0:5)

thus you can (optionally) define the lower limit for each array dimension.

If I understood correctly, recent Fortran versions also allow defining your own operators, such as .dot. (dot product), and then you could write something like

Complex A, B, C
C = A .dot. B

Regarding Julia, is it possible to overload operators named by a single Unicode character, such as the center dot?

While this might be useful for some common operators, the new Fortran .operator. syntax might be more versatile and readable.

Reply to
upsidedown

FSVO "beginning". COMPLEX support was not the original versions of Fortran, but was added pretty early. Certainly it was a standard feature by the end of the 50s, perhaps in Fortran II.

Reply to
Robert Wessel
