TILE64 embedded multicore processors - Guy Macon

Yes, but here is a bemusing story of a real use:

My son was in a maths-team speed competition (set by the university maths association). They asked (pocket calculators allowed) "How many zeros in 25!?", and the answer they marked as correct was 6.

What they actually _meant_ to ask was: "How many trailing zeros in 25!?"

- you can derive that answer without knowing the full value.

Drop it into CCalc: ans = 15511210043330985984000000

and oops: quite clearly there are 9 zeros. The local university maths association is somewhat red-faced, as this really is a 15-second check on any decent PC calculator! [Heck, even Windows calc will give this...]
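A quick Python sketch of the derivation: the trailing-zero count of n! is just the number of factors of 5 in n! (Legendre's formula), since factors of 2 are always more plentiful, so no full expansion is needed.

```python
import math

def trailing_zeros(n):
    # Trailing zeros of n! = number of factors of 5 in n!, by Legendre's
    # formula (factors of 2 always outnumber factors of 5).
    count, p = 0, 5
    while p <= n:
        count += n // p
        p *= 5
    return count

print(trailing_zeros(25))                  # 6: the intended answer
print(str(math.factorial(25)).count("0"))  # 9: zeros in the full value
```

For 25! that is floor(25/5) + floor(25/25) = 5 + 1 = 6, versus the 9 zeros in the written-out value above.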

-jg

Reply to
Jim Granville

Incompetence, laziness, and the collusion of language designers/implementors with hardware implementors.

and if you want the sin(1.0e300), let us first clarify whether you mean sin(10^300) = sin(10000.......0000000) = approximately -0.98575042516037699660904753142989, which can easily be computed by Mathematica or (the free) computer algebra system Maxima (née Macsyma),

or do you want to start with the number that is represented by the closest double-precision binary approximation to 10^300 [1d300 in Fortran, usually; the single-precision literal 1e300 is "too big"]. This number is

1000000000000000052504760255204420248704468581108159154915854115511802457988908195786371375080447864043704443832883878176942523235360430575644792184786706982848387200926575803737830233794788090059368953234970799945081119038967640880074652742780142494579258788820056842838115669472196386865459400540160 and its sin is quite different, about -0.81788191211
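The distinction is easy to see in Python, whose floats are IEEE doubles: int() converts a double to its exact integer value, so one can check directly that the stored number is not 10^300 (the sin value printed assumes a libm that does exact range reduction, as glibc does).

```python
import math

x = 1e300                  # the closest double to 10**300
exact = int(x)             # Python gives the double's exact integer value
print(exact == 10**300)    # False: the stored number is not 10**300
print(len(str(exact)))     # 301 digits, of which only ~17 are meaningful
print(math.sin(x))         # sine of the double, about -0.8179 with a libm
                           # that does exact range reduction
```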

Now in regard to claims that it doesn't matter, as long as it is consistent: well, an excellent way to make it consistent is to make it correct. Most other ways of making it "consistent" will not work. Some people might look at half-angle formulas (and half-half-half-angles etc.). Others will look at sin^2+cos^2. If you want 20 decimal digits of the sine of 10^300, you need at least 320 decimal digits of pi, and for some deviously chosen numbers you will need considerably more.

Maxima, Mathematica, etc. know that. There are also several high- (or arbitrary!-) precision packages that know this kind of thing, too. ARPREC and MPFR are two packages that are currently supported. There is also QD, which is good for 212 bits of fraction (quad-double).

Reply to
rjf

In article , rjf writes:
|> On Oct 28, 8:00 pm, snipped-for-privacy@mojaveg.lsan.mdsg-pacwest.com (Everett M. Greene) wrote:
|>

|> > > |> > And the answer is "it depends". It can be negligible, or it can
|> > > |> > be huge, often depending critically on how accurately you want to
|> > > |> > calculate the values at the extreme ranges. sin(1.0e300) or
|> > > |> > pow(1+1.0e-15,1.0e15), for example.
|> >

|> > The IEEE standards type groups are dominated by the academics who
|> > have no touch with reality. Why do you think many implementors
|> > ignore the NaNs, infinities, and denormalized numbers features
|> > of IEEE 754?
|>
|> Incompetence, laziness, and the collusion of language designers/
|> implementors with hardware implementors.

Without necessarily agreeing with Everett Greene, you are merely exposing your ignorance.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

Consistent with what? Given that pi is irrational, I imagine one could produce obscure trigonometrical identities that fail on any system of finite accuracy. (And if you say bignums, I say how much memory have you got, how much address space, how much disc space?)

Requiring consistency with *some* identities but not others strikes me as arbitrary, and if we are allowed to draw *arbitrary* lines beyond which we don't care, then why fret over sin(1e300)?

Show me the realistic application that needs to know sin(1e3), even.

Reply to
Ken Hagan

You overlook practicality...

This is all well and good, but there's the matter of getting useful results with finite resources (hardware, programming, time). Smaller processors don't have the processing power to compute anything as ridiculous as sin(10**300) in a reasonable amount of time (even if the result is meaningful).

TANSTAAFL

Reply to
Everett M. Greene

In article , snipped-for-privacy@mojaveg.lsan.mdsg-pacwest.com (Everett M. Greene) writes:
|>
|> This is all well and good, but there's the matter of getting
|> useful results with finite resources (hardware, programming,
|> time). Smaller processors don't have the processing power to
|> compute anything as ridiculous as sin(10**300) in a reasonable
|> amount of time (even if the result is meaningful).

Actually, nowadays, they do. I can explain how, if you want - and I know that other people can, too.

But we can also explain why dedicating that proportion of a small computer's resources to calculating a meaningless number is, to be polite, completely cuckoo.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

y'think? Does your opinion of my opinion matter? I think there is considerable empirical support for the view that PL designs leave out access to features that are not supported by hardware and vice-versa. Scientific computing emphasizes speed, sometimes at the expense of accuracy. On the other hand, most architects have very little knowledge of, or sympathy for, the use of floating-point arithmetic. Frankly, it's not a big seller, once you can put a check in the box "IEEE floating-point format".

perhaps check ..

Fateman, Richard J. "High-Level Language Implications of the Proposed IEEE Floating-Point Standard." ACM Transactions on Programming Languages and Systems Vol. 4, No. 2 (1982).

as for why sin(10^300) might be of interest -- most "serious" people would realize that this is problematical if they looked at it. But maybe they would not look at it. Note that sin(f(x)) may not look problematical at all in the context of a much larger problem, where f(x) might be big, if you happened to be able to grok f(x) sufficiently. Note that f(x) doesn't have to be 10^300; just 10^17 will cause problems for some systems.

Consider a high-school student who notices that sin(x) can be plotted around -10 < x < 10.

Reply to
rjf

I'd be more excited if you could come up with an example which isn't merely well-known laziness of a few math library implementers on one kind of chip with inadequate range reduction in hardware. Most systems that do HPC use a math library without this flaw. Tilt at something more important, like max() and min() in the presence of NaN... good luck convincing programmers that the standard actually specifies a useful result for those.

-- greg

Reply to
Greg Lindahl

I totally agree: when the number of significant bits is going towards zero, the result can be pretty much anything, and your programs had better be prepared to accept this.

There is only one good reason for handling sin(1e300), and that is that it is in fact possible to do so (for a given definition of handling which is comparable to those used for basic arithmetic):

+-*/sqrt() are all defined as delivering correctly rounded results, starting from the assumption that all/both inputs are exact. When doing a series of such operations, each operation shall pretend that the previous operation was exact, instead of correctly rounded.

For sin(1e300) this really means pretending that the initial conversion to double precision, with an exponent close to the max (2^1022 or so) and a 53-bit mantissa, is exact, right?

Starting from this value, I could then split the mantissa into two parts, each with 26/27 significant bits, then use bignum-style operations to multiply by a few terms of an ~1100-bit approximation to 1/(2*pi).

The fractional part of this multiplication can either be used directly in a custom sin() algorithm, or multiplied back by 2*pi to use the normal sin() libs.

So what is the actual performance/memory cost of doing all this?

We need the 1100 bits of (1/(2*pi)), that's about 140 bytes, or less than 200 if stored as an array of float.

We'll need about 10 fp muls & adds to convert to fractional circles, then another 2 or 4 to convert back, so it will be significantly less than the cost of the core sin() function itself.
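A rough Python sketch of the scheme just described, as an illustration only (not Terje's code): Python's unlimited integers stand in for the bignum-style operations, and the ~1100-bit 1/(2*pi) is generated on the fly with an integer Machin formula rather than stored as a 140-byte table. BITS = 1100 and the Machin generation are assumptions of the sketch.

```python
import math

BITS = 1100  # precision of the stored 1/(2*pi), matching the estimate above

def atan_inv(x, bits):
    # arctan(1/x) scaled by 2**bits, via the alternating Taylor series,
    # in pure integer arithmetic.
    total, term, k = 0, (1 << bits) // x, 0
    while term:
        t = term // (2 * k + 1)
        total += t if k % 2 == 0 else -t
        term //= x * x
        k += 1
    return total

# Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239), scaled by 2**BITS.
PI_BIG = 16 * atan_inv(5, BITS) - 4 * atan_inv(239, BITS)
INV_2PI = (1 << (2 * BITS)) // (2 * PI_BIG)   # (1/(2*pi)) * 2**BITS

def big_sin(x):
    # Pretend the positive double x is exact; compute frac(x / (2*pi))
    # against the wide 1/(2*pi), then hand the reduced angle to libm sin().
    m, e = math.frexp(x)               # x = m * 2**e with 0.5 <= m < 1
    mant = int(m * (1 << 53))          # the 53-bit mantissa, exactly
    prod = mant * INV_2PI              # x/(2*pi) scaled by 2**(BITS + 53 - e)
    shift = BITS + 53 - e
    frac = (prod & ((1 << shift) - 1)) / (1 << shift)
    return math.sin(2 * math.pi * frac)

print(big_sin(1e300), math.sin(1e300))  # the two should agree closely
```

Since x/(2*pi) for x = 1e300 has about 994 bits of integer part, the 1100-bit reciprocal leaves roughly 106 fractional bits of accuracy, which is why the ~1100-bit figure in the post is the right order of magnitude.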

Terje

--
"almost all programming can be viewed as an exercise in caching"
Reply to
Terje Mathisen

In article , rjf writes:
|>
|> y'think? Does your opinion of my opinion matter?

Probably not a lot, as I and people who believe like me have lost the politics. But damned if I am going to let the facts and history be rewritten.

While I cannot prove it, as I did not publish, I predicted that IEEE 754 would NOT be supported by programming languages when I first saw it, for good reasons. 20+ years on, I have been shown to be right.

|> I think there is
|> considerable empirical support for the view that PL designs leave out
|> access to features that are not supported by hardware and vice-versa.

At the basic arithmetic level, yes.

|> Scientific computing emphasizes speed, sometimes at the expense of
|> accuracy. On the other hand, most architects have very little
|> knowledge of, or sympathy for, the use of floating-point arithmetic.
|> Frankly, it's not a big seller, once you can put a check in the box
|> "IEEE floating-point format".

That has been true for at least the past 25 years.

|> perhaps check ..
|>
|> Fateman, Richard J. "High-Level Language Implications of the Proposed
|> IEEE Floating-Point Standard." ACM Transactions on Programming
|> Languages and Systems Vol. 4, No. 2 (1982).

Thank you for reminding me of that. I have looked at it again.

|> ...Richard Fateman, member of the (original) IEEE 754 (binary
|> floating-point) standards committee...

No comment.

Now back to why I said what I said, and why I predicted 20+ years back that IEEE 754 would not be taken up by programming languages, and was right. Currently, none supports it in full, and none even supports enough to make it possible to write portable, reliable code using the features. This will not change.

Denormalised numbers are an irrelevance, provided that an implementation either supports them properly or ALWAYS forces hard underflow (both pre- and post-operation). It is only mixing the two that causes trouble.

Infinities would be easy to support, if they were a pure overflow value (i.e. if 1/0 delivered a NaN). But, because of their specification, most languages avoid them (see later).

The first big killer is that IEEE 754 included only one trapping mode, trap-recover-and-resume. That is a gibbering nightmare to implement or even specify, as soon as you allow ANY optimisation or any form of parallelisation or asynchronicity at the language, compiler, operating system or hardware levels. The traditional and straightforward one, which is useful primarily for debugging, is trap-diagnose-and-terminate. LIA-1 gets that right, but came too late to recover the situation.

If you look at the Fortran standard carefully, you will see that it is impossible to specify trap-recover-and-resume. Dammit, in 40 years, Fortran hasn't even managed to find a way of specifying the behaviour of impure functions, despite it being a major known problem with the standard all that time. C and C++ came later, and the same remark applies to them.

If you have ever implemented language run-time systems with proper trap-recover-and-resume (and I have, several times), you will know what an evil task it is. Few modern programmers can even imagine how to start.

So we have to stick with the IEEE 754 default carry-on-regardless mode and rely on flags.

Well, let's start with them. Global flags (EVEN the limited global forms permitted in IEEE 754R) were a known disaster area well before 1980. To start with, they create havoc with optimisation, especially anything involving asynchronicity or parallelisation. And, boy, do they just! The number of inconsistencies I have found on various systems is legion - and 95% of those have not been bugs.

But this also applies to specification. Fortran has ALWAYS permitted code movement, including function calls. The proponents of flags say that you can handle this by moving the flags with the code; this is hideously expensive for small code fragments, but let that pass. More importantly, it is not true. Fortran has NEVER required code to be executed if its value is not needed, and does not even specify its behaviour in terms of an abstract, serial von Neumann machine. Including useful flag setting is a nightmare.

Fortran 2003 merely PERMITS IEEE 754 flags to be set - it doesn't say anything about whether they must be. In fact, an implementation can conform by doing everything except setting them! C99 is much worse, but I am not starting on that here.

Now let's get onto signed zero and its effect on infinities. The problem about signed zeroes is that they destroy many mathematical invariants, and the fact that 1/0 = infinity means that a program can't simply treat -0 = +0. Lots of optimisations become invalid, and users are expected to write to an arithmetic model that confuses the hell out of 99% of them. What's more, it encourages languages and implementations to not trap errors, such as 1/0 or sign(0).

But NaNs are the worst. While they LOOK simple, they aren't. They are easy to propagate only for simple arithmetic expressions; as soon as they go into a function or any non-trivial code, it must either be "NaN-aware" or it will fail. Take, for example, IFs and other conditionals. I tried writing some code using the IEEE 754 "NaNs compare false" convention (and I am a careful programmer with massive experience), and I couldn't. I have never met anyone who could, except just perhaps Kahan himself.
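The trap is easy to reproduce; a small illustration in Python (the same thing happens in C or Fortran, since every ordered comparison involving a NaN is false):

```python
import math

nan = float("nan")

print(nan == nan)              # False: a NaN is not even equal to itself
print(nan < 1.0, nan >= 1.0)   # False False: every ordered comparison is false

def clamp(x, lo, hi):
    # An ordinary-looking function that is not NaN-aware: with x = NaN,
    # neither branch fires and the NaN leaks straight through.
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x

print(clamp(float("nan"), 0.0, 1.0))   # nan
```

Rewriting the first test as "if not (x >= lo): return lo" would instead absorb the NaN into lo - a different and equally surprising behaviour, which is exactly the difficulty with writing code to the "NaNs compare false" convention.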

Now, this could easily be solved in a pure functional language, but let's get real. The only simple solution for conventional languages is an error exit on using NaNs when they make no sense. And that is what we don't have.

And so on. But let's stop here.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

Please do.

Reply to
mk

In article , mk writes:
|> >In article ,
|> > snipped-for-privacy@mojaveg.lsan.mdsg-pacwest.com (Everett M. Greene) writes:
|> >|>
|> >|> This is all well and good, but there's the matter of getting
|> >|> useful results with finite resources (hardware, programming,
|> >|> time). Smaller processors don't have the processing power to
|> >|> compute anything as ridiculous as sin(10**300) in a reasonable
|> >|> amount of time (even if the result is meaningful).
|> >

|> >Actually, nowadays, they do. I can explain how, if you want
|>
|> Please do.

10^308 is about 3^646. Consider a table of all values of sin and cos in quad precision for all powers of three - about 32 KB. You can then use about 33 applications of the standard expansion formulae to get the result to something like 10^-6 ULP.

There are a zillion tweaks to improve that, and it's not FAST, but it would deliver all values in a reasonable time. There are almost certainly much better approaches, too, but I can't be bothered to think of any now.

It's a stupid idea, but it isn't impossible.
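At toy scale the scheme can be sketched in Python. Here the "table" is just sin/cos of powers of three (exact as doubles up to 3^33, so this sketch only covers integers below 3^34, not the full 3^646 range), combined digit by digit in base 3 with the angle-addition formulae; a real implementation would need the table in quad precision, as described.

```python
import math

def sin_by_pow3_table(n):
    # sin(n) for a positive integer n < 3**34, built only from sin/cos of
    # powers of three (3**k is exact as a double for k <= 33) plus the
    # angle-addition formulae sin(a+b) and cos(a+b).
    s, c = 0.0, 1.0                      # running sin/cos of the angle so far
    k = 0
    while n:
        d, n = n % 3, n // 3             # base-3 digit d in {0, 1, 2}
        sk, ck = math.sin(float(3**k)), math.cos(float(3**k))  # "table" entry
        for _ in range(d):               # add 3**k to the angle, d times
            s, c = s * ck + c * sk, c * ck - s * sk
        k += 1
    return s

x = 123456789
print(sin_by_pow3_table(x), math.sin(x))  # agree to roughly 1e-13
```

Each digit costs at most two applications of the addition formula, which is where the "about 33 applications" estimate for a 646-digit base-3 expansion comes from.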

Regards, Nick Maclaren.

Reply to
Nick Maclaren

Easy for you to say :)

The IEEE 754 standard does not require interrupts because, even back then, there were vector machines and the possibility, if not the actuality, of multiple asynchronous floating-point units producing results at difficult-to-predict times, as well as the impracticality of restarting them.

I have written about storing useful information in NaNs for retrospective diagnostics, an idea that I implemented on HP workstations, but could not be done on other hardware, at least not in the same way. I can do some of this in a machine independent way (almost) in Lisp. NaNs are, so far as I can tell, vastly underutilized as a consequence of poor language support.

As for your problems with signed zeros; these and a number of other artifacts of the IEEE standard tend to be compromises established after careful consideration and balancing among different potential uses: which choice provides the most utility and the least inconsistency. A careful examination of the signed zero situation might suggest that there should be 3 zeros: +, -, and 'unsigned' whose inverses would be +infinity, -infinity, and unsigned infinity. (This depends on your view of the real numbers as a subset of the complex numbers). But there aren't enough bits, so where do you compromise? You can give the argument for leaving off -0, but unless you also present the arguments pro/con and a context in which you weigh the two sides, you are (more or less) striking a pose, not making an evaluation. Some decisions were influenced by, for example, the possibility that one might wish to implement interval arithmetic. If interval arithmetic is unimportant, or if someone comes up with an idea that totally majorizes any prospect for interval arithmetic, then some of the compromises may not be right. [consider rounding directions, in particular]

The standards committee had enough trouble getting the hardware manufacturers' reps to agree to the standard provisions, in hopes that they would pick it up. No one expected the programming language community to be enthusiastic, which is why some people (me, for example) felt compelled to write about it. But we were not so naive: there were, after all, many important and non-conforming machines of substantial stature that were not going away soon: Cray, CDC, IBM, DEC. Designing languages that would run only on 754 hardware did not seem like a good idea then. And even now it doesn't seem like a compelling idea for most PL designers, who are more concerned with "objects" and such, not floating-point numbers.

The software successes of IEEE 754 were not in languages, but in subroutine libraries: for elementary functions (like those of Peter Tang, who showed how to do range reduction of sine etc. without much fuss, I think consistent with the comments in this thread), and packages like LAPACK, as well as more elaborate systems for (e.g.) interval arithmetic, quadruple-precision floats, etc.

RJF

Reply to
rjf

...as evidenced by the fact that the $20 TI-34-II calculator sitting in front of me can handle equations such as

10^10 * 1^99 - 0.99999999999999^99 = 9999999999
using a cheap 4-bit processor. That's way more than anyone will ever need in real life, and the only reason it doesn't handle larger numbers such as 0.999999999999999999999999999999^9999 is the lack of demand and the cost/size constraints of an LCD that displays that many digits.
--
Guy Macon
Reply to
Guy Macon

In article , rjf writes:
|>
|> > And so on. But let's stop here.
|>
|> Easy for you to say :)

You WANT me to carry on? Boggle. I hadn't got onto modes, complex numbers, the complicated rounding-half rule and more.

|> The IEEE 754 standard does not require interrupts because even back
|> then there were vector machines and the possibility if not the
|> actuality of multiple asynchronous floating-point units producing
|> results at difficult-to-predict times, as well as the impracticality
|> of restarting them.

Please DO read my postings before responding. I addressed that in my paragraphs starting "The first big killer" and "If you look at the Fortran standard". IEEE 754's error, and it WAS an error, was not to require trap-diagnose-and-terminate as a programmer-selectable option. That is straightforward on vector machines, and has been the normal mode on them since time immemorial - indeed, on many vector machines, it was the ONLY option.

|> I have written about storing useful information in NaNs for
|> retrospective diagnostics, an idea that I implemented on HP
|> workstations, but could not be done on other hardware, at least not in
|> the same way. I can do some of this in a machine independent way
|> (almost) in Lisp. NaNs are, so far as I can tell, vastly underutilized
|> as a consequence of poor language support.

I find your idea of machine independence most odd. I agree that such a facility could be useful, if it were not for the ease with which the NaN state gets lost.

|> The software successes of IEEE 754 were not in languages, but in
|> subroutine libraries: for elementary functions (like those of Peter
|> Tang, who showed how to do range reduction of sine etc. without much
|> fuss, I think consistent with the comments in this thread), and
|> packages like LAPACK, as well as more elaborate systems for (e.g.)
|> interval arithmetic, quadruple-precision floats, etc.

Oh, really! You are claiming credit for things that had nothing to do with IEEE 754. It has been known how to do range reduction of sin since the 1960s, but nobody could be bothered. The same applies to interval arithmetic on a consistent architecture like the System/360. Quadruple precision? That's 1960s, too, and was partially in hardware from the 1970s.

And, lastly, exactly WHAT are you claiming that the success is with LAPACK? It runs perfectly well on other architectures, and I know of no major extra function that it delivers with IEEE 754.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

Not at all. I considered your statement "But let's stop here." as "Nick wants to have the last word. Don't argue with Nick." I don't feel a need to honor such a request. But if you decide to stop writing, I would not object.

I am sure that such issues were discussed and such an elaboration would not have met with approval. Since IEEE 754 was not killed, I think that categorizing such a (non) feature as a killer is illogical.

NaN state can be preserved, as I have shown (and written into programs). One way is to store a pointer from the body of the NaN (there can be 2^52 different ones in a double-float) into a table of state vectors including flags, registers, PC, backtrace, alphanumeric error codes, whatever else you wish.... In fact I could store information in a functional form that would tell you that a particular NaN-2 was produced by adding 3+NaN-1 ... and NaN-1 has its own (separate) history.
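The low-level mechanics are easy to demonstrate with a toy Python version (my illustration, not RJF's HP implementation; the state-vector table is omitted and the payload is just a small integer packed into the NaN's bits):

```python
import math
import struct

QNAN = 0x7FF8000000000000            # quiet double NaN, empty payload

def make_nan(payload):
    # Store a small integer in the low bits of a quiet NaN. Bit 51 of the
    # significand is the quiet bit, so only 51 payload bits are free.
    assert 0 < payload < (1 << 51)
    return struct.unpack("<d", struct.pack("<Q", QNAN | payload))[0]

def payload_of(x):
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & ((1 << 51) - 1)

x = make_nan(42)                     # "NaN number 42" -> entry 42 of some table
print(math.isnan(x))                 # True
print(payload_of(x + 3.0))           # 42: typical hardware carries the payload along
```

On typical hardware (e.g. x86-64 SSE) arithmetic with a single quiet NaN operand propagates its payload unchanged, which is what makes such retrospective diagnostics possible at all; the collisions and narrowing caveats discussed later in the thread are where this breaks down.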

Wrong. See below.

No one could be bothered because within a few short years the arithmetic would change and the benefits of the software would disappear. There may be excellent programs that were written for the IBM 7090 that embodied ideas that are no longer implemented in current software.

There is a limited supply of quality numerical analysts, and having them convert programs from 7090 to s/360 to vax to cray to sun to intel to ... is a waste that the creation of IEEE 754 went a long way to reducing.

The same applies to

Quad can be done the same way on all IEEE 754 machines. Interval too. Though if the software makes it impossible to set the rounding modes, interval arithmetic is unnecessarily nasty.

Sure, it runs on other architectures, but one would like to say what it computes. The extra functionality that you get is mostly that you get a predictably reliable answer, and that is, for some people, critical. It is certainly critical for the people who are building better versions for parallel machines, sparse problems, etc.

So IEEE 754 does contribute to this too.

RJF

Reply to
rjf

In article , rjf writes:
|>

|> Not at all. I considered your statement "But let's stop here." as
|> "Nick wants to have the last word. Don't argue with Nick." I don't
|> feel a need to honor such a request. But if you decide to stop
|> writing, I would not object.

Your interpretation is not the normal meaning of that expression.

|> I am sure that such issues were discussed and such an elaboration
|> would not have met with approval.

I know that it wouldn't, and I know why.

|> Since IEEE 754 was not killed, I think that categorizing such a (non)
|> feature as a killer is illogical.

Oh, for heaven's sake! This thread started because you made the following claim:

Incompetence, laziness, and the collusion of language designers/ implementors with hardware implementors.

The thing that has been killed is the integration of IEEE 754 into programming languages.

|> > I find your idea of machine independence most odd. I agree that such
|> > a facility could be useful, if it were not for the ease with which the
|> > NaN state gets lost.
|>
|> NaN state can be preserved, as I have shown (and written into
|> programs). One way is to store a pointer from the body of the NaN
|> (there can be 2^52 different ones in a double-float) into a table of
|> state vectors including flags, registers, PC, backtrace, alphanumeric
|> error codes, whatever else you wish....

That is not what I am referring to. I am referring to the erroneous state of an expression, as indicated by a NaN value. Too many normal operations on that expression lead to a result that is a normal and apparently correct value, but which is actually meaningless.

|> No one could be bothered because within a few short years the
|> arithmetic would change and the benefits of the software would
|> disappear. There may be excellent programs that were written for the
|> IBM 7090 that embodied ideas that are no longer implemented in current
|> software.

That is not true. At least in the UK, we knew perfectly well how to do such things portably, and many of us did. We also knew that it was a pointless activity for most purposes.

And that's ignoring the 20-year domination of the industry by the System/360 architecture.

|> There is a limited supply of quality numerical analysts, and having
|> them convert programs from 7090 to s/360 to vax to cray to sun to
|> intel to ... is a waste that the creation of IEEE 754 went a long way
|> to reducing.

The downside is that modern programs deliver consistency and claim that it means correctness. In all of the cases INCLUDING ScaLAPACK that I have tracked down where users reported a bug on non-IEEE architectures, the root cause was that the code was broken and ALL versions were giving unreliable answers.

We used to check for bugs by running numerical code, unchanged, on as many different architectures as we could. It was an extremely useful debugging tool, now sadly unavailable.

|> Sure, it runs on other architectures, but one would like to say what
|> it computes. The extra functionality that you get is mostly that you
|> get a predictably reliable answer, and that is, for some people,
|> critical. It is certainly critical for the people who are building
|> better versions for parallel machines, sparse problems, etc.

Please don't be ridiculous. Firstly, neither Fortran nor C/C++ promise reproducible results, secondly, LAPACK delivers reliable results on all sane architectures and, thirdly, it is NOT critical for parallel versions and sparse problems. I have just spent a decade supporting a major one, where none of the systems had compatible arithmetic, and only the buggy programs (see above) had trouble.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

When two NaNs collide (e.g. add), one of them is lost, and the standard does not specify which one.

The standard does not define how narrowing type conversions affect the NaN payload, except that the payload must survive a widening followed by a narrowing, and should survive a radix conversion if the payload fits in the target format. Your way of exploiting NaN payloads only works in the absence of casts, but that's of course ok in programs that don't do that.

It would have saved me (doing binary/decimal FP conversions) a lot of trouble if IEEE 754(1985) had defined NaN payload narrowing so as to preserve the LOW-order bits (and the Q-bit of course, which is the high-order bit of the trailing significand) -- but it did not, and all 754 implementations I'm aware of preserve the high-order trailing-significand bits when narrowing. This is of course perfectly natural when the Q-bit is considered part of the payload, which is the way it was interpreted in 1985. (The fact that 754(1985) did not settle the polarity of the Q-bit ended up as an annoyance too, because when 1 means "Signal" (as on PA-RISC), automatic Quieting may have to insert another 1-bit into the payload in order to avoid generating an Infinity.) (Indeed, you have only 51 bits to play with, not 52 as you (RJF) stated above.)

If we restrict meaningful NaN payloads to "small" integers (e.g. index or offset into something, or case numbers), and agree to encode these bit-reversed in binary payloads, then NaN payloads can be made to propagate, subject only to indeterminate collisions (where one of the two payloads will survive). That leaves only one small ugliness: is the Q-bit part of the payload or not? Before decimal formats came along, I assumed so, and treated odd payloads as Quiet and even non-zero payloads as Signalling (in conversion routines I wrote in 1997). Now I have to have two views of binary NaN payloads; the new one does not include the Q-bit, and the PFPO instruction of IBM System Z converts between BFP and DFP using this new rule. The saving grace is that nobody seems to have used the ability to convert NaN payloads in the last ten years.

Michel.

Reply to
Michel Hack

In article , Michel Hack writes:
|> On Nov 2, 6:59 pm, rjf wrote:
|>
|> > NaN state can be preserved, as I have shown (and written into
|> > programs). One way is to store a pointer from the body of the NaN
|> > (there can be 2^52 different ones in a double-float) [...]
|>
|> When two NaNs collide (e.g. add), one of them is lost, and the
|> standard does not specify which one.

It does not even specify that one is preserved; it merely recommends it.

That wasn't what I meant by "NaN state", but I agree it is also a relevant point - and a perfectly valid interpretation of the term "NaN state". I don't know how to describe what I mean succinctly except by using that term or equivalent.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

The idea of a NaN is that operations on it should generally not produce a normal result which is, however, incorrect. It does require a different view of comparison, e.g. not(a>b) may not be the same logical value as a<=b, and a>b and a<b can both be false.

Doing correct range reduction was pointless? Why would that be?

At the same time as s/360 there were computers with rather different arithmetic from CDC, Cray, SDS/Xerox, DEC, Univac, as well as IBM 1130, s/3, etc.

Good for you.

To say that a program, to be free of bugs, must run on all architectures, is a peculiar demand unless it is part of the requirement of the program. It would probably make the program longer, slower, and more likely to have bugs.

A requirement that probably will lead to superior programs is to specify the architecture, or alternatively, specify IEEE 754 standard.

Huh? Except if the program calls random(), the results should be reproducible. Maybe you mean compatibility across machines? Maybe we should be writing in Java?

There are situations in which setting the optimization level changes the answers. We hope these situations do not occur in LAPACK, at least in any significant respect.

Perhaps you can write a paper about this, describing your experiences in detail and justifying your opinion. I'm sure that people working on new programs would like to hear you explain how the arithmetic doesn't matter; perhaps you could be more concrete and define "sane architecture". There have been attempts in the past (by, for example, W.S. Brown and D.E. Knuth) and yet they were found wanting.

RJF

Reply to
rjf
