Re: Intel details future Larrabee graphics chip

That's one of the classic HAKMEMs, isn't it?

It is indeed nice. :-)

Terje

--
"almost all programming can be viewed as an exercise in caching"
Reply to
Terje Mathisen

I am getting tired of simply pointing out factual errors, and this will be my last on this sub-thread.

In article , "Wilco Dijkstra" writes: |> |> > |> It's only when you implement the standard you realise many of the issues are |> > |> irrelevant in practice. Take sequence points for example. They are not even |> > |> modelled by most compilers, so whatever ambiguities there are, they simply |> > |> cannot become an issue. |> >

|> > They are relied on, heavily, by ALL compilers that do any serious
|> > optimisation. That is why I have seen many problems caused by them,
|> > and one reason why HPC people still prefer Fortran.
|>
|> It's only source-to-source optimizers that might need to consider these
|> issues, but these are very rare (we bought one of the few still available).
|>
|> Most compilers, including the highly optimizing ones, do almost all
|> optimization at a far lower level. This not only avoids most of the issues
|> you're talking about, but it also ensures badly behaved programs are
|> correctly optimized, while well behaved programs are still optimized
|> aggressively.

I spent 10 years managing a wide range of HPC machines (and have advised on such uses for much longer). You are wrong in all respects, as you can find out if you look. Try Sun's and IBM's compiler documentation, for a start, and most of the others (though I can't now remember which).

Your claims that it isn't a problem would make anyone with significant HPC experience laugh hollowly. Few other people use aggressive optimisation on whole, complicated programs. Even I don't, for most code.
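
For anyone not steeped in the standard, a minimal sketch of the kind of sequence-point problem being argued about (my own example, not one from either poster):

    #include <stdio.h>

    int main(void)
    {
        int i = 0;
        int a[2] = {0, 0};

        /* a[i] = i++;
         *
         * The classic sequence-point violation: 'i' is modified and also
         * read (to choose the array element) between the same pair of
         * sequence points, so the behaviour is undefined and an aggressive
         * optimiser may legitimately reorder the read and the increment. */

        /* Well-defined rewrite: separate the use from the modification. */
        a[i] = i;
        i++;

        printf("a[0]=%d i=%d\n", a[0], i);
        return 0;
    }

Different compilers (or different optimisation levels) can give different answers for the commented-out line, which is exactly the latitude an aggressive optimiser relies on.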

|> > |> Similarly various standards pedants are moaning
|> > |> about shifts not being portable, but they can never mention a compiler that
|> > |> fails to implement them as expected...
|> >

|> > Shifts are portable if you code them according to the rules, and don't
|> > rely on unspecified behaviour. I have used compilers that treated
|> > signed right shifts as unsigned, as well as ones that used only the
|> > bottom 5/6/8 bits of the shift value, and ones that raised a 'signal'
|> > on left shift overflow. There are good reasons for all of the
|> > constraints.
|> >

|> > No, I can't remember which, offhand, but they included the ones for
|> > the System/370 and Hitachi S-3600. But there were also some
|> > microprocessor ones - PA-RISC? Alpha?
|>
|> S370, Alpha and PA-RISC all support arithmetic right shifts. There
|> is no information available on the S-3600.

All or almost all of those use only the bottom few bits of the shift. I can't remember the recent systems that had only unsigned shifts, but they may have been in one or other of the various SIMD extensions to various architectures.
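
To make the portability point concrete, here is a sketch (mine, not from the thread) of an arithmetic right shift written only in terms of behaviour the standard guarantees, so it gives the same answer even on a compiler that treats signed right shifts as unsigned:

    #include <stdio.h>

    /* Sign-propagating right shift of a signed int using only defined
     * behaviour.  Requires 0 <= n < width of int; shifting by the full
     * width or more is undefined everywhere. */
    static int asr(int x, unsigned n)
    {
        if (x >= 0)
            return x >> n;             /* fully defined for non-negative x */
        /* For negative x use the identity x >> n == -1 - ((-1 - x) >> n);
         * -1 - x is non-negative, so the inner shift is fully defined. */
        return -1 - ((-1 - x) >> n);
    }

    int main(void)
    {
        printf("%d %d\n", asr(-16, 2), asr(-5, 1));   /* prints -4 -3 */
        return 0;
    }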

|> > Signed left shifts are undefined only if they overflow; that is undefined
|> > because anything can happen (including the CPU stopping). Signed right
|> > shifts are only implementation defined for negative values; that is
|> > because they might be implemented as unsigned shifts.
|>
|> No. The standard is quite explicit that any left shift of a negative value
|> is undefined, even if there is no overflow. This is an inconsistency
|> as compilers change multiplies by a power of 2 into a left shift and vice
|> versa. There is no similar undefined behaviour for multiplies however.

From the standard:

[#4] The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated
bits are filled with zeros. If E1 has an unsigned type, the value of the
result is E1×2^E2, reduced modulo one more than the maximum value
representable in the result type. If E1 has a signed type and nonnegative
value, and E1×2^E2 is representable in the result type, then that is the
resulting value; otherwise, the behavior is undefined.

|> Once we agree that it is feasible to emulate types, it is reasonable to
|> mandate that each implementation supports the sized types.

That is clearly your opinion. Almost all of those of us with experience of when that was claimed before for the previous 'universal' standard disagree.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

In article , snipped-for-privacy@yahoo.com writes:
|>
|> Byte addressability is still uncommon in DSP world. And no, C
|> compilers for DSPs do not emulate char in a manner that you suggested
|> below. They simply treat char and short as the same thing, on 32-bit
|> systems char, short and long are all the same. I am pretty sure that
|> what they do is in full compliance with the C standard.

Well, it is and it isn't :-( There was a heated debate on SC22WG14, both in C89 and C99, where the UK wanted to get the standard made self-consistent. We failed. The current situation is that it is in full compliance for a free-standing compiler, but not really for a hosted one (think EOF). This was claimed not to matter, as all DSP compilers are free-standing!
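
A concrete illustration of the hosted-implementation problem (my sketch): getc() has to return every possible character value plus the out-of-band value EOF through a single int, which only works cleanly if int is wider than char.

    #include <stdio.h>

    /* Count the bytes in a stream.  'c' must be an int, not a char:
     * getc() returns each byte as an unsigned char converted to int, or
     * the negative value EOF at end of file.  On a DSP where char, short
     * and int are all 32 bits, a legitimate byte value can collide with
     * EOF, which is the self-consistency problem referred to above. */
    long count_bytes(FILE *f)
    {
        int c;
        long n = 0;

        while ((c = getc(f)) != EOF)
            n++;
        return n;
    }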

|> > Putting in extra effort to allow for a theoretical system with
|> > sign-magnitude 5-bit char or a 31-bit one-complement int is
|> > completely insane.
|>
|> Agreed

However, allowing for ones with 16- or 32-bit chars, or signed magnitude integers is not. The former is already happening, and there are active, well-supported attempts to introduce the latter (think IEEE 754R). Will they ever succeed? Dunno.

|> It seems you overlooked the main point of Nick's concern - sized types
|> prevent automagical forward compatibility of the source code with
|> larger problems on bigger machines.

Precisely.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

Which factual errors? :-)

And I laugh in their face about their claims of creating a "highly optimizing compiler" that generates incorrect code! Any idiot can write a highly optimizing compiler if it doesn't need to be correct... I know that many of the issues are caused by optimizations originally written for other languages (e.g. Fortran has pretty loose aliasing rules), but which require more checks to be safe in C.

My point is that compilers have to compile existing code correctly - even if it is written badly. It isn't hard to recognise nasty cases; for example, it's common to do *(T*)&var to convert between integer and floating point. Various compilers treat this as an idiom and use direct int<->FP moves, which are more efficient. So this particular case wouldn't even show up when doing type-based alias analysis.
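
For concreteness, a sketch of the idiom being described and of the well-defined alternative (my example, not code from either post; it assumes a 32-bit float):

    #include <stdint.h>
    #include <string.h>

    /* The idiom: reading a float's bit pattern through an integer lvalue.
     * This technically violates the aliasing rules, but many compilers
     * recognise the pattern and emit a direct int<->FP register move. */
    uint32_t float_bits_punned(float f)
    {
        return *(uint32_t *)&f;
    }

    /* Copying the object representation is well defined, and decent
     * compilers turn it into the same single move. */
    uint32_t float_bits_memcpy(float f)
    {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        return u;
    }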

That is typical of all implementations, but it is not a big issue, and the standard is correct in this respect.

Even if you only have unsigned shifts, you can still emulate arithmetic ones. My point is there is no excuse for getting them wrong, even if your name is Cray and you can improve cycle time by not supporting them in hardware.

Exactly my point. It clearly states that ALL left shifts of negative values are undefined, EVEN if they would be representable. The "and nonnegative value" excludes negative values! The correct wording should be something like:

"If E1 has a signed type and E1×2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is implementation defined."

Wilco

Reply to
Wilco Dijkstra

In article , "Wilco Dijkstra" writes: |> |> > [#4] The result of E1 > positions; vacated bits are filled with zeros. If E1 has an |> > unsigned type, the value of the result is E1×2^E2, reduced |> > modulo one more than the maximum value representable in the |> > result type. If E1 has a signed type and nonnegative value, |> > and E1×2^E2 is representable in the result type, then that is |> > the resulting value; otherwise, the behavior is undefined. |> |> Exactly my point. It clearly states that ALL leftshifts of negative values are |> undefined, EVEN if they would be representable. The "and nonnegative value" |> excludes negative values! The correct wording should be something like:

Yes, you are correct there, and I was wrong. I apologise.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

Even though the standard is vague as usual about the relative sizes of integer types besides the minimum sizes, it is widely accepted that int must be larger than char and long long larger than int. That means a 32-bit DSP must support at least 3 different sizes. Even so, making short=int=long is bound to cause trouble; a lot of software can deal with short=int or int=long, but not both.

32-bit wchar_t is OK, but 32-bit char is a bad idea (see above). C99 already allows sign-magnitude integers. Or do you mean BCD integers? That would be a disaster of unimaginable proportions...

That's not true. Most problems do not get "larger" over time. Since DSPs are mentioned, imagine implementing a codec like AMR. You need a certain minimum size to process the fixed-point samples. Larger types do not help at all (one often needs to saturate to a certain width; in other cases you can precalculate the maximum width needed for the required precision). For this kind of problem sized types are the most natural.
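
To make the saturation point concrete, a sketch of the usual fixed-point pattern (generic Q15 arithmetic of the kind such codecs use, not actual AMR code):

    #include <stdint.h>

    /* Clamp a wider intermediate result back to the 16-bit sample width. */
    static int16_t sat16(int32_t x)
    {
        if (x > INT16_MAX) return INT16_MAX;
        if (x < INT16_MIN) return INT16_MIN;
        return (int16_t)x;
    }

    /* Q15 multiply-accumulate: the product needs 32 bits, but the result
     * is saturated straight back to 16 bits, so a wider machine word buys
     * nothing.  (The right shift of a possibly negative product relies on
     * the implementation-defined behaviour discussed earlier; production
     * code uses an explicitly defined rounding shift.) */
    static int16_t mac_q15(int16_t acc, int16_t a, int16_t b)
    {
        int32_t p = ((int32_t)a * b) >> 15;   /* Q15 x Q15 -> Q15 */
        return sat16((int32_t)acc + p);
    }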

Now there are of course cases where the problem does get larger. That's why we've got ptrdiff_t - there is no reason to fix its size. I never said that we should completely abolish variable-sized types, but that the standard should *mandate* that all implementations support the sized types int8, int16 etc.

One of the key advantages of sized types is that software needs less porting effort. Even though Nick will claim his software runs on any system ever made, in reality it's nontrivial to ensure software works on systems with different integer sizes. I bet a lot of C code fails on this 32-bit-only DSP. However, if the sized types were supported, any code would work unchanged. Java uses sized types for the same reason.
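
For reference, C99's <stdint.h> already spells these types out; the disagreement is really about whether the exact-width forms should be mandatory rather than optional. A sketch of the distinction:

    #include <stdint.h>

    /* Exact-width types: optional in C99, present only where the hardware
     * has a matching two's-complement type -- the ones being argued over. */
    typedef int16_t sample_t;         /* fails to compile where no 16-bit type exists */

    /* Least-width and fast types: every implementation must provide them. */
    typedef int_least16_t lsample_t;  /* at least 16 bits, always available */
    typedef int_fast32_t  counter_t;  /* fastest type of at least 32 bits   */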

Wilco

Reply to
Wilco Dijkstra

In article , "Wilco Dijkstra" writes:
|>
|> Even though Nick will claim his software runs on any system
|> ever made,

Please don't be ridiculous. I have never made such a claim, and have used some systems so tricky that I had trouble writing even simple Fortran that worked on them.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

Why don't you count C55? It is relatively new and, according to my understanding of the market, by far the most popular general purpose DSP in the world.

Of those you mentioned I only used Blackfin. Its support for "C" is, indeed, idiomatic, as you call it.

IMHO, the main newness in the DSP world is that on the "simple algorithms, high throughput" front, classic programmable Von Neumann or Harvard machines are less and less competitive with FPGAs. The appearance of HW multipliers in the cost-oriented Spartan and Cyclone series changed the game once and for all. So traditional DSP vendors, esp. TI and ADI, should look for new niches. IMHO, it also means that the C6000 and, to a lesser extent, TigerSharc lines don't have a bright future. On the other hand, C55, Blackfin and flash-based C28 and similar Freescale products are not in danger. Oh, quite off topic...

Reply to
already5chosen

In article , snipped-for-privacy@yahoo.com writes:
|>
|> > I haven't used either of those in
|> > anger, but I believe that they're both more-or-less "C" compliant.
|>
|> Of those you mentioned I only used Blackfin. Its support for "C" is,
|> indeed, idiomatic, as you call it.

It sounds interesting - thanks for the pointer.

|> IMHO, the main newness in the DSP world is that on the "simple algorithms,
|> high throughput" front, classic programmable Von Neumann or Harvard
|> machines are less and less competitive with FPGAs.

Unfortunately, FPGAs have some pretty serious restrictions on the classes of programming paradigm that are appropriate. And some of the trickier cases are important for that market - not typically DSP as such, but controllers. That is the main reason that the FPGA fanatics are wrong that they are going to take over the world, even if they do take over several important markets.

Personally, I would like to see FPGAs become cheap enough for the ordinary hobbyist to use for large projects. We might see some progress with getting away from the domination of the current subset of the Von Neumann model. I still think that dataflow deserves a fresh look, now we have got away from the constraints of the 1980s.

|> Oh, quite off topic...

Totally. You are ordered to stand in the corner for posting something on computer architecture. I will come and join you shortly.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

|> Personally, I would like to see FPGAs become cheap enough for the ordinary
|> hobbyist to use for large projects.

I can't quite figure out how "large projects" belongs in the same statement with "cheap enough" and "ordinary hobbyist".

"Large projects" aside FPGA evaluation boards and development tools are certainly cheap enough for the ordinary hobbyist right now.

Reply to
already5chosen

In article , snipped-for-privacy@yahoo.com writes:
|> >
|> > Personally, I would like to see FPGAs become cheap enough for the
|> > ordinary hobbyist to use for large projects.
|>
|> I can't quite figure out how "large projects" belongs in the same
|> statement with "cheap enough" and "ordinary hobbyist".

Think Linux. Think gcc.

|> "Large projects" aside FPGA evaluation boards and development tools |> are certainly cheap enough for the ordinary hobbyist right now.

Hmm. The last time I looked, the cheap versions were so restrictive as to be implausible.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

I'd rather call RMS and Linus extraordinary hobbyists ;) And the projects themselves didn't retain true self-financing hobbyist status for too long.

Nick, after all these years you should have learned that people are rarely able to read your thoughts. Want us to understand you? Be more specific! Restrictive in what sense? For around $2K/year you can get the tools that are sufficient for 95% of commercial users. Why wouldn't they be good enough for an ordinary hobbyist?

If you are buying a dev. kit you typically get a 1-year software license for free.

formatting link
formatting link
formatting link

I am sure that being at a super-prestigious uni you can get an even better deal from your local Altera or Xilinx representative. IMHO, if it wasn't illegal, they would be glad to pay you for spreading their message in the Cambridge labs.

Reply to
already5chosen

In article , snipped-for-privacy@yahoo.com writes:
|>
|> > |> "Large projects" aside FPGA evaluation boards and development tools
|> > |> are certainly cheap enough for the ordinary hobbyist right now.
|> >

|> > Hmm. The last time I looked, the cheap versions were so restrictive
|> > as to be implausible.
|>
|> Nick, after all these years you should have learned that people are rarely
|> able to read your thoughts. Want us to understand you? Be more
|> specific! Restrictive in what sense?

I started to be, but it got complicated :-( The problems varied with time and company.

|> For around $2K/year you can get
|> the tools that are sufficient for 95% of commercial users. Why
|> wouldn't they be good enough for an ordinary hobbyist?
|>
|> If you are buying a dev. kit you typically get a 1-year software license
|> for free.
|>
|> formatting link
|> formatting link
|> formatting link

Thanks for the update. If I get a moment, I will take a look at that. My personal problem is, of course, that it means learning a completely new skill set - which takes me longer at 60 than it used to!

I know that people work on minor tweaks, but they aren't the really interesting possibilities. A faster error function is useful, and effectively impossible in software, but doesn't lead to any breakthroughs.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

Very, very true -- but the results (emacs, linux, gcc) are things an ordinary hobbyist can use.

Unfortunately, Nick sometimes writes really good insights, and sometimes just blows opinions out his ass. I was more than a little annoyed when I found out that his definition of "Intel didn't do real VM until the 386" really meant "Nick has no clue what Intel 286 VM looked like".

Reply to
Joe Pfeiffer

On Aug 20, 12:35 am, snipped-for-privacy@cus.cam.ac.uk (Nick Maclaren) wrote: [....]

I think that the biggest problem keeping the FPGAs from being used by small companies and hobbyists is the problem of tools. If there were a compiler for FPGAs that was like "gcc", they would be a lot more useful.

The tools are too huge and include lots of things you don't really want.

They are rented to you for perhaps as little as nothing per year but you can't own the tools.

Many of the tools require way too much knowledge about the internal details of the chip to make your source code really portable.

Reply to
MooseFET

I agree.

The forwarding network inside modern OOO processors is very much like dataflow, so it should not be a big step to make a "real" dataflow CPU, where the visible programming model is dataflow. It would probably simplify a lot of things, such as renaming, since it is explicit when a value is dead and its holder can be re-used.

And compiling to dataflow is not really that difficult. The SSA form used in many modern compilers is not far from a dataflow model, with the phi nodes acting as merge nodes, and it should be no major effort to convert SSA into true dataflow (or generate dataflow directly instead).
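
A tiny illustration of that correspondence (my sketch): the phi node in the SSA form below is exactly a dataflow merge node.

    /* Source: */
    int select_abs(int x, int y)
    {
        int t;
        if (x < 0)
            t = -x;
        else
            t = x;
        return t + y;
    }

    /* SSA form, informally:
     *
     *     t1 = -x              ; then-branch
     *     t2 = x               ; else-branch
     *     t3 = phi(t1, t2)     ; merge of the two definitions
     *     r  = t3 + y
     *
     * Read as a dataflow graph, a token for either t1 or t2 arrives at
     * the merge node, and the '+' node fires once both t3 and y are
     * available.  Each value has exactly one producer, so no renaming is
     * needed and its holder can be recycled once its consumers have fired. */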

Torben

Reply to
Torben Ægidius Mogensen

There are a number of algorithms I know that would have great benefit from some dedicated bit-shifting in hardware, as long as the FPGA is not too far removed from the CPU and is of acceptable speed. It usually involves scanning a few hundred to a few tens of thousands of bytes and generating data.
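
My guess at the shape of such an inner loop (purely illustrative, not the actual algorithm meant above): a byte-by-byte scan that extracts bits, cheap to wire up in an FPGA but shift-and-mask heavy in software.

    #include <stddef.h>
    #include <stdint.h>

    /* Count the set bits in a buffer -- a stand-in for the sort of
     * scan-and-generate pass described above.  Software does a shift-and-
     * mask loop per byte; a small hardware block beside the CPU can chew
     * through a whole word per cycle. */
    size_t count_set_bits(const uint8_t *buf, size_t len)
    {
        size_t total = 0;

        while (len--) {
            uint8_t b = *buf++;
            while (b) {
                total += b & 1u;
                b >>= 1;
            }
        }
        return total;
    }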

They haven't understood the importance of portability.

-- mrr

Reply to
Morten Reistad

In article , snipped-for-privacy@pc-003.diku.dk (Torben Ægidius Mogensen) writes:
|>
|> > I still think that dataflow deserves a fresh look, now we have got
|> > away from the constraints of the 1980s.
|>
|> I agree.
|>
|> ...
|>
|> And compiling to dataflow is not really that difficult. The SSA form
|> used in many modern compilers is not far from a dataflow model, with
|> the phi nodes acting as merge nodes, and it should be no major effort
|> to convert SSA into true dataflow (or generate dataflow directly
|> instead).

Yes. The area which interests me, and which is so far unsolved, is how to design a dataflow language that is suitable for the majority of applications currently programmed in Von Neumann ones. The ridiculous thing is that a lot of application requirements fit very naturally into a dataflow paradigm (e.g. GUIs) - the problem is almost entirely in the programming of their components.

Aside: does anyone know why the "Harvard" approach was promoted from being a trivial but important variation of Von Neumann to being of equal rank, starting about 20 years ago? Because it assuredly ain't so, despite the nonsense in Wikipedia, and almost all programming languages have used separate code and data "address spaces" since the invention of COBOL and FORTRAN, and were/are always talked about as using the Von Neumann model (as they do).

Regards, Nick Maclaren.

Reply to
Nick Maclaren

In article , Joe Pfeiffer writes:
|>
|> Unfortunately, Nick sometimes writes really good insights, and
|> sometimes just blows opinions out his ass. I was more than a little
|> annoyed when I found out that his definition of "Intel didn't do real
|> VM until the 386" really meant "Nick has no clue what Intel 286 VM
|> looked like".

Whereas I wasn't at all annoyed when I found out that you didn't know
the difference between real virtual memory, and the rudimentary
mechanisms which were almost totally abandoned in the UK in the 1960s,
and were not called virtual memory by their inventors.

But I am annoyed when you make assertions about me that are false.

I knew then what Intel 286 so-called virtual memory looked like, and I don't call it virtual memory. Nor, interestingly, did most of the people in IBM I talked to - they took a HELL of a long time to learn about virtual memory, but did eventually learn. Other people seem slower.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

of course I'll mostly agree with you ... except for small pockets like the science center

formatting link

some of the people from ctss had gone to the science center on the 4th flr ... and some went to multics on the 5th flr.

science center had done virtual machine implementation in the mid-60s. original was cp40 ... running on a modified 360/40 with address relocation hardware ... and morphed into cp67 when 360/67 (with standard address relocation hardware) became available.

as undergraduate in the late 60s, i rewrote much of cp67 code ... including the virtual memory management and things like page replacement (including creating a global LRU page replacement ... when much of the academic efforts of the period were directed at local LRU page replacement).

this showed up later in the early 80s ... when one of Jim's co-workers at Tandem had done his stanford phd thesis on page replacement algorithms (very similar to what i had done as undergraduate in the late 60s) and there was enormous pressure not to grant a phd on something that wasn't local LRU ... old communication
formatting link
in this post
formatting link

a lot of the work that i had done as undergraduate in the 60s (that had been picked up and shipped cp67 product) ... was dropped in the simplification morph of cp67 (from 360/67) to vm370 (when general availability of address relocation was announced for 370 computers, i.e. 360/67 was only 360 model that had address relocation as standard feature).

for other drift ... a recent folklore post about that period (mostly related to unbundling announcement and starting to charge for software)

formatting link
formatting link

for other folklore ... the announcement that all 370s would ship with virtual memory support ... required that all the other operating systems had to now add support for address relocation. one of the big issues was the heritage of application programs creating (i/o) channel programs and passing them to the supervisor for initiation/execution. While instruction addresses went through address relocation ... i/o channel programs didn't ... they continued to be "real". This created a disconnect ... since application programs (running in virtual address mode) would now be creating the channel programs with virtual addresses. This required the supervisor to create a copy of the passed i/o channel programs (created by applications) and substituting real addresses for the virtual addresses.

CP67 had this kind of translation mechanism from the very beginning ... since it had to take the I/O channel programs created in the virtual machines ... make a copy ... converting all the virtual machine "virtual" addresses into real addresses. The initial transition of the flagship batch operating system (MVT) to virtual memory operation ... involved some simple stub code in MVT ... giving it a single large virtual address space (majority of code continued to run as if it was on real machine that had real storage equivalent to large address space) and crafting "CCWTRANS" (from cp67) into the i/o supervisor (for making the copies of application i/o channel programs, substituting real addresses for virtual). some recent posts mentioning CCWTRANS

formatting link
authoritative IEFBR14 reference
formatting link
EXCP access method
formatting link
EXCP access method

--
40+yrs virtualization experience (since Jan68), online at home since Mar70
Reply to
Anne & Lynn Wheeler
