Re: Intel details future Larrabee graphics chip

- Michel Hack

Wed, Aug 20, 2008 8:57 PM

Ah yes -- but is that high-order one-bit the ORIGINAL one-bit, or is it a one-bit that was shifted into it? ;-)

Michel.

- Joe Pfeiffer

Thu, Aug 21, 2008 3:01 AM

Well, no, actually. Based on your statements at the time, you appeared to have no idea that the 286 had quite an elegant segmented VM scheme. And when I interpreted your comments as not regarding segmented VM as VM, you concluded I wasn't familiar with the earlier mechanisms.

On the other hand (I realized too late that I hadn't said this in the first place), I very much appreciated your information on IBM's (and others') 65nm processes.

- AnimalMagic

Thu, Aug 21, 2008 6:50 AM

NetHack runs on almost anything.

- MooseFET

Sat, Aug 23, 2008 2:52 PM

I can think of several cases where having custom bit-shifting instructions would be very handy.

The simplest case is when you have to swap the byte order of multi-byte numbers. Many processors can exchange two bytes but I don't know of any that can reverse the order of a 4 byte or 8 byte field quickly.

Compressing and decompressing often need numbers to be combined or split along non-byte boundaries. The old TI speech chip used a compressed data stream that was not byte oriented.
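
As a rough illustration of what splitting along non-byte boundaries costs in software, here is a minimal sketch of a bit-stream reader. It is not from the original post: the `BitReader` name and the MSB-first bit order are assumptions made for the example, and it does not reproduce the TI speech chip's actual format.

```python
class BitReader:
    """Reads unsigned fields of arbitrary width from a byte string.

    Assumption for this sketch: bits are consumed MSB first within each byte.
    """

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, counted in bits from the start

    def read(self, width: int) -> int:
        """Return the next `width` bits as an unsigned integer."""
        if self.pos + width > 8 * len(self.data):
            raise EOFError("not enough bits left in the stream")
        value = 0
        for _ in range(width):
            byte = self.data[self.pos >> 3]           # byte holding the bit
            bit = (byte >> (7 - (self.pos & 7))) & 1  # pick the bit, MSB first
            value = (value << 1) | bit
            self.pos += 1
        return value
```

Each field costs a shift, a mask and an OR per bit here (or per byte in a tighter version), which is the kind of work a dedicated bit-field instruction would fold into one operation.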

- MooseFET

Sat, Aug 23, 2008 2:55 PM

[....]

I believe that the idea existed long before it got the tag "Harvard" put onto it.

It is part of the reason why the 8051 is a much better processor than an 8088.

- Joe Pfeiffer

Sat, Aug 23, 2008 5:04 PM

Comes from the Harvard Mark I, which used a paper tape for instructions and some sort of electromechanical storage for data.

- Terje Mathisen

Sat, Aug 23, 2008 5:26 PM

Huh???

Have you ever heard about x86 (BSWAP reg), or any of the SIMD instruction sets, like Altivec?

Terje

--
"almost all programming can be viewed as an exercise in caching"

- Dennis

Sat, Aug 23, 2008 8:16 PM

Actually both x86 and PowerPC have byte-reverse instructions for up to 8 bytes. On the PowerPC the SIMD byte shuffle instruction is a bit faster, but you need to worry about 16-byte alignment.

- Bernd Paysan

Sat, Aug 23, 2008 9:28 PM

That goes back to the first Zuse machine. The actual advance was von Neumann's, because he found that instructions are just another form of data. However, many computer languages haven't caught up to that idea, and still only manipulate data, but don't provide meta-programming features.

No wonder ancient abominations like the 8051 are still popular :-(.

--
Bernd Paysan
"If you want it done right, you have to do it yourself"
http://www.jwdt.com/~paysan/

- malc

Sat, Aug 23, 2008 10:23 PM

The PowerPC bit is news to me. First of all, there are no byte-reverse instructions per se, only byte-reversed loads/stores; secondly, there is no ld/stbrx, so you can only reverse halfwords and words.

Besides, on _x86_ bswap is also limited to 16/32-bit quantities; bswapq is x86_64 material.

P.S. There's a chance I'm missing something about double word byte reverse on PPC64, so please do fill me in.

--
mailto:av1474@comtv.ru

- MooseFET

Sat, Aug 23, 2008 11:49 PM

Yes, I have heard of the SWAP and exchange instructions. Take a look at what they do to a 32-bit field and compare that to the operation needed to convert between big- and little-endian 32-bit values. You will find that they don't do the needed operation in one go.


- MooseFET

Sun, Aug 24, 2008 12:13 AM

On Aug 23, 10:52 pm, MooseFET wrote: [....]

Oops, I must have been asleep while writing that.

- Wilco Dijkstra

Sun, Aug 24, 2008 11:59 AM

x86 bswap converts a 32-bit value between little and big-endian as expected. You can use an endian swap of any size to swap larger values by repeating it several times (and that applies to both pure endian and odd mixed endian values).
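
A minimal sketch of that composition, not taken from the post: a 32-bit byte reversal modelled with shifts and masks, and a 64-bit reversal built from two 32-bit reversals plus an exchange of the halves. The names `bswap32` and `bswap64` are hypothetical helpers for illustration.

```python
MASK32 = 0xFFFFFFFF


def bswap32(x: int) -> int:
    """Reverse the four bytes of a 32-bit value."""
    x &= MASK32
    return (
        ((x & 0x000000FF) << 24)
        | ((x & 0x0000FF00) << 8)
        | ((x & 0x00FF0000) >> 8)
        | ((x & 0xFF000000) >> 24)
    )


def bswap64(x: int) -> int:
    """Reverse eight bytes using two 32-bit swaps and an exchange of halves."""
    lo = x & MASK32
    hi = (x >> 32) & MASK32
    # The swapped low half becomes the new high half, and vice versa.
    return (bswap32(lo) << 32) | bswap32(hi)
```

For example, `bswap32(0x11223344)` gives `0x44332211`, and `bswap64(0x1122334455667788)` gives `0x8877665544332211`.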

Wilco

- Wilco Dijkstra

Sun, Aug 24, 2008 12:25 PM

Decoding/encoding IEEE 754 values typically takes 3-5 instructions on ARM, so it's pretty trivial in software. 754R decimal is a lot more complex of course, but not nearly 100 times. One can decode a 50-bit mantissa in around 20-25 instructions on a 32-bit CPU.

Indeed. Various architectures do implement bitreverse, but it is hardly needed as CPUs already have the ultimate bitshuffle instruction: the lookup table.
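
To make the lookup-table point concrete, here is a small sketch (an illustration, not code from the post) that reverses the bits of a 32-bit value with a 256-entry table: one lookup per byte, with the bytes placed in mirrored positions. `REV8` and `bitreverse32` are hypothetical names.

```python
# 256-entry table: REV8[b] is the byte b with its eight bits in reverse order.
REV8 = [int(f"{b:08b}"[::-1], 2) for b in range(256)]


def bitreverse32(x: int) -> int:
    """Reverse all 32 bits of x using four table lookups."""
    x &= 0xFFFFFFFF
    return (
        (REV8[x & 0xFF] << 24)
        | (REV8[(x >> 8) & 0xFF] << 16)
        | (REV8[(x >> 16) & 0xFF] << 8)
        | REV8[(x >> 24) & 0xFF]
    )
```

The same pattern works for any fixed permutation of bits within a byte; only the table contents change.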

Wilco

- Nick Maclaren

Sun, Aug 24, 2008 12:37 PM

In article , "Wilco Dijkstra" writes:
|>
|> > Try writing code to unpick an IEEE 754 floating-point number; once
|> > you have done that, try doing that with an IEEE 754R decimal one :-)
|> >
|> > With the current ISAs, doing that sort of bit-munging in software
|> > can be a hundred times as expensive as it could be done in a very
|> > small amount of hardware.
|>
|> Decoding/encoding IEEE 754 values typically takes 3-5 instructions on
|> ARM, so it's pretty trivial in software. 754R decimal is a lot more
|> complex of course, but not nearly 100 times. One can decode a 50-bit
|> mantissa in around 20-25 instructions on a 32-bit CPU.

I think that you mistook what I said.

If you can decode an IEEE 754 value in 3-5 instructions, and get all of the special cases right, then it has hardware assistance. Note that merely breaking the number up into fields is the easy part of the decoding. Stopping at that point isn't interesting.

And I said "a hundred times as expensive", not "100 instructions", though it could well be 100 executed instructions. The reason that I said it was expensive is that it will often/usually have a lot of mispredicted branches. You are aware that there are TWO formats of decimal, aren't you?

Regards, Nick Maclaren.

- Wilco Dijkstra

Sun, Aug 24, 2008 2:07 PM

We were talking about decoding bit formats, not about emulation. Zeroes, denorm, Inf or NaN don't need further work beyond decoding into sign, exp and mantissa. The only special case is setting the leading bit, but even that is pretty easy (only set it if exp != 0 and exp != max).
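
The field split and leading-bit rule described above can be sketched as follows. This is an illustration under stated assumptions, not the routine mentioned in the thread: it handles only the 32-bit binary format, and `decode_binary32` is a hypothetical helper name.

```python
import struct


def decode_binary32(value: float) -> tuple:
    """Split an IEEE 754 binary32 value into (sign, biased exponent, significand).

    The implicit leading bit is set only for normal numbers, i.e. when the
    exponent field is neither 0 (zero/denormal) nor 0xFF (Inf/NaN).
    """
    # Reinterpret the float's 32-bit pattern as an unsigned integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", value))
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    mant = bits & 0x7FFFFF
    if exp != 0 and exp != 0xFF:
        mant |= 1 << 23  # make the hidden bit explicit
    return sign, exp, mant
```

For instance, `decode_binary32(1.0)` yields `(0, 127, 0x800000)`, while zero, denormals, Inf and NaN come back with the significand field untouched.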

For example, my 32-bit IEEE floating point multiply emulation routine takes 41 instructions in total. That is without any hardware assistance.

Much of the decoding would be done using a few small lookup tables and conditional moves, something modern CPUs can easily handle. The few remaining branches are likely very predictable (e.g. if (isNaN(x))... )

I'm aware there are 2 decimal formats, but one only has to deal with one at a time. Converting the binary format into the BCD one is indeed very expensive, but that is not decoding.

Wilco

- JosephKK

Sun, Aug 24, 2008 2:34 PM

This is one of the few severe cases of shift to assembler, as the HLLs get too far away from the binary representations and hardware operations.

- JosephKK

Sun, Aug 24, 2008 2:40 PM

I guess you never came across xs3 variant of 8421. It doesn't work for 2421.

- JosephKK

Sun, Aug 24, 2008 2:52 PM

Though I have used both, I do not particularly like either one. I prefer more regular instruction sets. There are plenty of them, e.g. SPARC, 68k, Power, MIPS, etc.

- Nick Maclaren

Sun, Aug 24, 2008 3:08 PM

In article , JosephKK writes:
|>
|> >Try writing code to unpick an IEEE 754 floating-point number; once
|> >you have done that, try doing that with an IEEE 754R decimal one :-)
|> >
|> >With the current ISAs, doing that sort of bit-munging in software
|> >can be a hundred times as expensive as it could be done in a very
|> >small amount of hardware.
|>
|> This is one of the few severe cases of shift to assembler, as the HLLs
|> get too far away from the binary representations and hardware
|> operations.

It doesn't help, actually, compared with a language like C. I have done it in both assembler and C, and there's precious little difference in either the complexity or the performance.

Regards, Nick Maclaren.