Re: Intel details future Larrabee graphics chip

> round, which means that you absolutely have

That is true. But does FP->int conversion require dynamic rounding modes? Many ISAs have integer conversion instructions with a fixed rounding mode. This is far simpler and faster than reading the current rounding mode, changing it to round to zero, doing the conversion and then restoring the previous mode.

Why? On x86, at least, it may be faster to do the conversion via emulation anyway (SSE2 can only do float/double->int in one instruction).
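
Something like this, if you want to see the difference in C (a rough sketch assuming SSE2 and a C99 toolchain; the function names are mine):

    #include <fenv.h>
    #include <math.h>

    /* Portable route: switch the dynamic rounding mode to round-to-zero,
       convert, then restore the old mode. Strictly this also wants
       #pragma STDC FENV_ACCESS ON. */
    int to_int_via_mode_switch(double x)
    {
        int old = fegetround();
        fesetround(FE_TOWARDZERO);
        int r = (int)rint(x);        /* rint() honours the current mode */
        fesetround(old);
        return r;
    }

    /* Fixed-rounding route: a plain C cast truncates by definition, so on
       SSE2 the compiler can emit a single cvttsd2si, no mode switch at all. */
    int to_int_truncating(double x)
    {
        return (int)x;
    }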

Wilco

Reply to
Wilco Dijkstra

The context is clear: we were talking about the properties of the IEEE-754 floating-point format. The encoding has the property that values from zero to infinity use larger encoded values for larger values (the only exceptions are +0 and -0, and NaN, which has no value). This allows the implementation tricks I described.
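
For instance, comparing two floats can be done entirely with integer operations, something like this rough C sketch (assuming IEEE-754 binary32 and matching byte order for float and uint32_t; the helper names are mine, and NaN and -0 still need the special handling mentioned above):

    #include <stdint.h>
    #include <string.h>

    /* Reinterpret the float's bits (memcpy sidesteps strict aliasing). */
    static uint32_t float_bits(float f)
    {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        return u;
    }

    /* Map the sign-magnitude IEEE encoding onto an unsigned scale that
       increases monotonically from -Inf up to +Inf:
         negative floats:     invert every bit
         non-negative floats: set the sign bit
       Note that -0 and +0 get distinct keys, and NaNs have no place in
       this order, so both must be handled separately. */
    static uint32_t sort_key(float f)
    {
        uint32_t u = float_bits(f);
        return (u & 0x80000000u) ? ~u : (u | 0x80000000u);
    }

    /* Compare two non-NaN floats using only integer operations. */
    int float_less(float a, float b)
    {
        return sort_key(a) < sort_key(b);
    }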

Wilco

Reply to
Wilco Dijkstra

Were they all different subsets of IEEE (which is what we were discussing)?

Wilco

Reply to
Wilco Dijkstra

They do not occur in the standard encodings but they could have. This would have lost those signals but made some more floating point values. In this way, they are "in band".

Reply to
MooseFET

In article , MooseFET writes:
|>
|> > >I had seen the implied bit idea elsewhere. The idea of using up codes
|> > >to specify things like NAN, +INF and -INF are things that I have found
|> > >myself disagreeing with. I have never liked "in band signaling".
|> > >That is the only part of the standard I found to be strange.
|> >
|> > >I had to convert some all positive fixed point numbers to IEEE so that
|> > >is the only part I really spent much time on.
|> >
|> > Actually they are not really in band nor denormal. They used codes
|> > that do not occur in the standard value encodings.
|>
|> They do not occur in the standard encodings but they could have. This
|> would have lost those signals but made some more floating point
|> values. In this way, they are "in band".

Not just in that way. The infinities are "in band" because they have many of the semantics of actual numbers. One can argue whether or not that is a good idea (I vacillate according to whether they help or hinder what I am doing!), but there is only one serious consistency error with infinities in IEEE 754. There are lots more in C99 and IEEE 754R, of course.

The error is that the sign of zero is unreliable, but dividing by it gives an infinity; if infinity were created only explicitly or by overflow, there would be no consistency problem.

NaNs were a bit of a mess in IEEE 754 and are a disaster area in C99 and IEEE 754R. God alone knows what they mean, because I know for certain that none of the people working on either of those two standards did or do.

Regards, Nick Maclaren.

Reply to
Nick Maclaren

> round, which means that you absolutely have

No.

Which is of course why Intel does it this way in SSE, having learned an expensive lesson during the x87 years.

Terje

--
- 
"almost all programming can be viewed as an exercise in caching"
Reply to
Terje Mathisen

Well, this is actually what I liked best about the format: rounding overflow from subnormal to normal is automatic, and as somebody else pointed out already, one can "count" through strictly increasing magnitudes using simple integer arithmetic (as long as one watches out for Inf).

I agree though that a test for zero requires a different mask than the other tests, and sometimes that's awkward.
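
Something like this rough C sketch shows the "counting" (assuming IEEE-754 binary32; the function name is mine):

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    /* Step to the next larger magnitude by incrementing the bit pattern:
       a carry out of the significand field bumps the exponent, so the
       subnormal range rolls over into the normal range with no special
       casing. The caller must stop before +Inf (0x7F800000). */
    static float next_magnitude(float f)
    {
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        u += 1;
        memcpy(&f, &u, sizeof f);
        return f;
    }

    int main(void)
    {
        /* The largest subnormal steps straight to the smallest normal. */
        uint32_t bits = 0x007FFFFFu;
        float f;
        memcpy(&f, &bits, sizeof f);
        printf("%a -> %a\n", f, next_magnitude(f));
        /* typically prints: 0x1.fffffcp-127 -> 0x1p-126 (FLT_MIN) */

        /* The zero caveat from above: a magnitude test has to mask off
           the sign bit, since +0 is 0x00000000 but -0 is 0x80000000. */
        return 0;
    }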

Michel.

Reply to
Michel Hack

Caches or not, memory speed has been more performance limiting than CPU speed for decades. Multiple CPUs on a single socket only aggravate this. Multiple memory busses might help.

Reply to
JosephKK

...

BWAAAHAHAHAAAAAA!!!!

Sounds like someone who is fishing for the motivation to upgrade.

I'll let you know when my multiple core CPU cannot use all cores at 100%. Multiple core CPUs are the biggest hardware performance leap in many years. Bet on it.
--
The first big front wheel rollerblades.
http://www.flickr.com/photos/27532210@N04/2565924423/
Google Groups is destroying the USENET archive.
Reply to
John Doe

They may very well show you that they're running at 100% in a CPU use meter administered by a time-sharing OS, but do you know how much of that 100% is the processor stalled, waiting for off-chip memory? [1] Is the throughput on your problem of choice four (or whatever) times what it is on a single core?

Well, sometimes it is. My own algorithms fit neatly into two categories: totally contained in cache (for modern values of cache), and totally memory bandwidth limited, so I am happy to have a couple of extra cores. I can imagine applications where it makes little difference, though.

[1] This is the single statistic that I most wish for, in an operating system performance display, and I don't know how to get it. Is it possible?

Cheers,

--
Andrew
Reply to
Andrew Reilly

...

If I needed to know, I'd probably use Performance Monitor in Windows XP.

It's close enough for me.

Some applications don't take advantage of multiple cores, but that's not necessarily the CPU's fault. A good example is Supreme Commander and a tiny utility called CoreMaximizer. Without the utility, one core bounces against 100% and causes a replay to stutter while the other core sits at 50 or 60%. With the utility, both cores are almost evenly loaded and there is a noticeable improvement in performance, without stuttering.
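
No idea what CoreMaximizer actually does under the hood, but the usual trick for that kind of helper is just the Win32 affinity calls; a rough sketch (my guess, not their code):

    #include <windows.h>

    /* Allow (or pin) a process on a set of cores. Bit n of the mask
       enables core n. This is only a guess at what a helper like
       CoreMaximizer does; the API call itself is standard Win32. */
    BOOL allow_cores(HANDLE process, DWORD_PTR core_mask)
    {
        return SetProcessAffinityMask(process, core_mask);
    }

    /* e.g. allow_cores(GetCurrentProcess(), 0x3);  -- first two cores */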

--
The first big front wheel rollerblades.
http://www.flickr.com/photos/27532210@N04/2565924423/
Reply to
John Doe

Fortunately the third-party synthesizers for FPGAs are not that expensive. And third-party tools offer an easier way of handling many different chip families.

You have the same problem with the "free" software as well. Many of them use time-based licenses, and the Windows-based software might be hard to get working after 10 years. Nowadays virtual machines are really helpful for booting up the really old tools, though.

If you still have some software from the old tool vendor, it's usually quite easy to get temporary licenses for the really old tools. Some time ago I saw one vendor even manage to find installation media for a tool from the middle of the 90s. And they also created new licenses for the tool. No sane person would use the tool for new development, so the keys were easy to get.

--Kim

Reply to
Kim Enkovaara

[snip]

It might very well be:

The Intel EMON counters (and similar on most other architectures) allow you to count the number of cycles spent waiting for memory as well as the number of cache misses.

The ratio of the two gives you clock cycles per miss, and total wait cycles over total cycles is the fraction of performance lost to memory.
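
A worked example of that arithmetic, with made-up counter values (how you actually read the counters -- EMON/RDPMC or an OS tool -- is a separate exercise):

    #include <stdio.h>

    int main(void)
    {
        /* Made-up readings over one sampling interval. */
        double total_cycles     = 1.0e9;  /* unhalted core cycles     */
        double mem_stall_cycles = 3.0e8;  /* cycles stalled on memory */
        double cache_misses     = 1.5e6;  /* last-level cache misses  */

        double cycles_per_miss  = mem_stall_cycles / cache_misses;  /* ~200 */
        double fraction_stalled = mem_stall_cycles / total_cycles;  /* 0.30 */

        printf("%.0f cycles/miss, %.0f%% of 'busy' time spent waiting\n",
               cycles_per_miss, fraction_stalled * 100.0);
        return 0;
    }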

Terje

--
- 
"almost all programming can be viewed as an exercise in caching"
Reply to
Terje Mathisen

Straight out of the box, Performance Monitor doesn't have a suitable counter for this. I believe Sysinternals do a widget that exposes values from the RDPMC instruction. That might help if there were a suitable performance counter to report.

But is there? There are certainly counters for cache misses, but those don't necessarily affect performance in an OoO processor. I suppose you could quote the "total number of instructions retired" value as a proxy for "useful CPU work done", but the 100% level for that depends on the available ILP.

I suppose part of the problem is the lack of a suitable definition for when the CPU is "usefully busy".
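
For what it's worth, reading a counter from user mode is a single instruction once something privileged has set it up; a rough sketch with GCC-style inline asm (x86 only, and it faults unless the OS or a driver has enabled user-mode RDPMC and programmed the counter):

    #include <stdint.h>

    /* Read performance-monitoring counter 'ctr'. Requires CR4.PCE to be
       set and the counter to have been programmed (by the OS or a ring-0
       driver) to count something useful, e.g. memory stall cycles. */
    static inline uint64_t read_pmc(uint32_t ctr)
    {
        uint32_t lo, hi;
        __asm__ __volatile__("rdpmc" : "=a"(lo), "=d"(hi) : "c"(ctr));
        return ((uint64_t)hi << 32) | lo;
    }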

Reply to
Ken Hagan
[....]

Yes, I include the "free" licenses in with the rented ones.

I still have a machine that runs Windows 98, and I have "bochs". I also have the ability to run a machine in DOS 5.0 and "dosemu".

What about when the vendor of the tool has gone out of business? I have a perfect record so far on the chips. Every CPLD I have selected for a product has gone off the market. I selected the Philips "cool runner" for example.

Reply to
MooseFET

If you don't mind a little work figuring out how to use them, the Intel performance-monitoring counters can be configured to do what you want. You have to run a custom ring-0 driver to access them (so it isn't for the faint-hearted). It can be very informative.

Look for ia32 from University of Texas or similar utilities for Linux.

Some of the chess optimisation work done for multiple cores is very enlightening about the difficulties of load balancing an algorithm across 4 or more cores without saturating external memory bandwidth.

Regards, Martin Brown

Reply to
Martin Brown

Looks like some nice stuff there, it will take me a while to dig through it all.

Reply to
JosephKK

> example.

Actually none of them were. They predated IEEE 754.

Reply to
JosephKK

It is going to take me a while to get and go through all that. Meanwhile, I will keep asking questions to direct my study.

Reply to
JosephKK
+---------------
| What about when the vendor of the tool has gone out of business? I
| have a perfect record so far on the chips. Every CPLD I have selected
| for a product has gone off the market. I selected the Philips "cool
| runner" for example.
+---------------

Uh... Don't you mean *Xilinx* "CoolRunner"? That line was still quite active the last time I designed with them (within the last year)!!

formatting link

-Rob

p.s. Ah, yezz, I think I see now where you got the impression that they'd disappeared:

formatting link
... PHILIPS SALE OF CPLD COOLRUNNER PRODUCT LINE TO XILINX. Effective August 1999, Philips Semiconductors sold its CPLD CoolRunner product line to Xilinx ...

formatting link
Product Discontinuation Notice Number DN-42 December 31, 1999 ... - [PDF] list of Philips CPLD parts sold and transferred to xilinx inc.
formatting link
... This notice also contains an Exhibit C that lists clarification about certain previously discontinued products including the listing of CPLD CoolRunner products that were sold to Xilinx last August ...

----- Rob Warnock

627 26th Avenue San Mateo, CA 94403 (650)572-2607
Reply to
Rob Warnock
