Small, fast, resource-rich processor

Hmmm, maybe I have misunderstood. Of course you are right about my wording; I should have written "nowhere near the same level", which is what I actually meant.

Reply to
dp

On Wed, 02 Oct 2013 03:02:15 +0200, glen herrmannsfeldt wrote:

I find this hard to imagine. Can you give (many) examples?

--
(Remove the obvious prefix to reply privately.) 
Created with Opera's e-mail program: http://www.opera.com/mail/
Reply to
Boudewijn Dijkstra

I can't speak for "many", but the FPU found on Power architecture processors won't let you do that directly; you have to move the FP register to memory, then to a general purpose register, modify it, then move it back again to the FP register. That goes for anything you want to do with an FP register other than the add, mul, fmadd etc. FPU opcodes. But then one rarely wants to do that sort of thing anyway, so it is unlikely to be needed in a performance-critical loop.
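For what it's worth, the round trip described here (FP register to memory, then to an integer register) is in the end just a reinterpretation of the same 64 bits. A minimal Python sketch of that reinterpretation, with a sign-flip as the example "non-FPU" operation (the function names and example are mine, not from the post):

```python
import struct

def double_to_bits(x):
    """Reinterpret a double's 64 bits as an integer -- the software
    analogue of storing an FP register to memory and reloading the
    same bytes into a general purpose register."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def bits_to_double(b):
    """The reverse trip: a 64-bit integer pattern back to a double."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]

# Example non-FPU operation: negate by flipping the sign bit (bit 63).
x = 3.5
y = bits_to_double(double_to_bits(x) ^ (1 << 63))
# y is -3.5
```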

Dimiter

------------------------------------------------------
Dimiter Popoff
Transgalactic Instruments


------------------------------------------------------


Reply to
dp

Just to emphasise the point, one of our clients uses 128 bit scaled integers. The application is estimation of construction projects. The difference between the scaled integer solution and the FP solution just for the cost of concrete to cap the piles of the Hong Kong airport was US$ 10,000,000.
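The post doesn't show the arithmetic, but the general failure mode is easy to reproduce: binary floating point cannot represent most decimal amounts exactly, and the rounding error compounds over many line items. A toy illustration (the quantities are made up, not from the actual estimate):

```python
# Sum 1,000,000 line items of $0.10 each.
n = 1_000_000

fp_total = 0.0
for _ in range(n):
    fp_total += 0.10    # 0.10 is not exact in binary; error accumulates

cents_total = n * 10    # scaled integer: work in whole cents

# fp_total drifts away from 100000.0, while cents_total / 100
# is exactly 100000 dollars.
```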

Stephen

--
Stephen Pelc, stephenXXX@mpeforth.com 
MicroProcessor Engineering Ltd - More Real, Less Time 
Reply to
Stephen Pelc

Whaaaa? How are they doing that calculation? Obviously at least one of the two answers is wrong--but which? Have they tried it with something like MPFR and 1000-digit precision?

Reply to
Paul Rubin

Did they also do a sensitivity analysis on the assumptions/presumptions on which the estimate was based? Maybe that $10m is lost in the noise!

Reply to
Tom Gardner

Naturally the integer estimate was closer to reality. Do you really think that my client's competitors are going to explain why/how they got it wrong? This is one of those areas where you should assume the presence of non-disclosure agreements. We just supply tools and consultancy.

Stephen

--
Stephen Pelc, stephenXXX@mpeforth.com 
MicroProcessor Engineering Ltd - More Real, Less Time 
Reply to
Stephen Pelc

David,

I wouldn't have known this if I hadn't just read about it in the spec a few hours earlier. I actually never knew about, and never have used, these "least" types before. However, I have used the intX_t and uintX_t types extensively for years and not being sure which were required is what prompted me to open the spec.

--
Randy Yates 
Digital Signal Labs 
Reply to
Randy Yates

How do you know that? I'd expect it to depend on the numbers and calculations involved. Did you try it with 80 or 128 bit floats?

Reply to
Paul Rubin

depending on if your species is on the decline or ascent, i guess.

i'll admit i've deliberately tried to start a few fires. felt like the old growth needed to yield sunlight to the new undergrowth.

--
r b-j                  rbj@audioimagination.com 

"Imagination is more important than knowledge."
Reply to
robert bristow-johnson

The driving implementation at the time was the Intel 8087. Not too much different from embedded processors of today. The supercomputer guys mostly just looked on with amusement.

That's why 80 and 128 bit were invented.

I'd like to see actual numbers about that. What I seem to be hearing is that the world's top numerics experts spend years agreeing that the right way to solve this problem is to do X, Y, and Z; and then some hardware guy or PHB at a microprocessor vendor says "well I'm smarter than all those experts, so I think X and Y sound fine but I'm going to leave out Z and save 5 cents on transistors". If that's what's going on, it's not impressive.

This is fine; they can have an option like --fast-math for users who want it, though they should also have --ieee-math, preferably as the default. It's less of a problem than the hardware vendor who removes following the standard even as a possibility for the user.

I wonder how often they're actually qualified to make such decisions. One thing about standards is they're codifications of best practices. If someone builds a critical application and something goes wrong because they decided to ignore a standard, they're potentially in a world of hurt.

Reply to
Paul Rubin

I suppose, but there is very little support for 128 outside IBM. IBM has supported it since the 360/85 around 1968.

Well, one thing in IEEE-754 that I don't think is worthwhile is denormals. The 64 bit format has an 11 bit exponent for a range of about -1022 to +1023. Denormals allow, approximately, the range to go down to -1074, and log2(1074) is about 10.07 while log2(1022) is about 10.00. So they buy only about 0.07 bits of additional exponent. How much additional logic does it take to do that?

As I understand it, many implementations trap and fix it up in software. That still takes a fair amount of hardware, and in a deeply pipelined system it is pretty much impossible.

If you really need more range, use an additional whole exponent bit.

But with a fixed number of bits, there is always a tradeoff between significand and exponent bits. Even so, 0.07 bits is pretty small.
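The exponent-range arithmetic can be checked directly: binary64's smallest normal value is 2^-1022 and the smallest denormal is 2^-1074, so in exponent-field terms denormals buy about log2(1074) - log2(1022) ≈ 0.07 bits. A quick check using only the standard library:

```python
import math
import sys

min_normal = sys.float_info.min       # 2**-1022, smallest normal double
min_denorm = 5e-324                   # 2**-1074, smallest denormal double

# The denormal range is real: it is positive, yet halving the smallest
# denormal underflows all the way to zero.
assert min_denorm > 0.0
assert min_denorm / 2 == 0.0

extra_bits = math.log2(1074) - math.log2(1022)   # about 0.07
```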

Well, first of all, too much embedded work uses floating point when fixed point would be a better choice.
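As a concrete illustration of the fixed-point alternative, here is a generic Q15 sketch (the format commonly used on 16-bit DSPs; the example values are mine, not tied to any particular project):

```python
# Q15 fixed point: a real value x in [-1, 1) is stored as the
# integer round(x * 2**15).
Q = 15

def to_q15(x):
    return int(round(x * (1 << Q)))

def from_q15(a):
    return a / (1 << Q)

def q15_mul(a, b):
    # The product of two Q15 values is Q30; shift back down to Q15.
    return (a * b) >> Q

half = to_q15(0.5)        # 16384
quarter = to_q15(0.25)    # 8192
eighth = q15_mul(half, quarter)
# from_q15(eighth) is 0.125 -- exact, deterministic, no FPU needed
```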

OK, say someone is building a heart rate monitor. First, a heart rate will never be NaN, and shouldn't be Inf. It could be zero, though. (But not within the range of floating point.)

For hospital use, it will have to have various certifications, such as from the FDA, but, as far as I know, not from the IEEE. (And it should be done in fixed point!)

-- glen

Reply to
glen herrmannsfeldt

On systems without a hidden one bit, you can add a zero with the appropriate exponent, which shifts the unwanted low-order bits out during alignment, and then post-normalize. On S/360, and HFP in newer processors, you can do that if the number of bits to keep is a multiple of four.

Since the 8087 keeps values on the stack in its internal 80 bit format, which has no hidden one bit, the trick might work in that case. It would, though, be 8087 specific.

Truncating all the fractional bits, the AINT() function in Fortran, is a relatively common operation, and hardware might have some support for that.

Otherwise, you can multiply by an appropriate power of two, truncate all the fractional bits, and divide by the power of two. Some hardware provides instructions to speed that up.

You can, of course, store the value and do all the work using bit manipulations in fixed point.
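The add-a-zero alignment idea has a well-known cousin on hidden-one formats: adding and then subtracting 2^52 forces a double's fraction bits through the rounder. A hedged sketch of that trick (note that under the default round-to-nearest mode it rounds rather than truncates, which is exactly why separate truncation support, as for Fortran's AINT, is still useful):

```python
# For a double x with 0 <= x < 2**51, adding 2**52 pushes x's
# fractional bits past the rounding point; subtracting 2**52 again
# leaves only the (rounded) integer part.
MAGIC = 2.0 ** 52

def round_via_magic(x):
    assert 0.0 <= x < 2.0 ** 51
    return (x + MAGIC) - MAGIC

# round_via_magic(3.2) -> 3.0
# round_via_magic(3.7) -> 4.0  (rounded, not truncated!)
```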

-- glen

Reply to
glen herrmannsfeldt

glen,

That's one of the few references I've seen to an 8087 in the past 15 years. Do the modern Intel processors still have this device buried somewhere deep within their core?

By the way, I can still remember the day (circa 1988) I brought home an 80287 coprocessor to complement my 80286 system and saw my 1024-point double FFT times drop from 100 seconds to 10 seconds; I was amazed!
--
Randy Yates 
Digital Signal Labs 
Reply to
Randy Yates

My college senior thesis involved doing LOTS of simulations of a fairly complicated system (a tractor-trailer rig with three trailers). One day I lost patience, got up from my computer, went to the computer store three towns over and bought an 8087, got back home two hours later, interrupted the simulation that was STILL RUNNING, and installed the 8087 into my XT.

The simulation then completed in something under half an hour, if I remember correctly.

--
Tim Wescott 
Wescott Design Services 
Reply to
Tim Wescott

They had computers way back then?

--
Randy Yates 
Digital Signal Labs 
Reply to
Randy Yates

(snip, I wrote)

Most people say x87, but in this case I decided to say 8087.

The 80287 has the 8087 in its core. The 8087, like the 8088 and 8086, was designed to run on a clock with a 33% duty cycle. (At least as you approach the maximum clock rate; you have to meet the clock low and clock high times.)

The 80286, on the other hand, runs with a 50% clock, and much faster than the 8086. The 80287 internally has much of the 8087 logic running on one (33%) clock, plus bus interface logic to connect to the 80286 at its higher clock rate. The two clocks are asynchronous, synchronized internally.

My first 80286 machine had a socket for an 80287, but not good choices for the clock. I built a little board that plugged into the socket, held the 80287, and supplied it with a separate clock (from an 8284 clock generator).

The 80386, then, was designed to run with either an 80387 (rare in the beginning, and maybe later) or 80287. I don't know how much actual logic from the 8087 went into the 80387. It might have been a complete redesign.

The 80486 and later have the x87 logic built in, in whatever form.

The Pentium FDIV bug hints at a complete redesign of the floating point divide logic, if not the rest.

Continuing the clock story, the original IBM PC had a 14.31818MHz crystal (4*3.579545MHz, the NTSC color subcarrier) to make the CGA easier to build. Using that as the clock for the 8284 and 8088 (divide by three) got us the strange 4.77MHz, running a 5MHz 8088. All to save one crystal.

Since the ISA bus supplied the 14.31818MHz signal, and some cards, especially the CGA, used it, all later ISA bus machines had to supply that clock, even if they didn't use it for anything else.

-- glen

Reply to
glen herrmannsfeldt
[snip]

[snip]

I don't think it adds a lot of additional logic. (I have just recently implemented IEEE floating point using integer arithmetic with the idea of porting to the 6502.) One thing in its favor is that zero is just another denormal. If you don't have logic for denormals, then you have to have special logic to recognize and handle zero.
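The point about zero can be seen right in the binary64 encoding: a denormal has biased exponent field 0, and ±0.0 is just the case where the fraction is also 0. A small decoding sketch (the function name and field split are mine):

```python
import struct

def ieee_fields(x):
    """Split a binary64 value into (sign, biased exponent, fraction)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

# ieee_fields(5e-324) -> (0, 0, 1)     smallest denormal
# ieee_fields(0.0)    -> (0, 0, 0)     zero: same exponent field, fraction 0
# ieee_fields(1.0)    -> (0, 1023, 0)  normal number, hidden one bit implied
```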

Scott

--
Scott Hemphill	hemphill@alumni.caltech.edu 
"This isn't flying.  This is falling, with style."  -- Buzz Lightyear
Reply to
Scott Hemphill

OK, this wasn't very accurate. There are still places you have to "recognize and handle zero". There are other places where zero is just another denormal.

Scott

--
Scott Hemphill	hemphill@alumni.caltech.edu 
"This isn't flying.  This is falling, with style."  -- Buzz Lightyear
Reply to
Scott Hemphill

I think that is poppycock. Whatever it is that the processor is doing to design the updated bitstream can be done in the FPGA if you think about it a bit. What is so special about the "design" process that it can't just be coded into the FPGA in the first place?

The idea of *having* to compile a design and reload every aspect of the FPGA is a bit of a reach, not to mention a very onerous programming task.

--

Rick
Reply to
rickman
