Small, fast, resource-rich processor

Hmmm, maybe I have misunderstood. Of course you are right about my wording; I should have written "nowhere near the same level", which is what I actually meant.

Reply to
dp

On Wed, 02 Oct 2013 03:02:15 +0200, glen herrmannsfeldt wrote:

I find this hard to imagine. Can you give (many) examples?

--
(Remove the obvious prefix to reply privately.) 
Created with Opera's e-mail program: http://www.opera.com/mail/
Reply to
Boudewijn Dijkstra

I can't speak for "many", but the FPU found on Power architecture processors won't let you do that directly; you have to move the FP register to memory, then to a general purpose register, modify it, then move it back again to the FP register. That goes for anything you want to do with an FP register other than the add, mul, fmadd etc. FPU opcodes. But then one rarely wants to do that sort of thing anyway, so it is unlikely to be needed in a performance-critical loop.
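For what it's worth, the round trip described here (FP register to memory, then to an integer register) is in the end just a reinterpretation of the same 64 bits. A minimal Python sketch of that reinterpretation, with a sign-flip as the example "non-FPU" operation (the function names and example are mine, not from the post):

```python
import struct

def double_to_bits(x):
    """Reinterpret a double's 64 bits as an integer -- the software
    analogue of storing an FP register to memory and reloading the
    same bytes into a general purpose register."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

def bits_to_double(b):
    """The reverse trip: a 64-bit integer pattern back to a double."""
    return struct.unpack("<d", struct.pack("<Q", b))[0]

# Example non-FPU operation: negate by flipping the sign bit (bit 63).
x = 3.5
y = bits_to_double(double_to_bits(x) ^ (1 << 63))
# y is -3.5
```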

Dimiter

------------------------------------------------------
Dimiter Popoff
Transgalactic Instruments


------------------------------------------------------


Reply to
dp

Just to emphasise the point, one of our clients uses 128 bit scaled integers. The application is estimation of construction projects. The difference between the scaled integer solution and the FP solution just for the cost of concrete to cap the piles of the Hong Kong airport was US$ 10,000,000.
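The post doesn't show the arithmetic, but the general failure mode is easy to reproduce: binary floating point cannot represent most decimal amounts exactly, and the rounding error compounds over many line items. A toy illustration (the quantities are made up, not from the actual estimate):

```python
# Sum 1,000,000 line items of $0.10 each.
n = 1_000_000

fp_total = 0.0
for _ in range(n):
    fp_total += 0.10    # 0.10 is not exact in binary; error accumulates

cents_total = n * 10    # scaled integer: work in whole cents

# fp_total drifts away from 100000.0, while cents_total / 100
# is exactly 100000 dollars.
```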

Stephen

--
Stephen Pelc, stephenXXX@mpeforth.com 
MicroProcessor Engineering Ltd - More Real, Less Time 
Reply to
Stephen Pelc

Whaaaa? How are they doing that calculation? Obviously at least one of the two answers is wrong--but which? Have they tried it with something like MPFR and 1000-digit precision?

Reply to
Paul Rubin

Did they also do a sensitivity analysis on the assumptions/presumptions on which the estimate was based? Maybe that $10m is lost in the noise!

Reply to
Tom Gardner

Naturally the integer estimate was closer to reality. Do you really think that my client's competitors are going to explain why/how they got it wrong? This is one of those areas where you should assume the presence of non-disclosure agreements. We just supply tools and consultancy.

Stephen

--
Stephen Pelc, stephenXXX@mpeforth.com 
MicroProcessor Engineering Ltd - More Real, Less Time 
Reply to
Stephen Pelc

David,

I wouldn't have known this if I hadn't just read about it in the spec a few hours earlier. I actually never knew about, and never have used, these "least" types before. However, I have used the intX_t and uintX_t types extensively for years and not being sure which were required is what prompted me to open the spec.

--
Randy Yates 
Digital Signal Labs 
Reply to
Randy Yates

How do you know that? I'd expect it to depend on the numbers and calculations involved. Did you try it with 80 or 128 bit floats?

Reply to
Paul Rubin

depending on if your species is on the decline or ascent, i guess.

i'll admit i've deliberately tried to start a few fires. felt like the old growth needed to yield sunlight to the new undergrowth.

--
r b-j                  rbj@audioimagination.com 

"Imagination is more important than knowledge."
Reply to
robert bristow-johnson

The driving implementation at the time was the Intel 8087. Not too much different from embedded processors of today. The supercomputer guys mostly just looked on with amusement.

That's why 80 and 128 bit were invented.

I'd like to see actual numbers about that. What I seem to be hearing is that the world's top numerics experts spend years agreeing that the right way to solve this problem is to do X, Y, and Z; and then some hardware guy or PHB at a microprocessor vendor says "well I'm smarter than all those experts, so I think X and Y sound fine but I'm going to leave out Z and save 5 cents on transistors". If that's what's going on, it's not impressive.

This is fine; they can have an option like --fast-math for users who want it, though they should also have --ieee-math, preferably as the default. It's less of a problem than the hardware vendor who removes following the standard even as a possibility for the user.

I wonder how often they're actually qualified to make such decisions. One thing about standards is they're codifications of best practices. If someone builds a critical application and something goes wrong because they decided to ignore a standard, they're potentially in a world of hurt.

Reply to
Paul Rubin

I suppose, but there is very little support for 128 outside IBM. IBM has supported it since the 360/85 around 1968.

Well, one thing in IEEE-754 that I don't think is worthwhile is denormals. The 64 bit format has an 11 bit exponent for a range of about -1022 to +1023. Denormals allow, approximately, the range to go down to -1074, and log2(1074) is about 10.07 while log2(1022) is about 10.00. So they buy only about 0.07 bits of additional exponent. How much additional logic does it take to do that?

As I understand it, many implementations trap and fix it up in software. That still takes a fair amount of hardware, and in a deeply pipelined system it is pretty much impossible.

If you really need more range, use an additional whole exponent bit.

But with a fixed number of bits, there is always a tradeoff between significand and exponent bits. Even so, 0.07 bits is pretty small.
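The exponent-range arithmetic can be checked directly: binary64's smallest normal value is 2^-1022 and the smallest denormal is 2^-1074, so in exponent-field terms denormals buy about log2(1074) - log2(1022) ≈ 0.07 bits. A quick check using only the standard library:

```python
import math
import sys

min_normal = sys.float_info.min       # 2**-1022, smallest normal double
min_denorm = 5e-324                   # 2**-1074, smallest denormal double

# The denormal range is real: it is positive, yet halving the smallest
# denormal underflows all the way to zero.
assert min_denorm > 0.0
assert min_denorm / 2 == 0.0

extra_bits = math.log2(1074) - math.log2(1022)   # about 0.07
```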

Well, first of all, too much embedded work uses floating point when fixed point would be a better choice.
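As a concrete illustration of the fixed-point alternative, here is a generic Q15 sketch (the format commonly used on 16-bit DSPs; the example values are mine, not tied to any particular project):

```python
# Q15 fixed point: a real value x in [-1, 1) is stored as the
# integer round(x * 2**15).
Q = 15

def to_q15(x):
    return int(round(x * (1 << Q)))

def from_q15(a):
    return a / (1 << Q)

def q15_mul(a, b):
    # The product of two Q15 values is Q30; shift back down to Q15.
    return (a * b) >> Q

half = to_q15(0.5)        # 16384
quarter = to_q15(0.25)    # 8192
eighth = q15_mul(half, quarter)
# from_q15(eighth) is 0.125 -- exact, deterministic, no FPU needed
```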

OK, say someone is building a heart rate monitor. First, a heart rate will never be NaN, and shouldn't be Inf. It could be zero, though. (But not within the range of floating point.)

For hospital use, it will have to have various certifications, such as from the FDA, but, as far as I know, not from the IEEE. (And it should be done in fixed point!)

-- glen

Reply to
glen herrmannsfeldt

On systems without a hidden one bit, you can add a zero with the appropriate exponent, which shifts the unwanted low-order bits out during alignment, and then post-normalize. On S/360, and HFP in newer processors, you can do that if the number of bits to keep is a multiple of four.

Since the 8087 keeps values on the stack in its internal 80 bit format, which has no hidden one bit, the trick might work in that case. It would, though, be 8087 specific.

Truncating all the fractional bits, the AINT() function in Fortran, is a relatively common operation, and hardware might have some support for that.

Otherwise, you can multiply by an appropriate power of two, truncate all the fractional bits, and divide by the power of two. Some hardware provides instructions to speed that up.

You can, of course, store the value and do all the work using bit manipulations in fixed point.
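The add-a-zero alignment idea has a well-known cousin on hidden-one formats: adding and then subtracting 2^52 forces a double's fraction bits through the rounder. A hedged sketch of that trick (note that under the default round-to-nearest mode it rounds rather than truncates, which is exactly why separate truncation support, as for Fortran's AINT, is still useful):

```python
# For a double x with 0 <= x < 2**51, adding 2**52 pushes x's
# fractional bits past the rounding point; subtracting 2**52 again
# leaves only the (rounded) integer part.
MAGIC = 2.0 ** 52

def round_via_magic(x):
    assert 0.0 <= x < 2.0 ** 51
    return (x + MAGIC) - MAGIC

# round_via_magic(3.2) -> 3.0
# round_via_magic(3.7) -> 4.0  (rounded, not truncated!)
```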

-- glen

Reply to
glen herrmannsfeldt

glen,

That's one of the few references I've seen to an 8087 in the past 15 years. Do the modern Intel processors still have this device buried somewhere deep within their core?

By the way, I can still remember the day (circa 1988) I brought home an 80287 coprocessor to complement my 80286 system and saw my 1024-point double FFT times drop from 100 seconds to 10 seconds; I was amazed!
--
Randy Yates 
Digital Signal Labs 
Reply to
Randy Yates

My college senior thesis involved doing LOTS of simulations of a fairly complicated system (a tractor-trailer rig with three trailers). One day I lost patience, got up from my computer, went to the computer store three towns over and bought an 8087, got back home two hours later, interrupted the simulation that was STILL RUNNING, and installed the 8087 into my XT.

The simulation then completed in something under half an hour, if I remember correctly.

--
Tim Wescott 
Wescott Design Services 
Reply to
Tim Wescott

They had computers way back then?

--
Randy Yates 
Digital Signal Labs 
Reply to
Randy Yates

(snip, I wrote)

Most people say x87, but in this case I decided to say 8087.

The 80287 has the 8087 in its core. The 8087, like the 8088 and 8086, was designed to run on a clock with a 33% duty cycle. (At least as you approach the maximum clock rate; you have to meet the clock low and clock high times.)

The 80286, on the other hand, runs with a 50% clock, and much faster than the 8086. The 80287 internally has much of the 8087 logic running on one (33%) clock, plus bus interface logic to connect to the 80286 at its higher clock rate. The two clocks are asynchronous, synchronized internally.

My first 80286 machine had a socket for an 80287, but not good choices for the clock. I built a little board that plugged into the socket, held the 80287, and supplied it with a separate clock (from an 8284 clock generator).

The 80386, then, was designed to run with either an 80387 (rare in the beginning, and maybe later) or 80287. I don't know how much actual logic from the 8087 went into the 80387. It might have been a complete redesign.

The 80486 and later have the x87 logic built in, in whatever form.

The Pentium FDIV bug hints at a complete redesign of the floating point divide logic, if not the rest.

Continuing the clock story, the original IBM PC had a 14.31818MHz crystal (4*3.579545MHz, the NTSC color subcarrier) to make the CGA easier to build. Using that as the clock for the 8284 and 8088 (divide by three) got us the strange 4.77MHz, running a 5MHz 8088. All to save one crystal.

Since the ISA bus supplied the 14.31818MHz signal, and some cards, especially the CGA, used it, all later ISA bus machines had to supply that clock, even if they didn't use it for anything else.

-- glen

Reply to
glen herrmannsfeldt
[snip]

[snip]

I don't think it adds a lot of additional logic. (I have just recently implemented IEEE floating point using integer arithmetic with the idea of porting to the 6502.) One thing in its favor is that zero is just another denormal. If you don't have logic for denormals, then you have to have special logic to recognize and handle zero.
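The point about zero can be seen right in the binary64 encoding: a denormal has biased exponent field 0, and ±0.0 is just the case where the fraction is also 0. A small decoding sketch (the function name and field split are mine):

```python
import struct

def ieee_fields(x):
    """Split a binary64 value into (sign, biased exponent, fraction)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

# ieee_fields(5e-324) -> (0, 0, 1)     smallest denormal
# ieee_fields(0.0)    -> (0, 0, 0)     zero: same exponent field, fraction 0
# ieee_fields(1.0)    -> (0, 1023, 0)  normal number, hidden one bit implied
```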

Scott

--
Scott Hemphill	hemphill@alumni.caltech.edu 
"This isn't flying.  This is falling, with style."  -- Buzz Lightyear
Reply to
Scott Hemphill

OK, this wasn't very accurate. There are still places you have to "recognize and handle zero". There are other places where zero is just another denormal.

Scott

--
Scott Hemphill	hemphill@alumni.caltech.edu 
"This isn't flying.  This is falling, with style."  -- Buzz Lightyear
Reply to
Scott Hemphill

I think that is poppycock. Whatever it is that the processor is doing to design the updated bitstream can be done in the FPGA if you think about it a bit. What is so special about the "design" process that it can't just be coded into the FPGA in the first place?

The idea of *having* to compile a design and reload every aspect of the FPGA is a bit of a reach, not to mention a very onerous programming task.

--

Rick
Reply to
rickman
