It doesn't take much numerical ill-conditioning before you see quite visible and evil effects from IEEE single-precision floats even in straightforward operations like relatively low-order poly evaluation.
Of course if you understand the problem you'll just compute the ratio of max/min singular values or whatever.
Where time is involved, it's quite possible to require more precision than a 24-bit mantissa yields.
Best regards, Spehro Pefhany
--
"it\'s the network..." "The Journey is the reward"
speff@interlog.com Info for manufacturers: http://www.trexon.com
Embedded software/hardware/analog Info for designers: http://www.speff.com
Been there, done that, too old and cranky to ever do it again. Damn, but I hate division. Computing 1/x and multiplying makes more sense. Never dividing makes more.
and getting it efficient as well
I program mostly bare assembly, and usually do all the math right on the spot, inline, with no official format or subroutines. It's all just numbers, right? Sorta like climbing Half Dome with just your bare hands.
I was just playing with the minimal amount of assembly code that will make a nice 4096-point sine table. Looks like maybe a dozen lines or so, but the rounding errors are annnoying. Maybe I'll just stick with the 2048-word lookup table, which is accurate and fast.
I wanted a small 256 sine lookup for an AVR. Being lazy I pulled one from the web. Didn't look right on the scope or spectrum analyser, so pulled another which for some reason had slightly different values. That wasn't right either. Curious I then pulled another 3, all different all crap!. The annoyance was that they all came with their own righteous mathematical analyses. Eventually had to do it myself. Being lazy is sometimes hard work.
For small tables with short word lengths, there must be some subtle arrangement of rounding that nets the purest spectrum. Hurts me head to think about that one.
I suppose you could Fourier it and find the amplitude/phase of all the harmonics, one by one, and try to poke the inverse correction into the table at the sub-LSB level, sort of buried in the rounding noise. Hurts me head.
What does "small 256 sine lookup" mean? 256 points? How many bits per point?
...Jim Thompson
--
| James E.Thompson, P.E. | mens |
| Analog Innovations, Inc. | et |
| Analog/Mixed-Signal ASIC\'s and Discrete Systems | manus |
| Phoenix, Arizona 85048 Skype: Contacts Only | |
| Voice:(480)460-2350 Fax: Available upon request | Brass Rat |
| E-mail Icon at http://www.analog-innovations.com | 1962 |
America: Land of the Free, Because of the Brave
Sure, I do all my EM simulations in single precision because of the memory, more than the speed (which nowadays is usually quite comparable--transistors being free).
Polynomials are surprisingly bad if you evaluate them from the coefficients of powers of X, because of the creeping linear dependence between your basis functions. If you use chebyshev polynomials instead, especially with Clenshaw's summation rule, it's much much better conditioned and almost as fast as Horner's rule for powers of X. [You already use Horner's rule, it's P(x) = a+x*(b+x*(c+x*(...)))].
This is an excellent book for implementing many functions that need to be done on small (or otherwise resource-limited) devices:
formatting link
. In many cases he has surprisingly clever ways of implementing functions that are smaller and faster than what the "straightforward" implementation would be.
Rational polynomials are nowhere near as bad. Are you familiar with Bjorne Gustavsen's "Vector Fitting" work
formatting link
He has Matlab code on his web site, and the algorithm itself has shown up in commercial software -- particularly those aimed at fitting measured S parameters. I believe that both AWR (Microwave Office) and Cascade Microtech were using it at some point, and it's surely founds its way into other packages as well.
Yup, anything other than Horner's would be insane.
Best regards, Spehro Pefhany
--
"it\'s the network..." "The Journey is the reward"
speff@interlog.com Info for manufacturers: http://www.trexon.com
Embedded software/hardware/analog Info for designers: http://www.speff.com
Not wanting to get into FPU yet, I wrote a sine/cosine table for my various assembly programming explorations. 0 to 359 degrees (plus same range for cosine), signed integer so needs a shift after the multiply to get nearly correct values (SIN(angle) * 65534 is what, 20ppm shy of correct).
I generated it with QBasic, with some string processing to put in the spaces and "word" and lowercase hex conventions that I prefer. It compiles in MASM, probably among others.
Sine word 00h, 23ch, 478h, 6b3h, 8eeh, 0b28h, 0d61h, 0f99h, 11d0h word 1406h, 163ah, 186ch, 1a9dh, 1ccbh, 1ef7h, 2121h, 2348h, 256ch word 278eh, 29ach, 2bc7h, 2ddfh, 2ff3h, 3203h, 3410h, 3618h, 381ch word 3a1ch, 3c17h, 3e0eh, 3fffh, 41ech, 43d4h, 45b6h, 4793h, 496ah word 4b3ch, 4d08h, 4ecdh, 508dh, 5246h, 53f9h, 55a5h, 574bh, 58eah word 5a82h, 5c13h, 5d9ch, 5f1fh, 609ah, 620dh, 6379h, 64ddh, 6639h word 678dh, 68d9h, 6a1dh, 6b59h, 6c8ch, 6db7h, 6ed9h, 6ff3h, 7104h word 720ch, 730bh, 7401h, 74eeh, 75d2h, 76adh, 777fh, 7847h, 7906h word 79bbh, 7a67h, 7b0ah, 7ba2h, 7c32h, 7cb7h, 7d33h, 7da5h, 7e0dh word 7e6ch, 7ec0h, 7f0bh, 7f4bh, 7f82h, 7fafh, 7fd2h, 7febh, 7ffah Cosine word 7fffh, 7ffah, 7febh, 7fd2h, 7fafh, 7f82h, 7f4bh, 7f0bh, 7ec0h word 7e6ch, 7e0dh, 7da5h, 7d33h, 7cb7h, 7c32h, 7ba2h, 7b0ah, 7a67h word 79bbh, 7906h, 7847h, 777fh, 76adh, 75d2h, 74eeh, 7401h, 730bh word 720ch, 7104h, 6ff3h, 6ed9h, 6db7h, 6c8ch, 6b59h, 6a1dh, 68d9h (And so on . . .)
It makes a reasonable circle, which is good enough for me.
Generating tables in HLL is pretty typical business.
Tim
--
Deep Friar: a very philosophical monk.
Website: http://webpages.charter.net/dawill/tmoranwms
A list of 256 values, each value 8 bits. A 256 point approximation of one complete sine cycle, rather than the accurate, 16 bit, 1/4 cycle shown by Vladimir.
Has anyone implemented (in hardware) Don Lancaster's Magic Sinewaves scheme?
...Jim Thompson
--
| James E.Thompson, P.E. | mens |
| Analog Innovations, Inc. | et |
| Analog/Mixed-Signal ASIC\'s and Discrete Systems | manus |
| Phoenix, Arizona 85048 Skype: Contacts Only | |
| Voice:(480)460-2350 Fax: Available upon request | Brass Rat |
| E-mail Icon at http://www.analog-innovations.com | 1962 |
America: Land of the Free, Because of the Brave
It depends on how you evaluate them--from coefficients or from a nice stable recurrence. Dividing one polynomial by another can put off the evil day by reducing the required order of numerator and denominator, but it doesn't affect the underlying ill-conditioning of the powers-of-x basis set.
--
| James E.Thompson, P.E. | mens |
| Analog Innovations, Inc. | et |
| Analog/Mixed-Signal ASIC\'s and Discrete Systems | manus |
| Phoenix, Arizona 85048 Skype: Contacts Only | |
| Voice:(480)460-2350 Fax: Available upon request | Brass Rat |
| E-mail Icon at http://www.analog-innovations.com | 1962 |
America: Land of the Free, Because of the Brave
Most of what is puzzling you is some lacking of some fundamentals, and never having had to wade through both integer and floating point at the gate level, like i did. As always, for both integer and floating point, divide is the slowest. For multiply Wallace trees can provide great acceleration, they just do not help all that much for divide. Perhaps you should start with fundamental theory (gate and algorithm level) of computer arithmetic, it is not all that difficult. Then proceed to Booth's algorithm(s) and Wallace trees. Do integer first, then learn barrel shifters. They are crucial to decent FP add/subtract performance. At that point everything is just extensions and combinations of what you already know.
Actually both the 8087 and its successors, and the 68881 and its successors, as well as later MIPS, SPARCs, and HP-PARISC, and IBM POWER processors all do correct and complete IEEE 754 floating point. Most of cause interrupts on each denormal result.
Now that is an inefficient implementation issue.
For the x86 series the FPU was brought onboard with the 486, and with the 68020 in that series. I think it was incorporated in MIPS in the
4000 series and later, and the first SPARCs. Superscalar came later.
Do you know of any processor that handles denormals in hardware, other than by flushing to zero? I haven't found one, and I don't think any exists. That's what I mean by "no hardware support".
ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here.
All logos and trade names are the property of their respective owners.