Math is hard

Hey y'all --

So this is one of those times that my lack of serious math chops is coming round to bite me, and none of my usual books is helping me out. I'm hoping someone has some thoughts.

I'm trying to approximate either exp(-1/(n+1)) or 4^(-1/(n+1)). I can convince myself I don't care which. n is an integer from 1 to 65535, and the result should be fixed-point fractional, probably U0.18. The function output is always between 0 and 1; it goes up like a rocket for small n before leveling off to a steady cruise of >0.9 for the rest of the domain.

I'm working in an FPGA, so I've got adds, multiplies, and table lookups from tables of reasonable size (10s of kb) cheap, but other things (divides especially) are expensive. I can throw several clock cycles at the problem if need be.

Taylor series attacks seem to fail horribly. I feel like there may be some answer where n in [1,127] gets a direct table lookup, and n in [128,65535] gets some other algorithm, possibly with a table boost. Or somehow taking advantage of the fact that log(1-f(n)) is related to log(n)?

Anyone have any thoughts?

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com 
Email address domain is currently out of order.  See above to fix.
Reply to
Rob Gaddi

(snip)

You want an 18 bit function of a 16 bit input. Fits into one BRAM on most current, and not so current, FPGAs. Generate all 65536 values and store them in BRAM (ROM in verilog/VHDL terms).
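As a sanity check on the single-table approach, here's a quick sketch (Python assumed, not from the thread) of generating all 64K U0.18 entries for a BRAM initializer:

```python
import math

SCALE = 1 << 18  # U0.18: 18 fractional bits

def f(n):
    # the target function from the original post
    return math.exp(-1.0 / (n + 1))

# one entry per possible 16-bit input; dump this into the ROM init
rom = [round(f(n) * SCALE) for n in range(65536)]
```

Rounding keeps each entry within half an LSB of the ideal value, so the only error left is quantization.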

-- glen

Reply to
glen herrmannsfeldt

First, instead of Taylor's series, try a best-fit polynomial. I suspect you'll be nearly as disappointed, but you can try.

Second, when best-fit polynomials let me down, I look to interpolation. Depending on the problem this ends up being somewhere between linear and cubic.

You should be aware that for any one way of interpolating from a table of values, there's more than one way of generating the table values in the first place, and some ways are better than others.

Depending on how motivated you are, you can generate splines, do a least-squares best fit, or find an overall minimax solution (minimize the maximum error). Note that this doesn't change the algorithm in your FPGA -- it just changes the values in your look-up tables. If you've got the typical engineer's "jack of all trades" way with math, the easiest routes are formulating the problem as linear interpolation using the known points, or formulating it as cubic splines and looking up the algorithm to generate your tables.

If you want to get fancier, then once you have the problem formulated, least-squares can be done with linear algebra on something like Scilab or Matlab in less time than it takes to lift your finger off the enter key. Getting a minimax solution requires more work both in formulating the problem and in waiting for the optimizer to finish. It also means you have to have a decent optimization software package (which Scilab comes with: I'm not sure if it's native to Matlab or if you have to buy yet another package).

--

Tim Wescott 
Wescott Design Services 
http://www.wescottdesign.com
Reply to
Tim Wescott

Taylor series are great for the theory, but very rarely a useful calculation technique in practice.

In a case like this, I would first look at using a single table - if you can afford the space, use it for simplicity.

Failing that, draw a graph and look at the function as it changes. Also draw some graphs of the derivatives. Any areas of the function where you have a low second derivative are suitable for linear interpolation. Where you have a high second derivative, you need either tightly packed linear interpolation or cubic splines of some sort. There are a few different kinds of cubic splines, depending on things like smooth joins between the parts, least maximal error, least total squared error, etc. But for a limited-range function like this, it is not unrealistic to get bit-perfect results if that is what you need.

So once you have these graphs, you can look at linear interpolation with a fixed step size, linear interpolation with varying step size, cubic splines of some sort, or dividing the range up and using different techniques for different areas.
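To make the second-derivative argument concrete, here's a small check (Python, my own sketch, not from the post) of the worst-case midpoint error of a single linear segment at the steep end versus the flat end:

```python
import math

def f(x):
    return math.exp(-1.0 / (x + 1))

def midpoint_err(a, b):
    # error of a straight line through (a, f(a)) and (b, f(b)),
    # sampled at the midpoint -- roughly |f''| * (b - a)^2 / 8
    return abs((f(a) + f(b)) / 2 - f((a + b) / 2))

steep = midpoint_err(1, 2)       # high curvature near the start
flat = midpoint_err(1000, 1001)  # nearly a straight line out here
```

The ratio between the two is many orders of magnitude, which is why a fixed step size either wastes table space at large n or blows the error budget at small n.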

Reply to
David Brown


The "Generic fixed point function evaluators" might provide a solution.

Caveat: I've played with some of the floating point stuff but not this bit.

-Andy

Reply to
evilkidder

Or if a "full" table is too big, consider reducing the table size and interpolating between table entries.

--
Randy Yates 
Digital Signal Labs 
http://www.digitalsignallabs.com
Reply to
Randy Yates

(snip)

Using a table of interpolation values and an adder.

-- glen

Reply to
glen herrmannsfeldt

Glen,

What devices are you using? My measly Xilinx parts only have 36K bits at most per BRAM. I'd need 32 of those to do this table.
--
Gabor
Reply to
GaborSzakacs

(snip, I wrote)

(snip)

Last time I was working with Xilinx, I only needed tiny RAMs.

The BRAMs have 18 inputs and 18 outputs (which I remembered) but it seems to be dual port, two 9 bit addresses (the part I didn't remember).

The Spartan-6 devices have between 12 and 268 such BRAMs, though.

How many such BRAMs can you use? The easiest way to do interpolation is with look-up tables for the interpolation values.

Start with a 2Kx18 RAM, so 11 address bits. A second 2Kx18 RAM is addressed by the high bits (six in this case) of the address, and the rest (five bits) from the 16 bit address. That, plus an 18 bit adder, possibly pipelined, will give you linear interpolation on the 2Kx18 values.

If that isn't enough, two interpolation RAMs, indexed by combinations of the high and low address bits, will do better. One way to look at the interpolation ROM is that the high bits select a slope, and the low bits select an appropriate multiple of that slope to be added.
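A rough Python model of the scheme glen describes (my sketch; the exact table sizes and the representative-slope choice are assumptions):

```python
import math

FRAC = 1 << 18

def f(n):
    return math.exp(-1.0 / (n + 1))

# main table: 2K x 18, indexed by the top 11 bits of n (step of 32)
main = [round(f(hi << 5) * FRAC) for hi in range(2048)]

# correction table: 2K x 18, indexed by the 6 high bits (slope region,
# 1024 inputs wide) and the 5 low bits (offset within a 32-wide step)
corr = []
for region in range(64):
    base = region << 10
    slope = (f(base + 32) - f(base)) / 32  # representative per-input slope
    corr.extend(round(slope * lo * FRAC) for lo in range(32))

def lookup(n):
    # two table reads plus one 18-bit add
    return main[n >> 5] + corr[((n >> 10) << 5) | (n & 31)]
```

As glen notes, one slope per 1024-wide region is poor where the curve bends hard (small n), which is where the second interpolation RAM or a block multiplier would come in.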

That is what was done when MOS ROMs were about that size. You can also use the block multipliers, with interpolation tables as inputs.

A lot depends on how fast it needs to be, and how many of them you need at the same time.

-- glen

Reply to
glen herrmannsfeldt

to what degree of accuracy do you need this?

for instance, for one octave range, these finite-order polynomials are accurate to something like 1/10000 octave. (i can't seem to find the max error anymore.)

at least for the first few values of integer n. after n gets large enough, you might be able to use a sparse table and interpolate.

--

r b-j                  rbj@audioimagination.com 

"Imagination is more important than knowledge."
Reply to
robert bristow-johnson

Required accuracy? (may be absolute or relative)

A Padé expansion (ratio of polynomials) would fit well, but this involves one division.

Over the [1..65535] range the absolute accuracy you have is

1.013E-3 for order {1,1}
1.054E-6 for order {2,2}
4.71E-10 for order {3,3}

At 2.5E-3, orders {1,2} and {2,1} give worse accuracy than {1,1} and are thus not worth it.

Given that you want an 18-bit result, the {2,2} expansion is clearly enough, but maybe {1,1} will suit your needs.

Optimal expansion (symmetrical absolute max error) is for: x0=3.3613 (order 1,1), x0=3.21001 (order 2,2).

Not much error is introduced when approximating the x0s to 10/3 or 34/10 for (1,1), and 10/3 or 32/10 for (2,2).

With the general form p(x) = k (1 + a x + b x^2)/(1 + c x + d x^2), that gives:

for order (1,1), {k, a, c}:
x0=10/3  => {4/(9 Exp(3/13)), 29/16, 23/36}
x0=34/10 => {27/(61 Exp(5/22)), 49/27, 39/61}

for order (2,2), {k, a, b, c, d}:
x0=10/3  => {337/(727 Exp(3/13)), 1725/674, 2271/1348, 2271/1454, 1803/2908}
x0=32/10 => {883/(1891 Exp(5/21)), 4519/1766, 5947/3532, 5905/3782, 4687/7564}
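A quick numeric check (Python, my own verification, not part of the post) of the order {1,1} coefficient set with x0 = 10/3:

```python
import math

# order {1,1}: p(n) = k (1 + a n) / (1 + c n), x0 = 10/3 coefficients
k = 4 / (9 * math.exp(3 / 13))
a = 29 / 16
c = 23 / 36

def p(n):
    return k * (1 + a * n) / (1 + c * n)

# worst-case absolute error over the whole input range
worst = max(abs(p(n) - math.exp(-1 / (n + 1))) for n in range(1, 65536))
```

The worst-case error lands close to the quoted ~1E-3 for {1,1}, i.e. only about 10 good bits, which is why the {2,2} expansion is the one that reaches 18 bits.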
--
Thanks, 
Fred.
Reply to
Fred Bartoli

The relevant books also seem to be both rare and expensive....

Well, almost all approximations to functions end up as a power series of some sort. If you read Hastings, he uses truncated power series with coefficients optimized to minimize the error over the domain in question. When I wrote a "C" maths library for the IBM/370 (yes, I am a Vintage Computer Geek) I wrote a REXX program to calculate the functions to higher accuracy than needed and then optimize the coefficients of the Taylor series.

Load it into RAM or Flash at startup....

See Above Dave

Reply to
Dave

Quadratic interpolation is easy, and gets arbitrarily close to most smooth curves.

Mock it up in a spreadsheet until its error is acceptably low, then transfer to HDL. If the quadratic coefficients end up being vanishingly small, then you can simplify to linear interpolation.
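For what quadratic interpolation looks like in practice, a sketch (Python; the step size and table layout are my assumptions, not Brian's):

```python
import math

def f(x):
    return math.exp(-1.0 / (x + 1))

STEP = 32
table = [f(i * STEP) for i in range(2050)]  # 2 extra entries for the fit

def quad(n):
    i, r = divmod(n, STEP)
    t = r / STEP
    y0, y1, y2 = table[i], table[i + 1], table[i + 2]
    # Newton forward-difference parabola through three adjacent samples
    return y0 + t * (y1 - y0) + t * (t - 1) / 2 * (y2 - 2 * y1 + y0)
```

In an FPGA the two difference terms could be precomputed as extra table columns, leaving two multiplies and two adds per evaluation.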

- Brian

Reply to
Brian Drummond

What are you actually using the function to accomplish? Maybe there's a way to compute it incrementally.

-- john

Reply to
John Miles

I did a little playing with it again this morning, and tried curve fitting just the first small segment (n = 1:128). Even a cubic fit has immense errors over that short a range. exp() really has pretty vicious behavior.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com 
Email address domain is currently out of order.  See above to fix.
Reply to
Rob Gaddi

Thanks for the all the ideas, everyone. I think at the end of the day, I'm going to solve this problem by choosing to solve a different problem instead.

The application was so trivial as to not be nearly worth all this nonsense: a programmable asymmetric debounce of a digital input with linearish behavior (a little momentary glitch should only slow the resolution a bit, not reset the entire counter). The spec as written called for the time delay to be programmable in steps of 10us. "Huh," I say to myself. "I'll just model this as anti-parallel diodes, programmable resistors, a cap, and a Schmitt trigger." After all, tau is conveniently linear in R in continuous time.

So you've got y[n] = (1-a)x[n] + ay[n-1], and you get the result that for a time constant of N clocks, you get a=exp(-1/(N+1)) for limits of .36 and .63, or a=4^(-1/(N+1)) for 0.25 and 0.75. And then you get the reality that crunching that equation is apparently horrible.
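The debounce model itself is easy to check numerically; a sketch (Python, my own, not the actual HDL):

```python
import math

def debounce_step(y, x, a):
    # one clock of y[n] = (1 - a) x[n] + a y[n-1]
    return (1 - a) * x + a * y

N = 100                    # desired time constant in clocks
a = math.exp(-1 / (N + 1))

# drive a unit step input and watch the output charge like an RC
y = 0.0
history = []
for _ in range(10 * N):
    y = debounce_step(y, 1.0, a)
    history.append(y)
```

With a = exp(-1/(N+1)), the step response crosses 1 - 1/e (about 0.63) after one time constant, matching the RC analogy.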

So the new plan, since no one's overwhelmingly committed to that particular implementation, is that delays will instead be specified as the slew rate at which an up/down counter will be fed; that's proportional to 1/delay, and I'll make the customer do the divide himself in the comfort and privacy of his own FPU.
--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com 
Email address domain is currently out of order.  See above to fix.
Reply to
Rob Gaddi

...

not as bad as log. i can fit exp() over an octave range rather nicely with a 4th-order polynomial. log() needs a 6th-order to be as good over an octave.

but to do the asymptotic exp(-x), any polynomial used to fit this will fail at the tail. you would need 1/x or some other function with a similar asymptote in there. so division is unavoidable, unless you use a LUT.

--

r b-j                  rbj@audioimagination.com 

"Imagination is more important than knowledge."
Reply to
robert bristow-johnson

You can't fit a cubic (or other polynomial) by itself - you need a spline. And for a curve like this, you want varying steps, with far more steps at the lower range. You might be able to divide it up using the position of the leading one bit as the index (i.e., ranges 1..2, 2..4, 4..8, 8..16, etc.) The lowest range - say up to 16 - is probably best handled with a straight lookup table. After that, you'd need 12 sets of coefficients. If cubic interpolation here is not good enough, try higher odd powers (anchor your polynomials at the end points and the derivatives at those points to get a smooth spline).
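The octave-wide ranges fall straight out of the position of n's leading one bit; a tiny sketch (Python) of the segment index:

```python
def segment(n):
    # half-open octave ranges: [1,2) -> 0, [2,4) -> 1, ..., [32768,65536) -> 15
    return n.bit_length() - 1
```

In hardware this could be a small priority encoder on n, with each segment selecting its own set of spline coefficients.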

Reply to
David Brown

Not sure I'm following you there; this could be a problem with my understanding. When I hear people talk about spline fitting, to my mind that's a series of piecewise polynomial approximations, piecewise linear being the degenerate case. Is that a reasonable statement?

The overall range I was trying to span was integer N 1:65535. Trying to fit a cubic to the range 1:128 was an attempt to see whether going to 512 (linearly spaced) pieces would give me a decent approximation. At least in the high-curvature section of small N, the results were ghastly.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com 
Email address domain is currently out of order.  See above to fix.
Reply to
Rob Gaddi

Rob, why is such a fancy debouncing algorithm necessary? Just curious.

--
Randy Yates 
Digital Signal Labs 
http://www.digitalsignallabs.com
Reply to
Randy Yates
