Subsampling and interpolation in hardware using a LUT?

Hi all, I read a technical report and got confused. I would like to ask you guys for help.

In the technical report, they describe a LUT-based approach for approximately implementing a function f(x). Let's assume the input x is a 4-bit number. (The actual width in the report is 25 bits, but take 4 bits for the sake of simplicity.) The problem is that they do not have enough memory to store all 2^4 entries. Their solution is to sub-sample the original range 0..2^4-1 down to 2^2 entries, store those in a LUT with a 2-bit address, and obtain the remaining values by linear interpolation. To do the sub-sampling, I see they use the right-shift operator.

I am trying to infer how they implement this idea in more detail, but I am confused about how they populate the 2-bit-address LUT.

The original x values are: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15.

Using x>>2 as the LUT address, I guess the LUT stores f at the sampled x values x_s as follows:

0 4 8 12

If so, the 2-bit-address LUT is filled with f(x_s). The question is how to obtain the remaining values f(13), f(14), f(15)? Perhaps we need one more memory cell to store f(16) for the interpolation.
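To make the idea concrete, here is a minimal sketch of how such a scheme could work, assuming one extra LUT entry is spent on f(16) so the last segment can be interpolated like the others. The function names and the choice of f(x) = x^2 are my own, not from the report:

```python
def build_lut(f, in_bits=4, shift=2):
    """Subsample f over 0..2^in_bits - 1, storing one extra endpoint
    so the last segment can be interpolated rather than extrapolated."""
    step = 1 << shift
    # Entries at x = 0, 4, 8, 12, plus the extra endpoint f(16)
    return [f(i * step) for i in range((1 << (in_bits - shift)) + 1)]

def lut_interp(lut, x, shift=2):
    """Approximate f(x) by linear interpolation between LUT entries."""
    idx = x >> shift                # high bits: 2-bit LUT address
    frac = x & ((1 << shift) - 1)   # low bits: position within segment
    y0, y1 = lut[idx], lut[idx + 1]
    # In hardware this is one subtract, one multiply, and a shift
    return y0 + ((y1 - y0) * frac >> shift)

# Example with f(x) = x^2 (exact at the stored points)
lut = build_lut(lambda x: x * x)
print(lut_interp(lut, 13))  # interpolates between f(12)=144 and f(16)=256 -> 172
```

Note that f(13) comes out as 172 rather than the exact 169; the interpolated points always carry some error determined by the segment width and the curvature of f.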

In your opinion, how can it work? If you had this problem, how would you implement it?

Thank you

Reply to
A.E lover

I'm not sure what you are looking for. Are the samples f(13) through f(15) the only ones you are having trouble figuring out? To interpolate, you have to have two values, one on either end of a range. So, yes, you would need a value for f(16). Otherwise, you would need to extrapolate by assuming these samples follow the same line as the values between f(8) and f(12). If your actual data is over a wide range and there are a large number of decimated data samples, the extrapolation could work fairly well, but it would require a special circuit to do that. It would just be a lot easier to provide a value for f(16), even if this requires an extra location in memory.

Another possibility, a very messy possibility, would be to break the range into n-1 subranges. In your case of 2^4, this would leave three subranges of 5 elements each, with 4 endpoints, giving 3*5+1=16. So it would fit well in this case. If x_s is larger, say 256, after decimation by 256 from an original x range of 65,536, this requires that the 65,536 samples be subdivided into 255 ranges of 257 points with an extra end point. If x_s is not the square root of the original range, this does not result in evenly distributed interpolation subranges.
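The endpoint arithmetic above can be checked quickly (this just verifies the counting, assuming adjacent subranges share no endpoints except the single extra one):

```python
# 2^4 case: 3 subranges of 5 elements each, plus one extra endpoint
assert 3 * 5 + 1 == 2 ** 4

# 2^16 case: 255 ranges of 257 points, plus an extra end point
assert 255 * 257 + 1 == 65536
```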

Of course, this is just me talking off the top of my head. Others may have worked on this before and found an easier way to deal with it.

Rick

Reply to
rickman

Sounds like all they have done is reduce the number of points.

No, you need at least an implied end value if not an actual one (either the first or the last). It depends on what your end points are: if the process always starts at 0, then use points 3, 7, 11, 15; if the process always ends at 15, then use your set. Otherwise you need 5 points.

Either that or f(0), depending on your application.

A lot depends on your application.

--
Paul Carpenter          | paul@pcserviceselectronics.co.uk
    PC Services
Reply to
Paul Carpenter

To reduce the computational load, you can store both the function values for x = 0, 4, 8 and 12 and also the multiplier by which the fractional part of x should be multiplied; values larger than 12 are then automatically extrapolated.
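A sketch of this per-segment offset-plus-multiplier table, with the slope precomputed at build time (names and the endpoint-based slope here are my assumptions; the slopes could equally be fitted constants):

```python
def build_segment_table(f, in_bits=4, shift=2):
    """Store (offset, slope) per segment. The slope is the multiplier
    applied to the fractional bits of x. The last segment's slope also
    extrapolates x > 12 with no extra run-time LUT entry."""
    step = 1 << shift
    table = []
    for i in range(1 << (in_bits - shift)):
        x0 = i * step
        offset = f(x0)
        # Endpoint slope, computed once at build time
        slope = (f(x0 + step) - f(x0)) / step
        table.append((offset, slope))
    return table

def eval_segment(table, x, shift=2):
    """offset + slope * frac: one multiply and one add per lookup."""
    offset, slope = table[x >> shift]
    frac = x & ((1 << shift) - 1)
    return offset + slope * frac

table = build_segment_table(lambda x: x * x)
```

With this layout, x = 13, 14, 15 simply reuse the slope of the last segment, which is the automatic extrapolation described above.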

Storing the square and even cube terms for each fractional value in the table will follow the function more accurately, and you can then reduce the number of elements in the table.

Paul

Reply to
Paul Keinanen

THAT is the way to do it. You don't need to store *any* data points in memory: the slope and offset can be stored for each segment. This can be applied to whatever sized segments you need, and as you say, it will reduce the computational load. These values can be precomputed to *minimize* the overall error, rather than storing some points exactly and having an error on all the interpolated points, which will mostly be in the same direction within any given segment and therefore not give a minimum error overall.
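One way to realize the error-minimizing precomputation is a per-segment least-squares line instead of a line pinned to the segment endpoints. This is a sketch under that assumption (the function name is mine, and minimax fitting would be another valid choice):

```python
def fit_segment(f, x0, step):
    """Least-squares (offset, slope) over one segment's points,
    minimizing total squared error rather than pinning endpoints.
    t runs over the fractional positions 0 .. step-1 within the segment."""
    xs = range(step)
    ys = [f(x0 + t) for t in xs]
    n = step
    sx = sum(xs)
    sy = sum(ys)
    sxx = sum(t * t for t in xs)
    sxy = sum(t * y for t, y in zip(xs, ys))
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    offset = (sy - slope * sx) / n
    return offset, slope
```

For a function that is already linear the fit is exact; for a curved function the residual error is spread above and below the line instead of accumulating on one side of the segment.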

I knew there had to be a good solution to this.

Rick

Reply to
rickman

Trigonometric functions are often calculated by first taking the argument modulo 2 pi to get it into 0..2pi. The two most significant bits select the quadrant, the next 2-4 bits select the segment, and the remaining bits are inserted into the polynomial a + bx + cx² + dx³ + ..., in which a, b, c, d, ... are constants stored for each segment.

For single precision a 3rd- or 4th-degree polynomial is usually enough; for double precision, degree 7-8 is usually needed, depending on the number of segments used.
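A floating-point sketch of the quadrant-plus-segment scheme for sin, using Taylor coefficients about each segment centre for simplicity (a real table would use minimax-fitted constants, and a hardware version would use fixed-point; the segment count and names here are my choices):

```python
import math

SEG_BITS = 3                 # 8 segments per quadrant
SEGS = 1 << SEG_BITS

def build_segments(degree=3):
    """Per-segment polynomial coefficients for sin on [0, pi/2].
    Taylor expansion about each segment centre; minimax would be better."""
    table = []
    half = math.pi / 2
    for i in range(SEGS):
        c = (i + 0.5) * half / SEGS
        derivs = [math.sin(c), math.cos(c), -math.sin(c), -math.cos(c)]
        coeffs = [derivs[k % 4] / math.factorial(k) for k in range(degree + 1)]
        table.append((c, coeffs))
    return table

def sin_approx(x, table):
    """Argument reduction mod 2*pi, quadrant folding, then per-segment
    polynomial evaluated with Horner's rule."""
    x = math.fmod(x, 2 * math.pi)
    if x < 0:
        x += 2 * math.pi
    quad = int(x / (math.pi / 2))      # top 2 bits: quadrant 0..3
    r = x - quad * (math.pi / 2)
    if quad in (1, 3):
        r = math.pi / 2 - r            # mirror the odd quadrants
    seg = min(int(r / (math.pi / 2) * SEGS), SEGS - 1)
    c, coeffs = table[seg]
    u = r - c
    y = 0.0
    for a in reversed(coeffs):         # Horner's rule
        y = y * u + a
    return -y if quad >= 2 else y      # sign flip for quadrants 2 and 3

table = build_segments()
```

With 8 segments and a degree-3 polynomial the error stays well below single-precision resolution over a full period; raising the degree or the segment count trades constant-memory size against multiply count, as described above.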

Depending on the processor computational speed and available constant memory size, quite a lot of trading can be done between speed and size.

Paul

Reply to
Paul Keinanen
