DSP, arithmetic scaling questions, advice

I've done a lot of control unit (state machine) type designs in the past, but not much in the DSP, arithmetic realm.

I'm working on a program which will generate VHDL or Verilog code to do some arithmetic operations. The generated code should be synthesizable for various FPGA families (the target family is an option to the code-generating program). The function is something along the lines of:

threshold = -15.0   # typically between -20.0 and -10.0, depending on the problem
X = vector of numbers in the range -infinity to 0.0
Y = vector of numbers in the range -n to +n
accumulator = 0
foreach x in X, each y in Y
  if x > threshold
    accumulator += y * exp(x)
  end
end
answer = accumulator + bias
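
In Ruby, that's roughly the following (the sample data is made up, and I'm taking x and y pairwise here - the real iteration depends on the problem):

threshold = -15.0
bias      = 0.5                          # illustrative value
xs = [-0.5, -3.0, -25.0, -9.0]           # made-up sample data
ys = [ 2.0, -1.5,   4.0,  0.25]

accumulator = 0.0
xs.zip(ys).each do |x, y|
  accumulator += y * Math.exp(x) if x > threshold   # skip negligible terms
end
answer = accumulator + bias
p answer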

Currently I've got a lookup table for the exp(x) function, which works fine for this application since the values of x will always be negative and we can disregard values of x below the threshold (for our purposes the result is essentially 0). I'm simulating the lookup table with a case statement in a programming language called Ruby. I can then replace the exp() function with the lookup-table version in the code and compare results. So far the lookup table seems very manageable - 64 entries works just fine. I need to do a bit more research, but it seems like 256 entries or fewer should work for the majority of problems.
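
To make that concrete, the table I generate looks something like this (the entry count and threshold are just the values I happen to be using at the moment):

THRESHOLD = -15.0
ENTRIES   = 64
STEP      = -THRESHOLD / ENTRIES

code = ["def exp_lut(idx)", "  case idx"]
ENTRIES.times do |k|
  x = THRESHOLD + (k + 0.5) * STEP       # sample exp at the bin centre
  code << "  when #{k} then #{Math.exp(x)}"
end
code << "  else 1.0" << "  end" << "end" # fallback near x = 0
eval code.join("\n")                     # stand-in for exp() in the tests

def exp_idx(x)                           # map x onto the table index
  ((x - THRESHOLD) / STEP).floor.clamp(0, ENTRIES - 1)
end

p exp_lut(exp_idx(-3.0))                 # ~ Math.exp(-3.0)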

Now I need to start migrating towards a hardware implementation, so I need some way to represent the values from the exp() function in binary, and I'm looking for advice on how to scale these numbers. I'm currently thinking that, given 8 bits (though the number of bits will probably vary with the application and the size of the FPGA), I have a range of 0 to 255, so exp(0) (which is 1.0, of course) could be represented by 255 and exp(threshold) by 0. Then multiply this integer (between 0 and 255) by the y value, which is itself scaled, and add (or subtract, depending on the sign) the result to/from the accumulator. At that point, I suspect I would need to disregard some number of LSBs in the accumulator (because the exp(x) values were essentially scaled up to integers).
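
In Ruby, the scheme I have in mind looks roughly like this (all the widths are illustrative, and the final shift is the part I'm least sure about):

EXP_BITS = 8
EXP_MAX  = (1 << EXP_BITS) - 1           # exp(0) = 1.0 maps to 255

def exp_scaled(x)                        # stand-in for the LUT output
  (Math.exp(x) * EXP_MAX).round          # exp(threshold) lands near 0
end

threshold = -15.0
pairs = [[-0.5, 64], [-3.0, -48]]        # [x, y], y already scaled to an integer
acc = 0
pairs.each do |x, y_scaled|
  acc += y_scaled * exp_scaled(x) if x > threshold
end
# un-scale by dropping LSBs: >> EXP_BITS divides by 256, a cheap
# stand-in for the exact divide by 255
p(acc >> EXP_BITS)                       # ~ 64*exp(-0.5) - 48*exp(-3.0)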

Am I on the right track?

Phil

Reply to
Phil Tomson

The lower limit of X sounds a little difficult to synthesise :-)

OK, but of course you need to be very careful that you understand all the issues about precision - too few entries in your lookup table will give significant errors in some cases. For well-behaved functions such as exp() it's sometimes appropriate to use a fairly coarse lookup table and then do linear interpolation between the entries. Make each LUT entry hold two values: the value at the chosen point, and the gradient from there to the next point. (Hey, for exp() they're the same!!!) Then split the binary x value into two fields: the most significant bits address the LUT, and the less significant bits are multiplied by the gradient and added to the LUT value to give the first-order result.
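
Since you're prototyping in Ruby, here's a sketch of the idea (the 4/4 split of address and interpolation bits is arbitrary):

ADDR_BITS, FRAC_BITS = 4, 4
N    = 1 << ADDR_BITS                    # 16 entries standing in for 256
X0   = -16.0
STEP = -X0 / N

TABLE = (0...N).map do |k|
  v0 = Math.exp(X0 + k * STEP)
  v1 = Math.exp(X0 + (k + 1) * STEP)
  [v0, v1 - v0]                          # value, and gradient to the next point
end

def exp_interp(xcode)                    # xcode: 8-bit code for x in [-16, 0)
  hi = xcode >> FRAC_BITS                # MSBs address the LUT
  lo = xcode & ((1 << FRAC_BITS) - 1)    # LSBs scale the gradient
  v, g = TABLE[hi]
  v + g * lo / (1 << FRAC_BITS).to_f     # first-order result
end

p exp_interp(208)                        # code 208 is x = -3.0; ~ Math.exp(-3.0)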

Be somewhat wary of simply dropping unwanted LSBs. Whenever you do that, you are introducing a bias into the result of (on average) a 1 in the most significant lost bit. This is usually harmless on a single value, but if you're accumulating many values then the truncation errors can stack up. Consider rounding instead: add a 1 in the most significant lost position, and then drop the LSBs (or, if you prefer, increment the truncated result if the most significant dropped bit is 1).
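
A quick demonstration of the difference, with made-up data:

DROP = 4
HALF = 1 << (DROP - 1)                   # a 1 in the most significant lost position

vals  = Array.new(10_000) { rand(1 << 12) }
exact = vals.sum / (1 << DROP).to_f
trunc = vals.sum { |v| v >> DROP }           # each value biased low
round = vals.sum { |v| (v + HALF) >> DROP }  # add half, then drop the LSBs

p exact - trunc    # large: the ~half-LSB biases stack up (thousands here)
p exact - round    # much smaller (round-half-up keeps only a tiny bias)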

Yes indeed. You are, however, mixing together two slightly different (but closely related) ideas and it might be a good idea to separate them.

First, you are considering scaling all the numbers in the problem (scaling your exp() values by 255). This is of course entirely reasonable if (1) you can get the table into that form easily, and (2) you can "un-scale" the final result before making use of it.

Second, you are quite properly thinking about how to represent non-integral values as bit patterns in hardware. Such bit patterns of course end up looking rather like integers. You then need to worry about dealing with rounding and overflow conditions, because your hardware has a limit on the number of bits at its most significant and at its least significant end. Once again you are obviously aware of this. The usual jargon is "fixed point arithmetic". I'm sure you can find information on this in all kinds of places, but I wrote something about it a year or two back when I built my own fixed-point VHDL library - you can read my notes by downloading the stuff at

formatting link
It's a reasonable "beginners' guide" to fixed-point binary arithmetic. However, please DON'T use the fixed-point package you'll find there, for two reasons: (1) it's not very well tested, and I found a horrible bug in it a couple of months ago; (2) there's a proposed standardised fixed-point package, written by David Bishop, now available at
formatting link
(look for the item "IEEE.Fixed_Pkg").
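
The core representation idea is small enough to sketch here - this is just the concept, not Bishop's package, and the Q4.4 format is arbitrary:

INT_BITS, FRAC_BITS = 4, 4               # Q4.4: integer stores value * 2**4
MAX = (1 << (INT_BITS + FRAC_BITS - 1)) - 1

def to_fixed(r)
  v = (r * (1 << FRAC_BITS)).round       # rounding at the LSB end
  v.clamp(-MAX - 1, MAX)                 # saturation at the MSB end (overflow)
end

def to_real(v)
  v / (1 << FRAC_BITS).to_f
end

p to_fixed(1.4375)                       # => 23
p to_real(to_fixed(1.4375))              # => 1.4375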

Finally, it sounds as though you've done all the prototyping in Ruby so this probably doesn't help, but... if you use something like Matlab then you would be able to simulate all the effects of limited bit precision without too much trouble.

Hope this helps a bit.

--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
VHDL, Verilog, SystemC, Perl, Tcl/Tk, Verification, Project Services

Doulos Ltd. Church Hatch, 22 Market Place, Ringwood, BH24 1AW, UK
Tel: +44 (0)1425 471223          mail:jonathan.bromley@doulos.com
Fax: +44 (0)1425 471573                Web: http://www.doulos.com

The contents of this message may contain personal views which 
are not the views of Doulos Ltd., unless specifically stated.
Reply to
Jonathan Bromley

Right... that's the reason for that threshold test :) But in the hardware implementation, that threshold becomes the minimum value with which we can address the lookup table.

Actually, I'm doing some experiments right now to determine how 'coarse' we can make the lookup table. It depends on the problem, of course, so I also need to come up with a way to analyze a given problem to determine this prior to generating the lookup table. It turns out that for one of the problems a lookup table for exp(x) with only 32 entries works fine, whereas a different problem needs a lookup table closer to 640 entries.

Good point. What I was thinking was that the resolution of my LUT (determined via experiments, as I mentioned above) could determine the number of bits addressing my LUT. So if x were an 8-bit integer (256 locations) and I only need 128 entries in my LUT, then I would only use the 7 most significant bits of x to address the LUT. Does this make sense? Of course this all has to be scaled correctly so that the resolution lost by dropping the lower bit matches the resolution of the LUTs in the experimental results... I suspect that could mean adding locations in the LUT to make it easier to match up the two without losing too much precision. Luckily, so far precision doesn't seem to be a problem.
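
In other words (using the bit counts from the example above):

X_BITS, LUT_BITS = 8, 7

def lut_addr(xcode)
  xcode >> (X_BITS - LUT_BITS)           # drop 1 LSB: two x codes per LUT entry
end

p lut_addr(0b1011_0110)                  # => 91 (0b101_1011)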

Thanks for the pointer, I'll have a look.

Actually, it's quite doable in Ruby as well - there's RHDL (Ruby Hardware Description Language) as well as some other libraries (it's fairly easy to do ranged integers, for example) that should make it fairly easy. Currently I'm generating lookup tables (case statements, actually) for various problem/data sets and varying the range and resolution until I get failures from the testcases associated with those problems, as sketched below. This is easily done by replacing the standard exp() function with the lookup table (still all floating point, using case statements); the next step is to convert to ranged integers. After that I should be able to start having the program generate VHDL.
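
The sweep itself is just a loop; build_exp_lut and run_testcase below are stand-ins for my actual generator and test harness:

# shrink the table until the problem's testcases start failing
# (build_exp_lut and run_testcase are hypothetical stand-ins)
[1024, 512, 256, 128, 64, 32].each do |entries|
  lut = build_exp_lut(entries)
  ok  = testcases.all? { |t| run_testcase(t, lut) }
  puts "#{entries} entries: #{ok ? 'pass' : 'FAIL'}"
  break unless ok                        # smallest passing size was the previous one
end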

Yes, thanks for the pointers.

Phil

Reply to
Phil Tomson

Rather than a straight LUT for the exp function, treat x more like floating point and use a barrel shifter along with a LUT. First convert to base 2: e^x = 2^(x/ln(2)), so multiply x by the constant 1/ln(2). The product generally has an integer and a fractional portion: y = i + f, and 2^(i+f) = (2^i)*(2^f). That suggests a solution: use a look-up table to find 2^f. The table size is now bounded by the precision you require in the significand. Then use a barrel shifter to shift the result of the look-up by the number of bits indicated by the integer portion. Your circuit, then, is a constant multiply to convert to base 2, a look-up table addressed by the fractional bits out of the multiplier, and a barrel shift that shifts the result of the look-up by the integer bits out of the multiplier. You'll of course need a delay on the integer portion to match the latency of the look-up table.
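
In Ruby terms the datapath is (all the widths here are arbitrary - a real design sizes them from the precision requirement):

F        = 12                            # fractional bits in the fixed-point product
LOG2E    = (1.0 / Math.log(2) * (1 << F)).round   # the constant 1/ln(2)
LUT_BITS = 6
POW2 = (0...(1 << LUT_BITS)).map do |k|  # table of 2**-f, significand scaled by 256
  (2.0**(-k.to_f / (1 << LUT_BITS)) * 256).round
end

def exp_neg(a_fixed)                     # a_fixed = |x| * 2**F, for x <= 0
  y = (a_fixed * LOG2E) >> F             # constant multiply: |x|/ln(2) = i + f
  i = y >> F                             # integer part -> barrel-shift count
  f = (y >> (F - LUT_BITS)) & ((1 << LUT_BITS) - 1)   # top bits of the fraction
  POW2[f] >> i                           # look-up, then shift right by i
end

p exp_neg((3.0 * (1 << F)).round) / 256.0   # ~ Math.exp(-3.0)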

Hope this helps you to see a solution.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com  
http://www.andraka.com  

 "They that give up essential liberty to obtain a little 
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759
Reply to
Ray Andraka
