Algorithm for x/63 and x/127

• posted

Dear all,

I believe I have invented a formula for quotient calculation that works for x/63 and x/127.

The formula is as follows and it is divisionless and multiplierless:

y = (((x>>n) + x + ((1<<n) + 1)) >> n) - 1;

Use n=6 for 63, and n=7 for 127.

• posted

For big enough values it won't be right, though those may be bigger than you need. For n=6 it seems to fail first at 4095, and for n=7 at 16383.

Traditionally, integer division truncates. I am not sure the (1<<n)+1 and the final -1 preserve that truncating behavior over the whole range.

• posted

Dear Glen,

your observations are correct! I am trying to avoid any multiplication including the one with the multiplicative inverse (the reciprocal of the constant).

I have updated my claims for the algorithm. It works for any constant divisor of the form 2^n - 1 (though I have only tested n = 2, 3, 4, 5, 6, 7, 8), given the following assumption:

The range for x is [0, 2^(2*n) - 2], i.e. for n = 6 the range indeed is 0:4094.
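Here is a sketch of why the bound holds, for n = 6 (divisor 63). Write x = 63q + r with 0 <= r <= 62; then

```latex
x + \lfloor x/64 \rfloor + 65
  = 63q + r + (q + \delta) + 65
  = 64q + (r + 65 + \delta), \qquad
\delta = \lfloor (r - q)/64 \rfloor \in \{-1, 0\} \;\text{ for } q \le 64 .
```

Since r + 65 + delta then lies in [64, 127], shifting right by 6 gives q + 1, and the final decrement recovers q exactly. This holds for every q <= 64, i.e. for x <= 63*64 + 62 = 4094 = 2^12 - 2. At x = 4095 we get q = 65 and delta = -2, so the result comes out one too small, which matches the first observed failure.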

And a test program:

[code]
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

/* main: */
int main(void)
{
  int qapprox, qexact;
  int i, j;

  /* For each n, test the full claimed range [0, 2^(2n) - 2]. */
  for (i = 2; i < 8; i++) {
    for (j = 1; j < (1 << (2*i)) - 1; j++) {
      qapprox = (((j >> i) + j + ((1 << i) + 1)) >> i) - 1;
      qexact  = j / ((1 << i) - 1);
      if (qapprox != qexact)
        printf("n=%d x=%d: approx=%d exact=%d\n", i, j, qapprox, qexact);
    }
  }
  return 0;
}
[/code]

• posted

Without working through the math, I think that you're using a special case of the more general approximation for the reciprocal of x:

1/x ~ 1/x0 + (x0 - x)/x0^2

Do the math, and I think you'll find that embedded in the expression above. It's just the first two terms of the Taylor's expansion of 1/x, and works without multiplication because x/64 = x >> 6 (ignoring rounding) and because (64 - 63) = 1.
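Concretely, with x0 = 64 and x = 63 the two-term expansion gives

```latex
\frac{1}{63} \approx \frac{1}{64} + \frac{64 - 63}{64^2}
             = \frac{1}{64} + \frac{1}{4096}
             = \frac{65}{4096},
```

so x/63 ~ (65*x) >> 12 = (x + (x << 6)) >> 12, which is the same shift-and-add structure as the posted expression.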

```
--
www.wescottdesign.com
```
• posted

Hi Tim,

Great observation!

Indeed, I see the asymptotic behavior that would be typical of a Taylor expansion: away from x0 the errors escalate, since the approximation diverges.

Still, it would be interesting to see whether there is any known use of this hack predating my claim. It is dated May 16, 2011 in my notes, but no other details survive, apart from remembering that I worked it out by sketching.

• posted

I used a similar method in a floating-point math package I wrote in the mid-70's. The difference is that, to get acceptable accuracy, I had to use the second-order term too.

```
--
Tauno Voipio
```
• posted

On 11.10.2014 at 19:48, Nikolaos Kavvadias wrote:

Sorry to burst your bubble, but that optimization technique has been around since effectively forever; it may well have been in the world since before you were born. A quick search reveals papers at least as far back as 1976.

GCC has had an automatic peep-hole optimizer for stuff like this since about 1994.

• posted

Bubble == totally burst. However, I still don't have a concrete example showing that this specific expression has been in use. Any specific reference is much appreciated!

Since I was born in 1977, this is quite plausible :)

Most probably this is based on Torbjorn's work. I understand that the GCC peephole optimizer works at the RTL level; I don't know if there is a corresponding GIMPLE pass. These optimizers are essentially low-level; one might want to apply the technique even as a source-level optimization:

Sections 9.7 and 9.8:

Eventually it is always possible to derive both a divisionless and multiplierless hack: use HD's magic number (multiplicative inverse) algorithm and then optimize the constant multiplication by the fixed-point magic number into shifts and adds. That approach is great in its genericity and will work for any integer constant divisor, but this formula will perform better for divisors of the form 2^n - 1, if you are only interested in small ranges of x.

• posted

On 11.10.2014 at 21:51, Nikolaos Kavvadias wrote:

Not just based on. The code in GCC _is_ Torbjoern Granlund's work. He wrote a paper about what he did, and got the code included into GCC.

It doesn't generally result in exactly the instruction sequence your example in the OP laid out. But then, that particular sequence wouldn't actually be faster than a straight multiply by 65 on platforms with an at least somewhat decent HW multiplier. And it works only for a rather limited range of inputs, which makes it lose a lot of its appeal.

• posted

Hi,

Indeed, it is his work, including the GCC implementation, and it is true that both the design and the implementation are described in his 1994 paper.

Good catch on the umlaut; sometimes I don't transliterate \"{o} as "oe" but as "o".

I think the proposed expression can build very fast hardware, given that you have the freedom to design it, e.g. on an ASIC or FPGA. The critical path is one ADD (the scaled addition of x with itself), followed by one constant ADD (realized by some wiring) and one decrement. This all looks like a customized addition structure that, if optimized, will have the critical path of a single adder plus two stages.

Essentially it is a four-operand addition with two variable and two constant inputs. The range of inputs is indeed limited; outside these limits it is just an approximation (and not a good one).

• posted

In the days before computers had hardware divide, such tricks might have been well known.

Related, you might look at: