Algorithm for x/63 and x/127

• posted

Dear all,

I believe I have invented a formula for quotient calculation that works for x/63 and x/127.

The formula is as follows and it is divisionless and multiplierless:

y = (((x>>n) + x + ((1<<n) + 1)) >> n) - 1;

Use n=6 for 63, and n=7 for 127.

• posted

For big enough values it won't be right, though those may be bigger than you need. For n=6 it seems to fail first at 4095, and for n=7 at 16383.

Traditionally, integer division truncates. I am not sure the (1<<n)+1 and the final -1 preserve that truncating behavior over the whole range.

• posted

Dear Glen,

your observations are correct! I am trying to avoid any multiplication including the one with the multiplicative inverse (the reciprocal of the constant).

I have updated my claims for the algorithm. It works for any constant divisor of the form 2^n - 1 (though I have only tested n = 2, 3, 4, 5, 6, 7, 8), given the following assumption:

The range for x is [0, 2^(2*n) - 2], i.e. for n = 6 the range indeed is 0:4094.
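Here is a sketch of why the bound holds, for n = 6 (divisor 63). Write x = 63q + r with 0 <= r <= 62; then

```latex
x + \lfloor x/64 \rfloor + 65
  = 63q + r + (q + \delta) + 65
  = 64q + (r + 65 + \delta), \qquad
\delta = \lfloor (r - q)/64 \rfloor \in \{-1, 0\} \;\text{ for } q \le 64 .
```

Since r + 65 + delta then lies in [64, 127], shifting right by 6 gives q + 1, and the final decrement recovers q exactly. This holds for every q <= 64, i.e. for x <= 63*64 + 62 = 4094 = 2^12 - 2. At x = 4095 we get q = 65 and delta = -2, so the result comes out one too small, which matches the first observed failure.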

And a test program:

[code]
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

/* main: */
int main(void)
{
  int qapprox, qexact;
  int i, j;

  /* For each n, test the full claimed range [0, 2^(2n) - 2]. */
  for (i = 2; i < 8; i++) {
    for (j = 1; j < (1 << (2*i)) - 1; j++) {
      qapprox = (((j >> i) + j + ((1 << i) + 1)) >> i) - 1;
      qexact  = j / ((1 << i) - 1);
      if (qapprox != qexact)
        printf("n=%d x=%d: approx=%d exact=%d\n", i, j, qapprox, qexact);
    }
  }
  return 0;
}
[/code]

• posted

Without working through the math, I think that you're using a special case of the more general approximation for the reciprocal of x:

1/x ~ 1/x0 + (x0 - x)/x0^2

Do the math, and I think you'll find that embedded in the expression above. It's just the first two terms of the Taylor's expansion of 1/x, and works without multiplication because x/64 = x >> 6 (ignoring rounding) and because (64 - 63) = 1.
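Concretely, with x0 = 64 and x = 63 the two-term expansion gives

```latex
\frac{1}{63} \approx \frac{1}{64} + \frac{64 - 63}{64^2}
             = \frac{1}{64} + \frac{1}{4096}
             = \frac{65}{4096},
```

so x/63 ~ (65*x) >> 12 = (x + (x << 6)) >> 12, which is the same shift-and-add structure as the posted expression.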

```
--
www.wescottdesign.com
```
• posted

Hi Tim,

Great observation!

Indeed, I see the asymptotic behavior that would be typical of a Taylor expansion: away from x0 the errors escalate, since the approximation diverges.

Still, it would be interesting to see whether there is any known use of this hack predating my claim. It is dated May 16, 2011 in my notes, but no other details survive, apart from remembering that I worked it out by sketching.

• posted

I used a similar method in a floating-point math package I wrote in the mid-70's. The difference is that, to get acceptable accuracy, I had to use the second-order term too.

```
--
Tauno Voipio
```
• posted

On 11.10.2014 at 19:48, Nikolaos Kavvadias wrote:

Sorry to burst your bubble, but that optimization technique has been around since effectively forever; it may well have been in the world since before you were born. A quick search reveals papers at least as far back as 1976.

GCC has had an automatic peep-hole optimizer for stuff like this since about 1994.

• posted

Bubble == totally burst. However, I still don't have a concrete example showing that this specific expression has been in use. Any specific reference is much appreciated!

Since I was born in 1977, this is quite plausible :)

Most probably this is based on Torbjorn's work. I understand that the GCC peephole optimizer works at the RTL level; I don't know if there is a corresponding GIMPLE pass. These optimizers are essentially low-level; one might want to apply the technique even as a source-level optimization:

Sections 9.7 and 9.8:

Eventually it is always possible to derive both a divisionless and multiplierless hack: use HD's magic number (multiplicative inverse) algorithm and then optimize the constant multiplication by the fixed-point magic number into shifts and adds. That approach is great in its genericity and will work for any integer constant divisor, but this formula will perform better for divisors of the form 2^n - 1, if you are only interested in small ranges of x.

• posted

On 11.10.2014 at 21:51, Nikolaos Kavvadias wrote:

Not just based on. The code in GCC _is_ Torbjoern Granlund's work. He wrote a paper about what he did, and got the code included into GCC.

It doesn't generally result in exactly the instruction sequence your example in the OP laid out. But then, that particular sequence wouldn't actually be faster than a straight multiply by 65 on platforms with an at least somewhat decent HW multiplier. And it works only for a rather limited range of inputs, which makes it lose a lot of its appeal.

• posted

Hi,

Indeed, it is his work, including the GCC implementation, and it is true that both the design and the implementation are described in his 1994 paper.

Good catch on the umlaut; sometimes I don't transliterate \"{o} as "oe" but as "o".

I think the proposed expression can build very fast hardware, given that you have the freedom to design it, e.g. on an ASIC or FPGA. The critical path is one ADD (the scaled addition of x with itself), followed by one constant ADD (realized by some wiring) and one decrement. This all looks like a customized addition structure that, if optimized, will have the critical path of a single adder plus two stages.

Essentially it is a four-operand addition with two variable and two constant inputs. The range of inputs is indeed limited; outside these limits it is just an approximation (and not a good one).

• posted

In the days before computers had hardware divide, such tricks might have been well known.

Related, you might look at: