Division by a constant

- R
- Rob Gaddi
  
  Contact options for registered users
posted
9 years ago

Thu, Apr 9, 2015 4:57 PM

So I just had a thought. Most synthesis tools (in VHDL, and I assume in Verilog) will allow you to use the division operator to perform truncating division by a constant in synthesizable code, so long as that constant is a power of 2.

That seems like a reasonable restriction; that you can only divide when it's just a right shift, right up until you think a bit longer. Because I do synthesizable division by a constant all the time, actually, as multiplication by the reciprocal. So I wind up writing things like

y := x * (2**17 / 3) / 2**17.

It obscures the logic a bit, but works. But I was thinking, and not only does it obscure the logic, but it forces assumptions into my code about what the underlying multiplier block looks like. Why 2**17? Because I'm assuming a 18 bit signed multiplier, because that's what happens to be on some architecture (Altera Cyclone4 if I remember right).

It seems trivial for the synthesizer to do that transformation, division by a constant => multiplication by the reciprocal, in a way that is optimized for the underlying hardware. Any non-braindamaged C compiler will do it without being asked. And maybe the synth tools do, it's just been forever since I've actually checked.

Has anyone looked at this in a while? Are any of the synth tools smart enough to handle this on their own these days?

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com 
Email address domain is currently out of order.  See above to fix.

- T
- Thomas Stanka
  
  Contact options for registered users
Vote on answer
posted
9 years ago

Fri, Apr 10, 2015 11:26 AM

Am Donnerstag, 9. April 2015 18:58:41 UTC+2 schrieb Rob Gaddi:

There are a few nontrivial problems when using division in digital logic th at also apply for constant operation.

Multiplicate with the reziprocal value is a valid function for real number (mathematic term), but not with fixed point (unsigned with shifted decimal) . If a number is 2^^n, its reciprog is well defined for fix point. For 3 yo u will find no exact reciproc with fixpoint notation, regardless of the num ber of digits you spend after fixpoint.

If you use floating point your chances of correct calculation result are fa r better, but you are not safe. The term "(a-b)*c" is not always equivalent to "a*c - b*c" even in floating point arithmetic with a and b beeing significant different in size.

best regards

Thomas

- K
- kaz
  
  Contact options for registered users
Vote on answer
posted
9 years ago

Fri, Apr 10, 2015 12:30 PM

if you multiply by fraction e.g y = x*0.333... then you are just leading the compiler to avoid division.

I don't use fractions but new libraries support them.

Kaz

--------------------------------------- Posted through

formatting link

- A
- Allan Herriman
  
  Contact options for registered users
Vote on answer
posted
9 years ago

Fri, Apr 10, 2015 1:11 PM

Hi Rob,

I'm still not sure I'd trust a synthesiser to handle that sort of thing portably.

I don't think I've ever actually used a multiplier or divider in a synthesisable design. There always seems to be some way to avoid them, even for DSP, usually by employing smarter design at the system level and careful selection of scaling factors, filter coefficients, etc. (I don't do the sort of filtering that needs extremely precise coefficients. YMMV.)

When trying to avoid the use of the hard multipliers, I would consider employing tricks like Booth recoding:

formatting link

which can sometimes help with a fixed multiplicand that has a long string of 1s.

I would also look for repeating patterns in a fixed multiplicand. Repeating patterns often arise when taking the reciprocal of a constant, e.g. 1/5 = (binary) 0.00110011001100110011001100 etc. This is equal to 11 * 10001 * 100000001 * ... (shifted right) and a small number of adders can produce this result to any desired precision.

Both techniques ought to be amenable to automation in a synthesiser. But the synth tool I use doesn't even support VHDL 2008 yet (thanks Xilinx!) so I won't hold my breath waiting for comprehensive tool support for multiplication other than the basic/obvious use of the built-in hard blocks.

Regards, Allan

- R
- Rob Gaddi
  
  Contact options for registered users
Vote on answer
posted
9 years ago

Fri, Apr 10, 2015 5:22 PM

Yes and no. I believe one can prove (for values of one excluding me) that for a bounded integer numerator, you can always define a reciprocal multiply that will give the exact same result as "floor division" for all numerators in those bounds. Those differences you're talking about due to integers being "unreal" numbers are all pushed down into the remainder. And the same should therefore hold for fixed_point, since you're always looking for the quotient to be in some finite format past which you don't care about the errors.

I did a quick Python script just to test with integers. It tests the dumb way, through complete exhaustion of the input set, but my arbitrary poking about sure implies that a) you can always find an answer and b) that answer will require no more than 1 bit more than the numerator.

#!/usr/bin/env python3

""" Test the theory that, given a bounded numerator, there is a reciprocal multiply that will always give the same result as floor division. """

import numpy as np

nbits = 22 divide_by = 65537

# Proof through exhaustion, create all possible numerators numerators = np.arange(2**nbits, dtype=int) quotients = numerators // divide_by

class FoundAnswer(Exception): pass

try: expected_dbits = nbits + divide_by.bit_length() for dbits in range(expected_dbits-1, expected_dbits+2): basic_recip = (2**dbits) // divide_by for recip in range(basic_recip, basic_recip + 2): approx_quotients = (numerators * recip) >> dbits if np.all(approx_quotients == quotients): print( 'For all {nbits} bit numerators N//{div} == N*{recip}>> {dbits}'.format( nbits = nbits, div = divide_by, recip=recip, dbits=dbits )) print('{recip} requires {b} bits.'.format(recip=recip, b=recip.bit_length())) raise FoundAnswer() except FoundAnswer: pass

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com 
Email address domain is currently out of order.  See above to fix.

- G
- glen herrmannsfeldt
  
  Contact options for registered users
Vote on answer
posted
9 years ago

Thu, Apr 16, 2015 11:01 PM

(snip)

With the multiply instruction on many processors, that generates a double length signed product, I believe that for many constants, maybe half of them, there is a multiplier that will generate the appropriate truncated quotient in the high half of the product.

But in the case the OP asked, it isn't so obvious that it should do that.

Another choice would be a primitive that would generate the appropriate multiplier.

Often when you want a divider, you want it pipelined, which is unlikely to be synthesized.

-- glen

- R
- Rob Gaddi
  
  Contact options for registered users
Vote on answer
posted
9 years ago

Thu, Apr 16, 2015 11:23 PM

Right, and I've never seen a multiplier block in an FPGA that doesn't do the same. For a while they were 18x18=>36, then I started hitting

18*25=>43, but regardless, the hard multiplier blocks always generate P'length = A'length + B'length.

Ugh, but then you'd have to instantiate it as a separate block and wire it in. That's even uglier than having to put the bit-shifting logic in manually.

Not really. Often when I want a divider I have to pipeline it because the division algorithm is inherently serial. But I don't _want_ it to be pipelined, that's just the only choice I've got for a true X/Y divide.

But for X/K with constant K, it can (in every case I've seen) be implemented with a multiplier block, or simply by wire if K is a power of

That gets us done in a single cycle.

Sure that multiplier block may blow up into horrible cross-multiplies spanning multiple blocks if I ask for a stupidly large K. But the same can be said for X*K, and the tool lets me request that just fine, and if I ask for a stupid K I get a mess of logic that either a) only meets timing with a slow clock or b) requires me to make a few stages of register pushback available.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com 
Email address domain is currently out of order.  See above to fix.

- R
- rickman
  
  Contact options for registered users
Vote on answer
posted
9 years ago

Fri, Apr 17, 2015 12:00 AM

Maybe I am missing something, but it doesn't need to be pipelined. Just take out the registers and it's no longer pipelined.

I don't think a large K will create any problems that require more multiplier blocks. The resolution required only depends on... the resolution required. If you are working with truncated integers it doesn't matter if the divisor is large. That just means you get smaller results, not more math to do. Or are you thinking you can save logic by using small values of K which can reduce the logic required? If using a block multiplier you can't use less than one... or can you? Hmmm...

--

Rick

- R
- Rob Gaddi
  
  Contact options for registered users
Vote on answer
posted
9 years ago

Fri, Apr 17, 2015 12:27 AM

There are probably some trivial cases where the divide by N reduces to a multiply by something silly like 3 followed by a bit-shift that might get implemented on fabric, but I'd tend to assume that any time I did a divide, like anytime I did a multiply, I'm likely to commit a multiplier to it. If I get lucky, all the better.

If it takes a lot of bits to accurately represent K, it takes a lot of bits to accurately represent 1/K, subject to many of the same caveats about factoring out powers of 2.

Likewise, if I tell the tools that I want to use a 32-bit numerator, that'll take cross-multiplies too. But all that already gets handled correctly in the multiplication case.

--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com 
Email address domain is currently out of order.  See above to fix.

- G
- glen herrmannsfeldt
  
  Contact options for registered users
Vote on answer
posted
9 years ago

Fri, Apr 17, 2015 12:43 AM

(snip, I wrote)

Yes it is ugh, but you know that you are asking for one.

Well, I usually go the FPGA route when I want something done fast, which means pipelined. Maybe not everyone does that.

For software, you usually have (N)*(N)=(2N) and (2N)/(N)=(N)

In the FPGA case, even though the hardware is 18 bits, you can choose any number of bits for your actual values.

I am pretty sure that if you have one more bit in the constant than you need in the quotient, that it is enough. I am not sure that it always is when you have the same number of bits.

That is, I am not sure that you can generate a 32 bit signed quotient as the high half of a 32 bit multiply for all possible 32 bit signed divisors.

-- glen

- N
- Nikolaos Kavvadias
  
  Contact options for registered users
Vote on answer
posted
9 years ago

Fri, Apr 17, 2015 1:35 PM

Hi Rob and Glen

have you used kdiv, my constant division routine generator?

It produces low-level ("assembly") and C implementations for constant division.

formatting link

Many people are happy with it; it is based on Warren's Hackers' Delight.

Best regards, Nikolaos Kavvadias

formatting link