How do I scale 9-bit signed two's complement data by 17/sqrt(21)?

The 115/31 was the strangest idea offered. If you need the result in a single clock, please look *seriously* at the simple multiplier. These are designed as library elements for very fast results and can easily accommodate your "one clock cycle" requirement.

If your clock is 20 MHz, doing the 115/31 might be reasonable but it sure isn't single-clock friendly!

Another consideration: does this value get used somewhere that you can algebraically manipulate the values so a /31 or /sqrt(21) can be "pulled in" to other number manipulation?

PLEASE consider the multiplier.

Reply to
John_H

I may be missing something - I didn't read all 60 posts by John, but couldn't a LUT implementation work here?

If he only needs limited precision in the final answer and his input data is at 9-bits, that's a 2^9 = 512 entry LUT. Assuming 16-bit output accuracy, we've got 1kB of data that will fit in a single BRAM.
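To make the sizing concrete, here is a minimal Python sketch of how such a table could be generated. The output format (16-bit signed with 5 fractional bits) is my own assumption, chosen only because the largest result then still fits in 16 bits; the names `build_lut` and `scale` are hypothetical.

```python
import math

SCALE = 17 / math.sqrt(21)   # about 3.7097
IN_BITS = 9                  # signed two's complement input
FRAC_BITS = 5                # assumed output format: 5 fractional bits

def build_lut():
    """Return 512 entries indexed by the raw 9-bit input pattern."""
    lut = []
    for code in range(1 << IN_BITS):
        # Reinterpret the unsigned address as a signed 9-bit value.
        x = code - (1 << IN_BITS) if code >= (1 << (IN_BITS - 1)) else code
        # Round to the nearest representable output step.
        lut.append(round(x * SCALE * (1 << FRAC_BITS)))
    return lut

def scale(x, lut):
    """Look up the scaled value for a signed input in [-256, 255]."""
    return lut[x & ((1 << IN_BITS) - 1)]

lut = build_lut()
```

With these assumptions the table holds 512 two-byte words, which is the 1 kB mentioned above.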

Reply to
Stephen Craven

I did miss something. He wants an ASIC implementation.

Does anyone know how a ROM-based LUT approach would compare to an arithmetic-based approach in terms of area? I suppose one would just need a single transistor per ROM bit plus the associated row / column mux / decoders.

Even if it is larger in area, it might be advantageous from a speed / simplicity standpoint.

Reply to
Stephen Craven

Mr. Ken wrote:

Note that a division or multiplication by a power of 2 is free in hardware. Also note, that multiplication by a constant is a lot cheaper and faster than general multiplication.

Let's see:

(X*59)/16 = (X*64 - X*4-X)/16

So X*59/16 uses only two adders.

(X*119)/32 = (X*128 - X*8 - X)/32

Two adders, even more precision.

Y = X*4 + X*32

(X*15196)/4096 = (X*16*1024 - Y*32 - Y)/4096

Three adders. Very high precision: 16384 - 33*36 = 15196, and 15196/4096 = 3.70996 against a target of 3.70970.

I doubt you can beat this with any other approach.
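These decompositions are easy to sanity-check in software. The Python sketch below is only an illustration (the function names are mine): shifts stand in for the free wiring, and each `+` or `-` corresponds to one adder. Note that Python's `>>` floors toward minus infinity, so real hardware may want a rounding term before the final shift. Evaluating the three-adder form shows that its effective coefficient is 16384 - 33*36 = 15196.

```python
def times_59_over_16(x):
    # 59 = 64 - 4 - 1: two subtractions, the shifts are just wiring.
    return ((x << 6) - (x << 2) - x) >> 4

def times_119_over_32(x):
    # 119 = 128 - 8 - 1: again two subtractions.
    return ((x << 7) - (x << 3) - x) >> 5

def three_adder_form(x):
    # Y = 4X + 32X = 36X, then 16384X - 32Y - Y = (16384 - 33*36)X.
    y = (x << 2) + (x << 5)
    return ((x << 14) - (y << 5) - y) >> 12

# Feeding in the denominator exposes the effective numerator.
# (4096 is outside the 9-bit input range; it is used here only as a probe.)
numerators = (times_59_over_16(16), times_119_over_32(32), three_adder_form(4096))
```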

Kolja Sulimma

Reply to
Kolja Sulimma

My apologies, folks. I'll try to reinstall my newsreader software at home in an attempt to avoid the posting flood. I just hope it's not the ISP that's giving me the problem.

LUTs might be the best answer!

Reply to
John_H

Reply to
Peter Alfke

My clock is only 3.92 MHz, and I am designing in a 0.15 um process, so timing won't be an issue here. Yes, the 1/31 can be factored into other multiplications, but again, that will affect precision there. It's all a matter of compromise between the different choices. Thank you for the ideas.

Reply to
Mr. Ken

Yeah, in the implementation this technique is used for all my multipliers, since I have other scale factors as well, such as 17/sqrt(10), 17/sqrt(20), etc. I will make use of this saving.

Thank you for your input.

Reply to
Mr. Ken

If best precision is what you need, just add more bits. Multiply by 237/64, 475/128, 950/256 or 1899/512.
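For reference, the numerators above are simply the target rounded at each power-of-two denominator. A small Python sketch (the helper name `best_numerator` is hypothetical) that lists them together with their relative error:

```python
import math

TARGET = 17 / math.sqrt(21)

def best_numerator(shift):
    """Nearest integer numerator for a denominator of 2**shift."""
    return round(TARGET * (1 << shift))

for shift in range(6, 13):
    num = best_numerator(shift)
    approx = num / (1 << shift)
    rel_err = (approx - TARGET) / TARGET
    print(f"{num}/{1 << shift} = {approx:.7f}  relative error {rel_err:+.2e}")
```

Note that 950/256 reduces to 475/128, so that extra bit buys no precision.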

H.

Reply to
Herman Dullink

Mr. Ken wrote:

If you multiply the same X by different scales, you can share intermediate results between the constant coefficient multipliers.
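As a rough illustration of the sharing, take two of the scales rounded to a denominator of 32: 17/sqrt(21) becomes 119/32 and 17/sqrt(10) becomes 172/32. These particular roundings, and the function name, are my own choices for the example. Both constants can reuse the partial product 9X:

```python
def scale_both(x):
    """Return (x*119/32, x*172/32) using one shared partial product."""
    nine_x = (x << 3) + x                          # adder 1: 9X, shared
    y21 = ((x << 7) - nine_x) >> 5                 # adder 2: 119X = 128X - 9X
    forty_three_x = (x << 5) + (x << 1) + nine_x   # adders 3 and 4: 43X = 32X + 2X + 9X
    y10 = (forty_three_x << 2) >> 5                # 172X = 4 * 43X, then divide by 32
    return y21, y10
```

That is four adders in total, compared with five if each constant is built from its own power-of-two terms (128 - 8 - 1 and 32 + 8 + 2 + 1).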

Kolja Sulimma

Reply to
Kolja Sulimma

True. Intermediate results will be shared.

After studying the following document, I realized that division by a constant can be implemented with the same technique.

formatting link

Reply to
Mr. Ken

Use more bits. If you were looking at a simple shift for the division you're on the right track, but you need more digits, such as 3799/1024.

If ASICs have dedicated multipliers as a simple element, you probably have what you need with a multiplier.

If you have loads of time, a bit-serial approach can give you a tiny implementation.

If you want abstruse, you can do a 115/31 where the divide by 31 is a bunch of 5-bit adds and a few conditionals around the digit 31 (and a bit of latency).

Where do you want to go?

Reply to
John_H

17/sqrt(21) is a constant = 3.7097..., which as a binary number is 011.1011010111 when rounded to 10 bits right of the radix point. You said your input is 9 bits, so you already have an error of +/- 1/2 of the LSB weight. In most cases it doesn't make sense to multiply with any more precision than you have in the input. Rounding your constant to 9 bits (and treating it as unsigned) gives 11.1011011 = 2^1 + 2^0 + 2^-1 + 2^-3 + 2^-4 + 2^-6 + 2^-7 = 3.7109375.
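The rounding can be reproduced with a few lines of Python. This is just a sketch with hypothetical helper names: `quantize` rounds the constant to a given number of fractional bits, and `bit_weights` lists which powers of two are set in the result.

```python
import math

TARGET = 17 / math.sqrt(21)

def quantize(value, frac_bits):
    """Round value to frac_bits fractional bits; return (integer code, real value)."""
    code = round(value * (1 << frac_bits))
    return code, code / (1 << frac_bits)

def bit_weights(code, frac_bits):
    """Exponents of the powers of two that are set in the fixed-point code."""
    return [bit - frac_bits
            for bit in range(code.bit_length() - 1, -1, -1)
            if (code >> bit) & 1]

code10, approx10 = quantize(TARGET, 10)   # 10 fractional bits
code7, approx7 = quantize(TARGET, 7)      # 9 bits total: 2 integer + 7 fractional
weights7 = bit_weights(code7, 7)          # exponents of the set bits of the 9-bit constant
```

The 9-bit code comes out as 0x1DB, i.e. 475/128.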

This can be done with 4 adders arranged in 3 layers to sum shifted terms of input N:

a = N + N*2;
b = a + 8*a;
c = 8*N - N;
d = b + 64*c;
result = d/128;

This will give you pretty much the fastest logic solution assuming no fast memories.
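The adder tree is easy to check in software. A minimal Python model follows (the function name is mine): `a` and `c` form the first layer, `b` the second and `d` the third, and the final shift floors the result, so a hardware version may want to add a rounding constant first.

```python
def scale_adder_tree(n):
    """Multiply n by 475/128 using four adders in three layers."""
    a = n + (n << 1)    # layer 1: 3N
    c = (n << 3) - n    # layer 1: 7N
    b = a + (a << 3)    # layer 2: 27N
    d = b + (c << 6)    # layer 3: 27N + 448N = 475N
    return d >> 7       # divide by 128 (floors toward minus infinity)
```

Since 27 + 64*7 = 475, this is the same as multiplying by 0x1DB and dropping the low 7 bits.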

Your input is only 9 bits. If you are doing it in an FPGA, just program a block RAM as a look-up table holding all 512 input values times 17/sqrt(21) and be done with it. If block RAMs are at a premium and your FPGA has embedded multipliers, use an embedded multiplier to multiply the 9-bit input by 0x1DB (or more bits if you so desire) and be done with it.

Reply to
Ray Andraka
