What to do with an improved algorithm?

Hi,

I think I've got a really good way to improve a commonly used & well establ ished algorithm that is often used in FPGAs, and it all checks out. The imp lementation completes the same tasks in 2/3rds the cycles and using 2/3rds the resources of an standard Xilinx IP block, with comparable timing).

I've verified that the output is correct over the entire range of 32-bit in put values.

I can't find anything similar designs in a Google patent search, or looking through journal articles. Once you are familiar with the original algorith m, and the optimization is explained it becomes pretty self-evident in retr ospect. It just seems the right way to do things.

What should I do?

Should I just throw the implementation on a website somewhere as a curiosit y?

Publish it in an article?

Pass it to a local student to make a paper from it? (I'm not studying at al l)

Attempt to patent and then commercialize it?

Thanks!

Mike

Reply to
Mike Field
Loading thread data ...

I think the best option is to write an article -- or a patent.

Simply because it's an extra opportunity to verify that your approach is correct.

If there's a hidden mistake, a student might be unable to see it.

Gene

Reply to
Gene Filatov

I agree with Gene, plus you might consider publishing the IP as open source code on a website of your own or opencores.org.

--Mike

tablished algorithm that is often used in FPGAs, and it all checks out. The implementation completes the same tasks in 2/3rds the cycles and using 2/3 rds the resources of an standard Xilinx IP block, with comparable timing).

t input values.

king through journal articles. Once you are familiar with the original algo rithm, and the optimization is explained it becomes pretty self-evident in retrospect. It just seems the right way to do things.

osity?

t all)

Reply to
Mike Butts

I'd publish - since you are not already in the IP licensing/patenting groove I doubt if you would make any money from it but you might gain kudos which may help you career and business. Xilinx might want to publish it - which might give a lot more visibility.

If you have a web site you could put it on that.

MK

Reply to
Michael Kellett

blished algorithm that is often used in FPGAs, and it all checks out. The i mplementation completes the same tasks in 2/3rds the cycles and using 2/3rd s the resources of an standard Xilinx IP block, with comparable timing).

input values.

ng through journal articles. Once you are familiar with the original algori thm, and the optimization is explained it becomes pretty self-evident in re trospect. It just seems the right way to do things.

ity?

all)

Licensing and selling IP comes with a bit of a learning curve and requires an investment on your part. As Michael mentions, without some of that fram ework already in place, a license vetted by an IP attorney, and a good mark eting plan, you might not see a return on that investment.

If you want your name more prominently attached to it, I'd suggest posting up on a personal Github account rather than opencores.org which makes you c onform to their requirements (such as wishbone interface, etc.).

Xilinx always welcomes guest articles on their blogs (although those have b een in flux since the recent reorg), and their e-magazine Xcell Journal (ag ain, seems to have been discontinued and the Xcell Daily Blog archived)

formatting link
formatting link

formatting link

--Kris

Reply to
kkoorndyk

I am serious in this reply. You may already know this, if so then I post for other people to know as well.

I would recommend that if you have means to get by in your life as you are today without receiving income from this invention / discovery, then not only give it away to people, but to do so in the name of Jesus Christ. Tell the people around you that because of what Jesus has done for you, you are desiring to give to the people here the fruit of what He first gave you (your intellect, abilities, opportunities to learn, guiding your path through life, etc.).

By giving Him the credit for your invention / discovery, and not doing it for money or some other thing here, then you will receive a reward in Heaven for your labor / knowledge gift to other people, and that re- ward in Heaven is like the burning bush, or the fishes and the loaves, meaning it is able to be used without being consumed. In short: re- wards in Heaven do not wear out or diminish with use. They remain in their full gift for all time, even under maximum / heavy use.

Remember the Lord in your life, and give things to people citing Him as the reason why you are doing so. You will receive the maximum re- turn on your efforts, and it sounds like you will also help out many people here on this Earth by your gift.

It would be a win-win for everyone in your giving.

--
Rick C. Hodgin
Reply to
Rick C. Hodgin

g up on a personal Github account rather than opencores.org which makes you conform to their requirements (such as wishbone interface, etc.).

OpenCores encourages use of the Wishbone interface for SoC components and t hey do offer coding guidelines, but there are no requirements for either. F or example in the entire DSP core section there are 38 entries, none of whi ch are marked as Wishbone compliant.

Reply to
Mike Butts

tablished algorithm that is often used in FPGAs, and it all checks out. The implementation completes the same tasks in 2/3rds the cycles and using 2/3 rds the resources of an standard Xilinx IP block, with comparable timing).

t input values.

king through journal articles. Once you are familiar with the original algo rithm, and the optimization is explained it becomes pretty self-evident in retrospect. It just seems the right way to do things.

osity?

t all)

s an investment on your part. As Michael mentions, without some of that fr amework already in place, a license vetted by an IP attorney, and a good ma rketing plan, you might not see a return on that investment.

g up on a personal Github account rather than opencores.org which makes you conform to their requirements (such as wishbone interface, etc.).

been in flux since the recent reorg), and their e-magazine Xcell Journal ( again, seems to have been discontinued and the Xcell Daily Blog archived)

I never though I would agree with Rick, but....

All sounds like too much work. So here is a quick summary with C-like pseud o-code. I'll put the HDL code up somewhere soon once I am happy with it. I am removing the last rounding errors.

I've been playing with CORDIC, and have come up with what looks to be an ov erlooked optimization. I've done a bit of googling, and haven't found anyth ing - maybe it is a novel approach?

I've tested it with 32-bit inputs and outputs, and it is within +/-2, and a nd average error of around 0.6.I a am sure with a bit more analysis of wher e the errors are coming from I can get it more accurate.

This has two parts to it, both by themselves seem quite trivial, but comple ment each other quite nicely.

Scaling Z

---------

  1. The 'z' value in CORDIC uses becomes smaller and smaller as stages incre ase:

The core of CORDIC for SIN() and COS() is: x = INITIAL; y = INITIAL; for(i = 0; i < CORDIC_REPS; i++ ) { int64_t tx,ty; // divide to scale the current vector tx = x >> (i+1); ty = y >> (i+1);

// Either add or subtract at right angles to the current x -= (z > 0 ? ty : -ty); y += (z > 0 ? tx : -tx); z -= (z > 0 ? angles[i] : -angles[i]); }

The value for angle[] is all important, for example:

angle[0] = 1238021 angle[1] = 654136 angle[2] = 332050 angle[3] = 166670 angle[4] = 83415 angle[5] = 41718 angle[6] = 20860 angle[7] = 10430 angle[8] = 5215 angle[9] = 2607 angle[10] = 1303 angle[11] = 652 angle[12] = 326 angle[13] = 163 angle[14] = 81 angle[15] = 41 angle[16] = 20 angle[17] = 10 angle[18] = 5 angle[19] = 3 angle[20] = 1

If you make the following change:

for(i = 0; i < CORDIC_REPS; i++ ) { int64_t tx,ty; // divide to scale the current vector tx = x >> (i+1); ty = y >> (i+1);

// Either add or subtract at right angles x -= (z > 0 ? ty : -ty); y += (z > 0 ? tx : -tx); z -= (z > 0 ? angles[i] : -angles[i]);

//!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! z 0 ? tx : -tx); z -= (z > 0 ? ANGLE_CONSTANT : -ANGLE_CONSTANT); z

Reply to
Mike Field

As far as your "revised" angle[i] converging to a constant is concerned, there's a simple explanation using the first two terms of the taylor series for the arctan function:

arctan(x) = x - 1/3 * x^3 + o(x^3)

So that

angle[i] = arctan(2^-i) / pi * 2^i = (1/pi) * ( 1 - 1/3 * 2^-(2*i)) + o(2^-(2*i))

Based on which you can easily say how many stages of the conventional cordic algorithm do you need to skip (i.e. store the outputs in a lookup table) for a given bit precision.

I don't know the literature well, but I think it would be cool if you actually write an article detailing your approach!

Gene

Reply to
Gene Filatov

If perchance this is related to your recent CORDIC rotator code, I've seen a number of CORDIC optimization schemes over the years to reduce the number of rotation stages, IIRC typically either by a 'jump start' or merging/optimizing rotation stages.

If I ever manage to find my folder of CORDIC papers, I'll post some links...

------- some notes on your CORDIC implementation

formatting link

- instead of quadrant folding, given I & Q you can do octant folding (0-45 degrees) using the top three bits of the phase

- if you pass in the bit widths and stages as generics, you can initialize the constant arctan table on-the-fly in a function using VHDL reals

-Brian

Reply to
Brian Davis

oops, for some reason, when first reading this thread I didn't see the later posts with the explanation... I'd swear they weren't there, but maybe I was just scroll-impaired.

-Brian

Reply to
Brian Davis

I'll work on writing one up over the next few days, as well as posting a sample implementation.

Mike

Reply to
Mike Field

That's good stuff. I wonder why you think using a BRAM is bad? It's good to use the hard cores in an FPGA instead of the fabric--it's less power and deterministic routing times.

Is the CORDIC still advantageous in a modern FPGA? The last time I needed to find sine, I used a coarse BRAM lookup that output sine on one port and cos on another. Those were used as derivatives for a 2nd-order Taylor. Tw o multipliers (more hard cores) are used (using Horner's Rule) for the 1st and 2nd-order interpolations. I don't remember how many digits of accuracy this yields, but the latency is low.

Reply to
Kevin Neilson

?:

d to use the hard cores in an FPGA instead of the fabric--it's less power a nd deterministic routing times.

d to find sine, I used a coarse BRAM lookup that output sine on one port an d cos on another. Those were used as derivatives for a 2nd-order Taylor. Two multipliers (more hard cores) are used (using Horner's Rule) for the 1s t and 2nd-order interpolations. I don't remember how many digits of accura cy this yields, but the latency is low.

Cordic is useful to compute high-precision atan2. Otherwise for 2 16-bit i nputs, you'd need a ram with 2^32 addresses (maybe 16 times less if you tak e advantage of the symmetries).

Reply to
Benjamin Couillard

I think you missed the part about the Taylor interpolation.

Reply to
Kevin Neilson

On Thursday, October 11, 2018 at 7:37:37 AM UTC-6, Benjamin Couillard wrote :

inputs, you'd need a ram with 2^32 addresses (maybe 16 times less if you t ake advantage of the symmetries).

Sorry; I didn't see that you were talking about arctan. It's not quite as easy as sin/cos, but there is still the question as to whether a Farrow-typ e architecture using a coarse lookup and Taylor interpolations would be bet ter than a CORDIC, and I am guessing that with the BRAMs and multipliers in present-day FPGAs, the answer would be yes.

Reply to
Kevin Neilson

?:

te:

it inputs, you'd need a ram with 2^32 addresses (maybe 16 times less if you take advantage of the symmetries).

s easy as sin/cos, but there is still the question as to whether a Farrow-t ype architecture using a coarse lookup and Taylor interpolations would be b etter than a CORDIC, and I am guessing that with the BRAMs and multipliers in present-day FPGAs, the answer would be yes.

Yes I think you're right it could work for an atan or atan 2. You could im plemented a divider for atan 2 (y/x), a sign look-up (to get the whole 360 degrees), a BRAM + taylor interpolation.

For atan, you'd simply skip the divider and sign look-up part.

On the other hand, Xilinx and Altera offer plug-and-play Cordic cores for a tan/atan2.

Reply to
Benjamin Couillard

implemented a divider for atan 2 (y/x), a sign look-up (to get the whole 36

0 degrees), a BRAM + taylor interpolation.

atan/atan2.

It's probably better to multiply y/x by x to get a normalized ratio rather than use a divider, which requires a lot of resources.

I recalled that I had to implement the atan2 function once for a QAM carrie r/symbol recovery circuit. I didn't need great precision, so I split one q uadrant into a grid and put the angle of the center of each grid square int o a 2-dimensional lookup ROM. Then I could put X,Y into the ROM and get th e coarse angle (which was then adjusted for the quadrant) and could use tha t for carrier recovery.

Reply to
Kevin Neilson

d implemented a divider for atan 2 (y/x), a sign look-up (to get the whole

360 degrees), a BRAM + taylor interpolation.

or atan/atan2.

er than use a divider, which requires a lot of resources.

ier/symbol recovery circuit. I didn't need great precision, so I split one quadrant into a grid and put the angle of the center of each grid square i nto a 2-dimensional lookup ROM. Then I could put X,Y into the ROM and get the coarse angle (which was then adjusted for the quadrant) and could use t hat for carrier recovery.

One case that comes to mind : you have 2 quadrature signals

x = A(t) * cos (wt + some phase) + noise_x y = A(t) * sin (wt + some phase) + noise_y

Atan2(y, x) = wt + some phase. The variations of A(t) (as long as they a re slow-ish) will cancel each other out.

You can filter or average wt + some phase to extract the phase. Or derivate "wt + some phase" then filter to get the filtered frequency.

So multiplying "y/x by x" would not make much sense in this case.

Reply to
Benjamin Couillard

On Wednesday, October 17, 2018 at 9:26:38 AM UTC-6, Benjamin Couillard wrot e:

:

uld implemented a divider for atan 2 (y/x), a sign look-up (to get the whol e 360 degrees), a BRAM + taylor interpolation.

for atan/atan2.

ther than use a divider, which requires a lot of resources.

rrier/symbol recovery circuit. I didn't need great precision, so I split o ne quadrant into a grid and put the angle of the center of each grid square into a 2-dimensional lookup ROM. Then I could put X,Y into the ROM and ge t the coarse angle (which was then adjusted for the quadrant) and could use that for carrier recovery.

are slow-ish) will cancel each other out.

te "wt + some phase" then filter to get the filtered frequency.

No, multiplying by x doesn't make sense. Perhaps using a ROM for 1/x and a multiplier would be better than a full divider.

Reply to
Kevin Neilson

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.