CORDIC bit-serial vs. bit-parallel

Hello,

I'm trying to write a CORDIC macro for a polar transmitter FPGA design. I've used the parallel approach, but when I do a timing estimation, the longest delay path is through the CORDIC routine, which limits the maximum clock rate of the whole design to about 40 MHz. Other parts of the design on the same board need to run at a much faster rate, so I'm considering using a bit-serial version.

As I understand it, the bit-parallel implementation has low latency and therefore high throughput, but because of the word-wide shifts it clocks at a slower rate. Conversely, the bit-serial routine has a high latency and low throughput rate, but allows the board to run at a faster clock rate. Is this right?

My question is:

In the bit-serial implementation, you still need to perform the shift operation on the entire word to select the right bit to send to the bit-serial adder/subtractor, so how does this solve the problem of a slow clock rate due to the shift operation?
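For readers following along, the shift being asked about is the `>> i` in each CORDIC iteration. Below is a minimal software sketch of an iterative, rotation-mode CORDIC in fixed point. The names `FRAC_BITS`, `ITERATIONS`, `ATAN_TABLE`, `GAIN` and `cordic_rotate`, the 14 fractional bits and the 16 iterations are all assumptions made for illustration; this is not the poster's actual design.

```python
import math

FRAC_BITS = 14    # assumed fixed-point format: value = integer / 2**FRAC_BITS
ITERATIONS = 16   # assumed number of CORDIC iterations

# Elementary rotation angles atan(2**-i), scaled to the fixed-point format.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * (1 << FRAC_BITS))
              for i in range(ITERATIONS)]

# Accumulated CORDIC gain; pre-scale the input by 1/GAIN to compensate.
GAIN = math.prod(math.sqrt(1 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))


def cordic_rotate(x, y, z):
    """Rotate the vector (x, y) by the angle z (fixed-point radians)."""
    for i in range(ITERATIONS):
        d = 1 if z >= 0 else -1   # direction comes from the sign of z
        # The shift amount i changes every iteration. An iterative hardware
        # design that reuses one adder stage therefore needs a variable
        # (barrel) shifter; an unrolled design can hard-wire each shift.
        x, y, z = (x - d * (y >> i),
                   y + d * (x >> i),
                   z - d * ATAN_TABLE[i])
    return x, y


one = 1 << FRAC_BITS
x, y = cordic_rotate(round(one / GAIN), 0, round(0.5 * one))
print(x / one, y / one)   # approximately cos(0.5), sin(0.5)
```

Note that the direction `d` for each iteration depends on the sign of the previous result, which matters when the datapath is made bit-serial.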

Thanks,

Mees

Reply to
m_oylulan

This is why you will get paid the big bucks!!!

Now figure out how much you need to make serial or registered to accomplish the task.

I bet some registering but not completely a serial approach will work.

gm

Reply to
GMM50

Bit-wide shifts are slowing you down... Are you implementing the CORDIC as a beautifully pipelined hardware datapath, or are you doing this in software? CORDIC should be able to run easily at 150 MHz+, if I recall correctly. If Ray Andraka doesn't respond quickly here, look for his CORDIC information in his paper entitled:

A Survey of CORDIC Algorithms for FPGAs

Reply to
John_H

Is this an iterative or an unrolled design? I am assuming it is iterative, in which case you have a rather nasty shifter to deal with, which is killing your performance, assuming one clock per iteration. You can pipeline the iterations to allow more than one iteration result at a time in the loop, but it requires a bit of careful bookkeeping in the design. For a bit-serial implementation, the shift is accomplished by varying the delay, which, if the delay is implemented in memories, just involves messing with the address, reducing the overhead for the shifter.
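A minimal software model may make the "shift by varying the delay" idea concrete. It assumes 16-bit two's-complement words stored LSB-first in a small memory; `WIDTH`, `to_bits`, `from_bits` and `serial_add_shifted` are hypothetical names for this sketch, not taken from any paper or core. The point is that no whole-word shift ever happens: the shifted operand is simply read starting at a different address.

```python
WIDTH = 16  # assumed word width (two's complement)


def to_bits(value, width=WIDTH):
    """Two's-complement bits of value, LSB first (serial arrival order)."""
    return [(value >> k) & 1 for k in range(width)]


def from_bits(bits):
    """Interpret an LSB-first bit list as a signed two's-complement integer."""
    raw = sum(b << k for k, b in enumerate(bits))
    return raw - (1 << len(bits)) if bits[-1] else raw


def serial_add_shifted(x, y, shift, subtract=False, width=WIDTH):
    """Compute x +/- (y >> shift), producing one result bit per 'clock'."""
    x_mem = to_bits(x, width)
    y_mem = to_bits(y, width)
    invert = 1 if subtract else 0
    carry = invert  # subtract = add the inverted operand plus one
    out = []
    for clk in range(width):
        # The "shift" is only an offset on the read address of y.
        # Clamping at the MSB repeats the sign bit (sign extension).
        addr = min(clk + shift, width - 1)
        a = x_mem[clk]
        b = y_mem[addr] ^ invert
        out.append(a ^ b ^ carry)                    # full-adder sum bit
        carry = (a & b) | (a & carry) | (b & carry)  # full-adder carry
    return from_bits(out)


# Same result as the word-wide version of one CORDIC micro-step:
print(serial_add_shifted(1000, -300, 3, subtract=True), 1000 - (-300 >> 3))
```

In this model the cost of the shift is a small adder on the memory address rather than a word-wide barrel shifter in the data path, at the price of `WIDTH` clocks per add.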

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
Reply to
Ray Andraka

I don't know if anyone is still reading this thread, but could I ask a couple more questions?

I am using (or trying to use) the iterative CORDIC algorithm written in software. I've read Ray Andraka's paper on designing a bit serial processor, in which he writes that when considering whether or not to use a bit-serial design:

"...the application for the processor must be able to tolerate any pipeline delay introduced by the serial processor. The latency in a parallel system is frequently as high or higher than the equivalent serial system so this is rarely a concern."

I find this statement confusing. I thought that the advantage of the bit-parallel was that it has a much lower latency = number of iterations, while the bit-serial has a latency = word width * number of iterations. So why is the "latency in a parallel system as high or higher?"

Thank you, Mees

Reply to
m_oylulan

At a given clock frequency, it is true that the bit-parallel design will have a lower latency (that should be obvious); however, a totally bit-serial design can generally be clocked faster than an equivalent bit-parallel design. In certain pipelined bit-serial designs, you can also begin the next stage before the previous one is completed, hiding some of the latency, so the overall latency is only a little longer than the bit-parallel latency. Unfortunately, CORDIC is not one of those, because you need the sign (the last bit generated) of one stage before you can start the processing for the next stage. Nevertheless, at the time that paper was written, a bit-serial design in the then-current FPGAs could be clocked much faster than a bit-parallel arithmetic design in the same part, so while the number of clocks of latency was greater, the higher clock frequency made up for much of that latency in terms of absolute time.
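To put rough numbers on that trade-off, here is a small sketch. The 40 MHz figure is the parallel clock limit mentioned at the start of the thread; the 16 iterations, 16-bit word and 200 MHz serial clock are assumptions picked purely for illustration, not measurements from any part. The serial case is charged a full word of clocks per iteration, since the sign bit must be available before the next iteration can start.

```python
def latency_ns(iterations, clocks_per_iteration, clock_mhz):
    """Absolute latency in nanoseconds of an iterative CORDIC."""
    cycles = iterations * clocks_per_iteration
    return cycles * 1000 / clock_mhz


ITERATIONS = 16   # assumed iteration count
WORD_WIDTH = 16   # assumed word width in bits

# Bit-parallel, one clock per iteration, at the 40 MHz limit from the thread.
parallel_ns = latency_ns(ITERATIONS, 1, 40)

# Bit-serial, WORD_WIDTH clocks per iteration, at a hypothetical 200 MHz.
serial_ns = latency_ns(ITERATIONS, WORD_WIDTH, 200)

print(f"parallel: {parallel_ns:.0f} ns, serial: {serial_ns:.0f} ns")
print(f"clock-count ratio: {WORD_WIDTH}x, time ratio: {serial_ns / parallel_ns:.1f}x")
```

With these assumed figures the serial design needs 16 times as many clocks but only 3.2 times as long in absolute time, which is the sense in which the faster clock recovers much, though not all, of the latency.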

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
Reply to
Ray Andraka
