CORDIC bit-serial vs. bit-parallel

I'm trying to write a CORDIC macro for a polar transmitter FPGA design.
I've used the parallel approach, but when I do a timing estimation, the
longest delay path is through the CORDIC routine, which limits the
maximum clock rate of the whole design to about 40MHz. Other parts of
the design on the same board need to run at a much faster rate, so I'm
considering using a bit-serial version.

As I understand it, the bit-parallel implementation has low latency and
therefore high throughput, but because of the word-wide shifts it
clocks at a slower rate. Conversely, the bit-serial
routine has a high latency and low throughput rate, but allows the
board to run at a faster clock rate. Is this right?

My question is:

In the bit-serial implementation, you still need to perform the shift
operation on the entire word to select the right bit to send to the
bit-serial adder/subtractor, so how does this solve the problem of a
slow clock rate due to the shift operation?
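For reference, here is a minimal Python model of the iterative bit-parallel approach described above. It assumes the macro does rectangular-to-polar conversion (CORDIC vectoring mode, magnitude and phase from I/Q), uses plain integers as fixed point, and `cordic_vectoring` is a hypothetical name. The comment in the loop marks the variable shift that becomes a word-wide barrel shifter in hardware.

```python
import math

def cordic_vectoring(x: int, y: int, iterations: int = 16, frac_bits: int = 16):
    """Iterative bit-parallel CORDIC, vectoring mode, one iteration per clock.

    Rotates (x, y) toward the positive x-axis. Returns (K * magnitude,
    angle in radians), where K is the CORDIC gain (about 1.6468 for 16
    iterations). Assumes x > 0; a real design adds a quadrant pre-rotation.
    """
    # atan(2^-i) in fixed point; in hardware a small ROM indexed by i
    atan_table = [round(math.atan(2.0 ** -i) * (1 << frac_bits))
                  for i in range(iterations)]
    z = 0
    for i in range(iterations):
        # The shift amount i changes every iteration, so hardware needs a
        # full word-wide barrel shifter here; with one iteration per clock
        # it sits in the critical path together with the add/subtract.
        x_shift = x >> i   # arithmetic shift (Python >> floors, like hardware)
        y_shift = y >> i
        if y >= 0:         # rotate clockwise to drive y toward zero
            x, y, z = x + y_shift, y - x_shift, z + atan_table[i]
        else:              # rotate counterclockwise
            x, y, z = x - y_shift, y + x_shift, z - atan_table[i]
    return x, z / (1 << frac_bits)

mag, angle = cordic_vectoring(30000, 40000)
```

For this input the magnitude should come out close to K × 50000 ≈ 82338 and the angle close to atan2(40000, 30000) ≈ 0.927 rad, within the truncation error of the fixed-point model.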



Re: CORDIC bit-serial vs. bit-parallel
This is why you will get paid the big bucks!!!

Now figure out how much you need to make serial or registered to
accomplish the task.

I bet some registering, short of a completely serial approach, will work.


Re: CORDIC bit-serial vs. bit-parallel
Word-wide shifts are slowing you down....
Are you implementing the CORDIC as a beautifully pipelined hardware datapath,
or are you doing this in software?
CORDIC should be able to run easily at 150 MHz+ if I recall correctly. If
Ray Andraka doesn't respond here quickly, look for his CORDIC information
in the paper entitled:

    A Survey of CORDIC Algorithms for FPGAs
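A pipelined hardware datapath usually means unrolling the iterations: one stage per iteration, each with its own pipeline register. The sketch below models that clock by clock in Python. `N_STAGES`, `FRAC_BITS`, `stage` and `run_pipeline` are hypothetical choices for illustration, not anyone's actual design. Since stage i always shifts by the same constant i, the shift is just wiring; latency is N_STAGES clocks, but a new result leaves every clock.

```python
import math

N_STAGES = 12      # hypothetical pipeline depth (one stage per CORDIC iteration)
FRAC_BITS = 16     # hypothetical angle precision
ATAN = [round(math.atan(2.0 ** -i) * (1 << FRAC_BITS)) for i in range(N_STAGES)]

def stage(i: int, reg):
    """Vectoring-mode CORDIC stage i. Each stage has its own fixed i, so
    x >> i and y >> i are just wiring in hardware: no barrel shifter."""
    if reg is None:                       # empty pipeline slot (bubble)
        return None
    x, y, z = reg
    if y >= 0:
        return (x + (y >> i), y - (x >> i), z + ATAN[i])
    return (x - (y >> i), y + (x >> i), z - ATAN[i])

def run_pipeline(samples):
    """Clock the unrolled pipeline; returns [(clock, angle_radians), ...]."""
    regs = [None] * N_STAGES              # regs[i]: output register of stage i
    feed = list(samples) + [None] * N_STAGES   # trailing bubbles flush the pipe
    results = []
    for clock, sample in enumerate(feed):
        # All registers load on the same clock edge, so update from the last
        # stage backward to read each stage's input before it is overwritten.
        for i in range(N_STAGES - 1, 0, -1):
            regs[i] = stage(i, regs[i - 1])
        regs[0] = stage(0, None if sample is None else (sample[0], sample[1], 0))
        if regs[-1] is not None:
            results.append((clock, regs[-1][2] / (1 << FRAC_BITS)))
    return results

out = run_pipeline([(30000, 40000), (50000, 0), (10000, -10000)])
```

Three samples fed on clocks 0, 1 and 2 come out on clocks 11, 12 and 13: the latency is the pipeline depth, and the throughput is one result per clock.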


Re: CORDIC bit-serial vs. bit-parallel

Is this an iterative or an unrolled design? I am assuming it is iterative,
in which case you have a rather nasty shifter to deal with, which is
killing your performance, assuming one clock per iteration. You can
pipeline the iterations to allow more than one iteration result at a
time in the loop, but it requires a bit of careful bookkeeping in the
design. For a bit-serial implementation, the shift is accomplished by
varying the delay, which, if implemented in memories, involves messing
with the address to reduce the overhead for the shifter.
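A minimal Python sketch of the "messing with the address" idea, assuming the operand to be shifted is already held in a small memory (as it would be when it is the previous iteration's result). It models only one bit-serial add or subtract of a shifted operand, not a complete bit-serial CORDIC; `to_bits`, `from_bits` and `serial_add_shifted` are hypothetical names. The shift never appears as logic: at bit time t the read address is t + shift, clamped at the MSB so the sign bit repeats (sign extension).

```python
def to_bits(value: int, width: int) -> list[int]:
    """Two's-complement bits of value, LSB first (the order a serial stream uses)."""
    return [(value >> k) & 1 for k in range(width)]

def from_bits(bits: list[int]) -> int:
    """Interpret an LSB-first bit list as a two's-complement integer."""
    value = sum(b << k for k, b in enumerate(bits))
    return value - (1 << len(bits)) if bits[-1] else value

def serial_add_shifted(a_bits: list[int], b_bits: list[int],
                       shift: int, subtract: bool) -> list[int]:
    """Bit-serial a + (b >> shift) or a - (b >> shift), one bit per clock.

    No shifter: b sits in memory and, at bit time t, the read address is
    offset to t + shift. Addresses past the MSB are clamped to the MSB,
    which repeats the sign bit (sign extension).
    """
    width = len(a_bits)
    carry = 1 if subtract else 0               # subtract = add inverted b, carry-in 1
    out = []
    for t in range(width):                     # one loop pass == one clock
        b = b_bits[min(t + shift, width - 1)]  # address offset does the shift
        if subtract:
            b ^= 1
        a = a_bits[t]
        out.append(a ^ b ^ carry)              # full-adder sum bit
        carry = (a & b) | (carry & (a ^ b))    # carry flip-flop for next clock
    return out

result = from_bits(serial_add_shifted(to_bits(1234, 16), to_bits(-5000, 16), 3, False))
```

Here `result` is 1234 + (-5000 >> 3) = 609. Note that the sign of the result, which decides the rotation direction for the next iteration, is the last bit out, at t = width - 1.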

--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
Re: CORDIC bit-serial vs. bit-parallel
I don't know if anyone is still reading this thread, but could I ask a
couple of more questions?

I am using (or trying to use) the iterative CORDIC algorithm written in
software. I've read Ray Andraka's paper on designing a bit serial
processor, in which he writes that when considering whether or not to
use a bit-serial design:

"...the application for the processor must be able to tolerate any
pipeline delay introduced by the serial processor. The latency in a
parallel system is frequently as high or higher than the equivalent
serial system so this is rarely a concern."

I find this statement confusing. I thought that the advantage of the
bit-parallel design was its much lower latency, equal to the number of
iterations, while the bit-serial design has a latency equal to the word
width times the number of iterations. So why is the "latency in a
parallel system as high or higher than the equivalent serial system"?

Re: CORDIC bit-serial vs. bit-parallel

At a given clock frequency, it is true that the bit-parallel design will
have a lower latency (that should be obvious); however, a totally
bit-serial design can generally be clocked faster than an equivalent
bit-parallel design. In certain pipelined bit-serial designs, you can
also begin the next stage before the previous one is completed, hiding
some of the latency, so the overall latency is only a little longer than
the bit-parallel latency. Unfortunately, CORDIC is not one of those,
because you need the sign (the last bit generated) of one stage before
you can start the processing for the next stage. Nevertheless, at the
time that paper was written, a bit-serial design in the then-current
FPGAs could be clocked much faster than a bit-parallel arithmetic design
in the same part, so while the number of clocks of latency was greater,
the higher clock frequency makes up for much of that latency in terms of
absolute time.
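The latency argument above can be put into numbers with a small Python sketch. It is a deliberately simplified timing model, not a description of any real bit-serial CORDIC: each stage takes W clocks, one bit per clock, LSB first. When the next stage can start on the previous stage's first bit, stages overlap with a one-clock offset; when it must wait for the sign bit (CORDIC), there is no overlap. The function names are hypothetical, the 40 MHz figure is the estimate from the original post, and the 200 MHz serial clock is a made-up placeholder, not a measured result.

```python
def serial_chain_latency(word_width: int, stages: int, wait_for_msb: bool) -> int:
    """Clocks until the last bit leaves a chain of bit-serial stages.

    Simplified model: each stage takes word_width clocks, one bit per clock,
    LSB first. If the next stage can start as soon as the previous stage's
    first bit is out, stages overlap with a one-clock offset. If it must wait
    for the sign (the MSB, generated last), as in CORDIC, there is no overlap.
    """
    offset = word_width if wait_for_msb else 1
    return (stages - 1) * offset + word_width

def latency_ns(clocks: int, f_mhz: float) -> float:
    """Convert a latency in clocks to nanoseconds at a clock of f_mhz."""
    return clocks * 1000.0 / f_mhz

W, N = 16, 16                        # hypothetical word width and iteration count
F_PAR_MHZ = 40.0                     # estimate from the original post
F_SER_MHZ = 200.0                    # made-up placeholder for a bit-serial clock

print("bit-parallel, iterative:", latency_ns(N, F_PAR_MHZ), "ns")
print("bit-serial CORDIC:      ", latency_ns(serial_chain_latency(W, N, True), F_SER_MHZ), "ns")
print("bit-serial, overlapped: ", latency_ns(serial_chain_latency(W, N, False), F_SER_MHZ), "ns")
```

With these placeholder numbers the three lines report 400.0, 1280.0 and 155.0 ns: the faster clock shrinks the CORDIC serial penalty from 16x in clocks to 3.2x in time, while a design whose stages can overlap would come out ahead.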

--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
