divide by 2^n, n=21..37 ==> 3 Virtex Slices !!

Hi

Below is possibly the densest divide-by-2^n ever implemented on any FPGA:
divide by 2^37 takes only 3 Virtex slices! (4 slices on Spartan-3)

Antti



--
-- http://gforge.openchip.org/projects/xilcores/
--
-- Divide by 2^n, n=21..37, code=0000 n=37, code=1111 n=21
-- Theory of operation:
Re: divide by 2^n, n=21..37 ==> 3 Virtex Slices !!

<snip of the original code>

Antti -

I'm truly amazed by the approach.  I think I did things right in making a
Verilog 32-bit, bit-serial NCO below.  (Formats nicely with fixed space
fonts).  Only 1 level of logic makes for sweet timing.  Thanks for bringing
this approach to light.

- John_H


// This is a bit-serial NCO with an initial
// divide-by-32 implementing a 32-bit NCO word.
//
// Original credit is given to Antti Lukats
//  from a comp.arch.fpga posting 22 Mar 05.
//
// This version produced by John Handwork
//  with Synplify V8.0 for a Xilinx target.
// - Some reg_input_delay route values (1.4ns)
//    were applied to Add, Cry in Synplify
//    to keep the logic to 1 level.
// - Note that the initializations don't synthesize.
// - Xilinx ISE v7.1 pushes 500MHz in the slow (-4)
//   XC3S50, limited by routing to the I/O cell.
// - 4 Slices and an IOB is all it takes.
//
`define SRL /* synthesis syn_srlstyle = "select_srl" */
//
module TinyNCO ( input      clk
               , output reg Fout /* synthesis syn_useioff = 1 */
               );
                 parameter STEP = 32'd1000000;

reg [31:0] End `SRL = 32'h0000_0001;
reg [31:0] Inc `SRL = STEP;
reg [30:0] Acc `SRL = 31'h0000_0000;
reg        Cry      = 1'b0;
reg        Add      = 1'b0;

always @(posedge clk)
begin
  End <= {End[0], End[31:1]};
  Inc <= {Inc[0], Inc[31:1]};
  Acc <= {Add   , Acc[30:1]};
  {Cry, Add} <= Inc[0] + Acc[0] + (~End[0] & Cry);
  if( End[0] )
    Fout <= Add;
end

endmodule
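
For anyone who wants to check the bit-serial arithmetic without a simulator, here is a minimal behavioral sketch in Python of the module above. It assumes the sum is assigned to {Cry, Add}, models each shift register as an integer rotated once per clock, and compares Fout with the MSB of an ordinary 32-bit accumulator that is updated once every 32 clocks. The names tiny_nco_model and reference_msb are made up for this illustration; they are not part of the posted code.

```python
def tiny_nco_model(step, clocks, width=32):
    """Return Fout after each clock edge for a bit-serial NCO of the given width."""
    end = 1                        # one-hot marker; bit 0 is set while the LSB is processed
    inc = step & ((1 << width) - 1)
    acc = 0                        # width-1 bits; Add holds the remaining accumulator bit
    cry = add = fout = 0
    out = []
    for _ in range(clocks):
        e0, i0, a0 = end & 1, inc & 1, acc & 1
        # Carry-in is suppressed at the LSB so each word starts a fresh addition.
        total = i0 + a0 + (0 if e0 else cry)
        # At the LSB slot, Add still holds the MSB of the previous word.
        next_fout = add if e0 else fout
        # Non-blocking updates: everything above used the old register values.
        end = (end >> 1) | (e0 << (width - 1))
        inc = (inc >> 1) | (i0 << (width - 1))
        acc = (acc >> 1) | (add << (width - 2))
        cry, add = total >> 1, total & 1
        fout = next_fout
        out.append(fout)
    return out


def reference_msb(step, words, width=32):
    """MSB of a plain width-bit accumulator, held for `width` clocks per update."""
    mask = (1 << width) - 1
    return [((k * step) & mask) >> (width - 1)
            for k in range(words) for _ in range(width)]


if __name__ == "__main__":
    step = 1000000                 # default STEP from the posted module
    print(tiny_nco_model(step, 32 * 5000) == reference_msb(step, 5000))
```

The model only checks the arithmetic; it says nothing about timing, SRL inference, or start-up behavior in a real device.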



Re: divide by 2^n, n=21..37 ==> 3 Virtex Slices !!

Hi John,

nice :) !
Would it be OK to add your code to the 'xilcores' project?
http://gforge.openchip.org
You are of course welcome to do it yourself: just register
and I will add you to the project developers.

Antti











Re: divide by 2^n, n=21..37 ==> 3 Virtex Slices !!

< snip>


Since I hadn't run a full simulation, I'd be hesitant to throw it in there
without a solid double-check but feel free to add the code yourself if you
like.  Anyone is welcome to try to leverage this approach whether for your
contest or for commercial projects.

Anyone working with SRLs in this sense should be aware: the global reset to
the Xilinx device could deassert asynchronously to different SRLs at
different clocks, knocking off the needed alignment between the SRLs.  I'd
suggest working in a global enable for all the SRLs that waits a few clock
cycles before letting everything run.  Another approach would be to use the
BUFGCE primitive in the newer families using the same delayed enable to gate
the clock to the entire FPGA rather than just the enables to the SRLs.
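
To see why alignment matters, here is a rough Python sketch of a 32-bit bit-serial NCO like the TinyNCO posted earlier in this thread, in which the one-hot marker register is held for a few clocks while the other registers already shift. This is only one hypothetical way of modeling a late release; the function names and the end_hold parameter are invented for the illustration, and real hardware may misbehave differently.

```python
def nco_with_late_marker(step, clocks, end_hold=0, width=32):
    """Bit-serial NCO model; the End marker stays frozen for `end_hold` clocks."""
    end = 1                        # one-hot LSB marker
    inc = step & ((1 << width) - 1)
    acc = 0
    cry = add = fout = 0
    out = []
    for t in range(clocks):
        e0, i0, a0 = end & 1, inc & 1, acc & 1
        total = i0 + a0 + (0 if e0 else cry)   # no carry-in at the marked LSB
        next_fout = add if e0 else fout
        if t >= end_hold:                      # marker starts rotating late
            end = (end >> 1) | (e0 << (width - 1))
        inc = (inc >> 1) | (i0 << (width - 1))
        acc = (acc >> 1) | (add << (width - 2))
        cry, add = total >> 1, total & 1
        fout = next_fout
        out.append(fout)
    return out


def rising_edges(bits):
    """Count 0 -> 1 transitions in a list of output samples."""
    return sum(1 for a, b in zip(bits, bits[1:]) if (a, b) == (0, 1))


if __name__ == "__main__":
    step = 0x8000_0000             # nominal divide-by-64 setting
    print(rising_edges(nco_with_late_marker(step, 1280)))
    print(rising_edges(nco_with_late_marker(step, 1280, end_hold=1)))
```

In this model the aligned run shows 20 rising edges in 1280 clocks (divide by 64), while the run with the marker one clock late shows 10 (divide by 128): the marker now lines up with bit 1 of Inc instead of bit 0, so the effective step is halved.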

I also want to double check with ISE v7.1 service pack 1 and/or the hotline
to see if the minimum clock cycle for the SRLs is communicated properly.
ISE v6.3.3i gave warnings if I tried to run too fast and 7.1i didn't
complain at all about shift frequencies approaching 500 MHz.  The SRL
performance numbers have either improved *significantly* over the previous
speed files or the information got lost.  I'm hoping the former is true.

- John_H



Re: divide by 2^n, n=21..37 ==> 3 Virtex Slices !!

<snip>



For those who care, in the slower version of the XC3S50 (-4) the registered
SRL limits the frequency to 285 MHz.  I couldn't get the newer software to
report this limit; there is now a change request to make this value visible.



Re: divide by 2^n, n=21..37 ==> 3 Virtex Slices !!

Hi John,

I think that isn't a problem (in most cases). The SRLs are not connected to
the global set/reset lines at all; their initial state is loaded from
configuration memory and does not depend on any delay in set/reset. As the
clock does not propagate to the SRLs until all of them are configured, I
think they will all start in proper sync. The only requirement is that the
first CLK pulse reaches all SRLs at the same time; if their CLK inputs are
on the same GCLK, I think this is always the case. Besides that, Xilinx uses
SRLs in several designs that require a synchronized start, and nothing
special is done there. So no worries.
But it was a good thing to bring to notice, as in some cases this could be
an issue: for example, if the SRL shift loop includes flip-flops that use
async reset, etc.

Antti
PS: thanks again for your Verilog code. I think I will add it just as an
example of Verilog coding; your header already contains the disclaimer that
it is not fully tested/verified. As a Verilog example I think it's
brilliant :)













Re: divide by 2^n, n=21..37 ==> 3 Virtex Slices !!
<snip>

 >>the global reset to the Xilinx device could deassert
 >>asynchronously to different SRLs at different clocks,
 >>knocking off the needed alignment between the SRLs.

<snip>

 > I think that isn't a problem (most cases) the SRL are not
 > connected to global set/reset lines at all, the initial
 > state is loaded from configuration memory and not dependent
 > on any delay in set/reset. As the clock is not propagating to
 > the SRL until all of them are configured then I think they
 > all will start in proper sync. The only requirement is that
 > the first CLK pulse comes to all SRL at the same time - if
 > their CLK inputs are on the same GCLK I think this is always
 > the case. Besides that Xilinx is using the SRL in several
 > designs that require sync starting and there is nothing
 > special done. So no worries. But it was a good thing to
 > bring to notice, in some cases this could be an issue. As
 > example if the SRL shift loop includes flip-flop that use
 > async reset, etc..

<snip>

Perhaps the global reset was the wrong signal to pick on.  When the chip
first comes up, the power-up happens in sequence with global reset,
global tristate, and... global enable(?) coming up at specified times
relative to the configuration clock.  If the global enable is like the
global reset, applying asynchronously - then the occasional failure of
an SRL-based divider I implemented a couple years ago (3 slices and a
carry chain for about a 2^25 divider) could be explained.  I only used
it to blink some test LEDs so I never bothered to figure out the details
but I did see the occasional failure.  Because of the known issues with
the asynchronous deassertion of the global reset, I would think an
asynchronous deassertion of the global enable would also cause the
occasional problem if not designed for.

I don't have proof-positive that it's an issue, only an SRL divider that
occasionally went bad.

- John_H

Re: divide by 2^n, n=21..37 ==> 3 Virtex Slices !!

Hm, that's interesting. I wish some Xilinx guys would jump in on this!

Antti









Re: divide by 2^n, n=21..37 ==> 3 Virtex Slices !!
FYI -

I tried synthesizing with XST V6.2.03...  It looks like XST accepts the
assignments to initialize the shift registers it infers, BUT it breaks the
chain and inserts a discrete flip-flop every place there's a '1' in the init
value.  Thus an initialization of the Inc shifter to 32'h1001_0000 appears
to infer 2 smaller shift regs and 2 flip-flops.  I don't know if this has
been fixed in newer versions of XST.

Also - I added the following comment to the top of the file:
// The frequency is set by the 32-bit parameter STEP.
// Consider STEP as a 0.32 fractional value.
// For STEP <= 0.5 (1/2) (32'h8000_0000), the output frequency is
//      f_out = (f_in * fraction)/32
//      f_out = (f_in * STEP)/(32 * 2^32)
// Examples:
//      STEP            Divisor
//      32'h8000_0000   64
//      32'h4000_0000   128
//      32'h3000_0000   170.6666
//      32'h1000_0000   512
//      32'h0800_0000   1024

John Providenza
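
The divisor table in the comment above can be reproduced from the formula f_out = (f_in * STEP)/(32 * 2^32) with a few lines of Python. This is just a convenience sketch; the helper names nco_divisor and nco_frequency are hypothetical and do not come from the posted code.

```python
from fractions import Fraction


def nco_divisor(step, width=32):
    """Input clocks per output period: f_in / f_out = width * 2**width / step."""
    if not 0 < step <= 1 << (width - 1):
        raise ValueError("step must be in 1 .. 2**(width-1)")
    return Fraction(width << width, step)


def nco_frequency(f_in_hz, step, width=32):
    """Output frequency in Hz for a given input clock and step value."""
    return f_in_hz / float(nco_divisor(step, width))


if __name__ == "__main__":
    for step in (0x8000_0000, 0x4000_0000, 0x3000_0000,
                 0x1000_0000, 0x0800_0000):
        print(f"32'h{step:08X}   {float(nco_divisor(step)):.4f}")
```

Using exact fractions keeps non-integer divisors such as 512/3 (about 170.6667) free of rounding until the value is printed.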

