divide by 2^n, n=21..37 ==> 3 Virtex Slices !!

Hi

Below is possibly the densest divide-by-2^n ever implemented on any FPGA: divide by 2^37 takes only 3 Virtex slices! (4 slices on Spartan-3)

Antti


-- Divide by 2^n, n=21..37, code=0000 n=37, code=1111 n=21
--
-- Theory of operation:
-- bit serial NCO with 32 bit phase accumulator (2 LUTs as SRL16),
-- phase increment is fixed constant "..0001" (2 LUTs as SRL16),
-- 1 LUT for 1 bit adder, 1 LUT for carry logic,
-- 1 flip-flop for carry, 1 flip-flop for 1 bit serial to parallel conversion
-- total: 3 Slices (4 for Spartan3)
--
-- the bit serial operation automatically divides the incoming clock by 2^5;
-- that is the frequency at which the NCO operates. The phase accumulator
-- overflows at 2^32, so the maximum divide constant is 2^37
-- :)
-- snipped-for-privacy@openchip.org

library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.STD_LOGIC_ARITH.ALL;
use IEEE.STD_LOGIC_UNSIGNED.ALL;
library UNISIM;
use UNISIM.VComponents.all;

entity div_2n_21_37 is
  Port ( clk  : in  std_logic;
         code : in  std_logic_vector(3 downto 0);
         fout : out std_logic);
end div_2n_21_37;

architecture Behavioral of div_2n_21_37 is
  signal one_L:    std_logic;
  signal one_H:    std_logic;
  signal one_X:    std_logic;
  signal accu_L:   std_logic;
  signal accu_in:  std_logic;
  signal accu_out: std_logic;
  signal cy_out:   std_logic;
  signal cy_in:    std_logic;

begin

-- rotating 1, 32 bit long
Instance_SRL16_1L: SRLC16E
  generic map (INIT => X"0001")
  port map (CLK => clk, CE => '1',
            A0 => code(0), A1 => code(1), A2 => code(2), A3 => code(3),
            Q => one_X, D => one_H, Q15 => one_L);
Instance_SRL16_1H: SRLC16E
  generic map (INIT => X"0000")
  port map (CLK => clk, CE => '1',
            A0 => '1', A1 => '1', A2 => '1', A3 => '1',
            Q => one_H, D => one_L, Q15 => open);

-- rotating ACCU, 32 bit long
Instance_SRL16_AL: SRLC16E
  generic map (INIT => X"0000")
  port map (CLK => clk, CE => '1',
            A0 => '0', A1 => '0', A2 => '0', A3 => '0',
            Q => open, D => accu_in, Q15 => accu_L);
Instance_SRL16_AH: SRLC16E
  generic map (INIT => X"0000")
  port map (CLK => clk, CE => '1',
            A0 => '0', A1 => '0', A2 => '0', A3 => '0',
            Q => open, D => accu_L, Q15 => accu_out);

-- ALU add 1 with Carry, only when not 1 accu_in clk, D => cy_in, R => '0');

-- 1 bit serial to parallel converter
Instance_fout: FDRE
  port map (Q => fout, C => clk, CE => one_X, D => accu_out, R => '0');

end Behavioral;
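For readers who want to see the bit-serial idea without the SRL16 packing, here is a minimal behavioural sketch in Verilog. It is not the netlist posted above: the module name, the SEL parameter and the 5-bit slot counter t (standing in for the rotating one-hot "1") are all hypothetical, and it has not been run through any tool. It only illustrates why the divide ratio is 2^5 times 2^(k+1) when bit k of the 32-bit accumulator is captured.

```verilog
// Behavioural sketch only; names and structure are illustrative assumptions.
module bitserial_div_sketch #(
    parameter [4:0] SEL = 5'd31      // accumulator bit to capture (15..31 -> /2^21../2^37)
) (
    input  wire clk,
    output reg  fout
);
    reg [31:0] acc = 32'd0;          // phase accumulator, circulates LSB first
    reg [4:0]  t   = 5'd0;           // index of the bit slot now at acc[0]; 0 = LSB
    reg        cy  = 1'b0;           // carry passed between adjacent bit slots

    // Adding the constant 1 is the same as forcing a carry-in of 1 in the LSB slot.
    wire cin  = (t == 5'd0) ? 1'b1 : cy;
    wire sum  = acc[0] ^ cin;        // 1-bit adder
    wire cout = acc[0] & cin;        // carry logic

    always @(posedge clk) begin
        acc <= {sum, acc[31:1]};     // updated bit re-enters at the top of the loop
        cy  <= cout;
        t   <= t + 5'd1;             // wraps after 32 slots
        if (t == SEL)
            fout <= sum;             // serial-to-parallel: capture one bit per word
    end
endmodule
```

One full 32-bit word takes 32 clocks, so the accumulator counts at clk/2^5; bit k of that count has a period of 2^(k+1) words, which gives 2^37 clocks for k = 31 and 2^21 for k = 15.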

Antti Lukats

Antti -

I'm truly amazed by the approach. I think I did things right in making a Verilog 32-bit, bit-serial NCO below. (Formats nicely with fixed space fonts). Only 1 level of logic makes for sweet timing. Thanks for bringing this approach to light.

- John_H

// This is a bit-serial NCO with an initial
// divide-by-32 implementing a 32-bit NCO word.
//
// Original credit is given to Antti Lukats
// from a comp.arch.fpga posting 22 Mar 05.
//
// This version produced by John Handwork
// with Synplify V8.0 for a Xilinx target.
//  - Some reg_input_delay route values (1.4ns)
//    were applied to Add, Cry in Synplify
//    to keep the logic to 1 level.
//  - Note that the initializations don't synthesize.
//  - Xilinx ISE v7.1 pushes 500MHz in the slow (-4)
//    XC3S50, limited by routing to the I/O cell.
//  - 4 Slices and an IOB is all it takes.
//
`define SRL /* synthesis syn_srlstyle = "select_srl" */
//
module TinyNCO
( input      clk
, output reg Fout /* synthesis syn_useioff = 1 */
);
parameter STEP = 32'd1000000;

reg [31:0] End `SRL = 32'h0000_0001;
reg [31:0] Inc `SRL = STEP;
reg [30:0] Acc `SRL = 32'h0000_0000;
reg        Cry      = 1'b0;
reg        Add      = 1'b0;

always @(posedge clk) begin End

John_H

"John_H" wrote in message news:Vg21e.16$ snipped-for-privacy@news-west.eli.net...

Hi John,

Nice :) ! Would it be OK to add your code to the 'xilcores' project? You are of course welcome to do it yourself; just register and I will add you to the project developers.

Antti

Antti Lukats
< snip>

Since I hadn't run a full simulation, I'd be hesitant to throw it in there without a solid double-check but feel free to add the code yourself if you like. Anyone is welcome to try to leverage this approach whether for your contest or for commercial projects.

Anyone working with SRLs in this sense should be aware: the global reset to the Xilinx device could deassert asynchronously to different SRLs at different clocks, knocking off the needed alignment between the SRLs. I'd suggest working in a global enable for all the SRLs that waits a few clock cycles before letting everything run. Another approach would be to use the BUFGCE primitive in the newer families using the same delayed enable to gate the clock to the entire FPGA rather than just the enables to the SRLs.
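A minimal sketch of the delayed global enable suggested above, assuming a plain shift chain is acceptable as the delay element. The module name, the DELAY parameter and the run signal are hypothetical, and whether this actually cures the alignment problem is not established in this thread.

```verilog
// Hypothetical helper, not from the thread: keeps 'run' low until DELAY
// clock edges have passed after configuration, then holds it high.
module srl_start_enable #(
    parameter DELAY = 4              // clock edges to wait (must be >= 2)
) (
    input  wire clk,
    output wire run                  // intended for the CE pin of every SRL
);
    // Every stage starts at 0 and a constant 1 is shifted in, so 'run'
    // can only ever change from 0 to 1.
    reg [DELAY-1:0] pipe = {DELAY{1'b0}};

    always @(posedge clk)
        pipe <= {pipe[DELAY-2:0], 1'b1};

    assign run = pipe[DELAY-1];
endmodule
```

In the VHDL divider earlier in the thread, run would replace the constant '1' on each SRLC16E CE input. It may be worth checking that the synthesis tool does not pack this chain into an SRL itself, since that would bring back the same start-up question.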

I also want to double check with ISE v7.1 service pack 1 and/or the hotline to see if the minimum clock cycle for the SRLs is communicated properly. ISE v6.3.3i gave warnings if I tried to run too fast and 7.1i didn't complain at all about shift frequencies approaching 500 MHz. The SRL performance numbers have either improved *significantly* over the previous speed files or the information got lost. I'm hoping the former is true.

- John_H

John_H


For those who care, in the slower version of the XC3S50 (-4) the registered SRL limits the frequency to 285 MHz. I couldn't get the newer software to report this limit; there is now a change request to make this value visible.

John_H

"John_H" wrote in message news:wkW1e.1$ snipped-for-privacy@news-west.eli.net...

Hi John,

I think that isn't a problem (in most cases). The SRLs are not connected to the global set/reset lines at all; the initial state is loaded from configuration memory and is not dependent on any delay in set/reset. As the clock does not propagate to the SRLs until all of them are configured, I think they will all start in proper sync. The only requirement is that the first CLK pulse reaches all SRLs at the same time; if their CLK inputs are on the same GCLK I think this is always the case. Besides that, Xilinx uses SRLs in several designs that require a synchronized start, and nothing special is done there. So no worries. But it was a good thing to bring to notice; in some cases this could be an issue, for example if the SRL shift loop includes flip-flops that use async reset, etc.

Antti

PS: thanks again for your Verilog code. I think I will add it just as an example of Verilog coding; your header already contains the disclaimer that it is not fully tested/verified. As a Verilog example I think it's brilliant :)

Antti Lukats

Perhaps the global reset was the wrong signal to pick on. When the chip first comes up, the power-up happens in sequence with global reset, global tristate, and... global enable(?) coming up at specified times relative to the configuration clock. If the global enable is like the global reset, applying asynchronously - then the occasional failure of an SRL-based divider I implemented a couple years ago (3 slices and a carry chain for about a 2^25 divider) could be explained. I only used it to blink some test LEDs so I never bothered to figure out the details but I did see the occasional failure. Because of the known issues with the asynchronous deassertion of the global reset I would think the global deassertion of the global enable would also cause the occasional problem if not designed for.

I don't have proof-positive that it's an issue, only an SRL divider that occasionally went bad.

- John_H

John_H

Hm, that's interesting. I wish some Xilinx guys would jump in on this!

Antti

Antti Lukats

FYI -

I tried synthesizing with XST V6.2.03... It looks like XST accepts the assignments to initialize the shift registers it infers, BUT it breaks the chain and inserts a discrete flip-flop every place there's a '1' in the init value. Thus an initialization of the Inc shifter to 32'h1001_0000 appears to infer 2 smaller shift regs and 2 flip-flops. I don't know if this has been fixed in newer versions of XST.

Also - I added the following comment to the top of the file:

// The frequency is set by the 32 bit PARAMETER 'step'
// Consider 'step' as a 0.32 fractional value.
// For step
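As a rough worked example of the 0.32 fractional reading: this assumes Fout follows the accumulator's top bit and that the accumulator advances once per 32 clocks, as the "initial divide-by-32" in John_H's header suggests. The 100 MHz clock is purely an illustrative assumption; STEP = 1000000 is the default from the posted module.

```latex
f_{out} = \frac{f_{clk}}{32}\cdot\frac{\mathrm{STEP}}{2^{32}},
\qquad
\mathrm{STEP} = 10^{6},\; f_{clk} = 100\,\mathrm{MHz}
\;\Rightarrow\;
f_{out} \approx 3.125\,\mathrm{MHz}\times 2.328\times 10^{-4} \approx 727.6\,\mathrm{Hz}
```

With STEP = 1 the same expression reduces to f_clk / 2^37, which agrees with the maximum divide ratio of Antti's fixed-increment version.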

johnp
