Xilinx: XST synchronous FIFO using BRAMs

Dear all,

Lately I have been trying to design and implement a synchronous FIFO, built around a cyclic buffer, that XST can synthesize using Block RAMs.

I now have a version that works (it does so in simulation using GHDL). I have attached it below. However, since it doesn't follow the prescribed pattern for BLOCK RAM inference, the code as shown synthesizes to an implementation that uses "distributed RAM" instead.

Fortunately, I can easily get it in a form where BRAM can be inferred by enabling the currently-commented-out line marked "ENABLE FOR BRAM", around line 70.

The thing is that if I do this, the entity stops functioning as intended. This is particularly strange since (as far as I can tell) the change shouldn't affect the architecture's semantics in any way!

Any help/pointers by one of the VHDL gurus here would be much appreciated.

Best regards,

Sidney

------------- ramfifo.vhdl

library ieee;

use ieee.std_logic_1164.all, ieee.numeric_std.all;

entity RAMFIFO is
  port(
    CLK       : in  std_logic;
    data_in   : in  std_logic_vector(7 downto 0);
    data_out  : out std_logic_vector(7 downto 0);
    status    : out std_logic_vector(7 downto 0);
    reset     : in  std_logic;
    shift_in  : in  std_logic;
    shift_out : in  std_logic
  );
end entity RAMFIFO;

architecture arch of RAMFIFO is

  signal cur_address_r   : unsigned(3 downto 0) := "0000";
  signal cur_address_r2  : unsigned(3 downto 0) := "0000";
  signal cur_num_entries : unsigned(3 downto 0) := "0000";
  signal cur_address_w   : unsigned(3 downto 0);

  signal nxt_address_r   : unsigned(3 downto 0);
  signal nxt_num_entries : unsigned(3 downto 0);

  signal sig_data_r : std_logic_vector(7 downto 0);

  type RAMType is array(0 to 15) of std_logic_vector(7 downto 0);

  -- initialize the ram below with sensible ascii values for debugging
  signal ram : RAMType := (
    x"30", x"31", x"32", x"33", x"34", x"35", x"36", x"37",
    x"38", x"39", x"41", x"42", x"43", x"44", x"45", x"46"
  );

  signal shift_in_possible     : std_logic;
  signal shift_out_possible    : std_logic;
  signal shift_in_will_happen  : std_logic;
  signal shift_out_will_happen : std_logic;

begin

status

Sidney Cadot

Sidney, I am not a VHDL guru, but I claim to understand FIFOs. There is an important difference between LUTRAM and BlockRAM: in LUTRAM the read is not clocked (change the read address, and the new data appears immediately at the output), while the read from a BlockRAM is clocked (change the read address, and nothing happens at the output until the next rising clock edge).
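The difference described above can be sketched in plain VHDL. This is only an illustration, not code from the thread: the entity name `read_styles`, its ports, and the two separate arrays `ram_a` and `ram_b` are all hypothetical. Whether a synthesis tool maps either array onto LUT RAM or block RAM depends on the tool and its settings.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity, for illustration only.
entity read_styles is
  port (
    clk        : in  std_logic;
    we         : in  std_logic;
    wr_addr    : in  unsigned(3 downto 0);
    rd_addr    : in  unsigned(3 downto 0);
    din        : in  std_logic_vector(7 downto 0);
    dout_async : out std_logic_vector(7 downto 0);
    dout_sync  : out std_logic_vector(7 downto 0)
  );
end entity read_styles;

architecture sketch of read_styles is
  type ram_t is array (0 to 15) of std_logic_vector(7 downto 0);
  -- Two copies, so that each read style has its own memory.
  signal ram_a : ram_t := (others => (others => '0'));
  signal ram_b : ram_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram_a(to_integer(wr_addr)) <= din;
        ram_b(to_integer(wr_addr)) <= din;
      end if;
      -- Clocked read: the output only changes on a rising clock edge.
      dout_sync <= ram_b(to_integer(rd_addr));
    end if;
  end process;

  -- Non-clocked read: the output follows rd_addr without waiting for a clock.
  dout_async <= ram_a(to_integer(rd_addr));
end architecture sketch;
```

In simulation, `dout_async` changes as soon as `rd_addr` changes, while `dout_sync` shows the addressed word one rising edge later.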

BTW, this is a long holiday weekend in the US. Most people have better things to do than watching this ng. :-) Peter Alfke

Peter Alfke

formatting link

-- Mike Treseler

Mike Treseler

I reviewed the results of this query, but I don't see anything that is relevant to my problem I am afraid.

The example code I gave can be inferred as BRAMs by XST by the deletion of 1 (semantically empty) line, as explained in my previous post. In that case however, it ceases to work as expected. My only conclusion is that either XST or my VHDL code must have a bug.

Regards, Sidney

sidney

Yes, I understand this. My FIFO design has a delay of at least one clock cycle, I am not expecting the output to appear immediately.

Note, too, that my code carefully refrains from using any Xilinx-specific components. If I enable the "ENABLE FOR BRAM" line in the code I gave, XST is able to infer a Block-RAM implementation all by itself (cf. the 'VHDL Coding Techniques' chapter of the XST User's Guide), so it apparently 'thinks' that the BRAM can provide the VHDL semantics of my code in that case.

Impossible ... :) Thanks for your help so far.

Regards, Sidney

sidney

Sidney, a synchronous FIFO (same clock for write and read) is really a trivial design, using two counters. But have you made sure that the two perverse cases, FULL and EMPTY, are handled properly? This is quite easy in a synchronous design, but still requires some thought. (It's really tricky when the two clocks are not related...) Peter Alfke

Peter Alfke

Sure. This is exactly what I am using: a counter that points to the location where we are reading, and a counter that gives the number of elements currently in the FIFO. All the rest is combinatorial logic.

Alternatively, I could do the same with counters that point to the 'read' and 'write' positions- it doesn't matter a lot.
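For readers following along, here is a minimal sketch of the "read pointer plus entry count" bookkeeping described above. It is not the code from the original post (which is cut off in this archive). The entity name `fifo_ctrl_sketch`, its ports, and the 5-bit `count` (chosen so that all 16 locations can be used) are assumptions made for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical control block: produces RAM addresses and flags only.
entity fifo_ctrl_sketch is
  port (
    clk       : in  std_logic;
    reset     : in  std_logic;
    shift_in  : in  std_logic;
    shift_out : in  std_logic;
    rd_addr   : out unsigned(3 downto 0);
    wr_addr   : out unsigned(3 downto 0);
    do_write  : out std_logic;
    empty     : out std_logic;
    full      : out std_logic
  );
end entity fifo_ctrl_sketch;

architecture sketch of fifo_ctrl_sketch is
  signal rd_ptr : unsigned(3 downto 0) := (others => '0');
  signal count  : unsigned(4 downto 0) := (others => '0');  -- 0 to 16
  signal is_empty, is_full, push, pop : std_logic;
begin
  is_empty <= '1' when count = 0  else '0';
  is_full  <= '1' when count = 16 else '0';

  -- A shift only happens when the FIFO can accept or deliver a word.
  push <= shift_in  and not is_full;
  pop  <= shift_out and not is_empty;

  rd_addr  <= rd_ptr;
  wr_addr  <= rd_ptr + count(3 downto 0);  -- 4-bit add, wraps modulo 16
  do_write <= push;
  empty    <= is_empty;
  full     <= is_full;

  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        -- Only the count is cleared; the read pointer may stay where it is.
        count <= (others => '0');
      else
        if pop = '1' then
          rd_ptr <= rd_ptr + 1;
        end if;
        if push = '1' and pop = '0' then
          count <= count + 1;
        elsif pop = '1' and push = '0' then
          count <= count - 1;
        end if;
      end if;
    end if;
  end process;
end architecture sketch;
```

In this sketch a simultaneous push and pop leaves `count` unchanged, and a pop on an empty FIFO is simply ignored. The design in the original post may handle these corner cases differently.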

My implementation works flawlessly, including all border cases, both in hardware (when distributed RAM is derived) and in simulation. It handles simultaneous reads/writes on the same cycle a-ok, also in cases when this happens on an empty FIFO.

The only instance where things go awry is if I allow XST to derive BRAMs (by adding a semantically empty register that the XST structural pattern matcher seems to need). This, to me, indicates either a subtle bug in my VHDL (I am not an expert) or a not-so-subtle bug in XST.

I like to think I have got all cases covered. I hope a VHDL veteran can take a look at my code and point out my error. Alternatively, this issue could indicate a problem with XST.

Regards, Sidney

sidney

For some reason google pointed to the wrong item. #13 in the thread has the keyword.

-- Mike Treseler

Mike Treseler

So let's wait for a VHDL guru. Probably on Tuesday... Cheers Peter

Peter Alfke

Hi,

When you enabled the line, the signal cur_address_r2 is assigned. But I can't find any uses of the signal, so XST will in any case remove the assignment.

Could you also explain in more detail what the failure is when you enable the line? The failure symptoms may explain what the problem is.

Göran

Sidney Cadot wrote:

Göran Bilski

/SNIP/

/SNIP/

/SNIP/

Hi Sidney, I'm not a guru. I haven't taken the time to read every line of your code and I don't understand how it *ever* infers BRAM.

It seems to me, with my limited knowledge, that you need to both write the FIFO and read the FIFO inside clocked processes. It looks here like you're reading it asynchronously. If you just put the line:

sig_data_r

Unbeliever

Hi Sidney,

There is one serious flaw in your code: you don't reset the pointers, so resetting your FIFO after some usage will actually put it into a non-working state. In my follow-up post, I'll dump some code (entity + lbram and bram architectures + package + testbench). It does simulate, but I didn't try to synthesize it. Feel free to use and improve it (you can drop the FIFO count by using the pointers and sacrificing 1 entry).

Regards, Alvin.

Alvin Andries

Here's the code.

Alvin.

-----------------------------------------------------------
-- RAMFIFO.vhd
-----------------------------------------------------------
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity RAMFIFO is
  generic (
    address_width : POSITIVE := 4
  );
  port (
    CLK       : in  std_logic;
    data_in   : in  std_logic_vector(7 downto 0);
    data_out  : out std_logic_vector(7 downto 0);
    status    : out std_logic_vector(2 * address_width downto 0);
    reset     : in  std_logic;
    shift_in  : in  std_logic;
    shift_out : in  std_logic
  );
end entity RAMFIFO;

-----------------------------------------------------------
-- RAMFIFO_LBRAM.vhd
-----------------------------------------------------------
architecture LBRAM of RAMFIFO is

  signal rd_addr  : unsigned(address_width - 1 downto 0) := (others => '0');
  signal wr_addr  : unsigned(address_width - 1 downto 0) := (others => '0');
  signal has_data : boolean := false;

  signal count : unsigned(address_width downto 0) := (others => '0');

  signal read_data : std_logic_vector(7 downto 0);

  type RAMType is array(0 to 2**address_width - 1) of std_logic_vector(7 downto 0);

  signal ram : RAMType := (others => (others => '0'));

  signal full  : boolean;
  signal empty : boolean;
  signal push  : boolean;
  signal pop   : boolean;

begin

empty

Alvin Andries


No, it will not simply remove the assignment. It will detect cur_address_r2 as a register that can be used as a BRAM address register (in contrast to cur_address_r, which is also used in the expression that calculates cur_address_w, and therefore cannot be internal to the BRAM). This explains, I think, why enabling the line makes XST infer BRAMs instead of distributed RAM.


If I enable the line, the code still compiles perfectly but the resulting FIFO (now implemented by XST using BRAM) doesn't work as intended. As you see, I initialize the RAM to "0123456789ABCDEF". When I hook up the FIFO to a simple RS232-output component, and try to emit "abcdefghij...." (etc.), I will actually get:

"0123456789ABCDEFabcdefghij......."

So, the bytes I put into the FIFO actually get stored, but they are not made available on time.

Regards, Sidney

sidney

Well, if you remove the comment line, it does :)

No, that is not the case. The code is loosely modelled after the sample VHDL code given in the "XST User's Guide" for a dual-ported, synchronous BRAM inference. You will see the same pattern there. Basically, the BRAM is registered; one needs to set the address for both ports synchronously and the contents will then be available on the next clock cycle.
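As an illustration of the registered-address read style referred to above, here is a minimal sketch. It is written from memory in the general style of the inference templates and is not copied from the XST User's Guide or from the original post. The entity name `ram_addr_reg_sketch` and the signal `rd_addr_r` are hypothetical, and whether a particular XST version maps it to block RAM is an assumption to verify in the synthesis report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical entity, for illustration only.
entity ram_addr_reg_sketch is
  port (
    clk     : in  std_logic;
    we      : in  std_logic;
    wr_addr : in  unsigned(3 downto 0);
    rd_addr : in  unsigned(3 downto 0);
    din     : in  std_logic_vector(7 downto 0);
    dout    : out std_logic_vector(7 downto 0)
  );
end entity ram_addr_reg_sketch;

architecture sketch of ram_addr_reg_sketch is
  type ram_t is array (0 to 15) of std_logic_vector(7 downto 0);
  signal ram       : ram_t := (others => (others => '0'));
  signal rd_addr_r : unsigned(3 downto 0) := (others => '0');
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        ram(to_integer(wr_addr)) <= din;
      end if;
      -- The read ADDRESS is registered, not the read data.
      rd_addr_r <= rd_addr;
    end if;
  end process;

  -- The output is the array word selected by the registered address.
  dout <= ram(to_integer(rd_addr_r));
end architecture sketch;
```

One point that may be worth checking when comparing simulation against hardware: in this form, `dout` in simulation follows the array contents at `rd_addr_r`, so a later write to that same location appears at `dout` without a new read address being registered.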

It does also when the indicated line is uncommented. Try it! :)

Regards, Sidney

sidney

Hi Alvin,

Why so? I synchronously set only the counter to zero upon reset (this is by design); there is no reason why "cur_address_r" would need to be zero as well upon reset. The location "cur_address_r" points to is immaterial- a FIFO is empty iff cur_num_entries is zero.

Thanks a lot, I will test this for sure - but I am still curious to know what goes wrong with my code.

Best regards, Sidney

sidney

Thanks. I hope this was something you had made already and not something you did specially for me- I'd feel all guilty. :)

One small nit: your line:

push

sidney


Oops,

I was too fast. Normally, I'd create a read and a write pointer (because it's faster than what you're doing). In that case, not resetting both would kill the FIFO. You're using the number of words for address calculation. This infers a full adder, which is slower than an increment.
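A minimal sketch of the two-pointer alternative mentioned above, in the variant that gives up one entry so that no separate count is needed. This is an illustration under assumed names (`fifo_ptr_sketch` and its ports are hypothetical), not the code posted in this thread.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical control block using a read and a write pointer.
entity fifo_ptr_sketch is
  port (
    clk       : in  std_logic;
    reset     : in  std_logic;
    shift_in  : in  std_logic;
    shift_out : in  std_logic;
    rd_addr   : out unsigned(3 downto 0);
    wr_addr   : out unsigned(3 downto 0);
    do_write  : out std_logic;
    empty     : out std_logic;
    full      : out std_logic
  );
end entity fifo_ptr_sketch;

architecture sketch of fifo_ptr_sketch is
  signal rd_ptr : unsigned(3 downto 0) := (others => '0');
  signal wr_ptr : unsigned(3 downto 0) := (others => '0');
  signal is_empty, is_full, push, pop : std_logic;
begin
  -- Equal pointers mean empty; "full" is declared one entry early,
  -- so at most 15 of the 16 locations are ever in use.
  is_empty <= '1' when rd_ptr = wr_ptr     else '0';
  is_full  <= '1' when wr_ptr + 1 = rd_ptr else '0';

  push <= shift_in  and not is_full;
  pop  <= shift_out and not is_empty;

  rd_addr  <= rd_ptr;
  wr_addr  <= wr_ptr;
  do_write <= push;
  empty    <= is_empty;
  full     <= is_full;

  process (clk)
  begin
    if rising_edge(clk) then
      if reset = '1' then
        -- Both pointers must be reset together, otherwise the
        -- FIFO would not come out of reset empty.
        rd_ptr <= (others => '0');
        wr_ptr <= (others => '0');
      else
        if push = '1' then
          wr_ptr <= wr_ptr + 1;
        end if;
        if pop = '1' then
          rd_ptr <= rd_ptr + 1;
        end if;
      end if;
    end if;
  end process;
end architecture sketch;
```

Here each address comes straight from a register that is only ever incremented. No sum of pointer and count is needed to form the write address.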

Regards, Alvin.

Alvin Andries

Hi,

You could do this. But if you want to use the FIFO in a fully handshaking system, you lose an important property! The FIFO breaks the combinatorial handshake path and is required to leverage the increased latency in the handshake. In your design, this isn't an issue. It's just a matter of habit. As for the coding: the exercise was useful to refresh my ModelSim knowledge a bit.

Regards, Alvin.

Alvin Andries

Why would an adder be slower than an increment? In Xilinx, I see no difference. It's all a matter of carry propagation (and it is pretty fast in either case). Or is this a case of generics and portability? My analogy for generics is fast food. Toss a coin to decide whether you go to McDon or to BurgerKing. It's all the same mediocre or below stuff. Same with a design that is fearful of being Xilinx specific. Using only the ingredients that everybody has means lowering yourself to the lowest common denominator. Have guts and be specific! There is more stuff than Altera ever dreamt of. Sorry, couldn't resist the plug. Peter Alfke

Peter Alfke
