Dual Port RAM

I've got the definite impression that we are not on the same page when we talk about what it means for something to be 'standardized'. The usage I'm talking about is that something is 'standardized' if it is interchangeable with something else that accomplishes that same function but not necessarily in the same way. Although the fifo functionality is pretty low level, since you suggested it I'll use it for the example... it will also be a good indicator of exactly what I mean about the lack of standardization even on something that is easy to describe.

Copied below (without permission of the authors but only done for demonstration purposes) are the entity definitions for 'lpm_fifo' (an EIA standard) and 'sync_fifo' (Mike Treseler standard). Keeping in mind what it is that a fifo is supposed to do (the function) and the intended USER of the fifo who has to deal with the interface to that function (i.e. signals and generics on the entity), let's peruse these two.

Both have generics for specifying width of the fifo...but can't agree on the name of that generic parameter.

Both have generics for specifying the depth of the fifo....but again can't agree on the name of that generic parameter. Both are bad from the standpoint of the user since Mike makes you specify depth as an address length (which is really log2(depth)) and lpm_fifo makes you use two parameters (the depth and log2(depth)). Each has their own unique way of not really telling you that their parameters really are simply log2(depth).
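One way to spare the user the log2 arithmetic is to accept only the depth as a generic and derive the address width inside the design. The following is a minimal sketch, not taken from either fifo: the package name fifo_util, the function addr_bits and the entity easy_fifo are all hypothetical.

```vhdl
-- Hypothetical helper package; not part of LPM or of sync_fifo.
package fifo_util is
   -- Number of address bits needed to index 'depth' locations,
   -- i.e. ceil(log2(depth)).  Assumes depth <= 2**30 so that
   -- 'span' cannot overflow a 32-bit integer.
   function addr_bits (depth : positive) return natural;
end package fifo_util;

package body fifo_util is
   function addr_bits (depth : positive) return natural is
      variable span : positive := 1;
      variable bits : natural  := 0;
   begin
      while span < depth loop
         span := span * 2;
         bits := bits + 1;
      end loop;
      return bits;
   end function addr_bits;
end package body fifo_util;

library ieee;
use ieee.std_logic_1164.all;
use work.fifo_util.all;

-- Hypothetical entity: the user states the depth once and the
-- count port sizes itself.  Remaining fifo ports are omitted.
entity easy_fifo is
   generic (
      width : positive := 16;
      depth : positive := 256);
   port (
      clk   : in  std_logic;
      data  : in  std_logic_vector(width-1 downto 0);
      usedw : out std_logic_vector(addr_bits(depth)-1 downto 0));
end entity easy_fifo;
```

With depth => 256 the usedw port comes out 8 bits wide, and with depth => 300 it comes out 9 bits wide, without the user ever typing a log2 value.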

Lpm_fifo allows you an option (Lpm_Showahead) for having the output show up as soon as it is available or only in response to the read command. sync_fifo has no similar option. Having used fifos for a number of years I can say that both forms are generally useful depending on the application and performance requirements, so in that regard sync_fifo is less useful than lpm_fifo.

Lpm_fifo has some mystery parameters (lpm_type and lpm_hint) that are intended to be used to pass information further down the tool chain. From the standpoint of a user of the fifo, they just clutter up the interface and are of no value. sync_fifo has nothing comparable so in that regard it is better than lpm_fifo. LPM intended those to be used to pass info on to the fitter in some way but those strings are all going to be 'vendor specific' so no kudos there.

Users of a fifo need to reset it, write to it and read from it. Again here there is no naming consistency on any of the signals with the exception of the 'full' and 'empty' status outputs. sync_fifo also has two forms of reset (rst and init) that apparently Mike has found to be useful but that the rest of the industry has not seen to be of enough value to include. Maybe sync_fifo is more useful because of this... or maybe it was a band-aid for a particular niche problem, but in most cases I would likely end up tying them to the same signal or tying one of them to '0'. lpm_fifo also has two forms of reset (async and sync); I won't rant on the dubious value of async resets, maybe some find value in them.

Although 'full' and 'empty' outputs are usually all that most users of a fifo will need, there are more than a few occasions where more status is needed. 'sync_fifo' does not provide anything, 'lpm_fifo' goes to the other extreme and provides a raw dump of the count. What is generally more useful would be to have a number of flag outputs where the depth at which each flag gets triggered can be specified by a generic value (like half full, three quarters full, 98% full, etc.). Give both a failing mark on this for not giving much thought to how fifos really get used.
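To make the flag idea concrete, here is a rough sketch of level flags whose number and trigger points are set by a single generic. The entity name fifo_level_flags and its ports are hypothetical, and the integer_vector type assumes a VHDL-2008 tool.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity: turns a raw word count into any number of
-- "at least this full" flags.  Requires VHDL-2008 (integer_vector).
entity fifo_level_flags is
   generic (
      depth      : positive       := 256;
      -- one flag per entry: half, three quarters, about 98% of 256
      thresholds : integer_vector := (128, 192, 251));
   port (
      count : in  natural range 0 to depth;
      flags : out std_logic_vector(thresholds'length-1 downto 0));
end entity fifo_level_flags;

architecture rtl of fifo_level_flags is
begin
   gen_flags : for i in 0 to thresholds'length-1 generate
      -- flags(i) is high while the fifo holds at least the i-th
      -- threshold's worth of words
      flags(i) <= '1' when count >= thresholds(thresholds'low + i) else '0';
   end generate gen_flags;
end architecture rtl;
```

The number of flag outputs follows from the length of the thresholds generic, so a user who needs only a half-full flag passes a one-element array and gets a one-bit port.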

Users of fifos tend to think of 'writing' and 'reading' the fifo, not 'pushing' and 'popping' it as sync_fifo does. In fact 'push' and 'pop' give the implication of a stack (which is a "last in first out" memory) not a fifo (which is "first in, first out") which would make one question whether you're even using the right function so demerits to sync_fifo on that one.

So now that the long winded critique of two definitions of a fifo has been completed, so what? Why does any of this matter? Mike would probably agree that his 'sync_fifo' could very well have been implemented with the exact same set of parameter and signal names as lpm_fifo but he chose not to. So what is the impact of that decision? Well, if you want to switch from lpm_fifo over to Mike's sync_fifo because maybe it is so much better you've got a bunch of editing to do (every usage of lpm_fifo)...and yet sync_fifo implements the EXACT SAME FUNCTION. Even worse, maybe you'd just like to evaluate that supposed performance improvement.

You're not getting anything more, and maybe you're simply trying to figure out if you get better performance using Mike's implementation, but you've got a bit of effort in front of you to do that evaluation. Why? Because the interfaces to these two things that implement the EXACT SAME FUNCTION are completely different. Which is the situation that we're in today. Some FPGA vendors support the one sorry standard that exists (LPM) but are not doing much to improve it and move up the ladder to higher-level functions, while one large FPGA vendor (namely 'X') does not directly support that one standard... but they do provide their own unique way to implement the EXACT SAME FUNCTION.

Peter Alfke goes off on his rant about how X's hard coded fifo runs at 550 MHz but doesn't seem to grasp the fact that he is talking about performance and that that hard coded fifo is implementing the EXACT SAME FUNCTION. I find it hard to believe that the Xilinx tool set could not, once it encountered the entity called 'lpm_fifo', plop in that hard coded block. Maybe there are some usage restrictions that would prevent that hard coded block from always being instantiated but hey, isn't that what synthesis is supposed to figure out? How to best implement the specified function? Perhaps Peter can explain to everyone how the interface to that Xilinx hard coded fifo differs in some fundamental manner such that the interface to that EXACT SAME FUNCTION could not be the same.

Last, I would like to thank Mike for making available his code on his web site. His example designs are a definite 'good' thing. My comparison of Mike's fifo implementation to the LPM fifo is not meant to imply that his is in any way worse, just that it goes to show you that even with something simple, the interface to the EXACT SAME FUNCTION differs for no rational reason. Mike being able to reuse 'sync_fifo' wherever he needs fifo functionality though is no different than any other FPGA vendor pushing their 'standard'.

Look at the IC industry and how they can standardize on things and fairly ask yourself why IP can't follow that model. Hint: The answer is not "Well IP is not the same as ICs".....if you analyze it rationally I think you'll see that the answer is more along the lines of "Well dang, it really could be standardized and still come out with new stuff as well as improve on old stuff".

KJ

-------------
entity LPM_FIFO is
   generic (
      LPM_WIDTH     : natural;            -- MUST be greater than 0
      LPM_WIDTHU    : natural := 1;       -- MUST be greater than 0
      LPM_NUMWORDS  : natural;            -- MUST be greater than 0
      LPM_SHOWAHEAD : string  := "OFF";
      LPM_TYPE      : string  := L_FIFO;
      LPM_HINT      : string  := "UNUSED");
   port (
      DATA  : in  std_logic_vector(LPM_WIDTH-1 downto 0);
      CLOCK : in  std_logic;
      WRREQ : in  std_logic;
      RDREQ : in  std_logic;
      ACLR  : in  std_logic := '0';
      SCLR  : in  std_logic := '0';
      Q     : out std_logic_vector(LPM_WIDTH-1 downto 0);
      USEDW : out std_logic_vector(LPM_WIDTHU-1 downto 0);
      FULL  : out std_logic;
      EMPTY : out std_logic);
end LPM_FIFO;

entity sync_fifo is
   generic (dat_length : natural := 16;
            add_length : natural := 8    -- 7 for 127x16 8 for 255x16
           );

   port (clk    : in  std_ulogic;
         rst    : in  std_ulogic;   -- powerup init of pointers, flags
         init   : in  std_ulogic;   -- synch init of pointers, flags
                                    -- (but not data)
         push   : in  std_ulogic;   -- data_i latched on edge after push
                                    -- ok to push and pop at same time
         pop    : in  std_ulogic;   -- hold one tick, read data_q while low
         data_i : in  std_logic_vector(dat_length-1 downto 0);
         data_q : out std_logic_vector(dat_length-1 downto 0);  -- readback
         full   : out std_ulogic;
         empty  : out std_ulogic);

end sync_fifo;

Reply to
KJ

Well, KJ, first of all, tone down, and don't second-guess how much I can grasp.

Reply to
Peter Alfke

I call them as I see them. You were the one who several times went on with the sales pitch on the 550 MHz fifo when we were talking about interface standardization, not performance.

Good, now let's move on.

Even now, instead of actually responding to anything that I had posted you are rambling on about things that Xilinx has done inside the various X products. It reads again like a sales pitch, give it a rest....the relevance of what you have to say to my post is completely missing.....maybe you intended this for some other post, who can tell?

I'm thinking that the programmable flags idea came not so much from listening to customers but from the discrete fifos that existed long before in the industry that also had 'programmable' flags. A bit of digging would've led you to find that what most customers really needed was not programmability of the flag levels but being able to specify as generic parameters the fifo fill levels and how many flags were needed. The discrete IC fifo guys couldn't really do this so they had to make it programmable, but in the soft IP world of CPLD/FPGAs you could... but instead chose to copy industry parts. Maybe that's a good decision, maybe not, but I'll bet not having run-time programmability is acceptable in most situations. For the others, this would be a different function (different entity).

Fifo interfaces are all pretty intuitive, so why does X not support the lpm_fifo interface? Is it not intuitive?

Why not? As a user I want to multiply and divide clock frequencies, skew them if necessary and possibly mux them together. Sounds like something that can be standardized to me. Toss in standardization of the various memory controllers as well while you're at it since the memory devices are standardized. Remember what standardization means... interchangeable with somebody else's equivalent function.

That's your opinion (about being so far ahead). Does that statement also imply that Xilinx is just so far ahead that they can't bother with component standardization? That's another reading I could take away from that statement and seems to be what you're suggesting as well.

Maybe. Are you suggesting that the lpm_fifo function wouldn't run at your horn tooting 550 MHz? Would be a bit of a bummer to think that you couldn't.

I don't want to design FPGAs, I want to use them better and not be re-creating the wheel either and I think darn near all of your users would too. The Xilinx way appears to be to avoid adhering to or encouraging any design standards for IP creation other than to imply that the X way is the only and the best way.

Yep

KJ

Reply to
KJ

KJ, is it so difficult for two reasonably intelligent engineers to communicate? We are forced to be very efficient (small size) and very fast (high clock rate), that's what the market demands. Sacrificing performance by adhering to one or another of the so-called interface standards is, at best, our second or third priority. You may not like it, others do. 'nough said.
Peter Alfke


Reply to
Peter Alfke

No, and I might suggest that you would not find it so difficult either if, when you make your posts, you would give the courtesy of replying to what was said in the post that you're replying to instead of going into a speech or sales pitch. Speeches and pitches can be interesting and can often spawn their own sub-topics, but that doesn't make them relevant to the discussion. One technique is to not top post.

Among other demands as well. Customers are like that. We all have them.

And yet the basis for this statement is "because Peter said so". You haven't been able to offer even a single example to back up what you've said. You have offered no benchmarks, no references, nothing to back up your statement that one would be "Sacrificing performance by adhering to one or another of the so-called interface standards".

Others are free to (dis)like things for no rational basis if they so choose. Personally I think you're shortchanging the intelligence of a good chunk of your users and power users with the statements you've made in this thread.

KJ

Reply to
KJ

Maybe I'll regret jumping in here, but here's my take :-)

We designers want standard interfaces to FPGA bits and bobs.

FIFOs are a good example, no-one *wants* to design their own. There is a "standard" interface defined, but it's not used across all vendors, which makes life painful to port...

Also, mostly we don't need lots of clever configurations.

If I want to use Xilinx's FIFO, I use code like this (with some snippage):

   -- FIFO18: 16k+2k Parity Synchronous/Asynchronous BlockRAM FIFO BlockRAM Memory
   --         Virtex-5
   -- Xilinx HDL Libraries Guide, version 8.2.2

   FIFO18_inst : FIFO18
   generic map (
      ALMOST_FULL_OFFSET      => X"0080",  -- Sets almost full threshold
      ALMOST_EMPTY_OFFSET     => X"0080",  -- Sets the almost empty threshold
      DATA_WIDTH              => 18,       -- Sets data width to 4, 9, or 18
      FIRST_WORD_FALL_THROUGH => false)    -- Sets the FIFO FWFT to TRUE or FALSE
   port map (
      DO      => DO,       -- 16-bit data output
      DOP     => DOP,      -- 2-bit parity data output
      EMPTY   => EMPTY,    -- 1-bit empty output flag
      FULL    => FULL,     -- 1-bit full output flag
      WRCOUNT => WRCOUNT,  -- 12-bit write count output
      DI      => DI,       -- FIFO data input, width determined by DATA_WIDTH
      DIP     => DIP,      -- 2-bit parity input
      RDCLK   => RDCLK,    -- 1-bit read clock input
      RDEN    => RDEN,     -- 1-bit read enable input
      RST     => RST,      -- 1-bit reset input
      WRCLK   => WRCLK,    -- 1-bit write clock input
      WREN    => WREN      -- 1-bit write enable input
   );

As an aside: And what's this DI/DO and DIP/DOP about? I ask for 18 bits, I just want 18-bits!

Now, not only is this non-standard, if I want a different size, I have to instantiate a different component with a different name!

The same goes for RAMs.

Anyway, if I want to use most others' I use this:

   LPM_FIFO_1 : entity work.LPM_FIFO
   generic map (
      LPM_WIDTH     => LPM_WIDTH,
      LPM_WIDTHU    => LPM_WIDTHU,
      LPM_NUMWORDS  => LPM_NUMWORDS,
      LPM_SHOWAHEAD => LPM_SHOWAHEAD,
      LPM_TYPE      => LPM_TYPE,
      LPM_HINT      => LPM_HINT)
   port map (
      DATA  => DATA,
      CLOCK => CLOCK,
      WRREQ => WRREQ,
      RDREQ => RDREQ,
      ACLR  => ACLR,
      SCLR  => SCLR,
      Q     => Q,
      USEDW => USEDW,
      FULL  => FULL,
      EMPTY => EMPTY);

Now, I agree with KJ, the LPM_WIDTH and WIDTHU thing is daft. The TYPE and HINT don't usually help me much, but the rest of it maps directly onto Xilinx's template, doesn't it? There's a different entity with async clocks, for those cases, but changing between async and sync clocks happens a lot less often than between sizes of FIFO!

And if I change the size, the SW is smart enough to instantiate the low-level stuff in the right way for me.

If I need absolute control, then sure, I can go and instantiate those myself, but for the majority, all I need is a straightforward FIFO!

And again the same applies for things like LPM_RAM.

They all do the same thing. Why can't I just write the same thing for all of them?

After all, I can write a bit of VHDL to wrap it, which gets me there - but why do *I* (and everyone like me) have to do this, why can't the vendors do it for me?
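For illustration, the kind of wrapper being described might look like the sketch below. The entity simple_fifo, its generics and its port names are hypothetical, and the architecture assumes LPM_FIFO has been compiled into 'work' as in the instantiation above. A second architecture around a vendor primitive would then slot in behind the same entity.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.math_real.all;

-- Hypothetical vendor-neutral interface; names are illustrative only.
entity simple_fifo is
   generic (
      width : positive := 16;
      -- at least 2 so that the derived LPM_WIDTHU is greater than 0
      depth : integer range 2 to integer'high := 256);
   port (
      clk     : in  std_logic;
      rst     : in  std_logic;   -- synchronous reset
      wr_en   : in  std_logic;
      wr_data : in  std_logic_vector(width-1 downto 0);
      rd_en   : in  std_logic;
      rd_data : out std_logic_vector(width-1 downto 0);
      full    : out std_logic;
      empty   : out std_logic);
end entity simple_fifo;

-- One architecture per target; this one defers to LPM_FIFO.
architecture lpm of simple_fifo is
   -- ceil(log2(depth)), evaluated once at elaboration so the user
   -- never has to supply it
   constant addr_bits : positive := integer(ceil(log2(real(depth))));
begin
   fifo_i : entity work.LPM_FIFO
      generic map (
         LPM_WIDTH    => width,
         LPM_WIDTHU   => addr_bits,
         LPM_NUMWORDS => depth)
      port map (
         DATA  => wr_data,
         CLOCK => clk,
         WRREQ => wr_en,
         RDREQ => rd_en,
         SCLR  => rst,
         Q     => rd_data,
         USEDW => open,     -- raw count not exposed by this wrapper
         FULL  => full,
         EMPTY => empty);
end architecture lpm;
```

LPM_SHOWAHEAD, LPM_TYPE, LPM_HINT and ACLR are left at their defaults here, which is a design choice of the sketch rather than anything the LPM definition requires.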

Maybe I'm missing something fundamental, but when I use a FIFO, all I want is a simple FIFO with a consistent interface. Do others have a different view?

Perhaps we should all get together and write the wrappers and docs for this between us for the benefit of future generations ( :-)?)

Cheers, Martin

--
martin.j.thompson@trw.com 
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
Reply to
Martin Thompson

Hi KJ,

I wish I could be that idealistic! :-) Here are two questions to which I don't have precise answers:

(1) How many DDR SDRAM chips were sold worldwide in 2005? (2) How many FPGA IP cores were sold worldwide in 2005?

With or without actually finding the answers, now speculate as to the relative importance of standardization for these two application areas.

It's not one-size-fits-all though; in fact, nothing really is. That is half of the problem. Many of the more interesting IP cores use variable and/or non-standard word sizes, have non-obvious flow control requirements stemming from extensive pipelining and buffering, and so on.

If every FPGA user just wanted to build SoCs based around a microprocessor with a big old local bus with all the peripherals hanging off it, then life would be much easier. As it is, we try to provide the most flexible interfacing options possible, so that no-one is prevented from using a particular core just because their system is wired together in a way that we hadn't thought of.

In an embedded system-builder context, brand A have their proprietary bus, whereas brand X's bus structure did at least come from IBM (which in my mind makes it "somewhat more standard"; you may of course disagree).

When it comes to "raw" IP blocks, the basic, pervasive idioms of two-wire synchronous handshakes and simple flow-controlling pulses are simple to comprehend and use. The names and exact timings may be different but if you go and compare the datasheet-specified interface for a brand X Turbo Encoder and the brand A equivalent, you'll probably find there's very little difference.

I would respectfully disagree, certainly as far as embedded processing is concerned. Sure, there are many other important factors, and many people do treat - or try to treat - FPGAs as commodity products, but I would certainly not make a big A vs X purchasing decision without making a diligent study of the relative merits of the microprocessor interfaces favoured by each vendor.

Cheers,

-Ben-

Reply to
Ben Jones

It's nice to hear another opinion, but.... I have a software guy that wants us FPGA designers to standardize on 16550 UARTs for our embedded designs. WTF?! This device is from a quarter century ago and made to work with serial links that are quite a bit worse than the embedded applications I have today.

WHY is a standard interface desired when much of what "was" there can become completely superfluous?

Personally - and this is opinion as it is with the other posts - I want to get the best performance/cost/size balance I can strike. Why should I burden my design with bloated code to pacify others who want to blow my performance, increase my cost, or bloat the size?

If you want a standard interface, make one. PLEASE don't force me to use silicon with hard-coded features that are much less than what they could be all for the sake of conformance.

The only time I would care to see a standard interface pursued is if there is ZERO impact to my engineering tradeoffs.

- John_H

Reply to
John_H

I couldn't agree more, John. I have no use for primitives, or IP for that matter, that has been crippled or bloated in order to comply with some arbitrary "standard" that is not likely to match my needs anyway.

Much more important is complete, concise, and clear documentation, visibility into the guts of the IP, and tools that work properly.

Reply to
Ray Andraka

I think you've hit the nail on the head. There is no money to be made by solving this problem, and spending time on it won't speed up the hot project of the day. There would have to be some other motivation.

-- Mike Treseler

Reply to
Mike Treseler

Well, to make life easier for *some* people. But read on...

I agree - you shouldn't.

I (and I think KJ) don't want the silicon defined by the interface. If you need all the hairy features of the V-5 FIFO, then it's there for you to use. But for those of us that just need a simple FIFO with a write and a read and some flags, I shouldn't need to instance a different block for each bit of silicon I target. The interface *to my VHDL (or verilog)* is what is constant. The silicon can do what it wants. And if in future it ends up not able to meet my simple interface (which I doubt for a memory or FIFO type thing) then I'll accept I need to change things.

All this applies in spades to RAM blocks. Most of the time, all I want is an address bus (or two), a write enable and a data bus (or two). The Xilinx way forces me to instantiate specific sizes of blocks, which change from generation to generation, when all I want to do is say I need a 2Kx8 RAM.

And that is what I would want to see also. Easy for those who can gain from standardisation. "Power" available for those who need it.

Cheers, Martin

--
martin.j.thompson@trw.com 
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
Reply to
Martin Thompson

Martin, starting with Virtex4, the same RAMB16 primitive is used for all variants of the block RAM. It is parameterized with generics, which makes it a lot easier to instantiate a RAM that is sized according to the need. I find it still needs a wrapper, but at least that wrapper doesn't have to contain primitives with every combination of port sizes. I use a wrapper that automatically generates a RAM array with the appropriately sized ports on individual BRAMs based on the widths of the data ports and address ports. It also hides the parity bit/data bit distinction, plus it gives an easy method to porting to a different family (replace the wrapper).

Reply to
Ray Andraka

Thanks Ray, that's good to know - I haven't V-4ed in anger yet...

I still maintain that Xilinx should provide us with a useful wrapper though, rather than us all having to do our own :-)

Cheers, Martin

--
martin.j.thompson@trw.com 
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
Reply to
Martin Thompson

I agree, however with the caveat that it may still not fit all users. Mine, for example, accepts an unconstrained integer array generic for initialization values. It puts the contents of that N element array into the first N locations of the RAM and zero fills the rest. Since the width can be greater than the 32 bits represented by VHDL integers, I have a second integer array generic for separately initializing the top half of an up to 63 bit wide composite memory. The wrapper instantiates RAMB16s with the depths determined by the number of address bits on each port, and then puts in as many RAMB16s as are required to accommodate the widths of the data ports. The entity for my V4 wrapper is:

entity dual_port_ram is
   generic(
      SIM_COLLISION_CHECK : STRING := "NONE";
      DO_reg      : integer := 0;
      allow_pbits : integer := 1;
      reset_data  : integer := 0;          --not usable with do_reg=1
      RAM_data    : int_array := (0,0);    --initial data
      RAM_data_hi : int_array := (0,0));   --initial data

   port(
      CLKA  : in  std_logic;
      CLKB  : in  std_logic;
      SSRA  : in  std_logic;
      SSRB  : in  std_logic;
      WEA   : in  std_logic;
      WEB   : in  std_logic;
      DIA   : in  std_logic_vector;
      DIB   : in  std_logic_vector;
      DOA   : out std_logic_vector;
      DOB   : out std_logic_vector;
      ADDRA : in  std_logic_vector;
      ADDRB : in  std_logic_vector
   );
end dual_port_ram;
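The "first N locations, zero fill the rest" behaviour described for the initialization generic can be sketched as a small function. This is only a guess at the idea, not the wrapper's actual code: the package name ram_init_pkg, the function fill_init and this particular definition of int_array are assumptions.

```vhdl
-- Hypothetical package; the real wrapper's int_array type and
-- initialization code are not shown in the post.
package ram_init_pkg is
   type int_array is array (natural range <>) of integer;

   -- Returns a 'depth'-element image of the RAM contents: the
   -- elements of 'data' (lowest index first) occupy the first
   -- locations and every remaining location is zero.  Elements of
   -- 'data' beyond 'depth' are ignored.
   function fill_init (data : int_array; depth : positive) return int_array;
end package ram_init_pkg;

package body ram_init_pkg is
   function fill_init (data : int_array; depth : positive) return int_array is
      variable image : int_array(0 to depth-1) := (others => 0);
   begin
      for i in 0 to depth-1 loop
         exit when i >= data'length;
         image(i) := data(data'low + i);
      end loop;
      return image;
   end function fill_init;
end package body ram_init_pkg;
```

A wrapper could call fill_init(RAM_data, 2**ADDRA'length) once at elaboration and slice the result into whatever initialization generics the underlying primitive expects.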

Reply to
Ray Andraka

This is always true :-)

That's useful!

And presumably that works with more than just V-4, and also on other brands?

I like the idea of using unconstrained vectors on the ports - I got in a habit of generic-ising those sorts of things early in my VHDL life and then constraining port widths based on that.

I'm slowly revisiting older code to your sort of style!

The upshot of all this seems to be we all have our own wrappers for the standard blocks, which probably only differ very slightly (if at all!)... oh well - I guess that's the way the world has worked out, even if it seems inefficient in terms of engineering effort!

Thanks for your contributions (as always - they're full of useful nuggets!)

Cheers, Martin

--
martin.j.thompson@trw.com 
TRW Conekt - Consultancy in Engineering, Knowledge and Technology
Reply to
Martin Thompson

What sort of dual port ram are you trying to infer? Distributed or block?

What is this reset pin meant to do? Distributed RAM doesn't play with reset; the block memory has a reset which only resets the output registers.

Cheers, Syms.

Reply to
Symon

Block ram. I used the 2 clock 2 read/write port template.

Ah. I was intending to use the dual port ram as a register file. The register file needs 3 registers (out of 18-20) initialising at reset. Is there any way to do this using any of the dual/single port RAMs?

I was trying to compare a register based register file with a dual port ram based register file.

Looks like it may not be possible.

Reply to
Andy Botterill

See below - in FPGAs, yes. In other technologies, no. For this problem on this scale, the technique I usually use is to build the DPRAM in the ordinary way, and then to build a regular collection of flip-flops with one FF per RAM location. These FFs are reset to 0 at power-up, and are changed to 1 whenever the corresponding RAM location is written - an easy thing to arrange. The trick, of course, is that when you READ one of the RAM locations you first check the "have I been written" register - and, if it's zero, then you substitute the desired power-up reset value in place of the RAM contents.
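The "have I been written" technique can be sketched roughly as follows. Everything here is illustrative: the entity name, the port names, the synchronous reset and the single shared reset_value constant are assumptions, and a real register file would probably look up a per-address reset value instead.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical register file: an ordinary RAM plus one
-- "written" flip-flop per location.
entity regfile_with_reset is
   generic (
      addr_bits : positive := 5;
      width     : positive := 16);
   port (
      clk   : in  std_logic;
      rst   : in  std_logic;   -- synchronous, clears only the flags
      we    : in  std_logic;
      waddr : in  unsigned(addr_bits-1 downto 0);
      wdata : in  std_logic_vector(width-1 downto 0);
      raddr : in  unsigned(addr_bits-1 downto 0);
      rdata : out std_logic_vector(width-1 downto 0));
end entity regfile_with_reset;

architecture rtl of regfile_with_reset is
   type ram_t is array (0 to 2**addr_bits-1)
      of std_logic_vector(width-1 downto 0);
   signal ram     : ram_t;
   signal written : std_logic_vector(0 to 2**addr_bits-1) := (others => '0');
   signal ram_q   : std_logic_vector(width-1 downto 0);
   signal valid_q : std_logic := '0';
   -- assumed reset value, identical for every location
   constant reset_value : std_logic_vector(width-1 downto 0) := (others => '0');
begin
   -- The RAM itself has no reset, so it can map onto block RAM.
   ram_p : process (clk)
   begin
      if rising_edge(clk) then
         if we = '1' then
            ram(to_integer(waddr)) <= wdata;
         end if;
         ram_q <= ram(to_integer(raddr));
      end if;
   end process ram_p;

   -- One flag per location: cleared by reset, set on the first write.
   -- The flag read is registered so it lines up with ram_q.
   flags_p : process (clk)
   begin
      if rising_edge(clk) then
         if rst = '1' then
            written <= (others => '0');
            valid_q <= '0';
         else
            if we = '1' then
               written(to_integer(waddr)) <= '1';
            end if;
            valid_q <= written(to_integer(raddr));
         end if;
      end if;
   end process flags_p;

   -- Substitute the reset value until the location has been written.
   rdata <= ram_q when valid_q = '1' else reset_value;
end architecture rtl;
```

The cost is one flip-flop per location plus an output mux, which is small for an 18-20 entry register file but grows linearly with depth.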

HOWEVER, you *can* initialise the contents of any Xilinx RAM at configuration (I think - betcha there are some exceptions I don't know about). Trawl the docs for "memory initialization file" or somesuch. Plenty of people here will have that information at their fingertips; I don't.

--
Jonathan Bromley, Consultant

DOULOS - Developing Design Know-how
Reply to
Jonathan Bromley

Only three registers need a reset change of state. That will simplify the code. I'll give it a try and get back to you.

Hopefully I won't have to use this method.

Thanks for everyone's help.

Reply to
Andy Botterill
