CAM for FPGA ...

Hi all, I would like to get into a CAM design for FPGA. Does any of you know about where can I find material on this subject? I will appreciate stuff like tutorials and reference designs (examples in any HDL)..

Thanks in advance, Moti.

Reply to
Moti Cohen
Loading thread data ...

try the xilinx web-site, they have lots of information on CAMs, they also used to have reference designs

Reply to
Daryl Bradley

Hi Daryl, I already knows the stuff from xilinx, i was actually aiming for stuff like tutorials and HDL code examples. Regards, Moti.

Reply to
Moti Cohen

Start by reading some data sheets of real CAMs:

formatting link

There are not many HDL level code examples because a CAM does not fit well in an FPGA. The basic idea is to search through a register array to match an input tag in one clock cycle. The read code is just a FOR loop. The problem is the large amount of wiring and logic generated between all the register outputs and the match bit. What will fit in an FPGA is a very small array at a very low Fmax compared to an off-the-shelf CAM.

-- Mike Treseler

Reply to
Mike Treseler

The clock cycles for commercial CAMs seem pretty long; I wonder whether a solution that stored the data in the first part of each of a large number of block RAMs, with a very simple counter for addresses and a comparator, running at a much quicker internal clock, might get better results. This feels as if it could be made a bandwidth problem, and FPGAs lack not in bandwidth.

I suppose block RAMs are seldom wide enough; this feels like a case where Altera's 512-bit memory blocks might come into their own [OK, you need four to store a 64-bit word; with 318MHz not-to-exceed speed, you probably can't fit more than 16 words per set of four blocks if you want clock rates competitive with the 20Mhz of MUSIC; but you could fit sixteen sets of four blocks on even the smallest Stratix].

Though that's 256x64, whilst MUSIC offers 4096x64 in their larger devices.

Tom

Reply to
Thomas Womack

Note that Altera has or had CAM functionality for 16 or 32 words at a time in their memories starting in the 20K family. That size was never interesting for my applications so I didn't pursue them.

If you use a BlockRAM with a counter, you're going to have a maximum delay of the number of entries in the RAM before a match: much, much longer than commercial CAMs.

Hash tables or dictionaries built from parts of the value and pointers to other values in a search tree can reduce your search times but they aren't simple.

1) If you want a 1kx64 CAM, I'd suggest the FPGA isn't the way to go. 2) Look at Altera's CAM functionality. 3) If you want small CAMs or have lots of FPGA space, let us know. We can't devine your requirements from the CAM "idea" you're asking about.
Reply to
John_H

Nor did anyone else. The function was dropped after the 20K.

Yes, "getting into a CAM design for FPGA" is a little unfocused.

-- Mike Treseler

Reply to
Mike Treseler

That's why I was proposing using lots of blockRAMs with a counter; given that the cycle times for block RAM are so much shorter than the ones I see for commercial CAMs, the trade-off might be less completely hopeless than you'd fear; the catalogue in

formatting link

has a lot of CAMs with a compare time as long as 90ns, in which time you can look up quite a lot of entries in quite a lot of block RAMs in parallel..

Tom

Reply to
Thomas Womack

Maybe there could be some possibility to build a CAM by using the integrated RAM of a FPGA / CPLD ...

What are going to use ? Altera, Lattice, ... ?

Rgds André

Reply to
ALuPin

For the (one) cam part I've seen - the latency was a number of clock cycles. I've wondered whether it could actually be doing something a little more complex like a binary tree type search. Still have to deal with don't-care type masks though, I suppose.

Jeremy

Reply to
Jeremy Stringer

Before you start the design of a Content-Addressable memory you have to answer a few questions for yourself:

Mow many entries? How fast do you need the answer? How many bits per word, and how many of them need to be searched ? (usually not all of them) What do you intend to do with multiple matches? And are all these decisions unchangable or variable?

After you know what you want, you can explore various techniques, even in an FPGA. Peter Alfke, Xilinx Applications

Reply to
Peter Alfke

I

in

If you want a fairly small array of matches and possibly multiple matches with wildbits, follow the usual CAM papers, but if you want a really large dictionary for unique matches the only practical solution is hash tables or possibly n ary trees for some uses.

For my cpu design I use an inverted page table for the MMU which really amounts to a hash table. IBM have used this forever and its now becoming more common due to huge address spaces >>32bits.

This stuff is pretty well covered in any CS text, from Knuth in the 60s to every body else reinventing the same thing.

In the CS world you can find quite a bit about hashing right next to random no generators, but for HW purposes they are misleading, they will tell you to go for tables that are prime no in size and use nice hash functions that often include modulo % with a prime no and maybe again for rehash. Then they also suggest link list chaining collisions.

If you go back to before the CS folks got too clever, the hash literature was much more straighforward to do in HW.

I won't go into it here, you will have to read up on all the issues and various workarounds.

Before you even think about HW, study the problem in plain C with data sets similar to what you expect with sequences of similar reads/writes. Choose hash functions that are mostly based on xors, shifts, and possibly rnd table lookups.

If you keep the entire memory table about 50% full/empty, the avg no of probes to get at data can be very small close to 1.7 or so but can also be variable.

When you are comfortable with the C model and the FSM that goes with it, the HW version shouldn't be too difficult.

johnjakson at usa dot com

Reply to
JJ

How big a memory do you need? How sparse is it?

For small CAMs, you can use the Xilinx SRL16 capability to essentially make custom LUTs that are dependent on the stored words. These take 16 clocks to write, but provide an address look-up in a single cycle once they are programmed. I believe Xilinx has an app-note on this technique. I think you can extend the idea to use a BRAM as well, although the write time starts getting unreasonably large.

As your allowed time for access increases, so do your options. Let us know what your requirements are, and perhaps one of us can steer you to a solution that works for your case.

--
--Ray Andraka, P.E.
President, the Andraka Consulting Group, Inc.
401/884-7930     Fax 401/884-7950
email ray@andraka.com  
http://www.andraka.com  

 "They that give up essential liberty to obtain a little 
  temporary safety deserve neither liberty nor safety."
                                          -Benjamin Franklin, 1759
Reply to
Ray Andraka

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.