How to select an FPGA size (beginner)

- Paul Marciano

Tue, May 18, 2004 6:44 PM

Hi, I am learning Verilog with a mind to implementing a simple 1980s style video generator, hooked up to a W65C02S microprocessor. Very 80s.

The processor interface will have a dozen or so 8-bit registers, and the memory interface at least a 16-bit memory address register, some state registers and a few counters.

The VGA timing interface will have a couple or three counters.

As you can instantly tell, it hasn't been thought through yet. This is day two of my "do something concrete" plan.

In looking at FPGA specifications on the XILINX site, I see gate counts and CLB counts. The specs suggest that a CLB can hold two registered bits. I figure two things:

I need a device with at least nRegisterBits/2 CLBs.
nRegisterBits/2 is probably grossly optimistic.

So my questions are:

How do you get a coarse feel for the size of FPGA you need for a design such as this?
Do CLBs map 1:1 with registered bits, or 2:1 as the spec suggests?

I was hoping to be able to keep to a sub-$50 FPGA, but it's not looking promising. I'd appreciate any advice or pointers from real engineers on how to go about selecting the right device.

Thanks, Paul.

- Ricardo

Tue, May 18, 2004 7:17 PM

Hi!

Each logic element basically holds a DFF and a 4-input LUT, meaning that you can build any logic that uses a single FF plus a function of at most 4 inputs. They also have a carry chain, so a simple counter uses one logic element per bit. Each Xilinx CLB has two of these. Each ALTERA LE is one of these. Gate count is useless. For your registers, don't forget the embedded RAM blocks.

Both ALTERA Cyclone and XILINX Spartan-IIE should achieve what you want at around $20 for a single FPGA chip (Digikey price for XILINX, Arrow price for ALTERA).

The best way to select a device is usually to take ALTERA QUARTUS or XILINX WEBPACK, make a first version and see where it fits. Double that to leave plenty of room for adjustments, corrections and improvements. If it were a product, choose the smallest device that is not too tight for production.

Hope to have helped... Ricardo

- E.S.

Tue, May 18, 2004 9:27 PM

That would be just a few PALs ;-)

Have a look at

formatting link

Jan Gray put a 16-bit RISC, DMA, memory control and video in a chip which is so small that it isn't even supported by the Xilinx tools anymore ;-)

Just as an idea how to do things ...

If you're new to this FPGA business, forget about it at the beginning.

You'll be fine as long as you are talking about $50 for the FPGA alone.

Just start with your design, and look at the place & route statistics. Then you really get a feeling for what resources you use on what function, and you will probably even notice that you implemented something in a way that is not so efficient for an FPGA.

And as soon as you have a solid design, you can still run place & route on different families and chips; then you really see what the difference is.

Hope it helps at least a little, good luck

- Ray Andraka

Tue, May 18, 2004 9:33 PM

The ratio of LUTs to flip-flops depends heavily upon your design. For current Xilinx FPGAs, there are two flip-flops per slice, and either two or four slices per CLB (two for Virtex, four for Virtex-II). Even the smallest FPGAs (XC2S15) have sufficient resources for a simple video text display generator if you are clever with the design (e.g., be smart about the load values for the counters so that the decodes are easy and do not require a full decode of the counter). Depending on the size of your page and which FPGA you use, you may need some external RAM for the page memory.
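One way to read the "load values" hint is sketched below. It is only an illustration, not Ray's actual design: the module and signal names are made up, and the 800-clock line length is an assumption borrowed from standard VGA timing. The counter gets one extra top bit, is loaded with 800 - 2, and counts down, so the end of the line is detected by looking at a single bit instead of comparing all ten.

```verilog
// Hypothetical sketch: a line counter whose terminal count needs no full decode.
module hcount_easy_decode (
    input  wire clk,
    input  wire reset,
    output wire line_end   // high for one clock in every 800
);
    // 10 counting bits plus one extra bit that acts as a sign bit.
    reg [10:0] cnt;

    // The counter runs 798, 797, ..., 0 and then wraps to all ones.
    // Only then is the top bit set, so the decode is a single wire.
    assign line_end = cnt[10];

    always @(posedge clk)
        if (reset || line_end) cnt <= 11'd798;       // 800 - 2
        else                   cnt <= cnt - 11'd1;
endmodule
```

The price is one extra flip-flop and a count that runs backwards, which matters only if the count value is also used as a pixel address.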

Count the register bits in the design. Look at each register and take a swag at the number of inputs to the logic leading up to that register and map that to 4-input logic cells. Anything with less than 4 inputs is free, as it comes with the register. Stuff with more than 4 inputs that is not arithmetic (add/subtract) adds another LUT for each 4 inputs. With that you can get a fair guess at the number of LUTs and FFs needed. It is easier to let the synthesis tool do it for you if you've gotten far enough on the design.
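As a small worked example of that estimate, here is a made-up module with the rough tally written in the comments. The names and the compare constant are hypothetical, and the numbers are a hand estimate in the spirit of the rule above, not what any particular synthesis tool will report.

```verilog
// Hypothetical module for practicing the estimate; the tallies are rough guesses.
module estimate_demo (
    input  wire       clk,
    input  wire       load,
    input  wire [7:0] din,
    input  wire [5:0] sel,
    output reg  [7:0] data_reg,
    output reg        match
);
    // 8 register bits. Each bit's next value depends on load, din[i] and
    // the bit itself: 3 inputs, which fits the LUT that comes with the FF.
    // Rough tally: 8 FFs, 8 LUTs or fewer (a tool may map load onto the
    // flip-flop clock enable and use no LUTs at all).
    always @(posedge clk)
        if (load) data_reg <= din;

    // 1 register bit fed by a 6-input compare. Six inputs do not fit one
    // 4-input LUT, so count one extra LUT. Rough tally: 1 FF, 2 LUTs.
    always @(posedge clk)
        match <= (sel == 6'b101101);
endmodule
```

By this count the module needs about 9 flip-flops and at most 10 LUTs, so it would fit in a handful of slices.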

--

--Ray Andraka, P.E. President, the Andraka Consulting Group, Inc.

401/884-7930 Fax 401/884-7950 email snipped-for-privacy@andraka.com

formatting link

"They that give up essential liberty to obtain a little temporary safety deserve neither liberty nor safety." -Benjamin Franklin, 1759

- Chuck McManis

Wed, May 19, 2004 4:19 AM

Lots of responses already so I'll just add a few bits that have been missed ...

Actually, I've yet to run out of CLBs in anything I've done (not a lot of complicated stuff). However, you can easily run out of "clock domains" (global clocks), routing resources (no way to connect two parts of your design), and I/O pins.

So perhaps a better way to approach this is to figure out how many I/Os you need. Get the free WebPack tools and select a device that has that many I/Os. Start developing your Verilog and synthesize it after each major subsystem is done. If you run out of CLBs, go to the next bigger device; if there isn't a bigger one in the same package, try the next larger package. Then the next "family." The cost -> complexity path is:

9500 CPLD -> CoolRunner CPLD -> Spartan 2 -> Spartan 2E -> Spartan 3 -> Spartan 3E

Since you can get Spartan 3's with a bazillion CLBs I know you won't get that far up the food chain.

The latest Digikey catalog sells Spartan 2's with 50K "gates" for $22 qty 1. You can do a complete CPU + video etc. in one of those. Actually, you should probably google for "Commodore One", where Jeri Ellsworth has implemented FPGA replacements for the Commodore 64 special function chips.

On a related note, does anyone have a decent S-video circuit that one could use an FPGA to implement? Lots of FPGA kits have VGA connectors (simple RGB plus sync output), but I'd like something I could use to make video on my television ...

--Chuck

- glen herrmannsfeldt

Wed, May 19, 2004 4:33 AM

Chuck McManis wrote: (snip)

The old trick for generating composite video should still work for S-video, except that the chrominance and luminance are not combined. Look at how the IBM CGA did it, for example. The dot clock is 4x the color subcarrier (3579545 Hz), such that the result comes out a specific color depending on the phase of the result. D/A converters on each, so that you can generate more than just on/off for each color.

It might be that you can do a little better with a higher multiple of 3579545, but you will find the 4x crystals easier to find. (They still exist in any PC with an ISA bus.)
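A minimal sketch of the phase idea, under stated assumptions: the clock is 4 x 3579545 Hz = 14318180 Hz, and the `color_phase` and `color_on` inputs are hypothetical names invented here. This is not the CGA circuit itself, only the divide-by-four principle: a 2-bit counter marks four 90-degree phases of the subcarrier, and shifting the square wave by a chosen phase changes the hue.

```verilog
// Hypothetical sketch: phase-selectable subcarrier square wave from a 4x clock.
module chroma_phase (
    input  wire       clk_4x,       // assumed 14.31818 MHz
    input  wire [1:0] color_phase,  // hypothetical: picks one of four 90-degree phases
    input  wire       color_on,     // hypothetical: 0 means no chroma
    output wire       chroma        // square wave at clk_4x / 4
);
    // Free-running phase counter: one full turn per subcarrier cycle.
    reg [1:0] phase_cnt = 2'd0;
    always @(posedge clk_4x)
        phase_cnt <= phase_cnt + 2'd1;

    // Subtracting the selected phase rotates the square wave in 90-degree steps.
    wire [1:0] rel = phase_cnt - color_phase;
    assign chroma = color_on & rel[1];
endmodule
```

For S-video the `chroma` output would drive the C pin through its own D/A or resistor network, separate from luminance; a real design also needs the color burst, which this sketch leaves out.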

-- glen

- Ray Andraka

Wed, May 19, 2004 11:45 AM

- Paul Marciano

Wed, May 19, 2004 10:00 PM

That's very interesting. His VGA timing block uses half as many registers as my first go at it. I think I'm over-using registers (where wires would work fine).

Jan uses an LFSR counter for his horizontal and vertical counters. I read a thread on this newsgroup about such counters from, I think, around 2001, talking about wide 100MHz counters. The general view was that LFSR counters use fewer CLBs than binary counters, and run faster due to the lack of a carry chain, but come with caveats. A post near the end of the thread said, effectively, "use straight binary counters and let the synthesis tool figure it out - modern FPGAs are fast and the tools are good".
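For readers who have not met one, a 10-bit LFSR counter can be sketched as below. This is not Jan's code; the module name is made up, and the tap choice (bits 10 and 7, XNOR feedback) is the commonly tabulated maximal-length pair for 10 bits, giving 1023 states.

```verilog
// Hypothetical sketch of a 10-bit maximal-length LFSR counter.
module lfsr10 (
    input  wire       clk,
    input  wire       reset,
    output reg  [9:0] q
);
    // XNOR of stages 10 and 7 (1-based), i.e. q[9] and q[6].
    wire fb = ~(q[9] ^ q[6]);

    always @(posedge clk)
        if (reset) q <= 10'b0;          // all zeros is a legal state with XNOR feedback
        else       q <= {q[8:0], fb};   // shift left, feedback enters at bit 0
endmodule
```

The caveats show up quickly: the states come in a scrambled order, so "count 799" becomes a bit pattern you have to work out by simulation, and the all-ones state locks up and must never be loaded.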

The VGA dot clock is around 25MHz. I need a 10-bit horizontal counter and a 9-bit vertical counter.

reg [9:0] xcnt; // counts from 0 to 799

always @(posedge clk)
  if (reset || xcnt == 799) xcnt <= 0;
  else xcnt <= xcnt + 1;
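For reference, a fuller sketch of plain binary timing counters, assuming standard 640x480 at 60 Hz timing (800 clocks per line, 525 lines per frame, negative sync polarity). Module and signal names are made up. Note that 525 lines do not fit in 9 bits, so the vertical counter here is 10 bits wide.

```verilog
// Hypothetical sketch: binary H/V counters for 640x480 VGA timing.
module vga_counters (
    input  wire       clk,      // dot clock, about 25 MHz
    input  wire       reset,
    output reg  [9:0] xcnt,     // 0..799
    output reg  [9:0] ycnt,     // 0..524
    output wire       hsync_n,
    output wire       vsync_n,
    output wire       visible
);
    localparam H_VISIBLE = 640, H_FRONT = 16, H_SYNC = 96, H_TOTAL = 800;
    localparam V_VISIBLE = 480, V_FRONT = 10, V_SYNC = 2,  V_TOTAL = 525;

    wire line_end  = (xcnt == H_TOTAL - 1);
    wire frame_end = (ycnt == V_TOTAL - 1);

    // Horizontal counter: wraps at the end of every line.
    always @(posedge clk)
        if (reset || line_end) xcnt <= 10'd0;
        else                   xcnt <= xcnt + 10'd1;

    // Vertical counter: advances once per line, wraps at the end of the frame.
    always @(posedge clk)
        if (reset)         ycnt <= 10'd0;
        else if (line_end) ycnt <= frame_end ? 10'd0 : ycnt + 10'd1;

    // Sync pulses are active low and sit after the front porch.
    assign hsync_n = ~(xcnt >= H_VISIBLE + H_FRONT &&
                       xcnt <  H_VISIBLE + H_FRONT + H_SYNC);
    assign vsync_n = ~(ycnt >= V_VISIBLE + V_FRONT &&
                       ycnt <  V_VISIBLE + V_FRONT + V_SYNC);
    assign visible = (xcnt < H_VISIBLE) && (ycnt < V_VISIBLE);
endmodule
```

That is 20 flip-flops plus a few compares, which gives a feel for how little of even a small FPGA the timing logic takes.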

- Ray Andraka

Thu, May 20, 2004 1:06 AM

Using LFSRs for video counters comes from the days when 25-30 MHz performance took a lot of work, especially with earlier families like the 3000 and 3100 that did not have carry chains (in which case a binary counter was not only slow, but also took up a lot of resources: more than one level of logic). 10-bit counters in modern FPGAs can run at several hundred MHz, and the carry logic is free. For 25 MHz video, it is not worth the extra headache of working with an LFSR. It gains you nothing in this case. If you look at my Dynamic Video hardware paper from ca. 1996, it also used LFSRs in the video timing logic. That was done in National CLAY FPGAs, which are structurally similar to the Atmel 6000 series parts: no carry chain, very limited interconnect, and simple logic cells that did not do random logic well.


- Chuck McManis

Thu, May 20, 2004 2:42 AM

Some assumptions in there, Ray:

"...If you are doing a synchronous design ..." but these days people are doing SOC designs that might have a video clock, a CPU clock, a clock driving an ethernet PHY, and perhaps some refresh logic for their DRAM controller. I'm not disagreeing with you; my point was that "gates" are generally not the thing you run out of first.

Next up, "...The virtex parts...": the original poster wanted to stay away from the "expensive" chips, so starting from CPLDs and moving up, you're somewhat constrained there. There are lots of clock resources in a Virtex II Pro, but then again, in single quantities the chip is $300. :-)

--Chuck

- Ray Andraka

Thu, May 20, 2004 4:00 AM

Even the smallest FPGAs have four or more clock nets, and that is going back to the 4000's. Clock enables can do a lot for you; in most cases there is not really a need for a proliferation of clocks. Multiple clock domains give rise to potential timing constraint issues, as well as problems crossing clock domain boundaries. Not that any of that is insurmountable, just that it requires extra diligence to make sure you do it right, and that the tools don't mess up your good work.
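A minimal sketch of the clock-enable approach, with made-up names and an arbitrary divide-by-four ratio: everything runs from the one fast clock, and the slower logic simply ignores three clock edges out of four.

```verilog
// Hypothetical sketch: slow logic driven by a clock enable, not a second clock.
module ce_demo (
    input  wire       clk,       // the single fast clock
    input  wire       reset,
    output reg  [7:0] slow_cnt   // advances once every 4 clk cycles
);
    reg [1:0] div;
    wire ce = (div == 2'd3);     // one-clock-wide enable pulse

    // Divider that generates the enable.
    always @(posedge clk)
        if (reset) div <= 2'd0;
        else       div <= div + 2'd1;

    // "Slow domain" logic: same clock net, gated by the enable.
    always @(posedge clk)
        if (reset)   slow_cnt <= 8'd0;
        else if (ce) slow_cnt <= slow_cnt + 8'd1;
endmodule
```

Because every flip-flop shares one clock net, there is no domain crossing to constrain or synchronize.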

When I said virtex parts, I was referring to the Virtex architecture, which includes all the current families. Spartan-II is the Virtex architecture, Spartan-IIE is Virtex-E, and Spartan-3 is a mutation of Virtex-II. The point is that all of these families have ample routing.

Now, whether the tools make efficient use of that routing is another question altogether. They don't: the tools now do a 'lazy' routing that only improves the routing until it is good enough. The problem is that in a dense design, the circuitous routing artificially congests the routing resources, which can make it appear that you do not have the routing to make timing. Poor placement can also aggravate the routing. The placer is *still* very bad at placing logic when a signal goes through multiple LUTs between flip-flops, often placing the LUTs without flip-flops far from the destinations, and well out of the way between the source and destination. The result is again unnecessary congestion of the routing resources, and pathetic timing results. Floorplanning will relieve enough of the problems caused by the placer that it is very hard to run out of routing.

BTW, the routing is generally more stressed with larger devices rather than the smaller ones. Required routing goes up roughly with the square of the number of LUTs, yet the routing network is virtually unchanged across the family. Additionally, placement is more critical with larger devices because haphazard placement will incur large routing delay penalties, and that makes the job of the router that much harder (possibly leading to a no-route situation due to timing). With small FPGAs, it is the memories followed by logic that get used up first. I would argue that the routing in the cheap FPGAs is even more abundant.

CPLDs are a different animal altogether. There, the routing between macrocells is generally sparse, and without careful planning it can be easy to use up the routing there.

I stand by my earlier comments.


- Symon

Thu, May 20, 2004 4:25 PM

Just to reinforce what Ray says in his response, if you have designs that have a lot of input clocks, I have found that some effort up front to retime these clocks to one higher frequency masterclock (which is clock enabled for each source clock domain) can often save you from a world of pain later on. Especially if you're transferring data between domains. The DCMs are great for making this masterclock. Cheers, Syms.
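One common way to do this kind of retiming is sketched below. It is an assumption about what is meant, not Symon's actual circuit, and the names are made up: the slow incoming clock is sampled as ordinary data by the master clock, and each rising edge becomes a one-cycle clock enable. It assumes the master clock is several times faster than the incoming clock.

```verilog
// Hypothetical sketch: turn a slow input clock into an enable in the master domain.
module retime_clock (
    input  wire clk,        // master clock, assumed several times faster than ext_clk
    input  wire ext_clk,    // slow external clock, treated as data
    output wire ext_rise    // one-clk-wide pulse per rising edge of ext_clk
);
    // sync[0] and sync[1] form a two-stage synchronizer; sync[2] is the
    // previous synchronized value, kept for edge detection.
    reg [2:0] sync = 3'b000;
    always @(posedge clk)
        sync <= {sync[1:0], ext_clk};

    // Rising edge: synchronized value is now 1 and was 0 one clock ago.
    assign ext_rise = sync[1] & ~sync[2];
endmodule
```

Data that arrives with `ext_clk` still has to be stable when `ext_rise` fires, so the ratio between the two clocks sets how much margin you have.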

- thangkho

Mon, May 24, 2004 5:35 PM

A wild guess: it would fit well in a Spartan 2 100 :)