Beginning FPGA programming

I'd like to try out some ideas and would appreciate some guidance. Would a 200k-gate FPGA be enough for a simple or complex 8-bit CPU design? I have this Digilent product:

formatting link

but being totally new to hardware design I have some questions:

  1. What language would be suitable - VHDL or Verilog? Or are there others?

  2. What description style would be appropriate? Or can I break the design into modules, initially write each module as a high-level description, and rewrite it at a lower level later as needed?

Reply to
James Harris

Should have said that part of the design is a large crossbar switch. It may be relevant to the number of gates and/or the design style.

-- TIA, James

Reply to
James Harris

How large is "large"? It should be fairly simple to calculate the size of a crossbar switch.

Assuming the switch is implemented using muxes:

  A 2-to-1 mux uses 1 LUT
  A 4-to-1 mux uses 2 LUTs
  An 8-to-1 mux uses 4 LUTs
  A 16-to-1 mux uses 8 LUTs
  (and so on)

Multiply this by the number of ports and the width of each port to get a rough total LUT cost. (This ignores the cost of the arbiter or other configuration logic for the crossbar.)

An XC3S200 has 3840 LUTs if I calculate correctly. So an 8x8 crossbar with 8-bit-wide ports should fit fairly comfortably, whereas, for example, a 16x16 crossbar of width 16 will consume about half the FPGA.
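The estimate above can be sketched in a few lines of Python. This is only a back-of-the-envelope model: the function names are made up for illustration, the LUT total is the figure quoted above, and rounding a non-power-of-two mux up to the next table entry is an assumption, not something the table states.

```python
XC3S200_LUTS = 3840  # figure quoted in the post above


def mux_luts(n_inputs):
    """LUTs for one 1-bit n-to-1 mux, following the table above.

    Assumption: sizes that are not a power of two are rounded up
    to the next entry in the table (e.g. 7-to-1 costs as 8-to-1).
    """
    size = 2
    while size < n_inputs:
        size *= 2
    return size // 2


def crossbar_luts(ports, width):
    """One mux per output bit: ports * width muxes, each ports-to-1."""
    return ports * width * mux_luts(ports)


if __name__ == "__main__":
    for ports, width in [(8, 8), (16, 16)]:
        luts = crossbar_luts(ports, width)
        share = luts / XC3S200_LUTS
        print(f"{ports}x{ports}, {width} bits wide: {luts} LUTs ({share:.0%} of device)")
```

As in the post, this leaves out the arbiter and any configuration logic.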

How often do you need to reconfigure the inputs/outputs of the crossbar? If it is not very often, perhaps you could serially load configuration data into SRL16 elements to reduce the number of required LUTs.

What kind of bitrate do you need through the crossbar? Perhaps you could use a time-multiplexed bus instead?

/Andreas

Reply to
Andreas Ehliar

On 2 Sep, 23:44, Andreas Ehliar wrote: ...

Shooting from the hip somewhat I think I could start with about seven ports (to test the concept) each being 8-bit. I need to pass a strobe with each input to the switch and possibly an acknowledge fed /back/ from each output. So there would be 10 bits (8 data + 1 strobe + 1 ack) per port leading, I think, to a 70x70 crossbar.

If it were implemented on bare silicon I think it could be made from one transistor or possibly a pair of transistors per junction. The main problem in that case would be routing of inputs, outputs and controls. For an FPGA I had no idea how it could be implemented. Thanks for the tip on using muxes.

Do you mean each output would be fed from an 8-to-1 mux (to select the appropriate input line for each output)? So I would need 70 such muxes?

The crossbar's mapping of inputs to outputs would potentially change every system 'cycle', though instead of using clock cycles I intend to use handshaking to sync transfers. This is part of the concept: 1) faster, since compute elements do not have to wait for the next clock cycle in order to complete their work, and 2) lower power consumption, because the system is not clocked. Someone will probably tell me this has already been done, though, or is not effective. Maybe I'm reinventing a wheel and one not as good as existing ones....

The rate isn't important at this stage. I want to test a concept rather than make a production system. Yes, one option is to have tri-state output latches on the elements and, rather than have them talk through a switch, have them talk over a couple of buses instead. I may need output latches anyway.

I understand I can write this in VHDL or Verilog. Any suggestions on whether a newbie like me should use a particular style of description to implement the above? I guess I should avoid gate level to avoid too much complexity and also avoid high-level concepts so it can be readily synthesized. Is that about right?

-- James

Reply to
James Harris

Hi,

Internal tristates are gone from Xilinx devices.

There is a way to implement efficient large muxes by using DFFs and the carry chain. The solution uses many DFFs, but usually a design uses fewer DFFs than LUTs. You let each source to the mux pass through a DFF with a synchronous reset. All DFFs are kept in the reset state except those of the source that you have selected. This allows you to just OR all the sources together, since only the selected source is not under reset. The ORing can be done using the carry chain to decrease the LUT usage even further. It is in fact an AND-OR structure, but the AND comes from the synchronous reset in a DFF.

Example.

Say you have 16-bit buses and you need a 16-1 mux. Using normal muxes would require 16*8 = 128 LUTs. With this solution you would need 4 LUTs for ORing 16 sources, so the 16-1 mux would consume 16*4 = 64 LUTs and 16*16 = 256 DFFs.

So the DFF usage is high but you have 50% less LUT usage.
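To make the AND-OR idea concrete, here is a small behavioural sketch in Python (not HDL, and not a synthesis recipe). The function names are hypothetical. The cost formula simply generalises the figures in the example above, assuming one 4-input LUT absorbs four sources per bit and the carry chain combines the LUT outputs for free.

```python
import math
from functools import reduce


def and_or_mux(sources, selected):
    """Model one clock edge of the reset-based mux.

    Every source is registered; all registers except the selected
    one are held in synchronous reset (value 0), so a plain OR of
    the register outputs yields the selected source.
    """
    registered = [value if i == selected else 0
                  for i, value in enumerate(sources)]
    return reduce(lambda a, b: a | b, registered, 0)


def resource_comparison(width=16, n_sources=16):
    """Return (conventional LUTs, AND-OR LUTs, AND-OR DFFs).

    Assumptions: a conventional n-to-1 mux costs n/2 LUTs per bit,
    and the OR stage costs ceil(n/4) LUTs per bit.
    """
    conventional_luts = width * (n_sources // 2)
    and_or_luts = width * math.ceil(n_sources / 4)
    and_or_dffs = width * n_sources
    return conventional_luts, and_or_luts, and_or_dffs


if __name__ == "__main__":
    buses = [0x1111 * (i + 1) & 0xFFFF for i in range(16)]
    print(hex(and_or_mux(buses, 2)))
    print(resource_comparison())
```

With the defaults, `resource_comparison()` reproduces the 128 / 64 / 256 figures from the example.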

Göran

Reply to
Göran Bilski

Isn't that a 7x7 crossbar with a 10-bit data path and 3 bits of addressing to specify a port? Total size would be 10*2*7 = 140 LUTs for the crossbar port input mux tree.

That is a much different problem from a 70x70 single-bit data path with 7 bits of addressing, which requires 70*64 = 4,480 LUTs.
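For comparison, the two counts can be put side by side against the 3840-LUT figure quoted earlier in the thread for the XC3S200. This is just the arithmetic from this post wrapped in Python; the per-mux costs (2 LUTs per data bit for the 7-way case, 64 LUTs per output for the 70-way case) are taken from the post as given, and the function names are made up.

```python
XC3S200_LUTS = 3840  # figure quoted earlier in the thread


def word_crossbar_luts(ports=7, width=10, luts_per_bit=2):
    """7x7 crossbar switching whole 10-bit words."""
    return width * luts_per_bit * ports


def bit_crossbar_luts(lines=70, luts_per_output=64):
    """70x70 crossbar switching individual bits."""
    return lines * luts_per_output


if __name__ == "__main__":
    for name, luts in [("7x7 x 10 bits", word_crossbar_luts()),
                       ("70x70 x 1 bit", bit_crossbar_luts())]:
        verdict = "fits" if luts <= XC3S200_LUTS else "does not fit"
        print(f"{name}: {luts} LUTs, {verdict} in {XC3S200_LUTS} LUTs")
```

On those numbers the word-wide switch is a few percent of the device, while the bit-level switch would exceed it on muxes alone.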

Reply to
fpga_toys

I should have asked: what's the data rate through the crossbar per port?

There are sometimes cheaper alternatives to a crossbar, such as message switching.

Reply to
fpga_toys

I think that should be enough for a good sized CPU. Maybe for some of the support logic, also.

(snip)

That is about equivalent to asking if you should use Ada or C for software development. Either will work in most cases, one likely suits you better.

As with software, the appropriate level of modularization is up to you. Both Verilog and VHDL allow modules. With many tools you can mix VHDL and Verilog modules in the same design (along with schematic capture and a few other languages).

-- glen

Reply to
glen herrmannsfeldt

On 4 Sep, 19:38, fpga snipped-for-privacy@yahoo.com wrote: ...

Quite possibly.

Each output bit would only come from one of 7 inputs, yes, not one from 70. An output bit zero could only come from one of the input bits zero.
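A behavioural sketch of that word-wise routing, in Python rather than HDL, may help pin down what is meant. It assumes one select value per output port, data and strobe travelling forward together, and the acknowledge travelling backward to whichever input is selected. ORing the acknowledges when two outputs pick the same input is purely an assumption for illustration; the thread does not say how that case should behave.

```python
def route(inputs, acks_from_outputs, selects):
    """Model a 7x7 word crossbar for one transfer.

    inputs[i]            -> (data, strobe) presented at input port i
    acks_from_outputs[o] -> ack bit driven by the receiver on output o
    selects[o]           -> index of the input port feeding output o

    Returns (forwarded, acks_to_inputs).
    """
    # Forward path: each output gets the whole word of its selected
    # input, so output bit k always comes from some input's bit k.
    forwarded = [inputs[sel] for sel in selects]

    # Return path: each ack goes back to the input that was selected.
    acks_to_inputs = [0] * len(inputs)
    for out_port, sel in enumerate(selects):
        acks_to_inputs[sel] |= acks_from_outputs[out_port]
    return forwarded, acks_to_inputs


if __name__ == "__main__":
    words = [(0x10 + i, 1) for i in range(7)]   # (data, strobe)
    selects = [6, 5, 4, 3, 2, 1, 0]             # reverse the ports
    acks = [1, 0, 0, 0, 0, 0, 0]                # only output 0 acks
    print(route(words, acks, selects))
```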

Reply to
James Harris

I think most alternatives are not non-blocking, which is the main characteristic I want from the switching system. That said, the switching part of the idea is independent of the other parts, so the design of the switch should affect performance but not correctness.

Reply to
James Harris

With 7 ports total, ignoring loopback, there are 6 other sources. You can pack a 3:1 mux into the LUT and carry logic, with some tricks ... see Carl Brannen's posts in this forum from around Dec 2001. I gave Carl's post to Guerric, who took the paper and did 3:1 muxes for our d.net RC5 FPGA core to cut the barrel shifter depth and area - not pretty, but dense and fast with constrained routing.

The CPU side of the project sounds more fun ... :)

Reply to
fpga_toys

The reason I asked about data rate is that another dense architecture is a 7x TDM: use one 10-bit-wide mux tree as a front end to sample all inputs, and feed it into 7 SRLs with the tap set to the desired channel, with each SRL's FF clocked at 1/7th of the data clock. It is a very small crossbar design, but the external frequency cannot be faster than 1/7th of the core TDM rate if synchronous, or it would need overclocking if asynchronous. It fits into about 15-20 slices with some handiwork: the input mux is 10 slices, another 3-1/2 for 7 SRL lanes, and a few more for timing logic. The SRL cycle timing will limit your clock rate.
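Here is one way to picture the TDM scheme, as a behavioural Python sketch rather than anything synthesizable. It assumes the front-end mux visits inputs 0 to 6 in order once per frame, that every output lane shifts in the same sample stream, and that each lane's output register captures its tap once per frame. The names and the tap arithmetic are illustrative assumptions, not taken from a real design.

```python
N_PORTS = 7


def tdm_crossbar_frame(inputs, selects):
    """Route one frame through a time-multiplexed crossbar model.

    inputs[i]  -> word present at input port i during this frame
    selects[o] -> input port wanted by output o
    """
    # One shift register per output lane; index 0 is the newest
    # sample, like tap 0 of an SRL.
    lanes = [[None] * N_PORTS for _ in selects]

    # Core clock: the shared front-end mux samples one input per slot
    # and every lane shifts that sample in.
    for slot in range(N_PORTS):
        sample = inputs[slot]
        for lane in lanes:
            lane.insert(0, sample)
            lane.pop()

    # End of frame (1/7th of the core clock): tap k now holds input
    # N_PORTS - 1 - k, so the tap for a wanted input is fixed per lane.
    return [lanes[out][N_PORTS - 1 - sel]
            for out, sel in enumerate(selects)]


if __name__ == "__main__":
    frame = [10, 11, 12, 13, 14, 15, 16]
    print(tdm_crossbar_frame(frame, [6, 5, 4, 3, 2, 1, 0]))
```

The model also shows where the rate limit comes from: each output only updates once every seven core-clock slots.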
Reply to
fpga_toys

On 4 Sep, 23:16, fpga snipped-for-privacy@yahoo.com wrote: ...

It's great that you like the idea. In fact the crossbar is intended to be the main way values get from one compute element to another so it is really part of the CPU. I have no idea how I am going to control operations yet - i.e. select what operations happen and in what order, and where the results are to be stored (without the cost of passing the control info through the switch). Sure, it should be fun to work it through.

Reply to
James Harris

Hmm ... I forgot to add up the FFs required ... 70+ FFs ... so a little over 40 slices :(

Reply to
fpga_toys

Doesn't that make it synchronous, when tristate buffers are normally asynchronous? If the buffer drives, or all are driven, by latches then it would seem to work.

-- glen

Reply to
glen herrmannsfeldt
