How to evaluate the space efficiency of a historic design.

- Paul Marciano
  
posted
17 years ago

Thu, Jun 29, 2006 7:42 PM

Before I start, let me say I'm not sure this is either an intelligent question or an answerable one... so please be gentle.

I'm looking at implementing an 8-bit processor clone on an FPGA (purely academic exercise - I know there are free IP cores available) and am wondering how to judge the space efficiency of my design (as opposed to speed efficiency).

According to numbers found on the web the MOS 6502 has 9000 transistors.

I haven't written a single line of RTL yet, but say I implemented a 100% functional equivalent in a 200K gate Spartan3, and it uses up 25% of the resources... How would you judge that?

Would you just take your own experience and say, "That's 3x too big... try again"?

Would knowing it can be done in 9000 custom placed transistors help at all in judging the relative efficiency of the FPGA implementation?

Regards, Paul.

- Paul Marciano
  
posted
17 years ago

Thu, Jun 29, 2006 7:50 PM

Just to follow up on my own post, Steve Knapp said, in 1995:

That's a great data point, but the original question still stands:

Regards, Paul.

- mk
  
posted
17 years ago

Thu, Jun 29, 2006 8:01 PM

Let me do a rough calculation here:

9000 transistors, assuming the 6502 had no memory, can be used to build 2250 2-input NAND gates (at 4 transistors per gate). The S3200 (the 200K-gate Spartan-3) has 4320 logic cells (1 flop + 1 lookup table each). Assuming a 4-input lookup table is around 3 NAND2 and a flop is around 5 NAND2, I'd say it has 34560 equivalent NAND2s, so 25% would be 8640 gates, and that would be around 4x too big. Again, this is very rough, and I am sure lots of people would disagree, but I think it's a reasonable starting point.
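The arithmetic above can be replayed as a short Python sketch. The figures are the ones quoted in the post; the 4-transistors-per-NAND2 value is an assumption implied by 9000 / 2250, and all the names are made up for illustration.

```python
# Figures quoted in the post; names are illustrative only.
TRANSISTORS_6502 = 9000
TRANSISTORS_PER_NAND2 = 4      # assumption implied by 9000 / 2250
LOGIC_CELLS = 4320             # 1 flop + 1 LUT4 per logic cell
NAND2_PER_LUT4 = 3             # rough guess from the post
NAND2_PER_FLOP = 5             # rough guess from the post


def nand2_equivalents(logic_cells, per_lut=NAND2_PER_LUT4, per_flop=NAND2_PER_FLOP):
    """NAND2-equivalent capacity of a device with the given logic cell count."""
    return logic_cells * (per_lut + per_flop)


def size_ratio(fraction_used):
    """How many times larger the FPGA design is than the original, in NAND2 terms."""
    original = TRANSISTORS_6502 / TRANSISTORS_PER_NAND2
    used = nand2_equivalents(LOGIC_CELLS) * fraction_used
    return used / original


if __name__ == "__main__":
    print(nand2_equivalents(LOGIC_CELLS))
    print(size_ratio(0.25))
```

Changing `NAND2_PER_LUT4` or `NAND2_PER_FLOP` shows how sensitive the "4x" conclusion is to the assumed conversion factors.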

- Jim Granville
  
posted
17 years ago

Thu, Jun 29, 2006 9:43 PM

Not without taking a very large jump.

Your best space efficiency measure is to compare like with like, so have a quick look at the free IP cores you mention, and note their LUT counts in the same FPGA you will be using.

Good ones would be PicoBlaze (all variants), PacoBlaze, Lattice Mico8, as they are all mature, and optimised for FPGA deployment.

Commercial IP cores also often openly spec their LUT/MHz, so they can also be used as yardsticks.

Now, you CAN compare your 8-bit LUT results in a meaningful way.

For Minimal-Opcode-Cores, these make interesting reading :

The venerable MC14500, a Boolean Industrial Control Unit. [ the core fits in 7 macrocells, in a CPLD ]

The Maxim MAX1464 - remarkably similar opcodes, but 16 bit data space

The IEC 61131 IL language ( assembler-like ) Example here :

formatting link

-jg

- Tommy Thorn
  
posted
17 years ago

Thu, Jun 29, 2006 10:59 PM

Well, when your basic building block is the transistor, you can implement a lot more logic per transistor than when it's just a NAND gate. Add to that the fact that setting 1 LUT = 3 NAND2 is really unfair to the FPGA, as there will be lots of logic that doesn't come near that utilization of the LUT. Thus, I'd say that it's probably only about 2x too big.

The real problem is the premise of comparing LUTs to transistors and you can in fact do much better than that. After mapping, you are told much more detail of how resources were used, say how many were FF, LUT2, LUT3, etc. Make an estimate of how many transistors you would need for each (say a LUT2 is somewhere between a NAND and an XOR).
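A minimal sketch of that per-primitive estimate, in Python. The cost table and the usage counts below are placeholders invented for illustration, not figures from any map report or datasheet; substitute the breakdown your tools report and your own transistor guesses.

```python
# Hypothetical (low, high) transistor costs per primitive; assumptions only.
TRANSISTOR_COST = {
    "FF": (20, 30),
    "LUT2": (4, 12),    # "somewhere between a NAND and an XOR"
    "LUT3": (8, 20),
    "LUT4": (12, 30),
}


def estimate_transistors(usage, cost=TRANSISTOR_COST):
    """Return a (low, high) transistor estimate for a primitive usage breakdown."""
    low = sum(count * cost[kind][0] for kind, count in usage.items())
    high = sum(count * cost[kind][1] for kind, count in usage.items())
    return low, high


if __name__ == "__main__":
    # Made-up usage counts standing in for a real post-mapping report.
    example_usage = {"FF": 200, "LUT2": 100, "LUT3": 150, "LUT4": 300}
    print(estimate_transistors(example_usage))
```

Reporting a range instead of a single number keeps the uncertainty of the per-primitive guesses visible in the result.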

Don't forget to keep the original 6502's interfaces the same if you want an accurate model.

Tommy

- JJ
  
posted
17 years ago

Fri, Jun 30, 2006 12:12 AM

I'd ignore the transistor count; it's too meaningless. Since it was NMOS, one transistor could have been used as a pass gate, a dotted-NOR gate input, or even a dynamic latch. I don't recall the design being static, so dynamic nodes were quite likely.

If you look at an old die picture of a 6502, you will see that the ALU and datapath are a big chunk of the chip, as they were for all the 8-bitters, maybe 1/3 overall. The rest was the FSMs needed to make it all work. I would hazard a guess by taking the number of known register bits in the chip's instruction set architecture documentation and tripling that to include the actual FSMs.

I recall maybe 5x 8-bit and 2x 16-bit registers, but that's off the top of my head, so the rough total is around 200 FFs, which puts it in the same ballpark as a beefed-up Pico.
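The rule of thumb above (count the architectural register bits, then triple them to cover the FSMs) can be written as a tiny Python sketch. The register list is the one recalled in the post, which the poster himself flags as uncertain; the function name is hypothetical.

```python
def estimate_flops(registers, fsm_multiplier=3):
    """registers: iterable of (count, width_in_bits) pairs.

    Returns (register_bits, estimated_flip_flops), where the estimate
    multiplies the register bits to account for control FSM state.
    """
    bits = sum(count * width for count, width in registers)
    return bits, bits * fsm_multiplier


if __name__ == "__main__":
    recalled_registers = [(5, 8), (2, 16)]   # 5x 8-bit, 2x 16-bit, as recalled in the post
    print(estimate_flops(recalled_registers))
```

With those recalled numbers the estimate lands at 72 register bits and 216 flip-flops, consistent with the "around 200 FFs" figure.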

FWIW, if it takes any more than that, I'd look for a more FPGA-friendly design. If it actually has to run code and be completely correct, you also have to include the known bugs exactly.

It's quite possible to build a far more powerful 32-bit PE with around 500 LUTs, with not all FFs used.

John Jakson
transputer guy

- M.Randelzhofer
  
posted
17 years ago

Fri, Jun 30, 2006 12:58 AM

"Paul Marciano" wrote in message news: snipped-for-privacy@b68g2000cwa.googlegroups.com...

It all depends on logic optimization. The best optimization can be done at the transistor level, so an FPGA implementation will use lots of overhead gates. But in my estimation, several 6502s should fit into an XC3S200...

Should the 6502 core be clock-cycle compatible?

If yes, it's hard work: e.g. connect an original 6502 to an FPGA, and develop your core by running test programs concurrently on the original and your core. Detect all anomalies, and correct them. If your core works perfectly, study it, and throw it away. Then start from scratch, optimizing the datapath and FSMs.

If not, you can also consider a PicoBlaze-based 6502 software emulation for minimum 'gate count'. The program space of an 18kbit block RAM should be enough, or you can cascade several PicoBlazes.
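For the cycle-compatible route, the comparison step could look something like the following Python sketch. It assumes both the original chip and the core under development have been captured as per-cycle lists of bus samples; the `(address, data, read_write)` tuple format and the function name are hypothetical choices for illustration.

```python
from itertools import zip_longest


def first_divergence(reference, candidate):
    """Return (cycle, ref_sample, cand_sample) for the first mismatch, or None.

    A trace that ends early counts as a mismatch at the first missing
    cycle, with None standing in for the missing sample.
    """
    for cycle, (ref, cand) in enumerate(zip_longest(reference, candidate)):
        if ref != cand:
            return cycle, ref, cand
    return None


if __name__ == "__main__":
    # Made-up samples: (address, data, read_write)
    original = [(0xFFFC, 0x00, 1), (0xFFFD, 0x80, 1), (0x8000, 0xA9, 1)]
    my_core = [(0xFFFC, 0x00, 1), (0xFFFD, 0x80, 1), (0x8001, 0xA9, 1)]
    print(first_divergence(original, my_core))
```

Stopping at the first mismatch is deliberate: once the two machines diverge, later cycles rarely carry useful information.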

MIKE

--
www.oho-elektronik.de
OHO-Elektronik
Michael Randelzhofer
FPGA and CPLD mini modules
Small but mighty!

- backhus
  
posted
17 years ago

Fri, Jun 30, 2006 6:58 AM

Hi Paul, besides the fact that there is an existing 6502 HDL-Clone, your project is something you can learn a lot from.

As mentioned by others comparing Transistors to LUTs is not really satisfying, because you don't know what mean tricks the original designers used on their silicon. ;-)

Just think about this for a while: when you write the decode logic for the instruction set, your synthesis tool will generate all the logic necessary for decoding exactly this instruction set and nothing more. The original designers built logic that decoded the instruction set and didn't care what happened when unknown opcodes appeared. This resulted in a very small decoding circuit and lots of (officially) undocumented opcodes which do more or less meaningful things. Similar things apply to the other modules.

In those times the use of real RS-FFs was very common. (Made from NANDs with 2 transistors per NAND, you get a powerful flip-flop at the cost of 4 transistors.) Inside an FPGA you don't have these anymore. You may do some tricks with Preset and Reset on the common D-FF, though. But from the point of view of comparing the designs, that's a high price for emulating 4 transistors.

Have you ever asked yourself why there are three clock signals on the 6502 (Phi0 (input), Phi1 and Phi2 (output))? Complex clock schemes like this were used to enable selected datapaths to the RS-FFs. Something similar (and even more complex) can be seen in the schematics of the H316. (look at:

formatting link

and contact the guy who made this homepage if you want to know more.)

For the above reasons, and because of today's different design techniques, it does not really make sense to compare the gate or transistor counts of yesterday with the LUT counts of today.

If anything, compare your design to the existing clones. But keep in mind that space efficiency and performance both have to be considered for a real comparison, as does the degree of compatibility. (Are the undocumented opcodes implemented? They were sometimes used to speed up (mainly) games.)

So the answer to your last question is: No, not really.

Have a nice synthesis,
Eilert

Paul Marciano wrote: