Ideal CPU for FPGA?

In an unrelated thread on FPGA CPU clock speeds, an FPGA developer stated that FPGAs are ill-suited to replicating legacy CPU designs.

I never got around to asking why, but a more relevant question is:

What type of CPU architecture are FPGAs more suited to implement?

Reply to
dave

Actually both questions are relevant, and are best answered together.

When designing a CPU, a key ingredient is for the architect to be totally familiar with the strengths and weaknesses of the underlying implementation medium. This knowledge then affects the decisions and tradeoffs that follow. ("Medium" here is the design abstraction that the CPU designer will be using to implement the CPU. If you work for Intel in the x86 organization, it is CMOS. If you are using an FPGA for your canvas, then although they are CMOS chips, your abstraction is gates/LUTs/flip-flops with CE, carry chains, etc.)

Here is my first example: if the goal is a 100 ns instruction cycle time, and the implementation medium supports wide PROMs at 5 ns, then a microcoded implementation may be appropriate. If the PROM takes 60 ns, then hardwired decode is the more likely choice.
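To make the tradeoff concrete, here is a minimal, hypothetical Verilog sketch of the two decode styles (all module names, widths, and encodings are invented for illustration): a microcoded control store that produces the datapath control bits in one ROM lookup, versus a hardwired combinational decoder.

// Hypothetical sketch: two ways to produce control signals.
// All names, widths, and encodings are invented.

// Style 1: microcoded -- control bits come from a ROM lookup.
// Appropriate when the ROM access fits easily in the cycle budget.
module ucode_decode (
    input  wire        clk,
    input  wire [7:0]  opcode,
    output reg  [15:0] ctrl        // raw control bits for the datapath
);
    reg [15:0] ucode_rom [0:255];  // control store contents (hypothetical file)
    initial $readmemh("ucode.hex", ucode_rom);
    always @(posedge clk)
        ctrl <= ucode_rom[opcode]; // one synchronous lookup per cycle
endmodule

// Style 2: hardwired -- control bits from combinational logic.
// Preferable when a ROM lookup would eat too much of the cycle.
module hardwired_decode (
    input  wire [7:0]  opcode,
    output reg  [15:0] ctrl
);
    always @* begin
        case (opcode[7:4])
            4'h0:    ctrl = 16'h0011;  // e.g. ALU add, write register
            4'h1:    ctrl = 16'h0022;  // e.g. load
            4'h2:    ctrl = 16'h0044;  // e.g. store
            default: ctrl = 16'h0000;  // no-op
        endcase
    end
endmodule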

All things are not equal. The area, speed, power, drive strength, input load, and other factors do not have the same ratios across all implementation mediums.

Second example: CMOS, NMOS, PMOS, bipolar, GaAs, and SiGe all have different "weights" for the above list of characteristics for muxes, flip-flops, memory cells, decoder trees, bus drivers, registers, etc.

What if you have ultra-fast logic but crappy, slow I/O? Or maybe your required cycle time is two orders of magnitude slower than the cycle time the hardware can support? Then maybe a serial architecture is called for.
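As an illustration, here is a minimal, hypothetical bit-serial adder in Verilog (names invented): it handles one bit per clock, so an N-bit add takes N cycles but uses almost no logic, which is exactly the trade you want when the logic is far faster than the required cycle time.

// Hypothetical bit-serial adder: one full-adder bit per clock.
// An N-bit add takes N cycles but uses a tiny amount of logic.
module serial_adder (
    input  wire clk,
    input  wire rst,    // clears the carry at the start of a word
    input  wire a_bit,  // operand A, LSB first
    input  wire b_bit,  // operand B, LSB first
    output reg  sum_bit // result, LSB first, one cycle later
);
    reg carry;
    always @(posedge clk) begin
        if (rst) begin
            carry   <= 1'b0;
            sum_bit <= 1'b0;
        end else begin
            sum_bit <= a_bit ^ b_bit ^ carry;        // full-adder sum
            carry   <= (a_bit & b_bit) |
                       (carry & (a_bit ^ b_bit));    // carry out
        end
    end
endmodule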

So when you look at a legacy CPU, you are looking at a design that has been crafted to take advantage of the medium it was designed to be implemented in.

Other things affect CPU architecture too. It is well established that to be continually successful in the CPU business, you need to leverage the existing software base and development tools as best you can. So follow-on products tend to be backward compatible, and the development tools will often seamlessly take the source for a program (maybe in C, Fortran, PL/M, ...) and compile it for the newer product. Or maybe not so seamlessly. If it is a major change, the products do not fare well, because once you force the customer to abandon the existing environment, they may as well look at competitors in their product selection process. The Intel i860 and i960 are good examples of a company that lost its way for a while.

What this means is that legacy CPUs often carry an amazing amount of baggage that is there for backward compatibility. Some of it makes no sense for current applications, but it must be there in case some legacy program comes along. The latest Pentium will still boot DOS 6.1, in 8088 mode, with its silly 1 MB address space and those inane segment registers. All of this must be replicated for an FPGA equivalent to be called equivalent.

What all this means is that because legacy CPUs were designed for some particular silicon implementation medium, the tradeoffs of how the instruction set is implemented are strongly influenced by it. The other major influence is the experience of the CPU architect and the CPUs they have worked on in the past. Oh, and marketing also gets into the act, and says you have to have decimal adjust instructions, and a half-carry bit, and handle 1's complement as well as 2's complement arithmetic. Oh, and the software people say they absolutely need a semaphore instruction and hardware task-switching support. And more ...

Now take the legacy CPU and try to map it onto an FPGA. The medium is different, and all the careful balancing that was done for the legacy CPU is totally out of whack for the FPGA implementation. It is not that it can't be done; it is that it can't be done efficiently. And you are not allowed to drop the bits that are difficult. If you do, then you are no longer compatible, and the hoped-for leverage of existing tools or existing code goes out the window.

As to what maps well to FPGAs, the answer is an architecture that is designed with an FPGA as the target. This tends to look as follows: RISC-style register sets rather than accumulator(s), with multiple reads and writes per cycle; pipelined operation (4 or 5 stages); direct coding of the control bits for the datapath in the instruction encoding (minimal decode trees); and parallel execution of mutually exclusive functions with the unneeded results thrown away (add, barrel shift, priority encode/CLZ, branch calculation, TLB search, etc. all occur every cycle, based on the same operands fetched, but only the needed result is used).
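To make that last point concrete, here is a minimal, hypothetical sketch in Verilog (all names and widths invented): every unit computes on the same operands every cycle, and the select bits taken directly from the instruction pick the one result that matters.

// Hypothetical execute stage: every unit runs every cycle on the
// same operands; the instruction's select field picks one result.
module exec_stage (
    input  wire [15:0] a, b,     // operands read from the register file
    input  wire [3:0]  shamt,    // shift amount from the instruction
    input  wire [1:0]  sel,      // control bits taken directly from the opcode
    output reg  [15:0] result
);
    // Mutually exclusive functions, all computed in parallel.
    wire [15:0] add_r = a + b;
    wire [15:0] shl_r = a << shamt;
    wire [15:0] and_r = a & b;
    wire [15:0] xor_r = a ^ b;

    // The unneeded results are simply discarded by the mux.
    always @* begin
        case (sel)
            2'b00: result = add_r;
            2'b01: result = shl_r;
            2'b10: result = and_r;
            2'b11: result = xor_r;
        endcase
    end
endmodule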

The first CPU in an FPGA (RISC4005/R16), XSOC/XR16, NIOS, and MicroBlaze all look like this. They all easily outperform legacy CPUs implemented in the same FPGAs, because they are tuned to the resources the FPGAs have to offer.

Philip Freidin

Philip Freidin Fliptronics

Reply to
Philip Freidin

Pretty good summary.

I think several of us have said this on occasion or more often, but there are times when it is justified. I believe one avionics system does ship with a Z8000-compatible FPGA design at the original speed; the function had to be identical and cycle-accurate, which only meant

Reply to
JJ

Thanks.

Reply to
dave

Thanks JJ.

Are there any open implementations that demonstrate the multithreaded approach you've mentioned?

P.S.

How is that Transputer going?

Reply to
dave

BTW: I didn't mean to imply the first paragraphs were not relevant!!!

Reply to
dave

Read up on Sun's Niagara and also Raza. He was the architect of the Athlon and later, as a VC, helped get Niagara off the ground, then did the same again for the Raza MT MIPS arch. These and my MT share quite a few ideas, but I go off in a different direction, especially with the inverted MMU and Transputer stuff.

I don't know of any open-source MT designs; perhaps mine will be, or won't. I am sure that if the open-source fans really want to do it, they could figure it out, but they should target FPGA and not ASIC; ASIC perf comes for free after that, and it's not like many open cores ever go to ASIC anyway. I suspect though that since most comp arch students use the H & P textbooks, which are entirely single-threaded, that's all we will see from students.
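For the flavor of the fine-grained MT idea, here is a minimal, hypothetical barrel-style fetch stage in Verilog (not my actual design; every name and width is invented): a thread counter rotates through per-thread PCs, one issue slot per cycle, so a slow memory is hidden behind the other threads' slots.

// Hypothetical barrel-threading sketch: 8 threads issue round-robin,
// one per cycle, so each thread sees a new instruction only every
// 8 cycles -- plenty of time to hide a slow memory.
module barrel_fetch (
    input  wire        clk,
    input  wire        rst,
    output wire [15:0] fetch_pc,   // PC presented to memory this cycle
    output wire [2:0]  fetch_tid   // which thread owns this slot
);
    reg [2:0]  tid;                // rotating thread ID
    reg [15:0] pc [0:7];           // one PC per hardware thread
    integer i;

    assign fetch_tid = tid;
    assign fetch_pc  = pc[tid];

    always @(posedge clk) begin
        if (rst) begin
            tid <= 3'd0;
            for (i = 0; i < 8; i = i + 1)
                pc[i] <= 16'd0;
        end else begin
            pc[tid] <= pc[tid] + 16'd1; // sequential fetch; no branches here
            tid     <= tid + 3'd1;      // next thread's turn
        end
    end
endmodule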

At the moment I am spending most of my time on the C compiler, just trying to get the function stuff wrapped up and joined up with code packing. ASAP I go back to getting the ISA simulator to validate what the compiler will give it, then update the RTL code, then the Verilog, and then the memory interface for RLDRAM, or at least an 8-way threaded BlockRAM/SRAM model of it (with artificial 20 ns latency). Even my starter S3 board will be able to model the idea of threaded CPU & memory, even with its meager SRAM.

Since V4 and WebPack 7 have been out, I only today tried to redo the P/R of the V2P on that; so far the results are not too good, 320 MHz is now barely 200 MHz, so more time wasted there. Even V4 doesn't give me anything yet in perf; I will have to redo the flow again, but the SW comes first.

Bye for now,

JJ

Reply to
JJ

Primitive CPU architectures are easier to implement on an FPGA chip. ;-) FPGAs generally have problems with multiport memories: it's easy to implement a 1-write, N-read port memory bank, but if the number of write ports is higher, you will have a big problem. And since such blocks are commonly used by DSPs (and by other architectures that exploit massive parallelism but are not VLIW-like, i.e. have complex inter-stage datapath dependencies), it's hard to implement them on an FPGA.
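To illustrate the easy case, here is a minimal, hypothetical 1-write/2-read register file in Verilog (names invented); most synthesizers map this shape straight onto distributed LUT RAM, while a second write port would force bank replication or multipumping tricks instead.

// Hypothetical 1-write, 2-read register file: the shape FPGAs like.
// Synthesizers typically map this onto distributed LUT RAM.
module regfile_1w2r (
    input  wire        clk,
    input  wire        we,
    input  wire [4:0]  waddr,
    input  wire [31:0] wdata,
    input  wire [4:0]  raddr0,
    input  wire [4:0]  raddr1,
    output wire [31:0] rdata0,
    output wire [31:0] rdata1
);
    reg [31:0] mem [0:31];

    always @(posedge clk)
        if (we)
            mem[waddr] <= wdata;   // the single write port

    // Asynchronous reads; adding more READ ports is cheap...
    assign rdata0 = mem[raddr0];
    assign rdata1 = mem[raddr1];
    // ...but a second WRITE port would not map to LUT RAM and
    // generally needs replication or time-multiplexing instead.
endmodule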

Best regards Piotr Wyderski

Reply to
Piotr Wyderski

Hi Philip,

Hum. Some, perhaps... but "crafted" and 8088/8086 in the same paragraph, much less the same sentence, is not something one sees much. There is such a thing as a simply bad and/or poorly implemented architecture.

Maybe not so "crafted" after all, eh? Parlez-vous 68k? ;-)

Regards,

Austin

Reply to
Austin Franklin

Well, I would say that for their time the 8086, Z8000, and 68k were all reasonably well crafted as VLSIs go, but all in different ways. I know this for a fact since I got to reverse engineer them transistor by transistor, at least for the more repeated or interesting blocks, and they used very different styles of circuit and logic design too. At the time Intel had a budget of around 17K devices; Motorola, on the other hand, budgeted around 60K or so, though a lot of that was microcode ROM. That also meant the 68k was bound to cost several times what the 8086 did, which didn't matter much to workstation vendors at the time.

As for architecture, Intel was only upgrading the 8080 to a 16-bit arch; it was not intended to be the entire future of world computing as it presently is. Blame IBM for that, but then no one sees much further than 2-3 years, do they? Intel was aiming the 432 at that purpose. The 68k had the luxury of not being backwards compatible with anything and had a fair bit of clean design.

Remarkably, all these design teams had very little CAD available to them for most of the general design, except for SPICE for absolutely critical cells, and layout digitizing. Today a novice designing an FPGA CPU could do billions of times more logic & arch simulations. That's why those designs had 200 man-years in them.

johnjakson at usa dot com

Reply to
JJ

Hi JJ,

Perhaps from a physical layout, but it was the architecture I was commenting on.

In a terribly feeble way! There were some much better architectures available (and do-able) at the time.

I'm not sure I agree with that. I worked on a VLSI design system from ComputerVision back in 1978...

Regards,

Austin

Reply to
Austin Franklin

Hindsight is a wonderful thing. I suppose the questions are:

What are these "better" and "do-able" architectures? Were they better then for the same reasons we consider them better now? Why don't you think they were adopted at that time? Are they more suited to implementation on today's FPGA technology?

Very little CAD is not the same as no CAD. There was CAD; it was just too expensive to be commonplace.

Reply to
dave
