Greetings. I am new to FPGA programming. I am seeking to create a 40-bit 80386-like CPU core with a 32-bit and 64-bit FPU with 16 registers, a 128-bit four- and two-way 32-bit and 64-bit vector FPU engine with 16 registers, 60 additional general purpose integer registers, and a six stage execution pipeline.
I am wondering if somebody can guide me into which Altera product I should use for this CPU design? Thank you in advance.
There is some interesting software they provide for working with Altera FPGAs called Quartus. It will let you synthesize your designs and measure the size. Then you can tell what size part it will fit. No guesswork required. :)
The same package has a simulator to allow you to do a lot of testing without ever buying a chip or board.
So design your chip and do *lots* of simulating to verify that all the instructions work. Optimize your architecture, and then, and only then, consider which chip you need to buy.
You might want to look for one that has hardware floating point support since you plan to implement floating point. But the word sizes they implement may not be the sizes you want, so you may need to do that in the fabric anyway.
For a point of comparison, we have a 64-bit MIPS-like CPU core, with MMU, L1/L2 cache, 32-bit floating point support, capability unit (32x 256-bit registers), and a 256 bit datapath to DDR2 memory, and it runs at 100MHz in about 80% of a Stratix IV GX230 (230K LEs). Picture and numbers on page 10 here (the Stratix IV doesn't have hard floating point):
This is not particularly optimised for size (or speed), and when you put more things on the FPGA the area usage for the CPU shrinks as the tools work harder. We're trying to make it fit in a Cyclone V SoC part (5CSXFC6D6F31C6N) but haven't yet trimmed it down sufficiently.
The 10 family apparently supports hard floating point: the Stratix 10 is not available yet but the Arria 10 might be worth a look.
The Arria family is also worth looking at from a cost per LE point of view: according to my graph on page 2 here:
it works out somewhat cheaper LUT-for-LUT than the Stratix parts.
I will be designing my CPU in several stages. I have the first stage designed but not debugged. Am working on that this week (Lord willing).
Would it be preferable to design and test each in Quartus-II only? What about DRAM controllers? And Ethernet? I plan on using Ethernet for remote debugging during development and testing. And do I want a dev board with VGA out to make it easier? Or should I pass everything through the Ethernet port?
Thank you in advance for your assistance. It is greatly appreciated. :-)
I'm not sure what would be easier. I've never worked with Ethernet in an FPGA before. I would think you could get a VGA interface working faster than an Ethernet interface with all the software required. Are you planning to run Linux on it or will you be coding to the bare metal without an OS? Will your Ethernet interface be fully custom, or are you going to use an Ethernet module that provides a serial link to your CPU?
I plan on creating a simple buffer which receives internal pipe stage information at each CPU clock, and then transmits that data back out in real-time to a port being monitored by my debugger. This will allow me to constantly monitor the machine state. I can then also build external source-level single-step debugging, assembly tools, even real-time program changes, etc., to complete the entire toolset.
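A rough software model of that kind of capture buffer, just to show the capture/drain relationship. Everything here (the `TraceBuffer` name, the depth, the drain rate, the three-stage snapshot) is made up for illustration and is not the poster's design. The point it shows: if one snapshot is captured per CPU clock and the link drains fewer than that, the buffer fills and records must be dropped (or the CPU stalled).

```python
from collections import deque


class TraceBuffer:
    """Bounded FIFO for per-clock pipeline snapshots (hypothetical model)."""

    def __init__(self, depth):
        self.depth = depth
        self.fifo = deque()
        self.dropped = 0

    def capture(self, cycle, stages):
        """Called once per CPU clock with the per-stage values."""
        if len(self.fifo) >= self.depth:
            self.dropped += 1  # transmitter fell behind; count the loss
            return
        self.fifo.append((cycle, tuple(stages)))

    def drain(self, max_records):
        """Called by the transmitter; returns up to max_records snapshots."""
        out = []
        while self.fifo and len(out) < max_records:
            out.append(self.fifo.popleft())
        return out


def simulate(cycles, depth, drain_every, drain_count):
    """Capture every cycle, drain drain_count records every drain_every cycles."""
    buf = TraceBuffer(depth)
    sent = []
    for cycle in range(cycles):
        # Placeholder stage contents: a real design would latch pipeline state.
        buf.capture(cycle, (cycle, cycle - 1, cycle - 2))
        if (cycle + 1) % drain_every == 0:
            sent.extend(buf.drain(drain_count))
    return buf, sent


if __name__ == "__main__":
    buf, sent = simulate(cycles=100, depth=8, drain_every=4, drain_count=2)
    print("sent", len(sent), "queued", len(buf.fifo), "dropped", buf.dropped)
```

Running it with a drain rate that matches the capture rate (for example `drain_every=1, drain_count=1`) drops nothing, which is the condition a real-time trace link would have to meet.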
I developed my own kernel and primitive OS back in the late 90s, early 00s. I will be using a modified version of that, as the ISA I'm using is somewhat different from the actual 80386 ISA (except in compatibility mode, which I will probably add last).
I'm thinking I would also like to figure out and test timing on a fixed SVGA video mode for a 1920x1080 signal at 60 Hz, and just hard-code that video mode and use it for everything the machine does until I can later add other modes. And the same for a DRAM controller so I can have that consistent and normal access to memory, Ethernet, and VGA throughout all of my development.
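For a sense of the numbers behind a fixed 1920x1080 at 60 Hz mode, here is a small arithmetic sketch. The porch and sync widths are the commonly published 1080p60 values, not something stated in this thread, so check them against the timing standard before hard-coding anything. The 4 bytes per pixel figure is likewise just an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    """One scan direction: visible pixels/lines plus blanking intervals."""
    active: int
    front_porch: int
    sync: int
    back_porch: int

    @property
    def total(self):
        return self.active + self.front_porch + self.sync + self.back_porch


# Assumed 1080p60 timing (commonly published values; verify before use).
H = Axis(active=1920, front_porch=88, sync=44, back_porch=148)
V = Axis(active=1080, front_porch=4, sync=5, back_porch=36)
REFRESH_HZ = 60


def pixel_clock_hz(h, v, refresh_hz):
    """Pixel clock = total pixels per frame (including blanking) * frames/s."""
    return h.total * v.total * refresh_hz


def framebuffer_bytes(h, v, bytes_per_pixel):
    """Memory needed for one visible frame."""
    return h.active * v.active * bytes_per_pixel


def scanout_bytes_per_second(h, v, bytes_per_pixel, refresh_hz):
    """Read bandwidth the display alone takes from the memory controller."""
    return framebuffer_bytes(h, v, bytes_per_pixel) * refresh_hz


if __name__ == "__main__":
    print(pixel_clock_hz(H, V, REFRESH_HZ))               # 148500000
    print(framebuffer_bytes(H, V, 4))                     # 8294400
    print(scanout_bytes_per_second(H, V, 4, REFRESH_HZ))  # 497664000
```

Under those assumptions the pixel clock is 148.5 MHz, and scanout alone reads roughly 500 MB/s, which is worth keeping in mind when sizing the DRAM controller.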
I've since given this some additional thought and have decided I'll transmit everything over Ethernet. In this way I can create several virtual screens and simply write to memory ranges and have them be transmitted when possible.
VGA is pretty easy, and there should be ready-made examples. You need row and column counters, and gates to generate the hsync and vsync. Output data from display RAM, either directly or through a character ROM. The signals go from the FPGA outputs, through resistors, to the VGA pins.
For Ethernet, it is usual to put a PHY chip on the board, which has the analog circuits that you can't build on an FPGA, and interface to that.
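The counters-plus-gates idea can be modelled in a few lines of software before writing any HDL. This is only a behavioural sketch, assuming the widely used 640x480 at 60 Hz timing (800x525 total, active-low syncs); the constant names are mine and the numbers should be checked against whatever mode is actually targeted.

```python
# Assumed 640x480@60 timing, in pixel clocks (horizontal) and lines (vertical).
H_ACTIVE, H_FP, H_SYNC, H_BP = 640, 16, 96, 48
V_ACTIVE, V_FP, V_SYNC, V_BP = 480, 10, 2, 33
H_TOTAL = H_ACTIVE + H_FP + H_SYNC + H_BP  # 800
V_TOTAL = V_ACTIVE + V_FP + V_SYNC + V_BP  # 525


def vga_signals(col, row):
    """The 'gates': decode the counters into hsync, vsync and a visible flag."""
    in_hsync = H_ACTIVE + H_FP <= col < H_ACTIVE + H_FP + H_SYNC
    in_vsync = V_ACTIVE + V_FP <= row < V_ACTIVE + V_FP + V_SYNC
    hsync = 0 if in_hsync else 1  # active low
    vsync = 0 if in_vsync else 1  # active low
    visible = col < H_ACTIVE and row < V_ACTIVE
    return hsync, vsync, visible


def step(col, row):
    """The counters: advance one pixel clock, wrapping column then row."""
    col += 1
    if col == H_TOTAL:
        col = 0
        row += 1
        if row == V_TOTAL:
            row = 0
    return col, row


def scan_frame():
    """Run one full frame and count cycles in each state."""
    col = row = 0
    hsync_low = vsync_low = visible_px = 0
    for _ in range(H_TOTAL * V_TOTAL):
        hsync, vsync, visible = vga_signals(col, row)
        hsync_low += hsync == 0
        vsync_low += vsync == 0
        visible_px += visible
        col, row = step(col, row)
    return hsync_low, vsync_low, visible_px, (col, row)
```

In hardware, `visible` would gate the pixel data coming out of display RAM (or the character ROM), and `col`/`row` would form the read address.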
For RS232, the FPGA pins go to level converters to convert to the appropriate voltages, but all the logic (UART) is in the FPGA.
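To make "all the logic is in the FPGA" concrete, here is a bit-level sketch of what a UART transmitter and receiver do, assuming the common 8N1 format (one start bit, eight data bits sent LSB first, one stop bit, no parity). The 50 MHz clock and 115200 baud in the divisor example are assumed values, not anything from this thread.

```python
def uart_frame(byte):
    """Return the 10 line levels for one 8N1 character (idle level is 1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    bits = [0]                                    # start bit
    bits += [(byte >> i) & 1 for i in range(8)]   # data, LSB first
    bits.append(1)                                # stop bit
    return bits


def uart_decode(bits):
    """Inverse of uart_frame; rejects framing errors."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


def clocks_per_bit(clock_hz, baud):
    """Divider a hardware bit-timer would count to (nearest integer)."""
    return round(clock_hz / baud)


if __name__ == "__main__":
    print(uart_frame(0x41))                 # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
    print(clocks_per_bit(50_000_000, 115200))  # 434
```

The level converter on the board only shifts these logic levels to RS232 voltages; the framing above is what the fabric has to implement.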
The ao486 project on opencores.org contains a VGA controller along with other controllers. I figured I may look at those when the time comes. However, from back in my OS driver development days, I remember learning about timings for SVGA. Some of the registers driving the 3dfx Voodoo3 2000 video card I was using at that time had to be programmed directly (when no driver was available). I also remember working with the HGA Hercules monochrome graphics adapter timings. I can see how it all fits together now.
I purchased a TI PHY chip today (DP83848C 10/100 model) in a pin package that I believe will allow direct connection to the Altera dev board. I downloaded the protocol specs and they were very straightforward. I found a better solution earlier from Silicon Labs, but I couldn't find an inexpensive connection board that didn't require something like solder masks ... so I went with TI's already-together product.
For point-to-point inter-LibSF-device communication I think I'll simply invent Libernet, a simple protocol using five pins (tx-clk, tx, rx-clk, rx, and gnd) and make the communication protocol as simple as possible.
I like the idea of Manchester coding, however, and may consider using that as well to keep pin count down at the expense of some logic on the sending and receiving end.
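For reference, Manchester coding folds the clock into the data by guaranteeing a transition in the middle of every bit, which is why the separate clock pins can go away. A minimal sketch follows, assuming the IEEE 802.3 convention (1 is low-to-high, 0 is high-to-low); the opposite convention also exists, so both ends have to agree. It works on already-aligned half-bit samples and does not model clock recovery.

```python
def manchester_encode(bits):
    """Each data bit becomes two half-bit line levels with a mid-bit transition."""
    out = []
    for bit in bits:
        if bit not in (0, 1):
            raise ValueError("bits must be 0 or 1")
        # Assumed IEEE 802.3 convention: 1 -> low,high ; 0 -> high,low
        out.extend((0, 1) if bit else (1, 0))
    return out


def manchester_decode(halves):
    """Inverse of manchester_encode for half-bit-aligned input."""
    if len(halves) % 2:
        raise ValueError("odd number of half-bits")
    bits = []
    for i in range(0, len(halves), 2):
        pair = (halves[i], halves[i + 1])
        if pair == (0, 1):
            bits.append(1)
        elif pair == (1, 0):
            bits.append(0)
        else:
            # No transition in mid-bit: line error or loss of alignment.
            raise ValueError("missing mid-bit transition")
    return bits


if __name__ == "__main__":
    line = manchester_encode([1, 0, 1, 1])
    print(line)                      # [0, 1, 1, 0, 0, 1, 0, 1]
    print(manchester_decode(line))   # [1, 0, 1, 1]
```

The cost is visible in the output length: the line toggles at twice the data rate, and the receiver needs logic to find the bit boundaries.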
I am absolutely loving this project so far. I think it's my all-time favorite.
I can't seem to find it now, but someone recently posted a link to price/LUT vs size data in graph form. It gets a bit crowded at the bottom end, but appears to show there is no real price difference between the two brands. The data does include a very small number of other devices than X and A, but not enough to be useful.
In fact it is interesting that the prices get very crowded at the low end jamming up the graph. I suggested that he present the data with a logarithmic Y axis or even in log-log form. Clearly competitive market forces at work.
There you go! Great job. Interesting how the members of a family are more evenly spaced on a log scale for size.
BTW, where did you get your pricing data? One thing I have learned about FPGA pricing is that list price means nothing if you are buying any real quantity. To get a design win, especially in a new family, they will bid very aggressively.
Pricing data is downloaded from Digikey. There were 17000 prices in all, so to compress the dataset into something meaningful I grouped all the parts of the same number of logical elements and the same subfamily - eg Cyclone IV GX - and took the median of the price of each group. That gives some measure of the middle of the range of different packages, temperatures, speed grades etc without too many of the outliers (eg Mil Spec with crazy prices).
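The grouping step described above amounts to something like the following sketch. The listing rows are invented placeholders purely to show the mechanics (they are not Digikey prices or real parts), and the tuple layout is my own assumption about how the downloaded data might be arranged.

```python
from collections import defaultdict
from statistics import median


def median_price_per_le(listings):
    """Group listings by (subfamily, logic elements); return median price per LE.

    listings: iterable of (subfamily, logic_elements, unit_price) tuples.
    """
    groups = defaultdict(list)
    for subfamily, logic_elements, price in listings:
        groups[(subfamily, logic_elements)].append(price)
    return {
        key: median(prices) / key[1]   # key[1] is the LE count of the group
        for key, prices in groups.items()
    }


if __name__ == "__main__":
    # Placeholder rows only: same device size in different packages/grades,
    # including one deliberately extreme outlier.
    sample = [
        ("FamilyA", 1000, 10.0),
        ("FamilyA", 1000, 12.0),
        ("FamilyA", 1000, 500.0),
        ("FamilyB", 2000, 30.0),
        ("FamilyB", 2000, 34.0),
    ]
    for key, value in sorted(median_price_per_le(sample).items()):
        print(key, round(value, 4))
```

With the placeholder rows, the 500.0 outlier does not move the FamilyA figure, which is the reason for taking the median rather than the mean.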
Obviously in any real situation you should have coffee with the salesman and explain how many million you're going to buy but can't afford just yet, but there's no way to do that comparison objectively. Digikey prices aren't ideal, but they are an example of what you would pay if you want one FPGA tomorrow, and don't haggle with the salesman or commit to a lead time of 26 weeks.
The Altera IP for MACs is quite nice. I used the 10G MAC recently (for the paper I cited in my other post in fact) and it was a case of feed it a stream of 64 bit payload words (in an Avalon stream that does the flow control, I did this by hand in my Verilog testbench) and out pops Ethernet frames, complete with CRC. In this case I was using an SFP+ transceiver as my PHY, but an external PHY chip should work equally well. It gets a bit trickier in the multi-speed MACs, when you have to talk to the PHY from software, or when you want the MAC to also provide the buffering/interrupts/software interface etc.
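As a software cross-check for a testbench, the CRC that the MAC appends can be reproduced on a PC. This sketch assumes the Ethernet frame check sequence is the same CRC-32 that `zlib.crc32` computes, appended least significant byte first, which is my understanding of the usual convention; it says nothing about the Altera core's interface, and it ignores preamble, padding and minimum frame size.

```python
import struct
import zlib


def append_fcs(frame: bytes) -> bytes:
    """Return frame followed by its 4-byte CRC-32, low byte first."""
    crc = zlib.crc32(frame) & 0xFFFFFFFF
    return frame + struct.pack("<I", crc)


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC over the body and compare with the trailing 4 bytes."""
    if len(frame_with_fcs) < 4:
        return False
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == fcs


if __name__ == "__main__":
    framed = append_fcs(b"123456789")
    print(framed[-4:].hex())           # 2639f4cb
    print(fcs_ok(framed))              # True
    corrupted = bytes([framed[0] ^ 0x01]) + framed[1:]
    print(fcs_ok(corrupted))           # False
```

Comparing captured simulation output against `append_fcs` is a cheap way to confirm that payload words are going into the MAC in the byte order you expect.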
Having an external PHY/MAC usually means a bidirectional bus-style interface (address, data, R/-W, clock, etc) which gets annoying at high speed. Or PCIe things likewise. I don't know of any FPGAs that have integrated PHY, though there are SFP+ Direct Attach cables which are essentially PHY-less serial connections between MACs. Likewise SATA, USB3, etc wiring can be so abused.
Sometimes it can be handy if you're doing something different to their constraints. But at the very least keep the physical layer the same, you can still mess with the packet structure etc. You don't want to be debugging the PHY without a decent high speed test infrastructure. And obviously this is only useful if you have control of both ends of the link.
I had data for ECP3 and Igloo 2. Most of the others that Digikey listed are either very small (
One of the interesting things to note is the pricing margin between budget and premium ranges: you pay 4-6x more per LE in a 'premium' family than in a budget range. So if you can build a system with multiple FPGAs (and we described a way to do that in the paper, which fits some applications better than others) then it can be economic to use smaller budget FPGAs rather than buy the premium FPGA.
The alternative theory is that people buy budget FPGAs from Digikey, and buying premium devices requires a long chat with a salesman, so the Digikey list prices for those are mostly fiction that nobody pays. However the same trend seems to apply across all vendors.