Which Altera to buy?

rick.c.hodgin 11 years ago

Greetings. I am new to FPGA programming. I am seeking to create a 40-bit 80386-like CPU core with a 32-bit and 64-bit FPU with 16 registers, a 128-bit four- and two-way 32-bit and 64-bit vector FPU engine with 16 registers, 60 additional general purpose integer registers, and a six stage execution pipeline.

I am wondering if somebody can guide me into which Altera product I should use for this CPU design? Thank you in advance.

Best regards, Rick C. Hodgin

glen herrmannsfeldt 11 years ago

Most likely a big one.

If you simplify the system somewhat, maybe only a medium sized one.

-- glen

rickman 11 years ago

There is some interesting software they provide for working with Altera FPGAs called Quartus. It will let you synthesize your designs and measure the size. Then you can tell what size part it will fit. No guesswork required. :)

The same package has a simulator to allow you to do a lot of testing without ever buying a chip or board.

So design your chip, do *lots* of simulating to verify that all the instructions work. Optimize your architecture and then, only then consider which chip you need to buy.

You might want to look for one that has hardware floating point support since you plan to implement floating point. But the word sizes they implement may not be the sizes you want, so you may need to do that in the fabric anyway.

Rick

Theo Markettos 11 years ago

For a point of comparison, we have a 64-bit MIPS-like CPU core, with MMU, L1/L2 cache, 32-bit floating point support, capability unit (32x 256-bit registers), and a 256 bit datapath to DDR2 memory, and it runs at 100MHz in about 80% of a Stratix IV GX230 (230K LEs). Picture and numbers on page 10 here (the Stratix IV doesn't have hard floating point):

formatting link

This is not particularly optimised for size (or speed), and when you put more things on the FPGA the area usage for the CPU shrinks as the tools work harder. We're trying to make it fit in a Cyclone V SoC part (5CSXFC6D6F31C6N) but haven't yet trimmed it down sufficiently.
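As a rough illustration of the sizing arithmetic behind those numbers, here is a minimal Python sketch. The 230K LE and 80% figures come from the post; the 110K LE capacity for the Cyclone V SoC target is an assumption that should be checked against the datasheet, and LE counts are only loosely comparable across families.

```python
# Rough fit check using the figures quoted in the post above.
STRATIX_IV_GX230_LES = 230_000   # "230K LEs", from the post
UTILISATION = 0.80               # "about 80%", from the post

# ASSUMPTION: nominal logic capacity of the Cyclone V SoC target.
# Confirm against the device datasheet before relying on it.
CYCLONE_V_TARGET_LES = 110_000


def les_used(device_les: int, utilisation: float) -> int:
    """Logic elements occupied at a given utilisation."""
    return round(device_les * utilisation)


def shrink_needed(used: int, target: int) -> float:
    """Fraction of the design that would have to be trimmed to fit the target."""
    return max(0.0, 1.0 - target / used)


used = les_used(STRATIX_IV_GX230_LES, UTILISATION)
shrink = shrink_needed(used, CYCLONE_V_TARGET_LES)
print(f"design uses about {used} LEs; needs to shrink by about {shrink:.0%}")
```

Synthesis results from the tools remain the real answer; this only shows why the design does not drop into the smaller part unchanged.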

The 10 family apparently supports hard floating point: the Stratix 10 is not available yet but the Arria 10 might be worth a look.

The Arria family is also worth looking at from a cost per LE point of view: according to my graph on page 2 here:

formatting link

it works out somewhat cheaper LUT-for-LUT than the Stratix parts.

Theo

rickman 11 years ago

Nice info. You might consider using a log scale for the pricing axis. That would help spread out the low end rather than having all that data bunched into the corner. Maybe even log the size axis too.

Rick

rick.c.hodgin 11 years ago

Thank you for the replies and info.

I will be designing my CPU in several stages. I have the first stage designed but not debugged. Am working on that this week (Lord willing).

Would it be preferable to design and test each in Quartus-II only? What about DRAM controllers? And Ethernet? I plan on using Ethernet for remote debugging during development and testing. And do I want a dev board with VGA out to make it easier? Or should I pass everything through the Ethernet port?

Thank you in advance for your assistance. It is greatly appreciated. :-)

Best regards, Rick C. Hodgin

rickman 11 years ago

I'm not sure what would be easier. I've never worked with Ethernet in an FPGA before. I would think you could get a VGA interface working faster than an Ethernet interface with all the software required. Are you planning to run Linux on it or will you be coding to the bare metal without an OS? Will your Ethernet interface be full custom, or are you going to use an Ethernet module that provides a serial link to your CPU?

Rick

rick.c.hodgin 11 years ago

I found a simple Ethernet controller on fpga4fun:

formatting link

I plan on creating a simple buffer which receives internal pipe stage information at each CPU clock, and then transmits that data back out in real-time to a port being monitored by my debugger. This will allow me to constantly monitor the machine state. I can also then encode external source level single-step debugging, assembly tools, even make program changes in real-time, etc., to complete the entire toolset.

I developed my own kernel and primitive OS back in the late 90s, early 00s. I will be using a modified version of that, as the ISA I'm using is somewhat different from the actual 80386 ISA (except in compatibility mode, which I will probably add last).

I'm thinking I would also like to figure out and test timing on a fixed SVGA video mode for a 1920x1080 signal at 60 Hz, and just hard-code that video mode and use it for everything the machine does until I can later add other modes. And the same for a DRAM controller so I can have that consistent and normal access to memory, Ethernet, and VGA throughout all of my development.
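For anyone checking the numbers on a fixed 1920x1080 mode at 60 Hz, a small Python sketch of the pixel clock arithmetic follows. The porch and sync widths are the commonly published values for this mode and are an assumption here, not something stated in the thread; the `Axis` class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Axis:
    """One scan direction: visible pixels/lines plus blanking intervals."""
    active: int
    front_porch: int
    sync: int
    back_porch: int

    @property
    def total(self) -> int:
        return self.active + self.front_porch + self.sync + self.back_porch


# ASSUMPTION: commonly published 1920x1080 @ 60 Hz timing values.
H_1080P = Axis(active=1920, front_porch=88, sync=44, back_porch=148)
V_1080P = Axis(active=1080, front_porch=4, sync=5, back_porch=36)


def pixel_clock_hz(h: Axis, v: Axis, refresh_hz: int) -> int:
    """Pixel clock = total pixels per line * total lines per frame * frames per second."""
    return h.total * v.total * refresh_hz


clock = pixel_clock_hz(H_1080P, V_1080P, 60)
print(f"{H_1080P.total} x {V_1080P.total} total, pixel clock {clock / 1e6} MHz")
```

With these values the pixel clock works out to 148.5 MHz, which is worth keeping in mind when picking the FPGA speed grade and the video output circuitry.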

Best regards, Rick C. Hodgin

Rick C. Hodgin 11 years ago

I've since given this some additional thought and have decided I'll transmit everything over Ethernet. In this way I can create several virtual screens and simply write to memory ranges and have them be transmitted when possible.

Best regards, Rick C. Hodgin

Jon Elson 11 years ago

Xilinx has similar capabilities. You can design for an arbitrary family, then it will tell you the smallest device it will actually fit into.

I only use Xilinx, but some research SEEMS to indicate to me that Xilinx devices may be a good deal cheaper than Altera.

All of the rest of Rick's comments are very good, and apply equally to Xilinx.

Jon

Theo Markettos 11 years ago

I did try, but it wasn't usable in the limited space I had for the paper. Here's a larger loglog version:

formatting link

Theo

glen herrmannsfeldt 11 years ago

(snip)

VGA is pretty easy, and there should be ready-made examples around. You need row and column counters, and gates to generate hsync and vsync. Output data comes from display RAM, either directly or through a character ROM. The signals go from FPGA outputs, through resistors, to the VGA pins.
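The counter-and-gates scheme described above can be modelled in a few lines of Python before committing it to HDL. This is only a behavioural sketch: it assumes the usual 640x480 at 60 Hz timing (nominal 25.175 MHz pixel clock, active-low syncs), and the function names are invented for illustration.

```python
# ASSUMPTION: standard 640x480 @ 60 Hz timing, active-low sync pulses.
H_ACTIVE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_ACTIVE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33
H_TOTAL = H_ACTIVE + H_FRONT + H_SYNC + H_BACK   # pixels per line incl. blanking
V_TOTAL = V_ACTIVE + V_FRONT + V_SYNC + V_BACK   # lines per frame incl. blanking


def step(col: int, row: int) -> tuple[int, int]:
    """Advance the column and row counters by one pixel clock."""
    col += 1
    if col == H_TOTAL:
        col = 0
        row += 1
        if row == V_TOTAL:
            row = 0
    return col, row


def signals(col: int, row: int) -> tuple[bool, bool, bool]:
    """Combinational decode of the counters (the 'gates'): hsync, vsync, visible."""
    hsync = not (H_ACTIVE + H_FRONT <= col < H_ACTIVE + H_FRONT + H_SYNC)
    vsync = not (V_ACTIVE + V_FRONT <= row < V_ACTIVE + V_FRONT + V_SYNC)
    visible = col < H_ACTIVE and row < V_ACTIVE
    return hsync, vsync, visible


def count_frame() -> tuple[int, int, tuple[int, int]]:
    """Run one full frame; return visible pixels, clocks with hsync low, final counters."""
    col = row = 0
    visible_pixels = hsync_low = 0
    for _ in range(H_TOTAL * V_TOTAL):
        hsync, _vsync, visible = signals(col, row)
        visible_pixels += visible
        hsync_low += not hsync
        col, row = step(col, row)
    return visible_pixels, hsync_low, (col, row)


print(count_frame())
```

In hardware, `step` becomes two clocked counters and `signals` becomes comparators on their outputs; the `visible` flag gates the read from display RAM or the character ROM.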

For ethernet, it is usual to put a PHY chip on board, which has the analog circuits that you can't build on an FPGA, and interface to that.

For RS232, the FPGA pins go to level converters to convert to the appropriate voltages, but all the logic (UART) is in the FPGA.
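To make the "all the logic (UART) is in the FPGA" part concrete, here is a minimal Python sketch of what that logic has to produce on the wire. It assumes the common 8N1 format (one start bit, eight data bits sent LSB first, one stop bit, no parity); the 50 MHz clock in the usage line is a hypothetical board clock, not one mentioned in the thread.

```python
def uart_frame(byte: int) -> list[int]:
    """Logic levels for one 8N1 frame: start bit (0), 8 data bits LSB first, stop bit (1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in 0..255")
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def uart_decode(levels: list[int]) -> int:
    """Recover the byte from one sampled 8N1 frame, checking start and stop bits."""
    if len(levels) != 10 or levels[0] != 0 or levels[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(levels[1:9]))


def baud_divisor(clock_hz: int, baud: int) -> int:
    """Number of system clocks per bit period, rounded to the nearest integer."""
    return round(clock_hz / baud)


print(uart_frame(0x41))
print(baud_divisor(50_000_000, 115_200))  # hypothetical 50 MHz system clock
```

The level converters mentioned above then map these logic levels to RS232 line voltages, which are inverted relative to the logic levels shown here.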

-- glen

Rick C. Hodgin 11 years ago

The ao486 project on opencores.org contains a VGA controller along with other controllers. I figured I may look at those when the time comes. However, from back in my OS driver development days, I remember learning about timings for SVGA. Some of the registers driving the 3dfx voodoo3 2000 video card I was using at that time had to be programmed directly (when no driver was available). I also remember working with the HGA Hercules monochrome graphics adapter timings. I can see how it all fits together now.

I purchased a TI PHY chip today (DP83848C 10/100 model) in a pin package that I believe will allow direct connection to the Altera dev board. I downloaded the protocol specs and it was very straightforward. I found a better solution earlier from Silicon Labs, but I couldn't find an inexpensive connection board that didn't require something like solder masks ... so I went with TI's already-together product.

For point-to-point inter-LibSF-device communication I think I'll simply invent Libernet, a simple protocol using five pins (tx-clk, tx, rx-clk, rx, and gnd) and make the communication protocol as simple as possible.

I like the idea of Manchester coding, however, and may consider using that as well to keep pin count down at the expense of some logic on the sending and receiving end.
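For reference, Manchester coding folds the clock into the data line by guaranteeing a transition in the middle of every bit, which is what would let the separate clock pins go away. A minimal Python sketch follows; it assumes the IEEE 802.3 convention (0 is high then low, 1 is low then high), and the opposite convention also exists.

```python
def manchester_encode(bits: list[int]) -> list[int]:
    """Encode bits as half-bit line levels.

    ASSUMPTION: IEEE 802.3 convention, 0 -> (1, 0) and 1 -> (0, 1).
    """
    out: list[int] = []
    for bit in bits:
        out += [0, 1] if bit else [1, 0]
    return out


def manchester_decode(halves: list[int]) -> list[int]:
    """Decode half-bit line levels, rejecting any bit cell without a mid-bit transition."""
    if len(halves) % 2:
        raise ValueError("odd number of half-bits")
    bits = []
    for first, second in zip(halves[0::2], halves[1::2]):
        if first == second:
            raise ValueError("missing mid-bit transition")
        bits.append(second)  # under this convention the second half equals the bit
    return bits


line = manchester_encode([1, 0, 1, 1])
print(line)
print(manchester_decode(line))
```

The cost is that the line toggles at up to twice the bit rate, and the receiver needs logic to find the bit-cell boundaries, which this sketch assumes are already known.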

I am absolutely loving this project so far. I think it's my all-time favorite.

Best regards, Rick C. Hodgin

rickman 11 years ago

I can't seem to find it now, but someone recently posted a link to price/LUT vs size data in graph form. It gets a bit crowded at the bottom end, but appears to show there is no real price difference between the two brands. The data does include a very small number of other devices than X and A, but not enough to be useful.

In fact it is interesting that the prices get very crowded at the low end jamming up the graph. I suggested that he present the data with a logarithmic Y axis or even in log-log form. Clearly competitive market forces at work.

Rick

rickman 11 years ago

Why not just use an integrated PHY/MAC and save the trouble of rolling your own? How big is the device you will need?

Please don't roll your own serial interface. You will forever regret it I think. Why add to the huge number of existing interfaces when nearly every need has already been met by one or the other?

I assume it is all just for fun?

Rick

rickman 11 years ago

There you go! Great job. Interesting how the members of a family are more evenly spaced on a log scale for size.

BTW, where did you get your pricing data? One thing I have learned about FPGA pricing is that list price means nothing if you are buying any real quantity. To get a design win, especially in a new family, they will bid very aggressively.

Rick

Rick C. Hodgin 11 years ago

The product I bought:

formatting link

Because it would work in wholly replicable logic within the FPGA, requiring nothing other than connecting up pins.

FWIW, I'm talking about slow device on-motherboard communications here, and potentially some remote debugging communication across multi-CPUs through a single interface.

No. It is to become the foundation of a hardware and software stack I am creating that has roots in the 80386 design, but has been modified to be simpler, and has been extended out to 40-bits.

Best regards, Rick C. Hodgin

Theo Markettos 11 years ago

Pricing data is downloaded from Digikey. There were 17000 prices in all, so to compress the dataset into something meaningful I grouped all the parts with the same number of logic elements and the same subfamily (e.g. Cyclone IV GX) and took the median of the price of each group. That gives some measure of the middle of the range of different packages, temperatures, speed grades etc. without too many of the outliers (e.g. Mil Spec with crazy prices).
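The grouping described above can be sketched in Python roughly as follows. The rows are hypothetical placeholders, not real Digikey prices, and the function names are made up for illustration; the point is only to show why the median resists outliers.

```python
from collections import defaultdict
from statistics import median

# HYPOTHETICAL listings: (subfamily, logic elements, unit price). Not real data.
listings = [
    ("Family A", 15_000, 20.0),
    ("Family A", 15_000, 24.0),
    ("Family A", 15_000, 310.0),  # outlier, e.g. an exotic grade of the same part
    ("Family B", 15_000, 30.0),
    ("Family B", 15_000, 34.0),
]


def median_price_per_group(rows):
    """Group by (subfamily, LE count) and take the median price of each group."""
    groups = defaultdict(list)
    for subfamily, les, price in rows:
        groups[(subfamily, les)].append(price)
    return {key: median(prices) for key, prices in groups.items()}


def price_per_le(medians):
    """Convert each group's median price into a price per logic element."""
    return {key: price / key[1] for key, price in medians.items()}


medians = median_price_per_group(listings)
print(medians)
print(price_per_le(medians))
```

In the hypothetical "Family A" group the outlier would drag a mean up to 118, while the median stays at 24.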

Obviously in any real situation you should have coffee with the salesman and explain how many million you're going to buy but can't afford just yet, but there's no way to do that comparison objectively. Digikey prices aren't ideal, but they are an example of what you would pay if you want one FPGA tomorrow, and don't haggle with the salesman or commit to a lead time of 26 weeks.

Theo

Theo Markettos 11 years ago

The Altera IP for MACs is quite nice. I used the 10G MAC recently (for the paper I cited in my other post, in fact) and it was a case of feeding it a stream of 64-bit payload words (in an Avalon stream that does the flow control; I did this by hand in my Verilog testbench) and out pop Ethernet frames, complete with CRC. In this case I was using an SFP+ transceiver as my PHY, but an external PHY chip should work equally well. It gets a bit trickier in the multi-speed MACs, when you have to talk to the PHY from software, or when you want the MAC to also provide the buffering/interrupts/software interface etc.
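For anyone rolling their own MAC instead, the "complete with CRC" part is the frame check sequence. A minimal Python sketch of it is below, which can double as a reference model for a testbench. It relies on `zlib.crc32` using the same CRC-32 as Ethernet; the MAC addresses, EtherType, and payload are hypothetical, and appending the FCS least significant byte first is the usual software convention rather than something taken from the thread.

```python
import struct
import zlib


def append_fcs(frame_without_fcs: bytes) -> bytes:
    """Append the 4-byte CRC-32 frame check sequence, least significant byte first."""
    fcs = zlib.crc32(frame_without_fcs) & 0xFFFFFFFF
    return frame_without_fcs + struct.pack("<I", fcs)


def fcs_ok(frame_with_fcs: bytes) -> bool:
    """Recompute the CRC over everything but the last 4 bytes and compare."""
    body, fcs = frame_with_fcs[:-4], frame_with_fcs[-4:]
    return struct.pack("<I", zlib.crc32(body) & 0xFFFFFFFF) == fcs


# HYPOTHETICAL frame contents, for illustration only.
dst = bytes.fromhex("ffffffffffff")            # broadcast destination
src = bytes.fromhex("020000000001")            # locally administered source address
ethertype = struct.pack("!H", 0x88B5)          # local experimental EtherType
payload = b"pipeline state".ljust(46, b"\x00")  # padded to the 46-byte minimum payload

frame = append_fcs(dst + src + ethertype + payload)
print(len(frame), fcs_ok(frame))
```

The preamble and start-of-frame delimiter are not part of the CRC calculation and are left out here.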

Having an external PHY/MAC usually means a bidirectional bus-style interface (address, data, R/-W, clock, etc) which gets annoying at high speed. Or PCIe things likewise. I don't know of any FPGAs that have integrated PHY, though there are SFP+ Direct Attach cables which are essentially PHY-less serial connections between MACs. Likewise SATA, USB3, etc wiring can be so abused.

Sometimes it can be handy if you're doing something different to their constraints. But at the very least keep the physical layer the same, you can still mess with the packet structure etc. You don't want to be debugging the PHY without a decent high speed test infrastructure. And obviously this is only useful if you have control of both ends of the link.

Theo

Theo Markettos 11 years ago

Graphs again:

formatting link

(page 2)

formatting link

I had data for ECP3 and Igloo 2. Most of the others that Digikey listed are either very small ( In fact it is interesting that the prices get very crowded at the low

One of the interesting things to note is the pricing margin between budget and premium ranges: you pay 4-6x more per LE in a 'premium' family than in a budget range. So if you can build a system with multiple FPGAs (and we described a way to do that in the paper, which fits some applications better than others) then it can be economic to use smaller budget FPGAs rather than buy the premium FPGA.

The alternative theory is that people buy budget FPGAs from Digikey, and buying premium devices requires a long chat with a salesman, so the Digikey list prices for those are mostly fiction that nobody pays. However the same trend seems to apply across all vendors.

Theo
