I'm looking for a Cortex-A class processor that has reasonably quick parallel I/O that might be hooked up to an FPGA. I'm aware of the existing Zynq and Altera SoC FPGAs, but looking for something different.
By 'parallel I/O' I mean ideally a memory interface - either bidirectional (e.g. 64 bits) or separate 32-bit TX and 32-bit RX. GPIO with data-valid signals/strobes is another possibility. By 'quick' I mean hundreds of MHz, ideally with low latency.
I know there are things like the TI PRU in e.g. the OMAP family, but they seem to have a limited number of pins (16-bit TX/RX).
I think you are not going to find anything other than the memory interface. What's wrong with that? I assume you are referring to a processor that runs in the GHz range, but even then it would be hard for it to push data out of parallel I/O at "hundreds of MHz". Since this would be a very atypical use of parallel I/Os, I can't imagine a chip maker who would put the I/O pins on the fast local bus. Rather they are typically connected through a slower bus for the peripherals.
Can I ask why you don't want to use high speed serial I/Os (which are intended for this) or a combined FPGA/CPU chip? Is using the memory interface too obvious or is there a reason to not use that?
I don't have a problem with using a memory interface, just would rather avoid something like PCIe - I don't have a transceiver interface to receive it. I just wasn't aware of anything that exported a 'simple' high speed memory interface.
The reason for not using a combined FPGA/CPU chip is that I'll be using a dev board rather than making my own board (buying serious FPGAs in small quantities isn't fun, and I'd rather not have to do the DDR3/etc. layout). On the other side of my bridge CPLD/FPGA is a 1.5 V parallel interface: most ARM FPGA dev boards hardwire their pins to something other than 1.5 V, so I can't simply use the FPGA on the combined chip. I'm also physically constrained, which rules out a lot of dev boards.
Have you looked at data sheets? The last time I looked, it seemed like there were parts out there that came with built-in SDRAM interfaces. That may not be ideal ('cuz you'd have to make your FPGA pretend to be SDRAM), but it should support the bandwidth you'd want.
I have to admit I didn't look hard -- I wanted a microcontroller with a Cortex A core; I basically stopped looking when I saw that I'd need to deal with external memory and whatnot if I wanted that core.
Hi Tim, not having FPGA experience (though I have designed plenty of logic, used CPLDs, written a logic compiler for CPLDs and another for GALs back in the day, etc.), I wonder if it would not be easier to use PCIe than to pretend to be DRAM. I have seen FPGAs (soon I may have to use one with DDR etc., if I want to have a display controller, which I seem to) advertising PCIe which looks somehow "readily available" to the customer; perhaps this could be a way? I have also seen them having DDR, but that seems to be the wrong way around for this task (not for mine with the display, though).
I did think that, and should have commented. One of the features of life in a world with big FPGAs is that you can just go out and buy IP to do things like talk to SDRAM or PCI. The last time I was involved in such a project the PCI in question was "plain old", but there have to be PCIe cores out there for sale.
"Reverse" SDRAM would be more rare, but the actual core should be simpler (and answers the OP's question more closely).
Were it me, I'd certainly leave PCI-e on the table until I'd done some design studies.
PCIe is relatively easy to use, but it does involve a much more heavyweight stack at both ends - it needs a full PCIe root complex and an FPGA that supports high-speed transceivers. Everything ends up much larger/more expensive - both the FPGA and the CPU. Many ARM application processor SoCs don't have PCIe, for instance, which means you end up with an Atom-class SoC or larger.
I hadn't thought about pretending to be SDRAM - will think about that. Which CPU did you find, Tim? I suspect finding a dev board that exposes SDRAM pins but wires DDR3 internally is going to be tricky.
I have used and programmed PCI a few times, and assuming that PCIe is very similar from a programmer's point of view, I'd say you can do it without too much pain. If your two sides will be aware of each other (i.e. they don't have to discover what is on the bus, what it is allowed to do, set address ranges, etc.) you can do it quite easily; you will only need the bus handshakes. And if you restrict them to a known size (say, 32 bit) it becomes even easier.
Should be doable; they do put two DIMMs per board, after all (still so?). But you are likely to get yourself into a nightmare of problems, trial and error, etc. - whereas the PCIe link will just work unless you abuse it too much. (Not so long ago I switched from ATA to SATA - well, the SATA part was just four signal wires, and connected in a decent way it just worked for me.)
You might want to look at the Freescale (now NXP) QorIQ parts, like the T1040 or T1042 - large and not really cheap things, but I think they had smaller and cheaper ones there, too.
What I will probably do one of these days is a display controller doing up to and including 4K video. Just the bitmap -> serial stream, HDMI, and probably raw display-module LVDS. From what I have seen on the first page of the datasheets, if not even before that, there are FPGAs which have DDR, PCIe and even LVDS drivers (but I think the latter are not fast enough; yet to check on that). The idea is for the DDR to hold the display bitmap and the processor to write it via PCIe (it will also read it, but rarely if at all). No fancy stuff with maintaining windows etc. involved; processors nowadays are fast enough to do all that (and I am not making a 3D gaming console). But let me see when (if...) I'll manage to get around to that....
PCIe is completely different at the electrical level: you need 2.5, 5 or 8 Gbps high-speed serial transceivers. The lower tiers of FPGAs don't support such transceivers, so you're into the hundreds-of-dollars territory already. Then these fancy FPGAs only come in 500-pin BGAs... because they're fancy. That means you can't hand-solder them; you have to contract that out. Then the 500-pin BGAs need an 8-layer PCB stack to get the ball escapes. And so on - the costs and complexity keep rising.
I was kinda hoping to do this with an off-the-shelf SOM and a $10 CPLD/bargain bucket FPGA on a 4 layer board...
It's not quite that simple.
It turns out that NAND or NOR flash interfaces are a better bet than SDRAM, because a lot more of the protocol is left to software, while on SDRAM you have to dodge the controller helpfully reordering accesses and inserting refresh cycles. However, a lot of SOMs don't export them, and when they do it's only 8 or 16 bit. 16 bits at ~100 MHz is something, I suppose.
The retransmission masks a vast multitude of sins... (people don't realise how terrible their $1 SATA cables are until you start measuring their packet loss and crosstalk)
Wow, that's a family where marketing really need to get a grip - 32 bit, 64 bit, Power, ARM11, Cortex A, low power, baseband, server - throw them all into the same brand so the customer is thoroughly confused. I'm not sure I see anything distinctive there, but I'm mostly baffled by how they've organised them.
I don't know what to tell you. I would contact the makers of the ZedBoard or one of the equivalent Altera-based boards and ask about setting an I/O voltage of 1.5 volts. The ZedBoard seems to have provision to set the voltage on at least two banks of I/Os, although 1.5 volts is not indicated. I expect a simple part change will get you 1.5 in place of 1.8 volts. Call the maker... I think this problem will be easier than getting the throughput between two boards that you need.
That shouldn't be true. I know some time back Lattice came out with low-end FPGAs with SERDES, and I thought Xilinx and Altera had to follow suit.
It has been a long time since you could hand solder any FPGA other than possibly the 144 pin TQFP which is a pretty large package.
Check out the Lattice FPGAs. If you are designing your own FPGA board, why do you care if the FPGA is $10 or $50? That cost will be swamped by the cost of making a board.
Flash interfaces don't run at hundreds of MHz.
Motorola (now Freescale... I mean NXP) has always been terrible at making the differences and similarities clear in their processor product lines. I think that partly comes from targeting the really large customers where they get *lots* of support to explain just what parts will suit their needs.
I couldn't find where you asked, but here's the processor I found:
I'm absolutely not saying "use this one" -- it's just the first one I found, and it had a conventional memory bus. I suspect that if you look around you'll find something better.
I looked at the general-purpose memory interface and it doesn't look fast enough for you -- it's calling out 100MHz or 50MHz clock on a 16-bit wide bus, and I'm not sure when you can drive it at 100MHz.
DDR, OTOH, will go up to a 200MHz clock (with 400MHz data rate) -- that's why I was suggesting it, if things are simple enough on the FPGA side.
Control systems, embedded software and circuit design
> > I'm looking for a Cortex-A class processor that has reasonably quick
> > parallel I/O that might be hooked up to an FPGA. I'm aware of the
> > existing Zynq and Altera SoC FPGAs, but looking for something different.
The TI part has a DRAM interface and a separate second memory port with 7 chip selects. That's a lot better than a single memory port: DDR timing is much different from what FPGA I/O ports can manage. With 7 chip selects one can have distinct FPGA read and write pins, with tri-states on the read pins. Again, that helps with timing, in my experience.
According to the literature, the Xilinx tools completely handle the internal interfaces within the Zynq part, and Vivado/ISE gives you your timing pass/fail. I'd go with an SoC FPGA in a minute, given the timing troubles we had with a separate ARM chip.