Implementing Multi-Processor Systems in FPGAs

I finally got around to watching the "Implementing Multi-Processor Systems in FPGAs" TechOnLine Webcast:

formatting link
formatting link

Based on this seminar and my own imagination, I can envision quite a lot of potential usage models. However, I was wondering to what level folks are _actually_ using multiple soft-core processors in an FPGA for their commercial, academic, and/or personal projects right now.

What is the overall architecture--how many processors are used? How do the processors coordinate their activities? How is data processing distributed across them? Is the code/data stored in on-chip memory or externally?

Thanks. Paul

Reply to
Paul Hartke

I have one of the Stratix II development kits with the EP2S60. Just out of curiosity, I wanted to see how many processors I could fit on a device like that. I was able to add sixteen of the fast processors. I gave each a little on-chip RAM and ROM to bootstrap with, and I implemented an SDRAM controller. That used up approximately 78% of the device. The discouraging part of my little experiment was that SOPC Builder became almost impossibly slow. SOPC Builder is a Java application, and it spent a lot of time looking for conflicts.

When I have the time, my next little experiment is to create a working dual or quad processor. My long-term goal is to create a single-chip, multiprocessor 3D graphics system with an API similar to OpenGL.

Without having researched or tried it yet, I would say that the system controller and memory architecture depend on the implementation and are up to the designer.

Derek

Reply to
DerekSimmons

While the very largest FPGAs look like they could hold perhaps one CPU per BlockRAM (perhaps even up to 100 or more), I think a mid-size part would be a better fit: its volume price is closer to the minimum possible per BlockRAM, and spreading the design over several such parts gives more BlockRAMs per 1K LUTs, more total system I/Os, and more area to throw off heat.

The other question is what type of CPU architecture to use. The answer seems obvious to me: one that supports concurrency right in its architecture, rather than foisting it on top of something with no idea what a process is. So far only the Transputer has shown how easy it is to put together 100s of CPUs and how to program them. Of course, in an FPGA it would have to be a modern register-style load/store RISC.

So what are you doing with your 160TP array, and how would the performance compare with the same app running on a single 2-3 GHz x86? And did you check out the other NG?

regards

johnjakson at usa dot com

Reply to
JJ

For me to answer your question, let me tell you a little bit about myself. In the fall of 1987 I entered college at RIT, where I was exposed to a lot of new computer hardware. Growing up, I had been exposed to computers designed for data processing. I bought a Commodore Amiga to do my school work on, and it turned out to be an excellent choice because it allowed me to work with files from IBM PC and Apple Macintosh environments. Remember that at this time IBMs were still primarily CGA (4 colors: cyan, white, magenta and black) and Macintoshes were black and white. The Commodore Amiga had a quasi-12-bit color mode called HAM. For recreation, one of the first kinds of freeware application I discovered was raytracers. The Commodore Amiga was a 16/32-bit MC68000 at about 14 MHz (IBMs were 16, 20, 25 and the Mac was 8 MHz).

In some of my free time between classes I spent time at the library researching different ways to accelerate raytracing. The first and most obvious way was to buy an accelerator or co-processor card with a faster processor and a floating-point co-processor. I think it was in Byte magazine that I saw an article on Transputers, and I had read articles on transputer products being developed for the Amiga. I saved my money while I waited for the products to be completed, but eventually the projects were canceled. Late one winter, with the money saved, I bought a CSA Education Kit. I could compile and run transputer applications on an IBM bridge card and then copy them to the Amiga file system and view them from the Workbench desktop.

I also made a habit of visiting Rochester's surplus shops, and through dumb luck I found a factory tray of eight T800s. The guy who ran the shop didn't know what they were; seeing that they were gold, he told me he would have to charge me a premium for them: $10. Using a Vector prototyping board, I connected the eight processors to the CSA card. I just wired them up so that they could properly reset. I didn't have money to buy any memory, so I just used the on-chip RAM. I could implement a very small raytracer, and when I outgrew the memory of one processor I would pair them up. Eventually I had a tightly coupled processor made up of 8 transputers arranged in a cube topology.

I think it was about a year later, at a ham radio flea market, that I found my next upgrade. This guy and his son had brought a real truckload of junk. I remember him having bar code scanners, data entry pads, and parts of an old telephone system. One of the things I found was a black PC expansion case. The front was ripped off, on the back I could see rows of 37-pin connectors, and through the vents I could see the tops of gold chips. I asked him how much it was. He told me it was marked, came over, and found the price for me. He charged me $20 for it. The friend I was with asked me what I had bought, and I told him, "I'm not sure, but I'll show you." We took it back to the car, where I removed the top. Inside were 5 CSA 4-transputer boards, a crossbar board, and an INMOS B008 with the graphics TRAM; whoever had owned it had tucked the cable for the graphics TRAM inside.

My transputer setup moved from the Amiga to a dedicated Everex Step 386/33 MHz. The topology for my raytracer evolved into a hypercube, and I was able to let the main rendering routine recurse deeper or add more features. As time went on, the topology evolved into a sophisticated pipeline. A few years after graduating from college I started buying transputers through eBay. My system is now split between an industrial PC, the old black PC expansion case, and a VME cabinet.

The last time I spent any time working with it, I was having problems with the worm program that maps the network. I couldn't determine whether the network had gotten so big that the worm was timing out before it had finished discovering the network, or whether there was a hardware failure. I do follow the other newsgroup (comp.sys.transputer). I haven't compared it to a modern PC; currently I have a PIII 500 MHz laptop and a dual 733 MHz desktop. But it would require a rewrite to take advantage of the PC threading architecture.
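As an illustration of the topology and mapping problem described above (this is not the original worm program, only a minimal sketch under stated assumptions): in a hypercube, each node's neighbors are the node IDs that differ from it in exactly one bit, so an 8-node cube needs three links per node. The sketch below walks such a network breadth-first from a root node. The function names and the `max_hops` limit are hypothetical; the hop limit merely stands in for a timeout, to show how a mapper that gives up too early reports a smaller network than actually exists.

```python
from collections import deque


def hypercube_neighbors(node: int, dimensions: int) -> list[int]:
    """Return the IDs of nodes that differ from `node` in exactly one bit."""
    return [node ^ (1 << d) for d in range(dimensions)]


def map_network(root: int, dimensions: int, max_hops: int) -> dict[int, int]:
    """Breadth-first discovery from `root`.

    Returns a dict of node ID -> hop count. Nodes further away than
    `max_hops` (a hypothetical stand-in for a timeout) are never found.
    """
    hops = {root: 0}
    frontier = deque([root])
    while frontier:
        node = frontier.popleft()
        if hops[node] == max_hops:
            continue  # give up exploring past the hop limit
        for neighbor in hypercube_neighbors(node, dimensions):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                frontier.append(neighbor)
    return hops


if __name__ == "__main__":
    full = map_network(root=0, dimensions=3, max_hops=3)
    partial = map_network(root=0, dimensions=3, max_hops=1)
    print(len(full), "nodes found with a generous limit")
    print(len(partial), "nodes found with a limit that is too tight")
```

Comparing the number of nodes found against the number known to be installed is one way to distinguish "the mapper stopped too early" from "a node or link is dead": raising the limit fixes the first case but not the second.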

I bought the NIOS II Development kit because I liked the development tools and I can see the potential for doing the same kinds of things that I have done with transputers. I bought the kit and a Lancelot video adaptor. I plan on developing a 3D graphics core for it, with an API similar to OpenGL, with the intention of making it into a commercial product. With the Stratix II development board, I see the SDRAM as the biggest bottleneck. I have sketched out an elaborate buffering system that should alleviate this. I would also like to be able to configure the resolution and color depth from software. When I roll it over as a core, the wizard would give the engineer the option of letting it be programmable with default values or of hard-coding the settings.

I have been poking around for the last couple of days and have found a couple of posts about engineers implementing multi-processor systems. I would say half of them sounded like student projects. If anybody has implemented multi-processor systems, I would like to hear about their experiences and any afterthoughts. Since a lot of this is still new to me and I'm still on the steep part of the learning curve, I would appreciate it if anybody has any projects that they can share with me.

Derek

Reply to
DerekSimmons

Weird story. It's been a long time since I went to junk stores. I used to buy plasma display tubes and TTL & CMOS RAMs 20 years ago, but after getting into VLSI (at Inmos on the Transputer) I never actually built anything outside the chip. But FPGAs allow an old VLSI guy without his own fab to do something only a company with a fab could do 5-10 years ago.

I contemplated trying to turn MicroBlaze and perhaps Nios into Transputer replacements by adding on extra HW, but came away thinking it would be better to start over with sail set in the right direction from day 1. The benchmarks posted in the "NiosII Vs MicroBlaze" thread for Leon, MicroBlaze & Opencores 1200 would seem to justify my point, but I am not complete yet.

Good luck with your MPP endeavours too!

regards

johnjakson at usa dot com

The Transputer Will be back (T2 movie)

Reply to
JJ

Speaking of transputers, does enough documentation exist to accurately reproduce them in an FPGA?

Reply to
Ziggy

Yes, sort of. See the comp.sys.transputer NG, thread "FPGA thread status" (Ram's post). At the last WoTUG conf, Tanaka et al. reported on a 24 MHz, near-complete T425 clone, a cycle-similar design with no timer, though, and no FPU of course. It's their first step toward understanding a new direction for building TP-style designs. I decided to skip this step; Occam-capable CPUs don't need to look like the old stack design, and shouldn't for performance reasons.

formatting link

Interesting read anyway. A few months ago on another TP thread, another student said he would do the same thing, but reverse engineering takes a lot of resources, which Tanaka had at his university.

regards

johnjakson at usa dot com

Reply to
JJ

Thanks, I'll have to keep an eye out there. I have an old Buchsbaum book that discusses 'current' technology CPUs (current when I bought the thing), and it also discussed the Transputer T800, but it never did seem to have enough detail to recreate it.

Back then I was going to do it on an 8051 (yes, I know about the speed issues), since FPGAs really didn't exist yet...

Reply to
Ziggy

Seems like you like Transputers as well. For my master's thesis I built transputer boards and liked the hardware. Occam was a bit weird but functional for the purpose.

If you liked the transputer links, you will also like the MicroBlaze FSL connections. They will allow you to build the same kind of systems but with higher bandwidth. The FSLs are 32 bits wide, compared to 4 bits on the transputer link.
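To illustrate the programming model that link-style connections encourage (a software sketch only, not FSL hardware or the MicroBlaze API): each connection behaves like a one-directional FIFO between two processes, where the sender blocks when the FIFO is full and the receiver blocks when it is empty. The example below models that with Python threads and bounded queues. The names `FIFO_DEPTH`, `producer`, `doubler` and `run_pipeline` are hypothetical, and the depth of 16 words is an assumption chosen for the example.

```python
import queue
import threading

FIFO_DEPTH = 16          # assumed FIFO depth, for illustration only
WORD_MASK = 0xFFFFFFFF   # model 32-bit wide words
END = None               # sentinel marking the end of the stream


def producer(link: queue.Queue, words) -> None:
    """Push 32-bit words into the link; put() blocks while the FIFO is full."""
    for word in words:
        link.put(word & WORD_MASK)
    link.put(END)


def doubler(inbound: queue.Queue, outbound: queue.Queue) -> None:
    """Middle pipeline stage: read a word, double it, pass it downstream."""
    while (word := inbound.get()) is not END:  # get() blocks while empty
        outbound.put((word * 2) & WORD_MASK)
    outbound.put(END)


def run_pipeline(words) -> list[int]:
    """Wire producer -> doubler -> caller with two bounded FIFOs."""
    first = queue.Queue(maxsize=FIFO_DEPTH)
    second = queue.Queue(maxsize=FIFO_DEPTH)
    stages = [
        threading.Thread(target=producer, args=(first, words)),
        threading.Thread(target=doubler, args=(first, second)),
    ]
    for stage in stages:
        stage.start()
    results = []
    while (word := second.get()) is not END:
        results.append(word)
    for stage in stages:
        stage.join()
    return results


if __name__ == "__main__":
    print(run_pipeline([1, 2, 3]))
```

Because every stage only talks to its neighbors through blocking FIFOs, stages can be added, removed or rearranged without shared memory or locks, which is the same property that made transputer pipelines easy to rewire.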

Göran Bilski

Reply to
Göran Bilski

I've been involved in a tool for software development for just these sorts of architectures. It turns out that the hardware design of Transputer-style connected multiprocessors is relatively simple, but the software development can be a challenge. The good news is that with all on-chip communication, you can exploit parallelism that board-level multiprocessors can't. Have a look at

formatting link
for more info. (This is aimed at ASIC folks, but FPGAs work, too. In fact, NIOS I, NIOS II and MicroBlaze models are already available.)

-- Stev

-- 3/3/0

Reply to
Steven_Guccione
