There's a big difference between a synthesizable ASIC CPU model and an FPGA-optimized CPU model. Back in 2000 I did a design study on implementing the PPC instruction set architecture. Depending upon specific hard-wired or software-emulated feature sets, and small-or-fast settings, an integer PPC subset requires between 1000 and 2000 LUTs and today would run at most of the speed of current FPGA-optimized soft CPU cores.
Yes, we have a soft version of the PowerPC 405. We used this extensively in the development of the Virtex-II Pro family in order to create and verify IP blocks and development system tools, to port software, and to provide early-access system boards to external third-party developers. We were able to do this through our contract with IBM and a lot of work within Xilinx.
No, there are no plans to release this as we do not have the rights to do so and the size, speed and power would make it unattractive to nearly everyone.
The V-4 PowerPC 405, in comparison, displaces only 672 slices, consumes 0.29 mW/DMIP (0.44 mW/MHz), runs up to 450 MHz, and places and routes in less than a second. Just try to get a soft processor to match that. :)
I think the PPC cores are a nice feature and are well executed. That said,
672 displaced slices are sufficient to hold two (or three austere) 32-bit pipelined RISC soft cores (requiring say 1 BRAM each), each running at ~1/3 of the PPC freq. So, for some applications (e.g. small memory footprint code and data 'controllers' that fit in a BRAM), the hard core is not a big (order of magnitude) win on MIPS/area. Can't 'speak to power' -- the hard processor core is surely much lower power. Properly RPM'd, a compact soft processor core will PAR in negligible time.
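The arithmetic behind the "not an order of magnitude" claim can be sketched as follows (my assumptions: ~1 instruction per cycle for both hard and soft cores, and the 450 MHz and 1/3-clock figures from above):

```python
# Back-of-envelope MIPS/area comparison, assuming ~1 instr/cycle each.
ppc_mhz = 450                 # hard PPC 405 clock from the post above
soft_mhz = ppc_mhz / 3        # each soft core at ~1/3 the hard core's clock
aggregate_soft = 2 * soft_mhz # two soft cores fit in the same 672 slices

ratio = ppc_mhz / aggregate_soft
print(ratio)  # 1.5 -- a 1.5x win, not a 10x win, for the same area
```

So on throughput per area the hard core wins, but only by ~1.5x under these assumptions, not 10x.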
Certainly the PPC core(s) are vastly more attractive targets for COTS software tools and OSs and infrastructure (docs, developer expertise, ...).
"... this counterintuitive rule of thumb: one streamlined 32-bit soft CPU core optimized for programmable logic might need only half the silicon area of an elaborate 32-bit hard CPU core!"
Using SRL16? In one LUT you can have the output set to 1 every 16 clocks, i.e. 2^4. Using that as the clock enable to another, you can get 2^(4*n) with n the number of LUTs, so that would make 2^(4*6) = 2^24 ... damn, not enough.
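The cascade above can be modeled in a few lines (a behavioral sketch, not RTL; `stage_len` stands in for the SRL16's 16-deep shift register, and each stage's pulse gates the clock enable of the next):

```python
# Behavioral model of cascaded SRL16-style dividers: each stage pulses
# once per `stage_len` enables, so n stages divide by stage_len**n.
def cascade_period(n_stages, stage_len=16):
    return stage_len ** n_stages

print(cascade_period(6))  # 16777216, i.e. 2**24
```

Six LUTs give a 2^24 period, confirming the "damn, not enough" above if the goal was a 2^32 counter.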
I forgot about the distributed RAM and SRL16 when I made my claim ;-) but at least I realized that myself... I sent a second post, but it did not appear in my Outlook Express, which I use as a news-reader. However, if you look at the post of Sylvain, at the very end, you can see that my post has arrived at least somewhere. Very strange... Do others see the same behavior of lost posts?
I've wondered about how truly RISC PPC really is. Ironic that although it comes out of the work of John Cocke and the 801, the current PPC seems to be RISC in actual performance but much harder to describe than other RISCs, esp. DLX, MIPS or ARM.
RISC ISAs are always characterized by the target technology on which they are first implemented, hence poor FPGA efficiency unless that's where you/we start.
If IBM were starting today on a fresh ISA with the memory wall in mind (100s of dead cycles per cache miss), I would think/hope they would come up with something entirely different.
I would suggest that RISC-ness could well be defined by how easy it is to build an ISA simulator and how close that runs to the hosting platform. The closer it runs to host speed, the less peripheral work the ISA must do per opcode. Clearly PPC, ARM, SPARC all do a lot more than simple datapath operations, but all are defined with specific HW features in mind, so their soft cores are all big to start.
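The "easy to simulate" test can be made concrete. Here is a minimal interpreter for a hypothetical 3-operand register machine (the opcodes and encoding are my invention, purely illustrative): each guest opcode maps to roughly one host operation plus dispatch overhead, which is the property being argued for.

```python
# Minimal ISA interpreter sketch for a hypothetical 32-bit register machine.
# Each guest instruction is a tuple (op, dest, src_a, src_b).
def run(program, regs):
    pc = 0
    while pc < len(program):
        op, d, a, b = program[pc]
        if op == "add":
            regs[d] = (regs[a] + regs[b]) & 0xFFFFFFFF  # one host add
        elif op == "sub":
            regs[d] = (regs[a] - regs[b]) & 0xFFFFFFFF  # one host sub
        elif op == "halt":
            break
        pc += 1
    return regs

run([("add", 0, 1, 2), ("halt", 0, 0, 0)], [0, 2, 3])  # regs become [5, 2, 3]
```

An ISA with condition-code side effects, link registers, or complex addressing modes would need several host operations per opcode in the same loop, which is the slowdown being described.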
The PearPC emulator that runs on x86 to allow running MacOSX supposedly runs about 500 x86 ops per PPC op (from the PearPC site). The 68K emulators seem to run far closer to x86 performance, perhaps because the 68K is far "simpler" to understand. Generally the goal of emulation is to reach about a 10x slowdown. PearPC only achieves closer to 50x slower, IIRC, through use of a JIT, while the 68K JITs are still far better.
It amuses me to think that an emulated 68K running MacOS7 must run orders of magnitude faster (acceptably so on Basilisk) than an emulated PPC running the much heavier OSX. Just where is the world going!
The new Transputer also runs its ISA simulator closer to host speed (60x slower in plain C). Perhaps I should have started with the approach of building the fastest possible ISA encoding with an x86 native asm simulator in mind, but it wouldn't much affect the final HW architecture, just the encodings. Perhaps really fast emulation of a new ISA should be at the top of the todo list of architects, to help propagate new cpus, and certainly to get something running ASAP.
So true. For example, for PPC, the early implementations (1 um 3LM) were multiple dice -- the ICU (instruction cache and branch processing unit) was separate from the FXU (fixed point execution unit). That is why the PPC calling convention is peculiar -- the call (bl (branch-and-link)) instruction saves the return address in a link register LR (resident in the ICU) instead of a GPR (resident in the FXU), and which cannot be directly load/store'd, and so you have to first move it to a GPR to store it to the frame -- and the reverse nonsense in the function epilog.
[Reference: Brian Case, IBM RS/6000's Complex Implementation Extracts Peak Performance, in Understanding RISC Microprocessors, 1993 (MPR rollup)]
In our memory wall world, the on-chip ISA doesn't matter as much, performance-wise. You can sometimes model the performance of big applications by considering only the memory transactions that appear at the pins, and the particulars of the attached memory subsystem. What happens on chip is (mostly) irrelevant. Certainly during a garbage collection of a huge heap, or the like, all you're doing for tens of seconds is waiting on millions of non-resident cache line fills. And I have seen application code, over a large data structure, that spends several seconds (many billions of cycles) on one data load instruction that consistently misses the cache.
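A back-of-envelope model makes the point (the clock rate, miss latency, and miss count here are my illustrative assumptions, not measurements):

```python
# Memory-wall arithmetic: in a miss-dominated phase, elapsed time is
# roughly misses * miss_latency, independent of on-chip ISA details.
CLOCK_HZ = 1e9           # assume a 1 GHz core
MISS_CYCLES = 300        # assume ~300 dead cycles per cache miss
misses = 10_000_000      # e.g. a GC sweep touching 10M non-resident lines

seconds = misses * MISS_CYCLES / CLOCK_HZ
print(seconds)  # 3.0 -- seconds spent waiting on fills, whatever the ISA
```

Whether each miss is surrounded by 5 or 50 on-chip instructions barely moves this number, which is why pin-level memory transactions can predict whole-application performance.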
There are well-known latency tolerance techniques...
For FPGA RISCs, I like to count the multiplexers in the datapath, because FPGA mux implementations are painfully expensive.
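A rough cost model shows why the mux count matters (my assumption: 4-input LUTs with no dedicated MUXF resources, so an N:1 mux is a tree of N-1 two-to-one muxes, one LUT per mux per bit; dedicated wide-mux resources roughly halve this):

```python
# Rough LUT cost of a width-bit N:1 multiplexer on a 4-LUT fabric,
# modeled as a tree of (inputs - 1) two-to-one muxes per bit.
def mux_luts(width, inputs):
    return width * (inputs - 1)

print(mux_luts(32, 4))  # 96 -- one 32-bit 4:1 operand mux costs ~96 LUTs
```

At ~96 LUTs for a single 32-bit 4:1 operand mux, a few extra forwarding or result muxes can rival the cost of the ALU itself, which is why FPGA-optimized datapaths minimize them.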
I think that is more attributable to a difference in the maturity or sophistication of the emulator. It should be possible to translate PPC to x86 with far less than 500:1 growth.
If this meme subtly or overtly causes new ISAs to carry forward legacy ISA mistakes, let us hope it does not catch on. :-)