Speed acceleration !!!

Hello

I am using a Virtex-II Pro board, and lately I have been trying to run the PowerPC at the highest speed available on my board (300 MHz). I have a function that uses a lot of floating-point operations to calculate log, sine, cosine and the like. When I ran this program on the PowerPC at 100 MHz, it took almost 2 minutes to perform 1000 iterations. We wanted the code to run a little faster, so we implemented the same design at 300 MHz. We did not expect a full threefold speed-up, but the improvement was only a couple of seconds. Can somebody tell me the reason?

As for the memory, I used the OCM interface with 32 kB for instructions and 8 kB for data. The cache was assigned to the PLB RAM (16 kB). There was in fact no speed difference with and without the caches! Can somebody please help?

When I selected 300 MHz for the processor, the maximum bus frequency I could select was only 100 MHz. Does this limit apply only to the PLB, or also to the OCM interface? (I always use the wizard to build a new system, as I am quite new to this stuff.)

Thank you very much Joey

Johnsons. Joe

"Johnsons. Joe" wrote in news post news:d4atft$44q$ snipped-for-privacy@news.uni-kl.de...

It sounds like the algorithm makes heavy use of I/O operations (external RAM accesses etc.). Since the bus speed is the same, the performance is the same. Try to convert your algorithm to work in PPC registers; then it will benefit from the increased clock speed. Maybe it is just a compiler option.
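To put a rough number on this, here is a minimal sketch in C that splits a measured run time into a part that scales with the CPU clock and a part that does not (bus or memory waits). The two-term model and the names `split_t` and `split_runtime` are assumptions made for illustration. The 120 s at 100 MHz comes from the original post; the 118 s at 300 MHz is an assumed figure standing in for "a couple of seconds" faster.

```c
#include <assert.h>

/* Assumed model: T(f) = C / f + t_fixed
   C        work that scales with the CPU clock (seconds * MHz)
   t_fixed  time that does not change with the CPU clock */
typedef struct {
    double cpu_seconds_at_f1; /* clock-scaled part of the run at f1 */
    double fixed_seconds;     /* part that stays constant */
} split_t;

split_t split_runtime(double f1_mhz, double t1_s, double f2_mhz, double t2_s)
{
    split_t s;
    double c;

    assert(f1_mhz != f2_mhz); /* two different clock rates are required */

    /* t1 - t2 = C/f1 - C/f2  =>  C = (t1 - t2) / (1/f1 - 1/f2) */
    c = (t1_s - t2_s) / (1.0 / f1_mhz - 1.0 / f2_mhz);
    s.cpu_seconds_at_f1 = c / f1_mhz;
    s.fixed_seconds = t1_s - s.cpu_seconds_at_f1;
    return s;
}
```

Calling `split_runtime(100.0, 120.0, 300.0, 118.0)` gives about 3 s of clock-scaled time and about 117 s of fixed time. Under these assumed numbers, nearly the whole run is spent in something that runs no faster at 300 MHz.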

Regards, Falk

Falk Brunner

Hi,

If you are executing from OCM, you will not see the speed-up, since OCM accesses happen BEFORE the caches.

If you have a high cache hit rate, place the memory on the PLB instead, since this makes the PPC execute from the caches, which scale with the CPU frequency.

Göran

Göran Bilski

The 405 used does not have floating point in hardware! (Unless the Xilinx 405 is something extra and does...) But the PPC instruction set always supports the floating-point instructions, in this case by taking an exception (or the compiler never emits them and uses library routines instead).

Each log, sine and cosine takes lots of floating point operations...

Would you like to do better than that?

If your input has a limited range (integers), you can

1) precompute lookup tables for each possible input value; the next step could be to move the lookup tables into the FPGA...
2) do your math in fixed point
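Both ideas can be sketched in a few lines of C. This is only an illustration under stated assumptions: the input angle is limited to multiples of 15 degrees, values are stored in Q14 fixed point (1.0 = 16384), and the names `quarter_sine_q14`, `sin_step15_q14` and `scale_q14` are made up for the example. The table entries were computed offline, so nothing here needs floating point at run time.

```c
#include <stdint.h>

#define Q14_ONE 16384 /* 1.0 in Q14 fixed point */

/* sin(0), sin(15), ..., sin(90) degrees, rounded to Q14, precomputed offline */
static const int16_t quarter_sine_q14[7] = {
    0, 4240, 8192, 11585, 14189, 15826, 16384
};

/* Sine of an angle given in steps of 15 degrees (24 steps per full turn).
   Quarter-wave symmetry keeps the table at 7 entries. */
int16_t sin_step15_q14(int step)
{
    int s = step % 24;
    int negative;
    int16_t v;

    if (s < 0)
        s += 24;              /* map negative angles into 0..23 */
    negative = (s >= 12);     /* 180..345 degrees: sine is negative */
    if (negative)
        s -= 12;              /* fold into 0..165 degrees */
    if (s > 6)
        s = 12 - s;           /* mirror around 90 degrees */
    v = quarter_sine_q14[s];
    return negative ? (int16_t)-v : v;
}

/* Multiply an integer value by a Q14 factor using integer math only. */
int32_t scale_q14(int32_t value, int16_t factor_q14)
{
    return (int32_t)(((int64_t)value * factor_q14) / Q14_ONE);
}
```

For example, `scale_q14(1000, sin_step15_q14(2))` computes 1000 * sin(30 degrees) and returns 500 without a single floating-point operation. A finer angle step only needs a larger table, and the table could live in block RAM if it is later moved into the FPGA.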

/RogerL

roger.larsson

Correct,

The 405 core we use does not have a FPU.

There are FPU cores available that can be used with the new V4 APU, or with the older V2 Pro 405 PPC through the bus.

The new FPU in hardware + APU in V4 offers a roughly 80X improvement over the software FPU alone.

Something to seriously consider if you have FPU intensive work to do.

The new APU interface allows for single cycle multiple word transfers to/from the CPU.

Otherwise, you may use the soft FPU, which replaces FPU instructions with calls to library subroutines.

Austin

austin
