Embedded MicroBlaze solution

Hi, ALL!

Recently one of my friends ran into a very strange problem. He had a MicroBlaze CPU in his design running at a 50 MHz clock. He also had an external SDRAM module, and his application was executing out of that external SDRAM. During his first few benchmark tests he realized that it takes about 24 clock cycles to access memory :( This means that the cool embedded 50 MHz MicroBlaze CPU runs slower than a humble external 8 MHz AVR. On my advice he enabled the cache within MicroBlaze, but the application execution speed did not increase significantly.

As he described later, this was one of the hands-on samples from EDK. Maybe the sample is very simplified and not optimized for performance, but the net performance of a 2 MHz processor (50 MHz divided by ~24 cycles per access) is not even close to what Xilinx advertises :(

Could anyone comment on that?

Regards, Vladimir S. Mirgorodsky

Reply to v_mirgorodsky

Vladimir,

Interesting data point. How much did his performance increase after enabling caches?

First, check to make sure that you have compiler optimization enabled. This does make a huge difference to the performance of your software code (2-3x in some instances). I would suggest using the latest EDK 7.1 GNU compiler here.

Second, in EDK 7.1 a new MCH_OPB_SDRAM memory controller was released that connects to the Xilinx CacheLink interface of MicroBlaze v4.0. This also greatly improves performance when using caches.

Finally, you may want to use tools like xil_profile to see where the processor is spending a lot of its time. You may be able to improve the performance by enabling hardware features such as multiplier, divider or barrel shifter.
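For what it's worth, on the command line those two suggestions might look something like the line below (in XPS the same options go into the software project's compiler settings). Take it as a sketch: the -mno-xl-soft-mul, -mno-xl-soft-div and -mxl-barrel-shift flags only pay off if the corresponding hardware units are actually enabled on the MicroBlaze instance, and the file names are just placeholders.

mb-gcc -O2 -mno-xl-soft-mul -mno-xl-soft-div -mxl-barrel-shift main.c -o app.elf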

Cheers, Shalin

--
------------------------------
Shalin Sheth
Embedded Applications Engineer
General Products Division
Spartan-3 Generation FPGAs
http://www.xilinx.com/spartan3
http://www.xilinx.com/spartan3e
------------------------------
Reply to Shalin Sheth

There are very few applications that do not benefit significantly from a cache. Even if the data side does not benefit dramatically from cache (like working mostly on large amounts of sequential data), the instruction side generally does. In many benchmarks running from external memory with cache achieves almost the same performance as running from LMB BRAMs.

The first thing I would check is to make sure the cache is configured correctly. Make sure the address space of the cache includes the external memory you are caching (C_ICACHE_BASEADDR, C_ICACHE_HIGHADDR, C_DCACHE_BASEADDR, C_DCACHE_HIGHADDR). Make sure caches are turned on (C_USE_ICACHE, C_USE_DCACHE).
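In the MHS file those settings live on the MicroBlaze instance. A minimal sketch, assuming purely for illustration a 32 MB external memory mapped at 0x22000000 - the base and high addresses must match whatever range your SDRAM controller actually occupies:

BEGIN microblaze
 PARAMETER INSTANCE = microblaze_0
 PARAMETER HW_VER = 4.00.a
 PARAMETER C_USE_ICACHE = 1
 PARAMETER C_ICACHE_BASEADDR = 0x22000000
 PARAMETER C_ICACHE_HIGHADDR = 0x23ffffff
 PARAMETER C_USE_DCACHE = 1
 PARAMETER C_DCACHE_BASEADDR = 0x22000000
 PARAMETER C_DCACHE_HIGHADDR = 0x23ffffff
 ...
END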

The most common missing item is not turning caches on in software.

microblaze_enable_icache(); microblaze_enable_dcache();

If you are using xmd and downloading multiple programs that turn on the cache, you should wipe the cache before enabling it; otherwise you could hit stale cache data left over from the previous program.

microblaze_init_icache_range(0, ICACHE_SIZE); microblaze_init_dcache_range(0, DCACHE_SIZE);
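Putting the two together, the order matters: invalidate first, then enable. A minimal sketch - ICACHE_SIZE and DCACHE_SIZE are placeholders here for the byte sizes the caches were actually built with:

#include "mb_interface.h"   /* microblaze_* cache routines (standalone BSP) */

#define ICACHE_SIZE 0x2000  /* placeholder: set to the configured I-cache size */
#define DCACHE_SIZE 0x2000  /* placeholder: set to the configured D-cache size */

int main(void)
{
    /* Invalidate first so stale lines from a previously downloaded program
       cannot be hit, then turn the caches on.  Starting the invalidate range
       at 0 and covering one full cache size touches every cache line. */
    microblaze_init_icache_range(0, ICACHE_SIZE);
    microblaze_init_dcache_range(0, DCACHE_SIZE);
    microblaze_enable_icache();
    microblaze_enable_dcache();

    /* ... rest of the application ... */
    return 0;
}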

As Shalin referred to, in MicroBlaze v3.00a and higher the Xilinx CacheLink (XCL) interface adds an optional dedicated cache interface and uses 4-word cache lines to improve performance. You may want to look into using XCL and mch_opb_sdram. The mch_opb_sdram is new in EDK 7.1.

This e-mail is my own opinion.

Reply to Benjamin J. Stassart

Is this some sort of FAQ reply for people who want more speed from a MicroBlaze? I've never used a MicroBlaze or Xilinx (I use Nios II on Altera chips), but it looks like you almost completely missed the OP's point - he is not (yet!) interested in the quality of code generated by the compiler, but is suffering from 24-cycle memory reads on the SDRAM. This is most likely a problem with the SDRAM controller or its setup. Perhaps he is getting a full bank + row + column select for every access, although even then 24 cycles is way too long.

I don't know what sort of tools Xilinx has, or how they compare to Altera's SOPC Builder, but when I had trouble with my SDRAM (it took 2 cycles per access instead of 1 during bursts), I tested with a simple setup: a Nios II running from internal FPGA memory, a DMA component (to easily generate burst sequences), and the SDRAM controller. Using the debugger, I manually set the DMA to burst read or write and used SignalTap (ChipScope on Xilinx?) to view what was happening. That way you simplify things as much as possible and can concentrate on the specific problem.

Reply to David

Hi,

The SDRAM controller doesn't need 24 clock cycles for a single access; it's more like 12 clock cycles. But it seems that both the instruction and data interfaces on MicroBlaze are connected to the same memory controller and that no internal memory is used. So for a load instruction to execute, it will require two 12-clock-cycle accesses: one to fetch the instruction and one to read the data. A store completes a few cycles faster, and an instruction that doesn't access memory should take about 12 clock cycles.

Using LMB will reduce instruction fetches to 1 clock cycle and data accesses to 2 clock cycles, which is the same latency as for cache hits.
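Roughly, using those numbers, a load executed straight out of SDRAM costs about 12 cycles for the instruction fetch plus 12 cycles for the data access, i.e. around 24 cycles, which matches the original observation; the same load running from LMB (or hitting in the caches) costs about 1 + 2 = 3 cycles, an 8x difference.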

It seems unusual that using the caches doesn't improve performance. To my knowledge it is always a big improvement compared to running from external memories, especially SDRAM or DDR.

Fast SRAM will have much less latency.

In order to get cache-line burst accesses, the MCH_OPB_SDRAM controller should be used. It will do burst line reads and writes for both instruction and data cache misses.

Göran Bilski


Reply to Göran Bilski

Probably a whole lot easier just to run an RTL sim, and see the bus activity.

Cheers, Jon

Reply to Jon Beniston
