Application loaded from flash runs slower (Coldfire MCF5475)

I have a simple application (no OS) which samples and stores some register values inside a while loop. It runs on a Coldfire MCF5475EVB.

If the application is loaded from the Logic Loader via the 'load elf' command (i.e. into RAM), then each sample takes about 400ns.

If the same application code (built with a different compiler) is loaded into boot flash memory, then each sample takes closer to 1000ns.

The RAM version of the app is built using cygwin and gcc (from LogicPD). The Flash version of the app is built using Codewarrior Special Edition. Both have optimisations turned on. The reason for using 2 tools is that I've been unable to craft a suitable linker control file for flash memory and gcc (an issue with getting vectors.S into flash), whereas Codewarrior generated its own (which I needed to edit). Unfortunately, I cannot build the app for RAM in Codewarrior, as it complains about license conditions (based on some code/RAM usage limits). Why I can build for flash without hitting those limits is another issue.

The start-up code is very similar and I've been unable to detect any obvious functional differences between them. Once main() is called, all application source code is identical. For both builds the start-up code copies the vector table to RAM, along with initialised data from ROM, and then zeroes uninitialised data.
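For reference, the start-up sequence described above can be sketched roughly as follows. This is an illustration only, not the code from either toolchain: the `image_layout` struct and the function names are made up for the example, and on the board the pointers would come from linker-defined symbols rather than a struct. Pointing the vector base register at the RAM copy is not shown.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of where each region is stored (load address,
 * in flash) and where it must live at run time (in RAM). On real hardware
 * these values would come from linker-script symbols. */
struct image_layout {
    const uint32_t *vectors_load;   /* vector table image in flash     */
    uint32_t       *vectors_run;    /* vector table location in RAM    */
    size_t          vectors_words;  /* size in 32-bit words            */
    const uint32_t *data_load;      /* initialised data image in flash */
    uint32_t       *data_run;       /* initialised data in RAM         */
    size_t          data_words;
    uint32_t       *bss_run;        /* uninitialised data in RAM       */
    size_t          bss_words;
};

/* Copy a block of 32-bit words from its load address to its run address. */
void copy_words(uint32_t *dst, const uint32_t *src, size_t words)
{
    while (words-- > 0) {
        *dst++ = *src++;
    }
}

/* Clear a block of 32-bit words. */
void zero_words(uint32_t *dst, size_t words)
{
    while (words-- > 0) {
        *dst++ = 0;
    }
}

/* The three steps performed before main() is called. */
void startup_init(const struct image_layout *img)
{
    copy_words(img->vectors_run, img->vectors_load, img->vectors_words);
    copy_words(img->data_run, img->data_load, img->data_words);
    zero_words(img->bss_run, img->bss_words);
}
```

Note that none of these steps moves the program code itself: in the flash build, instructions are still fetched from flash after main() is called.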

For both builds I have instruction and branch caches enabled (not data, as I've been unable to exclude the external device registers from the cached area).

The thing I've wondered about is whether fetching the instructions from flash is what slows it down. However, I have the instruction cache enabled and the loop is very simple (only a few lines of C).

Any suggestions as to why it's slower, and how I might solve it?

Thanks

David

David Hearn

Without looking into the particular details of that processor, that is to be expected. The FLASH access time is (generally) much longer than the RAM access time.

(a) Enable any feature your CPU may have to improve performance: cache / pipelining / instruction-prefetch / etc.

(b) Identify the critical sections (timewise) in your code and run those from RAM. That means those code sections should be linked to be loaded at one address (in FLASH) and then copied to a different address (in RAM) and run from there. Some toolchains will make this a simple task, others will fight you all the way...
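As a rough sketch of (b) with gcc: the `section` attribute is a real gcc feature, but the section name `.ramcode`, the three linker symbols, and the `TARGET_BUILD` switch are all assumptions made up for this example. The linker control file would have to give that section a load address in FLASH and a run address in RAM, and define the symbols. Codewarrior has its own way of placing functions in a named section, which is not shown here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* TARGET_BUILD is a made-up switch for this example: define it only when
 * building for the board with a linker control file that provides the
 * symbols below. Without it, the code builds as ordinary host C. */
#ifdef TARGET_BUILD
#define RAMFUNC __attribute__((section(".ramcode"), noinline))
extern char __ramcode_load[];   /* where the section is stored in FLASH (assumed name) */
extern char __ramcode_start[];  /* where the section runs in RAM (assumed name)        */
extern char __ramcode_end[];    /* end of the section in RAM (assumed name)            */
#else
#define RAMFUNC
#endif

/* Call once during start-up, before any RAMFUNC function is used.
 * On a host build there is nothing to copy. */
void relocate_ramcode(void)
{
#ifdef TARGET_BUILD
    memcpy(__ramcode_start, __ramcode_load,
           (size_t)(__ramcode_end - __ramcode_start));
#endif
}

/* The time-critical loop: read one register 'count' times into a buffer.
 * On the target it is linked to run from RAM; the rest stays in FLASH. */
RAMFUNC size_t sample_registers(const volatile uint32_t *reg,
                                uint32_t *buf, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++) {
        buf[i] = *reg;
    }
    return count;
}
```

If the instruction cache is already enabled when the copy is made, it may also be necessary to invalidate it before calling the relocated function, so that no stale contents are executed.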

For a similar discussion, search the Google archives for "Performance and Flash Pipelining on TI 28F12 DSPs" in comp.dsp

Roberto Waltman

