Application loaded from flash runs slower (Coldfire MCF5475)

Have a simple application (without OS) which samples and stores some
register values within a while loop.  It's running on a Coldfire MCF5475EVB.
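For reference, a loop of the kind described might look like the sketch below. The original source was not posted, so the function name, the single-register form and the buffer layout are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sampling loop: read one device register repeatedly and
 * store each value. 'reg' is volatile so the compiler performs a real
 * read on every iteration instead of hoisting the load out of the loop. */
size_t sample_register(volatile const uint32_t *reg, uint32_t *buf, size_t count)
{
    size_t i;
    for (i = 0; i < count; i++) {
        buf[i] = *reg;   /* one register read and one RAM write per sample */
    }
    return i;            /* number of samples stored */
}
```

Each iteration needs only a handful of instruction fetches plus one register read and one RAM write, which is why where the instructions are fetched from can matter so much for the per-sample time.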

If the application is loaded from the Logic Loader via the 'load elf'
command (i.e. into RAM), then each sample takes about 400ns.

If the same application code (built with a different compiler) is loaded
into boot flash memory then the approx time for each sample is closer to

The RAM version of the app is built using cygwin and gcc (from LogicPD).
The Flash version of the app is built using Codewarrior Special
Edition.  Both have optimisations turned on.  The reason for using two
tools is that I've been unable to craft a suitable linker control file
for flash memory and gcc (issue with getting vectors.S into flash),
whereas Codewarrior generated its own (which I needed to edit).
Unfortunately, I cannot build the app for RAM in Codewarrior as it
complains about license conditions (based on some code/RAM usage
limits).  Why I can build for flash without hitting those limits is
another issue.

The start-up code is very similar and I've been unable to detect any
obvious functional differences between them.  Once main() is called, all
application source code is identical.  For both builds the start-up code
copies the vector table to RAM, along with initialised data from ROM,
and then zeroes uninitialised data.
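The start-up steps described above can be sketched roughly as follows. This is a minimal illustration, not the code from either toolchain: the `init_region` structure and `init_memory` function are hypothetical names, and on a real target the addresses and lengths would come from linker-generated symbols.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one region to set up before main(). */
struct init_region {
    void       *ram_dst;   /* run address in RAM */
    const void *rom_src;   /* load address in ROM/flash, or NULL to zero-fill */
    size_t      length;    /* size in bytes */
};

/* Copy (or zero) each region in turn; returns the total bytes written. */
size_t init_memory(const struct init_region *regions, size_t n)
{
    size_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (regions[i].rom_src != NULL)
            memcpy(regions[i].ram_dst, regions[i].rom_src, regions[i].length);
        else
            memset(regions[i].ram_dst, 0, regions[i].length);
        total += regions[i].length;
    }
    return total;
}
```

A table with three entries (vector table, initialised data, zero-filled data) would cover the steps listed. Note that nothing in this sequence moves the program code itself, so in the flash build the instructions are still fetched from flash after main() starts. Pointing the vector base register at the RAM copy of the vectors is a separate step and is not shown here.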

For both builds I have instruction and branch caches enabled (not data,
as I've been unable to exclude the external device registers from the
cached area).

What I've wondered about is whether having the instructions in flash
is going to slow it down.  However, I have the instruction cache
enabled and the loop is very simple (only a few lines of C).

Any suggestions as to why it's slower, and how I might solve it?



Re: Application loaded from flash runs slower (Coldfire MCF5475)

Without looking into the particular details of that processor, that is
to be expected. The FLASH access time is (generally) much larger than
the RAM access time.

(a) Enable any feature your CPU may have to improve performance: cache
/ pipelining / instruction-prefetch / etc.

(b) Identify the time-critical sections of your code and run
those from RAM.  That means those code sections should be linked to
run at one address (in RAM) but stored at a different address (in
FLASH), then copied to RAM at start-up and run from there.
Some toolchains will make this a simple task, others will fight you all
the way...
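As a rough illustration of suggestion (b), the copy step itself is small; the toolchain-specific work is in the linker control file. The sketch below assumes the linker provides the run-address bounds and the load address of the relocated section. The names `relocate_code`, `__ramcode_start`, `__ramcode_end`, `__ramcode_load` and `.ramcode` are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copy a block of code from its load address (flash) to its run address
 * (RAM). On a real target the three pointers would come from linker
 * symbols (hypothetical names: __ramcode_start, __ramcode_end,
 * __ramcode_load), and the time-critical function would be placed in
 * that section, e.g. with GCC's __attribute__((section(".ramcode"))). */
size_t relocate_code(void *run_start, const void *run_end, const void *load_start)
{
    size_t len = (size_t)((const unsigned char *)run_end -
                          (const unsigned char *)run_start);
    memcpy(run_start, load_start, len);
    return len;   /* number of bytes copied */
}
```

The copy has to happen in the start-up code before the relocated function is first called. If the destination RAM could already have been executed from with the instruction cache enabled, the cache would presumably need invalidating after the copy; that step is processor-specific and is not shown.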

For a similar discussion, search the Google archives for "Performance
and Flash Pipelining on TI 28F12 DSPs" in comp.dsp

Roberto Waltman
