I have a simple application (no OS) that samples and stores some register values in a while loop. It runs on a ColdFire MCF5475EVB.
If the application is loaded from the Logic Loader via the 'load elf' command (i.e. into RAM), each sample takes about 400ns.
If the same application code (built with a different compiler) is loaded into boot flash memory, each sample takes closer to 1000ns.
The RAM version of the app is built using cygwin and gcc (from LogicPD). The flash version of the app is built using Codewarrior Special Edition. Both have optimisations turned on. The reason for using two tools is that I've been unable to craft a suitable linker control file for flash memory with gcc (there is an issue getting vectors.S into flash), whereas Codewarrior generated its own (which I needed to edit). Unfortunately, I cannot build the app for RAM in Codewarrior, as it complains about license conditions (based on some code/RAM usage limits). Why I can build for flash without hitting those limits is another issue.
The start-up code is very similar and I've been unable to detect any obvious functional differences between them. Once main() is called, all application source code is identical. For both builds the start-up code copies the vector table to RAM, along with initialised data from ROM, and then zeroes uninitialised data.
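To make the start-up sequence described above concrete, here is a minimal sketch of that kind of initialisation in C. It is not the actual start-up code from either build; the struct `boot_layout` and the function names are hypothetical, and on real hardware the addresses would normally come from linker-defined symbols rather than a struct.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical description of the regions the start-up code touches.
 * "load" is where a region is stored (e.g. flash), "run" is where it
 * is used (RAM). For a RAM-loaded image the two may be identical. */
struct boot_layout {
    const void *vectors_load;  void *vectors_run;  size_t vectors_len;
    const void *data_load;     void *data_run;     size_t data_len;
    void *bss_start;           size_t bss_len;
};

/* Copy a region only if it actually needs relocating. */
static void copy_if_relocated(void *run, const void *load, size_t len)
{
    if (run != load && len != 0)
        memcpy(run, load, len);
}

/* Copy the vector table and initialised data to RAM, then zero the
 * uninitialised data, in that order. */
void boot_init_memory(const struct boot_layout *l)
{
    copy_if_relocated(l->vectors_run, l->vectors_load, l->vectors_len);
    copy_if_relocated(l->data_run, l->data_load, l->data_len);
    if (l->bss_len != 0)
        memset(l->bss_start, 0, l->bss_len);
}
```

Comparing where each build places the "run" addresses (for example in the two map files) may be a quick way to spot a difference in where data and stack end up.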
For both builds I have the instruction and branch caches enabled (but not the data cache, as I've been unable to exclude the external device registers from the cached region).
One thing I've wondered about is whether executing the instructions from flash slows it down. However, the instruction cache is enabled and the loop is very simple (only a few lines of C), so I would expect it to run from the cache.
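For readers who want something concrete, a minimal sketch of the kind of loop described is below. It assumes a single memory-mapped register and a buffer in RAM; `sample_registers`, `reg` and `buf` are hypothetical names, and this is not the actual application code.

```c
#include <stddef.h>
#include <stdint.h>

/* Read one register `count` times and store each value in `buf`.
 * `volatile` forces a real bus read on every iteration, so the
 * compiler cannot hoist the read out of the loop. */
size_t sample_registers(volatile const uint32_t *reg,
                        uint32_t *buf, size_t count)
{
    for (size_t i = 0; i < count; i++)
        buf[i] = *reg;
    return count;
}
```

With the data cache disabled, both the register read and the store to `buf` are uncached accesses in a loop like this, so the per-sample time depends on where the buffer and stack live as well as on where the instructions are fetched from.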
Any suggestions as to why it's slower, and how I might solve it?
Thanks
David