We're trying to use an MCF5475 for some high-speed data logging, so we ran a quick benchmark using a simple app and an oscilloscope. We're getting performance significantly below what we were expecting from a 266MHz/410 MIPS processor.
As a basic test I wrote an application that is essentially just a simple loop (important bits detailed below):
typedef struct {
    uint32 value;
    unsigned char status;
} test_struct;

test_struct source;
test_struct destination;

while (1) {
    for (temp_loop = 0; temp_loop < 1000; temp_loop++) {
        memcpy(&destination, &source, sizeof(test_struct));
        memcpy(&source, &source, sizeof(test_struct));
    }

    // Set output to match (high)
    MCF_GPIO_PODR_DSPI |= MCF_GPIO_PODR_DSPI_PODR_DSPI2;

    // Set output to match (low)
    MCF_GPIO_PODR_DSPI &= ~MCF_GPIO_PODR_DSPI_PODR_DSPI2;
}
So we loop 1000 times, each time copying about 5 bytes of memory using memcpy (taken from the Freescale sample code). At the end of that loop we set a GPIO pin high and then low again, and repeat the loop. We then use the oscilloscope to measure the time between GPIO toggles.
We're seeing that it takes:
a.) 1.5ms to do the whole process if we don't have any memcpy in the loop (just an empty for loop).
b.) 15ms to do the whole process if we have 1 memcpy in there.
c.) 28ms to do the whole process with 2 memcpys in there.
Using a debugger, it appears that one cycle of the loop with a single memcpy takes about 60 instructions.
The difference between the empty loop and the 1 memcpy loop is about 13.5ms (for 1000 iterations). So that's 13.5us for 60 instructions, which works out to about 4,440,000 instructions per second, or roughly 4 MIPS. Any idea what explains the factor of 100 difference between the value in the specs (410 MIPS) and our result? I realise that every benchmark is different, but a factor of 100?
Thanks