Code size reduction migrating from PIC18 to Cortex-M0

As I say, we don't know what toolchain package the poster here was using, but there certainly are gcc-based toolchain packages available that handle this fine. We use Code Sourcery for a couple of different processors - they package gcc along with libraries, debugger support, and Eclipse to give similar ease-of-use. Although Code Sourcery is the package I am most familiar with, I know that others such as Code Red are similar. I don't know what the poster uses that makes it apparently so hard to get right.

Personally, I prefer to use makefiles and explicit compiler flags (or pragmas / function attributes as needed). I think that gives better control, more replicable results, and is more suitable for re-use on different projects, different development hosts, and different tool versions. But that's a matter of taste - and I don't recommend it as the first step for someone unfamiliar with the tools.
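
For a Cortex-M0 project with a gcc toolchain, a bare-bones makefile might look something like this (the file names, linker script and exact flags are only placeholders, not a recommendation for any particular chip):

    # Minimal sketch - adjust the CPU, files and linker script to suit.
    CC      = arm-none-eabi-gcc
    CFLAGS  = -mcpu=cortex-m0 -mthumb -Os -g -Wall -Wextra \
              -ffunction-sections -fdata-sections
    LDFLAGS = -Wl,--gc-sections

    OBJS = main.o uart.o

    firmware.elf: $(OBJS)
    	$(CC) $(CFLAGS) $(LDFLAGS) -T stm32f0.ld -o $@ $(OBJS)

    %.o: %.c
    	$(CC) $(CFLAGS) -c -o $@ $<

The -ffunction-sections/-fdata-sections flags together with --gc-sections let the linker discard unreferenced code and data, which makes a noticeable difference in the sort of code-size comparisons being discussed here.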

gcc has very extensive static error checking and warning mechanisms, and they've been getting better with each version. It doesn't have MISRA rule checking, which I believe EW has, but otherwise it is top-class. Of course, you have to enable the warnings!
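
To make that last point concrete, here are two classic slips that gcc only flags if the warnings are switched on (the function itself is purely illustrative):

    #include <stdio.h>

    void report(unsigned int flags, long count)
    {
        if (flags = 0x80u)                 /* '=' instead of '==' -      */
            puts("overflow");              /*   caught by -Wparentheses  */

        printf("count = %d\n", count);     /* %d with a long argument -  */
    }                                      /*   caught by -Wformat       */

Both -Wparentheses and -Wformat are part of -Wall, so a plain "-Wall -Wextra" on the command line catches a lot of this sort of thing for free.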

Here is the key point that makes EW worth the money for /you/ - you prefer it. When choosing tools and judging value for money, questions of code generation quality are normally secondary to what the developer finds most productive - developer time outweighs tool costs.

Indeed - people trying to "hand-optimise" their code often miss out details like that. ("I'll use a 16-bit variable instead of a 32-bit variable to save memory space...".)
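
A small, purely illustrative example of how that can backfire on a 32-bit ARM: 16-bit arithmetic has to be truncated back to 16 bits, which costs extra instructions (UXTH, or a pair of shifts on the M0), whereas a 32-bit variable needs nothing extra:

    #include <stdbool.h>
    #include <stdint.h>

    /* 16-bit tick counters "to save RAM" - but the subtraction must now
       wrap modulo 65536, so the compiler inserts an extra truncation. */
    bool timeout_expired(uint16_t now, uint16_t deadline)
    {
        uint16_t remaining = (uint16_t)(deadline - now);
        return remaining > 0x8000u;
    }

With uint32_t counters the same test is just a subtract and a compare; the "smaller" types save two bytes of RAM per variable and pay for it in flash and cycles. On a PIC18 the trade-off goes the other way, which is exactly why these comparisons depend so much on the code.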

If you can use section anchors (like above), or a "small data section" (as used by the PPC ABI, though not the ARM ABI, for some reason), then you can avoid most of the individual loading and storing of addresses.
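
For example, when several file-scope variables are used together, gcc's -fsection-anchors option (where the target supports it, which I believe includes ARM) lets the compiler load one anchor address and reach all of them with small offsets, rather than fetching a separate pooled address for each. A rough sketch of the kind of code that benefits (names are purely illustrative):

    #include <stdint.h>

    /* File-scope variables that tend to be accessed together. */
    static int32_t speed;
    static int32_t limit;
    static int32_t error_count;

    void update_speed(int32_t s)
    {
        speed = s;                /* with section anchors, one base  */
        if (speed > limit)        /* address covers all three        */
            error_count++;        /* variables via small offsets     */
    }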

Learning to develop Linux programs is a lot more than just learning gcc, as you've found out.

One gets used to one's tools. I've been using gcc for embedded development for some 15 years, and have used it on perhaps 8 different processor architectures. So for me, gcc is always the obvious choice for new devices, since I am most familiar with it.

I actually think there is a fair similarity between modern CodeWarrior and gcc - CW supports many gcc extensions such as the inline assembly syntax and several attributes. On the IDE side, of course, gcc has no IDE - it's a compiler. But gcc is often used with Eclipse, which is what CW now uses for most targets. (The "classic" CW IDE was horrible - if EW has a similar feel, then I'll remember not to buy it!)
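
The sort of gcc extensions meant here are, for example, the attribute syntax and gcc-style inline assembly (illustrative snippets only, not from any particular project):

    /* Place a buffer in a specific linker section (the section name is
       just an example) - gcc attribute syntax. */
    static unsigned char log_buf[256] __attribute__((section(".noinit")));

    /* gcc-style inline assembly. */
    static inline void nop_delay(void)
    {
        __asm__ volatile ("nop");
    }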

There are /lots/ of debugging options for Linux development that can be much more powerful than C-Spy (depending on the type of programming you are doing, of course). However, they all involve a lot more learning and experimenting than the ease-of-use of an integrated debugger in an IDE.

Indeed.

Regards,

David

Reply to
David Brown

This discussion is kind of silly.

Has anyone tried a "simple" program that multiplies two or four floating point numbers and displays the result on an LCD?

As has been mentioned, you don't buy a V8 just to watch the cylinders go up and down.

You buy a V8 to use it.

How much code space would the PIC18 take to multiply two or four floats?

hamilton

Reply to
hamilton

I agree. I was glad I had other programmers and sample files to help me through the initial setup of GCC-ARM.

EW does have MISRA rule checking - but I haven't started to use it yet.

I found that to be very true when I switched from a very low-end PCB layout program to PADS PCB. The $4K cost of that system has been paid back many times over in time saved in the design and layout of PC boards for customers. Heck, I've even learned to trust the autorouter when it is properly set up. ("Trust, but verify" still applies, though). The autorouter really helps with those fine-pitch QFP STM32 chips! I was able to pull an MSP430 from an existing design and plug in an STM32F205 in just a few days.

It took me a while when I first started looking at disassembled ARM code to realize that constants were being loaded using PC-relative offsets. When I finally figured that out it took me back to the mid-80's when I was writing Macintosh code using position-independent modules. What a rush of nostalgia that was!
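
For anyone else hitting the same penny-drop moment: a constant that doesn't fit in the instruction is placed in a literal pool after the function and fetched PC-relative, so the disassembly shows something like "ldr r0, [pc, #N]" rather than the constant itself. A trivial, made-up example:

    #include <stdint.h>

    #define UART0_DR  ((volatile uint32_t *)0x4000C000u)  /* hypothetical address */

    void uart_putc(char c)
    {
        *UART0_DR = (uint32_t)c;   /* 0x4000C000 is fetched from the literal pool */
    }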

I think I mis-stated part of that. It is really the editor and project file window that are similar between EW and CodeWarrior. The other parts of EW are much better, with fairly straightforward menus and dialog boxes for setting compiler, linker, and debugger options. I was using CodeWarrior to develop code for the Persistor micro data loggers. That was a pretty tightly constrained development environment, and Persistor provided most of the setup files and initial project files. If you have to start from scratch, I agree that the old CodeWarrior was a true PITA to get set up. The Persistor logger didn't have a true debug capability, so I can't comment on the capabilities of CodeWarrior in that regard. I really like the integrated C-Spy debugger in EW, though.

One of the problems with debugging the Linux-based system was that the code was controlling an autonomous parafoil supply delivery system. When the system was operating, it started out 20,000 feet above and several miles away from the programmer! Hence the "record everything and analyze later" paradigm. I did develop a simulated version that grabbed the GPS and other sensor values via hooks and substituted simulated values based on a simple flight model. That sim ran on the target hardware while it sat on the bench. I got really tired of listening to the whining control servos! Unfortunately, the flight model wasn't up to simulating GPS loss, sticky servos, and all the things that can happen to parafoils stressed beyond their flight limits.
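
The hook mechanism was essentially the obvious one - a sketch of the idea with made-up names (this is not the actual flight code):

    typedef struct { double lat, lon, alt; } gps_fix_t;

    gps_fix_t gps_hw_read(void);        /* real receiver on the hardware    */
    gps_fix_t flight_model_gps(void);   /* simulated fix from flight model  */

    gps_fix_t gps_read(void)
    {
    #ifdef BENCH_SIM
        return flight_model_gps();      /* bench build: use the simple model */
    #else
        return gps_hw_read();           /* flight build: use the real GPS    */
    #endif
    }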

Flight algorithms were tested in sims running on Borland CPP Builder on a PC. That allowed good debugging, lots of intermediate variable recording and graphic displays. The CPP Builder sims were written to use the same C control code that ran on the target hardware.

Mark Borgerson

Reply to
Mark Borgerson

It is certainly true that you can only get a real-world test using real-world code. The trouble is, real-world code is not very suitable for discussing in a newsgroup. So the best we can do is look at some simple sample functions, and work from there. And when a poster is having such trouble generating good code from a simple "a = b + 5" function, it makes sense to start with that and work up.
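
For reference, the kind of toy function in question, assuming a and b are globals (a representative reconstruction, not the poster's exact code):

    #include <stdint.h>

    uint8_t a, b;

    void f(void)
    {
        a = b + 5;
    }

With arm-none-eabi-gcc -Os -mcpu=cortex-m0 this typically compiles to a handful of Thumb instructions - fetch the address of b from the literal pool, load b, add 5, fetch the address of a, store, return - plus the pooled address(es), so somewhere around 16-20 bytes rather than the much larger output the original poster was apparently getting.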

You bring up another point here, of course - while the PIC18 may have compact code for setting a bit or adding a couple of 8-bit numbers, the ARM code will be more compact (and /much/ faster) for multiplying floats. Code comparisons on wildly different architectures are heavily dependent on the sort of code used.
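
As a concrete illustration of that last point (sizes are rough, from memory rather than measured):

    float scale(float a, float b)
    {
        return a * b;
    }

On a PIC18 this becomes a call into the compiler's software floating-point library, and the multiply routine it drags in can easily run to a few hundred bytes on its own. On a Cortex-M0 it is a call to the AEABI soft-float routine (__aeabi_fmul), which is much faster; on a Cortex-M4F with hardware floating point the whole function collapses to a single vmul.f32 plus a return.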

Reply to
David Brown

Hi David, have you looked at how fast an ARM can do, say, a 64-bit (or 32-bit) MAC in a filter loop? Not so long ago I had to reach the limit on a Power CPU (MPC5200B), which is specified at 2 cycles per MAC. It was not trivial at all - doing it DSP-style in a loop took about 10 cycles, mainly because of data dependencies. I had to spread things over many registers until I got there: 2.1 cycles (load/store included), for a filter with hundreds of taps. How many FP registers does ARM have? It took using 24 out of the 32 to get to 2.1 cycles (the same technique over 18 registers yielded about 2.3 cycles; with 15 it was dramatically worse as data dependencies began to kick in - likely a 6-stage pipeline).
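
For anyone wanting to try the same thing, the technique described looks roughly like this in C - split the MAC chain over several independent accumulators so that each multiply-add does not have to wait for the previous result (the unroll factor and names are just for illustration):

    /* FIR inner loop with four independent accumulators; n is assumed
       to be a multiple of 4 for simplicity. */
    double fir(const double *x, const double *h, int n)
    {
        double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0;

        for (int i = 0; i < n; i += 4) {
            a0 += x[i]     * h[i];       /* four separate dependency */
            a1 += x[i + 1] * h[i + 1];   /* chains keep the FP       */
            a2 += x[i + 2] * h[i + 2];   /* pipeline full            */
            a3 += x[i + 3] * h[i + 3];
        }
        return (a0 + a1) + (a2 + a3);    /* combine once at the end  */
    }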

Dimiter

------------------------------------------------------
Dimiter Popoff
Transgalactic Instruments
------------------------------------------------------

Reply to
dp

I haven't tried anything exactly like that (I love doing that sort of thing, but seldom have the need). On the PPC cores I have used (an e200z7 recently), it can make a big difference to the speed of the code when things are spread out over many registers. Most PPC cores have quite long pipelines, and some have super-scalar execution, speculative execution or loads, etc., which makes it a big challenge to get everything right. You also need to take the cache into account - getting the computation flow to match the cache flow is vital.

In comparison, the Cortex-M0 is very simple. It is pipelined, but nowhere near as deeply, and it has no cache to consider. Some Cortex-M devices have a bit of cache, and may also have tightly-coupled memory, so there you have to consider the flow of data into and out of the CPU core. But I expect it is easier to get close to peak performance from an M0 than from a typical PPC core.

Reply to
David Brown

I don't think the ARM Cortex-M4 chips have 64-bit (double-precision) FPUs, and their clock speeds top out at about 160 MHz. You're not going to get anything near the MPC5200B performance.

The closest you might get with an ARM-based chip is probably one of the TI OMAP parts, which pair the ARM with a fixed/floating-point DSP coprocessor.

Mark Borgerson

Reply to
Mark Borgerson
