Arduino APIs, performance, and C++ templates (long)

Folk,

I'm doing a little project, my first with Arduinos. I had assumed that the choice to use C++ meant that the APIs would be fancy object-oriented APIs that generate inline assembly code for performance. I normally do more bare-metal stuff, including building C++ APIs for the peripherals of the MC68HC11 more than a decade ago, so I was keen to see what can be achieved using more modern C++ compilers.

To say I've been disappointed is an understatement. The standard of the code is simply awful. The g++ compiler is fantastic, but the Arduino APIs just don't use that power.

As an example, "digitalWrite" takes over 50 cycles, compared to the expected 2. I know that there are libraries that work faster, but why are the default libraries so bad? Even calling these methods takes at least *three* times the code space that's required. I drilled in to see what's going on, but that's not the topic here. I wanted to show how things could be better, and to see if anyone here is interested in making it happen (personally I actually want to do this for ST's ARM range, but will assist if someone wants to do AVR versions).

Using template metaprogramming, we can get nice object-oriented APIs that also map directly to the hardware instructions. Unfortunately it's not easy to use the existing Arduino port definitions as template parameters, which might mean having to redefine some of the #defines of the low-level hardware (more below). So here's a minimal example that works, and shows what could be achieved by following this route:

template <int Port, int Mask>
class Pin {
public:
  Pin& operator=(bool b) {
    if (b)
      *(volatile uint8_t*)Port |= Mask;
    else
      *(volatile uint8_t*)Port &= ~Mask;
    return *this;
  }
};

Pin<0x25, 0x01> portBp0;

Note that the 0x25 is the memory-mapped address of PORTB (its I/O address is 0x05, but memory-mapping adds an offset of 0x20, if I understand the AVR hardware correctly).

Now, when I write "portBp0 = 1;" I get exactly one instruction emitted ("sbi") which takes the expected 2 cycles (1 on XMEGA). Same deal for "portBp0 = 0;", where the instruction is "cbi". Both are single-word instructions, whereas a call to digitalWrite takes three or four words of code space.
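For readers who want to try the idea without an AVR at hand, here is a minimal host-side sketch of the same Pin template. It is not the code from the post: the array fake_io, the helper reg(), the read-back operator and pin_demo() are all assumptions added so the example can run on a PC. On a real AVR, reg() would return *(volatile uint8_t*)Port, and only then would the compiler be able to emit sbi/cbi.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a host-side array standing in for the AVR data space.
// On a real AVR, reg() would return *(volatile uint8_t*)Port instead.
static volatile uint8_t fake_io[0x100];

template <int Port, uint8_t Mask>
class Pin {
    static volatile uint8_t& reg() { return fake_io[Port]; }
public:
    // Set or clear the masked bit, leaving the other bits untouched.
    Pin& operator=(bool b) {
        if (b) reg() = static_cast<uint8_t>(reg() | Mask);
        else   reg() = static_cast<uint8_t>(reg() & ~Mask);
        return *this;
    }
    // Read the pin back: true if the masked bit is set.
    operator bool() const { return (reg() & Mask) != 0; }
};

// 0x25 is the memory-mapped PORTB address quoted in the post.
Pin<0x25, 0x01> portBp0;

// Hypothetical usage function: sets and clears bit 0 of the fake PORTB.
void pin_demo() {
    portBp0 = 1;
    assert(fake_io[0x25] == 0x01);
    portBp0 = 0;
    assert(fake_io[0x25] == 0x00);
}
```

Because Port and Mask are compile-time constants, the object itself carries no data; everything the compiler needs is in the type.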

Note that I would have preferred to define the template like this:

template <volatile uint8_t* Port, int Mask> class Pin {...};

Which allows removing the casts on uses of Port, but to be able to instantiate the template requires a cast:

Pin portBp0;

which translates roughly to:

Pin portBp0;

... and that's not valid for a template parameter. The only method I know that does work is to define the port variable as extern, in a particular section, and use the linker script or the linker option --just-symbols to define the location. This means we can also use a C++ reference instead of a pointer:

extern volatile uint8_t PortB; // address provided to the linker
template <volatile uint8_t& Port, int Mask> class Pin {...};

Pin portBp1;

It's quite a lot of fiddling to use a linker script, but using --just-symbols is easy enough; either way you can't use the standard AVR header files for the values :(.
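A rough sketch of the reference-parameter variant follows. On the target, PortB would be declared extern and given its address by the linker, as described above; here it is defined as an ordinary global, which is an assumption made purely so the example links and runs on a PC. The mask value 0x02 for bit 1 and the function pin_demo() are likewise illustrative.

```cpp
#include <cassert>
#include <cstdint>

// On the target: `extern volatile uint8_t PortB;` with the address supplied
// by the linker. Assumption: defined here so the sketch links on a host.
volatile uint8_t PortB;

// A reference template parameter must name an object with static storage
// duration and linkage, which a namespace-scope variable satisfies.
template <volatile uint8_t& Port, uint8_t Mask>
class Pin {
public:
    Pin& operator=(bool b) {
        if (b) Port = static_cast<uint8_t>(Port | Mask);   // no cast to a pointer needed
        else   Port = static_cast<uint8_t>(Port & ~Mask);
        return *this;
    }
};

Pin<PortB, 0x02> portBp1;

// Hypothetical usage function.
void pin_demo() {
    portBp1 = true;
    assert((PortB & 0x02) != 0);
    portBp1 = false;
    assert((PortB & 0x02) == 0);
}
```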

One option might be to define a structure for all the registers in a given AVR variant (and just locate the structure using --just-symbols), e.g.

extern struct {
  ...
  volatile uint8_t PortB; // ... at address 0x25 in the structure.
  ...
} CPU;

void clear_B() { CPU.PortB = 0; }
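One way to keep such a structure honest is to let the compiler check the offsets. The sketch below is only an illustration: the type name AvrRegisters and the padding member are made up, only PortB is modelled, and CPU is defined locally rather than located by the linker so that the example runs on a host.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical register map; only PortB is filled in.
struct AvrRegisters {
    volatile uint8_t below_portb[0x25];  // placeholder for addresses 0x00..0x24
    volatile uint8_t PortB;              // must land at offset 0x25
};

// Fails to compile if the layout drifts from the data sheet.
static_assert(offsetof(AvrRegisters, PortB) == 0x25, "PortB must sit at 0x25");

// On the target: `extern AvrRegisters CPU;`, placed at address 0 by the linker.
// Assumption: defined here so the sketch links on a host.
AvrRegisters CPU;

void clear_B() { CPU.PortB = 0; }
```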

The other advantage of using templates is that we can specialise them to set up the port correctly, and to check for collisions in port usage:

template <...> class OutputPin : public Pin<...> {
  OutputPin() {
    // (Check with a pin registry that this pin
    // isn't already assigned to something else?)
    // Set up port direction...
  }
  ...
};

This also means that you can dynamically assign port pins just by defining a local variable in a function, and the pin will be set up for you when you hit that function.
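As a sketch of what that could look like, the class below sets the direction bit in its constructor and consults a small run-time registry. Everything beyond the idea itself is an assumption: the fake_io array, the pins_in_use registry, the owned() query and the extra Ddr template parameter are illustrative, and 0x24/0x25 are the memory-mapped DDRB/PORTB addresses of an ATmega328P-class part.

```cpp
#include <cassert>
#include <cstdint>

static volatile uint8_t fake_io[0x100];  // assumption: host stand-in for I/O space
static uint8_t pins_in_use[0x100];       // hypothetical registry: one bitmask per port

template <int Port, int Ddr, uint8_t Mask>
class OutputPin {
    bool owned_;  // false if another object already claimed this pin
public:
    OutputPin() : owned_((pins_in_use[Port] & Mask) == 0) {
        if (owned_) {
            pins_in_use[Port] = static_cast<uint8_t>(pins_in_use[Port] | Mask);
            fake_io[Ddr] = static_cast<uint8_t>(fake_io[Ddr] | Mask);  // direction: output
        }
    }
    ~OutputPin() {  // release the pin when the variable goes out of scope
        if (owned_) pins_in_use[Port] = static_cast<uint8_t>(pins_in_use[Port] & ~Mask);
    }
    OutputPin(const OutputPin&) = delete;
    OutputPin& operator=(const OutputPin&) = delete;

    bool owned() const { return owned_; }

    OutputPin& operator=(bool b) {
        if (!owned_) return *this;  // never drive a pin we do not own
        if (b) fake_io[Port] = static_cast<uint8_t>(fake_io[Port] | Mask);
        else   fake_io[Port] = static_cast<uint8_t>(fake_io[Port] & ~Mask);
        return *this;
    }
};

// The pin is configured when the function is entered and released on exit.
bool blink_once() {
    OutputPin<0x25, 0x24, 0x01> led;
    led = true;
    return led.owned();
}
```

Whether a conflict should be reported, ignored or treated as fatal is a design decision; this sketch merely records it.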

With more work, you could set up templates for whole ports, or for ranges of pins on the same port:

template class PinRange { public: operator int() { return (Port&Mask) >> Shift; };

PinRange& operator=(int val) { Port = (Port&~Mask) | ((val
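The PinRange listing above is cut off in this copy of the post, so the following is a guess at the intent rather than the original code. It assumes a reference template parameter, an 8-bit port defined locally for host testing, and an assignment that shifts the value into place and masks it; the instance name portB_2to4 is made up.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: defined locally so the sketch runs on a host.
volatile uint8_t PortB;

template <volatile uint8_t& Port, uint8_t Mask, int Shift>
class PinRange {
public:
    // Read the field, shifted down to bit 0.
    operator int() const { return (Port & Mask) >> Shift; }

    // Write the field without disturbing the other bits of the port.
    PinRange& operator=(int val) {
        Port = static_cast<uint8_t>((Port & ~Mask) | ((val << Shift) & Mask));
        return *this;
    }
};

// Bits 2..4 of PortB treated as one 3-bit value.
PinRange<PortB, 0x1C, 2> portB_2to4;
```

Note that the write is a read-modify-write of the whole port, so unlike the single-bit case it is not atomic with respect to interrupts.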

Reply to
Clifford Heath

I don't mind objects - they can be as efficient as C also, but like all things, you need to understand how they work.

Indeed :).

Have you seen Andy Brown's excellent stm32plus:

Unfortunately it doesn't handle the I/O crossbar architecture of the STM32F3, which is what I wanted to use. But excellent, all the same.

Clifford Heath.

Reply to
Clifford Heath

On 18-May-16 at 6:58 AM, Clifford Heath wrote:

You might check my "Objects? No Thanks" approach (being as efficient as C but with compile-time polymorphism) and Odin Holmes' Kvasir library (more complex, but potentially faster and more compact than C).

Keep me posted if you find more such libraries (I try to gather them in a list), or if you want to cooperate.

Let's make embedded more ++ !

Wouter

Reply to
Wouter van Ooijen

You're right, I did - it was separately authored and I had to re-wrap it to avoid a narrower wrap margin in Thunderbird :(.

Yes, I get that. But even as a "better C" there's no reason to be 5x slower than possible.

Much appreciated - it's exactly what I was hoping for by reposting here.

Yes, I just wanted to minimize clutter.

Great, I'll give that a try. My last major foray into C++ templates was in 2007, and things have advanced considerably since then.

Of course - But I wanted to make my example more easily comprehensible to someone who hadn't seen templates before.

Right; once you have the hardware mapped, you can write peripherals and solution-specific code.

Finally, on a CPU like the STM32F3 which has an I/O crossbar, you need to configure the crossbar using a template parameter too. So PortB, Bit3 might be mapped to pin 22 using the crossbar. It might be a big challenge to *statically* establish that all crossbar configuration is valid and non-conflicting. I was thinking about ways to do it by defining external symbols so the linker would reject conflicts, but even an error that's not thrown until program initialization would be better than just stomping on the same hardware.

Clifford Heath.

Reply to
Clifford Heath

(I'm snipping your post here, because it was quite long and because you appear to have duplicated a few parts by copy and paste.)

First, I believe the Arduino uses C++ more as "a better C" than taking advantage of C++'s features. The point is to help make it easier to write correct code (and to get compiler warnings on incorrect code), rather than to be a good C++ framework or generate efficient code. I am sure it is possible to get a better compromise here, but it is not the first nor the last framework to be ridiculously inefficient at something as simple as setting an IO pin. I have seen many others where, in the interest of "abstraction", "layers", "drivers", etc., setting a pin means passing up and down through half a dozen function calls from different files. As well as being painfully slow, it is also extremely difficult to actually see what is happening in the code, and to trace problems.

So I am all in favour of templates as the modern way to handle this sort of thing. Currently, I use a set of macros - I have used basically the same macros, adjusted and tweaked for different targets, for C and assembly on a wide range of systems. But templates have many advantages, especially in light of the improvements in C++11 and C++14 on type safety and compile-time features (explicit conversion operators, constexpr, etc.).

I have a couple of general comments on your approach here, if I may.

First, you seem to want to get rid of the "*(volatile uint8_t*)" cast inside the template function. I would say this is no problem at all - casts like this are part of the low-level machinery, and putting them inside the template class is exactly where they should go. When defining lots of member functions, you can reduce the clutter a little by wrapping the cast parts inside inline private functions.

It is also possible to make the template with a "volatile uint8_t*" pointer (or reference) rather than an integer, but as you saw that brings new complications - you cannot instantiate the template with "PORTB" or any other casts from a constant integer.

The rules for where you can make casts and conversions like this are fiddly, and you need to be quite experienced to understand them all. But a bit of trial and error, guessing possible solutions and looking at what compiles, is also workable!

While a direct cast would not work, as far as I could tell, this constexpr function seems fine:

int constexpr intAddressOf(volatile uint8_t* p) { return (int) (intptr_t) p; }

Pin portBp4;

Second, one of the most important points in using templates like this is to be able to give your pins appropriate names. So you would be doing something like this:

Pin statusLedPin;

I'd also add an optional template parameter for active low or active high, and consider named member functions rather than overloading the assignment operator, thus letting you write:

statusLedPin.on();

If a new hardware revision changes the polarity, or the pin number, you only need to change the declaration of statusLedPin, not its usage.
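A possible shape for this, sketched under assumptions: the Polarity enum, the isOn() query and the locally defined PortB are not from the thread, and the reference template parameter is just one of the variants discussed earlier. The mask 0x04 and the active-low wiring follow the "PORTB &= ~0x04; // Turn status led on" example used in this discussion.

```cpp
#include <cassert>
#include <cstdint>

enum class Polarity { ActiveHigh, ActiveLow };  // hypothetical name

// Assumption: defined locally so the sketch runs on a host.
volatile uint8_t PortB;

template <volatile uint8_t& Port, uint8_t Mask, Polarity P = Polarity::ActiveHigh>
class OutputPin {
    // Drive the electrical level, independent of the logical meaning.
    static void drive(bool high) {
        if (high) Port = static_cast<uint8_t>(Port | Mask);
        else      Port = static_cast<uint8_t>(Port & ~Mask);
    }
public:
    static void on()  { drive(P == Polarity::ActiveHigh); }
    static void off() { drive(P != Polarity::ActiveHigh); }
    static bool isOn() {
        return ((Port & Mask) != 0) == (P == Polarity::ActiveHigh);
    }
};

// Only this declaration changes if the board revision flips polarity or pin.
OutputPin<PortB, 0x04, Polarity::ActiveLow> statusLedPin;
```

Since the polarity is a template argument, the comparison in on() and off() is resolved at compile time and should cost nothing at run time.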

Reply to
David Brown

There may be other ways to achieve this too - it's just the first successful method from my trial-and-error probing. You might consider wrapping things in a macro.

That's fair enough. But if you are making some sort of tutorial or example, then I would put this at the top of the page. The last thing you want is people to have a new way of writing:

PORTB &= ~0x04; // Turn status led on

Changing that to:

portBp4 = 0; // Turn status led on

is no improvement.

So I would structure your tutorial or documentation in terms of first stating that you want to be able to write:

statusLedPin.on();

And then consider how to achieve this in a safe and efficient manner.

Certainly there may be some solution-specific code - it can be made as template specialisations. But a fair number of member functions can be standard, such as for port direction, pull-ups, etc.

You might also want to consider your operator choice. Do you really want to use boolean assignment? It is not a bad choice, IMHO, but there are other options. What about:

statusLedPin

There comes a point when the best method is a pre-processor (written for the host - in C++, Python, or whatever) that generates the C/C++ header and source file handling this sort of thing. It is /possible/ to do many weird and exciting things with the linker, but even when you can get it to spot conflicts, it can be hard for the user to convert linker errors into what actually went wrong.
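To make the host-side generator idea concrete, here is a small sketch. The PinSpec record, the function generate_header() and the text it emits (including the OutputPin and Polarity names) are all hypothetical; the point is only that a host program can report a conflict in plain words before any target code is compiled.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical description of one pin assignment.
struct PinSpec {
    std::string name;  // e.g. "statusLedPin"
    char port;         // e.g. 'B'
    int bit;           // 0..7
    bool activeLow;
};

// Returns the text of a generated header, or throws with a readable
// message if two names claim the same port bit.
std::string generate_header(const std::vector<PinSpec>& pins) {
    std::map<std::pair<char, int>, std::string> used;
    std::ostringstream out;
    out << "// Generated file, do not edit.\n";
    for (const PinSpec& p : pins) {
        if (p.bit < 0 || p.bit > 7)
            throw std::invalid_argument(p.name + ": bit out of range");
        auto result = used.emplace(std::make_pair(p.port, p.bit), p.name);
        if (!result.second)
            throw std::runtime_error(p.name + " conflicts with " +
                                     result.first->second + " on P" +
                                     std::string(1, p.port) + std::to_string(p.bit));
        out << "OutputPin<PORT" << p.port << ", " << (1 << p.bit) << ", "
            << (p.activeLow ? "Polarity::ActiveLow" : "Polarity::ActiveHigh")
            << "> " << p.name << ";\n";
    }
    return out.str();
}
```

A real tool would read the assignments from a file and know which ports exist on the chosen device; both are left out here.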

Reply to
David Brown

Thanks, added to the list. I'll study his techniques when I have some spare time. Might even be before I die of old age...

Wouter

Reply to
Wouter van Ooijen

Much snippage.

> It is also possible to make the template with a "volatile uint8_t*"

I tried this anyway, see the attachment. You're right, it didn't work, which leaves me without any comfortable solution to my original problem. It's not nice for *users* to have to include the intAddressOf() call - and that only passes a number, not a structure that describes a whole device.

I tried the equivalent of that, but the other way, from an integer to a reference or to a pointer... no dice.

If you can see any way to make the templates in the attachment work, I'd be most appreciative.

Yes, of course. I would specialize the template for Input, Output, and Bi-Dir pins; perhaps even for source-only and sink-only outputs. And signal polarity, as you suggest.

If a pin is declared as an auto, arguably it should revert the pin to its previous state on destruction, in case it was also used as a global. Or have an "in-use" bitmask to notify that.

The latter, as for any variable, because the former is a duplicate definition.

Actually I think I made an error. The port bits should get allocated as we've discussed, and the crossbar should have separate configuration templates; it assigns pins to port-bits. So the above template should be PortBit (not Pin) and the Pin template should assign physical pins to PortBits.

User programs would need PortBit instances, and also pin assignment ones. Double definition, but that reflects the hardware.

The nice thing is to do that in a way which knows which ports and pins exist on a specific device so you can't use hardware which doesn't exist, and can't (inadvertently) double up on the same hardware.

Right, but even if collisions in the cross-bar are detected as a failure at program initialization, that's ok as long as the definitions are easy to understand and to get right.
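A minimal sketch of detection at program initialization might look like the following. The table pin_owner, the counter crossbar_conflicts, the helper portBitId() and the pin count of 64 are all assumptions for illustration; a real STM32 crossbar would also need the register writes that actually route the signal.

```cpp
#include <cassert>

constexpr int kPinCount = 64;      // assumption: number of physical pins
static int pin_owner[kPinCount];   // physical pin -> port-bit id + 1 (0 = free)
static int crossbar_conflicts;     // incremented while global objects are constructed

// Hypothetical encoding of a port bit as a single integer (16 bits per port).
constexpr int portBitId(char port, int bit) { return (port - 'A') * 16 + bit; }

template <int PortBitId, int PhysicalPin>
struct PinAssignment {
    static_assert(PhysicalPin >= 0 && PhysicalPin < kPinCount, "no such pin");
    PinAssignment() {
        int& owner = pin_owner[PhysicalPin];
        if (owner != 0 && owner != PortBitId + 1)
            ++crossbar_conflicts;      // a different port bit already uses this pin
        else
            owner = PortBitId + 1;     // claim it (a repeat of the same mapping is harmless)
    }
};

// Example from the discussion: PortB bit 3 routed to physical pin 22.
PinAssignment<portBitId('B', 3), 22> portB3OnPin22;

// Startup code could call this before touching any hardware.
bool crossbar_ok() { return crossbar_conflicts == 0; }
```

The two tables are zero-initialized before any constructor runs, so the order in which the global PinAssignment objects are constructed does not matter.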

Can a template with an integer parameter export a symbol which includes that number in the symbol name? If so, you could have

template <int N> class Pin { static void pin_<N>_is_in_use; ... };

to fail at link-time if you re-use a pin. The linker would tell you where the duplicate definition is.

Clifford Heath.

Reply to
Clifford Heath

No takers? It seems that modern C++ is still the same exploding ball of rotating knives, except now, some of the knives have been deliberately blunted "to make it safer".

Humbug.

Reply to
Clifford Heath

The "no accidental duplicates" thing was just a nicety, and is not needed.

The real problem is that the code does not work at all. You can use a number as a template parameter, but you cannot get from a reference (whether initialised from a number or not) to a template parameter.

So it seems I can use a struct, or templates, but not both. See the previous attachment.

Clifford Heath

Reply to
Clifford Heath

Within the C++ language there is AFAIK no bridge between values (even compile time constants) and names. But the preprocessor has no such barrier.

But I think that still would not achieve your aim, because it is legal to instantiate a class template twice with the same parameters. That is simply two times the same thing, and the linker would ignore all but one.
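Since repeated instantiation cannot be made an error, one alternative (not proposed in this thread, and requiring C++14) is to list all assignments in one table and check it with a static_assert. The names pinId(), allUnique() and assignedPins below are made up for the sketch.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical encoding: eight bits per port, ports lettered from 'A'.
constexpr int pinId(char port, int bit) { return (port - 'A') * 8 + bit; }

// True if no value appears twice. Usable in constant expressions (C++14).
template <std::size_t N>
constexpr bool allUnique(const int (&ids)[N]) {
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = i + 1; j < N; ++j)
            if (ids[i] == ids[j]) return false;
    return true;
}

// Single place where every pin used by the project is listed.
constexpr int assignedPins[] = { pinId('B', 0), pinId('B', 2), pinId('D', 5) };

// Compilation stops here if the same pin is listed twice.
static_assert(allUnique(assignedPins), "a pin is assigned twice");
```

The cost is that the table has to be maintained by hand (or generated), separately from the pin declarations themselves.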

Wouter "Objects? No Thanks!" van Ooijen

Reply to
Wouter van Ooijen

Here's another way for template-based register access:

The code is based on this lightning talk from CppCon 2014 by Ken Smith:

-a

Reply to
Anders.Montonen

That's the same as we already had - a template parameterised by a number.

To be able to use a struct instance at a specific address, you need to be able to use a reference, but the C++ compiler doesn't allow parameterisation by a reference.

As I said before, you can use a struct, or a template, but not both.

Reply to
Clifford Heath

On Wednesday, 18 May 2016 11:22:27 UTC+2, David Brown wrote:

Over time there have been several attempts in that direction: Siemens had a tool called "Dave" for its microcontrollers (it still exists at Infineon), and PSoC Creator goes in that direction. For high-end devices, NXP/Freescale's "DPAA Expert" is such a tool. In my opinion, a community effort would be required to allow such preprocessing, allocation and configuration in a way that is independent of both host and target platform, say XML plus Python or Java. I think there are some free tools from academia, but you would need enough acceptance to get the vendors on board.

Andreas

Reply to
acd
