That 5-10 times would be the absolute worst case, when you are running tight integer benchmarks.
Real code would be different. On my interpreter program, the difference between -O0 and -O3 is less than 2:1. And even *that* is when interpreting integer benchmarks!
And unless all the software (OS, drivers, libraries) is recompiled with the same slower compiler, the slowdown will only apply to the application.
Also, development speed might be more important than the maximum possible execution speed; you can always run gcc with -O3 when it's finished! It might also be that a simple optimising pass, if desired, could quickly close much of that 2:1 gap you get with -O0.
For embedded work especially, being able to express integer constants in binary (together with separators for readability) is one of many dozens of simple enhancements that can make life easier. (Maybe gcc has some non-standard extension, and an obscure switch to enable it, for doing that; that's not the same.) Being able to read and write type specifiers left-to-right, like English, is another (I doubt there's a gcc switch for that!).
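Just to illustrate the binary-constant point (the register name and address here are invented), this is roughly what it looks like using the binary literals and digit separators that C23 standardises, and which gcc has offered in some form as extensions:

```c
#include <stdint.h>

/* Hypothetical 8-bit timer control register (address made up).
   0b binary literals plus ' separators make the bit layout obvious;
   in gcc you may need -std=c23 (or C++14) for the separators. */
#define TIMER_CTRL_INIT  0b1010'0011u   /* enable | irq | prescale = 3 */

static volatile uint8_t *const TIMER_CTRL = (volatile uint8_t *)0x40000010u;

void timer_setup(void)
{
    *TIMER_CTRL = TIMER_CTRL_INIT;
}
```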
You will still be working predominantly in C (or whatever comparable high-level language is used). The inline asm stuff is just a technique. After all, gcc C compilers seem to depend on a plethora of weird-looking attributes, a thousand compiler options, and an entirely separate language in makefiles in order to do what they do. A few lines of ASM is tame!
And if done properly, you will provide both asm and regular versions of the code, so that you can switch off the asm when necessary.
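As a rough sketch of what I mean (the function and macro names are just invented for illustration), you keep a portable C version alongside the inline-asm one and pick between them at compile time:

```c
#include <stdint.h>

/* Define USE_INLINE_ASM (e.g. with -DUSE_INLINE_ASM) to take the asm path;
   leave it undefined and you get the plain C fallback instead. */
#if defined(USE_INLINE_ASM) && defined(__arm__)

static inline uint32_t rotl32(uint32_t x, uint32_t n)
{
    uint32_t r;
    /* ARM has no rotate-left instruction, so rotate right by (32 - n). */
    __asm__("ror %0, %1, %2" : "=r"(r) : "r"(x), "r"(32u - n));
    return r;
}

#else   /* portable C version, same behaviour */

static inline uint32_t rotl32(uint32_t x, uint32_t n)
{
    n &= 31u;
    return (x << n) | (x >> ((32u - n) & 31u));
}

#endif
```

The point is that the asm is an optional extra: turn the switch off and the program still builds and runs everywhere, just a bit slower.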
OK, I will take a look. The ARM assembly code posted here looked like the kind that gcc assembles when you give it a .s file. (I'm not interested these days in writing assemblers and linkers...)