I mean starting from -O2 or -Os. The default settings of -Os do not switch on many of the space-saving optimizations, nor do they switch off optimizations that are bad for codesize (Andras mentioned a few). So at least some of the gap between GCC and commercial compilers is explained by a bad choice of default settings.
The difference between -O0 and -O2 is much more than 5% - I don't have recent numbers, but it is more like 50%.
The variation between compilers of different vendors (at their best settings) is easily over 30% for codesize, and 50% for performance.
I know companies that would do anything to get just a 0.5% codesize improvement so they can add a new feature to their latest phone without increasing the flash size. It all depends...
Anyway, finding the best compiler and options is easier than rewriting your code for space. Using assembler is never worth the effort: modern compilers generate better code than the average assembly programmer. Of course, if the code was designed with codesize in mind, you may not have a problem in the first place.
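To make "finding the best options" concrete, here is a rough sketch of how one might compare a few flag sets on a single source file. It assumes gcc and the GNU binutils `size` tool are on the PATH. The flag sets in CANDIDATES are my own illustrative picks (all real GCC options, but not the ones mentioned earlier in the thread), and the helper names are made up for this example.

```python
import os
import subprocess
import tempfile

# Illustrative flag sets only: real GCC options, but the selection is an
# assumption for this sketch, not a recommendation from the thread.
CANDIDATES = [
    ["-O2"],
    ["-Os"],
    ["-Os", "-fno-inline-small-functions"],
    ["-Os", "-fomit-frame-pointer", "-fno-unroll-loops"],
]


def text_size(source, flags, cc="gcc"):
    """Compile `source` to an object file and return its text size in bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        obj = os.path.join(tmp, "out.o")
        subprocess.run([cc, *flags, "-c", source, "-o", obj], check=True)
        out = subprocess.run(
            ["size", obj], check=True, capture_output=True, text=True
        ).stdout
    # Assumes Berkeley-style output: a header line, then
    # "text data bss dec hex filename" on the second line.
    return int(out.splitlines()[1].split()[0])


def rank(sizes):
    """Sort {flag string: bytes} by size, smallest first.

    Each entry also carries the saving in percent relative to the
    largest measured size.
    """
    worst = max(sizes.values())
    ordered = sorted(sizes.items(), key=lambda kv: kv[1])
    return [(flags, n, 100.0 * (worst - n) / worst) for flags, n in ordered]


def measure(source, candidates=CANDIDATES, cc="gcc"):
    """Measure every candidate flag set on `source` and rank the results."""
    return rank({" ".join(f): text_size(source, f, cc) for f in candidates})
```

Calling `measure("foo.c")` (with `foo.c` standing in for your own file) returns a list of (flags, bytes, percent saved) tuples, smallest first. A single object file is only a rough proxy; for real decisions you would measure the whole linked image, and pass a different `cc` to compare compilers.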
Wilco