Correct, this is the standard assumption in C/C++: e.g. int arr[100] implies that &arr[100] != NULL. Many compilers, and loop optimizers in particular, make stronger assumptions than that to allow prefetching and other advanced loop transformations. All of this is considered safe, especially since the first and last 4KB of the address space are typically reserved by the OS.
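To make the one-past-the-end point concrete, here is a minimal sketch in C. The names sum_array, demo and N are made up for illustration and are not from the discussion above; the only thing it relies on is that forming (but not dereferencing) a pointer one element past the end of an array is allowed, so it can serve as a loop bound.

```c
#include <assert.h>
#include <stddef.h>

#define N 100

/* Sum arr[0..n-1] using a one-past-the-end pointer as the loop bound. */
int sum_array(const int *arr, size_t n)
{
    const int *end = arr + n;   /* same address as &arr[n]: legal to form, not to dereference */
    int sum = 0;

    assert(end != NULL);        /* expected to hold for any valid array object */

    for (const int *p = arr; p != end; ++p)
        sum += *p;
    return sum;
}

/* Hypothetical helper for illustration: fills a local array with 0..N-1 and sums it. */
int demo(void)
{
    int arr[N];
    for (int i = 0; i < N; ++i)
        arr[i] = i;
    return sum_array(arr, N);   /* 0 + 1 + ... + 99 */
}
```

The loop compares p against end and never reads through it, which is the pattern the one-past-the-end guarantee exists for.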
Yes, architecture 5 made interworking much simpler so you don't need veneers or special return sequences anymore.
Yes, this is good for code size. Good compilers even use STM to create a small amount of stack space:
    PUSH {r0-r2, r4, lr}   ; push r4, lr and create 12 bytes of stack space
    ...
    POP  {r1-r3, r4, pc}   ; remove 12 bytes of stack, restore r4, lr and return r0
High-end ARMs typically transfer 2 registers per cycle and may execute other instructions in parallel, so it doesn't cost much performance.
Yes... No, there is no relation, but I dislike gotos as well!
Wilco