As far as I can see, that all makes sense for a program structured like this:
(main routine) : do thing1 do thing2 do thing3
Avoiding the call and return makes sense - the end of "thing1" can just be a jump to the start of "thing2", or even better, "thing2" can simply follow on after "thing1".
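As a rough illustration of that straight-line case, here is a minimal C sketch. The function name `run_chain` and the trace buffer are made up for the example, and the labels stand in for the three routines. Each "routine" ends with a plain jump to the next one, and nothing ever needs to remember where it came from.

```c
#include <stddef.h>

/* Hypothetical example: runs thing1, thing2, thing3 in order without any
   call or return. The caller must supply room for at least 4 chars. */
size_t run_chain(char *trace)
{
    size_t n = 0;

    goto thing1;

thing1:
    trace[n++] = '1';
    goto thing2;        /* end of thing1 is just a jump to thing2 */

thing2:
    trace[n++] = '2';
    goto thing3;        /* end of thing2 is just a jump to thing3 */

thing3:
    trace[n++] = '3';

    trace[n] = '\0';
    return n;
}
```

Since every jump here targets the code that immediately follows it, the `goto` lines could be deleted and each routine would simply fall through into the next.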
But if the program is:
(main routine) : do thing1 do thing2 do thing1 do thing3
Or if both "thing1" and "thing2" want to call a common routine "thing4", you are stuck. Either you duplicate code (which might be worth doing if the code is short, but not if it is long), or you have some sort of stack - there is no other way (any other solutions are a stack in disguise).
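To make the "stack in disguise" point concrete, here is a minimal C sketch of the thing1, thing2, thing1, thing3 program written with jumps only. All the names (`run_main`, `return_stack`, the `label` enum) are invented for the example. Because "thing1" is entered from two places, it cannot end with a fixed jump; something has to record where to resume, and that record is the return stack.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_DEPTH = 8, MAX_TRACE = 16 };

/* Every place control can jump to (hypothetical names). */
enum label { MAIN_START, THING1, THING2, THING3, DONE };

/* Runs thing1, thing2, thing1, thing3 and records the order in trace. */
size_t run_main(char trace[MAX_TRACE])
{
    enum label return_stack[MAX_DEPTH];
    size_t sp = 0;              /* number of saved resume points */
    size_t n = 0;               /* number of trace entries written */
    enum label pc = MAIN_START;

    while (pc != DONE) {
        assert(n + 1 < MAX_TRACE);
        switch (pc) {
        case MAIN_START:
            assert(sp < MAX_DEPTH);
            return_stack[sp++] = THING2;   /* first use: resume at thing2 */
            pc = THING1;
            break;
        case THING1:
            trace[n++] = '1';
            assert(sp > 0);
            pc = return_stack[--sp];       /* two callers, so no fixed jump */
            break;
        case THING2:
            trace[n++] = '2';
            assert(sp < MAX_DEPTH);
            return_stack[sp++] = THING3;   /* second use: resume at thing3 */
            pc = THING1;
            break;
        case THING3:
            trace[n++] = '3';
            pc = DONE;                     /* used once: plain jump suffices */
            break;
        case DONE:
            break;
        }
    }
    trace[n] = '\0';
    return n;
}
```

In this particular program the stack never holds more than one entry, so a single variable would do, but that variable is still a one-element stack. Only "thing1" pays for the save and restore; "thing2" and "thing3" are each entered from one place and get by with plain jumps.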
An ideal compiler arrangement would figure this sort of thing out automatically and cut out all the calls and returns when they are not necessary, but include them when they are. I don't know much about Forth implementations (it's over fifteen years since I looked at Forth), but I do know of C compilers that do this.