We recently used _fctcpy() for HC08 flash programming code and another _fctcpy() for ST7 code, and both work fine. At first I also thought about writing my own, simpler function to move the code. After a day of reading the compiler library sources, I realized that _fctcpy() is the "deluxe universal" version of what I wanted to write myself. It covers all current and future needs, and its code size is very small. You can hardly write this yourself, because the linker-generated source addresses are only available through the descriptor __idesc__.
There is one thing to consider: if one memory area (typically RAM page 0 from 0x50 up) is shared by more than one piece of code, only one of those segments should be compiled with debug info switched on. For example, suppose you have different code sections "a", "b" and "c" that can be moved to RAM by _fctcpy(). When a breakpoint is hit, the debugger cannot know which of the sections was moved to that place at run time. With a banking mechanism, the debugger can read back the current bank number from the MMU, but this is not possible for movable code. So, to debug movable code, you must not enable debug info for more than one section that shares the same RAM area.
In C the function is called _fctcpy(), with one leading underscore, while its label in the assembly source is defined with two underscores, __fctcpy. The function uses the same __idesc__ structure that crtsi (the C runtime startup) uses for initialized data. __idesc__ is a chained data structure generated by the linker and terminated by an end-of-chain flag byte. The linker generates this information only if __idesc__ remains unresolved at link time, that is, because it is referenced by the crtsi or fctcpy library code, or even by your own code that accesses the __idesc__ data.
If one memory area is shared by more than one piece of code, you have to create different namespaces in the lkf linker file with the +seg -s spacename option, to avoid overlap clashes at link time. You also need to identify each segment with the -n segmentname option. I always name them 1code, 2code, 3code, ..., where the first character is the argument passed to _fctcpy().
The difference between movable code copied by fctcpy and initialized data copied by crtsi is the flag byte in __idesc__. Both fctcpy and crtsi scan the __idesc__ structure in the same manner, and their code is very similar. crtsi copies all data sections whose flag byte is below the printable ASCII range, while fctcpy copies only the one section selected by the argument the application program passes to it.
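The two selection rules can be modelled on a host in a few lines of C. This is only a sketch, not the Cosmic library code: the function names are hypothetical, and the exact flag values (0 as end marker, 0x01 to 0x1F for data, a printable segment-name character for movable code) are assumptions drawn from the description above.

```c
#include <assert.h>

/* Host-side sketch of the flag-byte rules (NOT the Cosmic library source).
 * ASSUMPTIONS:
 *   - flag 0x00 terminates the __idesc__ chain,
 *   - flags 0x01..0x1F (below printable ASCII) mark initialized data,
 *   - a printable flag marks movable code and equals the first character
 *     of the segment name given with -n (e.g. '1' for "1code").
 * All function names are hypothetical. */

#define DESC_END 0x00u

/* crtsi-like rule: every initialized-data section is copied at startup. */
int crtsi_copies(unsigned char flag)
{
    return flag != DESC_END && flag < 0x20u;
}

/* fctcpy-like rule: only the movable-code section named by the argument. */
int fctcpy_copies(unsigned char flag, char name)
{
    assert((unsigned char)name >= 0x20u); /* argument must be a printable name char */
    return flag == (unsigned char)name;
}

/* Flag byte the linker is assumed to emit for a given -n segment name. */
unsigned char flag_for_segment(const char *segment_name)
{
    return (unsigned char)segment_name[0];
}
```

With sections named 1code, 2code and 3code, a call such as _fctcpy('2') corresponds to fctcpy_copies() being true only for the 2code entry, while crtsi_copies() skips all three.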
The linker-generated __idesc__ structure is a rarely seen piece of cleverness. The chain starts with the source start address of the first section, so the first element is 7 bytes long. The last 2 bytes of each element hold the last source address + 1. Because the linker places all "from" (source) sections consecutively in one area, this address equals the source start address of the next section. While the automatically generated source sections are consecutive, the destination address of each movable section can be different. In other words, all following entries of the descriptor chain are only 5 bytes long, but each can be read as a 7-byte element whose first word overlaps the last word of the previous element.
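To make the overlapping layout concrete, here is a host-side C model that walks such a chain in a simulated 64 KiB memory and copies one section, the way fctcpy is described to work. The byte order inside an entry (flag, then destination, then end address), big-endian words, the zero end marker and the names model_fctcpy() and rd16() are assumptions for illustration; the real descriptor and library code may differ in detail.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Host-side model of an __idesc__-style chain (hypothetical, not Cosmic code).
 * ASSUMED layout, big-endian words as on HC08:
 *   +0  word  source start of the first section
 *   then per section, 5 bytes:
 *       byte  flag (0 = end of chain)
 *       word  destination (RAM) address
 *       word  source end + 1 (= source start of the next section)
 * Each section is read as a 7-byte window starting 2 bytes earlier:
 *   {start, flag, dest, end}; consecutive windows overlap by one word. */

#define MEM_SIZE 0x10000u /* mem must point to MEM_SIZE bytes */

/* Read a big-endian 16-bit word from simulated memory. */
static uint16_t rd16(const uint8_t *mem, uint16_t addr)
{
    return (uint16_t)((mem[addr] << 8) | mem[(uint16_t)(addr + 1u)]);
}

/* Copy the section whose flag equals `name`; return 1 if found, 0 if not.
 * All state lives in locals and parameters, so the function is reentrant. */
int model_fctcpy(uint8_t *mem, uint16_t idesc, char name)
{
    uint16_t e = idesc; /* base of the current 7-byte window */
    for (;;) {
        uint8_t flag = mem[(uint16_t)(e + 2u)];
        if (flag == 0u)
            return 0; /* end of chain: section not found */
        uint16_t start = rd16(mem, e);
        uint16_t dest = rd16(mem, (uint16_t)(e + 3u));
        uint16_t end = rd16(mem, (uint16_t)(e + 5u));
        assert(end >= start); /* source sections are assumed ascending */
        if (flag == (uint8_t)name) {
            memmove(&mem[dest], &mem[start], (size_t)(end - start));
            return 1;
        }
        e = (uint16_t)(e + 5u); /* next window reuses this window's end word */
    }
}
```

Because both source sections can target the same RAM address (for example 0x50 in page 0), calling model_fctcpy() with '1' and then '2' overwrites the first copy with the second, and asking for a flag that is not in the chain returns 0.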
The library code in crtsi and fctcpy that interprets the descriptor is a bit tricky to understand, because it is completely reentrant. I once took the time to rewrite the _fctcpy() library function using fixed RAM locations instead of the HC08 stack instructions. After comparing code size and execution cycles, I went back to the supplied library function. Since then, the library source of _fctcpy() has been my model for learning efficient stack usage on small microcontrollers.