For the Windoze haters - VS2005

In article , Greg Herlihy wrote: [...]

This doesn't really qualify as "modification". The programmer has typed in source code and compiled it. The output of the compiler happens just once and doesn't get changed from that point on. All that templates are is another layer of abstraction.

If you compile:

if (1==0) j = 17;

Nearly every compiler will remove the statement and not include the test for 1 being equal to 0 in the output. The removal of the "j=17;" statement from your code is just as much a "modification" as a template.

Actually C++ already allows self-modifying code. It is installation dependent but:

char * MyPointer;
MyPointer = (char *)RoutineName;
(*MyPointer) = 0x7E;

What more could you want?

--
--
kensmith@rahul.net   forging knowledge


      [ See http://www.gotw.ca/resources/clcm.htm for info about ]
      [ comp.lang.c++.moderated.    First time posters: Do this! ]
Ken Smith

Particularly since Icaches aren't usually multi-ported, performance of self-modifying code would be abysmal.

--
  Keith

Keith Williams

Typically, it won't even work, because the instruction cache is separate from the data cache, and writes to memory won't invalidate the cache line in the instruction cache.

At least for what is traditionally considered self modifying code. There should be no problem with writing a bit of code into a local (stack) variable, then executing it. As Francis says, the need would be very rare today, but I imagine that somewhere, there is an extreme case where it might solve a performance problem. (Perhaps in some embedded processor.)

-- James Kanze GABI Software Conseils en informatique orientée objet/ Beratung in objektorientierter Datenverarbeitung

9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
kanze

Still, I could see reasons to have self-modifying code today. For example, I always wonder about the performance impact of using a function static variable:

int f() { static int count = 0; return ++count; }

An implementation might add some initialization code which is run when the function is called for the first time and then modifies the first instruction to branch to the code following the initialization. Of course, the initialization is not reentrant, nor should it be. How to cope with function static variables in multi-threaded code is another issue, but one which is rather independent of the code modification.
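For comparison, here is a minimal sketch of the same one-time initialization done with an ordinary guard flag instead of patching the first instruction. The names f_lowered, count_storage and count_initialized are made up for illustration; a real compiler generates its own guard (and may need none at all when the initializer is a constant, as it is here).

```cpp
// Hand-written equivalent of:
//     int f() { static int count = 0; return ++count; }
// using an explicit guard flag rather than self-modifying code.
// All names are hypothetical; this is not what any particular compiler emits.

namespace {
int  count_storage;              // storage for the function-static variable
bool count_initialized = false;  // guard tested on every call
}

int f_lowered() {
    if (!count_initialized) {    // the test that code patching would eliminate
        count_storage = 0;       // run the initializer exactly once
        count_initialized = true;
    }
    return ++count_storage;
}
```

The cost under discussion is the guard test on every call; like the scheme described above, this sketch is not thread-safe.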

--
 
 - Efficient Artificial Intelligence

Dietmar Kuehl

What is RoutineName? If it is a pointer to a function (or the name of a function, which is implicitly converted to a pointer to a function when not immediately followed by parentheses), then there is no such conversion, and the code above will not compile with a conformant compiler. If it is not a pointer to a function, how do you get it to point to code?

Of course, at least on the machines I use, the code is in a write protected segment, so anything I do which would actually write to it will cause my program to core dump.

In sum, it's not installation dependent (i.e. implementation defined), but undefined behavior.

-- James Kanze GABI Software Conseils en informatique orientée objet/ Beratung in objektorientierter Datenverarbeitung

9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
kanze

Actually, dynamic code generation is used quite a lot in virtual machines (e.g. Java) that do just-in-time compilation. I consider this self-modifying code since the virtual machine, instead of interpreting bytecodes (with a loop), extends its machine code with the equivalent, much faster code.

Usually, the OS provides a system call that ensures the coherency between the I and D caches. On some architectures, one would also need to mark the dynamic code area as executable for things like speculative prefetch (branch prediction). On UNIX-like systems, there is mprotect() for this.
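A rough sketch of that sequence on a POSIX system, assuming GCC or Clang for the cache-flush builtin. The helper name publish_code is invented for this example, and the code is platform-specific rather than portable C++.

```cpp
// Sketch for POSIX systems with GCC/Clang; not portable C++.
#include <sys/mman.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

// Copies `size` bytes of machine code into a fresh page, then marks the page
// readable and executable. Returns nullptr on failure.
// `publish_code` is a hypothetical helper name.
void* publish_code(const unsigned char* code, std::size_t size) {
    const long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || size == 0 || size > static_cast<std::size_t>(page))
        return nullptr;
    const std::size_t length = static_cast<std::size_t>(page);

    // Start with a writable, non-executable anonymous mapping.
    void* mem = mmap(nullptr, length, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return nullptr;

    std::memcpy(mem, code, size);

    // Compiler builtin that makes the instruction cache coherent with the
    // bytes just written (a no-op on some architectures).
    char* begin = static_cast<char*>(mem);
    __builtin___clear_cache(begin, begin + size);

    // Drop write permission and grant execute permission.
    if (mprotect(mem, length, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, length);
        return nullptr;
    }
    return mem;
}
```

Actually calling the result requires turning the returned void* into a pointer to function, and the bytes must be valid instructions for the target CPU; neither is covered by the C++ standard.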

--
Catalin

Catalin Marinas

Yeah, it depends on the processor, obviously. With some (I have a PowerPC 603e manual in front of me) one can do it with the appropriate memory write, barrier instructions and i-cache invalidate. Ugly, but it can be made to work.

Performance is still going to suck pretty badly since you have to get the new instruction out to memory and then refetch it into the i-cache (basically what the barrier instructions for the 603e, above, do). I guess there could be some corner case where some performance could be had playing these games, but I don't see it. Mostly this sort of thing is thrown in just to confuse debuggers/disassemblers.

--
  Keith

Keith Williams

I just fed this into gcc by typing "gcc try.c". It compiled without a twitch.

**************************************
#include <stdio.h>

void other(){ printf("Hi"); }

int main(){ char * temp; temp = (char *)other; return 5; }

*********************************

BTW: RoutineName can be either a pointer to a function or directly the name of said function. The result is indeed the same. If you look back at the history of C you will find that a call was generated any time there was a value followed by an open paren, i.e.:

(a-long-expression)();

generates a function call.

No, on some machines, it will work just fine. These days, they will be down near the microcontroller end of the range. Dig out an old PC-XT and give it a try.

--
--
kensmith@rahul.net   forging knowledge


Ken Smith

In article , Ken Smith writes

Then you do not know what undefined behaviour is. Such behaviour can be exactly what you expect and implementations are even allowed to tell you what they will do with it. However an implementation can do bizarrely strange things with undefined behaviour, even to the perversity of only misbehaving rarely.

You should never rely on undefined behaviour unless you ring-fence your code with warnings that it is restricted to a very specific hardware/implementation.

Implementations for embedded systems often do define what happens for at least one case of undefined behaviour but that is usually OK because the target hardware is pretty specific.

--
Francis Glassborow      ACCU
Author of 'You Can Do It!' see http://www.spellen.org/youcandoit
For project ideas and contributions: http://www.spellen.org/youcandoit/projects


Francis Glassborow

This is a known bug in g++. The standard is very clear in this matter -- any attempt to convert a pointer to a function into a pointer to data is illegal, and requires a diagnostic. This is also the case in C, and was the case in C90.

Historically, compilers (at least on Unix) have been lax about enforcing it; Posix, in fact, requires dlsym() to return a void*, even when the symbol is a function. (Posix also says that to assign the result to a pointer to function, you have to use something like:

*(void **)(&p) = dlsym( ... ) ;

which works if, and only if, pointers to functions and pointers to data have the same size and representation. That is something Posix requires, but which is required neither by C nor by C++.)
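The "same size and representation" assumption can be made explicit in code. The sketch below does not call dlsym() at all: fake_dlsym is a hypothetical stand-in that hands back a function's address as a void*, and the bytes are copied with memcpy instead of a cast, so the only thing relied on is the Posix-style assumption, checked in part by the static_assert.

```cpp
#include <cstring>

using int_fn = int (*)(int);

int times_two(int x) { return 2 * x; }

// Hypothetical stand-in for dlsym(): returns the address of times_two
// packed into a void*. Only meaningful where data and function pointers
// share size and representation (as Posix requires).
void* fake_dlsym() {
    static_assert(sizeof(void*) == sizeof(int_fn),
                  "assumes data and function pointers have the same size");
    int_fn f = times_two;
    void* p;
    std::memcpy(&p, &f, sizeof p);   // copy the representation, no cast
    return p;
}

// Recovers the function pointer from the void*, again by copying bytes.
int_fn as_function(void* p) {
    int_fn f;
    std::memcpy(&f, &p, sizeof f);
    return f;
}
```

For example, as_function(fake_dlsym())(21) yields 42 on platforms where the assumption holds. On a machine where the two pointer kinds differ in size, the static_assert stops compilation rather than letting the code misbehave silently.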

Because of the historic situation, one can well understand compiler writers not rushing to make the conversion a fatal error -- Sun CC issues a warning, for example. Still, in g++ it is a bug, since no diagnostic is issued even when I use -std=c++98.

There's a lot of left over junk from C. In most (but not all) contexts, an lvalue with function type T converts implicitly and automatically to an rvalue with type T* (that is, a pointer to function). And the () operator is defined as being applicable to both a function and a pointer to function.

That doesn't mean that we shouldn't be clear when we write code. About the only time I can see actually using these conversions is in template code, where a pointer to function works as a functional object.
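A small illustration of both points, with made-up names (add_one, AddTen, apply_twice, call_styles): the function name converts to a pointer, the call operator accepts either form, and a template cannot tell a pointer to function from a functional object.

```cpp
int add_one(int x) { return x + 1; }

// A functional object with the same call signature.
struct AddTen {
    int operator()(int x) const { return x + 10; }
};

// Accepts anything callable as f(int): a pointer to function or an object
// with operator(). Passing a function name deduces F as int (*)(int).
template <typename F>
int apply_twice(F f, int x) { return f(f(x)); }

int call_styles() {
    int (*pf)(int) = add_one;  // function name converts implicitly to pointer
    int a = add_one(1);        // () applied to the function
    int b = pf(1);             // () applied to the pointer
    int c = (*pf)(1);          // explicit dereference, same call
    return a + b + c;          // each call returns 2
}
```

Here apply_twice(add_one, 0) and apply_twice(AddTen(), 0) both compile from the same template, giving 2 and 20 respectively.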

The "undefined behavior" is the effect of writing through an arbitrary pointer. The fact that there exist machines where it will work does not change the fact that it is undefined behavior according to the standard.

Of course, if your code isn't meant to be portable, and you know the platform on which you are working... under CP/M, I once wrote code which patched the OS on the fly.

But of course, the code didn't conform to the standard :-).

-- James Kanze GABI Software Conseils en informatique orientée objet/ Beratung in objektorientierter Datenverarbeitung

9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
kanze

Yes, I do know what "undefined behaviour" is. In this post I was directly disproving the statement that no conforming compiler would compile it. The gcc compiler, I believe, is considered to be truly conforming.

If you are running out of RAM on anything from an 1802 to an 8086, the code fetches can be predicted and the op-codes are known. In this case you could, if perverted enough, write self-modifying code.

[...]

Actually not really. In embedded systems, you often can't write to code space since the code actually lives in flash memory.

--
--
kensmith@rahul.net   forging knowledge


Ken Smith

In article , kanze wrote: [....]

Do you have a cite for that?

[...]

My source was C, not C++, so it wasn't a case of "left over from"; it was a case of "still there".

I don't think self-modifying code can ever be portable, so we've already crossed from the swamp into the mire.

--
--
kensmith@rahul.net   forging knowledge


Ken Smith

I don't know where you got that idea. I don't think it even claims full conformance -- otherwise, where's export? (And of course, any time someone claims 100% conformance without a "modulo bugs" clause... I don't think that there's a compiler implementer around who would claim that their compiler has absolutely no bugs, and of course, some of these bugs may allow an illegal program to compile.)

Some software does it on PCs under Windows or Linux, and on Sparcs under Solaris. I don't know how else you could qualify JIT and HotSpot for the JVM, for example.

Of course, they have to use behavior which is undefined according to the C++ (or C) standard to do it.

-- James Kanze GABI Software Conseils en informatique orientée objet/ Beratung in objektorientierter Datenverarbeitung

9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
kanze

In what sense? It's not in the list of permitted conversions, so it isn't permitted. In the C standard, for example, we have (ISO 9899:1999, §6.3.2.3/7): "A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type." for data pointers, and (§6.3.2.3/8) "A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer." for pointers to functions, but nothing for going from one to the other. In C++, it's a bit more complicated: you have to look at all of the permitted conversions for static_cast and reinterpret_cast, but you won't find any that allow going between a pointer to data and a pointer to a function either.
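The one conversion that is on the list, pointer to function to pointer to a different function type and back, can be sketched as follows. The names square, generic_fn and round_trip_preserves_pointer are invented for the example; the converted pointer is only stored, never called through the wrong type.

```cpp
int square(int x) { return x * x; }

// Hypothetical "any function" pointer type used purely for storage.
using generic_fn = void (*)();

bool round_trip_preserves_pointer() {
    int (*original)(int) = square;

    // Function pointer to a different function pointer type: permitted.
    generic_fn stored = reinterpret_cast<generic_fn>(original);

    // Converting back must yield the original pointer value.
    int (*restored)(int) = reinterpret_cast<int (*)(int)>(stored);

    return restored == original && restored(7) == 49;
}
```

This is the portable way to keep function pointers of mixed types in one container; going through a data pointer such as char* or void* is what falls outside the permitted conversions quoted above.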

It's also worth reading what Posix has to say about it in the rationale for dlsym():

The ISO C standard does not require that pointers to functions can be cast back and forth to pointers to data. Indeed, the ISO C standard does not require that an object of type void * can hold a pointer to a function. Implementations supporting the XSI extension, however, do require that an object of type void * can hold a pointer to a function. The result of converting a pointer to a function into a pointer to another data type (except void *) is still undefined, however. Note that compilers conforming to the ISO C standard are required to generate a warning if a conversion from a void * pointer to a function pointer is attempted as in:

fptr = (int (*)(int))dlsym(handle, "my_function");

Due to the problem noted here, a future version may either add a new function to return function pointers, or the current interface may be deprecated in favor of two new functions: one that returns data pointers and the other that returns function pointers.

Finally, g++ does NOT allow it with either a reinterpret_cast or a static_cast -- presumably because the C++ standard does not allow it. Of course, in C++, a C style cast MUST resolve into a reinterpret_cast or a static_cast, followed (or preceded, I forget) by a const_cast. Why this weird solution, I don't know. (Sun CC gives a warning for both the C style cast and the reinterpret_cast.) Perhaps the people responsible for the C compatible parts have a different opinion vis-a-vis standards conformance than those responsible for the C++ parts.

It worked before, so g++ can't just remove it, regardless of what the standard says. This may also be the justification for not supporting it in reinterpret_cast -- there was no reinterpret_cast "before". IMHO, the correct solution would be a warning (which can be turned off), converting to an error when I specify -std=c++98; the fact that it isn't an error when I specify -std=c++98 can only be considered an error in the compiler (or else -std=c++98 doesn't mean what I think it should mean).

-- James Kanze GABI Software Conseils en informatique orientée objet/ Beratung in objektorientierter Datenverarbeitung

9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34
kanze

In article , kanze wrote: [...] Yes, that was what I hoped for.

[.. function pointer conversions .. ]

I think "remove" isn't exactly what the compiler writer has to do in this case. It's more a case of adding a test to check whether a function pointer is being converted. 99% of the time, converting a pointer is just changing how big the increment is.
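A quick way to see the "how big the increment is" point for data pointers, assuming a typical flat-address machine. The helper stride_in_bytes is a made-up name for this sketch.

```cpp
#include <cstddef>

// Returns how many bytes a T* advances when it is incremented once.
// `stride_in_bytes` is a hypothetical helper for illustration.
template <typename T>
std::ptrdiff_t stride_in_bytes() {
    T buffer[2] = {};
    T* p = buffer;
    T* q = p + 1;   // one increment in units of T
    // Viewing both addresses as char* measures the step in bytes.
    return reinterpret_cast<char*>(q) - reinterpret_cast<char*>(p);
}
```

So stride_in_bytes<char>() is 1 and stride_in_bytes<int>() is sizeof(int): the cast between data pointer types changes the step, not the address. Function pointers have no such arithmetic, which is one reason they need separate handling.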

--
--
kensmith@rahul.net   forging knowledge


Ken Smith

Neither fair nor accurate.

First, there were two disjoint spaces, not three -- stack and data were in the same space.

Second, early versions of the PDP-11 were limited to 32768 16-bit words (65536 8-bit bytes) of virtual address space. Only the operating system could use more than this, and then only by constantly re-mapping virtual addresses to physical addresses. When the limitations of that amount of space became obvious, later versions were able to map code and data space separately, providing up to 32768 words of data AND 32768 words of code.

Furthermore, the Intel processor that Windows uses is capable of doing something quite similar. I don't think it's possible to map a program byte and a data byte at the same address... but I'm 100% certain that it's possible to mark memory as read-only, read-write, write-only, or execute-only. But since Intel platforms can support up to 64 bits of address space, there's no reason to make the same address be used for both code and data, as there was on the PDP-11.

Furthermore, I think (but I'm not 100% sure, I haven't tried it) that if a user program tried to modify its own code segment, it would trap, unless it was specifically linked to allow this... so the sensitive part is the stack itself. You can easily overwrite the return address, to make a program crash... injecting code is not so easy.

Allan W
