embedded questions!!!

- Richard Henry

Fri, Jan 13, 2006 6:50 PM

I will defer to your knowledge of C pointers.

However, my confusion about them has only increased.

- Bob Stephens

Fri, Jan 13, 2006 7:07 PM

AHA! Breakthrough! Now you're getting it.

;)

Bob

- Logan Shaw

Fri, Jan 13, 2006 7:12 PM

I'll assume that's supposed to be 'char *str1 = "JHONSON";'...

Nope, it's an array. If it helps to make things clearer, sizeof(str1) will in general differ from sizeof(str2). In particular:

sizeof(str1) == sizeof(char *) and sizeof(str2) == strlen(str2)+1

That should help make it clearer that str1 and str2 are different types. If you were to give them names, str1's type would be "char *" and str2's type might be "char[8]". One is a pointer and the other is an array.
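
A minimal sketch of the size difference follows. It assumes the declarations under discussion are a pointer str1 and an array str2, both initialized from "JHONSON". The thread never shows them verbatim, and the function names are illustrative only.

```c
#include <stddef.h>
#include <string.h>

/* str1 is a pointer to an unnamed array holding the literal; str2 is
   itself an array of 8 chars: 7 letters plus the terminating '\0'.
   (const is used because the literal must not be modified anyway.) */
static const char *str1 = "JHONSON";
static char str2[] = "JHONSON";

/* sizeof a pointer: whatever the target uses, often 4 or 8 bytes */
size_t size_of_str1(void) { return sizeof str1; }

/* sizeof an array: element count times element size, here 8 */
size_t size_of_str2(void) { return sizeof str2; }
```

On a typical 64-bit host, sizeof(char *) is 8. That happens to equal the 8 bytes of str2, so the two sizes differ only in general, not on every target.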

That may not make much sense, but think for a moment about how an int can be automatically converted to a float:

int i = 1;
float f = 3.14159;

printf ("%g", i + f);

What's going on here is that the compiler sees an expression where the operator (in this case "+") has two operands of different types. It then implicitly converts the integer into a floating-point value when generating the code for that expression.
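
The conversion can be observed directly with C11's _Generic. This illustration assumes a C11 compiler, and TYPE_CODE and sum_type_code are made-up names:

```c
/* Map an expression's type to a small code at compile time (C11).
   The controlling expression of _Generic is not evaluated. */
#define TYPE_CODE(x) _Generic((x), int: 1, float: 2, double: 3, default: 0)

int sum_type_code(void)
{
    int i = 1;
    float f = 3.14159f;
    /* The usual arithmetic conversions convert i to float before the
       addition, so i + f has type float (code 2). */
    return TYPE_CODE(i + f);
}
```

In the printf() call above, the float result is then promoted once more, to double, by the default argument promotions. That is why %g is the right conversion.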

A similar thing happens when arrays are used where a pointer value is needed. The compiler thinks "this isn't a pointer, but since it's an array, I can create a pointer value from it".
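
A short sketch of that array-to-pointer conversion follows. The function names are hypothetical.

```c
#include <stddef.h>

/* The callee only ever receives a pointer: the array's size is lost. */
size_t param_size(const char *p)
{
    return sizeof p;                 /* pointer size, not the array's size */
}

/* Using an array where a pointer is needed: the array expression is
   converted to a pointer to its first element. */
int decays_to_first_element(void)
{
    static const char str2[] = "JHONSON";
    const char *p = str2;            /* same as p = &str2[0] */
    return p == &str2[0] && param_size(str2) == sizeof p;
}
```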

Another way they differ, if I understand correctly, is that this would be legal:

str1 = (char *) "another string";

but this would not be:

str2 = (char *) "another string";

The reason is this: str1 is just a pointer, so it can take on any value. But str2 is the array that contains { 'J', 'H', 'O', 'N', 'S', 'O', 'N', 0 }, and how are you supposed to assign a pointer to that?
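
A sketch of the difference follows, using the same assumed declarations. strcpy() is shown as the usual way to change what an array holds, and "SMITH" is an arbitrary example value.

```c
#include <string.h>

static const char *str1 = "JHONSON";  /* a pointer: can be re-pointed */
static char str2[] = "JHONSON";       /* an array: 8 chars of storage */

void retarget(void)
{
    str1 = "another string";   /* fine: str1 now points at another literal */
    /* str2 = "another string";   would not compile: arrays are not assignable */
    strcpy(str2, "SMITH");     /* to change str2, copy into its storage;
                                  "SMITH" plus '\0' is 6 chars, which fits */
}
```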

- Logan

- Jonathan Kirwan

Fri, Jan 13, 2006 7:18 PM

You mean *((a)+(i)). Or perhaps more precisely, to capture usual unary conversion rules, *((&a[0])+(i)).

Jon

- Dave Hansen

Fri, Jan 13, 2006 7:35 PM

Actually, I meant "*(a + i)". Apologies for fat-fingering that. I think, however, the additional parentheses in *((a)+(i)) are redundant, at least if a[i] is a well-formed expression.

Well, no, actually, that's not necessary, and arguably wrong. Consider i[a], which should be equivalent to a[i].

Regards, -=Dave

--
Change is inevitable, progress is not.
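
A small check of the equivalence discussed in these posts follows, using only standard C. It also applies the rule to a string literal, which is itself an array.

```c
/* a[i] is defined as *((a) + (i)); since + commutes, i[a] names the
   same element. A string literal is an array too, so 2["JHONSON"]
   is the third character of the literal. */
int subscript_forms_agree(void)
{
    int a[4] = { 10, 20, 30, 40 };
    int i = 2;
    return a[i] == *(a + i)
        && a[i] == i[a]
        && &a[i] == &i[a]
        && 2["JHONSON"] == 'O';
}
```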

- Jonathan Kirwan

Fri, Jan 13, 2006 8:23 PM

To expand on this:

const char *str1 = "some string";

It produces something akin to:

char unnamed[] = "some string";
const char *str1= &unnamed[0];

where 'unnamed' isn't accessible by name, at the language level.

...

However, there is an interesting question this called to mind that I'm not precisely clear on, though I can explain my own mental model about it. This is the issue of the string initializer itself.

I believe the array contents of 'unnamed' need not be (but may be) modifiable. In other words, the compiler may or may not use read-only memory such as flash for the location of 'unnamed's array of chars.

I'm a little unclear on this point, though. So perhaps someone can cite the section of the standard that clarifies this.

In other words, do these two statements set up different type qualifiers for the literal?

char *str1= "Hi there.";
char const *str2= "Hi there.";

Would a C compiler be allowed to "fold" these two constant arrays so that they occupy the exact same memory?

I'm unclear on that, though my own mental model says that a compiler may fold these together, despite the fact that the pointers to the then-identical array of characters do NOT have the same type qualifiers. And, further, I think whether or not the compiler places such unnamed strings in read-only memory or read-write is not specified by the language and the compiler is allowed to make either choice.

But I don't know.

Jon

- Jonathan Kirwan

Fri, Jan 13, 2006 8:25 PM

Understood.

See Harbison and Steele, 5th edition, page 141. I think it states what I wrote.

Jon

- Dave Hansen

Fri, Jan 13, 2006 8:44 PM

[...]

Well, maybe I'm just getting nitpicky. Which is why I qualified my statement with "arguably."

My point was that it's not a positional thing, e.g., that i[a] is not equivalent to *((&i[0])+(a)). As I stated in my earlier post, the unadorned name of an array devolves into a pointer to the first element of that array. In the reference, Harbison names the mechanism by which this occurs (the usual unary conversions).

Note that in the same paragraph he says what I said above, namely i[a] is equivalent to a[i].

Regards, -=Dave

- Jonathan Kirwan

Fri, Jan 13, 2006 8:51 PM

That's fine. Now, have you looked at my question regarding the contents of unnamed string initializers? I've taken a brief look at the C99 standard and haven't found a specific answer (which may mean it is buried in a chain of logic). I have also looked through three Harbison and Steele editions without luck. I may jump into my compiler books (like the two editions of the 'dragon' book, and others) to see if I can find it referenced there. But I don't have a good model for this in my head, and looking at these examples exposed my own ignorance on that specific point.

Jon

- Jonathan Kirwan

Fri, Jan 13, 2006 9:08 PM

Actually, I think folding two strings together (which is an option on some compilers) may be against the standard. So that part of the question may not be relevant. But the subtler question remains as to whether or not the string initializers themselves should be considered unmodifiable by a programmer. I'd argue that they must be considered unmodifiable, as some operating systems/compilers may place these constants in "program text areas" or otherwise in read-only protected memory. So that when saying:

char *s1= "hello";

you are doing something like,

char *s1= (char *) ((const char []) { "hello" });

Jon

- Dave Hansen

Fri, Jan 13, 2006 9:09 PM

I don't think the Dragon book will help. It's really a language issue rather than a compiler issue.

My copy of the standard is in a box somewhere, but H&S5, p. 33, about 3/4 of the way down the page, says (asterisks indicate *bold* text):

"*Storage for string constants.* You should never attempt to modify the memory that holds the characters of a string constant since that memory may be read-only -- that is, physically protected from modification."

IIRC, the type of string literals is "array of (non-const, plain) char," but writing to them results in undefined (rather than implementation-defined) behavior. You might check with the denizens of comp.std.c to be sure...
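
Here is a sketch of what this means in practice. The commented-out lines show the undefined case, and edit_copy() and greeting() are illustrative names.

```c
/* Undefined behaviour, do not do this:
       char *s1 = "hello";
       s1[0] = 'J';
   The literal may sit in read-only memory, or be shared with another
   "hello" elsewhere in the program. */

/* Safe: s2 is an ordinary array initialized from the literal, so it is
   a separate, writable copy of the characters. */
char edit_copy(void)
{
    char s2[] = "hello";
    s2[0] = 'J';
    return s2[0];
}

/* Pointing at literals through const char * turns an accidental write
   into a compile-time error. Returning the pointer is fine because the
   literal's array has static storage duration. */
const char *greeting(void)
{
    return "hello";
}
```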

Regards, -=Dave

- Jonathan Kirwan

Fri, Jan 13, 2006 9:12 PM

I think that settles it. Thanks, Dave. And it makes sense, too. Which is a good thing.

Jon

- Alex Colvin

Fri, Jan 13, 2006 9:28 PM

well, your assignment's probably overdue, but as it's been bugging you...

he didn't specify what to do with the remaining characters. maybe str1 = ""; or str1 = "please finish my assignments"; it's not clear what role str2 plays in this question.

--
	mac the naïf

- John B

Fri, Jan 13, 2006 9:38 PM

On 13/01/2006 the venerable Jonathan Kirwan etched in runes:

We must also remember that C was devised for use on a machine with von Neumann architecture. Many modern microcontrollers use a Harvard architecture, and this presents compiler writers with the problem of where to put unmodifiable data.

--
John B

- Jonathan Kirwan

Sat, Jan 14, 2006 2:56 AM

The PDP-11, for one.

I'm intimately aware of that.

Some thoughts on that:

The problem presented by any compiler used in dedicated embedded situations is all the work it takes to meet the spec before arriving at main(). At the time C was being developed, the current thinking about running program environments included the following functional classifications:

Segment Name   Segment Description
-------------------------------------------------
CODE           Code section
CONST          Constant data section
INIT           Initialized data section
BSS            Uninitialized data section
HEAP           Heap section
STACK          Stack section

[Actually, the very concept of 'stack' as a general-purpose workhorse was still in its childhood. Many of the existing and commercially successful machines did NOT support the idea of a stack in hardware, and a great deal of code had been written completely without one, except as a specialized concept for certain problems. (I worked on such operating systems and languages.) The heap was kind of new, too. The PDP-11 was only just out around 1970 or so, to light the way out of the darkness. :) ]

On a von Neumann machine, all of these are in the same address space. Modern concepts weren't completely worked out, and the PDP-11 included support for several equally good conventions. But the gist of the above is that the stack would grow down, the heap would grow up, and the other four sections were each of fixed size at start. Only the CODE, CONST and INIT sections needed to be kept "on disk" or in some form of non-volatile storage (which could, of course, include cards, tape, or whatever). Neatly, the non-volatile portions are all of fixed size.

In other words, like this:

Section Description   Access       NV?   Size
=================================================================
Code                  Execute      Yes   Fixed/static
Constants             Read         Yes   Fixed/static
Initialized Data      Read/Write   Yes   Fixed/static
Uninitialized Data    Read/Write   No    Fixed/static
Heap                  Read/Write   No    Variable, up
Stack                 Read/Write   No    Variable, down

If you look at the above list and think about Harvard architectures and C programming generally, you find that the code must be placed in code memory while the others must all be placed in data memory. On von Neumann, this is one memory system; on Harvard, there are at least two.

But this is NOT a problem, really. Even in the Harvard case. In fact, it's not too far from how an operating system would do it under the Intel 80386 and above, if it wanted to implement an execute-only region for the code (you can't read it as data.) And the 80386 is NOT Harvard.

Also, keep in mind that the first three must still be stored somewhere in non-volatile memory. In the case of von Neumann systems with on-chip flash, this is fairly easy: just place them there. Both code and data can be accessed without having to move any of it around.

A question for C in embedded use on Harvard machines comes in the use of pointers. A pointer to code memory may NOT occupy the same memory footprint (in other words, the sizeof() of the two pointer types may differ), and the actual instructions used to access these different types of memory may be different. The size difference can be fixed by requiring the larger of the two sizes for all (in other words, making a union). And code generation can simply depend on the declaration of the pointer. I believe casting can also be handled. So, frankly, neither of these is insurmountable, and it is quite possible for a C compiler to accept straight C code and generate functioning programs on Harvard machines without special decorations/declarations.

For Harvard, a redefinition of the von Neumann layout is in order if you want to be able to port code as easily as to another von Neumann system. Something like these functional areas:

Segment Name   Segment Description
-------------------------------------------------
CODE           Code section
CONST_copy     Data for constant section
INIT_copy      Data for initialized data section
CONST          Constant data section
INIT           Initialized data section
BSS            Uninitialized data section
HEAP           Heap section
STACK          Stack section

In this case, the first three must be placed in non-volatile memory (flash, for example), and the rest can be placed in volatile memory. At start, pre-main() code copies CONST_copy into CONST and INIT_copy into INIT before starting main(). If this is done, then once again all data memory is accessible as data, and Harvard works consistently with C's model, I think.
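
Here is a host-runnable sketch of that pre-main() step. The section names follow the table above, but the arrays are hypothetical stand-ins: real startup code gets the section addresses from the linker script. Zeroing BSS, which C also requires, is included as the usual companion step.

```c
#include <string.h>

/* Hypothetical stand-ins for linker-defined sections. On a real part
   the *_copy images live in flash and the others in RAM; here they are
   plain arrays so the sketch runs anywhere. */
static const unsigned char const_copy[] = "JHONSON";      /* CONST_copy */
static const unsigned char init_copy[]  = { 1, 2, 3, 4 }; /* INIT_copy  */

unsigned char const_ram[sizeof const_copy];               /* CONST */
unsigned char init_ram[sizeof init_copy];                 /* INIT  */
unsigned char bss_ram[16];                                /* BSS   */

/* The startup work that runs before main() is called. */
void startup(void)
{
    memcpy(const_ram, const_copy, sizeof const_copy); /* CONST_copy -> CONST */
    memcpy(init_ram, init_copy, sizeof init_copy);    /* INIT_copy -> INIT   */
    memset(bss_ram, 0, sizeof bss_ram);               /* C needs BSS zeroed  */
}
```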

For embedded Harvard processors -- the only difficult problem in the above is if there are _no_ instructions which can read from code space and if the code space is the only non-volatile memory present. In such cases, I believe, space will have to be reserved in data memory and code instructions must be able to use immediate-mode constants they can load into registers and then place into data memory to initialize them to specific values. That would be painful, but doable, if instructions support some form of immediate mode and there is enough code space, of course.

So the bottom line, I think, is that Harvard really isn't an insurmountable problem for C compilers accepting unvarnished C code, granted an instruction type or two on the target.

That is, until you worry about practical things like scarce resources, such as RAM. It is one thing for INIT_copy to be copied into INIT: there is no avoiding the need to use RAM for initialized data that can later be modified. It has to be in RAM. But CONST_copy must then also be copied into RAM, where it can be accessed as data, and it would be nicer if those constants could instead just remain in code space and not take up RAM at run time.

So that can be a bad thing. There may not be very much RAM to go around. So suddenly you have a strong desire to access data that sits in code space (if data space doesn't include non-volatile memory). But if you cave in to that desire, you may have another problem: passing around pointers to data which may be either in code space or data space. In that case, you either need to fashion data pointers which support both (and that will likely balloon the code as well as slow execution) or else have the compiler emit code for one kind of access, where that routine then cannot accept pointers to the other space. I was faced with this problem, for example, on the PIC chips, in a routine that was basically a "printf()" accepting strings which could _either_ be in a RAM buffer _or_ constant literals located in code space.
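
One way to build a pointer that supports both spaces is a tagged generic pointer. The following is a host-side model with made-up names (struct gptr, read_code(), read_data()). A real compiler would emit the target's code-space read instructions instead of calling read_code().

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: two separate memories, each with its own
   access routine standing in for different instructions. */
static const unsigned char code_space[] = "CONST MSG"; /* literal in flash */
static unsigned char data_space[16] = "RAM BUF";       /* buffer in RAM    */

enum space { SPACE_DATA, SPACE_CODE };

/* A "generic" pointer: an address plus a tag naming its memory space. */
struct gptr {
    enum space space;
    uint16_t addr;
};

static unsigned char read_code(uint16_t addr) { return code_space[addr]; }
static unsigned char read_data(uint16_t addr) { return data_space[addr]; }

/* Every dereference has to test the tag: more code, slower access. */
unsigned char gptr_read(struct gptr p)
{
    return p.space == SPACE_CODE ? read_code(p.addr) : read_data(p.addr);
}

/* A string routine (the core of a printf-like function) that accepts
   strings in either space. */
size_t gstrlen(struct gptr p)
{
    size_t n = 0;
    while (gptr_read(p) != 0) {
        ++n;
        ++p.addr;
    }
    return n;
}
```

The tag test in gptr_read() on every byte is where the extra code and slower execution come from. A compiler that only emits one kind of access avoids that cost, but then cannot accept the other kind of pointer.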

So vendors do expand things by adding type qualifiers or #pragma statements. But mostly they do it to be competitive and sell their product, not so much because they absolutely have to. Harvard can be painful for vanilla C, and plain C can be uncompetitive against compilers with decorations added, but I don't think it is impossible in principle.

Jon

- Hans-Bernhard Broeker

Mon, Jan 16, 2006 2:12 PM

That's still incorrect. The actual rule is that any expression of array type (not just "the name of an array variable"!) used in a way that requires a pointer, automagically turns into a pointer to its first element. The canonical counter-example is that

sizeof(array) == sizeof(&(array[0]))

will be true only rather rarely.

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.
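
Here is a sketch of the counter-example, plus a few expressions of array type that are not names of array variables. All names are illustrative.

```c
#include <stddef.h>

static int array[10];

/* sizeof sees the whole array; &array[0] is just a pointer.
   The two sizes match only by coincidence. */
size_t array_size(void)   { return sizeof(array); }        /* 10 * sizeof(int) */
size_t pointer_size(void) { return sizeof(&(array[0])); }  /* sizeof(int *)    */

/* Other expressions of array type: */
struct rec { char name[8]; };

size_t literal_size(void) { return sizeof "hello"; }  /* 6: array of 6 char */

const char *member_as_pointer(const struct rec *r)
{
    return r->name;                  /* r->name decays to &r->name[0] */
}

size_t deref_size(void)
{
    static int m[4];
    int (*pa)[4] = &m;               /* pointer to a whole array */
    return sizeof *pa;               /* *pa has array type: 4 * sizeof(int) */
}
```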

- Hans-Bernhard Broeker

Mon, Jan 16, 2006 2:21 PM

Incorrect. Constant folding was even among the rationales for making string literals non-modifiable.

They *must* be considered immutable. Any program assuming differently will cause undefined behaviour. Which means that such a program would be about as fundamentally buggy as a program can possibly be and still make it through some compilers.

As I've written here before, the fact that the type of "hello" is formally "array of char" instead of "array of const char" is a historical accident.

- Hans-Bernhard Broeker

Mon, Jan 16, 2006 2:29 PM

Harvard vs. von-Neumann architecture has little or nothing to do with that. The only question is whether the target platform has some kind of read-only memory accessible as data (rather than only as code) or not. If data ROM is available, string literals should usually go there. If there isn't, it doesn't matter where you put them.

- Jonathan Kirwan

Mon, Jan 16, 2006 6:31 PM

Constant folding would argue exactly that, of course. But the rationale is not the standard, and I would be interested in the section of the standard where this is addressed. Anyway, in the back of my mind I was guessing about whether pointers to two different instances of the same literal text must be distinct, or whether they might be allowed to have the same address. I haven't thought deeply about it and I don't recall reading a specific point in the standard, so that was my guess. Now, I suppose, I'll have to go more deeply into it. Unless you have a ready citation.

And thanks for the point.

Makes a lot of sense to me.

I didn't read what you wrote before. Thanks.

Jon

- Hans-Bernhard Broeker

Mon, Jan 16, 2006 6:51 PM

Well, I don't have a copy of C89, but see K&R2 (ANSI C edition), appendix section A.2.6. Also see the C FAQ, entry 1.32, and references therein.

They're explicitly neither forbidden, nor guaranteed, to be the same. This falls under the general heading of unspecified behaviour (C99 6.4.5).

Some people may think that the need for a distinction between two objects being the same vs. them being equal is a novelty of OO programming. Well, it's not.
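
The same-versus-equal distinction for string literals can be sketched in C. strcmp() compares contents, while == compares identity, which the standard leaves unspecified for identical literals. The function names are illustrative.

```c
#include <string.h>

/* Equal contents: always true for two occurrences of the same literal. */
int literals_equal(void)
{
    const char *str1 = "Hi there.";
    const char *str2 = "Hi there.";
    return strcmp(str1, str2) == 0;
}

/* Same object: 1 if the compiler folded the two literals into one
   array, 0 if it kept them distinct. Portable code must not depend
   on either result. */
int literals_same_object(void)
{
    const char *str1 = "Hi there.";
    const char *str2 = "Hi there.";
    return str1 == str2;
}
```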
