arm-gcc: pointer to constant string

--- 1.c ---
void dummy(const char *ss);   /* defined in 2.c */

void foo(void)
{
    dummy("hello");
}

--- 2.c ---
static const char *s;

void dummy(const char *ss)
{
    s = ss;
}

"hello" is a constant string declared inside function foo(). Is there a guarantee that this pointer (passed to dummy()) is still valid after foo() returns? In other words, is the string on the stack (in which case it would not be valid after foo() returns) or in Flash (so it is permanent)?

pozz

It is not guaranteed by C, AFAIUI - the literal is only valid during its lifetime, which ends when the call to "dummy" returns. However, gcc always places such strings in flash and it will remain valid throughout the program's lifetime. I cannot see that changing in any future version of gcc.

But note that the same does not apply to pointers to other objects that might reasonably be put on the stack. And it may not apply to targets like the AVR where strings get copied from flash to ram before use.
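A minimal single-file sketch of the OP's scenario, assuming a hosted gcc build rather than the actual ARM target. The names `saved` and `saved_string()` are hypothetical; the accessor exists only so the stored pointer can be inspected after foo() returns. With gcc the literal sits in static storage, so the pointer stays usable; the commented-out variant is the stack case that would leave it dangling.

```c
#include <assert.h>
#include <string.h>

/* File-scope pointer, playing the role of `s` in the OP's 2.c. */
static const char *saved;

/* The OP's dummy(): it only remembers the pointer it is given. */
void dummy(const char *ss)
{
    saved = ss;
}

/* The OP's foo(): passes a string literal and returns. */
void foo(void)
{
    dummy("hello");
}

/* Hypothetical accessor (not in the original code) so the caller can
   look at the remembered pointer after foo() has returned. */
const char *saved_string(void)
{
    return saved;
}

/* By contrast, this would leave `saved` dangling once it returns,
   because `buf` has automatic storage duration:

   void foo_unsafe(void) { char buf[] = "hello"; dummy(buf); }      */
```

Calling foo() and then reading saved_string() should give "hello"; doing the same after foo_unsafe() would be undefined behaviour.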

David Brown

Yes. K&R II (The C Programming Language, 2nd Ed), says that

(A2.6) "A string has type 'array of characters' and storage class 'static'."

and

(A4) "Static objects [...] retain their values across exit from and reentry to functions and blocks."

--
Nils M Holm  < n m h @ t 3 x . o r g >  www.t3x.org
Nils M Holm

K&R does not define the C language. It was an approximation to a definition until ANSI (then ISO) made the standard, and has not been relevant for decades. The standards copied most of the features and rules given in K&R, but were not absolute about it. Accurate draft versions of the C standards are freely available on the net - the current document of choice is N1570 for C11.

No, it is not - not for decades. In C, "string" is defined in 7.1.1 of the current standard as "A string is a contiguous sequence of characters terminated by and including the first null character."

In particular, there is /no/ mention of storage class, and strings are independent of the storage class.

Also, here the OP is asking about the practical lifetime of a string literal, not a string. Literals, like other constants (using the C standards' definition of "constant"), do not have lifetimes or scopes like objects - the nearest you have is the period of validity of the temporary pointer to the string literal. And in the OP's code, that is during the call to "dummy".

A C compiler /can/ implement the function "foo" in this manner:

void foo(void)
{
    char s[STRING_LIT_LENGTH_HELLO];    // On stack
    copy_string_literal_from_compressed_storage(s, STRING_LIT_IDENTIFIER_HELLO,
                                                STRING_LIT_LENGTH_HELLO);
    dummy(s);
}

That would - AFAIUI - be legal in C. It is not the way gcc does it on any targets I have seen (not even on the AVR, where the string must be copied to ram in a case like this). In particular, for the ARM it would be very strange for a compiler to do anything that would not involve having the string literal at a fixed place in read-only flash.
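One quick empirical check of how a particular compiler treats the literal: if "hello" is a single array with static storage duration, every evaluation of it yields the same address, whereas a fresh per-call stack copy like the one sketched above generally would not. This is a sketch for a hosted gcc build; `literal_address()` is a hypothetical name.

```c
#include <assert.h>

/* Each call evaluates the same string literal and returns where it lives. */
const char *literal_address(void)
{
    return "hello";
}
```

With gcc, comparing the results of two calls gives equal pointers, and the text is still readable after the function has returned.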

The current C definition of static objects' lifetime is given in 6.2.4p3, and is more accurate than the old K&R notes. They are not relevant here, however, since string literals are not objects.

(And if it is relevant to the OP, the rules for C++ are a bit more complicated. But C-string literals will be handled the same way by the ARM gcc compiler.)

David Brown

You can look it up yourself: just add -Wa,-adhlms=myfile.lst to your command line (this is for myfile.c).

Here is your example, slightly changed to prevent the optimizer from making the code disappear entirely:

static const char *s;

void dummy(const char *ss)
{
    s = ss;
}

const char *foo(void)
{
    dummy("hello");
    return s;   /* added to keep the code */
}

And the compiled code for Cortex-M3 (somewhat shortened):

   1                   .cpu cortex-m3
  11                   .file "pozz.c"
  12                   .section .text.dummy,"ax",%progbits
  13                   .align 1
  14                   .global dummy
  15                   .syntax unified
  16                   .thumb
  17                   .thumb_func
  18                   .fpu softvfp
  20               dummy:
  21                   @ args = 0, pretend = 0, frame = 0
  22                   @ frame_needed = 0, uses_anonymous_args = 0
  23                   @ link register save eliminated.
  24 0000 7047         bx lr
  26                   .section .text.foo,"ax",%progbits
  27                   .align 1
  28                   .global foo
  29                   .syntax unified
  30                   .thumb
  31                   .thumb_func
  32                   .fpu softvfp
  34               foo:
  35                   @ args = 0, pretend = 0, frame = 0
  36                   @ frame_needed = 0, uses_anonymous_args = 0
  37                   @ link register save eliminated.
  38 0000 0048         ldr r0, .L3
  39 0002 7047         bx lr
  40               .L4:
  41                   .align 2
  42               .L3:
  43 0004 00000000     .word .LC0
  45                   .section .rodata.str1.1,"aMS",%progbits,1
  46               .LC0:
  47 0000 68656C6C     .ascii "hello\000"
  47      6F00
  48                   .ident "GCC: (15:6.3.1+svn253039-1build1) 6.3.1 20170620"
--

The string constant goes into its own section (.rodata.str1.1), which the linker script then places wherever it directs (often in ROM/Flash).
Tauno Voipio

If I were pedantic and wanted to be sure the pointer to the string is valid after dummy(), what should I do? Should I declare the string static (even *in* the function)?

void foo(void)
{
    static const char s[] = "hello";
    dummy(s);
}

Again the solution should be to explicitly declare objects as static.
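A minimal sketch of this static-array variant, assuming a hosted build; `kept`, `remember()`, `foo_static_array()` and `kept_string()` are hypothetical names standing in for the OP's `s`, `dummy()` and `foo()`. Because the array itself has static storage duration, its lifetime is the whole program run, independent of how the compiler handles the literal used to initialise it.

```c
#include <assert.h>
#include <string.h>

static const char *kept;

/* Stand-in for the OP's dummy(). */
void remember(const char *p)
{
    kept = p;
}

/* The array is static: it exists for the whole program, so the pointer
   handed to remember() stays valid after this function returns. */
void foo_static_array(void)
{
    static const char s[] = "hello";
    remember(s);
}

/* Hypothetical accessor for inspecting the remembered pointer. */
const char *kept_string(void)
{
    return kept;
}
```

Calling foo_static_array() twice hands out the same address both times, since there is only one such array.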

pozz

It should, I think, be fine to say:

void foo(void)
{
    static const char* s = "hello";
    dummy(s);
}

I believe that will mean that the compiler is obliged to make the string literal available for the lifetime of "s", which is program lifetime.
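The two spellings are not quite the same thing, which a sketch makes visible; the function names here are hypothetical. The static pointer is just a pointer (with static storage duration) aimed at the literal's array, while the static array holds its own copy of the six bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Static pointer initialised with the literal: sizeof sees a pointer. */
size_t pointer_variant_size(void)
{
    static const char *s = "hello";
    return sizeof s;          /* size of a pointer, not of the text */
}

/* Static array holding its own copy of the text. */
size_t array_variant_size(void)
{
    static const char s[] = "hello";
    return sizeof s;          /* five characters plus the null: 6 */
}
```

In both cases the object named `s` lives for the whole program; the difference is whether the text lives in `s` itself or in the literal that `s` points to.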

But I would be confident writing it as you did for arm-gcc, as the compiler implements strings in a way that keeps them all for the lifetime of the program. (I assume you are not doing anything nasty and undefined, like trying to change the values of strings somewhere.)

And it is also possible that I have this wrong, and that the compiler /must/ keep the string literal around for the lifetime of the program. comp.lang.c would be the newsgroup to look for extra opinions here.

For the AVR, the solution is usually to use the macros and attributes needed to access strings directly from flash to avoid wasting ram space. But it's a slightly awkward target.

David Brown

You must be talking about K&R; Nils specifically said K&R II. That edition has "ANSI C" printed in large capitals on the cover. A quote from the preface: "This second edition of 'The C Programming Language' describes C as defined by the ANSI standard"

That standard is almost identical to C90, IIRC. And that standard, however old, is still relevant. The Keil ARM compiler, for instance, defaults to C90, and you have to enable C99 specifically (and nothing newer is available unless you set it to C++ 2003).

So don't throw out your K&R II just yet! I have my copy within an arms reach. ;-)

--
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail) 

In 1914, the first crossword puzzle was printed in a newspaper.  The 
creator received $4000 down ... and $3000 across.
Stef

Exactly - it /describes/ the C language defined by the standard. It does not define the language, or the standard, or give a complete or accurate set of rules for it. That is what the /standard/ is for. K&R (all editions) are /tutorials/, not standards or language definitions. (Equally, the standards are not tutorials, and say as much in their forewords.) K&R was updated for the second edition to contain much of what is often incorrectly termed "ANSI C", but it was never expected to be exact.

K&R is considered a "classic" in terms of quality technical writing. It is not a good book for learning modern C. It is not a good book for learning embedded programming. It is not a good book to use as a reference for the details of the C language. It is better than many others, but it is badly outdated and contains some horrible advice and examples. It is strongly oriented towards command-line Unix software from the '80s and '90s, which is completely inappropriate for embedded development - and for the sort of tasks targeted in the book, C is almost always a poor choice in modern-day programming.

It is true that many compilers default to C90. That is not an excuse for using C90 - if you are serious about C development, you should be using /your/ choice of C standard, not the default from the compiler. It is an unfortunate reality that some people still have to use compilers that don't support C99 - that is, obviously, a good reason for using C90. Other than that, /you/ should choose your version of the language - and you should choose at least C99 because it lets you write higher quality software. And if your tool vendor does not support C11 by now, complain to them.

The year is 2018. Why anyone would think that a 30 year old tutorial book is a better choice of reference than the current official standards is beyond my comprehension.

So K&R (I or II) is an interesting read, and interesting history - but I would not consider it a way to learn good embedded C programming, nor would I consider it a useful reference.

And if you want more readable references then there are plenty available online. My recommendation (for C and C++) is .

David Brown

Mine is the C++ FQA (sic):


The FQA isn't completely up to date; but then the nice thing about C "standards" is that there are so many to choose from.

Tom Gardner

That is a page about C++, not C. And it is widely ridiculed for being repetitive, incorrect, outdated, wrong, exaggerated and basically misunderstanding a huge number of issues about C++. It certainly has some valid points - C++ has plenty of faults, and the improvements to the language that have come in recent standards show there was plenty that could be improved upon. But the valid points in that "FQA" are deeply hidden amongst the drivel.

With C, there are three standards to choose from - C90, C99 and C11. (There is an upcoming C17, but it is basically a bug-fix and occasional improved wording, so there is not much to be found there except nicer typesetting.)

And there are also implementation-specific extensions - both compiler specific and target-specific.

Generally, it is not hard to know which C version you are dealing with, and there is very little code that is valid in one version that is not equally valid in later versions.
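One way to check, sketched under the assumption of a conforming compiler: the predefined macro `__STDC_VERSION__` is absent in plain C90 and carries 199409L (C90 with Amendment 1), 199901L (C99), 201112L (C11) or 201710L (C17) in later modes. The function name is hypothetical.

```c
#include <assert.h>

/* Reports which C standard this translation unit is compiled against,
   based on the predefined __STDC_VERSION__ macro. */
const char *c_standard_name(void)
{
#if !defined(__STDC_VERSION__)
    return "C90";
#elif __STDC_VERSION__ >= 201710L
    return "C17 or later";
#elif __STDC_VERSION__ >= 201112L
    return "C11";
#elif __STDC_VERSION__ >= 199901L
    return "C99";
#else
    return "C90 with Amendment 1";
#endif
}
```

The same test can go in a header as an `#error` guard if a project must not be built in the wrong mode.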

David Brown

K&R II is only partly a tutorial; the book itself puts that label only on chapter 1. Appendix A is definitely meant as a reference manual. But it is presented as a more readable (but less concise) text than the standard (which is "intended foremost for compiler writers" :-) ).

Indeed, if you are new to C (or programming), don't start with K&R II. But if you (must) program in C90, it is a handy reference to have at hand. Much more readable than any standard. But indeed do not expect it to provide all the details.

'Better' is not the word here - 'better read', perhaps? If you (must) program in C90, K&R II is nice to have at hand; it can answer >90% of your questions (and >85% for C99?), and mostly quicker than going online. So still useful in my opinion, but not a definitive and complete reference.

For the other 10-15% ;-)

--
Stef    (remove caps, dashes and .invalid from e-mail address to reply by mail) 

You've been leading a dog's life.  Stay off the furniture.
Stef

Don't forget the commercially important /subsets/! The need for subsets is a warning flag in itself.

That statement, while correct, skims over a significant problem: unfortunately too few people understand and can /predict/ what is valid and invalid.

That's a problem with/for the tool, whether it is the tool's fault or the users' fault.

Tom Gardner

That is the library definition of "string". The language definition of a string literal says:

# In translation phase 7, a byte or code of value zero is appended to
# each multibyte character sequence that results from a string literal
# or literals. The multibyte character sequence is then used to
# initialize an array of static storage duration and length just
# sufficient to contain the sequence. [...]

(paragraph 5 in ISO 9899:1999, paragraph 6 in n1548)
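The quoted rule can be observed directly, as a small sketch with hypothetical names: `sizeof` applied to the literal sees the array type, whose length is the five characters plus the appended zero byte, and a pointer to that array remains usable after the function that produced it returns.

```c
#include <assert.h>
#include <string.h>

/* sizeof sees the array type of the literal: 5 characters + 1 zero byte. */
enum { HELLO_ARRAY_SIZE = sizeof "hello" };

/* The returned pointer refers to the literal's static array. */
const char *hello_literal(void)
{
    return "hello";
}
```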

Sure, because the library does not care.

No, it can not. The literal in OP's code needs to have static storage duration.

Maybe you're confusing it with

void foo(void)
{
    char s[] = "hello";
    dummy(s);
}

This will be implemented exactly as you've sketched, but here the programmer explicitly said they wanted an array with automatic storage duration.
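A sketch of what "automatic storage duration" means for that version (the function name is hypothetical): every call gets a fresh array initialised from the literal, so the code may modify it without touching the literal, and the next call starts again from "hello". The flip side is that the array ceases to exist when the function returns, which is exactly why passing it to dummy() would leave a dangling pointer.

```c
#include <assert.h>

/* Returns 1 if the automatic array starts out as a fresh copy of "hello". */
int starts_fresh(void)
{
    char s[] = "hello";        /* automatic array, re-initialised each call */
    int fresh = (s[0] == 'h');
    s[0] = 'j';                /* allowed: modifies the copy, not the literal */
    return fresh;
}
```

Calling it repeatedly keeps returning 1, because the edit made in one call does not survive into the next.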

TL;DR: OP's code is safe, language-wise.

Of course, this is comp.arch.embedded, and people do strange things with their silicon and their compilers, so I wouldn't be too surprised if there were a compiler that needed special measures. But ARM has a nice flat address space, so no problem here.

Stefan

Stefan Reuther

What need of "commercially important subsets" are you talking about?

There /are/ compilers that implement subsets of C standards. But I don't see a /need/ for them.

Of course they can. Very few people ever write code that is valid C90 code and not valid C99 or C11 code. There are /very/ few incompatibilities. About the only one of significance I can think of is that functions no longer have implicit declarations. What examples can you give of reasonable, valid C90 code that is not C99 compatible, or C99 code that is not C11 compatible? Or C99 code that is not gnu99 compatible?
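A sketch of the incompatibility mentioned above, with hypothetical names: in C90 a call to an undeclared function implicitly declared it as returning int, while C99 and later require a declaration to be visible first. Keeping a prototype in front of the call makes the same code valid in every version.

```c
#include <assert.h>

/* Without this prototype, C90 would implicitly declare twice() at the
   call below; C99 and later reject the call instead. */
int twice(int x);

int use_twice(void)
{
    return twice(21);
}

int twice(int x)
{
    return 2 * x;
}
```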

David Brown

When K&R was originally written, there was no C reference document - the nearest was a manual for a Unix C compiler. K&R II was, IIRC, published slightly before ANSI C. I suppose there are really two meanings of the word "reference" here. K&R had a C "reference" in the sense of a list of rules that you could look up things when you want some details about the language - the standards are "references" in the sense that they say how the language is defined. If K&R says one thing, and the C standards say another, then the C standards are - by definition - correct for the C language.

(I agree that the C standards are not particularly readable. I have got quite familiar with them over the years, but I still get caught out occasionally.)

Again, I agree the standards are not readable - they are more for specialists, pedants, nerds and compiler writers (I think there is a fair overlap between these groups - with myself in the middle). To be honest, I don't know what books would make a good starter or a good reference (in the readable sense) for modern C programmers. I haven't needed a book on the C language - not since I read K&R some three decades ago.

David Brown

Yes, I know the difference between a string and a string literal - I said that in a bit you snipped.

And the library definition of a "string" /is/ the C definition of "string". The term is italicised in 7.1.1, denoting that it is defined in that paragraph.

I know that's what the OP's code needs in order to be correct - the question is if the C standards guarantee that his code does what he wants.

No, I am not.

I know what the programmer wants, and I know what the programmer wrote. I know that with the compiler the programmer is using, the code he wrote does what he wants. The question is if that behaviour is guaranteed by the C standards, or is merely the most practical implementation in most cases. I cannot say for sure, but I believe the C standards do not guarantee the desired behaviour - I believe an implementation would be allowed to give the kind of implementation I described.

Can you back that up with references to the standard?

I would be surprised to see an implementation - especially on an ARM - which did not work the way the OP wants. A quick check with gcc on the AVR (where the string data would have to be copied from flash to ram) shows it works there too. Perhaps someone would like to check it on other AVR C compilers, or more awkward processors.

David Brown

MISRA-C, for example.

You miss my point.

I am not arguing about the /differences/ between, say C99 and C11, but C of /any/ standard.

Tom Gardner

I can easily think of cases where the string would not be accessed from read-only flash on the ARM. The simplest is a system where the program is stored on non-execute-in-place media and copied into RAM by some form of loader. The string would then exist in RAM, not flash, and while it may not be enforced as read-only, the program (unless it has performed UB) is not going to invalidate the string.

Richard Damon

That is a set of rules for C, rather than a subset as such. (And 90% of it could be replaced by three rules - 1. Learn the language properly and don't write invalid code. 2. Write in a readable fashion. 3. Get proper tools and learn how to use them.)

Of course if a specific task, project, employer or customer has rules for how you work, you need to know them and follow them. But that is not "a subset of the language", or the fault of the language.

Not unlikely!

I still don't get it. Do you mean, say, the difference between C99 on gcc and C99 on IAR ? Or the difference between C on an ARM and C on an AVR?

David Brown

ElectronDepot website is not affiliated with any of the manufacturers or service providers discussed here. All logos and trade names are the property of their respective owners.