How to eliminate duplicate strings?

Short status: Target is 68332; compiler and linker are Microtec C++.
Application is written mainly in C++.

We are running out of FLASH memory, and a check of the linker map
revealed that 800 Kbyte out of almost 2 Mbyte is used for the strings
segment. Quite a lot for an embedded system with no GUI.

Further checks with the cygwin command

 > strings prom.bin |sort|uniq -c

reveal that most of the strings are RTTI information for C++, and
many are repeated 50 or 100 times!

(strings finds printable strings in the binary; sort and uniq are used
to sort the strings and count duplicates.)

The raw output of strings is approx. 800K as expected, and if the
duplicates are removed it is squeezed to 120K!
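
For readers without cygwin at hand, the same measurement can be sketched in Python. This is only an approximation of the `strings | sort | uniq -c` pipeline above: the function names, the minimum length of 4 and the one extra byte counted per string for a NUL terminator are assumptions, and the sample bytes are made up rather than taken from prom.bin.

```python
import re
from collections import Counter


def printable_strings(data, min_len=4):
    """Return runs of printable ASCII, a rough approximation of strings(1)."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return pattern.findall(data)


def duplicate_report(data, min_len=4):
    """Count each string and estimate the size with and without duplicates."""
    counts = Counter(printable_strings(data, min_len))
    # Assumption: every stored string costs its length plus one NUL byte.
    raw = sum((len(s) + 1) * n for s, n in counts.items())
    unique = sum(len(s) + 1 for s in counts)
    return counts, raw, unique


if __name__ == "__main__":
    # Made-up image fragment; a real run would read the binary from disk.
    sample = b"\x00Foo<int>\x00\x01\x02Foo<int>\x00Bar\x00abcd\x00"
    counts, raw, unique = duplicate_report(sample)
    for text, n in counts.most_common():
        print(n, text.decode("ascii"))
    print("raw bytes:", raw, "after merging:", unique)
```

As with strings itself, code or data that happens to look like text is counted too, so the numbers are an upper bound on what merging could save.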

Is there a way to eliminate the duplicate strings? Logically the
linker should be able to analyze what is entered into the strings
segment, and eliminate identical strings that are already there.

Since the object format is said to be IEEE, it may be possible to use
another linker, e.g. GNU ld, without replacing the compiler (which has
its "specialities").

Has anyone tried that?

--
mdc at manbw dk  -  MAN B&W Diesel A/S, Copenhagen
www.manbw.com    -  Electronics & software dept.
Re: How to eliminate duplicate strings?

Uhm, maybe a stupid question, but why compile with RTTI?

Imo RTTI can be handy, but you hardly ever *really* need it. If the
application really uses RTTI, maybe a redesign is in order to eliminate the
need for it?

PeterV




Re: How to eliminate duplicate strings?


As stated earlier in this thread, we do need RTTI. Although this is an
embedded system, we use templates and dynamic casts. Nobody really
wants to give an estimate of the redesign needed to take it out.

And yes, you can always make another program than the one you
have. But you don't get it for free.  ;-)

Ever heard of super tankers breaking apart due to engine failure
during a hurricane? ;-) We don't want that to happen to our system.

In such a mission-critical system, the cost of test, verification and
approval can be prohibitive.

--
mdc at manbw dk  -  MAN B&W Diesel A/S, Copenhagen
www.manbw.com    -  Electronics & software dept.
Re: How to eliminate duplicate strings?

You're using RTTI in a mission-critical system? Wow.

Why?

Steve
http://www.fivetrees.com



Re: How to eliminate duplicate strings?


You mean we rely on information stored in RAM, or what? So does the
underlying RTOS.

The basic decision is to use C++, which some people argue is not
"safe". I think the compiler is far better at throwing around pointers to
objects and structures than a human programmer. And the application
_is_ that complex. And it works. That is why we don't want to just
cook up another solution. This is not the toy business.  ;-)

--
mdc at manbw dk  -  MAN B&W Diesel A/S, Copenhagen
www.manbw.com    -  Electronics & software dept.
Re: How to eliminate duplicate strings?

For the record, C++ templates don't require the use of RTTI.

--
Michael N. Moran           (h) 770 516 7918
5009 Old Field Ct.         (c) 678 521 5460
Re: How to eliminate duplicate strings?

In some cases, the compiler has an option to merge duplicate
strings, however this usually happens only within a single module.

I had a similar problem one time; in this case it was a point-of-sale
terminal. I was asked to make several enhancements to an existing
application that was already completely filling the available
code space in the terminal.

I noted that there was a fair number of duplicated strings, and
that they were spread through several modules.

I wrote a program to scan all of the source files and identify all
strings and the number of occurrences of each. On a second pass,
it replaces all literal strings (i.e. not already variables) occurring more
than once with character array references, and also generates
XSTRINGS.H and XSTRINGS.C, which contain definitions for the
string arrays. It also accepts a file listing strings NOT to change, in
case you happen to be unlucky enough to be working on a system
allowing writable strings and someone actually did that.
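
The two-pass idea described above can be sketched roughly as follows. This is not the original tool, just an illustration in Python: the names `count_literals`, `pool_literals` and the `xstr_N` array names are invented here, the sources are passed in as a dict of file name to text, and the literal pattern is deliberately simplified (it does not understand comments, `#include "..."` lines, character constants or array initialisers, which a real tool would have to skip).

```python
import re
from collections import Counter

# Simplified C string literal: allows backslash escapes, nothing more.
STRING_LITERAL = re.compile(r'"(?:[^"\\\n]|\\.)*"')


def count_literals(sources):
    """First pass: count every literal across all source texts."""
    counts = Counter()
    for text in sources.values():
        counts.update(STRING_LITERAL.findall(text))
    return counts


def pool_literals(sources, keep=()):
    """Second pass: replace literals seen more than once by array names.

    keep lists literals (quotes included) that must not be changed.
    Returns the rewritten sources plus header and definition file text.
    """
    counts = count_literals(sources)
    shared = sorted(lit for lit, n in counts.items()
                    if n > 1 and lit not in keep)
    names = {lit: f"xstr_{i}" for i, lit in enumerate(shared)}
    rewritten = {
        fname: STRING_LITERAL.sub(
            lambda m: names.get(m.group(0), m.group(0)), text)
        for fname, text in sources.items()
    }
    header = "".join(f"extern const char {names[lit]}[];\n" for lit in shared)
    body = "".join(f"const char {names[lit]}[] = {lit};\n" for lit in shared)
    return rewritten, header, body


if __name__ == "__main__":
    demo = {"a.c": 'puts("OK"); puts("Saving");',
            "b.c": 'puts("OK"); puts("ID=");'}
    rewritten, header, body = pool_literals(demo)
    print(rewritten["a.c"])
    print(header, body, sep="")
```

The `keep` argument plays the role of the file listing strings that must not be changed.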

You could try something like that - it worked very well for me.

Regards,
Dave

--
Dunfield Development Systems          http://www.dunfield.com
Low cost software development tools for embedded systems
Re: How to eliminate duplicate strings?
snipped-for-privacy@use.techsupport.link.on.my.website (Dave Dunfield) writes:


Our problem is similar: we are also close to the maximum of the hardware platform.

But unless you run the "string fixer" on some intermediate file
produced by the compiler's C++ pass, it won't do the job here. Most
strings are created in that step, not in the source.

I still think the right place is in the linker, which has all relevant
information.

The vendor, Microtec/Mentor Graphics, gave some suggestions on linker options,
but they did not change anything. Haven't heard from them for some days, but
the problem has got a number.  ;-)

A hack to make GNU ld link Microtec's object files, and optimize the strings,
may also be a solution. The formats are close, but not identical.

--
mdc at manbw dk  -  MAN B&W Diesel A/S, Copenhagen
www.manbw.com    -  Electronics & software dept.
Re: How to eliminate duplicate strings?
mdc@_manbw.dk (Mogens Dybk Christensen) writes:


Just for your info, Microtec support came up with the same "solution":
Edit the intermediate assembler files in 325 compilations to add
MERGE_START/MERGE_END where appropriate. No definition of appropriate.

:-(

--
mdc at manbw dk  -  MAN B&W Diesel A/S, Copenhagen
www.manbw.com    -  Electronics & software dept.
Re: How to eliminate duplicate strings?

   I find it quite surprising that most compilers for embedded
programming don't seem to have an automatic optimization mode for this.
My in-house developed Pascal/Modula2 compiler does it as one of the
first steps in its optimization routines. The final assembler file can
look like this snippet:
 
;
;; String references
;
STR2:
STR4:
STR18:
STR22:
STR0: .DB "Saving... ",0
STR3:
STR5:
STR19:
STR23:
STR1: .DB "OK",0
STR6: .DB "PIN=",0
STR7: .DB "ID=",0
STR8: .DB "I=",0
STR9: .DB " sec",0
STR11:
STR10: .DB "DL",0
STR13:
STR12: .DB "OL",0
STR15:
STR21:
STR14: .DB ", ",0
STR16: .DB "Calibrating",0
STR17: .DB "  ",0
STR26:
STR20: .DB "A/D not calibrated!",0
STR24: .DB "No program saved!",0
STR25: .DB "Terminal module",0
STR27: .DB " <- Illegal input!",0
STR28: .DB "AT",0
STR29: .DB "AT+CPIN=",0
STR30: .DB "AT+CMGS=",0
STR31: .DB "*** Alarm condition restored ***",0
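
How such a pooling step might produce a listing of this shape can be sketched in a few lines of Python. This is a guess at the principle, not the compiler's actual code: `pool_strings` is a made-up name, the input is assumed to be the list of literals in the order they were numbered, and quotes inside a literal are not escaped.

```python
def pool_strings(literals):
    """literals[i] is the text referenced as STRi; return assembler lines.

    Identical texts are emitted once; the later labels become aliases
    placed in front of the first label, which carries the data.
    """
    indices_by_text = {}
    for index, text in enumerate(literals):
        indices_by_text.setdefault(text, []).append(index)

    lines = []
    # dicts keep insertion order, so strings appear in first-use order
    for text, indices in indices_by_text.items():
        first, *aliases = indices
        lines += [f"STR{i}:" for i in aliases]
        lines.append(f'STR{first}: .DB "{text}",0')
    return lines


if __name__ == "__main__":
    demo = ["Saving... ", "OK", "Saving... ", "OK", "Saving... "]
    print("\n".join(pool_strings(demo)))
```

Because all aliases resolve to the same address, the code that references STR2 or STR4 needs no change at all.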

--
http://www.flexusergroup.com /

Re: How to eliminate duplicate strings?


Oh but they do!  The one at hand just failed to use it on the RTTI
string tables --- and the workaround they proposed was to turn it on
for those, too, by massaging the intermediate asm source a bit.

--
Hans-Bernhard Broeker ( snipped-for-privacy@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Re: How to eliminate duplicate strings?


Well, doesn't that almost force the solution: turn off RTTI --- you
almost certainly won't be needing that in an embedded system, anyway.

--
Hans-Bernhard Broeker ( snipped-for-privacy@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Re: How to eliminate duplicate strings?


Unfortunately, that would require some redesign. The exact amount is not
known just now, but there are reasons why it was not turned off.

If we could eliminate the duplicates, we would be up and running without
touching the source code!

--
mdc at manbw dk  -  MAN B&W Diesel A/S, Copenhagen
www.manbw.com    -  Electronics & software dept.
Re: How to eliminate duplicate strings?




Careful with the assessment that everything found by 'strings' is
actually a string.  Code can look like text, to the 'strings' utility,
especially if you feed it a flat binary core image instead of a
structured object file format.

Looking at 'size -A' of individual .obj files or the debuggable object
file might be a better test, here.


And for actual strings, it's probably doing that already.  But I'm far
from certain that such compression can be done on RTTI tables without
breaking them.  If it could, wouldn't the compiler/linker vendor
have done it already?

--
Hans-Bernhard Broeker ( snipped-for-privacy@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

Re: How to eliminate duplicate strings?


Hi Hans-Bernhard

Thanks for your interest in the problem.

I am aware of the false strings in the output. They may account for
a few percent, but inspection of the output from strings reveals lots of real
strings, which are duplicated.

We actually reverse-converted the S19 file that was produced by the
build process, and ran strings on that file. This should eliminate all
debug information etc. The size of that output is very close to what
the linker map says about the strings segment, so I think we are
looking at the real thing.

Microtec claims to use IEEE format, and GNU m68k-elf-objdump can read
their .obj files. It shows that there is a binary RTTI segment (which
I cannot interpret), but the type strings are in the strings
segment. So the RTTI segment is probably a set of pointers into the
strings segment.

Thus it should not change anything for the running code if the address
of one string is replaced by the address of another identical string
(and the first string removed from the binary image). But the linker
does _not_ do that at the moment.
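
The address replacement described here can be illustrated with a toy model in Python. Everything about the layout is an assumption made for the illustration: the strings segment is taken to be a sequence of NUL-terminated strings, the RTTI segment is reduced to a plain list of offsets into it, and `merge_identical_strings` is an invented name. The real Microtec segments are not documented in this thread.

```python
def merge_identical_strings(segment, pointers):
    """Rebuild a NUL-terminated strings segment without duplicates.

    segment  -- bytes of the original strings segment
    pointers -- offsets into segment, each at the start of a string
    Returns the new segment and the offsets remapped into it.
    """
    new_segment = bytearray()
    offset_of = {}          # string contents -> offset in new_segment
    remapped = []
    for ptr in pointers:
        end = segment.index(b"\0", ptr)
        text = segment[ptr:end]
        if text not in offset_of:
            offset_of[text] = len(new_segment)
            new_segment += text + b"\0"
        remapped.append(offset_of[text])
    return bytes(new_segment), remapped


if __name__ == "__main__":
    old = b"Foo\0Bar\0Foo\0"
    new, table = merge_identical_strings(old, [0, 4, 8])
    print(len(old), "->", len(new), table)
```

Every pointer still reaches the same text afterwards, which is the property the argument relies on. It only holds as long as nothing depends on two identical strings having distinct addresses.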

Unfortunately, the Microtec dialect of IEEE seems incompatible with
GNU m68k-elf-ld, which failed with an assertion when I tried.

- We are now in contact with Microtec support, but no solution so
far.

--
mdc at manbw dk  -  MAN B&W Diesel A/S, Copenhagen
www.manbw.com    -  Electronics & software dept.