Language Selection Philosophy

Some years ago I developed a system for this. The primary operation was to scan the source text and convert all string literals. One of the inputs was a file listing the sources to be scanned, so the actual source file could be represented by a small integer, and the string proper by another. The primary input filter then converted things such as:

   puts("Blah blah");

into

   puts(_(n,m) /* "Blah blah" */);

where n identified the file and m the actual string. The extracted strings were collected in an auxiliary file. Let's call the original (valid C source) file a .src; from it a modified .c file and an auxiliary .d file were generated. The .c file remains readable.

There are limitations, such as arrays of strings (which I didn't bother with).

At any rate, at final linking a suitably indexed file is generated from the .d files, and a module is included that defines the _(int,int) function, which returns a char *. At this point the object code is, for most purposes, language independent. What remains is an organizational problem, to do with removing duplicate strings and so forth, besides the action of the _() function itself.

In my case I knew that no more than about 5 of these strings would be active at once, and had a maximum string length, so I made _() select the next in a circular linked list of strings, and read in the appropriate string from the external file.
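A minimal sketch of the idea (not the original code; the slot count, maximum length and the read_message() placeholder are illustrative, and an array-based ring is used instead of the linked list for brevity):

/* Sketch only: _() recycles a small ring of string buffers and fills
   the next one from the external indexed message file. */
#include <stdio.h>

#define NSLOTS 5        /* at most ~5 strings live at once */
#define MAXLEN 80       /* assumed maximum string length */

static char slots[NSLOTS][MAXLEN + 1];
static int  next_slot = 0;

static void read_message(int n, int m, char *buf, size_t len)
{
    /* placeholder: the real version would look up (n,m) in the
       indexed message file and read the string from it */
    snprintf(buf, len, "<file %d, string %d>", n, m);
}

char *_(int n, int m)
{
    char *buf = slots[next_slot];
    next_slot = (next_slot + 1) % NSLOTS;   /* recycle the slots in a ring */
    read_message(n, m, buf, MAXLEN + 1);
    return buf;
}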

One of the advantages is that, once done (and it is largely automated), the indexed file can be independently edited and translated. However, the intent of the code remains clear in the annotated .c file.

The filelist file is crucial to tying the various indices to actual files.

At any rate, it all worked very nicely. Removing the text from the code reduced the object code size, which was the prime motivator for the system. There was a slowdown due to the sequence of operations needed to translate the indices into an actual file read.

--
CBFalconer

Erm... I don't think I see why you would be bothered by what language the users want to speak. Well, not as long as you're not planning on making your life eternally miserable by adding a microphone and speech recognition to your interface, that is ;->

Isn't it fascinating that they never care to mention for _whom_ it's supposed to be easier?

The part of the collective that has ties to platforms with seriously thick operating systems in place knows the answer: the gettext() library. Get your hands on the GNU implementation of it, and read its manual. In all likelihood you won't actually use it, because it's overkill for small systems, but it'll at least show you just how complex this business actually is.

Not necessarily. This is the general approach of GNU gettext, with the additional feature that the string tables are actually stored in separate files, not inside the program. This would obviously have to be reconsidered for small embedded systems that don't use file systems.

The basic idea is that each string literal is replaced by a call to a macro named '_', i.e. you would write:

puts(_("This is some message"));

A utility extracts all these from the source and builds a "translation input file". If necessary, this can have comments to guide the translators. Eventually, each translator translates all these messages, and the resulting translation table is compiled into a "message catalog" file. In the actual code, the _() macro invocations are expanded to call the gettext() function that gets the strings from that table.
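In outline, and with the domain name and catalog directory as placeholders, a program using it looks roughly like this:

/* Rough sketch of typical GNU gettext usage; "myprog" and the
   locale directory are placeholders. */
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

#define _(String) gettext(String)

int main(void)
{
    setlocale(LC_ALL, "");                          /* use the user's locale */
    bindtextdomain("myprog", "/usr/share/locale");  /* where the .mo catalogs live */
    textdomain("myprog");                           /* select this program's catalog */

    puts(_("This is some message"));    /* looked up in the catalog at run time */
    return 0;
}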

--
Hans-Bernhard Broeker (broeker@physik.rwth-aachen.de)
Even if all the snow were burnt, ashes would remain.

How are these numbers n and m generated?

Suppose that during the product's lifetime source files are added and removed, messages added and removed, etc. How much work would be needed for maintenance?

How about simply picking up the strings in puts and printf etc., giving the list to the various translators, and asking them to write the translation under the default language message?

Deliver the default/translated message pairs to the system at language selection. Modify the puts/printf etc. functions so that the string parameter is first compared to the default language text strings and, when a match is found, the translated message on the next line is picked and displayed. If this is a new message and no translation is available, display the default language message, which may help in solving the problem and at least signals that a new translation is required.

To speed up the process, calculate a hash value for each default language string; for hash values without duplicates, the default language text itself can then be discarded. At each call to puts/printf, first calculate the hash code of the string, look that hash code up in the table (if duplicate hash codes occur, some string matching would also be needed), and display the translated text.
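As a rough sketch of that lookup (the table layout, hash function and example entries are all illustrative, and the default strings are kept in the table here rather than discarded for non-colliding hashes):

/* Sketch of a hash-based translation lookup with fallback to the
   default language text. */
#include <string.h>

struct msg {
    unsigned long hash;      /* hash of the default-language string */
    const char *deflt;       /* default (source) language text */
    const char *translated;  /* translator-supplied text */
};

static unsigned long hash_str(const char *s)
{
    unsigned long h = 5381;                 /* djb2-style hash */
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* example entries; in practice generated from the translators' files */
static struct msg table[] = {
    { 0, "File not found", "Fichier introuvable" },
    { 0, "Out of memory",  "Memoire insuffisante" },
};
#define NMSGS (sizeof table / sizeof table[0])

const char *translate(const char *deflt)
{
    static int hashed = 0;
    unsigned long h;
    size_t i;

    if (!hashed) {                          /* fill in the hash column once */
        for (i = 0; i < NMSGS; i++)
            table[i].hash = hash_str(table[i].deflt);
        hashed = 1;
    }
    h = hash_str(deflt);
    for (i = 0; i < NMSGS; i++)
        if (table[i].hash == h && strcmp(table[i].deflt, deflt) == 0)
            return table[i].translated;
    return deflt;    /* new message, no translation yet: show the default */
}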

This would not require any changes to the source, and it would not be fatal if program changes and _all_ the translations were not done synchronously.

Paul


The n came from the line number in the filelist of sources. The m came from counting strings during the actual conversion of .src to .c.

Essentially none. If files are added or removed, alter the filelist file. If sources are altered, rerun the scanning mechanism. If there are other languages to handle, then the translation needs to be done again, but that is inevitable.

--
CBFalconer

... snip ...

Sounds very much like the system I described elsethread, which I independently developed around 1998 for a PPOE. Great minds think alike :-). I even used the same _(), but it was not a macro.

In my case the motivation was different. The need was to reduce the loaded program size by banishing the strings to a separate file, and the easy language translation was a side effect. However, the operation was probably similar. Write the sources, create a master filelist control file, and run the suite. Then compile normally. It could all be wrapped up in the makefile.

--
CBFalconer

I'm rather late to this party, but there are companies that specialize in internationalization of products, providing all of the translations you need and help with implementation. It has been a long time since I looked, but even 7 years ago there was a wealth of white papers on their sites with lots of useful suggestions and getting started info.

Scott

