Language Selection Philosophy

As many of you are probably aware by now, I'm designing a user interface to an industrial boiler exhaust gas analyser.

There is a requirement for different languages to be supported by this user interface due to the rest of the world's unreasonable demands to speak their own language. My suggestion was to go and sort out Johnny foreigner like the good old days of the Her Majesty's Empire but management have decided that it would be easier to make the software multi-lingual. Besides Mr Bush is out there educating the natives for us :-p

I'm wondering which method the collective would suggest to allow this facility. My ideas are:

1) Multiple versions of software with all text and graphics in one tongue and changing language requires reprogramming the system. The language wouldn't change very often so this isn't too inconvenient, plus we could charge for a language pack. However, functional changes would have to be replicated across all versions.

2) The software written with #ifdefs everywhere so I choose the language at compile time with a #define which makes the code horrifically messy but keeps it down to one version (of which different builds can be supplied as different language packs)

3) A variable set within the software at run-time which is checked before displaying anything language specific. This will put case statements all over the code whenever I have to display text but is cleaner than #ifdefs and allows on-line change of language - an important factor for our service engineers who don't speak Swahili or Newcastle.

4) Several constant tables containing all the text the system will ever display in all the supported languages which is referred to by a pointer, which can be changed to point to another language table if required. This will remove the decision making from each text display and speed things up but will reduce the readability of the code.

5) Something I haven't thought of yet.

Another complication is that the manual must be stored. It is currently stored as 70-odd RLE bitmaps and there is not enough flash to store two versions at the same time. I could perhaps write a basic HTML viewer to reduce space but that then gives similar language choice decisions as before.

Tom Lucas


There is more to it than just changing the text. Often there are different number and date formats. Sometimes these are different even when the language is the same. Numbers in Germany use a comma for the decimal symbol even in English. Numbers in Switzerland use a dot for the decimal symbol even in German. Similar problems with French in France and Canada...

You only have one version of source with text selected by the language build.


You never need to have the #ifdefs littering the main code. You can define all strings as macros, for example, and only use the macro names in the body of the code. The code is still readable because the macro names are descriptive, but the actual text is defined in one place.


Again, you can use a macro for the strings, but now the macro can do some dynamic table look-up. So the way the language selection is performed is hidden from them that shouldn't know or don't care.

Basically, it's all the same. Most of the implementation does not affect the body of the code. The only rule is not to use explicit literal strings in the code.
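A minimal sketch of the macro idea, assuming made-up message names, a two-language table and a hypothetical BUILD_LANG build flag (none of these come from the thread). The body of the code only ever sees STR_BOILER_OK and friends; whether the language is fixed at compile time or looked up at run time is hidden behind the STR() macro.

```c
#include <stddef.h>

/* Hypothetical message and language IDs, for illustration only. */
enum msg_id  { MSG_BOILER_OK, MSG_SENSOR_FAULT, MSG_COUNT };
enum lang_id { LANG_ENGLISH, LANG_GERMAN, LANG_COUNT };

/* All translatable text lives in this one table. */
static const char *const msg_table[LANG_COUNT][MSG_COUNT] = {
    [LANG_ENGLISH] = { "Boiler OK", "Sensor fault" },
    [LANG_GERMAN]  = { "Kessel OK", "Sensorfehler" },
};

#ifdef BUILD_LANG
/* Compile-time selection, e.g. built with -DBUILD_LANG=LANG_GERMAN. */
#define STR(id) (msg_table[BUILD_LANG][(id)])
#else
/* Run-time selection: one variable decides which row is used. */
static enum lang_id current_lang = LANG_ENGLISH;

static void set_language(enum lang_id lang)
{
    if ((unsigned)lang < (unsigned)LANG_COUNT) {
        current_lang = lang;
    }
}
#define STR(id) (msg_table[current_lang][(id)])
#endif

/* The only names the rest of the code uses. */
#define STR_BOILER_OK    STR(MSG_BOILER_OK)
#define STR_SENSOR_FAULT STR(MSG_SENSOR_FAULT)
```

Display code would then call something like DrawText(STR_SENSOR_FAULT) (DrawText being whatever the GUI provides) and never needs to change when the selection mechanism does.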

Well, now you are making life hard for yourself. Paper is quite cheap and a lot less hassle.

Peter

Peter Dickerson

I would rather send in the gunboats too, but faced with the same problem in the past, my approach (for any nontrivial amount of text) has been essentially this:

I use a long concatenation of ASCIIZ strings with a double-NULL terminator; ie:

    const char stringtable[] =
        "string1\0"
        "string2\0"
        "string3\0"
        "\0\0";

All strings are identified by an integer ID, which is simply the relative position of the desired string in this structure.

I replace all literals with a call to a function const char *GetString(int id) that returns a pointer to the start of the requested string ID. It starts at the beginning of the requested language and seeks down counting 0s.

In some cases, where the language data is in serial EEPROM, GetString works a little differently; it loads the data out of EEPROM into a RAM buffer.
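A rough sketch of how such a GetString() might look for the in-flash case, assuming two small example tables and a hypothetical SetLanguageTable() helper that are not from the original post. It walks the active table counting terminators and returns an empty string if the ID runs past the end marker.

```c
#include <stddef.h>

/* Each table is a run of NUL-terminated strings; an empty string marks
   the end. The compiler appends the final '\0' that forms that marker. */
static const char table_en[] =
    "Boiler OK\0"
    "Sensor fault\0"
    "Calibrating\0";

static const char table_de[] =
    "Kessel OK\0"
    "Sensorfehler\0"
    "Kalibrierung\0";

/* String IDs are simply positions in the table (illustrative names). */
enum { STR_BOILER_OK, STR_SENSOR_FAULT, STR_CALIBRATING };

static const char *active_table = table_en;

/* Hypothetical helper: point the lookup at another language table. */
void SetLanguageTable(const char *table)
{
    active_table = table;
}

/* Return the id-th string of the active table, or "" if out of range. */
const char *GetString(int id)
{
    const char *p = active_table;

    if (id < 0) {
        return "";
    }
    while (id > 0) {
        if (*p == '\0') {
            return "";          /* reached the end marker early */
        }
        while (*p != '\0') {
            p++;                /* skip over the current string */
        }
        p++;                    /* step past its terminator */
        id--;
    }
    return p;
}
```

The linear scan costs time proportional to the ID, which is usually tolerable for a few hundred short strings; the serial EEPROM variant would copy into a RAM buffer instead of returning a pointer into flash.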

larwe

I assume that you are not fluent in all the languages you are going to support, so most likely you would need to use different translators for different languages. How would the translations produced by different people end up between the #ifdefs? There is a real risk that the texts end up in the wrong place if the person doing the merging does not understand the language.

This would at least simplify the translation job, which most likely must be divided among several persons fluent in different languages.

If the text tables are in a separate loadable file, there is the possibility that the load may fail due to some problem (file missing etc.), so it is a good idea to have a default language text strings statically linked into the program image.

If a message contains more than one changing value that needs to be displayed, you should not use sprintf style formatting strings, since in order to make a sentence readable, you may have to alter the order in which the variable values are displayed in different languages.
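One way to handle the reordering problem is to let the translated string say where each value goes. The sketch below assumes a made-up "%1".."%9" placeholder syntax and a hypothetical format_positional() function; POSIX printf has its own positional conversions, but they are not part of ISO C and may not exist in an embedded library.

```c
#include <stddef.h>
#include <string.h>

/* Expand "%1".."%9" in fmt with args[0]..args[nargs-1]; "%%" gives '%'.
   Output is truncated to fit and always NUL-terminated.
   Returns the number of characters written (excluding the NUL). */
size_t format_positional(char *out, size_t out_size, const char *fmt,
                         const char *const args[], size_t nargs)
{
    size_t n = 0;

    if (out_size == 0) {
        return 0;
    }
    while (*fmt != '\0' && n + 1 < out_size) {
        if (fmt[0] == '%' && fmt[1] >= '1' && fmt[1] <= '9') {
            size_t idx = (size_t)(fmt[1] - '1');
            if (idx < nargs) {
                size_t len  = strlen(args[idx]);
                size_t room = out_size - 1 - n;
                if (len > room) {
                    len = room;             /* truncate to what fits */
                }
                memcpy(out + n, args[idx], len);
                n += len;
            }
            fmt += 2;
        } else if (fmt[0] == '%' && fmt[1] == '%') {
            out[n++] = '%';
            fmt += 2;
        } else {
            out[n++] = *fmt++;
        }
    }
    out[n] = '\0';
    return n;
}
```

With args = {"O2", "21"}, the template "%1 level is %2%%" gives "O2 level is 21%", while a translation that needs the value first can use "%2%% measured for %1" without any change to the calling code.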

As others have pointed out, different languages and even countries have different conventions for displaying numbers, dates etc.

There might also be some rendering problems in languages such as Arabic, with right to left writing direction for text (but the most significant digit of a number to the left), different outlines if the character is in the beginning, in the middle, at the end of a word or as a separate character etc.

Paul

Paul Keinanen

How 'bout a set of include files that contain all the language-specific stuff, one for each language?

Best regards, Spehro Pefhany

--
"it's the network..."                          "The Journey is the reward"
speff@interlog.com             Info for manufacturers: http://www.trexon.com
Spehro Pefhany

Many thanks Peter, Lewin, Paul and Spehro.

Is it just Americans that have their dates the wrong way round or do other countries do it as well? I may just ensure all dates are longhand to avoid the problem. As for the decimal point, I have two approaches: 1) I'm sure the Germans can handle using a dot instead of a comma and are probably forced to by many other systems, or 2) this can be selected at run-time and I can modify my DrawDecimal() function to reference a variable before it draws a comma or a dot.
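Approach 2 could look roughly like this. It is only a sketch: the real DrawDecimal() isn't shown in the thread, so format_milli() and set_decimal_separator() are invented names, and the value is assumed to arrive as an integer count of thousandths.

```c
#include <stdio.h>

/* Run-time setting checked whenever a number is formatted. */
static char decimal_separator = '.';

void set_decimal_separator(char sep)
{
    decimal_separator = sep;
}

/* Format a value given in thousandths, e.g. 53221 -> "53.221" or "53,221".
   Returns the snprintf() result (length the full text needs). */
int format_milli(char *out, size_t out_size, long milli)
{
    const char *sign = (milli < 0) ? "-" : "";
    /* Negate in unsigned arithmetic so LONG_MIN does not overflow. */
    unsigned long mag = (milli < 0) ? 0UL - (unsigned long)milli
                                    : (unsigned long)milli;

    return snprintf(out, out_size, "%s%lu%c%03lu",
                    sign, mag / 1000UL, decimal_separator, mag % 1000UL);
}
```

Keeping the separator in one variable means a service engineer can flip it from a settings screen without touching any of the code that produces the readings.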

That is certainly attractive from a configuration control perspective. QA would get all sweaty about five different versions of the same software with subtle but individual bugs creeping into each one.

Now, the macros are an extremely good idea and mean I can keep all translations in the same place and my code is always English to read. It will work with selecting images with text on them too.

One of the drivers behind the new interface is the provision of an on-line manual so that has to be there. Context sensitive help will cut down on time spent flicking through the manual as well. Plus I can use the money budgeted for printing to put some margin into the user interface.

Tom Lucas

I think it is only America that uses the illogical month/day/year approach. Europeans (AFAIK) all use the more natural day/month/year, unless they are using ISO formats, which use year/month/day ordering. That is more logical if you are thinking of the date as a single big number, or want to sort the dates easily. I believe the Japanese use this ordering normally. And then there are the joys of different standards for separators, number of digits, leading zeros, etc. Then there is the Hebrew calendar, and I'm sure other countries prefer different systems. Longhand dates are an extra effort to read, and look unnatural if you get the ordering wrong. In practice, 03/05/2006 will be understandable to anyone outside the USA, but it might not fit with local preferences.

Germans (and about half the rest of Europe) will consider 53.2 as "fifty three point two", but might read 53.221 as "fifty three thousand, two hundred and twenty one".

If you really want to nitpick, older Norwegians prefer dots around their minus sign (so that it looks like a division sign to the rest of us). You can play this game endlessly!

Remember different languages require different characters - even Latin alphabet texts often have accented characters.

You can get around a lot of it if you use printf variants for your display, and let the format strings also be language-dependent. Obviously you need a lot of care to get things right, especially the lengths of strings on the display. And get native speakers to do the translations - foreign distributors can be useful here. Just don't trust them entirely - if you tell them your display is 20 characters wide, they'll include 21-character texts.
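The 21-character problem is easy to catch mechanically before a translation ships. The following is a sketch of such a check, assuming the texts are UTF-8 and that every code point occupies one character cell (not true for combining marks or wide glyphs); utf8_length() and first_too_wide() are hypothetical names.

```c
#include <stddef.h>

/* Count code points in a UTF-8 string by skipping continuation bytes,
   which always have the bit pattern 10xxxxxx. */
size_t utf8_length(const char *s)
{
    size_t count = 0;

    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0u) != 0x80u) {
            count++;
        }
    }
    return count;
}

/* Return the index of the first string wider than max_chars,
   or -1 if every string fits. */
int first_too_wide(const char *const strings[], size_t n, size_t max_chars)
{
    for (size_t i = 0; i < n; i++) {
        if (utf8_length(strings[i]) > max_chars) {
            return (int)i;
        }
    }
    return -1;
}
```

Run over each returned translation table with max_chars set to the display width, this flags the offending entry so it can go back to the translator rather than being found on site.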

Finally, be especially careful for right-to-left languages. Even if you decide that you'll just use English for Arabic speaking countries, for example, you still have to remember that they will view your system from right to left.

It's possible that gettext will be of help to you, but I believe it's GPL'ed which will probably be a problem for an embedded system.

David Brown

I agree. We have a client who has supported 17 languages at the last count. Their comments suggest that there will be several translators. The best approach seems to be to write a simple tool that, for each language, constructs an index to each message in the text. Use this tool to create a structure for each language, and just copy the text image into memory. If you want to support non-Latin languages, be prepared to cope with UTF-8 as well.
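The index idea might be sketched like this, here built at load time rather than by an offline tool. The struct layout, the MAX_MESSAGES limit and the 16-bit offsets (which cap the text image at 64 KiB) are assumptions for illustration; the text image is taken to be NUL-terminated messages stored back to back.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_MESSAGES 64   /* illustrative limit */

struct lang_index {
    const char *image;               /* text image copied into memory */
    uint16_t    offset[MAX_MESSAGES];/* start of each message in image */
    size_t      count;               /* number of messages found */
};

/* Scan image_size bytes of back-to-back NUL-terminated messages and
   record where each one starts. Returns the number of messages. */
size_t build_index(struct lang_index *ix, const char *image, size_t image_size)
{
    size_t start = 0;

    ix->image = image;
    ix->count = 0;
    for (size_t i = 0; i < image_size && ix->count < MAX_MESSAGES; i++) {
        if (image[i] == '\0') {
            ix->offset[ix->count++] = (uint16_t)start;
            start = i + 1;
        }
    }
    return ix->count;
}

/* Constant-time lookup; unknown IDs give an empty string. */
const char *index_get(const struct lang_index *ix, size_t id)
{
    return (id < ix->count) ? ix->image + ix->offset[id] : "";
}
```

Compared with scanning for terminators on every lookup, the index costs two bytes of RAM per message and makes access time independent of the message number.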

You can add formatting macros for date, time, order and so on as you go along.

Once you start with internationalisation, you'll soon find that the text is the least of your problems.

Stephen

--
Stephen Pelc, stephenXXX@mpeforth.com
MicroProcessor Engineering Ltd - More Real, Less Time
Stephen Pelc

Yikes!

Loadable language structures could be a goer because I have the facility to read from an MMC card - would definitely need the default though. There may well be non-Latin languages but I believe my GUI (emWin) will handle UTF-8 shenanigans.

I think I'm going to produce a lot more icons and diagrams and cut down on the text:-D

We British invented time and dates and the rest of the bally world should do as they're told! Fortunately, a lot of the time and dates concerned with the system will be as timestamps on data for PC download so the PC can handle all the conversions.

Tom Lucas

I've had to deal with these issues often enough that I write all my code in a particular way to support it. Most of my apps don't have room to store all the languages at runtime, so we build locale-specific binaries. But the principle remains good even if we did support selection at runtime.

Basically, I make a distinction between strings that are part of the user interface (human-readable) and those that are not (e.g. comms protocols). All the human-readable strings are defined in one .c file (typically strings.c) and are accessed in code as externs (the only globals I allow!). This system has served us well; we've shipped the strings.c file to (non-technical) translators and received foreign versions back. A quick check and cleanup, and off we go...

However: as others have said, translating strings is not all there is to it. If there are user prompts or keystrokes (e.g. y/n), these need dealing with too. Certain things (e.g. dates, as others have mentioned) need dealing with differently.

And then there's Unicode... ;).

Steve

Steve at fivetrees

I think that is compatible with Peter's macro suggestion, only you replace macros with global variables. I like the idea of shipping off the strings file for translation, particularly from a non-disclosure point of view because all the strings will be user viewable anyway but there is no need to reveal code.

I have a number of bitmap push buttons with text on them that would need to be changed as well. Could all be controlled from macros in the language header though.

Tom Lucas

Right. And we invented the Internet!

I would format all dates like 2006-05-03 17:12:22.

If anyone whinges, tell them it is the International Standard.
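Producing that layout needs nothing more than zero-padded fields in year-to-second order. A small sketch, assuming a hypothetical struct timestamp filled in from whatever real-time clock the system has:

```c
#include <stdio.h>

/* Hypothetical broken-down time, e.g. as read from an RTC. */
struct timestamp {
    int year, month, day;
    int hour, minute, second;
};

/* Write "YYYY-MM-DD hh:mm:ss" into out (needs at least 20 bytes).
   Returns the snprintf() result. */
int format_timestamp(char *out, size_t out_size, const struct timestamp *t)
{
    return snprintf(out, out_size, "%04d-%02d-%02d %02d:%02d:%02d",
                    t->year, t->month, t->day,
                    t->hour, t->minute, t->second);
}
```

Because every field is fixed width and the most significant comes first, sorting these strings as plain text also sorts them chronologically.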

--

John Devereux
John Devereux

... snip ...

I have been doing so for about 20 years or more. Even in the US, I have no complaints, and the result is universally understood. Well, my wife used to tell me I was a trouble-maker.

--
"If you want to post a followup via groups.google.com, don't use
 the broken "Reply" link at the bottom of the article.  Click on 
CBFalconer

The date format follows the spoken form. In English we say "December 20th, 2005" and we encode it as "12/20/2005". In Spanish (my native language) it is "20 de Diciembre, 2005" and therefore 20/12/2005. I believe in Japanese the year goes first, also reflecting the way dates are spoken.

In any case, I stick with the format that John Devereux mentions in another post (that's May 3rd, not March 5th) because, among other reasons, this makes date and alphabetic sorts consistent with each other.

With respect to the original question: I would use a message file per supported language plus a mechanism to refer to each message using a unique ID. Because (mentioned already): (1) No "set of source files per language" configuration control / QA nightmare. (2) Easy to outsource translation to other people without distributing source code. (3) Allows per site/customer customization, if necessary. For example, replace an error message such as "Extremely high levels of in exhaust" with "Put on your gas mask, leave the building and when you get out call the plant manager at 123-456-7890".
Roberto Waltman

That is NOT correct.... it should be: "In American..."

In English we encode it 20/12/2005. It is the Americans who do it back to front.

As in English.

The alternate is dd-mmm-yyyy where the mmm is a three letter group.

--
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\/\/\/\/\ Chris Hills  Staffs  England     /\/\/\/\/
Chris Hills

Just a suggestion, a la PC BIOS extensions: think of your language pack(s) as an extension with a separate build which, ideally, contains const strings only, but may even contain code (e.g. for date rendering). Arrange this stuff to be accessible, along with a signature and a CRC, through a single struct. Link this binary so that this root struct is at a fixed address. Make your main code look for a language extension at that fixed address; if nothing is found, your code falls back to a default, perhaps mini, language, accessible via a similar struct but at a linked-in address. This way, you are free to replace your language at will without rebuilding your core image.

Regards, Ark
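The validate-or-fall-back step could be sketched as below. Everything here is assumed for illustration: the struct layout, the LANG_PACK_MAGIC value, and the function names. On a real target the candidate pointer would come from the fixed address chosen in the linker script, and the text pointer inside a separately built pack would more likely be an offset; the CRC is the common reflected CRC-32 (polynomial 0xEDB88320), computed bit by bit to avoid a table.

```c
#include <stddef.h>
#include <stdint.h>

#define LANG_PACK_MAGIC 0x4C414E47u   /* "LANG"; hypothetical signature */

/* Root struct through which a language pack is reached. */
struct lang_pack {
    uint32_t    magic;      /* signature identifying a pack */
    uint32_t    crc;        /* CRC-32 over the text bytes */
    uint32_t    text_size;  /* number of text bytes */
    const char *text;       /* packed string data */
};

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320). */
uint32_t lang_crc32(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u) {
                crc = (crc >> 1) ^ 0xEDB88320u;
            } else {
                crc >>= 1;
            }
        }
    }
    return ~crc;
}

/* Use the candidate pack if its signature and CRC check out,
   otherwise fall back to the pack linked into the core image. */
const struct lang_pack *select_pack(const struct lang_pack *candidate,
                                    const struct lang_pack *fallback)
{
    if (candidate != NULL &&
        candidate->magic == LANG_PACK_MAGIC &&
        candidate->text != NULL &&
        lang_crc32((const uint8_t *)candidate->text, candidate->text_size)
            == candidate->crc) {
        return candidate;
    }
    return fallback;
}
```

An erased or half-programmed flash region fails the signature or CRC test, so the unit still comes up in its built-in language.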

Ark

That's the beauty of standards - you can always find one to fit what you're trying to do ;-)

Tom Lucas

I would have one program that picks up a single file that has all the messages stored in it. All the messages can be numbered, so that message number 1 is the same whatever the language. Thus the program is universal and there is no need for multiple defines.

Given your system (and the external flash space you have) I am sure you could have several of these files loaded at any one time. Say three. Therefore you load the primary language to slot 1, secondary to slot 2 etc.

This means you could have a system that will let you overwrite language files with any other language file (in maintenance or set up mode). So customers could start with English as always installed and load any two other languages of choice. Of course you could build it with a different language in slot 1.

You set a variable to say use language file 1, 2 or 3. It would only have to do this once at power up. You could do this by having message 0 as the name of the language. Thus on power up you read message 0 from each of the installed language files and put it on screen.

The user then chooses the language and this is locked in by setting one variable.

As message 25 is ALWAYS "Actuator 1 blocked" you just load message 25 and the function that gets the message only has to know if it is slot 1, 2 or 3 and that is a single variable in the one function that actually gets the message.
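As a sketch of the slot scheme: the tables below stand in for loaded language files, the message numbers are shortened (message 1 plays the part of message 25), slots are numbered from 0 rather than 1, and the German texts are placeholder translations. All names are hypothetical.

```c
#include <stddef.h>

#define NUM_SLOTS 3

/* One loaded language file: numbered messages, message 0 is its name. */
struct lang_file {
    const char *const *messages;
    size_t             count;
};

static const char *const english[] = {
    "English", "Actuator 1 blocked", "Flame out"
};
static const char *const german[] = {
    "Deutsch", "Stellantrieb 1 blockiert", "Flamme erloschen"
};

static struct lang_file slots[NUM_SLOTS] = {
    { english, sizeof english / sizeof english[0] },
    { german,  sizeof german  / sizeof german[0]  },
    { NULL, 0 },                       /* nothing loaded in this slot */
};

/* The single variable that locks in the user's choice. */
static size_t active_slot = 0;

/* Fetch a message from a given slot; "" if slot or number is invalid. */
const char *get_text(size_t slot, size_t number)
{
    if (slot >= NUM_SLOTS || number >= slots[slot].count) {
        return "";
    }
    return slots[slot].messages[number];
}

/* Choose the language; refuses empty slots. Returns 0 on success. */
int select_slot(size_t slot)
{
    if (slot >= NUM_SLOTS || slots[slot].count == 0) {
        return -1;
    }
    active_slot = slot;
    return 0;
}

/* What the rest of the program calls: message by number only. */
const char *message(size_t number)
{
    return get_text(active_slot, number);
}
```

At power-up the language menu can be built by calling get_text(slot, 0) for each slot and skipping those that come back empty.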

Alternative is to do it the old fashioned way of get Her Majesty's Royal Navy, muster the Marines and go and invade to teach all the foreigners God's Own English. More colourful and fun but also more risky as Johnny Foreigner has rather unfairly stopped using spears and now uses the modern military weapons we sold them complete with instructional films from Hollywood :-)

--
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\/\/\/\/\ Chris Hills  Staffs  England     /\/\/\/\/
Chris Hills

This is an interesting idea but may get tricky when using message references instead of text but this could be overcome with macros to define the message numbers I guess. It all depends on how much of a PITA it is to change the software to change the language. It may get done so rarely that having multiple builds is viable.

Fortunately we included secret off buttons that can be activated by remote control. Didn't we?

Tom Lucas


You number all the messages in a list. Then instead of print("message text"); you use print(number);

The print function gets the message text, i.e. get_text(slot, number); the slot is set depending on whether it is language 1, 2 or 3.

That is the easy bit.

With the method described it is very easy, and it is possible to change language on the fly or add a new language. Also it means that if you have to go to Unicode for a non-standard language, the program stays the same and the messages in the program are always handled the same way.

It is just the "print message" function that has to change if you want to change to Unicode or something else.

--
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\/\/\/\/\ Chris Hills  Staffs  England     /\/\/\/\/
Chris Hills
