GB18030 -> Unicode Codepoint Conversion

Hello,

Does anyone have any experience of doing this? As I understand it, a lookup
table is needed, since there is no neat way of converting a GB18030 code
point (which can be 1, 2 or 4 bytes long) to a Unicode code point. We have
been supplied with the glyphs to cover the GB18030 requirements, but ordered
by Unicode code point. I don't believe reordering the glyphs into GB18030
order is an option, because we need to support UTF-16 reception for some
markets.
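For what it's worth, the 1/2/4-byte lengths can be told apart from the byte values alone, and not all of the mapping needs a table. Here is a minimal Python sketch, assuming well-formed input. A lead byte 0x00-0x7F is single-byte ASCII. A lead byte 0x81-0xFE followed by 0x30-0x39 starts a four-byte sequence; otherwise 0x81-0xFE starts a two-byte sequence. Four-byte sequences from 0x90308130 to 0xE3329A35 map linearly onto U+10000..U+10FFFF, so only the two-byte (GBK-compatible) area and the four-byte sequences for BMP characters need lookup data. The latter can be held as a list of ranges rather than a full table. The function names are hypothetical, and Python's built-in `gb18030` codec is only used to produce test input.

```python
from typing import List


def gb18030_seq_len(buf: bytes, i: int) -> int:
    """Length (1, 2 or 4) of the GB18030 sequence starting at buf[i].

    Assumes well-formed input; raises ValueError on an invalid lead byte.
    """
    lead = buf[i]
    if lead <= 0x7F:
        return 1  # ASCII
    if not 0x81 <= lead <= 0xFE:
        raise ValueError(f"invalid lead byte 0x{lead:02X}")
    # A second byte in 0x30-0x39 marks a four-byte sequence.
    if i + 1 < len(buf) and 0x30 <= buf[i + 1] <= 0x39:
        return 4
    return 2  # two-byte sequence (trail byte 0x40-0x7E or 0x80-0xFE)


def four_byte_linear(seq: bytes) -> int:
    """Linear index of a four-byte sequence b1 b2 b3 b4, counted from 0x81308130."""
    b1, b2, b3, b4 = seq
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


# 0x90308130 is the first four-byte sequence assigned to U+10000.
SUPPLEMENTARY_BASE = four_byte_linear(b"\x90\x30\x81\x30")


def supplementary_code_point(seq: bytes) -> int:
    """Map a four-byte sequence in 0x90308130..0xE3329A35 to U+10000..U+10FFFF."""
    offset = four_byte_linear(seq) - SUPPLEMENTARY_BASE
    if not 0 <= offset <= 0xFFFFF:
        raise ValueError("not a supplementary-plane sequence; needs table data")
    return 0x10000 + offset


def split_sequences(buf: bytes) -> List[bytes]:
    """Split a well-formed GB18030 byte string into per-character sequences."""
    out, i = [], 0
    while i < len(buf):
        n = gb18030_seq_len(buf, i)
        out.append(buf[i:i + n])
        i += n
    return out


if __name__ == "__main__":
    data = "A中\U00020000".encode("gb18030")  # test input from Python's codec
    for seq in split_sequences(data):
        if len(seq) == 4:
            print(seq.hex(), hex(supplementary_code_point(seq)))
        elif len(seq) == 2:
            print(seq.hex(), "needs a lookup table")
        else:
            print(seq.hex(), "ASCII")
```

The split step is the same for any GB18030 input. Only the two-byte and BMP four-byte cases fall through to table data, which is where glyphs ordered by Unicode can still be indexed directly once the code point is known.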

Regards,
Richard.



Re: GB18030 -> Unicode Codepoint Conversion

In the past I've used the codepage support in Windows .NET to produce
translation tables from various multibyte character sets to/from Unicode.
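The same table-generation idea can be sketched outside .NET. Python's standard library also ships a `gb18030` codec, so one can encode every Unicode scalar value and record the resulting byte sequence. This is a minimal sketch, not the .NET approach described above. `build_gb18030_table` is a hypothetical name, and the table is only as good as the codec's mapping data (editions of the standard, and different implementations, may disagree on a small number of code points).

```python
from typing import Dict


def build_gb18030_table(limit: int = 0x10000) -> Dict[bytes, int]:
    """Map GB18030 byte sequences to Unicode code points for U+0000..limit-1.

    Uses Python's built-in 'gb18030' codec as the source of the mapping.
    Surrogates (U+D800..U+DFFF) are skipped because they are not characters.
    """
    table = {}
    for cp in range(limit):
        if 0xD800 <= cp <= 0xDFFF:
            continue
        try:
            seq = chr(cp).encode("gb18030")
        except UnicodeEncodeError:
            continue  # not representable by this codec
        table[seq] = cp
    return table


if __name__ == "__main__":
    table = build_gb18030_table()
    by_len: Dict[int, int] = {}
    for seq in table:
        by_len[len(seq)] = by_len.get(len(seq), 0) + 1
    print(len(table), "entries; count by sequence length:", dict(sorted(by_len.items())))
```

The resulting dict can be written out as a static table for whatever platform consumes it. Inverting it gives the Unicode-to-GB18030 direction.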

Peter



Re: GB18030 -> Unicode Codepoint Conversion

I have absolutely no idea how to do that!  Any hints?

I'll look into it tomorrow, though.

Cheers,
Richard.



Re: GB18030 -> Unicode Codepoint Conversion

Try this sort of thing...

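using System.Text;
Encoding enc_cn = Encoding.GetEncoding(54936);      // GB18030 (936 is GBK)
byte[] bytes = enc_cn.GetBytes("A".ToCharArray()); // Unicode -> GB18030 bytes
string text = enc_cn.GetString(bytes);             // GB18030 bytes -> Unicode

More useful in your case would be to feed in individual characters (which
are Unicode) and get out the array of bytes that is their GB18030
equivalent; GetString() goes the other way. Note that code page 936 is GBK
(extended GB2312-80), not GB18030; GB18030 is code page 54936.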
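The per-character round trip can also be tried in Python, which ships both a `gbk` codec (roughly the repertoire of code page 936) and a `gb18030` codec. This is a minimal sketch, and `encode_each` is a hypothetical helper. Characters shared with GB2312/GBK get the same two-byte sequence in both codecs. A supplementary-plane character such as U+20000 has only a four-byte GB18030 encoding, which is why GBK alone does not cover the GB18030 requirements.

```python
from typing import List, Optional, Tuple


def encode_each(text: str, codec: str) -> List[Tuple[str, Optional[bytes]]]:
    """Encode each character separately; None marks characters the codec cannot represent."""
    result: List[Tuple[str, Optional[bytes]]] = []
    for ch in text:
        try:
            result.append((ch, ch.encode(codec)))
        except UnicodeEncodeError:
            result.append((ch, None))
    return result


if __name__ == "__main__":
    sample = "A中\U00020000"
    for codec in ("gbk", "gb18030"):
        for ch, seq in encode_each(sample, codec):
            shown = seq.hex() if seq is not None else "unencodable"
            print(codec, f"U+{ord(ch):04X}", shown)
```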

see also http://www.microsoft.com/globaldev/reference/dbcs/936.mspx

OK, and now I have to wash my mouth out...

Peter


