GTK的中文显示问题

时间：2007-03-24 来源：onlinesilence

这是官方的常见问题列表中给予的解答
(http://developer.gnome.org/doc/API/2.0/gtk/gtk-question-index.html)

主要讲解了使用以下两个函数将字符编码转换为UTF-8的编码格式
g_locale_to_utf8()
g_convert

1.8.
How do I use non-ASCII characters in GTK+ programs ?
GTK+ uses Unicode (more exactly UTF-8) for all text. UTF-8 encodes each Unicode codepoint as a sequence of one to six bytes and has a number of nice properties which make it a good choice for working with Unicode text in C programs:

ASCII characters are encoded by their familiar ASCII codepoints.

ASCII characters never appear as part of any other character.

The zero byte doesn't occur as part of a character, so that UTF-8 strings can be manipulated with the usual C library functions for handling zero-terminated strings.

More information about Unicode and UTF-8 can be found in the UTF-8 and Unicode FAQ for Unix/Linux. GLib provides functions for converting strings between UTF-8 and other encodings, see g_locale_to_utf8() and g_convert().

Text coming from external sources (e.g. files or user input), has to be converted to UTF-8 before being handed over to GTK+. The following example writes the content of a IS0-8859-1 encoded text file to stdout:
gchar *text, *utf8_text;
gsize length;
GError *error = NULL;

if (g_file_get_contents (filename, &text, &length, NULL))
{
utf8_text = g_convert (text, length, "UTF-8", "ISO-8859-1",
NULL, NULL, &error);
if (error != NULL)
{
fprintf ("Couldn't convert file %s to UTF-8\n", filename);
g_error_free (error);
}
else
g_print (utf8_text);
}
else
fprintf (stderr, "Unable to read file %s\n", filename);

For string literals in the source code, there are several alternatives for handling non-ASCII content: direct UTF-8
If your editor and compiler are capable of handling UTF-8 encoded sources, it is very convenient to simply use UTF-8 for string literals, since it allows you to edit the strings in "wysiwyg". Note that choosing this option may reduce the portability of your code.
escaped UTF-8
Even if your toolchain can't handle UTF-8 directly, you can still encode string literals in UTF-8 by using octal or hexadecimal escapes like \212 or \xa8 to encode each byte. This is portable, but modifying the escaped strings is not very convenient. Be careful when mixing hexadecimal escapes with ordinary text; "\xa8abcd" is a string of length 1 !
runtime conversion
If the string literals can be represented in an encoding which your toolchain can handle (e.g. IS0-8859-1), you can write your source files in that encoding and use g_convert() to convert the strings to UTF-8 at runtime. Note that this has some runtime overhead, so you may want to move the conversion out of inner loops.

Here is an example showing the three approaches using the copyright sign © which has Unicode and ISO-8859-1 codepoint 169 and is represented in UTF-8 by the two bytes 194, 169:
g_print ("direct UTF-8: ©");
g_print ("escaped UTF-8: \302\251");
text = g_convert ("runtime conversion: ©", -1, "ISO-8859-1", "UTF-8", NULL, NULL, NULL);
g_print(text);
g_free (text);