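Chinese support in Mandrake Linux

ShiyuTang · posted 2003-2-8 22:50:23

As things stand, Mandrake Linux 9.1 is unlikely to support GB18030, unless MandrakeSoft's finances suddenly improve, some very capable Chinese developers join, or a new XFree86 release supports GB18030. So far there is not a single free (copyleft) GB18030 font, and the standard edition of Mandrake Linux clearly contains only free software. Supporting GB18030 is also technically difficult.

See the article by Su Zhe (苏哲), "Implementing GB18030 support in the XFree86 window system":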
http://www-900.ibm.com/developerWor...rt1/index.shtml
http://www-900.ibm.com/developerWor...rt2/index.shtml

It is also worth looking at how TurboLinux's GB18030 patches for XFree86 could be applied to Mandrake Linux. They are at:
http://pkgcvs.turbolinux.co.jp/cgi-....0/?hideattic=0

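The message below is an earlier discussion of Chinese character sets from the Mandrake i18n (internationalization) development list. Danny Zeng, currently the main developer of Chinese support, has not been active for a long time, so the state of Chinese support in Mandrake is worrying.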
From: Pablo Saratxaga
Subject: Re: [i18n] Chinese encoding questions
Date: Tue, 27 Aug 2002 13:12:41 -0700 (PDT)

--------------------------------------------------------------------------------

Kaixo!
Li Wed, 21 Aug 2002 16:48:25 +0800,
"Danny Zeng" <Danny.Zeng@synopsys.com> scrijheut:

"Z> Can someone explain the differences of all the encoding used in Chinese
"Z> locale(s)? And what is the status of their support in Mandrake?

"Z> There is already translation teams for Simplified Chinese
"Z> (gb2312 encoding) and Traditional Chinese (Big5 encoding).

No.
There are translation teams for *simplified* and *traditional* chinese.
gb2312 and Big5 just happen to be the most used encodings, but it could
be UTF-8 or EUC-TW or whatever known by iconv.
gettext() is able to do the conversion.
Note that both KDE3 and Gnome2 convert to UTF-8 to use internally.
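
The conversion step described above can be illustrated with a minimal Python sketch. It is not gettext itself: the helper `transcode` is a hypothetical name, and the standard-library codecs stand in for iconv. It only shows that one stored message can be delivered in whichever encoding the locale asks for.

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Decode bytes from one charset and re-encode them in another, as iconv does."""
    return data.decode(src).encode(dst)

# One translated string, as it might be stored in a UTF-8 message catalog.
catalog_bytes = "中文".encode("utf-8")

# The same message delivered to three differently configured locales.
for charset in ("gb2312", "big5", "utf-8"):
    delivered = transcode(catalog_bytes, "utf-8", charset)
    print(f"{charset:7} {delivered.hex()}")
```

The bytes differ per charset, but decoding each result with its own charset gives back the same text, which is why a translation team does not need one catalog per encoding.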

"Z> It will become too complicated, if we need to translate for all encoding
"Z> supported separately, like gb18030 and GBK for Simplified,
"Z> and new UTF-8 for Traditional Chinese.

No, it is not needed.
In fact gettext() is even able to transcode between simplified and traditional
chinese, eg if you have your locale set to LC_CTYPE=zh_CN.GB2312 and
your LANGUAGE=zh_TW then you will see the traditional chinese translation
in simplified chars...
But I've been told it wasn't always desirable, as there are differences
in terminology that make those transcoded translations look unnatural.

Also, I'm not sure it still works now that a single encoding (UTF-8) starts
to be used internally almost everywhere.

So, the reason there is "zh_CN" and "zh_TW" po files is because despite
they are both "Chinese", there are differences in writing that justify it;
similarly, there is also "pt" and "pt_BR" as Brazilian portuguese has
some differences in terminology, in particular in computer field (and
computer related words appear a lot in *.po files)

"Z> I understand that GB18030 is a superset of gb2312 and the current
"Z> standard supported by most OS's, and it obsolete GBK.

GB18030 is a norm edited by the Chinese (of PRC) government, and it
is a bijection with unicode (that is, all defined gb18030 codes
have a corresponding unicode code, and vice versa).
It is unicode reordered in such a way that all gb2312 codes just happen
to be in the same place in gb2312 and gb18030.
GB18030 is sometimes called the "Chinese UTF-8", as the same way UTF-8
is a way to encode unicode to remain ascii compatible, GB18030 is a way
to encode unicode to remain GB2312 compatible.
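
Both properties can be spot-checked with Python's standard codecs. This is a small sketch on a handful of sample characters, not a proof over the whole code space; `in_gb2312` and `round_trips` are hypothetical helper names.

```python
def in_gb2312(ch: str) -> bool:
    """True if the character can be encoded in plain GB2312."""
    try:
        ch.encode("gb2312")
        return True
    except UnicodeEncodeError:
        return False

def round_trips(text: str) -> bool:
    """True if text survives Unicode -> GB18030 -> Unicode unchanged."""
    return text.encode("gb18030").decode("gb18030") == text

# Characters covered by GB2312 keep exactly the same bytes in GB18030.
for ch in "汉字中文":
    assert ch.encode("gb2312") == ch.encode("gb18030")

# Characters outside GB2312 (a traditional form, a Yi syllable, and a
# supplementary-plane character) still map to GB18030 and back.
for ch in ("漢", "\ua000", "\U0001F600"):
    print(f"U+{ord(ch):04X} in GB2312: {in_gb2312(ch)}, round-trips: {round_trips(ch)}")
```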

Now, with current (eg Gnome2, KDE3) and future programs the encoding
used by the locale will be mostly irrelevant, as utf-8 will be used
internally and it will be perfectly transparent to the user.
Non-UTF-8 locales will be kept only for compatibility, to make the transition
easy.

Currently, there still are several programs that rely on XFree86 locales
mechanism for font choosing (instead of using UTF-8 internally and using
Xft for fonts), and the way XFree86 X11 fonts handling is done makes
it hard, to say the least, to switch to UTF-8 in all cases, as X11 fonts
are supposed to be complete; which for a locale covering unicode range
(be it an UTF-8 locale or a GB18030 locale) is almost impossible with
an acceptable quality.

"Z> Is GB18030 also a superset of Big5?

In a sense yes, as it includes all characters defined in Big5 too.
But not at the same place; so probably users of Big5 will prefer to go
directly to UTF-8 instead of passing through gb18030.

"Z> Can I change to GB18030 and still be able to edit my old files
"Z> that use GB2312?

Yes. As all GB2312 codes are at the same place in gb18030.
GB18030 has been designed to allow just that.
If you use a tool to handle *.po files it will probably also call iconv
to convert to utf-8 internally, so it will be completely irrelevant what to
put in the charset= line, as long as it is valid (and includes the characters
you want to type, of course)
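
As a rough illustration of why the declared charset matters so little once a tool converts internally, here is a sketch in Python. The functions `po_charset` and `po_to_utf8` are hypothetical and are not part of any PO tool; real tools parse the header more carefully.

```python
import re

def po_charset(header: str) -> str:
    """Return the charset named in a PO header's Content-Type line."""
    match = re.search(r"charset=([A-Za-z0-9_.-]+)", header)
    if match is None:
        raise ValueError("no charset= declaration found")
    return match.group(1)

def po_to_utf8(raw: bytes, header: str) -> bytes:
    """Convert message bytes to UTF-8 using the charset the header declares."""
    return raw.decode(po_charset(header)).encode("utf-8")

# Bytes written long ago by a GB2312 editor...
legacy = "文件".encode("gb2312")

# ...convert identically whether the header says GB2312 or GB18030,
# because GB2312 codes sit at the same place in GB18030.
for declared in ("GB2312", "GB18030"):
    header = f'"Content-Type: text/plain; charset={declared}\\n"'
    print(declared, po_to_utf8(legacy, header).decode("utf-8"))
```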

On the other hand, programs and tools that don't yet use Xft for font
handling will have a hard time to be able to display it, as X11 still
mostly knows only about *-gb2312.1980 for X11 fonts.

"Z> On the other hand, files in GB18030 encoding may not work on a machine
"Z> support only gb2312, I guess.

If the machine has a gettext implementation using iconv() and has
a mapping table from GB18030 -> GB2312 (which is the case of all modern
GNU/Linux systems) there will be no problem.
In case the "GB18030" encoding is unknown on a given machine, it will
probably be displayed "as is"; so, if the encoding is "GB18030" and
the locale is "GB2312" and it is displayed "as is", it will display nicely
for all chars present in gb2312, only those not in gb2312 will display
as junk.
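
The two cases can be imitated in Python, with the standard codecs standing in for the system's conversion tables. This is only a sketch: what a real program shows for undecodable bytes depends on the program and its fonts.

```python
text = "汉字 漢字"                 # the third character is not in GB2312
raw = text.encode("gb18030")

# "As is" display: the GB18030 bytes are interpreted as GB2312.
# Characters present in GB2312 come out fine, the rest become junk,
# shown here as U+FFFD replacement characters.
shown = raw.decode("gb2312", errors="replace")
print(shown)

# With a GB18030 -> GB2312 mapping available, the unmappable character
# can at least be flagged explicitly (here with "?").
converted = raw.decode("gb18030").encode("gb2312", errors="replace")
print(converted.decode("gb2312"))
```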

"Z> Many users like me need to read both traditional Chinese and simplified.
"Z> It looks both GB18030 and UTF-8 promised to handle all the Chinese
"Z> characters, and Japanese and Korean. How do they really work?

Currently it depends of the applications.
For old applications, using X11 fonts, it will probably not work very well;
probably UTF-8 will work better than GB18030.

With newer programs (Gnome2, KDE3, yudit,...) it works perfectly (as long
as you have a font with the needed chars).
I ensured that in upcoming mdk the pseudo-font aliases "sans", "serif" and
"mono" will include the traditional chinese, simplified chinese and
korean chars (for japanese I couldn't, due to the fonts we ship not being
recognized by Xft; but if MS Mincho or Code2000 is installed it will work too)

"Z> Can I input and display all the supported characters in any application?
"Z> Or, how easy is it to add support for this feature in all
"Z> the applications?

Input is however a problem.
Currently the X11 input framework doesn't allow easily switching your input
method (it is your program that has to handle it, and most don't; I only
knew of "yudit" with that possibility in fact).
Maybe it would be possible with Gtk2 programs ? By changing to another,
gtk-specific input method, changing your XIM, then switching back to the
X input method in the Gtk input method menu choosing...

"Chinput" and "xcin" both support traditional and simplified, but I don't
think you can switch from one to the other at runtime. (If I'm wrong on this
please tell me)

"Z> Anyone out there is running Chinese desktop using UTF-8, how does it work?

It works better than Greek or Russian in UTF-8 in fact

The main problem is the fonts for old programs not using Xft yet.
And for CJK languages XFree86 just uses legacy encoded fonts (eg the
fonts *-KSC5601.1987-0, *-JISX0208.1983-0, *-GB2312.1980-0) to display
CJK chars.

For modern programs (Gnome2 , KDE3) it works perfectly (but you are limited
to one XIM input method currently)

"Z> Regards,
"Z> Danny

--
Ki ça vos våye bén,
Pablo Saratxaga

Le galline terrorizzate fanno le uova alla choc?
-- Da it.hobby.umorismo

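faint · posted 2003-2-9 01:25:09

Heh, good stuff.

ShiyuTang · posted 2003-2-9 14:59:00

The addresses in the original post seem to have problems, so here they are again.
"Implementing GB18030 support in the XFree86 window system":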
http://www-900.ibm.com/developer ... 6/part1/index.shtml
http://www-900.ibm.com/developer ... 6/part2/index.shtml
TurboLinux's GB18030 patches for XFree86:
http://pkgcvs.turbolinux.co.jp/c ... /4.2.0/?hideattic=0

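ShiyuTang · posted 2003-2-9 15:06:20

Personally, I think that rather than supporting the complicated GB18030 standard, it would be better to support the international standard UTF-8 directly; after all, UTF-8 is where the mainstream is heading. Besides, I think GBK and GB2312 are already enough. We are not specialists in the Chinese language, so what do we need all those classical Chinese and rare characters for?

tianhm · posted 2003-2-11 01:00:34

Businesses need it, and they need a large character set. You can't think only of yourself, especially since you are a moderator.

faint · posted 2003-2-11 20:45:12

UTF-8 is the way things are going, but localization matters too.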

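fundawang · posted 2003-2-12 00:49:53

Originally posted by ShiyuTang
Personally, I think that rather than supporting the complicated GB18030 standard, it would be better to support the international standard UTF-8 directly; after all, UTF-8 is where the mainstream is heading. Besides, I think GBK and GB2312 are already enough. We are not specialists in the Chinese language, so what do we need all those classical Chinese and rare characters for?

You are wrong. GB18030 is not something that just sits there unused. It is the Chinese government's entry permit for Chinese-language software, operating systems in particular. Any software that wants approval from the Ministry of Information Industry must support GB18030. This is mandatory and not open to negotiation. Don't think RH was being foolish when it bought fonts from Zhongyi (中易); it had no choice, otherwise it could not be sold in mainland China.
Mandrake is different. It has no distributor in China, so it is not bound by this. ;)

GB18030 seems to be compatible with the latest version of Unicode, and it uses four-byte codes. Besides the characters common to Simplified and Traditional Chinese, symbols such as katakana and hiragana, and characters used in classical Chinese, the most important thing is that GB18030 supports the languages of China's ethnic minorities, such as Yi. It seems this cannot be achieved in UTF-8, and UTF-16 does not seem to mention it either. We cannot deprive the users of any language in the world of the right to use computers, so on this point we should support the government. GB18030 even covers Western European languages, Thai, Vietnamese and almost every other language. Fundamentally, GB18030 should map one-to-one onto Unicode, with nothing left out. UTF-8, being a subset, probably falls far short of the needs of all the world's languages.

Unfortunately, though, GB18030 is the only four-byte character set in the world, and obviously one cannot ask every operating system to extend its design just to support a single character set. Even Windows XP merely maps this character set onto GB2312/GBK. This is the fundamental reason why Linux kernel support for GB18030 has been slow.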
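
For reference, the byte widths discussed in this post can be checked with Python's standard codecs. The sketch below uses three sample characters chosen for illustration. It suggests that GB18030 is a variable-width encoding (one, two or four bytes per character), and that the Yi syllable can be encoded in UTF-8 as well, since both encodings cover the Unicode repertoire.

```python
samples = {
    "ASCII letter": "A",
    "GB2312 hanzi": "汉",
    "Yi syllable": "\ua000",
}

for label, ch in samples.items():
    gb = ch.encode("gb18030")    # GB18030 bytes for this character
    utf8 = ch.encode("utf-8")    # UTF-8 bytes for the same character
    print(f"{label:13} GB18030: {len(gb)} bytes   UTF-8: {len(utf8)} bytes")
```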

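ShiyuTang · posted 2003-2-12 15:08:16

We should create an open-source GB18030 font ourselves. That is the heart of the problem: once there is a font, MandrakeSoft will have no choice but to patch XFree. Ideally the font would support FreeType 2, so that we could skip the tedious localization work.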
