I think you need 4 bytes to represent the Chinese character.
The 16-bit fixed width encodings, such as Unicode up to and including version 2.0, are now deprecated due to the requirement to encode more characters than a 16-bit encoding can accommodate—Unicode 5.0 has some 90,000 Han characters—and the requirement by the Chinese government that software in China support the GB18030 character set.
The four byte scheme can be thought of as consisting of two units, each of two bytes. Each unit has a similar format to a GBK two byte character but with a range of values for the second byte of 0x30-0x39 (the ASCII codes for decimal digits). The first byte has the range 0x81 to 0xFE, as before. This means that a string search routine that is safe for GBK should also be reasonably safe for GB18030 (in much the same way that a basic byte-oriented search routine is reasonably safe for EUC). http://en.wikipedia.org/wiki/GB18030
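To see the byte layout described above, here is a minimal Python sketch. The helper name `is_gb18030_four_byte` is made up for illustration, and it relies on Python's built-in `gb18030` codec. It only checks the byte ranges quoted above, not whether the sequence is actually assigned to a character.

```python
def is_gb18030_four_byte(seq: bytes) -> bool:
    """Check that seq has the four-byte shape: 0x81-0xFE, 0x30-0x39, twice."""
    if len(seq) != 4:
        return False
    b1, b2, b3, b4 = seq
    return (
        0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
        and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39
    )


# U+20000 is a Han character outside the 16-bit range, so it needs 4 bytes.
four_byte = "\U00020000".encode("gb18030")
# U+4E2D is a common Han character that GBK already covers with 2 bytes.
two_byte = "\u4e2d".encode("gb18030")

print(four_byte.hex(), len(four_byte), is_gb18030_four_byte(four_byte))
print(two_byte.hex(), len(two_byte), is_gb18030_four_byte(two_byte))
```

So not every Chinese character takes 4 bytes in GB18030, only those that have no one-byte or two-byte code.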
This gives a total of 1,587,600 (126 × 10 × 126 × 10) possible 4 byte sequences, which is easily sufficient to cover Unicode's 1,111,998 (17 × 65536 − 2048 surrogates − 66 noncharacters) code points.
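The count can be checked with a few lines of Python. This is just the arithmetic from the byte ranges quoted above, and the variable names are my own. To keep it simple it compares against the whole 17-plane code space, without subtracting surrogates or anything else, which is an upper bound on whatever needs to be encoded.

```python
# Number of values allowed in each position of a four-byte sequence.
lead_values = 0xFE - 0x81 + 1   # first and third byte: 126 values
digit_values = 0x39 - 0x30 + 1  # second and fourth byte: 10 values

four_byte_sequences = (lead_values * digit_values) ** 2

# Whole Unicode code space: 17 planes of 65536 code points each.
unicode_code_space = 17 * 65536

spare = four_byte_sequences - unicode_code_space
print(four_byte_sequences, unicode_code_space, spare)
```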
Unfortunately, to further complicate matters there are no simple rules to translate between a 4 byte sequence and its corresponding code point. Instead, codes are allocated sequentially (with the first byte containing the most significant part and the last the least significant part) only to Unicode code points that are not mapped in any other manner.
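A rough Python sketch of what "allocated sequentially" means. The function names are hypothetical. The one part that is a plain offset, as far as I know, is the supplementary planes: U+10000 is encoded as 0x90 0x30 0x81 0x30 and the following code points continue in order from there. For characters below U+10000 you need the mapping table, so the sketch leaves those to Python's `gb18030` codec.

```python
def four_byte_to_linear(seq: bytes) -> int:
    """Position of a four-byte sequence, counting from 0x81 0x30 0x81 0x30."""
    b1, b2, b3, b4 = seq
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


# Position of the sequence for U+10000 (assumption stated above).
SUPPLEMENTARY_BASE = four_byte_to_linear(b"\x90\x30\x81\x30")


def supplementary_code_point(seq: bytes) -> int:
    """Code point for a four-byte sequence in the supplementary-plane range."""
    return 0x10000 + four_byte_to_linear(seq) - SUPPLEMENTARY_BASE


# Cross-check against the built-in codec for a few supplementary code points.
for cp in (0x10000, 0x20000, 0x10FFFF):
    encoded = chr(cp).encode("gb18030")
    print(hex(cp), encoded.hex(), hex(supplementary_code_point(encoded)))

# Below U+10000 there is no fixed offset: code points that already have a
# one-byte or two-byte code are skipped, so the gaps vary.
start = four_byte_to_linear("\u0080".encode("gb18030"))
later = four_byte_to_linear("\u00a5".encode("gb18030"))
print(later - start, 0xA5 - 0x80)
```

If the last two numbers differ, it is because some code point in between already had a shorter code and was skipped.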
Maybe that is why this is happening to you.