UTF-16 (16-bit Unicode Transformation Format) is a character encoding method capable of encoding all 1,112,064 valid code points of Unicode.
Clause D98 of conformance (section 3.10) of the Unicode standard states, "The UTF-16 encoding scheme may or may not begin with a BOM. However, when there is no BOM, and in the absence of a higher-level protocol, the byte order of the UTF-16 encoding scheme is big-endian." Whether or not a higher-level protocol is in force is open to interpretation.
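For illustration, here is a minimal Python sketch of that rule, assuming no higher-level protocol is in force; the helper name decode_utf16_per_d98 is made up for the example. Note that Python's generic "utf-16" codec honours a BOM but otherwise falls back to the platform's native byte order rather than big-endian, so the D98 default has to be applied explicitly:

    def decode_utf16_per_d98(data: bytes) -> str:
        # A BOM, when present, settles the byte order.
        if data[:2] == b"\xfe\xff":
            return data[2:].decode("utf-16-be")
        if data[:2] == b"\xff\xfe":
            return data[2:].decode("utf-16-le")
        # No BOM and no higher-level protocol: big-endian per D98.
        return data.decode("utf-16-be")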
UTF-16 is popular because many APIs date to the time when Unicode was 16-bit fixed width (referred to as UCS-2). However, using UTF-16 makes characters outside the Basic Multilingual Plane a special case, which increases the risk of oversights related to their handling. That said, programs that mishandle surrogate pairs probably also have problems ...
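As a concrete illustration of that special case, the following sketch reproduces the standard surrogate-pair arithmetic for one code point outside the BMP (U+1F600, chosen here arbitrarily):

    cp = 0x1F600
    offset = cp - 0x10000               # the 20-bit value to be split
    high = 0xD800 + (offset >> 10)      # lead (high) surrogate
    low = 0xDC00 + (offset & 0x3FF)     # trail (low) surrogate
    print(hex(high), hex(low))          # 0xd83d 0xde00
    # The explicit big-endian codec produces exactly these two units.
    assert "\U0001F600".encode("utf-16-be") == bytes.fromhex("d83dde00")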
For instance, the BQ27421 Texas Instruments battery gauge uses the little-endian format for its registers and the big-endian format for its random-access memory. SPARC historically used big-endian until version 9, which is bi-endian. Similarly, early IBM POWER processors were big-endian, but the PowerPC and Power ISA descendants are now bi-endian.
The sequence also has no meaning in any arrangement of UTF-32 encoding, so, in summary, it serves as a fairly reliable indication that the text stream is encoded as UTF-16 in big-endian byte order. Conversely, if the first two bytes are 0xFF, 0xFE, then the text stream may be assumed to be encoded as UTF-16LE because, read as a 16-bit little-endian value, those bytes yield the expected byte order mark U+FEFF.
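A small sketch of this sniffing heuristic (the function name sniff_utf16_byte_order is made up for the example): read the first two bytes as a 16-bit value in each byte order. Only one reading yields U+FEFF, the byte order mark; the other yields the noncharacter U+FFFE, which never appears in well-formed text.

    def sniff_utf16_byte_order(head: bytes) -> str:
        if int.from_bytes(head[:2], "big") == 0xFEFF:
            return "utf-16-be"   # bytes 0xFE 0xFF
        if int.from_bytes(head[:2], "little") == 0xFEFF:
            return "utf-16-le"   # bytes 0xFF 0xFE
        return "unknown"         # no BOM; see clause D98 above

    print(sniff_utf16_byte_order(b"\xfe\xff"))  # utf-16-be
    print(sniff_utf16_byte_order(b"\xff\xfe"))  # utf-16-le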
Microsoft was one of the first companies to implement Unicode in their products. Windows NT was the first operating system that used "wide characters" in system calls. Using the (now obsolete) UCS-2 encoding scheme at first, it was upgraded to the variable-width encoding UTF-16 starting with Windows 2000, allowing representation of the additional planes with surrogate pairs.
Concerning UTF-16 big endian vs little endian
I have noticed that the Python interpreter reverses the byte order of UTF-16 big-endian and little-endian, compared to what is actually in the Unicode standard, when given invalid input.
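The claim is easy to test against Python's explicit codecs; "utf-16-be" and "utf-16-le" are the standard codec names, and for valid input they match the byte orders the standard prescribes:

    print("A".encode("utf-16-be").hex())  # 0041: high byte first (big-endian)
    print("A".encode("utf-16-le").hex())  # 4100: low byte first (little-endian)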
(Table fragment: numeric identifiers for UTF-16 little endian, UTF-32 big endian, UTF-32 little endian, ISO/IEC 646 INV, and 8-bit binary data.)