Search results
Results from the WOW.Com Content Network
UTF-16 (16-bit Unicode Transformation Format) is a character encoding method capable of encoding all 1,112,064 valid code points of Unicode. [a] ...
UTF-16 is popular because many APIs date to the time when Unicode was 16-bit fixed width (referred as UCS-2). However, using UTF-16 makes characters outside the Basic Multilingual Plane a special case which increases the risk of oversights related to their handling. That said, programs that mishandle surrogate pairs probably also have problems ...
As of Unicode version 16.0, there are 155,063 characters with code points, covering 168 modern and historical scripts, as well as multiple symbol sets.This article includes the 1,062 characters in the Multilingual European Character Set 2 subset, and some additional related characters.
The numbers in the names of the encodings indicate the number of bits per code unit (for UTF encodings) or the number of bytes per code unit (for UCS encodings and UTF-1). UTF-8 and UTF-16 are the most commonly used encodings. UCS-2 is an obsolete subset of UTF-16; UCS-4 and UTF-32 are functionally equivalent. UTF encodings include:
Microsoft was one of the first companies to implement Unicode in their products. Windows NT was the first operating system that used "wide characters" in system calls.Using the (now obsolete) UCS-2 encoding scheme at first, it was upgraded to the variable-width encoding UTF-16 starting with Windows 2000, allowing a representation of additional planes with surrogate pairs.
For that reason, ISO/IEC 10646 was limited to contain as many characters as could be encoded by UTF-16 and no more, that is, a little over a million characters instead of over 679 million. The UCS-4 encoding of ISO/IEC 10646 was incorporated into the Unicode standard with the limitation to the UTF-16 range and under the name UTF-32 , although ...
For UTF-8, the BOM is optional, while it is a must for the UTF-16 and the UTF-32 encodings. (Note: UTF-16 and UTF-32 without the BOM are formally known under different names, they are different encodings, and thus needs some form of encoding declaration – see UTF-16BE, UTF-16LE, UTF-32LE and UTF-32BE.) The use of the BOM character (U+FEFF ...
The last code point in Unicode is the last code point in plane 16, U+10FFFF. As of Unicode version 16.0, five of the planes have assigned code points (characters), and seven are named. The limit of 17 planes is due to UTF-16, which can encode 2 20 code points (16 planes) as pairs of words, plus the BMP as a single word. [2]