Search results
Results from the WOW.Com Content Network
As of Unicode version 16.0, there are 155,063 characters with code points, covering 168 modern and historical scripts, as well as multiple symbol sets. This article includes the 1,062 characters in the Multilingual European Character Set 2 subset, and some additional related characters.
The Unicode Consortium and the ISO/IEC JTC 1/SC 2/WG 2 jointly collaborate on the list of the characters in the Universal Coded Character Set.The Universal Coded Character Set, most commonly called the Universal Character Set (abbr. UCS, official designation: ISO/IEC 10646), is an international standard to map characters, discrete symbols used in natural language, mathematics, music, and other ...
UTF-16 (16-bit Unicode Transformation Format) is a character encoding that supports all 1,112,064 valid code points of Unicode. [ 1 ] [ a ] The encoding is variable-length as code points are encoded with one or two 16-bit code units .
This number arises from the limitations of the UTF-16 character encoding, which can encode the 2 16 code points in the range U+0000 through U+FFFF except for the 2 11 code points in the range U+D800 through U+DFFF, which are used as surrogate pairs to encode the 2 20 code points in the range U+10000 through U+10FFFF.
Thus "H₂O" (using a subscript 2 character) is supposed to be identical to "H 2 O" (with subscript markup). In reality, many fonts that include these characters ignore the Unicode definition, and instead design the digits for mathematical numerator and denominator glyphs, [3] [4] which are aligned with the cap line and the baseline, respectively.
The last code point in Unicode is the last code point in plane 16, U+10FFFF. As of Unicode version 16.0, five of the planes have assigned code points (characters), and seven are named. The limit of 17 planes is due to UTF-16, which can encode 2 20 code points (16 planes) as pairs of words, plus the BMP as a single word. [2]
The original edition of the UCS defined UTF-16, an extension of UCS-2, to represent code points outside the BMP. A range of code points in the S (Special) Zone of the BMP remains unassigned to characters. UCS-2 disallows use of code values for these code points, but UTF-16 allows their use in pairs.
UTF-16 or UTF-32, which can be used for all languages as well, are less widely used because they can be harder to handle in programming languages that assume a byte-oriented ASCII superset encoding, and they are less efficient for text with a high frequency of ASCII characters, which is usually the case for HTML documents.