Each Unicode code point is encoded as either one or two 16-bit code units. Code points less than 2^16 (that is, in the BMP) are encoded with a single 16-bit code unit equal to the numerical value of the code point, as in the older UCS-2. Code points greater than or equal to 2^16 (above the BMP) are encoded using two 16-bit code units.
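The one-or-two-unit rule can be sketched in Python as a small encoder. This is a minimal illustration, not a library API; the function name `utf16_code_units` is hypothetical. It uses the standard surrogate arithmetic: subtract 0x10000 to get a 20-bit offset, then split it into a high surrogate (0xD800 plus the top 10 bits) and a low surrogate (0xDC00 plus the bottom 10 bits).

```python
def utf16_code_units(cp: int) -> list[int]:
    """Return the UTF-16 code units (16-bit integers) for one code point (hypothetical helper)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("outside the Unicode code space")
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogate code points are reserved for pairs and are never encoded alone.
        raise ValueError("surrogate code points cannot be encoded on their own")
    if cp < 0x10000:
        # In the BMP: a single code unit equal to the code point, as in UCS-2.
        return [cp]
    offset = cp - 0x10000            # a 20-bit value, 0x00000..0xFFFFF
    high = 0xD800 + (offset >> 10)   # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)  # bottom 10 bits -> low (trail) surrogate
    return [high, low]


print([hex(u) for u in utf16_code_units(0x41)])     # ['0x41']
print([hex(u) for u in utf16_code_units(0x1F600)])  # ['0xd83d', '0xde00']
```

The results can be cross-checked against Python's built-in `"utf-16-be"` codec, which produces the same units as big-endian bytes.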
The UCS reserves 2,048 code points in the Basic Multilingual Plane (BMP), U+D800 through U+DFFF, as surrogates that are used in pairs. A pair of surrogates can address any code point in the sixteen other planes. This provides a simple built-in method for encoding the 20.1-bit UCS within a 16-bit encoding such as UTF-16.
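The arithmetic behind "2,048 surrogates address sixteen planes" and "20.1 bits" can be checked directly. The sketch below assumes the standard split into 1,024 high surrogates (U+D800 to U+DBFF) and 1,024 low surrogates (U+DC00 to U+DFFF); `combine_surrogates` is a hypothetical helper name that reverses the pairing.

```python
import math


def combine_surrogates(high: int, low: int) -> int:
    """Recover the supplementary code point encoded by a UTF-16 surrogate pair."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a high/low surrogate pair")
    # 10 bits from each surrogate form a 20-bit offset above U+10000.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# 1,024 high + 1,024 low = 2,048 reserved code points; as pairs they address
# 1,024 * 1,024 = 1,048,576 code points, exactly 16 planes of 65,536 each.
assert 1024 + 1024 == 2048
assert 1024 * 1024 == 16 * 2**16

# The whole code space (17 planes, 0x110000 code points) needs about 20.1 bits.
print(round(math.log2(0x110000), 1))            # 20.1
print(hex(combine_surrogates(0xD83D, 0xDE00)))  # 0x1f600
```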
A code point is represented by a sequence of code units, and the mapping is defined by the encoding. Thus, the number of code units required to represent a code point depends on the encoding. In UTF-8, code points map to a sequence of one, two, three or four 8-bit code units. In UTF-16, code units are twice as long as 8-bit code units, so any code point below U+10000 is encoded with a single code unit, while code points at U+10000 or higher require two.
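A quick way to see that the code-unit count depends on the encoding is to encode a few characters with Python's standard codecs and divide the byte length by the code-unit size. This is a sketch; `code_unit_counts` is a hypothetical name, and the `"utf-16-le"` codec is used only so that no byte-order mark is added.

```python
def code_unit_counts(text: str) -> dict[str, int]:
    """Count code units needed for `text` in UTF-8 and UTF-16 (hypothetical helper)."""
    return {
        # A UTF-8 code unit is one byte.
        "utf-8": len(text.encode("utf-8")),
        # A UTF-16 code unit is two bytes; "-le" avoids a byte-order mark.
        "utf-16": len(text.encode("utf-16-le")) // 2,
    }


for ch in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}", code_unit_counts(ch))
```

The four sample characters need 1, 2, 3 and 4 UTF-8 code units respectively, but only the last one (above U+FFFF) needs two UTF-16 code units.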
Decimal code points in the range 160–255 must be entered with a leading zero (so that the Windows code page is chosen), and furthermore the Windows code page must be CP1252.[b] For example, Alt+0247 yields ÷, corresponding to its code point, but the character produced by Alt+247 depends on the OEM code page, such as Code ...
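The leading-zero rule can be modelled as choosing which single-byte code page decodes the typed number. This is a simplified sketch, not how Windows is implemented: `alt_code_char` is a hypothetical function, CP437 is assumed as the OEM code page (a common default, but the actual OEM page varies by system), and only values 0 to 255 are handled.

```python
def alt_code_char(digits: str, oem_codepage: str = "cp437") -> str:
    """Simplified, hypothetical model of Windows Alt+numpad entry for values 0-255.

    Assumption: a leading zero selects the Windows code page (CP1252);
    no leading zero selects the OEM code page (CP437 by default here).
    """
    value = int(digits)
    if not 0 <= value <= 255:
        raise ValueError("this sketch only models single-byte values 0-255")
    codepage = "cp1252" if digits.startswith("0") else oem_codepage
    # Decode the single byte with the chosen code page.
    return bytes([value]).decode(codepage)


print(alt_code_char("0247"))  # ÷  (CP1252 byte 247 is U+00F7)
print(alt_code_char("247"))   # ≈  (CP437 byte 247 is U+2248)
```

In CP1252 the bytes 160 to 255 coincide with code points U+00A0 to U+00FF, which is why the leading-zero form "corresponds to its code point".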
The number of code points in each block must be a multiple of 16. A block may contain code points that are reserved, not assigned, etc. Each assigned character has a single "block name" value, from the 338 names assigned as of Unicode version 16.0. Unassigned code points outside of an existing block have the default value "No_block".
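Python's `unicodedata` module does not expose block names, so a block lookup needs its own range table. The sketch below uses a tiny hand-copied excerpt of four blocks (illustrative only; the full list is in the Unicode `Blocks.txt` data file) and a binary search; `BLOCKS` and `block_of` are hypothetical names. Because the table is incomplete, it will also report "No_block" for real blocks it does not list.

```python
import bisect

# Tiny illustrative excerpt of Unicode block ranges: (first, last, name).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x2100, 0x214F, "Letterlike Symbols"),
    (0x1D400, 0x1D7FF, "Mathematical Alphanumeric Symbols"),
]
STARTS = [first for first, _, _ in BLOCKS]

# Each block's size is a multiple of 16 code points.
for first, last, _ in BLOCKS:
    assert (last - first + 1) % 16 == 0


def block_of(cp: int) -> str:
    """Return the block name for cp, or the default "No_block"."""
    i = bisect.bisect_right(STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return "No_block"


print(block_of(0x210E))   # Letterlike Symbols
print(block_of(0x40000))  # No_block (plane 4 has no blocks)
```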
All code points in the BMP are encoded as a single code unit in UTF-16 and can be encoded in one, two or three bytes in UTF-8. Code points in planes 1 through 16 (the supplementary planes) are encoded as surrogate pairs in UTF-16 and in four bytes in UTF-8.
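The plane number and the UTF-8 length both follow from simple range checks on the code point. A minimal sketch with hypothetical helper names: the plane is the code point shifted right by 16 bits, and the UTF-8 length uses the standard 1/2/3/4-byte boundaries at U+0080, U+0800 and U+10000.

```python
def plane(cp: int) -> int:
    """Plane number 0..16; each plane holds 2**16 code points."""
    return cp >> 16


def utf8_length(cp: int) -> int:
    """Bytes needed to encode code point cp in UTF-8."""
    if cp < 0x80:
        return 1  # U+0000..U+007F
    if cp < 0x800:
        return 2  # U+0080..U+07FF
    if cp < 0x10000:
        return 3  # rest of the BMP
    return 4      # supplementary planes 1-16


for cp in [0x41, 0x00E9, 0x20AC, 0x1F600, 0x10FFFF]:
    print(f"U+{cp:04X}: plane {plane(cp)}, {utf8_length(cp)} UTF-8 byte(s)")
```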
Code points are commonly used in character encoding, where a code point is a numerical value that maps to a specific character. In character encoding, code points usually represent a single grapheme (typically a letter, digit, punctuation mark, or whitespace) but sometimes represent symbols, control characters, or formatting.[4]
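In Python, `ord` and `chr` convert between a character and its numerical code point. The sketch below also shows why the mapping to graphemes is only "usually" one-to-one: "é" can be written as one precomposed code point or as "e" followed by U+0301 COMBINING ACUTE ACCENT, and NFC normalization (from the standard `unicodedata` module) merges the two-code-point form into one.

```python
import unicodedata

# A code point is a number; ord/chr convert between it and the character.
print(ord("A"))           # 65
print(chr(0x00F7))        # ÷

# One grapheme spelled with two code points: "e" + COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
precomposed = unicodedata.normalize("NFC", decomposed)
print(len(decomposed), len(precomposed))  # 2 1
print(unicodedata.name(precomposed))      # LATIN SMALL LETTER E WITH ACUTE
```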
The reserved code points (the "holes") in the alphabetic ranges up to U+1D551 duplicate characters in the Letterlike Symbols block. In order, these are ℎ / ℬ ℰ ℱ ℋ ℐ ℒ ℳ ℛ / ℯ ℊ ℴ / ℭ ℌ ℑ ℜ ℨ / ℂ ℍ ℕ ℙ ℚ ℝ ℤ.
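The holes can be located with the standard `unicodedata` module: reserved code points have general category "Cn". This sketch scans U+1D400 to U+1D551, pairs each hole with the Letterlike Symbols characters listed above in order, and uses NFKC normalization (these letterlike characters have font-variant compatibility decompositions) to show which plain letter each one stands for. `reserved_code_points` and `LETTERLIKE` are hypothetical names.

```python
import unicodedata

# The Letterlike Symbols characters listed above, in the same order.
LETTERLIKE = "ℎℬℰℱℋℐℒℳℛℯℊℴℭℌℑℜℨℂℍℕℙℚℝℤ"


def reserved_code_points(start: int, end: int) -> list[int]:
    """Unassigned (general category Cn) code points in [start, end]."""
    return [cp for cp in range(start, end + 1)
            if unicodedata.category(chr(cp)) == "Cn"]


holes = reserved_code_points(0x1D400, 0x1D551)
print(len(holes), len(LETTERLIKE))                # 24 24
print(unicodedata.normalize("NFKC", LETTERLIKE))  # hBEFHILMRegoCHIRZCHNPQRZ

# Pair each hole with the existing character that fills it.
for cp, ch in zip(holes, LETTERLIKE):
    print(f"U+{cp:X} -> U+{ord(ch):04X} {ch}")
```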