Unicode version 4.0.0 was released in April 2003.
The previous version was Unicode 3.2 and the next is Unicode 4.1.
All the gory details can be found at
http://www.unicode.org/versions/Unicode4.0.0/
The primary feature of Unicode 4.0 is the addition of 1,226 newly encoded characters.
Version 4.0.1 was released in March 2004
(http://www.unicode.org/versions/Unicode4.0.1/).
The main new features in Unicode 4.0.1 (compared to 4.0.0) are:
- The first significant update of the Unihan Database (Unihan.txt)
since Unicode 3.2.0, including a large number of fixes and
additional data items.
- Significant clarifications in four definitions used in conformance.
- Unicode Character Database:
  - New character properties: STerm and Variation_Selector
  - Updated significantly: Terminal_Punctuation, Math, Script, and Line_Break
  - Changed: general category of U+200B ZERO WIDTH SPACE
  - Changed: bidi class of some characters including: +, -, / and FRACTION SLASH
  - Added: property value aliases
  - Revised: formats in some of the data files
- Changes in the recommended loose comparison of Character name values.
- Clearer definition of the encoding of Bengali Reph and Ya-phalaa
The changes in 4.0.0 since the previous version, Unicode 3.2, are as follows:
New Code Blocks
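15 new code blocks were added in 4.0. Each line below gives the code point range, the block name, and the number of assigned characters out of the total size of the block.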
U+1900 to U+194F Limbu 66/80
U+1950 to U+197F Tai Le 35/48
U+19E0 to U+19FF Khmer Symbols 32/32
U+1D00 to U+1D7F Phonetic Extensions 108/128
U+2B00 to U+2BFF Miscellaneous Symbols and Arrows 14/256
U+4DC0 to U+4DFF Yijing Hexagram Symbols 64/64
U+10000 to U+1007F Linear B Syllabary 88/128
U+10080 to U+100FF Linear B Ideograms 123/128
U+10100 to U+1013F Aegean Numbers 57/64
U+10380 to U+1039F Ugaritic 31/32
U+10450 to U+1047F Shavian 48/48
U+10480 to U+104AF Osmanya 40/48
U+10800 to U+1083F Cypriot Syllabary 55/64
U+1D300 to U+1D35F Tai Xuan Jing Symbols 87/96
U+E0100 to U+E01EF Variation Selectors Supplement 240/240
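As a cross-check on the table above, the following minimal Python sketch transcribes the 15 ranges and totals the assigned characters. It assumes the last column reads as "assigned characters / block size"; the names `NEW_BLOCKS`, `block_of`, and `total_assigned` are hypothetical helpers introduced only for this illustration.

```python
# The 15 blocks added in Unicode 4.0, transcribed from the table above as
# (first code point, last code point, block name, assigned, block size).
# Reading the last column as "assigned/size" is an assumption of this sketch.
NEW_BLOCKS = [
    (0x1900, 0x194F, "Limbu", 66, 80),
    (0x1950, 0x197F, "Tai Le", 35, 48),
    (0x19E0, 0x19FF, "Khmer Symbols", 32, 32),
    (0x1D00, 0x1D7F, "Phonetic Extensions", 108, 128),
    (0x2B00, 0x2BFF, "Miscellaneous Symbols and Arrows", 14, 256),
    (0x4DC0, 0x4DFF, "Yijing Hexagram Symbols", 64, 64),
    (0x10000, 0x1007F, "Linear B Syllabary", 88, 128),
    (0x10080, 0x100FF, "Linear B Ideograms", 123, 128),
    (0x10100, 0x1013F, "Aegean Numbers", 57, 64),
    (0x10380, 0x1039F, "Ugaritic", 31, 32),
    (0x10450, 0x1047F, "Shavian", 48, 48),
    (0x10480, 0x104AF, "Osmanya", 40, 48),
    (0x10800, 0x1083F, "Cypriot Syllabary", 55, 64),
    (0x1D300, 0x1D35F, "Tai Xuan Jing Symbols", 87, 96),
    (0xE0100, 0xE01EF, "Variation Selectors Supplement", 240, 240),
]


def block_of(code_point):
    """Return the name of the new 4.0 block containing code_point, or None."""
    for first, last, name, _assigned, _size in NEW_BLOCKS:
        if first <= code_point <= last:
            return name
    return None


def total_assigned():
    """Sum the assigned-character counts over all 15 new blocks."""
    return sum(assigned for _first, _last, _name, assigned, _size in NEW_BLOCKS)


if __name__ == "__main__":
    print(total_assigned())    # 1088
    print(block_of(0x10450))   # Shavian
```

The 1,088 characters in the new blocks are counted separately from the characters added to blocks that already existed.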
New Characters
Excluding those in the new code blocks, there were 138 new characters added in Unicode 4.0.
Number of characters in each General Category:
Letter, Uppercase Lu : 5
Letter, Lowercase Ll : 11
Letter, Other Lo : 16
Mark, Non-Spacing Mn : 25
Mark, Spacing Combining Mc : 1
Number, Other No : 11
Punctuation, Connector Pc : 1
Punctuation, Open Ps : 1
Punctuation, Close Pe : 1
Punctuation, Other Po : 2
Symbol, Currency Sc : 2
Symbol, Modifier Sk : 17
Symbol, Other So : 41
Other, Format Cf : 4
Number of characters in each Bidirectional Category:
Left To Right L : 24
Right To Left Arabic AL : 14
European Number Terminator ET : 2
Non Spacing Mark NSM : 25
Other Neutral ON : 73
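The two tallies above describe the same set of characters, so each should add up to the same total. A small Python sketch of that arithmetic, with the counts copied from the lists above (the dictionary names are hypothetical, chosen for this example):

```python
# Counts copied from the General Category list above, keyed by abbreviation.
GENERAL_CATEGORY_COUNTS = {
    "Lu": 5, "Ll": 11, "Lo": 16, "Mn": 25, "Mc": 1, "No": 11, "Pc": 1,
    "Ps": 1, "Pe": 1, "Po": 2, "Sc": 2, "Sk": 17, "So": 41, "Cf": 4,
}

# Counts copied from the Bidirectional Category list above.
BIDI_COUNTS = {"L": 24, "AL": 14, "ET": 2, "NSM": 25, "ON": 73}

if __name__ == "__main__":
    print(sum(GENERAL_CATEGORY_COUNTS.values()))  # 138
    print(sum(BIDI_COUNTS.values()))              # 138
```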
The columns below should be interpreted as:
- The Unicode code for the character
- The character in question
- The Unicode name for the character
- The Unicode General Category for the character
- The Unicode Bidirectional Category for the character
If the characters below show up poorly, or not at all, see Unicode Support for possible solutions.
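The five columns described above can be reproduced for any character with Python's standard `unicodedata` module. This is only a sketch: the helper name `describe` is hypothetical, and the values returned depend on the Unicode version bundled with the Python runtime (available as `unicodedata.unidata_version`), which is newer than 4.0 and may differ for some characters.

```python
import unicodedata


def describe(char):
    """Return (code, character, name, General Category, Bidirectional Category)."""
    return (
        "U+%04X" % ord(char),                  # the Unicode code for the character
        char,                                  # the character in question
        unicodedata.name(char, "<unnamed>"),   # name, or a placeholder if none
        unicodedata.category(char),            # e.g. "Ll"
        unicodedata.bidirectional(char),       # e.g. "L"
    )


if __name__ == "__main__":
    print(describe("\u0221"))
    # ('U+0221', 'ȡ', 'LATIN SMALL LETTER D WITH CURL', 'Ll', 'L')
```

Note that `unicodedata.name` returns names in upper case, whereas the listing below shows them in mixed case.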
Latin Extended B
Miscellaneous additions
- U+0221 ȡ Latin small letter D with curl Ll L
- * phonetic use in Sinology
Additions for Sinology
- U+0234 ȴ Latin small letter L with curl Ll L
- U+0235 ȵ Latin small letter N with curl Ll L
- U+0236 ȶ Latin small letter T with curl Ll L
IPA Extensions
Additions for Sinology
- U+02AE ʮ Latin small letter turned h with fishhook Ll L
- U+02AF ʯ Latin small letter turned h with fishhook and tail Ll L
Spacing Modifier Letters
UPA modifiers
- U+02EF ˯ modifier letter low down arrowhead Sk ON
- U+02F0 ˰ modifier letter low up arrowhead Sk ON
- U+02F1 ˱ modifier letter low left arrowhead Sk ON
- U+02F2 ˲ modifier letter low right arrowhead Sk ON
- U+02F3 ˳ modifier letter low ring Sk ON
- U+02F4 ˴ modifier letter middle grave accent Sk ON
- U+02F5 ˵ modifier letter middle double grave accent Sk ON
- U+02F6 ˶ modifier letter middle double acute accent Sk ON
- U+02F7 ˷ modifier letter low tilde Sk ON
- U+02F8 ˸ modifier letter raised colon Sk ON
- U+02F9 ˹ modifier letter begin high tone Sk ON
- U+02FA ˺ modifier letter end high tone Sk ON
- U+02FB ˻ modifier letter begin low tone Sk ON
- U+02FC ˼ modifier letter end low tone Sk ON
- U+02FD ˽ modifier letter shelf Sk ON
- U+02FE ˾ modifier letter open shelf Sk ON
- U+02FF ˿ modifier letter low left arrow Sk ON
Combining Diacritical Marks
Additions for the Uralic Phonetic Alphabet
- U+0350 ͐ combining right arrowhead above Mn NSM
- U+0351 ͑ combining left half ring above Mn NSM
- U+0352 ͒ combining fermata Mn NSM
- U+0353 ͓ combining x below Mn NSM
- U+0354 ͔ combining left arrowhead below Mn NSM
- U+0355 ͕ combining right arrowhead below Mn NSM
- U+0356 ͖ combining right arrowhead and up arrowhead below Mn NSM
- U+0357 ͗ combining right half ring above Mn NSM
Double diacritics
- U+035D ͝ combining double breve Mn NSM
- U+035E ͞ combining double macron Mn NSM
- U+035F ͟ combining double macron below Mn NSM
Greek and Coptic
Additional archaic letters for Bactrian
- U+03F7 Ϸ Greek capital letter sho Lu L
- U+03F8 ϸ Greek small letter sho Ll L
Variant letterform
- U+03F9 Ϲ Greek capital lunate sigma symbol Lu L
Archaic letters
- U+03FA Ϻ Greek capital letter san Lu L
- U+03FB ϻ Greek small letter san Ll L
Arabic
Subtending marks
- U+0600 Arabic number sign Cf AL
- U+0601 Arabic sign sanah Cf AL
- U+0602 Arabic footnote marker Cf AL
- U+0603 Arabic sign safha Cf AL
Punctuation
- U+060D ؍ Arabic date separator Po AL
Poetic marks
- U+060E ؎ Arabic poetic verse sign So ON
- U+060F ؏ Arabic sign misra So ON
Honorifics
- U+0610 ؐ Arabic sign sallallahou alayhe wassallam Mn NSM
- * represents sallallahu alayhe wasallam "may God's peace and blessings be upon him"
- U+0611 ؑ Arabic sign alayhe assallam Mn NSM
- * represents alayhe assalam "upon him be peace"
- U+0612 ؒ Arabic sign rahmatullah alayhe Mn NSM
- * represents rahmatullah alayhe "may God have mercy upon him"
- U+0613 ؓ Arabic sign radi allahou anhu Mn NSM
- * represents radi allahu 'anhu "may God be pleased with him"
- U+0614 ؔ Arabic sign takhallus Mn NSM
- * sign placed over the name or nom-de-plume of a poet, or in some writings used to mark all proper names
Koranic annotation sign
- U+0615 ؕ Arabic small high tah Mn NSM
- * marks a recommended pause position in some Korans published in Iran and Pakistan
- * should not be confused with the small TAH sign used as a diacritic for some letters such as 0679
Other combining marks
- U+0656 ٖ Arabic subscript alef Mn NSM
- U+0657 ٗ Arabic inverted damma Mn NSM
- U+0658 ٘ Arabic mark noon ghunna Mn NSM
- * Kashmiri and Baluchi
- * indicates nasalization in Urdu
Extended Arabic letters for Parkari
- U+06EE ۮ Arabic letter dal with inverted v Lo AL
- U+06EF ۯ Arabic letter reh with inverted v Lo AL
Extended Arabic letter for Parkari
- U+06FF ۿ Arabic letter heh with inverted v Lo AL
Syriac
Persian letters
- U+072D ܭ Syriac letter persian bheth Lo AL
- U+072E ܮ Syriac letter persian ghamal Lo AL
- U+072F ܯ Syriac letter persian dhalath Lo AL
Sogdian letters
- U+074D ݍ Syriac letter sogdian zhain Lo AL
- U+074E ݎ Syriac letter sogdian khaph Lo AL
- U+074F ݏ Syriac letter sogdian fe Lo AL
Devanagari
Independent vowels
- U+0904 ऄ Devanagari letter short a Lo L
Bengali
Various signs
- U+09BD ঽ Bengali sign avagraha Lo L
Gurmukhi
Based on ISCII 1988
- U+0A01 ਁ Gurmukhi sign adak bindi Mn NSM
- U+0A03 ਃ Gurmukhi sign visarga Mc L
Gujarati
Independent vowels
- U+0A8C ઌ Gujarati letter vocalic l Lo L
- * used with Sanskrit text
Additions for use with Sanskrit text
- U+0AE1 ૡ Gujarati letter vocalic ll Lo L
- U+0AE2 ૢ Gujarati vowel sign vocalic l Mn NSM
- U+0AE3 ૣ Gujarati vowel sign vocalic ll Mn NSM
Currency sign
- U+0AF1 ૱ Gujarati rupee sign Sc ET
Oriya
Consonants
- U+0B35 ଵ Oriya letter va Lo L
- ref U+0B2C ବ Oriya letter ba (Oriya)
Oriya-specific additions
- U+0B71 ୱ Oriya letter wa Lo L
- ref U+0B13 ଓ Oriya letter O (Oriya)
- ref U+0B35 ଵ Oriya letter va (Oriya)
Tamil
Tamil symbols
- U+0BF3 ௳ Tamil day sign So ON
- U+0BF4 ௴ Tamil month sign So ON
- U+0BF5 ௵ Tamil year sign So ON
- U+0BF6 ௶ Tamil debit sign So ON
- U+0BF7 ௷ Tamil credit sign So ON
- U+0BF8 ௸ Tamil as above sign So ON
Currency symbol
- U+0BF9 ௹ Tamil rupee sign Sc ET
Tamil symbol
- U+0BFA ௺ Tamil number sign So ON
Kannada
Various signs
- U+0CBC ಼ Kannada sign nukta Mn NSM
- U+0CBD ಽ Kannada sign avagraha Lo L
Khmer
Various signs
- U+17DD ៝ Khmer sign atthacan Mn NSM
- * mostly obsolete
- * indicates that the base character is the final consonant of a word with its inherent vowel sound
- ref U+17D1 ៑ Khmer sign viriam (Khmer)
Numeric symbols for divination lore
- U+17F0 ៰ Khmer symbol lek attak son No ON
- U+17F1 ៱ Khmer symbol lek attak muoy No ON
- U+17F2 ៲ Khmer symbol lek attak pii No ON
- U+17F3 ៳ Khmer symbol lek attak bei No ON
- U+17F4 ៴ Khmer symbol lek attak buon No ON
- U+17F5 ៵ Khmer symbol lek attak pram No ON
- U+17F6 ៶ Khmer symbol lek attak pram muoy No ON
- U+17F7 ៷ Khmer symbol lek attak pram pii No ON
- U+17F8 ៸ Khmer symbol lek attak pram bei No ON
- U+17F9 ៹ Khmer symbol lek attak pram buon No ON
General Punctuation
General punctuation
- U+2053 ⁓ swung dash Po ON
- U+2054 ⁔ inverted undertie Pc ON
Letterlike Symbols
Additional letterlike symbols
- U+213B ℻ facsimile sign So ON
- ref U+2121 ℡ telephone sign (Letterlike Symbols)
Miscellaneous Technical
Keyboard and UI symbols
- U+23CF ⏏ eject symbol So ON
- * UI symbol to eject media
Special character extension
- U+23D0 ⏐ vertical line extension So ON
- * used for extension of arrows
- ref U+23AF ⎯ horizontal line extension (Miscellaneous Technical)
Enclosed Alphanumerics
Additional white on black circled number
- U+24FF ⓿ negative circled digit zero No ON
- ref U+2776 ❶ dingbat negative circled digit one (Dingbats)
Miscellaneous Symbols
Weather symbol
- U+2614 ☔ umbrella with rain drops So ON
- aka showery weather
Miscellaneous symbol
- U+2615 ☕ hot beverage So ON
- aka tea or coffee, depending on locale
- * can be used to indicate a wait
- ref U+231A ⌚ watch (Miscellaneous Technical)
- ref U+231B ⌛ hourglass (Miscellaneous Technical)
Yijing monogram and digram symbols
- U+268A ⚊ monogram for yang So ON
- U+268B ⚋ monogram for yin So ON
- U+268C ⚌ digram for greater yang So ON
- U+268D ⚍ digram for lesser yin So ON
- U+268E ⚎ digram for lesser yang So ON
- U+268F ⚏ digram for greater yin So ON
Map markers
- U+2690 ⚐ white flag So ON
- U+2691 ⚑ black flag So ON
Warning signs
- U+26A0 ⚠ warning sign So ON
- U+26A1 ⚡ high voltage sign So ON
Enclosed CJK Letters and Months
Parenthesized Korean words
- U+321D ㈝ parenthesized korean character ojeon So ON
- U+321E ㈞ parenthesized korean character o hu So ON
Squared Latin abbreviation
- U+3250 ㉐ partnership sign So ON
Circled Korean words
- U+327C ㉼ circled korean character chamko So ON
- U+327D ㉽ circled korean character jueui So ON
Squared Latin abbreviations
- U+32CC ㋌ square hg So ON
- U+32CD ㋍ square erg So ON
- U+32CE ㋎ square ev So ON
- U+32CF ㋏ limited liability sign So ON
CJK Compatibility
Squared Latin abbreviations
- U+3377 ㍷ square dm So ON
- U+3378 ㍸ square dm squared So ON
- U+3379 ㍹ square dm cubed So ON
- U+337A ㍺ square iu So ON
Squared Latin abbreviations
- U+33DE ㏞ square v over m So ON
- U+33DF ㏟ square a over m So ON
Squared Latin abbreviation
- U+33FF ㏿ square gal So ON
Arabic Presentation Forms A
Symbol
- U+FDFD ﷽ Arabic ligature bismillah ar rahman ar raheem So ON
CJK Compatibility Forms
Glyphs for vertical variants
- U+FE47 ﹇ presentation form for vertical left square bracket Ps ON
- ref U+23B4 ⎴ top square bracket (Miscellaneous Technical)
- U+FE48 ﹈ presentation form for vertical right square bracket Pe ON
- ref U+23B5 ⎵ bottom square bracket (Miscellaneous Technical)
Deseret
Uppercase letters
- U+10426 𐐦 Deseret capital letter oi Lu L
- U+10427 𐐧 Deseret capital letter ew Lu L
Lowercase letters
- U+1044E 𐑎 Deseret small letter oi Ll L
- U+1044F 𐑏 Deseret small letter ew Ll L
Mathematical Alphanumeric Symbols
Script symbols
- U+1D4C1 𝓁 mathematical script small l Ll L
- ref U+2113 ℓ script small l (Letterlike Symbols)
Altered Characters
In addition, 18 property changes were made to existing characters in 4.0 (U+180E is listed twice, once for each changed property).
Latin-1 Supplement
- U+00AD soft hyphen had its General Category changed from Punctuation, Dash to Other, Format
Spacing Modifier Letters
- U+02B9 ʹ modifier letter prime had its General Category changed from Symbol, Modifier to Letter, Modifier
- U+02BA ʺ modifier letter double prime had its General Category changed from Symbol, Modifier to Letter, Modifier
- U+02C6 ˆ modifier letter circumflex accent had its General Category changed from Symbol, Modifier to Letter, Modifier
- U+02C7 ˇ caron had its General Category changed from Symbol, Modifier to Letter, Modifier
- U+02C8 ˈ modifier letter vertical line had its General Category changed from Symbol, Modifier to Letter, Modifier
- U+02C9 ˉ modifier letter macron had its General Category changed from Symbol, Modifier to Letter, Modifier
- U+02CA ˊ modifier letter acute accent had its General Category changed from Symbol, Modifier to Letter, Modifier
- U+02CB ˋ modifier letter grave accent had its General Category changed from Symbol, Modifier to Letter, Modifier
- U+02CC ˌ modifier letter low vertical line had its General Category changed from Symbol, Modifier to Letter, Modifier
- U+02CD ˍ modifier letter low macron had its General Category changed from Symbol, Modifier to Letter, Modifier
- U+02CE ˎ modifier letter low grave accent had its General Category changed from Symbol, Modifier to Letter, Modifier
- U+02CF ˏ modifier letter low acute accent had its General Category changed from Symbol, Modifier to Letter, Modifier
Kannada
- U+0CBF ಿ Kannada vowel sign i had its Bidirectional Category changed from Non Spacing Mark to Left To Right
- U+0CC6 ೆ Kannada vowel sign e had its Bidirectional Category changed from Non Spacing Mark to Left To Right
Khmer
- U+17B4 ឴ Khmer vowel inherent aq had its General Category changed from Mark, Spacing Combining to Other, Format
- U+17B5 ឵ Khmer vowel inherent aa had its General Category changed from Mark, Spacing Combining to Other, Format
Mongolian
- U+180E Mongolian vowel separator had its General Category changed from Other, Format to Separator, Space
- U+180E Mongolian vowel separator had its Bidirectional Category changed from Boundary Neutral to Whitespace
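The property values listed above are those of Unicode 4.0. Later versions of the standard can change properties again, so a current runtime may not agree with every entry. The Python sketch below compares a few of the General Category values listed above with whatever the running interpreter reports. The two-letter abbreviations (Cf, Lm, Zs) are the standard short forms of the category names, and the names `CATEGORY_IN_4_0` and `compare_with_runtime` are hypothetical.

```python
import unicodedata

# General Category values after the 4.0 changes listed above, written with
# the standard abbreviations: Cf = Other, Format; Lm = Letter, Modifier;
# Zs = Separator, Space. Only a sample of the altered characters is included.
CATEGORY_IN_4_0 = {
    0x00AD: "Cf",  # soft hyphen
    0x02B9: "Lm",  # modifier letter prime
    0x17B4: "Cf",  # Khmer vowel inherent aq
    0x180E: "Zs",  # Mongolian vowel separator
}


def compare_with_runtime():
    """Return (code point, 4.0 category, runtime category, same?) rows."""
    rows = []
    for code_point, listed in sorted(CATEGORY_IN_4_0.items()):
        actual = unicodedata.category(chr(code_point))
        rows.append((code_point, listed, actual, listed == actual))
    return rows


if __name__ == "__main__":
    print("Runtime Unicode version:", unicodedata.unidata_version)
    for code_point, listed, actual, same in compare_with_runtime():
        note = "" if same else "  (differs from 4.0)"
        print("U+%04X  4.0: %s  runtime: %s%s" % (code_point, listed, actual, note))
```

Rows flagged as differing are not errors in the list above; they indicate only that the runtime's Unicode data is from a later version in which the property was revised.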
http://unicode.org