WORKING DRAFT
  • Preface

  • 1 Introduction

    Coverage • Design Goals • Text Handling

  • 2 General Structure

    Architectural Context • Unicode Design Principles • Compatibility Characters • Code Points and Characters • Encoding Forms • Encoding Schemes • Unicode Strings • Unicode Allocation • Details of Allocation • Writing Direction • Combining Characters • Equivalent Sequences • Special Characters • Conforming to the Unicode Standard

  • 3 Conformance

    Versions of the Unicode Standard • Conformance Requirements • Semantics • Characters and Encoding • Properties • Combination • Decomposition • Surrogates • Unicode Encoding Forms • Unicode Encoding Schemes • Normalization Forms • Conjoining Jamo Behavior • Default Case Algorithms

  • 4 Character Properties

    Unicode Character Database • Case • Combining Classes • Directionality • General Category • Numeric Value • Bidi Mirrored • Name • Unicode 1.0 Names • Letters, Alphabetic, and Ideographic • Properties for Text Boundaries • Characters with Unusual Properties • Characters and Sequences That Should Not Be Emitted

  • 5 Implementation Guidelines

    Data Structures for Character Conversion • Programming Languages and Data Types • Unknown and Missing Characters • Handling Surrogate Pairs in UTF-16 • Handling Numbers • Normalization • Compression • Newline Guidelines • Regular Expressions • Language Information in Plain Text • Editing and Selection • Strategies for Handling Nonspacing Marks • Rendering Nonspacing Marks • Locating Text Element Boundaries • Identifiers • Sorting and Searching • Binary Order • Case Mappings • Mapping Compatibility Variants • Unicode Security • Ignoring Characters in Processing • U+FFFD Substitution in Conversion

  • 6 Writing Systems and Punctuation

    Writing Systems • General Punctuation

  • 7 Europe-I

    Latin • Greek • Coptic • Cyrillic • Glagolitic • Armenian • Georgian • Modifier Letters • Combining Marks

  • 8 Europe-II

    Linear A • Linear B • Cypriot Syllabary • Cypro-Minoan • Ancient Anatolian Alphabets • Old Italic • Runic • Old Hungarian • Gothic • Elbasan • Caucasian Albanian • Vithkuqi • Todhri • Old Permic • Ogham • Shavian • Sidetic

  • 9 Middle East-I

    Hebrew • Arabic • Syriac • Samaritan • Mandaic • Yezidi

  • 10 Middle East-II

    Old North Arabian • Old South Arabian • Phoenician • Imperial Aramaic • Manichaean • Pahlavi and Parthian • Avestan • Chorasmian • Elymaic • Nabataean • Palmyrene • Hatran

  • 11 Cuneiform and Hieroglyphs

    Sumero-Akkadian • Ugaritic • Old Persian • Egyptian Hieroglyphs • Meroitic • Anatolian Hieroglyphs

  • 12 South and Central Asia-I

    Devanagari • Bengali (Bangla) • Gurmukhi • Gujarati • Oriya (Odia) • Tamil • Telugu • Kannada • Malayalam

  • 13 South and Central Asia-II

    Thaana • Sinhala • Newa • Tibetan • Mongolian • Limbu • Meetei Mayek • Mro • Warang Citi • Ol Chiki • Ol Onal • Nag Mundari • Chakma • Lepcha • Saurashtra • Masaram Gondi • Gunjala Gondi • Wancho • Toto • Tangsa • Sunuwar • Gurung Khema • Kirat Rai • Tolong Siki

  • 14 South and Central Asia-III

    Brahmi • Kharoshthi • Bhaiksuki • Phags-pa • Marchen • Zanabazar Square • Soyombo • Old Turkic • Old Sogdian • Sogdian • Old Uyghur

  • 15 South and Central Asia-IV

    Syloti Nagri • Kaithi • Sharada • Takri • Siddham • Mahajani • Khojki • Dogra • Khudawadi • Multani • Tirhuta • Modi • Nandinagari • Grantha • Dives Akuru • Ahom • Sora Sompeng • Tulu-Tigalari

  • 16 Southeast Asia-I

    Thai • Lao • Myanmar • Khmer • Tai Le • New Tai Lue • Tai Tham • Tai Viet • Kayah Li • Cham • Pahawh Hmong • Nyiakeng Puachue Hmong • Pau Cin Hau • Hanifi Rohingya • Tai Yo

  • 17 Southeast Asia-II

    Philippine Scripts: Tagalog, Hanunóo, Buhid, and Tagbanwa • Buginese • Balinese • Javanese • Rejang • Batak • Sundanese • Makasar • Kawi

  • 18 East Asia

    Han • Ideographic Description Characters • Bopomofo • Hiragana and Katakana • Halfwidth and Fullwidth Forms • Hangul • Yi • Nüshu • Lisu • Miao • Tangut • Khitan Small Script

  • 19 Africa

    Ethiopic • Osmanya • Tifinagh • N’Ko • Vai • Bamum • Bassa Vah • Mende Kikakui • Adlam • Medefaidrin • Garay • Beria Erfe

  • 20 Americas

    Cherokee • Canadian Aboriginal Syllabics • Osage • Deseret

  • 21 Notational Systems

    Braille • Western Musical Symbols • Byzantine Musical Symbols • Znamenny Musical Notation • Ancient Greek Musical Notation • Duployan • Sutton SignWriting

  • 22 Symbols

    Currency Symbols • Letterlike Symbols • Numerals • Superscript and Subscript Symbols • Mathematical Symbols • Invisible Mathematical Operators • Technical Symbols • Geometrical Symbols • Miscellaneous Symbols • Enclosed and Square

  • 23 Special Areas and Format Characters

    Control Codes • Layout Controls • Deprecated Format Characters • Variation Selectors • Private-Use Characters • Surrogates Area • Noncharacters • Specials • Tag Characters

  • 24 About the Code Charts

    Character Names List • CJK and Other Ideographs • Hangul Syllables

  • A Notational Conventions

    Typographic Conventions • Extended BNF • Rendering

  • B Unicode Publications and Resources

    The Unicode Consortium • Unicode Publications • Other Unicode Online Resources

  • C Relationship to ISO/IEC 10646

    History • Encoding Forms in ISO/IEC 10646 • UTF-8 and UTF-16 • Synchronization of the Standards • Identification of Features for Unicode • Character Names • Character Functional Specifications

  • D Version History of the Standard

  • E Han Unification History

    Development of the URO • Continuing Research on Ideographs • CJK Sources

  • F Documentation of CJK Strokes

Quality assurance

Link Checker:

  • /pdf/

Nu Html Checker:

  • /preface/
  • /chapter-1/
  • /chapter-2/
  • /chapter-3/
  • /chapter-4/
  • /chapter-5/
  • /chapter-6/
  • /chapter-7/
  • /chapter-8/
  • /chapter-9/
  • /chapter-10/
  • /chapter-11/
  • /chapter-12/
  • /chapter-13/
  • /chapter-14/
  • /chapter-15/
  • /chapter-16/
  • /chapter-17/
  • /chapter-18/
  • /chapter-19/
  • /chapter-20/
  • /chapter-21/
  • /chapter-22/
  • /chapter-23/
  • /chapter-24/
  • /appendix-a/
  • /appendix-b/
  • /appendix-c/
  • /appendix-d/
  • /appendix-e/
  • /appendix-f/

The Unicode Standard, Version 18.0 (Working Draft)

© 1991-2025 Unicode, Inc. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc. in the U.S. and other countries. See Terms of Use.