The Thai layout in Unicode is based on the Thai Industrial Standard 620-2529 and its updated version 620-2533.
In common with Indic scripts, each Thai letter is a consonant possessing an inherent vowel sound. Thai letters further feature inherent tones. The inherent vowel and tone can be modified with vowel signs and tone marks. Most Thai vowel signs are rendered by full letter sized in-line glyphs placed either before, after or around the glyph for the base consonant. When the vowel's glyph is before the consonant, it is encoded as a separate character before the consonant. This differs from all other Indic scripts, but is necessary to comply with the Thai Industrial Standard.
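The visual-order encoding of preposed vowels is easy to see by listing the code points of a short word. The following is a minimal Python sketch, not anything taken from the standard: the sample word แมว ("cat") and the helper names `starts_with_preposed_vowel` and `describe` are chosen purely for illustration.

```python
import unicodedata

# The five Thai vowel signs that are written, and therefore encoded,
# before the consonant they belong to: U+0E40..U+0E44.
PREPOSED_VOWELS = {chr(cp) for cp in range(0x0E40, 0x0E45)}


def starts_with_preposed_vowel(word):
    """Return True if the stored string begins with a preposed vowel sign."""
    return bool(word) and word[0] in PREPOSED_VOWELS


def describe(word):
    """List each character of the string as 'U+XXXX NAME', in stored order."""
    return ["U+%04X %s" % (ord(ch), unicodedata.name(ch)) for ch in word]


# Sample word (an assumption for illustration): SARA AE + MO MA + WO WAEN.
word = "\u0e41\u0e21\u0e27"

for line in describe(word):
    print(line)
print(starts_with_preposed_vowel(word))
```

The vowel sign comes first in the stored string even though it is pronounced after the consonant, which is the behaviour described above.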
There are several punctuation marks particular to Thai:
U+0E4F ๏ Thai character fongman is the Thai bullet, used to mark items in lists or appearing at the beginning of a verse, sentence, paragraph or other textual segment.
U+0E46 ๆ Thai character maiyamok is used to mark repetition of preceding letters.
U+0E2F ฯ Thai character paiyannoi is used to indicate elision or abbreviation of letters. It is also used as a regular letter, such as in the Thai name for Bangkok. Paiyannoi is also used in combination (U+0E2F U+0E25 U+0E2F) to create a construct called paiyanyai which means et cetera and is comparable to U+17D8 ៘ Khmer sign beyyal.
U+0E5A ๚ Thai character angkhankhu is used to mark the end of a long segment of text. It can be followed by U+0E30 ะ Thai character sara a to mark even longer segments of text, such as at the end of a verse in poetry.
U+0E5B ๛ Thai character khomut marks the end of a chapter or document, where it always follows the angkhankhu + sara a combination.
The angkhankhu + sara a combination is closely related to U+17D4 ។ Khmer sign khan and U+17D5 ៕ Khmer sign bariyoosan which are themselves ultimately related to the Devanagari characters U+0964 । Devanagari danda and U+0965 ॥ Devanagari double danda.
Thai words are not separated by spaces, but spaces are introduced where Western typography might use a comma or period. To mark a word boundary (e.g. for line breaking) use U+200B zero width space.
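Two of the constructs above, the paiyanyai sequence and the invisible word boundary, can be built directly from their code points. This is a small Python sketch under stated assumptions: the three sample words and the helper names `join_with_zwsp` and `strip_zwsp` are invented for illustration.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE

# Paiyanyai ("et cetera") is the sequence paiyannoi + lo ling + paiyannoi.
PAIYANNOI = "\u0e2f"
LO_LING = "\u0e25"
PAIYANYAI = PAIYANNOI + LO_LING + PAIYANNOI


def join_with_zwsp(words):
    """Concatenate words with an invisible break opportunity between them."""
    return ZWSP.join(words)


def strip_zwsp(text):
    """Remove the invisible boundaries again, e.g. before comparing strings."""
    return text.replace(ZWSP, "")


# Three sample words (assumptions for illustration), given as code points:
# CHO CHING + MAI HAN-AKAT + NO NU, RO RUA + MAI HAN-AKAT + KO KAI,
# SARA AE + MO MA + WO WAEN.
words = ["\u0e09\u0e31\u0e19", "\u0e23\u0e31\u0e01", "\u0e41\u0e21\u0e27"]
marked = join_with_zwsp(words)

print(PAIYANYAI)
print(len("".join(words)), len(marked))
```

The marked string looks identical to the unmarked one on screen; only its length and its line-breaking behaviour differ.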
All the characters in this code block were added in Unicode 1.1
Number of characters in each General Category:
Letter, Modifier (Lm): 1
Letter, Other (Lo): 56
Mark, Non-Spacing (Mn): 16
Number, Decimal Digit (Nd): 10
Punctuation, Other (Po): 3
Symbol, Currency (Sc): 1
Number of characters in each Bidirectional Category:
Left To Right (L): 70
European Number Terminator (ET): 1
Non Spacing Mark (NSM): 16
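The tallies above can be checked against the Unicode database that ships with Python. This is a sketch, assuming the block spans U+0E00..U+0E7F and that unassigned code points (category Cn) should be skipped; the exact numbers depend on the Unicode version of the interpreter.

```python
import unicodedata
from collections import Counter

# The Thai block occupies U+0E00..U+0E7F; not every position is assigned.
THAI_BLOCK = range(0x0E00, 0x0E80)


def assigned(cp):
    """True if the code point is assigned in this Python's Unicode database."""
    return unicodedata.category(chr(cp)) != "Cn"


thai_chars = [chr(cp) for cp in THAI_BLOCK if assigned(cp)]

# Tally General Category and Bidirectional Category for each character.
general = Counter(unicodedata.category(ch) for ch in thai_chars)
bidi = Counter(unicodedata.bidirectional(ch) for ch in thai_chars)

print(len(thai_chars))
print(sorted(general.items()))
print(sorted(bidi.items()))
```

If the interpreter's database agrees with the figures listed above, both counters should add up to the same total number of characters.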
Code chart: Thai, based on TIS 620-2533. Chart sections: Sign; Vowels; Currency symbol; Vowel; Tone marks; Signs; Digits.
http://unicode.org Some prose may have been lifted verbatim from unicode.org, as is permitted by their terms of use at http://www.unicode.org/copyright.html
While the script began as a directly phonetic representation of Thai, sound changes have caused the script to become somewhat more complicated. Several qualities of Thai consonants, such as pre-aspiration and pre-glottalization have disappeared. Others, such as aspirated/unaspirated distinction and voiced/unvoiced distinction, became more limited. An example of this in English would be if the sounds represented by 'k' and hard 'g' merged to just 'k', yet both letters were still used. Concurrently, tonal distinctions became more pronounced. The result was that there soon grew to be an abundance of unneeded consonantal signs and not enough tonal signs. Thus, consonants representing the same sounds were divided into three groups, with each group corresponding to a certain class of tones. These groups are called kla:ng, sû:ng, and tàm (mid, high, and low, respectively).
With modern standardization, Thai has moved further away from phonemic correspondence with the spoken language. Like English (and this is the main reason why English spelling is so absurdly complicated), many Thai spellings contain etymological information that has nothing to do with the pronunciation of the word. Extra unpronounced characters are retained to indicate that a word originated from Sanskrit, much as the spelling of 'night' indicates that the word is of Germanic origin even though 'gh' is certainly not pronounced. The situation becomes especially complicated with final stop consonants. A native Thai word can only end in -p, -t, or -k for a stop, yet there are sixteen different individual letters for representing those three sounds, twenty-seven letters that can transform into one of those sounds in final position, and a proliferation of silent etymological letters.
Handwritten Thai sometimes makes use of small 'heads', which are written first, similar to those present in other Brahmic writing systems such as Devanagari, Kannada, and Oriya.
I speak English, Finnish, Japanese, French and Spanish with various degrees of fluency, and I'm working on adding Mandarin to this list. My Russian's getting a little rusty but I can still decipher Cyrillic. Without any formal lessons I've picked up survival-level German, Swedish and Malay. The Hebrew and Arabic alphabets I learned just for yucks, although I can't really speak either language. And I can recite a few poems in Slovene by heart, quote subway announcements in Czech, and puzzle out Malti orthography.
So what do I think is the most difficult language I've encountered so far? No doubt about it: Thai.
Second, you have a slew of consonants to deal with. Every guide to the language proclaims that there are 44 of the little buggers, but it's not quite that bad: yes, there are 44 letters (more on that later), but only around 20 distinct sounds. Thai consonants have one distinction difficult for the English speaker, namely that between aspirated (with-a-puff-of-air) and unaspirated (without-a-puff) consonants. The aspirated ones match those normally used in English, and are usually transcribed as ph, kh, and th despite being pronounced "p", "k" and "t"; the unaspirated ones are found in English combinations like "spat" and "skip", and are usually transcribed p, k, t despite being pronounced something akin to "bp", "g" (hard), and "dt" respectively.
Still there? Then we have the vowels, and (if we count diphthongs) there are no fewer than 28 of them. Actually, English does feature most of them, but the average native speaker has never been taught to distinguish the front 'a' of man (ae in Thai) from the back 'a' of car (aa in Thai), and in Thai vowel length is also important. And then you have just weird sounds like the oei of kàthoei (transvestite), helpfully described as "as the u sound in hut, only more closed, plus i" by one of my guidebooks.
First of all, there are 44 consonants, 32 vowel signs and 4 tone marks to learn. Vowel signs are scattered before, above and after consonants; often several are required for a single vowel sound. Conversely, if the sound is a long O, no sign at all is needed. Most consonant sounds have multiple letters; choosing the one to use depends on the etymology of the word and the tone of the syllable. For example, to write "tîo" (เตี่ยว) you write เ (E) + ต (T) + ย (EI) + ว (W), then slap a bar with a notch ี on top of the T to indicate there's a long I sound too, and finish with a little dot ่ to note the tone. And don't forget to choose the right one from the 8 different letters all pronounced "T" (2 unaspirated, 6 aspirated).
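For the curious, the pile-up described above can be taken apart in code. Here is a rough Python sketch, assuming the word is stored in the usual order (leading vowel, consonant, upper vowel, tone mark, then the rest); the helper name `split_marks` is made up for this example.

```python
import unicodedata

# The example word as six code points, in stored order (assumed):
# SARA E, TO TAO, SARA II, MAI EK, YO YAK, WO WAEN.
WORD = "\u0e40\u0e15\u0e35\u0e48\u0e22\u0e27"


def split_marks(word):
    """Separate full-size letters from the marks stacked above or below them.

    Returns (spacing, marks); each entry is a Unicode character name.
    Category 'Mn' (non-spacing mark) covers the stacked vowel and tone signs.
    """
    spacing, marks = [], []
    for ch in word:
        name = unicodedata.name(ch)
        if unicodedata.category(ch) == "Mn":
            marks.append(name)
        else:
            spacing.append(name)
    return spacing, marks


spacing, marks = split_marks(WORD)
print(len(WORD), "code points")
print("in-line letters:", spacing)
print("stacked marks:", marks)
```

Four of the six characters take up their own space on the line; the long vowel sign and the tone mark both sit on top of the T.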
But this was at least phonetic. Many Thai words, especially those imported from Pali, retain archaic spellings that no longer correspond to their pronunciation. And then the cruelest blow of all: Thai does not use spaces between words. A sentence or name will be an uninterrupted flow of Thai characters; figuring out where one word ends and the next begins is left to the reader.
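To get a feel for what the reader has to do, here is a toy word splitter in Python. It is only a sketch: it uses greedy longest-match lookup against a three-word lexicon invented for this example (cat, eat, fish), whereas real segmenters rely on large dictionaries or statistical models. The names `segment` and `LEXICON` are hypothetical.

```python
# A tiny made-up lexicon, given as code points:
# "cat" (SARA AE, MO MA, WO WAEN), "eat" (KO KAI, SARA I, NO NU),
# "fish" (PO PLA, LO LING, SARA AA).
LEXICON = {
    "\u0e41\u0e21\u0e27",
    "\u0e01\u0e34\u0e19",
    "\u0e1b\u0e25\u0e32",
}


def segment(text, lexicon=LEXICON):
    """Split text greedily, always taking the longest lexicon word that fits.

    Characters that start no known word are emitted one at a time.
    """
    longest = max(len(word) for word in lexicon)
    pieces = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(longest, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in lexicon:
                pieces.append(candidate)
                i += size
                break
        else:
            # No lexicon word starts here: fall back to a single character.
            pieces.append(text[i])
            i += 1
    return pieces


sentence = "\u0e41\u0e21\u0e27\u0e01\u0e34\u0e19\u0e1b\u0e25\u0e32"
print(" | ".join(segment(sentence)))
```

Greedy matching goes wrong whenever a long dictionary word happens to swallow the start of the next one, which is exactly the kind of ambiguity a human reader resolves from context.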
1 nèung
2 sãwng
3 sãam (saam)
4 sìi (say)
5 hâa
6 hók (lok)
7 jét (chat)
8 páet (bart)
9 kâo (gow)
10 sìp (sup)
11 sìp-ét
12 sìp-sãwng
20 yîi-sìp
21 yîi-sìp-ét
30 sãam-sìp
100 rawy
1000 phan
References
Lonely Planet Thai phrasebook
3 months in Bangkok