To display an HTML page correctly, a web browser must know which character set (character encoding) the page uses.
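The following Python sketch shows why the declared character set matters. It is an illustration only, not how a browser is implemented: the same bytes are decoded once with the correct character set and once with the wrong one.

```python
# The word "café" stored as UTF-8 bytes: "é" becomes the two bytes 0xC3 0xA9.
data = "café".encode("utf-8")

# Decoding with the correct character set restores the original text.
correct = data.decode("utf-8")

# Decoding the same bytes as ISO-8859-1 maps each byte to its own character,
# so the two-byte "é" turns into two unrelated characters ("Ã©").
wrong = data.decode("iso-8859-1")

print(correct)
print(wrong)
```

The second result is the garbled text a reader sees when a page's actual encoding and the encoding the browser assumes do not match.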

The following table lists the 128 ASCII characters and their equivalent HTML entity codes.
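As a minimal sketch (in Python, not part of the original page), the decimal numeric entity `&#N;` for each of the 128 ASCII code points can be generated directly, with a named entity looked up from Python's `html.entities.codepoint2name` where one exists. The helper `ascii_entity_table` is a hypothetical name chosen for this example. Codes 0 to 31 and 127 are non-printing control characters.

```python
from html.entities import codepoint2name

def ascii_entity_table():
    """Hypothetical helper: (code, character, numeric entity, named entity or None) for ASCII 0-127."""
    rows = []
    for code in range(128):
        char = chr(code)
        numeric = f"&#{code};"            # decimal numeric character reference
        named = codepoint2name.get(code)  # e.g. "amp" for "&"; None if no name exists
        rows.append((code, char, numeric, f"&{named};" if named else None))
    return rows

table = ascii_entity_table()

# Print a few printable rows (codes 0-31 and 127 are control characters).
for code, char, numeric, named in table[32:40]:
    print(code, repr(char), numeric, named or "")
```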

ISO Character Sets

The following table lists the ISO character sets used in different parts of the world:

Character Set Description
ISO-8859-1 Latin alphabet part 1: North America, Western Europe, Latin America, the Caribbean, Africa
ISO-8859-2 Latin alphabet part 2: Central and Eastern Europe
ISO-8859-3 Latin alphabet part 3: Southeastern Europe, Esperanto, and miscellaneous others
ISO-8859-4 Latin alphabet part 4: Scandinavia and the Baltic states (and other languages not covered by ISO-8859-1)
ISO-8859-5 Latin/Cyrillic part 5: Languages that use the Cyrillic alphabet, such as Bulgarian, Belarusian, Russian, and Macedonian
ISO-8859-6 Latin/Arabic part 6: Languages that use the Arabic alphabet
ISO-8859-7 Latin/Greek part 7: Modern Greek, as well as mathematical symbols derived from Greek
ISO-8859-8 Latin/Hebrew part 8: Languages that use the Hebrew alphabet
ISO-8859-9 Latin 5 part 9: Turkish. Same as ISO-8859-1, except that Turkish characters replace Icelandic ones
ISO-8859-10 Latin 6 (Lappish, Nordic, Eskimo) part 10: The Nordic languages
ISO-8859-15 Latin 9 (also called Latin 0): Similar to ISO-8859-1, but replaces some less common symbols with the euro sign and other missing characters
ISO-2022-JP Japanese: A 7-bit encoding for the Japanese language
ISO-2022-JP-2 Japanese, extended: A multilingual extension of ISO-2022-JP that adds Chinese, Korean, and additional European characters
ISO-2022-KR Korean: A 7-bit encoding for the Korean language
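A short Python sketch, using only the standard codecs, illustrates a few rows of the table above: the euro sign exists in ISO-8859-15 but not in ISO-8859-1, each ISO-8859 set uses one byte per character, and ISO-2022-JP stays within 7-bit bytes by switching character sets with escape sequences.

```python
# The euro sign exists in ISO-8859-15 (byte 0xA4) but not in ISO-8859-1.
euro_15 = "€".encode("iso-8859-15")

try:
    "€".encode("iso-8859-1")
    euro_1_ok = True
except UnicodeEncodeError:
    euro_1_ok = False  # ISO-8859-1 has no euro sign

# In ISO-8859-1, the same byte 0xA4 is the generic currency sign "¤".
same_byte_latin1 = b"\xa4".decode("iso-8859-1")

# ISO-8859 sets are single-byte: six Cyrillic letters take six bytes in ISO-8859-5.
privet = "привет".encode("iso-8859-5")

# ISO-2022-JP uses escape sequences (starting with byte 0x1B) and only 7-bit bytes.
jp = "日本".encode("iso2022_jp")

print(euro_15, euro_1_ok, same_byte_latin1, len(privet), jp)
```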

Unicode Standard

Unicode can be implemented by different character encodings. The most commonly used encodings are UTF-8 and UTF-16:

Encoding Description
UTF-8 A character in UTF-8 can be from 1 to 4 bytes long. UTF-8 can represent any character in the Unicode standard and is backward compatible with ASCII. UTF-8 is the preferred encoding for e-mail and web pages
UTF-16 16-bit Unicode Transformation Format: a variable-length encoding that stores each character as one or two 16-bit code units (2 or 4 bytes) and can encode the entire Unicode repertoire. UTF-16 is used in major operating systems and environments, such as Microsoft Windows 2000/XP/2003/Vista/CE and the Java and .NET platforms
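The byte lengths described above can be checked with a small Python sketch. It encodes four sample characters and records how many bytes each takes; `utf-16-be` is used so that no byte-order mark is added to the count.

```python
# U+0041 "A", U+00E9 "é", U+20AC "€", and U+1F600 (an emoji outside the 16-bit range)
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

# UTF-8 uses 1 to 4 bytes per character, depending on the code point.
utf8_lengths = [len(ch.encode("utf-8")) for ch in samples]

# UTF-16 uses one 16-bit unit (2 bytes) or a surrogate pair (4 bytes).
utf16_lengths = [len(ch.encode("utf-16-be")) for ch in samples]

# Backward compatibility: ASCII text has identical bytes in ASCII and UTF-8.
ascii_same = "Hello".encode("ascii") == "Hello".encode("utf-8")

print(utf8_lengths, utf16_lengths, ascii_same)
```

Only characters beyond U+FFFF, such as the emoji, need a 4-byte surrogate pair in UTF-16, while UTF-8 grows gradually from 1 to 4 bytes.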


