VAD

Unicode – Characters (Part 5)


An interactive scroll through all 137,374 Unicode characters recorded so far, which are organized into 285 blocks and currently cover 146 scripts from around the world (as of 2018), can be viewed on the BabelStone website. (The hexadecimal code of each character also runs through the scroll.)

Further Search Options

Some overviews give an approximate idea of where the different character blocks are located in Unicode, whether nearer the start or nearer the end of the list.

In the Unicode character table of “Sa•design” you can also browse for the hexadecimal codes of Unicode characters up to Unicode version 10.0.0 (2017) by scrolling through the table. If you click on the desired character, another field opens. Directly below the enlarged representation of the character, the hexadecimal code is shown:

Unicode (hexadecimal), unicode-table.com (Sa•design 2017). CC: AN 2018, BY-NC-SA.

In the table of “Sa•design” you can also find the characters you are looking for sorted into blocks of related characters belonging to specific writing systems. If you click on a letter within one of these blocks, the corresponding hexadecimal Unicode code appears. You can use this code to enter the character, e.g., into your own Word document. Alternatively, you can insert the character into your document via “copy and paste”.

Unicode (hexadecimal), with “copy and paste” option, unicode-table.com (Sa•design 2017). CC: AN 2018, BY-NC-SA.

With the software “BabelMap” you can browse through the Unicode character repertoire. For each character that you select with the mouse, the corresponding hexadecimal Unicode code is shown at the bottom left, below the font display:

Unicode (hexadecimal) in BabelMap Version 11.0.0.1, 2018. CC: AN 2018, BY-NC-SA.

The online tool “BabelMap Online (Unicode 11.0)” is a good alternative to the program version of BabelMap for looking up hexadecimal codes.

In ScriptSource (by SIL) you can also search for the hexadecimal codes. They are arranged there alphabetically, by language or by writing system. Alphabetical lists of diacritics based on the Latin script are also available on other websites.

In “Unicode Lookup” (2009) you can find the hexadecimal encodings in the third column, under “Hex”:

Unicode (hexadecimal) in Unicode Lookup (2009), unicodelookup.com. CC: AN 2018, BY-NC-SA.

On the website of InternationalPhoneticAlphabet.org (2017), you can also find the codes for the International Phonetic Alphabet (IPA) in hexadecimal form. The hexadecimal notation follows the respective character’s glyph, in the third column, under “hex”.
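The same kind of lookup can also be scripted. The following is a minimal sketch, assuming Python 3 and its built-in `unicodedata` module; the helper name `describe` is made up for this illustration, and which characters are known depends on the Unicode version bundled with the installed Python.

```python
import unicodedata


def describe(char: str) -> str:
    """Return the hexadecimal code and the official name of one character."""
    code_point = ord(char)                       # numeric code point
    name = unicodedata.name(char, "<no name>")   # official Unicode name
    return f"U+{code_point:04X} {name}"


# Look up a character that was copied from one of the tables.
print(describe("ሁ"))

# Or go the other way: find a character by its official Unicode name.
schwa = unicodedata.lookup("LATIN SMALL LETTER SCHWA")
print(describe(schwa))
```

This is only a complement to the tables above: it presupposes that you already have either the character itself or its exact Unicode name.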

Keyboard inputs in the Unicode Hexadecimal System (“Alt”+“X”)

Once you have found a character in one of these tables, the easiest way to insert it into your Word text is to type “U+” followed by the corresponding hexadecimal code. Place the cursor directly after the hexadecimal code, and then press the keys “Alt” and “X” simultaneously. When you release both keys, the desired character appears.

Example: Suppose you are looking for “hu”, the first character in the second line of the Unicode table for Amharic. The table gives its code as “1201”. Type “U+1201”, and then press the keys “Alt” and “X” simultaneously (in the German version, “Alt” and “C”). When you release the keys, the character ሁ appears.

The method works in both directions: you can switch back from the glyph to the code by pressing “Alt” and “X” (in the German version, “Alt” and “C”) once more.
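What the key combination does can be imitated in a few lines of code. This is a minimal sketch in Python 3, not a description of how Word implements it; the function names `code_to_char` and `char_to_code` are invented for this example.

```python
def code_to_char(notation: str) -> str:
    """Turn a code such as 'U+1201' (or plain '1201') into its character."""
    hex_digits = notation.strip().upper()
    if hex_digits.startswith("U+"):
        hex_digits = hex_digits[2:]          # drop the optional prefix
    return chr(int(hex_digits, 16))          # hexadecimal -> character


def char_to_code(char: str) -> str:
    """Turn a single character back into its 'U+XXXX' notation."""
    return f"U+{ord(char):04X}"              # at least four hex digits


glyph = code_to_char("U+1201")
print(glyph)                 # the Ethiopic syllable "hu"
print(char_to_code(glyph))   # back to the code
```

Applying the two functions one after the other returns the starting value, which mirrors the toggle behaviour described above.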

Note: The “U+” prefix is a notational convention that marks the hexadecimal digits following it as a Unicode code point. It should not be confused with the byte-order mark (BOM), which indicates, for 16- and 32-bit encodings, the order in which the bytes are stored. In applications based on UTF-8, the BOM can usually be omitted, since there is only one possible byte order anyway. Further information can be found in Unicode 11.0.0, pp. 40-41.
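The role of byte order can be made visible with a short experiment. The following sketch assumes Python 3 and uses only its standard codecs; it encodes the character ሁ (code point 1201) in several encoding forms and shows where a byte-order mark comes into play.

```python
import codecs

char = "\u1201"  # the Ethiopic syllable "hu" from the example above

# In UTF-16 the same code point can be stored in two byte orders.
big_endian = char.encode("utf-16-be")
little_endian = char.encode("utf-16-le")
print(big_endian.hex())      # 1201
print(little_endian.hex())   # 0112

# The generic "utf-16" codec prepends a byte-order mark (BOM) so that a
# reader can tell which of the two orders was used.
with_bom = char.encode("utf-16")
has_bom = with_bom[:2] in (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)
print(has_bom)               # True

# UTF-8 has only one possible byte order, so no BOM is needed.
utf8 = char.encode("utf-8")
print(utf8.hex())            # e18881
```

In all cases the code point itself stays the same; only its byte representation differs.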

“By convention, Unicode codepoints are represented in hexadecimal notation with a minimum of four digits and preceded with “U+”; so, for example, “U+0345”, “U+10345” and “U+20345”. Also by convention, any leading zeroes above four digits are suppressed; thus we would write “U+0456 CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I” but not “U+03456 !!unknown USV!!”.”
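The convention in this quotation can be expressed as a simple check. The sketch below assumes Python 3; the function name `is_conventional` and the regular expression are an illustration of the rule, not part of any standard library, and they additionally assume upper-case hexadecimal digits and the Unicode upper limit of 10FFFF.

```python
import re

# Four hex digits, or five/six digits without a leading zero,
# with six-digit values restricted to the range 100000-10FFFF.
CONVENTIONAL = re.compile(
    r"U\+(?:[0-9A-F]{4}|[1-9A-F][0-9A-F]{4}|10[0-9A-F]{4})"
)


def is_conventional(notation: str) -> bool:
    """Check whether a string follows the 'U+' notation convention."""
    return CONVENTIONAL.fullmatch(notation) is not None


for example in ["U+0345", "U+10345", "U+20345", "U+0456", "U+03456", "U+345"]:
    print(example, is_conventional(example))
```

The first four examples follow the convention. “U+03456” is rejected because of its superfluous leading zero, and “U+345” because it has fewer than four digits.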

(Constable, SIL 2001)

The input method “Alt” + “X” works mainly under Windows, in Word and in some other applications. For Linux and Mac, other input options exist.

While US applications usually continue to work with the input variant “Alt” + “X”, this variant no longer works in the German versions of Word from Word 7 onwards. There, the input method “Alt” + “C” has to be used instead. However, the “Alt” + “X” form of input is sometimes still used in dialog boxes.

Tip: You should always check whether you are currently working with an American or another (for example, German) keyboard layout, because each may produce different characters: “For example, on a German keyboard …, the key labelled as “z” produces the Unicode codepoint U+007A, thus (as expected) a “z”. If you change the key assignment to US-American, the same key creates the codepoint U+0079, i.e., a “y”.” (wiki.selfhtml.org)
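The effect described in the tip can be modelled in a toy example. The sketch below assumes Python 3; the dictionary `LAYOUTS` and the key label `"key_right_of_T"` are purely hypothetical and only stand for the one physical key on which the German and US layouts differ here.

```python
# Hypothetical model: one physical key, two keyboard layouts.
LAYOUTS = {
    "German": {"key_right_of_T": "z"},
    "US": {"key_right_of_T": "y"},
}


def code_point_for(layout: str, key: str) -> str:
    """Return the 'U+XXXX' code point a key produces under a given layout."""
    char = LAYOUTS[layout][key]
    return f"U+{ord(char):04X}"


for layout in LAYOUTS:
    print(layout, code_point_for(layout, "key_right_of_T"))
```

The same key thus yields U+007A under the German layout and U+0079 under the US layout, which is why it is worth checking the active layout before typing codes.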

Continue to Unicode Characters Part 6.
