I needed to typeset some text including fragments in Arabic, Hebrew and Ethiopic. There is good support for Arabic and Hebrew, but it took some effort to get a working Ethiopic setup.
I installed it in /usr/X11R6/lib/X11/fonts/truetype/ and did
# ln -s /usr/X11R6/lib/X11/fonts/truetype/CODE2000.TTF /usr/share/texmf/fonts/truetype/CODE2000.ttf # cd /usr/share/texmf/fonts # ttf2tfm truetype/CODE2000.ttf tfm/code2k@Unicode@ # echo 'code2k@Unicode@ CODE2000.ttf' >> /etc/ttf2pk/ttfonts.map # texhash
Other fonts are available, both commercial and free. May report later on some of those.
Not having a keyboard with Arabic / Hebrew / Ethiopic keys, it is easiest to type a transliteration of the required text, where the language support package transforms the transliteration into the correct symbols. Several systems for the transliteration of Ethiopic exist, each in several flavours. I chose SERA.
The file ethin.otp converts transliterated text:
% input: 8-bit input: 1; % output: 16-bit output: 2; expressions: % Sorted for decreasing length, then increasing Unicode % ``' `1' `0' `0' `0' `0' => "\CharLet{7C}"; ``' `s' `W' `a' => "\CharLET{27}"; ``' `1' `0' `0' => "\CharLet{7B}"; `l' `W' `a' => "\CharLET{0F}"; `H' `W' `a' => "\CharLET{17}"; `m' `W' `a' => "\CharLET{1F}"; ``' `s' `e' => "\CharLET{20}"; ... `P' => "\CharLet{35}"; `S' => "\CharLet{3D}"; `f' => "\CharLet{4D}"; `p' => "\CharLet{55}"; `:' => "\CharLet{61}"; `,' => "\CharLet{63}"; `;' => "\CharLet{64}"; . => \1;It is the composition of two transformations: eth2uni.otp that converts SERA to Unicode:
... ``' `1' `0' `0' `0' `0' => @"137C; ``' `s' `W' `a' => @"1227; ``' `1' `0' `0' => @"137B; ...and uni2font.otp that converts Unicode to font positions, via
@"1200 => "\CharLET{00}"; @"1201 => "\CharLET{01}"; ... @"12FF => "\CharLET{FF}"; @"1300 => "\CharLet{00}"; ... @"137C => "\CharLet{7C}";(Probably there is a more compact way to write this, but something like
@"1200-@"12FF => "\CharLET{" #((\1 div: @"10)-@"F0) #((\1 mod: @"10)+@"30) "}";is not good enough because one has to test for hex digits larger than 9. One could redefine the macros to work in decimal.)
Given these *.otp files, one converts them to *.ocp and installs:
# otp2ocp ethin.otp # cp ethin.otp /usr/share/texmf/omega/otp/oethiopic # cp ethin.ocp /usr/share/texmf/omega/ocp/oethiopic # texhash
It would be cleaner to use eth2uni.otp and uni2font.otp separately, but that works only on small pages, and lambda segfaults on big pages. Folding the two transformations into ethin.otp makes things work on the (still rather small) files I have tried so far.
The resulting mapping is shown in the two tables below.
\documentclass[a4paper]{article} \usepackage[english]{babel} \usepackage{omega} \usepackage[LET,Let,T1,OT1]{fontenc} \ocp\EthIn=ethin \ocplist\EthiopicOCP= \addbeforeocplist 1 \EthIn \nullocplist \def\CharLET#1{\fontencoding{LET}\selectfont\char"#1 } \def\CharLet#1{\fontencoding{Let}\selectfont\char"#1 } \newenvironment{ethiopic}{\fontencoding{LET}\fontfamily{c2000}\selectfont% \pushocplist\EthiopicOCP}% {\popocplist} \newcommand\geez[1]{\Large\begin{ethiopic}#1\end{ethiopic}} \begin{document} \geez{ fedel } \end{document}The resulting output is
Ethiopic uses a syllabary with more than 256 glyphs. This means that one needs two 256-char fonts to cover it. In other words, there will be font changes inside words, and it is most convenient to have a setup where each character says from what font it must be taken.
(That is the reason for the CharLET and CharLet macros above.)
Characters are numbered by the Unicode range U+1200 up to U+137C.
The Unicode book has in version 3.0 the wrong glyph for U+125C. In version 4.0 this is corrected.
Characters are named in the Unicode book by names like ETHIOPIC SYLLABLE HA.
Characters are often represented using the SERA System for representing Ethiopic in ASCII. This system uses transliterations of 1-6 bytes per symbol. See the tables above.
Syllables come in series of length 7 or 8 or 12.
The vowels of the syllables in a series of length 7 (or 8) are indicated by A, U, I, AA, EE, E, O (and WA) in the Unicode names. SERA uses in these cases e, u, i, a, E, -, o, Wa, respectively, where - stands for the empty string. When no written consonant precedes, that is, for U+12A5, the ETHIOPIC SYLLABLE GLOTTAL E, the SERA system uses I.
The vowels of the syllables in a series of length 12 are indicated by A, U, I, AA, EE, E, O, WA, WI, WAA, WEE, WE in the Unicode names. SERA uses in these cases e, u, i, a, E, -, o, We, Wi, Wa, WE, W, respectively.
(Thus: -Wa corresponds to WA in a series of 8, for example lWa is ETHIOPIC SYLLABLE LWA, but -Wa corresponds to WAA in a series of 12, for example kWa is ETHIOPIC SYLLABLE KWAA.)
There is a LaTeX package called ethiop. On my system this doesn't work. Indeed, the documentation that it comes with (the file ethiodoc.tex) is no longer accepted by current LaTeX/babel setup. I don't know who is to blame. (One can get that file through LaTeX by enclosing all fragments \selectlanguage{ethiop}...\selectlanguage{english} in braces, and turning {\eth :} into {\eth :{}}.)
Bug in lambda: With two tables in the same file it crashes: Segmentation fault.