TEI

Dictionaries

Background

Brill publishes a great number of dictionaries, some as monographs, others as reference works. Some bear the word “dictionary” in their title, like The Brill Dictionary of Ancient Greek Online; others use words like “Lexicon” or “Concordance”; others are harder still to recognize. Many dictionaries can be found on http://dictionaries.brillonline.com/, some on http://chinesereferenceshelf.brillonline.com/, and others on Brill Online Reference Works or Brill.com.

No systematic approach to structuring (“tagging”) the content has been applied so far. These guidelines aim to change that, using the TEI standards and best practices as a reference point.

Guidelines

TEI discusses dictionaries in Chapter 9, “Dictionaries”. As always, TEI offers a wealth of options, so Brill chooses what is pertinent, based on concrete cases. TEI should be taken not as a tag set but as a model for structuring content. It helps us analyze content, and that analysis is the basis for any structuring, in XML or in any other medium.

Explanation and examples

The structure may vary per dictionary. Let’s leave aside front and back matter and subdivisions for the moment and focus on the entry. (TEI often supposes that the electronic version wants to preserve the print features. Online, however, this does not always make sense; there is also a difference between making an electronic facsimile of a dictionary and transforming a dictionary into a digital publication environment.) The semantics of the entry may, again, differ per dictionary. For example, some dictionaries may provide distinct entries for homographs, while others may combine them. Typically, however, an entry contains a headword (i.e. some morphological form of the lexical item described), is sorted in some meaningful order (alphabetical or otherwise), and is divided into senses (or subsenses). This gives us the following basic structure:

<entry>
    <form>
        <orth></orth>
    </form>
    <hom n="1">
        <sense n="1"></sense>
        <sense n="2"></sense>
        <!--  etc. etc. -->
    </hom>
</entry>
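
To make the skeleton concrete, here is a sketch of an entry that combines two homographs under one headword. The headword and definitions are invented for illustration and are not taken from any Brill dictionary; the placement of <def> inside <sense> is an assumption that anticipates the fuller structure discussed further on:

<entry>
    <form>
        <orth>bank</orth>
    </form>
    <!-- first homograph: two senses -->
    <hom n="1">
        <sense n="1"><def>land alongside a river or lake</def></sense>
        <sense n="2"><def>a raised ridge of earth</def></sense>
    </hom>
    <!-- second homograph: unrelated word with the same spelling -->
    <hom n="2">
        <sense n="1"><def>an institution that holds and lends money</def></sense>
    </hom>
</entry>

A dictionary that provides distinct entries for homographs would instead give each <hom> above its own <entry>, each with its own <form>.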

For the sake of consistency, it might be prudent to use <hom> even if there is only one homograph. Let’s now take a closer look at the entry structure. Entries can contain different bits of information about a lexical entity, called “top-level constituents” in TEI terminology. These include:

This gives us the following more complex structure:

<entry>
    <hom>
        <form>
            <orth></orth>
            <gramGrp>
                <pos></pos>
            </gramGrp>
        </form>
        <sense>
            <def></def>
            <cit>
                <quote></quote>
            </cit>
            <usg></usg>
            <xr></xr>
            <etym></etym>
            <note></note>
            <re></re>
        </sense>
    </hom>
</entry>

Of course, this is an artificial example; in reality, there will be more information, and the sequence of tags will be different.
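
As a sketch of how such a structure might look once filled in, consider the following entry. All content is invented for illustration (headword, definition, quotation, etymology, related entry), the cross-reference target #shore is hypothetical, and the values of the type attributes follow common TEI usage but are assumptions, not established Brill conventions:

<entry>
    <hom n="1">
        <form>
            <orth>bank</orth>
            <gramGrp>
                <pos>noun</pos>
            </gramGrp>
        </form>
        <sense n="1">
            <!-- usage label: subject domain (assumed type value) -->
            <usg type="dom">geography</usg>
            <def>land alongside a river or lake</def>
            <cit type="example">
                <quote>They walked along the bank of the river.</quote>
            </cit>
            <!-- cross-reference to another entry; the target is hypothetical -->
            <xr type="syn">See also <ref target="#shore">shore</ref></xr>
            <!-- illustrative wording only -->
            <etym>Middle English, probably of Scandinavian origin</etym>
            <note>Often used in the plural.</note>
            <!-- related entry, e.g. a compound listed under the headword -->
            <re>
                <form>
                    <orth>riverbank</orth>
                </form>
            </re>
        </sense>
    </hom>
</entry>

Note that <usg> is placed before <def> here, which is one way the sequence of tags may differ from the schematic example.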

Applying CTS, CITE, and CapiTainS

There is no point in repeating the Brill CTS Guidelines here. Instead, we give a summary:

Entries belong to the physical structure of a dictionary. They are objects in a collection. For purposes of referencing, they need to receive a CITE URN. There can also be other identifiers:

Senses belong to the logical structure of the dictionary. They can be connected to texts and corpora, as annotations. For such purposes, they need to receive a CITE URN. So, a sense has a

Likewise, a subsense has

Citations receive a CTS or CITE URN. To be precise:

For example:

<ref target="urn:cts:brill.bra00010.brw00001.296.11">
    <cit type="example">
        <quote xml:lang="grc-Grek">ἐπεὶ οὖν τῷ ἀληϑῶς ἀγαϑῷ πρὸς τὸ μή άληϑῶς ἀγαϑόν έστιν ἡ ἀ., ἀμεσος δὲ τῶν δύο τούτων ἡ έναντίωσις</quote>
        <bibl>34,34,9</bibl>
    </cit>
</ref>