Each Brill publication carries with it a considerable amount of information about the work itself, its source, its revisions, and (if applicable) its encodings and schemas. The purpose of this information – sometimes called metadata – is to make the work accessible and discoverable for scholars who use it, applications that process it, and librarians who catalogue it. To achieve this purpose, such information is declared in a formal way. In TEI, this is done in the TEI header.
See TEI Guidelines Chapter 2 The TEI Header.
Each TEI-conformant XML file consists of a metadata section, called the TEI header, and a section for one or more texts, the <text> element (which holds the <body>). The TEI header describes the encoded work and so documents the text itself, its source, its encoding, and its revisions.
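The TEI header is tagged as <teiHeader> and has five major parts:
<fileDesc>, the file description
<encodingDesc>, the encoding description
<profileDesc>, the profile description
<xenoData>, metadata in non-TEI formats
<revisionDesc>, the revision description
The following subsections discuss each of these parts.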
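As a rough orientation, the overall shape of the header might look like the skeleton below. This is only a sketch: the order follows the subsections of this chapter, the comments are illustrative, and the empty elements stand in for content that is explained further on (they are not meant to validate as they are).

```xml
<teiHeader>
  <!-- bibliographical description of the work; mandatory -->
  <fileDesc>
    <titleStmt><!-- title, author or editor, funder --></titleStmt>
    <publicationStmt><!-- publisher, date, identifiers, availability --></publicationStmt>
    <sourceDesc><!-- the work of which the entry forms a part --></sourceDesc>
  </fileDesc>
  <!-- rendering and functionality; mandatory in Brill XML -->
  <encodingDesc/>
  <!-- abstract, languages, keywords; optional -->
  <profileDesc/>
  <!-- metadata in non-TEI formats; optional -->
  <xenoData/>
  <!-- change log -->
  <revisionDesc/>
</teiHeader>
```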
Brill needs to record bibliographical data concerning its publications. The TEI element <fileDesc> in the <teiHeader> is the place to do so. This element is mandatory.
The name of the element <fileDesc> suggests a “file description”, but it is a description of the work manifested in the file rather than of the file itself. This is because the file, an XML file, can occur in multiple places in the production and publication workflow, and it is the work that interests its users, not the file. Note that <fileDesc> does not describe previous or other manifestations of the work. Within the <fileDesc> element, TEI has three large components: <titleStmt>, <publicationStmt>, and <sourceDesc>. Their use by Brill is explained below, starting with the title statement.
See TEI Guidelines Chapter 2.2 The File Description. As always, TEI offers a host of options. For reasons of simplicity and uniformity, Brill chooses a minimum number of generic elements based on concrete needs.
Brill uses <titleStmt> to store information about (1) the title of the publication in <title>; (2) the author or editor in <author> or <editor>; (3) the funder in <funder>. “Publication” in this context is usually an entry in a reference work. For the author, first name and last name are recorded in <forename> and <surname>, together with an identifier such as ORCID, VIAF or ISNI in @ref. The funder is currently not used, hence the default value “placeholder”, but it is expected to gain importance as Open Access becomes more widespread. Funder name and identifier, particularly the FundRef ID, will most likely be recorded then.
A @level attribute on <title> indicates that the publication is part of a greater whole: an entry in a reference work. TEI uses the value “a” (for “analytic”) in these cases. This value, and the encoding of bibliographic data in general, is covered in more detail in the section on bibliographies.
A further attribute, @sortKey, indicates the sort order of the title if it contains non-alphanumeric characters or if a special sequence is required. This information is not attached to <title> but to an additional (and otherwise empty) element, <idno>, which also carries a @type="sort" attribute.
The title statement may further contain a <respStmt> element, stating who else was involved in the publication, e.g. the person responsible for the translation or the digitization.
<titleStmt>
<title level="a">Philological and Historical Commentary on Ammianus Marcellinus XXII</title>
<title level="s">Ammianus Marcellinus Online</title>
<author ref="http://viaf.org/viaf/32019540">
<name type="person">
<forename>J.</forename>
<surname>den Boeft</surname>
</name>
<idno type="ORCID">https://orcid.org/0000-0001-6606-2405</idno>
<idno type="ISNI"/>
</author>
<funder ref="placeholder"/>
</titleStmt>
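The example above shows neither the sort key nor a responsibility statement. The two fragments below sketch how they might be tagged. The title, names and sort key value are invented for illustration, and where exactly the sort <idno> is placed within the header is not specified here, so it is shown on its own.

```xml
<!-- Fragment 1: an otherwise empty <idno> carrying the sort key
     for a title such as "Ælfric" (hypothetical title and key) -->
<idno type="sort" sortKey="Aelfric"/>

<!-- Fragment 2: a responsibility statement inside <titleStmt>,
     naming a (hypothetical) translator -->
<respStmt>
  <resp>translation</resp>
  <name type="person">
    <forename>Jane</forename>
    <surname>Doe</surname>
  </name>
</respStmt>
```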
In the publication statement, Brill records the following: (1) publisher – usually Brill, but could also be an imprint; (2) place of publication; (3) publication date; (4) digital object identifier (DOI) of the publication (i.e. entry); (5) product identifiers for access management systems – either the “sams-id”, or “publisher-id”, or both; (6) information about the conditions of use, particularly the availability, with the status values of “free”, “restricted” or “unknown” and the Creative Commons license.
<publicationStmt>
<publisher>BRILL</publisher>
<pubPlace>Leiden | Boston | Singapore</pubPlace>
<date>2018</date>
<idno type="DOI"/>
<idno type="sams-id"/>
<idno type="publisher-id">AMO</idno>
<idno type="ISSN"/>
<availability status="restricted">
<licence target="https://creativecommons.org/licenses/by-nc/4.0/">Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)</licence>
</availability>
</publicationStmt>
The source description describes the work of which the entry forms a part. It gives (1) the title; (2) the author or editors (again with first name, last name, and identifier); (3) publisher; (4) place of publication; (5) publication date; (6) work identifier like ISBN or ISSN. This information is placed in a <bibl> element in a <sourceDesc> element. More about <bibl> and bibliographic data can be found below.
<sourceDesc>
<bibl>
<title level="m">Philological and Historical Commentary on Ammianus Marcellinus XXII</title>
<author ref="http://viaf.org/viaf/32019540">
<name type="person">
<forename>J.</forename>
<surname>den Boeft</surname>
</name>
<idno type="ORCID"/>
<idno type="ISNI"/>
</author>
<publisher>Egbert Forsten</publisher>
<pubPlace>Groningen</pubPlace>
<date>1995</date>
<idno type="ISBN">90 6980 086 1</idno>
</bibl>
</sourceDesc>
[…] (<respStmt>).
[…] <affiliation> and <orgName> in <author> or <editor>.
<revisionDesc> (see also below) is used to record previous publication dates. (Remember, an XML file can serve as the basis for both print and online publications.) Such information might be more clearly stored in the source description.
[…] <titleStmt> and <sourceDesc>. For now, Brill has placed entry (“part”) information in the former – with the exception of the entry identifier, which is an attribute of the root element – and reference work (“whole”) information in the latter. Consider a more intuitive arrangement.
Brill wishes to record the required CSS stylesheets to render the XML files in an HTML environment, as well as various items related to font and type. The TEI element <encodingDesc> in the <teiHeader> is the place to do so. It is also the place for two more specialized features.
See the TEI Guidelines Chapter 2.3 The Encoding Description.
The <encodingDesc> element is the second major subdivision of the TEI header. In Brill XML, it is mandatory. In TEI, this element specifies the methods and editorial principles used when encoding (“XML-izing”) the document. (Elements with names like <projectDesc> or <editorialDecl> give a flavor of its intended use.) In Brill XML, however, it is used to convey information about rendering and functionality to subsequent parsers. There are four points of information.
In <encodingDesc>, a <styleDefDecl> element declares the required version of the Cascading Style Sheets: <styleDefDecl scheme="css" schemeVersion="3.0"/>
Renditions are declared in a <tagsDecl> element wrapped in <encodingDesc>, for example as follows: <tagsDecl><rendition xml:id="italic">font: italic;</rendition></tagsDecl>. Any element that carries the @rendition attribute with the value “#italic” will have its contents rendered in italic type by the stylesheet.
In <encodingDesc>, a <classDecl> element may declare classification schemes. Brill does not use this at the moment. (Instead, a classification such as the Brill-internal subjects is listed in <textClass> in <profileDesc>.) But doing so makes sense in terms of discoverability, transparency and interoperability, so it is good to know that it is possible. The actual tagging may look something like this: <taxonomy><category><catDesc><term>
In <encodingDesc>, there is a <refsDecl> element containing a <cRefPattern> which defines XPaths. This mechanism allows applications to take a reference and locate the relevant text part in the document. It acts as a bridge between a reference (with a formal syntax, as defined by CITE) and an XML file (likewise CITE-compliant).
<encodingDesc>
<styleDefDecl scheme="css" schemeVersion="3.0"/>
<tagsDecl>
<rendition xml:id="italic">font:italic;</rendition>
<rendition xml:id="bold">font:bold;</rendition>
<rendition xml:id="underline">font:underline;</rendition>
<rendition xml:id="subscript">font:subscript;</rendition>
<rendition xml:id="superscript">font:superscript;</rendition>
<rendition xml:id="smallcaps">font:smallcaps;</rendition>
</tagsDecl>
</encodingDesc>
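The reference mechanism is not shown in the example above. A minimal sketch of a <refsDecl> might look as follows. It assumes a hypothetical reference syntax of the form “22.3” (book, then section) and a document whose <div> elements carry matching @type and @n values. Neither assumption is taken from actual Brill files, and a CITE-compliant pattern may well look different.

```xml
<refsDecl>
  <!-- matchPattern: a regular expression that splits the reference into parts;
       replacementPattern: an XPath in which $1 and $2 stand for those parts.
       The div structure and the @type/@n values are assumptions. -->
  <cRefPattern matchPattern="(\d+)\.(\d+)"
               replacementPattern="#xpath(//div[@type='book'][@n='$1']/div[@type='section'][@n='$2'])"/>
</refsDecl>
```

With this declaration, an application that meets the reference “22.3” could resolve it to the third section of book 22.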
For purposes of discoverability, it is highly recommended to replace the idiosyncratic Brill subjects with subjects that follow an internationally accepted standard, such as the Library of Congress Subject Headings (LCSH). Declare the subjects in a <textClass> element.
A crosswalk from the idiosyncratic Brill subjects (used in MRWs) to LCSH subjects is found below. Note that no simple 1:1 crosswalk is possible, because LCSH constitutes a complex system of thought, with its own structure and choices.
Idiosyncratic Brill subject | LCSH subject | URI |
---|---|---|
African Studies | Africa | http://id.loc.gov/authorities/subjects/sh85001531 |
Biblical Studies | Bible–Study and teaching | http://id.loc.gov/authorities/subjects/sh85013718 |
Classical Studies | Classical languages | http://id.loc.gov/authorities/subjects/sh85026704 |
History | History | http://id.loc.gov/authorities/subjects/sh85061212 |
International Relations | International Relations | http://id.loc.gov/authorities/subjects/sh85067435 |
Jewish Studies | Jews–Study and teaching | http://id.loc.gov/authorities/subjects/sh85070451 |
Language and Linguistics | Linguistics | http://id.loc.gov/authorities/subjects/sh85077222 |
Law | Law | http://id.loc.gov/authorities/subjects/sh85075119 |
Middle East and Islamic Studies | Islam | http://id.loc.gov/authorities/subjects/sh85068390 |
Religious Studies | Religion | http://id.loc.gov/authorities/subjects/sh85112549 |
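Using two rows of the crosswalk, a declaration of LCSH subjects might be sketched as below. The @scheme value and the use of @ref on <term> to carry the URI are assumptions made for illustration. They are one possible way of tagging this, not established Brill practice.

```xml
<textClass>
  <!-- @scheme names the vocabulary; each @ref carries the URI from the crosswalk -->
  <keywords scheme="http://id.loc.gov/authorities/subjects">
    <term ref="http://id.loc.gov/authorities/subjects/sh85061212">History</term>
    <term ref="http://id.loc.gov/authorities/subjects/sh85075119">Law</term>
  </keywords>
</textClass>
```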
It is often useful to provide more information about a publication than mere bibliographical details. In particular, Brill wishes to record (1) abstract; (2) languages; and (3) keywords. In TEI XML, the optional <profileDesc> element is the place to do so.
The TEI Guidelines describe this element in Chapter 2.4 The Profile Description. As always, TEI offers many options – for example, the element may be used to describe the situation in which the document was produced, the participants, and their setting – but these Guidelines confine themselves to Brill’s present needs.
Nested in <profileDesc>, one or more <abstract> elements house the abstracts. Use the universal @xml:lang attribute to state the abstract’s language. Use <p> elements to store text blocks.
<abstract xml:lang="en">
<p>Originally founded to aid and defend pilgrims in the Holy Land, the crusading orders soon expanded their activities to the military defense of Christendom from its internal and external enemies.</p>
</abstract>
The <langUsage> element declares which languages are used in the document. (Distinguish this from the @xml:lang attribute on the <teiHeader>, which declares the language of the document’s metadata.) Nested within it sits the <language> element. This is an empty element that declares the language through the value of its @ident attribute. This attribute identifies the language in a formal way: it uses the international standard of BCP 47 codes: https://tools.ietf.org/html/bcp47.
If the language is indicated for particular parts of the document with the use of @xml:lang – in Brill publications, this is often the case – then the exact values of @ident must be used. More information about Brill’s use of these tags and BCP 47 can be found in a separate section of these guidelines.
<langUsage>
<language ident="en"/>
</langUsage>
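For a document in more than one language, the declaration and its use in the text might be sketched as follows. The second language and the sample sentence are invented for illustration. The point is only that the @xml:lang values in the text repeat the @ident values exactly.

```xml
<!-- in the header: one <language> per language used in the document -->
<langUsage>
  <language ident="en"/>
  <language ident="la"/>
</langUsage>

<!-- in the text: @xml:lang repeats the @ident values declared above -->
<p xml:lang="en">The phrase <foreign xml:lang="la">carpe diem</foreign> is Latin.</p>
```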
Brill uses two different kinds of keywords:
Brill subject classification of publication and hence the documents within it, e.g. “History” or “Classics”
Keywords classifying a document, usually assigned in the MRW CMSs. In a special case, such keywords are used to create search facets on the Brill publication platform.
Within the <textClass> element, the <keywords> element houses the keywords. Within it, each keyword is placed in a <term> element. Brill uses the @ana attribute on <keywords> to state the class of keywords; these are the search facets on the platform. The @scheme attribute on <keywords> identifies the scheme. For now, the default value is “brill:subject”.
<textClass>
<keywords scheme="brill:subject"><term>History</term></keywords>
</textClass>
Keyword classes (@ana):
<textClass>
<keywords ana="#classification_region">
<term>Asia</term>
</keywords>
<keywords ana="#classification_country">
<term>China, People's Republic of</term>
</keywords>
</textClass>
For reasons of discoverability and interoperability, it is strongly recommended to use a recognized international subject classification instead of the proprietary scheme in current use. See the Appendix on Subject terms for a proposal using the Library of Congress Subject Headings.
The present solution of storing keyword categories in the @ana attribute of <keywords> is simple, but not completely in line with TEI. It would be better to define the scheme in a <taxonomy> element in <classDecl> (itself in <encodingDesc>) and then refer to it in a <catRef> element in <textClass>.
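A minimal sketch of this alternative is given below, reusing the region example from above. The xml:id values are hypothetical, and the other parts of the header are omitted.

```xml
<teiHeader>
  <!-- fileDesc omitted for brevity -->
  <encodingDesc>
    <classDecl>
      <!-- the scheme is defined once, as a taxonomy of categories -->
      <taxonomy xml:id="classification_region">
        <category xml:id="region_asia">
          <catDesc>Asia</catDesc>
        </category>
      </taxonomy>
    </classDecl>
  </encodingDesc>
  <profileDesc>
    <textClass>
      <!-- the document points to a category instead of repeating the term -->
      <catRef scheme="#classification_region" target="#region_asia"/>
    </textClass>
  </profileDesc>
</teiHeader>
```

The term itself is then maintained in one place, and each document only carries a pointer to it.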
The optional element <xenoData> (“outside metadata”) provides a container into which metadata in non-TEI formats may be placed. Think of MARC21 or Dublin Core records, or RDF triples.
See http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html#HD9
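As an illustration, a simple Dublin Core record wrapped in <xenoData> might look like the sketch below. The values are taken from the examples earlier in this chapter. The choice of the oai_dc wrapper and of these four fields is an assumption, not a Brill requirement.

```xml
<xenoData>
  <!-- non-TEI content lives in its own namespaces -->
  <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
             xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Philological and Historical Commentary on Ammianus Marcellinus XXII</dc:title>
    <dc:creator>den Boeft, J.</dc:creator>
    <dc:publisher>Brill</dc:publisher>
    <dc:date>2018</dc:date>
  </oai_dc:dc>
</xenoData>
```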
How should changes to documents be managed, either before or after publication? TEI XML allows for a “change log” in the <teiHeader> element. This may be used in combination with a version management system, for example as part of a content management system.
The change log is covered in the TEI Guidelines, chapter 2.6 The Revision Description: http://www.tei-c.org/release/doc/tei-p5-doc/en/html/HD.html#HD6
The change log takes the form of a <revisionDesc> element. Nested in it are <change> elements, which may take the following attributes: @when to document the precise revision date, @type to document the sort of revision, e.g. “correction”, and @who to document the person making the change. A @status attribute can be used to document the status of the revision or the revised document, e.g. “draft”. It is also possible to group <change> elements in a <listChange> element. The contents of the <change> element can be used to describe the change.
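Put together, a change log with grouped entries might be sketched as follows. All dates, the @type values other than “correction”, the pointer “#ed1” and the descriptions are hypothetical. The dates are written in the ISO form (YYYY-MM-DD) that TEI expects for @when.

```xml
<revisionDesc status="draft">
  <listChange>
    <!-- one <change> per revision; the element content describes the change -->
    <change when="2018-03-01" type="correction" who="#ed1">Corrected a typographical error in the abstract.</change>
    <change when="2018-05-15" type="addition" who="#ed1" status="draft">Added two keywords.</change>
  </listChange>
</revisionDesc>
```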
TEI strongly recommends the use of this element: “No significant change should be made in any TEI-conformant file without corresponding entries being made in the change log”. Brill follows this advice, focusing on changes after publication.
At present, Brill distinguishes three statuses of a document after publication:
first-online. A document has this status when it is published in print first and online later. The print date is then found in the <publicationStmt>.
first-print. A document has this status when it is published online first and in print later. The online publication date is declared in <publicationStmt>.
last-update. A document has this status when it is revised after publication, usually online.
These three statuses are added as values of the @status attribute to the <revisionDesc> element.
<revisionDesc status="last-update">
<change when="2017-06-23" type="correction" who="Angelos Chaniotis"/>
</revisionDesc>
In the current scheme, statuses like “first-update”, “second-update”, etc. are missing. Use them instead of “last-update”, which is to be avoided. Also, generate such information from a version management system rather than manually.
Only use <sourceDesc> (see also above) for the source in a conversion project.
Use <publicationStmt> (see also above) in the following manner:
<publicationStmt>
<publisher>Brill</publisher>
<pubPlace>Leiden | Boston</pubPlace>
<date>2018</date>
<idno></idno>
<idno type="DOI"/>
<idno type="publisher-id">AMO</idno>
<idno type="ISSN"/>
<availability status="restricted">
<licence target="placeholder">Placeholder</licence>
</availability>
</publicationStmt>
These <idno> elements are now found in <bibl> in <sourceDesc>. Also, place <title level="a"> (and <title level="m"> or <title level="s">) in <titleStmt>, followed first by the author of the entry and then by the editor of the MRW as a whole.
Add @type and @when attributes to the publication date, e.g. <date type="online" when="2006-10-01">2006</date>.