From: BITNET list server at UICVM
To: Joseba Abaitua
Subject: File: "P3ST DOC"

Chapter 3

STRUCTURE OF THE TEI DOCUMENT TYPE DEFINITION

This chapter describes the overall structure of the encoding scheme defined by these Guidelines. It introduces the conceptual framework within which the following chapters are to be understood, and describes the technical means by which that conceptual framework is implemented in SGML. It assumes some familiarity with SGML; see chapter 2, "A Gentle Introduction to SGML," on page 2.

The TEI encoding scheme consists of a number of modules or DTD fragments, which we refer to below as tag sets. Selected tag sets may be combined in many different ways, according to principles described in this chapter, within the framework of the TEI main DTD. Auxiliary tag sets are also defined for specific purposes independent of the TEI main DTD.

The DTD fragments from which the main TEI DTD is constructed may be classified as follows:

* core DTD fragments
* base DTD fragments
* additional DTD fragments

The first two sections of this chapter discuss these distinctions and list the specific tag sets included in each category. Section 3.3, "Invocation of the TEI DTD," on page 4 describes how to invoke the TEI document type declaration, and how to specify which of the various base tag sets and optional additional tag sets are used in a document.

The global attributes, characteristics postulated of every element or tag in the encoding scheme, are defined in section 3.5, "Global Attributes," on page 4.

The remainder of the chapter contains a more technical description of the SGML mechanisms used to implement the encoding scheme.
It may be skipped at a first reading, but a proper understanding of the topics addressed here is essential for anyone planning to modify or extend the TEI encoding scheme in any way (see also chapter 29, "Modifying the TEI DTD," on page 47), and also highly desirable for those wishing to take full advantage of its modular nature. The structure of the main TEI DTD file itself is outlined in section 3.6, "The TEI2.DTD File," on page 4. The element classes used to define smaller groups of elements and their characteristics are described in section 3.7, "Element Classes," on page 5. Both global attributes and element classes are implemented using SGML parameter entities; various other uses of parameter entities in the TEI DTDs are discussed in section 3.8, "Other Parameter Entities in TEI DTDs," on page 6.

3.1 Main and Auxiliary DTDs

These Guidelines define a large number of SGML tags for marking up documents, all of which are formally defined within the document type declaration (DTD) files provided by the TEI and documented in the remainder of the present document. The tags are grouped into tag sets or DTD fragments, each comprising a set of declarations for tags which belong together in some respect, typically related to their intended application area.

All tags used to transcribe documents are available for use within the main DTD of the TEI and are defined in Parts III and IV of these Guidelines. There are DTD fragments for prose and mixed matter, verse and verse collections, drama, dictionaries, analysis and interpretation of text, text criticism, etc. A full list, including the files in which they are defined, and the rules determining their selection and combination, is given in section 3.2, "Core, Base, and Additional Tag Sets."

A number of auxiliary DTDs are also defined in these Guidelines. These are used for the encoding of ancillary descriptive information useful when processing electronic documents.
Part V of these Guidelines describes several such auxiliary document types, specifically:

independent header: for use with sets of TEI headers regarded as documents in their own right, for example by libraries or archives exchanging details of their holdings (see chapters 5, "The TEI Header," on page 7 and 24, "The Independent Header," on page 43).
writing system declaration: used to define and document character sets or transliteration schemes (see chapters 4, "Characters and Character Sets," on page 7 and 25, "Writing System Declaration," on page 43).
feature system declaration: used to define and document sets of analytic features (see chapters 16, "Feature Structures," on page 30 and 26, "Feature System Declaration," on page 45).
tag set declaration: used to provide descriptive documentation for TEI-conformant tag sets (see chapter 27, "Tag Set Documentation," on page 45).

An independent header typically describes the encoding of a specific document, but in the case of a planned corpus or collection, it may define a set of encoding practices common to all texts in the collection. The other auxiliary document types provide information likely to be relevant to many documents, rather than to individual documents.

When individual TEI documents are exchanged between sites, they should be accompanied by whatever auxiliary documents apply to them. When larger groups of documents are exchanged, the relevant auxiliary documents need be exchanged only once. For further information see chapter 30, "Rules for Interchange," on page 48.

The DTD files containing these auxiliary DTDs are:

teishd2.dtd : independent header
teiwsd2.dtd : writing system declaration
teifsd2.dtd : feature system declaration
teitsd2.dtd : tag set declaration

Some of these auxiliary DTDs also make use of the core tag set defined as part of the main TEI DTD; this is described in the relevant chapters of Part V.
3.2 Core, Base, and Additional Tag Sets

The main TEI DTD is constructed by selecting an appropriate combination of smaller tag sets, each containing some set of tags likely to be used together. These building blocks include:

core tag sets : standard components of the TEI main DTD in all its forms; these are always included without any special action by the encoder;
base tag sets : basic building blocks for specific text types; exactly one base must be selected by the encoder (unless one of the "combined" bases is used);
additional tag sets : extra tags useful for particular purposes. All additional tag sets are compatible with all bases and with each other; an encoder may therefore add them to the selected base in any combination desired.

Each tag set is contained in one or more system files, which are defined by appropriate SGML parameter entity declarations and invoked as a unit by appropriate SGML parameter entity references.(15) Several such declarations may be needed to invoke all parts of a given tag set, since as well as defining elements or attributes, a tag set may (for example) add new items to the set of global attributes or add classes to the system of element classes.

Consistent naming principles are applied throughout the TEI scheme for these and other entities. Thus, assuming a tag set named xxx, the following parameter entities may be encountered:

TEI.xxx: used to enable or disable tag set xxx; must have the value INCLUDE (tag set is enabled) or, by default, IGNORE (tag set not enabled).
TEI.xxx.ent: refers to a system file containing any parameter entity declarations unique to tag set xxx.
TEI.xxx.dtd: refers to a system file containing the element and attribute list declarations for tag set xxx.
a.xxx: contains definitions of attributes which are to be added to the set of global attributes when tag set xxx is enabled.
m.comp.xxx: if xxx is a base tag set, this contains a list of any component-level elements unique to it (for a definition of component-level elements, see section 3.7, "Element Classes," on page 5).
mix.xxx: a special entity for use in defining the set of component-level elements when the mixed base tag set is in use.
gen.xxx: a special entity for use in defining the set of component-level elements when the general base tag set is in use.

Few tag sets declare all of these entities; only those actually used are declared.

The interpretation of the parameter entity declarations, and the inclusion of the appropriate tag sets, are handled by a single "driver file" for the main TEI DTD. This file, tei2.dtd, is described in detail below in section 3.6, "The TEI2.DTD File." The remainder of the present section identifies the files in which each tag set is contained, and the parameter entities associated with them.

3.2.1 The Core Tag Sets

Two "core" tag sets are always included in every invocation of the main TEI DTD. The tags and attributes that they contain are therefore available to any TEI document. The parameter entities used for this purpose, and the files they refer to, are:

TEI.core.dtd : refers to the file teicore2.dtd, which declares the core tags defined in chapter 6, "Elements Available in All TEI Documents," on page 10
TEI.header.dtd : refers to the file teihdr2.dtd, which declares the tags of the TEI header defined in chapter 5, "The TEI Header," on page 7

Together with these tag sets, Part II also documents a tag set for default text structure and front and back matter. This tag set is embedded by the base tag set selected, and may vary with the base; it is therefore described in the next section.

3.2.2 The Base Tag Sets

The base tag sets are those which define the basic building blocks of different text types.
The basic structures of verse (line, stanza, canto, etc.), for example, are not those of prose (paragraph, section, chapter, etc.), while dictionaries use yet another set of basic structures. Each base corresponds to one chapter of Part III of this document.

In general, exactly one base tag set must be selected for any TEI-conformant document. Errors will result if none, or more than one, is selected, because the same elements may be differently defined in different base tag sets. For documents which mingle structurally dissimilar elements and require elements from more than one base, however, either the mixed base or the general base may be used; see section 3.4, "Combining TEI Base Tag Sets." These bases require the encoder to specify which of the other bases are to be combined.

The encoder selects a base tag set by declaring the appropriate SGML parameter entity with the replacement text INCLUDE. To invoke the base tag set for prose, for example, the encoder must ensure that the DTD subset in the document contains the declaration:

     <!ENTITY % TEI.prose 'INCLUDE'>

The entities used to select the different base tag sets, and the files containing the SGML declarations for each base, are listed below.

TEI.prose : selects the base tag set for prose, contained in teipros2.dtd.
TEI.verse : selects the base tag set for verse, contained in teivers2.dtd and teivers2.ent.
TEI.drama : selects the base tag set for drama, contained in teidram2.dtd and teidram2.ent.
TEI.spoken : selects the base tag set for transcriptions of spoken texts, contained in teispok2.dtd and teispok2.ent.
TEI.dictionaries : selects the base tag set for print dictionaries, contained in teidict2.dtd and teidict2.ent.
TEI.terminology : selects the base tag set for terminological data files, contained in teiterm2.dtd, teiterm2.ent, teite2n.dtd, and teite2f.ent.
TEI.general : selects the generic mixed-mode base tag set, contained in teigen2.dtd.
TEI.mixed : selects the base tag set for free mixed-mode texts, contained in teimix2.dtd.

As shown in the list, each base tag set is normally contained in one or two system files: a required one (with the file extension dtd) defining the elements in the tag set and their attributes, and an optional one (with the file extension ent) defining any global attributes or specialized element classes enabled by that tag set. The parameter entities for these files have the same name as the enabling parameter entity for the base, with the suffixes ent and dtd respectively: the prose base, for example, is enabled by declaring the parameter entity TEI.prose as INCLUDE; this in turn enables declarations of TEI.prose.ent and TEI.prose.dtd as the system files teipros2.ent and teipros2.dtd. For further details, see section 3.6, "The TEI2.DTD File."

Most base tag sets (but not necessarily all) embed common definitions of text structure, front matter, and back matter, by referring to three standard parameter entities; these are:

TEI.structure.dtd : refers to the file teistr2.dtd, with default definitions of the basic text-structure elements
TEI.front.dtd : refers to the file teifron2.dtd, with tags for front matter
TEI.back.dtd : refers to the file teiback2.dtd, with tags for back matter

These default-structure tags are documented in chapter 7, "Default Text Structure," on page 15.

3.2.3 The Additional Tag Sets

The additional tag sets define optional tags required by different encoders for different types of analysis and processing; each corresponds to a chapter in Part IV of this document. In any TEI encoding, any or all of these additional tag sets may be made available, as they are all compatible with each other and with every base tag set. They are invoked in the same way as base tag sets, by defining the appropriate parameter entity as INCLUDE; the relevant parameter entities, and the files containing the additional tag sets, are these:

TEI.linking : embeds the files teilink2.dtd and teilink2.ent, with tags for linking, segmentation, and alignment (chapter 14, "Linking, Segmentation, and Alignment," on page 26)
TEI.analysis : embeds the files teiana2.dtd and teiana2.ent, with tags for simple analytic mechanisms (chapter 15, "Simple Analytic Mechanisms," on page 29)
TEI.fs : embeds the file teifs2.dtd, with tags for feature structure analysis (chapter 16, "Feature Structures," on page 30)
TEI.certainty : embeds the file teicert2.dtd, with tags for indicating uncertainty and probability in the markup (chapter 17, "Certainty and Responsibility," on page 33)
TEI.transcr : embeds the files teitran2.dtd and teitran2.ent, with tags for manuscripts, analytic bibliography, and transcription of primary sources (chapter 18, "Transcription of Primary Sources," on page 34)
TEI.textcrit : embeds the files teitc2.dtd and teitc2.ent, with tags for critical editions (chapter 19, "Critical Apparatus," on page 35)
TEI.names.dates : embeds the files teind2.dtd and teind2.ent, with specialized tags for names and dates (chapter 20, "Names and Dates," on page 37)
TEI.nets : embeds the file teinet2.dtd,
with tags for graphs, digraphs, trees, and other networks (chapter 21, "Graphs, Networks, and Trees," on page 38) -- not to be confused with the graphics markup of TEI.figures
TEI.figures : embeds the files teifig2.dtd and teifig2.ent, with tags for graphics, figures, illustrations, tables, and formulae (chapter 22, "Tables, Formulae, and Graphics," on page 39) -- not to be confused with the graph-theoretic markup of TEI.nets
TEI.corpus : embeds the file teicorp2.dtd, with additional tags for language corpora (chapter 23, "Language Corpora," on page 40)

Like the base tag sets, the additional tag sets are each contained in one or two system files: a required one (with the file extension dtd) defining the elements in the tag set and their attributes, and an optional one (with the file extension ent) defining any global attributes or specialized element classes enabled by that tag set. The parameter entities for these files have the same name as the enabling parameter entity for the tag set, with the suffixes ent and dtd respectively: the additional tag set for linking, segmentation, and alignment, for example, is enabled by declaring the parameter entity TEI.linking as INCLUDE; this in turn enables declarations of TEI.linking.ent and TEI.linking.dtd as the system files teilink2.ent and teilink2.dtd.

3.2.4 User-Defined Tag Sets

As described in chapter 29, "Modifying the TEI DTD," on page 47, users may modify the markup language defined here by renaming elements, suppressing elements, adding new elements, or modifying element or attribute-list declarations. In general, local modifications will be most conveniently grouped into two files: one containing the local modifications to parameter entities used in the DTDs, and the other containing new or modified declarations of elements and their attributes.
These files will be embedded in the TEI DTD if they are associated with the following two parameter entities:

TEI.extensions.ent : local modifications to parameter entities
TEI.extensions.dtd : declarations of new elements and modified declarations for existing elements

In some cases, users may wish to provide completely new base or additional tag sets, to be invoked in the same way as those defined in this document; such tag sets should also be divided into "entity files" and "DTD files" in the same way as the standard tag sets. Such modifications should be undertaken only with a thorough understanding of the interface among core, base, and additional tag sets as documented in the final sections of this chapter; see in particular section 3.6.2, "Embedding Local Modifications." Further recommendations for the creation of user-defined extensions or modifications are provided in chapters 29, "Modifying the TEI DTD," on page 47 and 28, "Conformance," on page 47.

3.3 Invocation of the TEI DTD

Like any other SGML document, a TEI document must begin with a document type definition (DTD). Local systems may allow the DTD to be implicit, but for interchange purposes it must be explicit. Because of its highly modular nature, it may in any case be desirable for the component parts of the TEI DTD to be made explicit even for local processing.

The simplest version of the TEI DTD names the main TEI DTD file as an external file, and specifies a single base tag set for use in the document, using the parameter entity names specified in section 3.2, "Core, Base, and Additional Tag Sets," on page 3. For example, a document using the base tag set for prose will begin with a document type declaration something like this:

     <!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
     <!ENTITY % TEI.prose 'INCLUDE'>
     ]>

A document using the base tag set for drama will define a different parameter entity:

     <!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
     <!ENTITY % TEI.drama 'INCLUDE'>
     ]>

If one or more of the additional tag sets described in Part IV are to be used, they are invoked in the same way as the base tag set.
A document using the base tag set for prose, with the additional tag sets for text criticism and for linking, segmentation, and alignment, for example, will begin with a document type declaration something like this:

     <!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
     <!ENTITY % TEI.prose 'INCLUDE'>
     <!ENTITY % TEI.textcrit 'INCLUDE'>
     <!ENTITY % TEI.linking 'INCLUDE'>
     ]>

If local modifications are used, they may be stored in separate files and pointed to using the parameter entities TEI.extensions.ent and TEI.extensions.dtd. If such local modifications are added to the example just given (here using the file names project.ent and project.dtd), this is the result:

     <!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
     <!ENTITY % TEI.prose 'INCLUDE'>
     <!ENTITY % TEI.textcrit 'INCLUDE'>
     <!ENTITY % TEI.linking 'INCLUDE'>
     <!ENTITY % TEI.extensions.ent SYSTEM 'project.ent'>
     <!ENTITY % TEI.extensions.dtd SYSTEM 'project.dtd'>
     ]>

If the document requires tags which are defined in different base tag sets (e.g. prose and drama) or embeds smaller texts which use different base tag sets, then one of the mixed-type bases must be used. Their proper invocation is described below in section 3.4, "Combining TEI Base Tag Sets."

3.4 Combining TEI Base Tag Sets

The TEI DTD has been designed to simplify the task of choosing an appropriate set of tags for the text in hand. The core tag set includes tags appropriate to the majority of simple tagging requirements for prose, verse and drama, irrespective of the base tag set chosen. For more detailed tagging, the encoder may choose the prose base for prose texts, the verse base for verse, and so on. In discussing these base tag sets elsewhere in these Guidelines, it is generally assumed for clarity of exposition that a text will fall into one, not several, of these types.

It is not uncommon, however, for a text to combine prose and verse, or other forms treated by the TEI as different bases. Examples include:

* when the text is a collection of other texts, which do not all use the same base: e.g. an anthology of prose, verse, and drama
* when the text contains other smaller, embedded texts: e.g. a poem or song included in a prose narrative
* when some sections of the text are written in one form, and others in a different form: e.g. a novel where some chapters are in prose, others take the form of dictionary entries and still others the form of scenes in a play
* when the text moves back and forth among forms not between sections but within a single section: e.g. mixed prose-and-verse forms like many pastorals or like some portions of the Poetic Edda

The TEI DTD provides the following mechanisms to handle these cases:

* a definition of a corpus or collection as a series of documents, sharing a common TEI header (see chapter 23, "Language Corpora," on page 40)
* a definition of composite texts which comprise front matter, a group or several possibly nested groups of collected texts, themselves possibly composite (see section 7.3, "Groups of Texts," on page 15)
* a notion of embedded text which allows one text to be embedded within another (that is, the text is defined as a component-level element, as described briefly at the conclusion of section 7.3, "Groups of Texts," on page 15)

Whichever mechanism is adopted, if the whole of the resulting document is to be parseable by the main TEI DTD it may need to combine elements from different TEI base tag sets. Two special-purpose base tag sets are defined for this purpose:

* the general base, which allows different sections of a text to use different bases, but ensures that each section uses only one base
* the mixed base, which allows chunk- and inter-level elements from any base to mix within any text division

When either of these "combined" bases is used, the user must specify all of the other bases to be included in the mix as well as either the general or the mixed base. This is the only exception to the general rule that no more than one base tag set may be enabled in a TEI document.
The following set of declarations, for example, allows for any mixture of the low-level structural tags defined in the prose, drama and dictionary base tag sets:

     <!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
     <!ENTITY % TEI.mixed 'INCLUDE'>
     <!ENTITY % TEI.prose 'INCLUDE'>
     <!ENTITY % TEI.drama 'INCLUDE'>
     <!ENTITY % TEI.dictionaries 'INCLUDE'>
     ]>

The following set of declarations has the same effect, but with the additional restriction that each text division (i.e. each member of the element class divn) must be homogeneous with respect to the mixture of available bases. Because in a "general" base each division of the text may use a different base, the divisions of the text prefixed by this set of declarations will each be composed of elements taken solely from one of the prose, drama or dictionary base tag sets:

     <!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
     <!ENTITY % TEI.general 'INCLUDE'>
     <!ENTITY % TEI.prose 'INCLUDE'>
     <!ENTITY % TEI.drama 'INCLUDE'>
     <!ENTITY % TEI.dictionaries 'INCLUDE'>
     ]>

The actual DTD fragments for the combined bases do nothing but embed the default tag set for overall text structure. The mixed-base tag set is in file teimix2.dtd:

     %TEI.structure.dtd;

The general-base tag set is in file teigen2.dtd:

     %TEI.structure.dtd;

3.5 Global Attributes

The following attributes are defined for every TEI element.(16)

id : provides a unique identifier for the element bearing the ID value.
n : gives a number (or other label) for an element, which is not necessarily unique within the document.
lang : indicates the language of the element content, usually using a two- or three-letter code from ISO 639.
rend : indicates how the element in question was rendered or presented in the source text.

Some tag sets (e.g. those for terminology, linking, and analysis) define other global attributes; these are documented in the appropriate chapters of Part III and Part IV. See also section 3.7.1, "Classes Which Share Attributes," on page 5.

An additional attribute, TEIform, is also defined for every TEI element. Unlike the other attributes defined for every element, TEIform is not defined by class global because its default value is different in every case and must be defined individually for each element.(17)

TEIform : indicates the standard TEI name (generic identifier) for a given element.

Any TEI element may be given values for id, n, lang, rend, or TEIform, simply by specifying values for these attributes. The following two examples convey the same information about the text: that the material transcribed occurs within a <p> element (paragraph). They differ only in that the second provides an identifier for the paragraph, to which other elements (e.g. notes or hypertext links) can conveniently refer.

If to do were as easy as to know what were good to do, chapels had been churches and poor men's cottages princes' palaces. It is a good divine that follows his own instructions ...

If to do were as easy as to know what were good to do, chapels had been churches and poor men's cottages princes' palaces. It is a good divine that follows his own instructions ...
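The tags of the two examples above have not survived in this copy. As a minimal sketch only, assuming the paragraph element is tagged <p> and using a hypothetical identifier p1 (not taken from the original), the two encodings might be marked up as follows:

```sgml
<!-- Sketch only: the id value "p1" is a hypothetical illustration. -->

<!-- First encoding: a paragraph with no identifier. -->
<p>If to do were as easy as to know what were good to do, ...</p>

<!-- Second encoding: the same paragraph, now addressable by its id. -->
<p id="p1">If to do were as easy as to know what were good to do, ...</p>
```

Only the second form gives notes or links elsewhere in the document something to point at.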

ID values must be legal SGML names; by default, this means they must begin with a letter from A to Z or a to z and contain no characters other than the letters A to Z or a to z, the digits 0 to 9, the full stop, and the hyphen. Furthermore, by default upper and lower case letters are not distinguished: thus, the strings a23 and A23 are identical, and may not be used to identify two distinct elements. If two elements are given the same identifier, the SGML parser will signal a syntax error. The following example, therefore, is not valid:

What's it going to be then, eh?

There was me, that is Alex, and my three droogs, that is Pete, Georgie, and Dim, ...
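The identifiers used in this example are missing from this copy. A sketch of the kind of error meant, assuming two paragraphs tagged <p> and the hypothetical identifiers a23 and A23 (chosen here because the text above cites them as identical strings):

```sgml
<!-- Not valid: under default case folding "a23" and "A23" are the same ID, -->
<!-- so the second paragraph repeats an identifier already in use.          -->
<p id="a23">What's it going to be then, eh?</p>
<p id="A23">There was me, that is Alex, and my three droogs, that is Pete,
Georgie, and Dim, ...</p>
```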

For a discussion of methods of providing unique identifiers for elements, see section 6.9.2, "Creating New Reference Systems," on page 13.

The n attribute allows identifying information (e.g. chapter numbers, etc.) to be encoded even if it would not be a legal id value. Its value may be any string of characters; typically it is a number or other similar enumerator or label. For example, the numbers given to the items of a numbered list may be recorded with the n attribute; this would make it possible to record errors in the numeration of the original, as in this list of chapters, transcribed from a faulty original in which the number 10 is used twice, and 11 is omitted:

     About These Guidelines
     A Gentle Introduction to SGML
     Verse
     Drama
     Spoken Materials
     Printed Dictionaries

The n attribute may also be used to record non-unique names associated with elements in a text, possibly together with a unique identifier as in the following examples:
The lang attribute indicates the language, writing system, and character set associated with a given element and all its contents. If it is not specified, the value is inherited from that of the immediately enclosing element. As a rule, therefore, it is simplest to specify the base language of the text on a single enclosing element, and allow most elements to take the default value for lang; the language of an element then need be explicitly specified only for elements in languages other than the base language. The following two encodings convey the same information about the language of the text, since in the first the lang attributes on the child elements specify the same value as that on the parent element, while in the second they inherit that value without specifying it.

... Both parties deprecated war, but one of them would make war rather than let the nation survive, and the other would accept war rather than let it perish, and the war came.

... Both parties deprecated war, but one of them would make war rather than let the nation survive, and the other would accept war rather than let it perish, and the war came.
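The tags and attribute values of the two encodings above are missing from this copy. As a hedged sketch, assuming a <div1> division containing a <p> paragraph and a hypothetical language identifier en (which would have to be declared in the TEI header), the contrast might look like this:

```sgml
<!-- Sketch only: element names and the value "en" are assumptions. -->

<!-- First encoding: lang stated on the parent and repeated on the child. -->
<div1 lang="en">
<p lang="en">... Both parties deprecated war, ...</p>
</div1>

<!-- Second encoding: the paragraph inherits lang from the division. -->
<div1 lang="en">
<p>... Both parties deprecated war, ...</p>
</div1>
```

Both forms record the same language for the paragraph; the second simply relies on inheritance.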

In the following example, by contrast, the lang attribute must be given on the elements marking the technical terms if we wish to record the fact that those terms are Latin rather than English; no lang attribute is needed on an element which is in the same language as its parent. It is strongly recommended that all language shifts in the source be explicitly identified by use of the lang attribute, as described in chapter 4, "Characters and Character Sets," on page 7.

The constitution declares that no bill of attainder or ex post facto law shall be passed. ...
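The markup of this example is missing from this copy. A sketch of what it may have looked like, assuming <term> for the technical term, <q> for the quoted passage, and hypothetical language identifiers en and la; the placement of the tags is a guess, not the original encoding:

```sgml
<!-- Sketch only: tag placement and the values "en" and "la" are assumptions. -->
<p lang="en">The constitution declares <q>that no bill of attainder or
<term lang="la">ex post facto</term> law shall be passed.</q> ...</p>
```

Here only the Latin phrase carries its own lang value; the quotation inherits English from the enclosing paragraph.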

Formally, the lang attribute is an IDREF: a reference to the id value of a <language> element in the TEI header.(18) This means that each language used in the document should be declared in the TEI header using the <language> element defined in section 5.4.2, "Language Usage," on page 9.

The rend attribute is used to give information about the physical presentation of the text in the source. In the following example, it is used to indicate that both the emphasized word and the proper name are printed in italics:

... Their motives might be pure and pious; but he was equally alarmed by his knowledge of the ambitious Bohemond, and his ignorance of the Transalpine chiefs: ...
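The tags of this example are missing from this copy. A sketch, assuming <emph> for the emphasized word and <name> for the proper name; which word carried the emphasis and the attribute value italic are assumptions made for illustration:

```sgml
<!-- Sketch only: the choice of emphasized word and the value "italic" -->
<!-- are assumptions, not recovered from the original example.        -->
<p>... but he was equally alarmed by his knowledge of the ambitious
<name rend="italic">Bohemond</name>, and his
<emph rend="italic">ignorance</emph> of the Transalpine chiefs: ...</p>
```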

If all or most <emph> and <name> elements are rendered in the text by italics, it will be more convenient to register that fact in the TEI header once and for all and specify a rend value only for any elements which deviate from the usual rendition.

The contents of the rend attribute are free text. In any given project, encoders are advised to settle on a standard vocabulary with which to describe typographic or manuscript rendition of the text, and to document their usage of that vocabulary in the TEI header.

The TEIform attribute is used to allow application programs to handle TEI-encoded documents correctly even if some or all elements have been renamed. Most users can ignore this attribute entirely; it is only relevant when the TEI DTDs are modified.(19)

The default value of TEIform for any element is the generic identifier of that element, as described in this document. The value for <p> is p, the value for <div1> is div1, etc. When elements are renamed, as described in chapter 29, "Modifying the TEI DTD," on page 47, the declaration of TEIform is not modified. If <div1> is renamed, for example, the default value of TEIform remains div1. An application program which does not recognize the new generic identifier can check to see whether the attribute TEIform exists, and examine its value if it does to find out which TEI element, if any, is being used.

Modifications of DTDs, however, may involve more than simple renaming of elements: sometimes elements are given not just new names, but complete new definitions. In such cases, the TEIform attribute may be used to indicate the standard TEI element corresponding to the modified element. For example, if a local modification of a DTD renamed the <div1> element and also modified its formal declarations (e.g. to change its content model), then the TEIform attribute on the modified element should be given the default value div1, in order to indicate that the local element is a modification of the standard TEI <div1>.

When new elements are introduced, they may be identified as specialized variants of existing TEI elements by giving them the appropriate default value for TEIform. For example, if a local element called <quatrain> were introduced, as a specialized variant of the <lg> (line group) element which must contain exactly four lines, then its declaration might give its TEIform as lg, to signify that a quatrain is a particular type of line group.

The formal definition of the global attributes is as follows:

3.6 The TEI2.DTD File

All TEI-encoded documents use the same top-level DTD file, which refers to a number of other DTD files, the exact set of other files referred to depending on which base and which additional tag sets are in use. The remainder of this chapter describes in some detail the organization and function of this file and those it embeds; it is necessarily of a rather technical and specialized nature.
The main TEI DTD is always invoked by specifying the file tei2.dtd. This file:

1. takes care of certain necessary preliminaries:
   a. embeds any locally defined changes to the standard TEI parameter entities, so that local modifications can take precedence over default declarations;
   b. declares TEI-specific keywords used in other declarations and declares default values of IGNORE for all the parameter entities used to select base and additional tag sets (see section 3.8.3, "Parameter Entities for TEI Keywords," on page 6);
   c. declares parameter entities for TEI generic identifiers (by embedding the file teigis2.ent; see section 3.8.2, "Parameter Entities for Element Generic Identifiers," on page 6);
2. declares parameter entities for element classes, content models, and global attributes (by embedding teiclas2.ent; see section 3.7.3, "The TEICLAS2.ENT File," on page 5);
3. declares the top-level elements <TEI.2> and <teiCorpus.2>;
4. embeds DTD files containing local modifications (if any), the core tag sets, the base tag set, and the additional tag sets.

3.6.1 Structure of the TEI2.DTD File

Each parameter entity associated with a tag set controls several marked sections in the main DTD file tei2.dtd. If the entity has been declared in the DTD subset with the text INCLUDE, then the marked sections it controls will be parsed; otherwise, they will be ignored. The marked sections controlled by each entity:

1. declare and refer to the entity file for the tag set, which defines its global attributes and element classes
2. declare and refer to the DTD file for the tag set, which defines its elements and their attributes
3. declare the parameter entity component in a form suitable for texts using that base

The tei2.dtd file has the following structure:

     %TEI.elementNames;
     %TEI.elementClasses;

A TEI-conformant document must use the tei2.dtd file, or one derived from it in the manner described in chapter 29, "Modifying the TEI DTD," on page 47.
It must also specify which base and which additional tag sets are to be invoked, using the mechanisms described in section 3.3, "Invocation of the TEI DTD."

3.6.2 Embedding Local Modifications

As noted above in section 3.2.4, "User-Defined Tag Sets," local modifications to the DTD are most conveniently grouped into two files, one containing modifications to the TEI parameter entities, and the other new or changed declarations of elements and their attributes. These files should be associated with the parameter entities TEI.extensions.ent and TEI.extensions.dtd by declarations included in the document's DTD subset. For example, if the relevant files are called project.ent and project.dtd, then declarations like the following would be appropriate:

When an SGML entity is declared more than once, the first declaration is binding and the others are ignored. The local modifications to parameter entities should therefore be handled before the standard parameter entities themselves are declared in tei2.dtd. The entity TEI.extensions.ent is referred to before any TEI declarations are handled, to allow the user's declarations to take priority. If the user does not provide a TEI.extensions.ent entity, the entity will be expanded to the empty string.

For example, the encoder might wish to add two new phrase-level elements, perhaps as synonyms for existing ones. As described in chapter 29, "Modifying the TEI DTD," on page 47, this involves two distinct steps: one to define the new elements, and the other to ensure that they are placed into the TEI document structure at the right place. We deal with the second first, by specifying the element class to which the new elements should be attached. To do this, the standard parameter entity x.phrase should be modified to include the two new generic identifiers.
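The two steps just described might be sketched as follows. The file names project.ent and project.dtd come from the example above; the generic identifiers myItalic and myBold are hypothetical stand-ins for the two new phrase-level elements, and the exact form of the declarations is an assumption rather than a quotation from the TEI files.

```sgml
<!-- In the document's DTD subset: point the two extension entities
     at the local files (file names taken from the example above). -->
<!ENTITY % TEI.extensions.ent SYSTEM 'project.ent'>
<!ENTITY % TEI.extensions.dtd SYSTEM 'project.dtd'>

<!-- In project.ent: add two hypothetical elements to the phrase
     class. Each name, including the last, is followed by a
     vertical bar. -->
<!ENTITY % x.phrase 'myItalic | myBold |'>
```

Because the first declaration of an entity is binding, this local x.phrase takes precedence over the empty default that the TEI files declare later.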
The file containing local declarations of the standard parameter entities will thus contain a declaration of the following form:

The relevant fragment of the DTD is this:

    %TEI.extensions.ent;

The second type of modification needed is most conveniently performed after all the standard TEI parameter entities have been declared; this allows the element declarations provided by the user to make use of the parameter entities which define standard TEI content models and attribute definitions. To facilitate this, the parameter entity TEI.extensions.dtd is used to embed local element declarations before any of the TEI tag sets are embedded by the file tei2.dtd, but after all the TEI element classes and other parameter entities have been declared. The task of declaring the two non-standard elements is thus simplified: they can, for example, use the same parameter entities as an existing TEI element. A suitable local DTD-modifications file might look like the following (note that the standard parameter-entity reference for phrase sequence is used):

For further examples of local modifications to both parameter entities and element declarations, see chapter 29, "Modifying the TEI DTD," on page 47. The relevant fragment of the DTD is this:

    %TEI.extensions.dtd;

3.6.3 Embedding the Core Tag Sets

The core tag sets are embedded by the file tei2.dtd using the parameter entities TEI.header and TEI.core. The relevant fragment of the DTD is this:

    %TEI.header.dtd;
    %TEI.core.dtd;

The default text structure tags, which are also documented as part of the core, are embedded by the base tag set, unless the base defines its own text structure tags; see the chapters on the individual bases.

3.6.4 Embedding the Base Tag Set

The tei2.dtd file embeds the appropriate files for the base tag set previously selected by means of the parameter entities described in section 3.2, "Core, Base, and Additional Tag Sets," on page 3.
A parameter entity for the file containing the relevant DTD fragment is declared and referred to inside a conditional marked section controlled by the appropriate parameter entity. The relevant fragment of tei2.dtd is this:

    %TEI.prose.dtd;
    %TEI.verse.dtd;
    %TEI.drama.dtd;
    %TEI.spoken.dtd;
    %TEI.dictionaries.dtd;
    %TEI.terminology.dtd;
    %TEI.general.dtd;
    %TEI.mixed.dtd;

3.6.5 Embedding the Additional Tag Sets

The tei2.dtd file embeds the appropriate files for any additional tag sets previously enabled by means of the parameter entities described in section 3.2, "Core, Base, and Additional Tag Sets," on page 3. A parameter entity for the file containing the relevant DTD fragment is declared and referred to inside a conditional marked section controlled by the appropriate parameter entity. The relevant fragment of tei2.dtd is this:

    %TEI.linking.dtd;
    %TEI.analysis.dtd;
    %TEI.fs.dtd;
    %TEI.certainty.dtd;
    %TEI.transcr.dtd;
    %TEI.textcrit.dtd;
    %TEI.names.dates.dtd;
    %TEI.nets.dtd;
    %TEI.figures.dtd;
    %TEI.corpus.dtd;

3.7 Element Classes

The TEI DTD contains over four hundred element types. To aid comprehension, modularity and modification, the majority of these elements are formally classified in some way. This section describes the various element classes recognized in the TEI DTD.

Element classes are used to express two distinct kinds of commonality among elements. The elements of a class may share some set of SGML attributes, or they may appear in the same locations in the content models of the TEI DTDs, or both. A class is known as an a-class if its members share attributes, and as an m-class if its members appear at the same locations in the content models of other TEI elements. An element is said to inherit attributes, or the ability to appear at a given point in a document, from any classes of which it is a member.
Classes may have subclasses and superclasses, and the characteristics of a superclass are inherited by all members of its subclasses.

Both types of element classes are represented in the TEI DTDs by parameter entities. For other uses of parameter entities in the TEI DTDs, see section 3.8, "Other Parameter Entities in TEI DTDs," on page 6.

This section describes the major element classes of each type together with the formal declarations for their parameter entities, which are contained in the file teiclas2.ent. All element classes are documented in the alphabetical reference section in Part VII.

3.7.1 Classes Which Share Attributes

An a-class groups together elements which share some set of common attributes. For example, the members of the class names are all elements which contain proper nouns. All of these elements use the same attributes (key and reg) to record information about the referent or the regularized form of the proper nouns. Similarly, the members of the pointer class share a set of attributes useful for managing cross-reference links and other pointers.(20)

The attributes shared by the members of an a-class are defined in a parameter entity; member elements inherit the attributes by referring to the parameter entity within their attribute-list declaration (examples below). This practice helps ensure that if the attribute definitions for the class change, all members of the class will automatically inherit the new definitions. Parameter entities used for this purpose form their names by taking the name of the class they define and prefixing the string a.; we refer to these entities as a-dot entities.
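A minimal sketch of the a-dot mechanism, using the names class. The attribute names key and reg are taken from the text above; the declared values and defaults (CDATA, #IMPLIED), the member element called name, and its type attribute are assumptions for illustration, not the declarations found in teiclas2.ent. The entity a.global is assumed to have been declared earlier.

```sgml
<!-- a-dot entity: attribute definitions shared by the class "names".
     Declared values and defaults are assumed, not quoted. -->
<!ENTITY % a.names '
          key        CDATA     #IMPLIED
          reg        CDATA     #IMPLIED'>

<!-- A hypothetical member element inherits the class attributes by
     referring to the entity inside its attribute-list declaration. -->
<!ATTLIST name
          %a.global;
          %a.names;
          type       CDATA     #IMPLIED>
```

If the definition of a.names later changes, every attribute list that refers to it picks up the change without being edited.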
For example, the declaration for the names class includes attribute definitions for its two attributes reg and key:

Members of the class typically inherit these definitions by referring to a.names:

Subclasses of a-classes inherit the attributes of their superclass similarly, by referring to the a-dot entity of the superclass in defining their own a-dot entity. For example, the class xPointer is a subclass of the class pointer, as shown implicitly by the declaration of its a-dot entity:

(For an explanation of the parameter entity extptr used in the above example, see section 3.8.3, "Parameter Entities for TEI Keywords," on page 6.)

The a-classes declared in the core tag sets of these Guidelines are:

declaring : elements which have a decls attribute for specifying which declarations in the header apply to the element, as described in section 23.3, "Associating Contextual Information with a Text," on page 41
declarable : header elements containing declarations, which can be pointed at by the decls attribute, as described in section 23.3, "Associating Contextual Information with a Text," on page 41
divn : structural elements which behave in the same way as divisions, as described in section 7.1, "Divisions of the Body," on page 15
enjamb : elements which carry the enjamb attribute for indicating metrical enjambement
interpret : elements which contain overtly interpretive or extra-textual analysis or commentary on a text or some portion of it
metrical : elements which carry metrical information (metrical pattern, realization of the pattern, rhyme)
names : elements which contain proper nouns and share attributes for identifying their referents and regularizing their spelling (section 6.4.1, "Referring Strings," on page 11)
personPart : elements which contain personal names or parts of them
placePart : elements which contain place names or parts of them
pointer : elements which point from one location in the document to another (section 6.6, "Simple Links and Cross References," on page 12)
seg : elements for the systematic or arbitrary segmentation of the text
temporalExpr : elements which contain temporal expressions
timed : elements (in the base tag set for spoken texts) which have a duration in time expressible with the attributes, as described in section 11.2.5, "Temporal Information," on page 20
xPointer : elements which point from one location in the document to other locations within or outside the current document (section 14.2, "Extended Pointers," on page 26)

All elements are considered members of the class global and thus include a reference to %a.global; in their attribute definition list declaration.

Some tag sets add specialized attributes to the set of global attributes; these additions are declared in the "ent" file of each tag set, using the following entity names. If the tag set does not define new global attributes, no entity of this type is declared.

a.analysis : additional global attributes for the analysis tag set
a.linking : additional global attributes for the linking tag set
a.terminology : additional global attributes for the terminology base

These entities are included in the teiclas2.ent file indirectly, when the entity-declaration files of each tag set are embedded, as shown below in section 3.7.6, "Elements Marked for Text Type."
For purposes of documentation, these attributes are treated as if inherited by the class global from superclasses called terminology, etc., and are documented under the class name.

One further complication to the inheritance mechanism should be mentioned here. In rare cases, a member of an a-class may override the definition of an inherited attribute. For example, a particular element may inherit the global id attribute from the class global -- as does every other element -- yet make id required rather than optional. The declaration of such an element therefore does not refer to the class global, but instead defines all the inherited attributes explicitly, using its own declaration for id and the default inherited declarations for the other global attributes:

Because this declaration does not use the parameter entity a.global, any change in the definition for that entity will not be reflected in this declaration. Consequently, any changes made to the global attributes as such will not be inherited by that element. Instead, such changes must be replicated manually. Care must thus be taken in modifying attribute definitions for a-classes if any members of the class override the inherited definitions, to ensure that all members of the class really do get the modified definitions.

3.7.2 Classes Used in Content Models

When the members of a class are structurally similar and can appear at the same kinds of structural locations in the document, they are grouped together into an m-class (or "model-class"). M-classes are implemented by defining a parameter entity for use in the formal declaration of element content models. The parameter entity takes the name of the class it defines, and prefixes the string m., which can be interpreted as "model" or as "members". The replacement text of the entity is a list of the members of the class, separated by |, the SGML symbol for alternation.
For each class an additional entity is defined, which also takes the name of the class, this time prefixed by the string x. (for extension); the default value of these x-dot entities is always the empty string. A reference to the corresponding x-dot entity is always included within the replacement string for each m-dot entity. This enables an encoder to add new members to a class simply by declaring a new value for an x-dot entity.

For example, the class bibl has the three members <bibl>, <bibl.full>, and <bibl.struct>. Its content-model entity is defined thus:

With the default value of the x-dot entity, this is the same as defining m.bibl with the replacement text bibl | bibl.full | bibl.struct.

If an encoder wishes to add a new bibliographic element called <myBib>, it can be added to the bibl class by redefining the x-dot entity thus:

This changes the replacement text of m.bibl from its default value to myBib | bibl | bibl.full | bibl.struct. If more than one element is to be added to a class, the x-dot entity for the class should be redefined as a list of the new generic identifiers, each one (including the last) followed by a vertical bar.

The same effect could be achieved simply by redefining the whole of the m.bibl entity directly, but the x-dot method requires no repetition of the already existing members of the class and thus minimizes the chance of error.

Like a-classes, m-classes may have subclasses or superclasses. Just as elements inherit from a class the ability to appear in certain locations of a document (wherever the class can appear), so all members of a subclass inherit the ability to appear wherever any superclass can appear. Superclasses transmit their location characteristics to their subclasses by referring, in declaring their m-dot entity, to the m-dot entities of the subclasses.
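The x-dot and m-dot conventions can be sketched as follows for the bibl class. The member names come from the replacement text quoted above and myBib is the local addition from the same example; the subclass entities m.data, m.edit, and so on are assumed to have been declared earlier. The real declarations may differ in detail (for instance, by referring to the generic-identifier entities described in section 3.8.2), so this is a simplified illustration only.

```sgml
<!-- Local file, read first: add myBib to the bibl class. -->
<!ENTITY % x.bibl 'myBib |'>

<!-- Standard declarations, read later. The empty default for x.bibl
     is ignored here because the local declaration came first. -->
<!ENTITY % x.bibl ''>
<!ENTITY % m.bibl '%x.bibl; bibl | bibl.full | bibl.struct'>
<!-- m.bibl now expands to: myBib | bibl | bibl.full | bibl.struct -->

<!-- A superclass refers to the m-dot entities of its subclasses
     (each assumed to be declared before this point). -->
<!ENTITY % x.phrase ''>
<!ENTITY % m.phrase '%x.phrase; %m.data; | %m.edit; | %m.hqphrase; |
                     %m.loc; | %m.seg;'>
```

The trailing vertical bar in the x-dot value is what lets the m-dot entity stay well formed whether the x-dot entity is empty or not.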
For example, the class phrase includes the classes data, edit, hqphrase, loc, and seg as members, as can be seen in the declaration for its m-dot entity:

When the entity m.phrase is referred to in content models, all members of all subclasses are included in the model.

3.7.3 The TEICLAS2.ENT File

The most important element classes used in TEI content models are declared in the DTD file teiclas2.ent, which is the default replacement text for the entity TEI.elementClasses and is embedded by the tei2.dtd file. These element classes are described, and their declarations reproduced, in the following sections.

The class system is structured around the following threefold division of elements:

chunks : elements such as paragraphs and other paragraph-level elements, which can appear directly within texts or within text subdivisions (i.e. <div> elements), but not within other chunks
phrase-level elements : elements such as highlighted phrases, book titles, or editorial corrections which can occur only within chunks (paragraphs or paragraph-level elements), but not between them (and thus cannot appear directly within a <div>)(21)
inter-level elements : elements such as lists, notes, quotations, etc. which can appear either between chunks (as children of a <div>) or within them

Together the two sets of chunks and inter-level elements make up the set of:

text components : elements which can appear directly within texts or text divisions; also called simply components or "component-level elements"

In general, the body of any text comprises a series of components, optionally grouped into <div> elements.

Some elements belong to none of these classes; these include high-level structural elements as well as some specialized elements which appear only within particular structures. The majority of elements found in normal running text, however, are assigned by the TEI DTDs to one or the other of these classes.

Some component elements (e.g. <p>) are common to all base tag sets, while others are unique to individual tag sets. This distinction is reflected in the parameter entity declarations, as shown below.

The teiclas2.ent file has the following overall structure:

3.7.4 Low-Level Element Classes

The following low-level classes group together sets of semantically or structurally similar elements. These classes may include both elements in the core and elements declared in particular tag sets; a reference is given at least to the relevant section on the core tags.

The following are phrase-level element classes:

hqphrase : elements for highlighted phrases or material marked by quotation marks, including those defined in section 6.3, "Highlighting and Quotation," on page 10
data : elements for recording information about the referents of a text, including those defined in section 6.4, "Names, Numbers, Dates, Abbreviations, and Addresses," on page 11
date : elements for recording dates, including those defined in section 6.4.4, "Dates and times," on page 11
edit : elements for recording simple editorial interventions in a text, including those defined in section 6.5, "Simple Editorial Changes," on page 12
loc : elements for recording location information in a text, including those defined in section 6.9, "Reference Systems," on page 13
seg : elements for marking arbitrary segments at the level of individual characters or phrases, including those documented in section 14.3, "Segments and Anchors," on page 27 and 15.1, "Linguistic Segment Categories," on page 29
sgmlKeywords : elements for marking generic identifiers, attribute names, SGML tags, and sample attribute values, when they occur in the text (used in tag set documentation, for which see chapter 27, "Tag Set Documentation," on page 45)
versePhrases : phrase-level elements specific to verse, documented in section 9.3, "Components of the Verse Line," on page 17
formPointers : elements for referring, within a dictionary entry, to the
orthographic form or pronunciation of the headword, documented in section 12.4, "Headword and Pronunciation References," on page 23

The following are inter-level element classes:

hqinter : elements for highlighted phrases or material marked by quotation marks, including those defined in section 6.3, "Highlighting and Quotation," on page 10
bibl : elements for bibliographic citations; see section 6.10, "Bibliographic Citations and References," on page 13
lists : elements for lists; see section 6.7, "Lists," on page 12
notes : general-purpose annotation elements; see section 6.8, "Notes, Annotation, and Indexing," on page 13
stageDirection : specialized stage-direction elements documented in section 10.2.3, "Stage Directions," on page 18

The following classes of elements may appear anywhere within the <text> element:

metadata : elements which convey non-textual information about the text (meta-information, as it were)
globincl : elements which may appear anywhere within the <text> element (because the class is used in an inclusion exception on that element)

The entity declarations for these classes are these:

3.7.5 High-Level Element Classes

The following element classes are used to implement the threefold structural distinction among phrases, chunks, and intermediate elements discussed above in section 3.7.3, "The TEICLAS2.ENT File." In this terminology, chunks (or chunk elements) are elements which can occur only in chunk-level sequences (e.g. between but not within paragraphs); inter-level elements can occur either within chunks (at phrase-level) or between chunks (e.g. at paragraph-level), and phrase-level elements can occur only at phrase level, within chunks (e.g. within but not between paragraphs).

The element class common includes all component-level (chunk- and inter-level) elements common to more than one base. It is used in implementing the combined bases described in section 3.4, "Combining TEI Base Tag Sets," on page 4.
The relevant portion of the DTD looks like this:

3.7.6 Elements Marked for Text Type

The following element classes are used to group together component-level elements which are allowed only in texts of a particular type (i.e. texts using a specific base).

comp.verse : elements unique to verse
comp.drama : elements unique to drama
comp.spoken : elements unique to spoken texts
comp.dictionaries : elements unique to dictionaries
comp.terminology : elements unique to terminological data

Declarations for these base-specific element classes are included in the entity file of each base, which is in turn embedded by the teiclas2.ent file in the DTD fragment shown below. If the tag set defines additions to the set of global attributes, or declares a class of component-level elements unique to the tag set, then it has an entity file which is embedded here; otherwise not.

    %TEI.verse.ent;
    %TEI.drama.ent;
    %TEI.spoken.ent;
    %TEI.dictionaries.ent;
    %TEI.terminology.ent;
    %TEI.linking.ent;
    %TEI.analysis.ent;
    %TEI.transcr.ent;
    %TEI.textcrit.ent;
    %TEI.names.dates.ent;
    %TEI.figures.ent;

3.7.7 Standard Content Models

As far as possible, the TEI DTDs use the following set of frequently-encountered content models, to help achieve consistency among different elements.

phrase : a single sequence of character data or single phrase-level element
phrase.seq : sequence of character data and phrase-level elements
component : a single chunk- or inter-level element
component.seq : sequence of chunk- and inter-level elements; this is the usual content of a <div> element
paraContent : sequence of character data, phrase-level elements, and inter-level elements; this is the usual content of chunks (including, most prominently, paragraphs)
specialPara : specialized content model, allowing either a sequence of chunks or the same content as paraContent; this is used for elements such as notes and list items, which can behave either as chunk-level elements, or else as containers for groups of component-level elements.

The relevant portion of the DTD looks like this:

3.7.8 Components in Mixed and General Bases

When the mixed or general base is in use, the definitions of the entities component and component.seq are rather more complex. The relevant portion of the DTD is this:

3.7.9 Miscellaneous Content-Model Classes

The following element classes occupy specific places in content models; some are relevant only when certain tag sets are selected:

agent : elements which denote an individual or organization to whom or which responsibility for an action can be assigned
addrPart : elements which can occur as part of an address
biblPart : elements which can occur in bibliographic citations
demographic : elements which record demographic characteristics of the participants in a text or language interaction (used in tag set for corpora and collections)
divbot : elements which can occur as part of the closing material of a text division or body
divtop : elements which can occur as part of the opening material of a text division or body
dramafront : elements which can occur in the front matter of drama and other performance texts
front : elements which can occur (at the level of text divisions) in front matter only
personPart : elements which contain parts of a personal name
placePart : elements which contain parts of a place name
refsys : milestone elements used in reference systems
tpParts : elements which occur within title pages

They are declared in the following DTD fragment:

3.8 Other Parameter Entities in TEI DTDs

The TEI DTDs use SGML parameter entities for several purposes:

* to define sets of attributes shared by given classes of elements
* to define classes of elements which can occur at the same locations in content models
* to identify what base tag set should be used for a document
* to identify what additional tag sets should be included
* to include or exclude the declaration of each element
* to specify the name of each element

The first two applications of parameter entities are described above in section 3.7, "Element Classes," on page 5. This section describes the other uses of parameter entities in the TEI DTDs.

The parameter entities used to specify which base tag set and which additional tag sets are to be used in a given document are listed in section 3.2, "Core, Base, and Additional Tag Sets," on page 3. Their default definition is always "IGNORE": the encoder selects the TEI base and additional tag sets by declaring the appropriate parameter entities with the entity text "INCLUDE".

The DTD and entity files are listed in section 3.2, "Core, Base, and Additional Tag Sets," on page 3. If the standard TEI entities are modified to point at locally developed DTD files containing local modifications or extensions to the TEI DTDs, the use of the standard parameter entity names ensures that the modification will be obvious upon examination of the document's DTD.

The following entities are referred to by the main tei2.dtd file to embed portions of the TEI DTDs or locally developed extensions.
TEI.extensions.ent : identifies a local file containing extensions to the TEI parameter entities; see section 3.6.2, "Embedding Local Modifications," on page 4
TEI.extensions.dtd : identifies a local file containing extensions to the TEI tag set; see section 3.6.2, "Embedding Local Modifications," on page 4
TEI.elementNames : identifies a file containing parameter entity declarations for names of TEI elements; see section 3.8.2, "Parameter Entities for Element Generic Identifiers"
TEI.keywords : identifies a file containing parameter entity declarations for TEI keywords, including the default declaration ("IGNORE") of the marked-section keyword for each tag set; see section 3.8.3, "Parameter Entities for TEI Keywords"
TEI.elementClasses : identifies a file containing definitions of parameter entities used in content models; see section 3.7.3, "The TEICLAS2.ENT File," on page 5
TEI.singleBase : defined as INCLUDE (for normal bases) or IGNORE (for mixed and general base); used to prevent multiple definitions of the default text structure.

3.8.1 Inclusion and Exclusion of Elements

The TEI DTDs use marked sections and parameter entity references to allow users to exclude the definitions of individual elements, in order either to make the elements illegal in a document or to allow the element to be redefined, as further described in chapter 29, "Modifying the TEI DTD," on page 47.

Parameter entities used for this purpose have exactly the same name as the generic identifier of the element concerned. The default definition for these parameter entities is "INCLUDE" but they may be changed to "IGNORE" in order to exclude the standard element and attribute definition list declarations from the DTD.

The declarations for the element <p>, for example, are preceded by a definition for a parameter entity with the name p and contained within a marked section whose keyword is given as "%p;":

These parameter entities are defined immediately preceding the element whose declarations they control; because their names are completely regular, they are not documented individually in the reference section of this document.

3.8.2 Parameter Entities for Element Generic Identifiers

In the TEI DTDs, elements are not referred to directly by their generic identifiers; instead, the DTDs refer to parameter entities which expand to the standard generic identifiers. This allows users to rename elements by redefining the appropriate parameter entity (as described more fully in chapter 29, "Modifying the TEI DTD," on page 47). Parameter entities used for this purpose are formed by taking the standard name (generic identifier) of the element and attaching the string "n." as a prefix. Thus the standard generic identifiers for paragraphs, notes, and quotations are defined by declarations of the following form:

Since by default parameter entities are case-sensitive, the specific mix of upper and lower case letters in the standard name must be preserved in the entity name.

The formal declarations of the parameter entities used for generic identifiers are contained in the file teigis2.ent; since their names and replacement texts are fully predictable, these parameter entities are not individually documented in the reference section of these Guidelines.

The parameter entity TEI.elementNames is used to embed the file teigis2.ent in the DTD. A full set of alternate generic identifiers can be substituted for the standard set by defining TEI.elementNames to point at a different file.(22)

3.8.3 Parameter Entities for TEI Keywords

The TEI uses the following parameter entities to signal information which cannot be expressed using SGML keywords:

INHERITED: indicates that an attribute value is inherited from the enclosing element, if not specified
ISO-date: indicates that an attribute value should be a legal ISO date in the form yyyy-mm-dd (e.g. 1993-06-28).
extptr: indicates that an attribute value should be a legal expression in the TEI extended-pointer notation

In addition, the parameter entities which control the selection of base and additional tag sets may be regarded as keywords.

The parameter entity INHERITED is used to signal that the default value for an attribute should be inherited from an enclosing element. The definition for INHERITED is the string "#IMPLIED"; as for all implied defaults, the application program is responsible for deducing the default attribute value when no value is specified in the element start-tag.
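To make the behaviour of INHERITED concrete, here is a small sketch. The definition of INHERITED as #IMPLIED is stated in the text; the element name sample and the attribute name who are hypothetical, chosen only for illustration.

```sgml
<!-- Keyword entity: definition as stated in the text. -->
<!ENTITY % INHERITED '#IMPLIED'>

<!-- Hypothetical element whose attribute is meant to default to
     the value given on the enclosing element. -->
<!ATTLIST sample
          who        CDATA     %INHERITED;>

<!-- After the parser resolves the entity, the declaration reads:
     <!ATTLIST sample  who  CDATA  #IMPLIED>                      -->
```

The keyword therefore documents the intended behaviour for human readers of the DTD; it does not change what the parser reports.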
Since the parameter entity is resolved by the SGML parser, the application program will see no difference between attributes whose default is "%INHERITED;" and those whose default is "#IMPLIED" -- information about which attribute values are inherited and which are inferred in some other way must be built into the application in advance.

The parameter entity ISO-date is used to signal that the value for an attribute should be an ISO-standard date value; in this notation,(23) a date like "September 22, 1968" would be written "1968-09-22". The parameter entity ISO-date expands to "CDATA".

The keywords controlling the selection of base and additional tag sets (described in section 3.2, "Core, Base, and Additional Tag Sets," on page 3) all have the default value "IGNORE"; the user can override this by a local declaration, as described in section 3.3, "Invocation of the TEI DTD," on page 4.

The parameter entities for TEI keywords are included in file teikey2.dtd, which is the default replacement text for the entity TEI.keywords and is embedded by the file tei2.dtd. The file teikey2.dtd has the following contents:

The relevant portion of the main DTD looks like this:

    %TEI.keywords.ent;

---------------------------------

(15) A parameter entity is an SGML entity used only in markup declarations; references to parameter entities are delimited by a percent sign and a semicolon rather than the ampersand and semicolon used for general entity references. The entity TEI.core.ent, for example, would be referred to using the string %TEI.core.ent;. Parameter entities can also be used to control the inclusion or exclusion of marked sections of the document or DTD; the TEI DTD uses marked sections to handle the selection of different base and additional tag sets.
(16) More exactly, these are the attributes of the element class global, to which all elements belong; for further discussion of attribute classes and ways in which attributes may be inherited and overridden, see section 3.7.1, "Classes Which Share Attributes," on page 5.

(17) A dummy element class TEIform is defined in the reference section, solely for documentary purposes.

(18) SGML validation checks that all IDREF values exist as id values on elements somewhere in the current SGML document. It is a requirement of the TEI scheme, not of SGML, that the lang attribute point to a <language> element.

(19) The TEIform attribute is based on the notion of architectural forms developed for HyTime (ISO 10744).

(20) Because the details of their pointing mechanism differ, the members of the pointer class do not, however, share their pointing attributes.

(21) Note that in this context, phrase means any string of characters, and can apply to individual words, parts of words, and groups of words indifferently; it does not refer only to linguistically motivated phrasal units. This may cause confusion for readers accustomed to applying the word in a more restrictive sense.

(22) It is expected that after completion of the full text of these Guidelines, the TEI will prepare alternate sets of generic identifiers in languages other than English. It should be noted, however, that in the interests of simplicity parameter entities are used only for generic identifiers; attribute names, standard attribute values, and parameter entity names are less easily modified.

(23) Defined by ISO 8601: 1988, Data elements and interchange formats -- Information interchange -- Representation of dates and times ([Geneva]: International Organization for Standardization, 1988).

(24) The most widely used such entity set is to be found in Annex D to ISO 8879; it is also reproduced or summarized in most SGML textbooks, notably Charles F. Goldfarb, The SGML Handbook (Oxford: Clarendon Press, 1990). A list of some frequently used standard entity names may be found in chapter 37, "Obtaining TEI WSDs," on page 77. Extensive entity sets are being developed by the TEI and others are being documented in the fascicles of ISO/TR 9573: Technical Report: Information processing -- SGML support facilities -- Techniques for using SGML ([Geneva]: ISO, 1988 et seq.).