5.2 Feature Tags

These characterize a part of the whole work. They describe one of the work's features, elements, or attributes, but such units elude thorough listing and resist classification. Nonetheless, here is a brief survey.

The discretely identifiable things in any work, I believe, are introductory, main, interrupting, or concluding. In other words, I take the presentation, lecture, or speech by a single person to an audience as a metaphor for the parts of a work, and one that the Renaissance would have understood and accepted. SGML uses two metaphors to classify units. One is spatial and cites the human body (the main thing), approached from the front (introductory) and from the back (concluding). The second is temporal and speaks of something (the main thing) having openers and closers. SGML guidelines do not regard the interruption, digression, or aside as a dominant class of feature.

Anything can be the main thing. For example, prefaces and indexes are sometimes published as a separate work, the main thing in question, although they normally are introductory and concluding things.

Introductory units in some way label, summarize, analyze, or introduce the main thing. They can include a title page (the whole book), a table of contents, a heading, title, or incipit (the main thing or one of its parts), dramatis personae, an epigraph, a number (for the act, scene, canto, chapter, etc.), an abstract, argument, or summary, a preface, prologue, dedicatory letter or poem (by the author), an addressee and salutation, a running title, pagination, or foliation (opening the page), and even the paragraph and the line itself (which begins the smallest units of text). Some of these things are natural to speech; others result from the need to reframe it in writing for reading and reference.

Concluding units in some way label, summarize, analyze, or close off the main thing. They can include a colophon, an index, a bibliography, an explicit, a closing formula (like "Amen," "Finis," etc.), an abstract, argument, or summary, an afterword, an epilogue, a signature (to a letter), a bibliographical signature and catchword (for a page), and errata. Again, some of these units belong to speech and rhetoric; others to the trade of presenting writing as something different from speech.

Interrupting units can include the author's quotation from someone else's work, the intrusion of a contribution by someone other than the author, and notes of all kinds, whether by the author or by another.

The next class of units relates to arrangement and expressiveness. Paragraphing, indentation (for lists and verse), columns, different fonts, speech prefixes and stage directions, and even rules, ornaments, and the use of leaf and other special organizational symbols fall into this category. They all may be seen to characterize the author during the delivery of the work to the audience, although the interpretation of how they do so is sometimes unclear. TEI interprets what it aptly calls "rendition" as independent of a work's logical content.

The language and the discourse of the whole work form a category. Tags for typographical errors, damaged text, scribal deletions and additions, diacritics, word separators, language shift, names, rhyme and metre, press and textual variants, the parts of speech and lemmas of words, and words that are split or cited as themselves belong to this class of things, all capable of being tagged.

Finally, words, phrases, sentences, paragraphs, or other passages may be tagged for semantic meaning or for their part in a theoretical understanding of some aspect of the world.

The first, second, and third classes of units will generally be tagged in RET editions because we are most capable of reaching consensus on what they are.

That the RET use of SGML is not always TEI-conformant is no criticism of TEI guidelines, which were written for the widest user community. They clearly suit 20th-century texts, especially technical writing (where ambiguity is frowned on), better than works of the English Renaissance.

5.2.1 SGML FEATURE TAGS

SGML can store multiple attributes in any tag, and this leads to some economy in tagging. For example, <plydv1 type="act" n="1"> can replace COCOA <act 1><plydv1 act1>, and <plydv3 type="speech" speaker="Gloucester"> can replace COCOA <plydv3 speech><speaker Gloucester>. This multiple-attribute capability could be used, for instance, to tag each section of text with its kind. The following example departs from RET guidelines, but it illustrates one philosophy of tagging.

<lang type="e">

<genre type="play" subgenre="history-tragedy"> 

<ln bkl="1"><isa mode="bibl"
type="page"
  n="283">283</isa></ln> 

<ln bkl="2"><isa mode="text" type="title"><f
  type="rl">THE TRAGEDIE OF</f></isa> </ln> 

<ln bkl="3">KING LEAR<f type="i"> </isa> </ln> 

<ln bkl="4"><isa mode="play" div="1" type="act"
  no="1"><lang type="l">A{ct}us Primus.  <isa mode="play"
  type="scene" no="1" classscen="1">Sc{oe}na Prima. </f> <f
  type="r"><lang type="e"> </isa> </ln> 

<ln bkl="5"><isa mode="play" type="stagdir" n="1">
  Enter Kent, Glouce{{s}t}er, and Edmond. </isa> </ln> 

<ln bkl="6"><isa mode="play" type="sppfx" n="1"
  normal="Kent">Kent. </isa> </ln> 

<ln bkl="7" tl="1"><isa mode="play" type="speech"
  n="1" speaker="Kent"> </f> <f type="5bk">I<f
  type="r"> Thought the King had more a{ff}e{ct}ed the</isa> </ln> 

<ln bkl="8" tl="2">Duke of </f> <f type="i">Albany</f>
  <f type="r">, then </f> <f type="i">Cornwall</f> <f type="r">.
  </isa> </ln> 

<ln bkl="9" tl="3"><isa mode="play" type="sppfx"
  n="1" normal="Gloucester"> </f> <f type="i">Glou. </f>
  <f type="r"><isa mode="play" type="speech" n="2"
  speaker="Gloucester"> It did alwayes {s}eeme {s}o to vs : But</isa> </ln> 

</lang> 

... 

</f> 

</genre>

Each passage has three SGML-like tags: an isa tag (like the COCOA tt tag) that tells what the passage is, an f tag that indicates its font, and an ln tag that indicates its line number according to several ways of counting. The isa tag has attributes like mode, type, n, classscen, normal, and speaker. This scheme enables us to select words from one of the modes (bibl, text, and play), from several types within each (e.g., act, scene, stagdir, sppfx, and speech within mode play), and, within that last type, speech, from each character (e.g., Kent, Gloucester, etc.). We can also retrieve the different kinds of speech prefixes employed for any one (normalized) character name.
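
The kind of retrieval described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample is a simplified form of the passage above, with the ln and f tags omitted and no isa element nested inside another, and the names parse_isa and select are hypothetical helpers invented for this sketch, not part of any RET or SGML software.

```python
import re

# An <isa ...> element: group 1 is the attribute string, group 2 the content.
# Assumes isa elements are not nested (a simplification of the example).
ISA = re.compile(r'<isa\s+([^>]*)>(.*?)</isa>', re.DOTALL)
# One name="value" pair inside the opening tag.
ATTR = re.compile(r'(\w+)="([^"]*)"')

def parse_isa(markup):
    """Return one dict per isa element: its attributes plus its text."""
    units = []
    for attrs, text in ISA.findall(markup):
        unit = dict(ATTR.findall(attrs))
        unit["text"] = " ".join(text.split())  # collapse line breaks and spaces
        units.append(unit)
    return units

def select(units, **wanted):
    """Keep the units whose attributes equal every keyword given."""
    return [u for u in units
            if all(u.get(name) == value for name, value in wanted.items())]

# Simplified sample: font (f) and line (ln) tags removed for brevity.
sample = '''
<isa mode="play" type="sppfx" n="1" normal="Kent">Kent. </isa>
<isa mode="play" type="speech" n="1"
  speaker="Kent">I Thought the King had more a{ff}e{ct}ed the</isa>
<isa mode="play" type="sppfx" n="1" normal="Gloucester">Glou. </isa>
<isa mode="play" type="speech" n="2"
  speaker="Gloucester">It did alwayes {s}eeme {s}o to vs : But</isa>
'''

units = parse_isa(sample)
# Words spoken by one character.
kent_speeches = select(units, mode="play", type="speech", speaker="Kent")
# Forms of speech prefix used for one normalized character name.
gloucester_prefixes = select(units, type="sppfx", normal="Gloucester")
```

Because every condition is just an attribute and a value, the same select call serves for modes, types, speakers, or any combination of them.
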

Here is a list of the main SGML tags employed in RET editions. Asterisked tags are taken from TEI guidelines; PAGE is the page number in the TEI guidelines.

TAG         ATTRIBUTE  PAGE  END TAG        VALUES                    DESCRIPTION
*a          name=      861   </a>                                     hypertextual reference
            href=
*actor                 850   </actor>                                 dramatis personae actor's name
addresse                     </addresse>                              person who is addressed
*add        place=     850   </add>         inline, supralinear,      ms addition to work
                                            infralinear, left, right,
                                            top, bottom, opposite,
                                            verso, mixed
            hand=                                                     handwriting
*anchor     id=              none                                     point in text specified with id (identifier for a cross-reference)
*app        type=      863   </app>                                   apparatus (e.g., press variants)
            from=
            to=
*author                870   </author>                                author's name
*back                  872   </back>                                  end material in book
*bibl                  873   </bibl>                                  bibl. ref.
*bibtitle              1197  </title>                                 title in bibl. ref.
            level=                          a[nalytic], m[onographic],
                                            j[ournal], s[eries],
                                            u[npublished]
bkdv        type=            </bkdv>        titlepage, imprimatur,    unstructural subdivision of book
                                            dedication,
                                            table of contents,
                                            errata, epigraph,
                                            colophon, toreader
bkdv1       type=            </bkdv1>       gathering                 largest unit of book
            n=                              1                         number of largest unit
            format=                         folio                     subunit of largest unit
            in=                             6s                        no. of formes in gathering
bkdv2       type=            </bkdv2>       page                      second largest unit of book
            n=                              1                         no. of 2nd largest unit
            sig=                            Aa2                       signature of 2nd largest unit
            side=                           inner                     side of forme on which 2nd largest unit occurs
            forme=                          2                         which forme the 2nd largest unit is on
bkdv3       type=            </bkdv3>       col, line                 3rd largest unit of book: line or column
            n=                              2                         number of column or line
bkdv4       type=            </bkdv4>       line                      4th largest unit of book
            n=                              3                         number of line
*body                  879   </body>                                  main part of book
booksell                     </booksell>                              name of bookseller
*c          type=            </c>                                     character
*cit        work=      899   </cit>                                   citation of work
            author=
classcen    n=               </classcen>    1                         classical scene
closing                      </closing>                               phrase, etc., marking work's end
*col        n=         906   </col>         1                         column
correct                      </correct>                               correct reading in errata
*damage     type=      912   </damage>                                the kind of damage
            extent=                                                   how much damage
*del        type=      922   </del>         overstrike, erasure,      type of deletion
                                            bracketed, subpunction
            hand=                                                     scribe deleting
*docAuthor             945   </docAuthor>                             author of book
*docDate               945   </docDate>                               date of publ. of book
docDatcp                     </docDatcp>                              date of composition
*docEdition            946   </docEdition>                            edition of book
docEditr                     </docEditr>                              editor of book
*docImpr               947   </docImpr>                               imprint of book
*docTitle              948   </docTitle>                              title of book
docVolno                     </docVolno>                              volume number of book
*edition               949   </edition>                               edition information
*editor                950   </editor>                                editor's name
egt         n=               </egt>         1                         example of use of dictionary headword
egttr       n=               </egttr>       1                         translation of example using dictionary lemma
error       from=            </error>       err1                      error in errata; starting location
            to=                             a1                        ending location
etym                         </ety>                                   etymology
explan                       </explan>                                explanation, senses, commentary, etc., on headword
f           type=            </f>           + superscript             font
                                            _ subscript
                                            2 2-line
                                            bk block letter
                                            bl black letter
                                            c capitals
                                            d double
                                            i italic
                                            l lapidary
                                            p pica
                                            r roman
                                            s small
                                            sc small capitals
                                            t titling
*form       type=      984   </form>        lemma, phrase, compound,  e.g., headword in dictionary entry
                                            derivative, inflected
            category=                       colour, tool, etc.        content classification
            gender=                         f, m, n
            inflect=                        poss, etc.                morphological inflection
            modernsp=
            originsp=
            pos=                            noun, det, etc.           part-of-speech
            syntact=                        subj, pred, etc.          grammatical role
*front                 988   </front>                                 preliminaries of book
*fw         type=      995   </fw>          catch, fol, page, rttop,  forme-work (printer's business)
                                            sig
*gap        extent=    996   </gap>                                   gap in text (not damage)
gender      type=            </gen>         m, f, n                   gender cited for dictionary headword
*gloss      target=    1001  </gloss>                                 code for term defined
*group                 1006  </group>                                 collection of texts in a book
*handShft   new=       1008  </handShft>    1                         type of handwriting shifted into
            old=                            2                         type of hand shifted from
heading                      </heading>                               title or phrase, etc., marking work's start
headngno                     </headngno>                              no. for section of work
i           type=            </i>           3spr                      inflection cited for dictionary headword
img         id=              </img>         Hodnett 25                identifier for a graphic element like a woodcut
            src=                            rumms.gif                 file in which image may be found
*imprimat              1018  </imprimat>                              authority to publish
*imprint               1019  </imprint>                               publisher information
lang        type=            </lang>        e, f, g, gr, i, l, s, w   language name
*mileston   ed=        1059  </mileston>                              standard edition cited
            unit=                                                     type of reference unit for edition
            n=                                                        value(s) for unit
mode        type=            </mode>        v, p                      verse or prose
*name       type=      1065  </name>        ps, pl                    proper name
*note       place=     1072  </note>        lmargin, rmargin, bottom, marginal, foot-, or end-note
                                            inline, end, interlinear
            target=                         154                       location of <ptr> tag where note belongs
notesymb    id=              </notesymb>    22                        encloses note number or letter in text
*num        type=                           card, ord                 cardinal, ordinal, etc.
*p                     1090  </p>                                     unnumbered paragraph
perform                      </perform>                               performance information
plydv       type=            </plydv>       argument, dedication,     unstructural subdivision of play
                                            dedicatory poem,
                                            drampers, epilogue,
                                            epistle, preface,
                                            epigraph
plydv1      type=            </plydv1>      act                       largest division of play
            n=                              1                         number of scene
plydv2      type=            </plydv2>      scene                     2nd largest division of play
            n=                              1                         number of scene
            classcen=                       1                         classical scene (marked by entrance or exit of characters)
plydv3      type=            </plydv3>      line, speech              3rd largest division of play
            n=                              1                         number of line or speech
plydv4      type=            </plydv4>      line                      4th largest division of play
            n=                              1                         number of line
pmdv        type=            </pmdv>        argument, dedication,     nonstructural subdivision of poem
                                            dedicatory poem, epilogue,
                                            epistle, preface
pmdv1       type=            </pmdv1>       book, canto, fitt, poem   largest division of poem
            n=                              1                         number of poem
pmdv2       type=            </pmdv2>       stanza, canto, paragraph, 2nd largest division of poem
                                            epigraph, motto
            n=                              1                         number of division
            rhyme=                          abab                      rhyme scheme
pmdv3       type=            </pmdv3>       long, bob, wheel, motto,  3rd largest division of poem
                                            line
            n=                              1                         number of division
pmdv4       type=            </pmdv4>       line                      4th largest division of poem
            n=                              1                         number of division
price                        </price>                                 cost of work
printer                      </printer>                               name of printer
*publish               1111  </publish>                               name of publisher
*pubPlace              1112  </pubPlace>                              place of publication
*quot       work=      1116  </quot>        Paradise Lost             quotation from work
            author=                         John Milton               author of work
*rdg        resp=      1119  </rdg>                                   textual variant and responsibility
            status=                         clear, unclear            clarity of reading
*ref        target=    1124  </ref>                                   location for erratum item in text
            resp=                                                     who made the pointer
            type=                                                     what kind of thing it is
*restore    type=      1134  </restore>     overstrike, erasure,      scribe's cancellation of deletion
                                            bracketed, subpunction
            hand=
ret                          </ret>                                   text in RET series
*role                  1136  </role>                                  role name in dramatis personae
scene                        </scene>
script      type=            </script>      a anglicana               script used by scribe
                                            af anglicana formata
                                            s secretary
                                            i italic, etc.
*sic        corr=      1152  </sic>                                   correct reading
            resp=                                                     who made the correction
*signed                1154  </signed>                                writer's signature
*sp         who=       1159  </sp>          Hamlet                    speech
            type=                           spoken, thought           how spoken
            direct=                         y, n, unspecified         kind of speaking
*speaker               1162  </speaker>                               speech prefix
*stage      type=      1163  </stage>       entrance, exit, mixed     stage direction
subhead                      </subhead>                               subheading, to be used after <heading>
*supplied              1171  </supplied>                              string supplied by editor of e-text
            reason=                         hung
*term       type=      1186  </term>        elision                   a word or phrase cited as an object
            expansion=
*text                  1189  </text>                                  the entire work
ttdv        type=            </ttdv>        argument, toreader,       unstructural subdivision of poem
                                            epigraph, epilogue,
                                            dedication,
                                            dedicatory poem, epistle,
                                            preface, close
ttdv1       type=            </ttdv1>       novel, treatise,          largest division of prose text
                                            dictionary
            n=                              1                         number of division
ttdv2       type=            </ttdv2>       chapt, alpha              2nd largest division of prose text
            n=                              1                         number of division
ttdv3       type=            </ttdv3>       par, entry                3rd largest division of prose text
            n=                              1                         number of division
            lemma=                                                    normalized headword (dictionary entry)
ttdv4       type=            </ttdv4>       line, sent, lemma, phrase 4th largest division of prose text
            n=                              1                         number of scene
*usg        type=            </usg>         geo                       usage statement for dictionary headword
*xref       type=            </xref>        31                        cross-reference

5.2.2 COCOA FEATURE TAGS

Consider the beginning of the First Folio King Lear, encoded with COCOA tags:

<genre play>
<page 283>
<speaker ->
283
<bkdiv3 1><f rl><tt title>THE TRAGEDIE OF
<bkdiv3 2>KING LEAR<f i>
<bkdiv3 3><tt act><lang l>A{ct}us Primus. <tt scene>Sc{oe}na Prima.<f r>
<lang e>
<playdiv1 act1>
<playdiv2 scene1>
<act 1><scene 1><classscene 1>
<bkdiv3 4><tt stage>Enter Kent, Glouce{{s}t}er, and Edmond.
<bkdiv3 5><tt sppfx>Kent.<speaker Kent>
<bkdiv3 6><tl 1><tt speech><f 5bk>I<f r> Thought the King had more a{ff}e{ct}ed the
<bkdiv3 7><tl 2>Duke of <f i>Albany<f r>, then <f i>Cornwall<f r>.<speaker ->
<bkdiv3 8><tl 3><f i>Glou.<f r> It did alwayes {s}eeme {s}o to vs : But

These feature tags attach attributes to the text that follows them: the through line number of the book or of the play-text, the act, the scene, the classical scene, the font, the language, and the current speaker. These are obvious to anyone who understands the conventions of a printed text. Note how the two x-type tags <bkt> and <tt> enable us to distinguish play titles, speeches, speech prefixes, and stage directions from one another. Various types of text appear in any book, and one of the functions of a tagging system is to enable someone to retrieve words from only one such type, or several types in combination. The variable for the book-type is bkt. The variable for the text-type is tt. Each of these precedes the passage it modifies and then is turned off by the next occurrence of a <bkt> or a <tt> tag, sometimes directly thereafter. Note that no more than one <bkt> or <tt> tag can be used for one passage of text at a time. Once the tag <bkt text> appears, <tt> tags are then inserted before the text that they characterize. Eventually they are turned off by another <bkt> tag.
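
The way a COCOA tag stays in force until the same variable occurs again can be sketched as follows. This is a minimal illustration, not RET software: the function name cocoa_segments is hypothetical, the sample is an abridged form of the passage above (book-division, through-line, and font tags are left out), and a value of - is assumed to switch a variable off, as in <speaker ->.

```python
import re

# A COCOA tag is written <variable value>; a value of "-" is treated as null.
TAG = re.compile(r'<(\w+) ([^>]*)>')

def cocoa_segments(source):
    """Return (text, state) pairs. state maps each COCOA variable to the
    value in force for that stretch of text; a later tag for the same
    variable replaces the earlier value, and "-" removes it."""
    state = {}
    segments = []
    pos = 0
    for match in TAG.finditer(source):
        text = source[pos:match.start()].strip()
        if text:
            segments.append((text, dict(state)))  # snapshot of current state
        variable, value = match.group(1), match.group(2).strip()
        if value == "-":
            state.pop(variable, None)             # null value: turn it off
        else:
            state[variable] = value
        pos = match.end()
    tail = source[pos:].strip()
    if tail:
        segments.append((tail, dict(state)))
    return segments

# Abridged sample (an assumption of this sketch, not the full encoding).
sample = (
    "<tt stage>Enter Kent, Glouce{{s}t}er, and Edmond.\n"
    "<tt sppfx>Kent.<speaker Kent>\n"
    "<tt speech>I Thought the King had more a{ff}e{ct}ed the\n"
    "Duke of Albany, then Cornwall.<speaker ->\n"
    "<tt sppfx>Glou.\n"
)

segments = cocoa_segments(sample)
# Retrieve only the words of speeches spoken by Kent.
kent_words = [text for text, state in segments
              if state.get("tt") == "speech" and state.get("speaker") == "Kent"]
```

Each stretch of text carries a snapshot of every variable in force, so retrieval by text type, speaker, or both reduces to a comparison against that snapshot.
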
Once you intend to use a feature tag, you must be careful to apply it consistently. Stage directions and speech prefixes, for example, have no speakers; hence <speaker ->, having the null or dummy value, may have to be used often. Here is a list of the main COCOA tags employed in RET editions.

FEATURE               VARIABLE   END TAG         VALUES
act number            act        <act ->
book division         bkdv       <bkdv ->
book division 1       bkdv1      <bkdv1 ->
book division 2       bkdv2      <bkdv2 ->
book division 3       bkdv3      <bkdv3 ->
book division 4       bkdv4      <bkdv4 ->
book type             bkt        <bkt ->         addressee, author, catch, closing, colophon,
                                                 datepub, dedication, doctitle, edition, epigram,
                                                 errata, fol, heading, headingno, imprimatur,
                                                 msnote, page, performance, placepub, price,
                                                 printer, publisher, rtbot, rttop, sig, signed,
                                                 tableofcontents, text, volno, xref
citation              cit        <cit ->
classical scene       classcen   <classscene ->
compositor change     compshift  <compshift ->
correction in errata  correct    <correct ->
error in errata list  error      <error ->
foliation gathering   fol        <fol ->
font                  f          <f ->           bl (black letter), bk (block letter), i (italic),
                                                 r (roman), l (lapidary), sc (small capitals),
                                                 + (superscript), _ (subscript)
grammatical gender    gend       <gend ->        f feminine, m masculine, n neuter
genre                 genre      <genre ->
graphic               img        <img ->         Hodnett25
handwriting change    handshift  none
morphol. inflection   inflect    <inflect ->
language              lang       <lang ->        e (English), l (Latin), f (French), g (German),
                                                 gr (Greek), i (Italian), s (Spanish)
mode                  mode       <mode ->        v, p
milestone ref.        milestone  <milestone ->
name                  name       <name ->
page                  page       <page ->
play division         plydv      <plydv ->
play division 1       plydv1     <plydv1 ->
play division 2       plydv2     <plydv2 ->
play division 3       plydv3     <plydv3 ->
play division 4       plydv4     <plydv4 ->
poem division         pmdv       <pmdv ->
poem division 1       pmdv1      <pmdv1 ->
poem division 2       pmdv2      <pmdv2 ->
poem division 3       pmdv3      <pmdv3 ->
poem division 4       pmdv4      <pmdv4 ->
quotation             quotation  <quotation ->
quoted author         qauthor    <qauthor ->
rhyme                 rhyme      <rhyme ->
scenenumber           scene      <scene ->
script                s          <s ->
speaker               speaker    <speaker ->
grammat. role (word)  syntact    <syntact ->
term as object        term       <term ->
text division         ttdv       <ttdv ->
text division 1       ttdv1      <ttdv1 ->
text division 2       ttdv2      <ttdv2 ->
text division 3       ttdv3      <ttdv3 ->
text division 4       ttdv4      <ttdv4 ->
text type             tt         <tt ->          abstract, act, actor, addressee, closing,
                                                 dedication, dedication poem,
                                                 drampers (dramatis personae), epigraph, epistle,
                                                 heading, headingno, incipit, inspeech, nt (note),
                                                 nt:lmargin, nt:rmargin, nt:foot, nt:inline,
                                                 nt:end, ntno (footnote number), poemtitle,
                                                 poemno, pmdv, quotation, role, scene, signed,
                                                 speech, sppfx (speech prefix),
                                                 stage (stage direction), subheading, text, title

Note that font information includes the typeface rather than the pica size in most texts. See Shakespeare's sonnets for an attempt to overcome this limitation in part. Note also that some of these COCOA tags could have been replaced by counters, special characters occurring nowhere else in the text and so available for use. In a corpus, these characters are better saved for use in the alphabet and diacritics.