5.2 Feature Tags

These characterize a part of the whole work. They describe one of the work's features, elements, or attributes, but such units elude thorough listing and resist classification. Nonetheless, here is a brief survey.

The discretely identifiable things in any work, I believe, are introductory, main, interrupting, or concluding. In other words, I believe that the presentation, lecture, or speech by a single person to an audience is a metaphor for the parts of a work that readers in the Renaissance would have understood and accepted. SGML uses two metaphors to classify units. One is spatial and cites the human body (the main thing), approached from the front (introductory) and from the back (concluding). The second is temporal and speaks of something (the main thing) having openers and closers. SGML guidelines do not regard the interruption, digression, or aside as a dominant class of feature.

The next class of units relates to arrangement and expressiveness. Paragraphing, indentation (for lists and verse), columns, different fonts, speech prefixes and stage directions, and even rules, ornaments, and the use of leaf and other special organizational symbols fall into this category. They all may be seen to characterize the author during the delivery of the work to the audience, although the interpretation of how they do so is sometimes unclear. TEI interprets what it aptly calls "rendition" as independent of a work's logical content.

The language and the discourse of the whole work form a category. Tags for typographical errors, damaged text, scribal deletions and additions, diacritics, word separators, language shift, names, rhyme and metre, press and textual variants, the parts of speech and lemmas of words, and words that are split or cited as themselves belong to this class of things, all capable of being tagged.

Finally, words, phrases, sentences, paragraphs, or other passages may be tagged for semantic meaning or for their part in a theoretical understanding of some aspect of the world.

The first, second, and third classes of units will generally be tagged in RET editions because we are most capable of reaching consensus on what they are.

That the RET use of SGML is not always TEI-conformant is no criticism of TEI guidelines, which were written for the widest user community. They clearly suit 20th-century texts, especially technical writing (where ambiguity is frowned on), better than they suit works of the English Renaissance.

5.2.1 SGML FEATURE TAGS

SGML can store multiple attributes in any tag, and this leads to some economy in tagging. For example, <plydv1 type="act" n="1"> can replace COCOA <act 1><plydv1 act1>, and <plydv3 type="speech" speaker="Gloucester"> can replace COCOA <plydv3 speech><speaker Gloucester>. This multiple-attribute capability could be used, for instance, to tag each section of text with its kind. The following example departs from RET guidelines, but it illustrates one philosophy of tagging.
<lang type="e">
<genre type="play" subgenre="history-tragedy">
<ln bkl="1"><isa mode="bibl" type="page" n="283">283</isa></ln>
<ln bkl="2"><isa mode="text" type="title"><f type="rl">THE TRAGEDIE OF</f></isa> </ln>
<ln bkl="3">KING LEAR<f type="i"> </isa> </ln>
<ln bkl="4"><isa mode="play" div="1" type="act" no="1"><lang type="l">A{ct}us Primus. <isa mode="play" type="scene" no="1" classscen="1">Sc{oe}na Prima. </f> <f type="r"><lang type="e"> </isa> </ln>
<ln bkl="5"><isa mode="play" type="stagdir" n="1"> Enter Kent, Glouce{{s}t}er, and Edmond. </isa> </ln>
<ln bkl="6"><isa mode="play" type="sppfx" n="1" normal="Kent">Kent. </isa> </ln>
<ln bkl="7" tl="1"><isa mode="play" type="speech" n="1" speaker="Kent"> </f> <f type="5bk">I<f type="r"> Thought the King had more a{ff}e{ct}ed the</isa> </ln>
<ln bkl="8" tl="2">Duke of </f> <f type="i">Albany</f> <f type="r">, then </f> <f type="i">Cornwall</f> <f type="r">. </isa> </ln>
<ln bkl="9" tl="3"><isa mode="play" type="sppfx" n="1" normal="Gloucester"> </f> <f type="i">Glou. </f> <f type="r"><isa mode="play" type="speech" n="2" speaker="Gloucester"> It did alwayes {s}eeme {s}o to vs : But</isa> </ln>
</lang>
...
</f>
</genre>
Each passage has three SGML-like tags: an isa tag (like the COCOA tt tag) that tells what the passage is, an f tag that indicates its font, and an ln tag that indicates its line-number according to several ways of counting. The isa tag has attributes like mode, type, n, classscen, normal, and speaker. This scheme enables us to select words from one of several modes (e.g., bibl and play), from several types within each (e.g., act, scene, stagdir, sppfx, and speech within mode play), and, from within that last type, speech, each character (e.g., Kent, Gloucester, etc.). We can also retrieve the different kinds of speech prefixes employed for any one (normalized) character name.
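
The kind of selection just described can be sketched in Python. This is only a minimal illustration and not part of RET practice: the fragment is simplified from the example above (the ln and f tags are dropped so that each isa element closes cleanly), and the helper names parse_isa and select are hypothetical.

```python
import re

# Simplified fragment modelled on the King Lear example (an assumption:
# line and font tags are omitted so that every <isa> element is well formed).
SAMPLE = '''
<isa mode="bibl" type="page" n="283">283</isa>
<isa mode="play" type="stagdir" n="1">Enter Kent, Glouce{{s}t}er, and Edmond.</isa>
<isa mode="play" type="sppfx" n="1" normal="Kent">Kent.</isa>
<isa mode="play" type="speech" n="1" speaker="Kent">I Thought the King had more a{ff}e{ct}ed the Duke of Albany, then Cornwall.</isa>
<isa mode="play" type="sppfx" n="1" normal="Gloucester">Glou.</isa>
<isa mode="play" type="speech" n="2" speaker="Gloucester">It did alwayes {s}eeme {s}o to vs : But</isa>
'''

ISA_RE = re.compile(r'<isa\s+([^>]*)>(.*?)</isa>', re.DOTALL)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_isa(text):
    """Return a list of (attributes, passage) pairs, one per <isa> element."""
    return [(dict(ATTR_RE.findall(attrs)), body.strip())
            for attrs, body in ISA_RE.findall(text)]


def select(passages, **wanted):
    """Keep the passages whose attributes match every keyword given."""
    return [body for attrs, body in passages
            if all(attrs.get(key) == value for key, value in wanted.items())]


passages = parse_isa(SAMPLE)

# Words spoken by one character.
kent_speeches = select(passages, mode="play", type="speech", speaker="Kent")

# The speech prefixes actually printed for one normalized character name.
gloucester_prefixes = select(passages, type="sppfx", normal="Gloucester")
```

Because every attribute sits in the one tag, a single filter on the attribute dictionary serves for mode, type, and speaker alike.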

Here is a list of the main SGML tags employed in RET editions. Asterisked tags are taken from TEI guidelines.


Variable/      TEI      Attribute   End         Description
Attributes     Page     Values      Tag
               No.

*a   name=     861                  </a>        hypertextual reference 
   href=
*actor         850                  </actor>    dramatis personae
                                                      actor's name
addresse                            </addresse> person who is addressed
*add place=    850      inline      </add>      ms addition to work 
                        supralinear 
                        infralinear  
                        left right       
                        top bottom  
                        opposite       
                        verso mixed            
   hand=                                              handwriting
*anchor id=             none                          point in text specified
                                                      with id
                                                      identifier for a 
                                                      cross-reference
*app type=      863                 </app>      apparatus (e.g., press
                                                      variants)
   from=
   to=
*author         870                 </author>   author's name
*back           872                 </back>     end material in book
*bibl           873                 </bibl>     bibl. ref.
*bibtitle      1197                 </title>    title in bibl. ref.
   level=               a[nalytic]
                        m[onographic]
                        j[ournal]  
                        s[eries]  
                        u[npublished]
bkdv type=              titlepage   </bkdv>     nonstructural subdivision
                        imprimatur                    of book
                        dedication 
                        table of contents 
                        errata epigraph 
                        colophon toreader
bkdv1   type=           gathering   </bkdv1>    largest unit of book
   n=                   1                             number of largest unit
   format=              folio                         subunit of largest unit
   in=                  6s                            no. of formes in
                                                      gathering
bkdv2   type=           page        </bkdv2>    second largest unit of 
                                                      book
   n=                   1                             no. of 2nd largest unit
   sig=                 Aa2                           signature of 2nd largest 
                                                      unit
   side=                inner                         side of forme on which
                                                      2nd largest unit occurs
   forme=               2                             which forme the 2nd 
                                                      largest unit is on
bkdv3   type=           col line    </bkdv3>    3rd largest unit of 
                                                      book: line or column
   n=                   2                             number of column or line
bkdv4   type=           line        </bkdv4>    4th largest unit of book
   n=                   3                             number of line
*body           879                 </body>     main part of book
booksell                            </booksell> name of bookseller
*c   type=                          </c>        character
*cit work=      899                 </cit>      citation of work
   author=
classcen n=             1           </classcen> classical scene   
closing                             </closing>  phrase, etc., marking 
                                                      work's end   
*col n=         906     1           </col>      column
correct                             </correct>  correct reading in 
                                                      errata                                          
*damage type=   912                 </damage>   the kind of damage
   extent=                                            how much damage
*del type=      922     overstrike  </del>      type of deletion
                        erasure 
                        bracketed  
                        subpunction
   hand=                                              scribe deleting
*docAuthor      945                 </docAuthor> author of book
*docDate        945                 </docDate>  date of publ. of book
docDatcp                            </docDatcp> date of composition
*docEdition     946                 </docEdition> edition of book
docEditr                            </docEditr> editor of book
*docImpr        947                 </docImpr>  imprint of book
*docTitle       948                 </docTitle> title of book
docVolno                            </docVolno> volume number of
                                                      book
*edition        949                 </edition>  edition information
*editor         950                 </editor>   editor's name
egt n=                  1           </egt>      example of use of 
                                                      dictionary headword
egttr   n=              1           </egttr>    translation of example 
                                                      using dictionary lemma
error   from=           err1        </error>    error in errata; 
                                                      starting location
   to=                  a1                            ending location
etym                                </etym>     etymology
explan                              </explan>   explanation, senses,
                                                      commentary, etc., 
                                                      on headword
f type=                             </f>        font
                        + superscript
                        _ subscript
                        2 2-line
                        bk block letter
                        bl black letter
                        c capitals
                        d double
                        i italic
                        l lapidary
                        p pica
                        r roman
                        s small
                        sc small capitals
                        t titling
*form type=      984      lemma     </form>     e.g., headword in
                        phrase                        dictionary entry
                        compound
                        derivative 
                        inflected
   category=            colour                        content classification
                                                      tool, etc.
   gender=              f m n
   inflect=             poss, etc.                    morphological inflection
   modernsp=
   originsp=
   pos=                 noun, det, etc.               part-of-speech
   syntact=             subj, pred, etc.              grammatical role
*front         988                  </front>    preliminaries of book
*fw type=      995      catch       </fw>       forme-work (printer's
                        fol                           business)
                        page
                        rttop
                        sig
*gap extent=   996                  </gap>      gap in text (not damage)
gender type=            m f n       </gender>   gender cited for
                                                      dictionary headword
*gloss target= 1001                 </gloss>    code for term defined
*group         1006                 </group>    collection of texts in a
                                                      book
*handShft new= 1008     1           </handShft> type of handwriting 
                                                      shifted into
   old=                 2                             type of hand shifted
                                                      from
heading                             </heading>  title or phrase, etc., 
                                                      marking work's start
headngno                            </headngno> no. for section of work
i type=                 3spr        </i>        inflection cited for
                                                      dictionary headword                                                         
img id=                 Hodnett 25  </img>      identifier for a graphic
                                                      element like a woodcut
   src=                 rumms.gif                     file in which image may
                                                      be found
*imprimat      1018                 </imprimat> authority to publish
*imprint       1019                 </imprint>  publisher information
lang type=              e f g gr    </lang>     language name
                        i l s w
*mileston ed=  1059                 </mileston> standard edition cited
   unit=                                              type of reference unit
                                                      for edition
   n=                                                 value(s) for unit
mode type=              v p         </mode>     verse or prose
*name   type=  1065     ps pl       </name>     proper name      
*note   place= 1072     lmargin     </note>     marginal, foot-, or end-
                        rmargin                       note
                        bottom 
                        inline
                        end 
                        interlinear
   target=              154                           location of <ptr>
                                                      tag where note belongs
notesymb id=            22          </notesymb> encloses note number or 
                                                      letter in text
*num type=              card ord                      cardinal, ordinal, etc.   
*p             1090                 </p>        unnumbered paragraph
perform                             </perform>  performance information
plydv type=             argument    </plydv>    nonstructural subdivision
                        dedication                    of play
                        dedicatory poem 
                        drampers 
                        epilogue  
                        epistle 
                        preface
                        epigraph
plydv1 type=            act         </plydv1>   largest division of play
   n=                   1                             number of act
plydv2 type=            scene       </plydv2>   2nd largest division 
                                                      of play
   n=                   1                             number of scene
   classcen=            1                             classical scene (marked 
                                                      by entrance or exit of
                                                      characters) 
plydv3 type=            line        </plydv3>   3rd largest division of
                        speech                        play
   n=                   1                             number of line or speech
plydv4 type=            line        </plydv4>   4th largest division of 
                                                      play
   n=                   1                             number of line
pmdv type=              argument    </pmdv>     nonstructural 
                        dedication                    subdivision of poem
                        dedicatory poem
                        epilogue
                        epistle
                        preface
pmdv1 type=             book        </pmdv1>    largest division of poem
                        canto
                        fitt
                        poem
   n=                   1                             number of poem
pmdv2 type=             stanza      </pmdv2>    2nd largest division of
                        canto                         poem
                        paragraph
                        epigraph
                        motto
   n=                   1                             number of division
   rhyme=               abab                          rhyme scheme
pmdv3 type=             long        </pmdv3>    3rd largest division of
                        bob                           poem
                        wheel
                        motto
                        line
   n=                   1                             number of division
pmdv4 type=             line        </pmdv4>    4th largest division of
                                                      poem
   n=                   1                             number of division
price                               </price>    cost of work
printer                             </printer>  name of printer
*publish       1111                 </publish>  name of publisher
*pubPlace      1112                 </pubPlace> place of publication
*quot work=    1116     Paradise Lost </quot>   quotation from work
   author=              John Milton                   author of work
*rdg resp=     1119                 </rdg>      textual variant and
                                                      responsibility
   status=              clear                         clarity of reading
                        unclear
*ref target=   1124                 </ref>      location for erratum
                                                      item in text
   resp=                                              who made the pointer
   type=                                              what kind of thing it is
*restore type= 1134     overstrike  </restore>  scribe's cancellation of
                        erasure                       deletion
                        bracketed
                        subpunction
   hand=
ret                                 </ret>      text in RET series
*role          1136                 </role>     role name in dramatis 
                                                      personae
scene                               </scene>
script type=            a anglicana </script>   script used by scribe
                        af anglicana formata
                        s secretary
                        i italic, etc.
*sic corr=     1152                 </sic>      correct reading
   resp=                                              who made the correction
*signed        1154                 </signed>   writer's signature
*sp who=       1159     Hamlet      </sp>       speech
   type=                spoken                        how spoken
                        thought
   direct=              y n unspecified               kind of speaking
*speaker       1162                 </speaker>  speech prefix
*stage type=   1163     entrance    </stage>    stage direction
                        exit
                        mixed
subhead                             </subhead>  subheading, to be used
                                                      after <heading>   
*supplied      1171                 </supplied> string supplied by
    reason=             hung                          editor of e-text
*term   type=  1186     elision     </term>     a word or phrase cited
   expansion=                                         as an object
*text          1189                 </text>     the entire work
ttdv type=              argument    </ttdv>     nonstructural subdivision
                        toreader                      of prose text
                        epigraph
                        epilogue 
                        dedication
                        dedicatory poem 
                        epistle
                        preface
                        close
ttdv1 type=             novel       </ttdv1>    largest division of
                        treatise                      prose text
                        dictionary
   n=                   1                             number of division
ttdv2 type=             chapt       </ttdv2>    2nd largest division
                        alpha                         of prose text
   n=                   1                             number of division
ttdv3 type=             par         </ttdv3>    3rd largest division
                        entry                         of prose text
   n=                   1                             number of division
   lemma=                                             normalized headword
                                                      (dictionary entry)
ttdv4   type=           line        </ttdv4>    4th largest division of
                        sent                          prose text
                        lemma
                        phrase
   n=                   1                             number of division
*usg type=              geo         </usg>      usage statement for
                                                      dictionary headword
*xref type=             31          </xref>     cross-reference
5.2.2 COCOA FEATURE TAGS

Consider the beginning of the First Folio King Lear, encoded with COCOA tags:
<genre play>
<page 283>
<speaker ->
283
<bkdiv3 1><f rl><tt title>THE TRAGEDIE OF
<bkdiv3 2>KING LEAR<f i>
<bkdiv3 3><tt act><lang l>A{ct}us Primus. <tt scene>Sc{oe}na Prima.<f r>
<lang e>
<playdiv1 act1>
<playdiv2 scene1>
<act 1><scene 1><classscene 1>
<bkdiv3 4><tt stage>Enter Kent, Glouce{{s}t}er, and Edmond.
<bkdiv3 5><tt sppfx>Kent.<speaker Kent>
<bkdiv3 6><tl 1><tt speech><f 5bk>I<f r> Thought the King had more a{ff}e{ct}ed the
<bkdiv3 7><tl 2>Duke of <f i>Albany<f r>, then <f i>Cornwall<f r>.<speaker ->
<bkdiv3 8><tl 3><f i>Glou.<f r> It did alwayes {s}eeme {s}o to vs : But
These feature tags attach attributes to the text that follows them: the through line number of the book or of the play-text, the act, the scene, the classical scene, the font, the language, and the current speaker. These are obvious to anyone who understands the conventions of a printed text. Note how the two x-type tags <bkt> and <tt> enable us to distinguish play titles, speeches, speech prefixes, and stage directions from one another. Various types of text appear in any book, and one of the functions of a tagging system is to enable someone to retrieve words from only one such type, or several types in combination. The variable for the book-type is bkt. The variable for the text-type is tt. Each of these precedes the passage it modifies and then is turned off by the next occurrence of a <bkt> or a <tt> tag, sometimes directly thereafter. Note that no more than one <bkt> or <tt> tag can be used for one passage of text at a time. Once the tag <bkt text> appears, <tt> tags are then inserted before the text that they characterize. Eventually they are turned off by another <bkt> tag.
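
How such tags attach their values to the text that follows can be sketched in Python. This is a minimal illustration under stated assumptions, not a RET tool: the sample is adapted and abridged from the passage above, a tag is assumed to stay in force until the same variable recurs, a value of "-" is assumed to turn the variable off, and the function names walk and words_where are hypothetical.

```python
import re

# Adapted and abridged from the COCOA example above.
COCOA = '''<speaker ->
<bkdiv3 4><tt stage>Enter Kent, Glouce{{s}t}er, and Edmond.
<bkdiv3 5><tt sppfx>Kent.<speaker Kent>
<bkdiv3 6><tl 1><tt speech><f 5bk>I<f r> Thought the King had more a{ff}e{ct}ed the
<bkdiv3 7><tl 2>Duke of <f i>Albany<f r>, then <f i>Cornwall<f r>.<speaker ->
'''

# A COCOA tag is a variable name, a space, and a value.
TAG_RE = re.compile(r'<(\w+) ([^>]*)>')


def walk(text):
    """Yield (state, chunk) pairs: each run of text with the values then in force."""
    state = {}
    pos = 0
    for match in TAG_RE.finditer(text):
        chunk = text[pos:match.start()]
        if chunk.strip():
            yield dict(state), chunk
        variable, value = match.group(1), match.group(2)
        if value == '-':
            state.pop(variable, None)   # the null value turns the variable off
        else:
            state[variable] = value     # a new value replaces the old one
        pos = match.end()
    tail = text[pos:]
    if tail.strip():
        yield dict(state), tail


def words_where(text, **wanted):
    """Collect whitespace-separated tokens from chunks matching all given values."""
    found = []
    for state, chunk in walk(text):
        if all(state.get(key) == value for key, value in wanted.items()):
            found.extend(chunk.split())
    return found


kent_words = words_where(COCOA, tt="speech", speaker="Kent")
stage_words = words_where(COCOA, tt="stage")
```

The sketch shows why the tags must be applied consistently: the prefix "Kent." is kept out of Kent's speech only because <speaker Kent> comes after it and <tt sppfx> marks it as something other than speech.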

Once you intend to use a feature tag, you must be careful to apply it consistently. Stage directions and speech prefixes, for example, have no speakers; hence <speaker ->, having the null or dummy value, may have to be used often. Here is a list of the main COCOA tags employed in RET editions.


FEATURE            VARIABLE   VALUES      END TAG

act number         act                    <act ->
book division      bkdv                   <bkdv ->
book division 1    bkdv1                  <bkdv1 ->
book division 2    bkdv2                  <bkdv2 ->
book division 3    bkdv3                  <bkdv3 ->
book division 4    bkdv4                  <bkdv4 ->
book type          bkt                    <bkt ->
                              addressee
                              author      
                              catch
                              closing
                              colophon
                              datepub
                              dedication
                              doctitle
                              edition
                              epigram
                              errata
                              fol
                              heading
                              headingno
                              imprimatur
                              msnote
                              page
                              performance
                              placepub
                              price
                              printer
                              publisher
                              rtbot
                              rttop
                              sig
                              signed
                              tableofcontents
                              text
                              volno
                              xref
citation           cit                    <cit ->
classical scene    classcen               <classscene ->
compositor change  compshift              <compshift ->
correction in errata correct              <correct ->
error in errata list error                <error ->
foliation gathering fol                   <fol ->
font               f                      <f ->      
                              bl (black letter)
                              bk (block letter)
                              i (italic)
                              r (roman)
                              l (lapidary)
                              sc (small capitals)
                              + (superscript)
                              _ (subscript)
grammatical gender gend                   <gend ->
                              f feminine
                              m masculine
                              n neuter
genre              genre                  <genre ->
graphic            img                    <img ->
                              Hodnett25
handwriting change handshift              none               
morphol. inflection inflect               <inflect ->
language           lang                   <lang ->
                              e (English)
                              l (Latin)
                              f (French)
                              g (German)
                              gr (Greek)
                              i (Italian)
                              s (Spanish)
mode               mode                   <mode ->
                              v
                              p
milestone ref.     milestone              <milestone ->
name               name                   <name ->
page               page                   <page ->
play division      plydv                  <plydv ->
play division 1    plydv1                 <plydv1 ->
play division 2    plydv2                 <plydv2 ->
play division 3    plydv3                 <plydv3 ->
play division 4    plydv4                 <plydv4 ->
poem division      pmdv                   <pmdv ->
poem division 1    pmdv1                  <pmdv1 ->
poem division 2    pmdv2                  <pmdv2 ->
poem division 3    pmdv3                  <pmdv3 ->
poem division 4    pmdv4                  <pmdv4 ->
quotation          quotation              <quotation ->
quoted author      qauthor                <qauthor ->
rhyme              rhyme                  <rhyme ->
scenenumber        scene                  <scene ->
script             s                      <s ->
speaker            speaker                <speaker ->
grammat. role (word) syntact              <syntact -> 
term as object     term                   <term ->
text division      ttdv                   <ttdv ->
text division 1    ttdv1                  <ttdv1 ->
text division 2    ttdv2                  <ttdv2 ->
text division 3    ttdv3                  <ttdv3 ->
text division 4    ttdv4                  <ttdv4 ->
text type          tt                     <tt ->      
                              abstract
                              act
                              actor
                              addressee
                              closing
                              dedication
                              dedication poem
                              drampers (dramatis personae)
                              epigraph
                              epistle
                              heading
                              headingno
                              incipit
                              inspeech   
                              nt   (note)
                              nt:lmargin
                              nt:rmargin
                              nt:foot
                              nt:inline
                              nt:end
                              ntno (footnote number)
                              poemtitle
                              poemno
                              pmdv
                              quotation
                              role   
                              scene   
                              signed
                              speech
                              sppfx   (speech prefix)
                              stage   (stage direction)
                              subheading
                              text
                              title   
Note that, in most texts, font information records the typeface rather than the pica size. See Shakespeare's sonnets for an attempt to overcome this limitation in part. Note also that some of these COCOA tags could have been replaced by counters, special characters occurring nowhere else in the text and so available for use. In a corpus, these characters are better saved for use in the alphabet and diacritics.