The discretely identifiable things in any work, I believe, are introductory, main, interrupting, or concluding. In other words, I believe that the presentation, lecture, or speech by a single person to an audience is a metaphor for the parts of a work that the Renaissance would have understood and agreed to. SGML uses two metaphors to classify units. One is spatial and cites the human body (the main thing), approached from the front (introductory) and from the back (concluding). The second is temporal and speaks of something (the main thing) having openers and closers. SGML guidelines do not regard the interruption, digression, or aside as a dominant class of feature.
The next class of units relates to arrangement and expressiveness. Paragraphing, indentation (for lists and verse), columns, different fonts, speech prefixes and stage directions, and even rules, ornaments, and the use of leaf and other special organizational symbols fall into this category. They all may be seen to characterize the author during the delivery of the work to the audience, although the interpretation of how they do so is sometimes unclear. TEI interprets what it aptly calls "rendition" as independent of a work's logical content.
The language and the discourse of the whole work form a category. Tags for typographical errors, damaged text, scribal deletions and additions, diacritics, word separators, language shift, names, rhyme and metre, press and textual variants, the parts of speech and lemmas of words, and words that are split or cited as themselves belong to this class of things, all capable of being tagged.
Finally, words, phrases, sentences, paragraphs, or other passages may be tagged for semantic meaning or for their part in a theoretical understanding of some aspect of the world.
The first, second, and third classes of units will generally be tagged in RET editions because we are most capable of reaching consensus on what they are.
That the RET use of SGML is not always TEI-conformant is no criticism of TEI guidelines, which were written for the widest user community. They clearly suit 20th-century texts, especially technical writing (where ambiguity is frowned on), better than works of the English Renaissance.
Each passage has three SGML-like tags: an isa tag (like the COCOA tt tag) that tells what the passage is, an f tag that indicates its font, and an ln tag that indicates its line-number according to several ways of counting. The isa tag has attributes like mode, type, n, classscene, normalized, and speaker. This scheme enables us to select words from one of two modes (bibl and play), from several types within each (e.g., act, scene, stage, sppfx, and speech within mode play) and from within that last type, speech, each character (e.g., Kent, Gloucester, etc.). We can also retrieve the different kinds of speech prefixes employed for any one (normalized) character name.
<lang type="e">
<genre type="play" subgenre="history-tragedy">
<ln bkl="1"><isa mode="bibl" type="page" n="283">283</isa></ln>
<ln bkl="2"><isa mode="text" type="title"><f type="rl">THE TRAGEDIE OF</f></isa> </ln>
<ln bkl="3">KING LEAR<f type="i"> </isa> </ln>
<ln bkl="4"><isa mode="play" div="1" type="act" no="1"><lang type="l">A{ct}us Primus. <isa mode="play" type="scene" no="1" classscen="1">Sc{oe}na Prima. </f> <f type="r"><lang type="e"> </isa> </ln>
<ln bkl="5"><isa mode="play" type="stagdir" n="1"> Enter Kent, Glouce{{s}t}er, and Edmond. </isa> </ln>
<ln bkl="6"><isa mode="play" type="sppfx" n="1" normal="Kent">Kent. </isa> </ln>
<ln bkl="7" tl="1"><isa mode="play" type="speech" n="1" speaker="Kent"> </f> <f type="5bk">I<f type="r"> Thought the King had more a{ff}e{ct}ed the</isa> </ln>
<ln bkl="8" tl="2">Duke of </f> <f type="i">Albany</f> <f type="r">, then </f> <f type="i">Cornwall</f> <f type="r">. </isa> </ln>
<ln bkl="9" tl="3"><isa mode="play" type="sppfx" n="1" normal="Gloucester"> </f> <f type="i">Glou. </f> <f type="r"><isa mode="play" type="speech" n="2" speaker="Gloucester"> It did alwayes {s}eeme {s}o to vs : But</isa> </ln>
</lang>
...
</f>
</genre>
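The kind of selection described above (words from one mode, one type, or one speaker) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the retrieval software used for RET editions: the function names isa_passages and select_isa are hypothetical, the sample lines are adapted from the King Lear excerpt, and, because the excerpt closes </isa> at the end of each line without reopening it, the sketch assumes that an isa tag stays in force until the next isa tag opens. The f, ln, and lang tags are simply skipped.

```python
import re

# One tag, opening or closing, with its raw attribute string.
TAG = re.compile(r'<(/?)(\w+)([^>]*)>')
# One attribute of the form name="value".
ATTR = re.compile(r'(\w+)="([^"]*)"')

# Sample adapted from the King Lear excerpt (hypothetical test data).
SAMPLE = '''
<ln bkl="5"><isa mode="play" type="stagdir" n="1"> Enter Kent, Glouce{{s}t}er, and Edmond. </isa> </ln>
<ln bkl="6"><isa mode="play" type="sppfx" n="1" normal="Kent">Kent. </isa> </ln>
<ln bkl="7" tl="1"><isa mode="play" type="speech" n="1" speaker="Kent"> </f> <f type="5bk">I<f type="r"> Thought the King had more a{ff}e{ct}ed the</isa> </ln>
<ln bkl="8" tl="2">Duke of </f> <f type="i">Albany</f> <f type="r">, then </f> <f type="i">Cornwall</f> <f type="r">. </isa> </ln>
<ln bkl="9" tl="3"><isa mode="play" type="sppfx" n="1" normal="Gloucester"> </f> <f type="i">Glou. </f> <f type="r"><isa mode="play" type="speech" n="2" speaker="Gloucester"> It did alwayes {s}eeme {s}o to vs : But</isa> </ln>
'''


def isa_passages(source):
    """Return a list of (attributes, text) pairs, one per opening <isa> tag.

    Assumption: an <isa> tag governs all text up to the next opening <isa>.
    """
    passages = []
    pos = 0
    for match in TAG.finditer(source):
        if passages:
            # Text before this tag belongs to the isa tag currently in force.
            passages[-1][1].append(source[pos:match.start()])
        pos = match.end()
        closing, name, raw_attrs = match.groups()
        if name == "isa" and not closing:
            passages.append((dict(ATTR.findall(raw_attrs)), []))
    if passages:
        passages[-1][1].append(source[pos:])
    # Join the fragments and collapse runs of whitespace.
    return [(attrs, " ".join("".join(parts).split())) for attrs, parts in passages]


def select_isa(passages, **wanted):
    """Return the text of every passage whose attributes match all of `wanted`."""
    return [text for attrs, text in passages
            if all(attrs.get(key) == value for key, value in wanted.items())]


if __name__ == "__main__":
    passages = isa_passages(SAMPLE)
    print(select_isa(passages, mode="play", type="speech", speaker="Kent"))
    print(select_isa(passages, type="sppfx", normal="Gloucester"))
```

The first call gathers Kent's speech across its two printed lines; the second retrieves the prefix forms recorded for the normalized name Gloucester. Ligature codes such as {ff} are left untouched, since the sketch makes no attempt to interpret them.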
Here is a list of the main SGML tags employed in RET editions. Asterisked tags are taken from TEI guidelines.
Tag  Variable attributes  TEI page no.  Attribute values  End tag  Description
*a name= 861 </a> hypertextual reference
href=
*actor 850 </actor> dramatis personae
actor's name
addresse </addresse> person who is addressed
*add place= 850 inline </add> ms addition to work
supralinear
infralinear
left right
top bottom
opposite
verso mixed
hand= handwriting
*anchor id= none point in text specified
with id
identifier for a
cross-reference
*app type= 863 </app> apparatus (e.g., press
variants)
from=
to=
*author 870 </author> author's name
*back 872 </back> end material in book
*bibl 873 </bibl> bibl. ref.
*bibtitle 1197 </title> title in bibl. ref.
level= a[nalytic]
m[onographic]
j[ournal]
s[eries]
u[npublished]
bkdv type= titlepage </bkdv> unstructural subdivision
imprimatur of book
dedication
table of contents
errata epigraph
colophon toreader
bkdv1 type= gathering </bkdv1> largest unit of book
n= 1 number of largest unit
format= folio subunit of largest unit
in= 6s no. of formes in
gathering
bkdv2 type= page </bkdv2> second largest unit of
book
n= 1 no. of 2nd largest unit
sig= Aa2 signature of 2nd largest
unit
side= inner side of forme on which
2nd largest unit occurs
forme= 2 which forme the 2nd
largest unit is on
bkdv3 type= col line </bkdv3> 3rd largest unit of
book: line or column
n= 2 number of column or line
bkdv4 type= line </bkdv4> 4th largest unit of book
n= 3 number of line
*body 879 </body> main part of book
booksell </booksell> name of bookseller
*c type= </c> character
*cit work= 899 </cit> citation of work
author=
classcen n= 1 </classcen> classical scene
closing </closing> phrase, etc., marking
work's end
*col n= 906 1 </col> column
correct </correct> correct reading in
errata
*damage type= 912 </damage> the kind of damage
extent= how much damage
*del type= 922 overstrike </del> type of deletion
erasure
bracketed
subpunction
hand= scribe deleting
*docAuthor 945 </docAuthor> author of book
*docDate 945 </docDate> date of publ. of book
docDatcp </docDatcp> date of composition
*docEdition 946 </docEdition> edition of book
docEditr </docEditr> editor of book
*docImpr 947 </docImpr> imprint of book
*docTitle 948 </docTitle> title of book
docVolno </docVolno> volume number of
book
*edition 949 </edition> edition information
*editor 950 </editor> editor's name
egt n= 1 </egt> example of use of dictionary headword
egttr n= 1 </egttr> translation of example
using dictionary lemma
error from= err1 </error> error in errata;
starting location
to= a1 ending location
etym </etym> etymology
explan </explan> explanation, senses,
commentary, etc.,
on headword
f type= </f> font
+ superscript
_ subscript
2 2-line
bk block letter
bl black letter
c capitals
d double
i italic
l lapidary
p pica
r roman
s small
sc small capitals
t titling
*form type= 984 lemma </form> e.g., headword in
phrase dictionary entry
compound
derivative
inflected
category= colour content classification
tool, etc.
gender= f m n
inflect= poss, etc. morphological inflection
modernsp=
originsp=
pos= noun, det, etc. part-of-speech
syntact= subj, pred, etc. grammatical role
*front 988 </front> preliminaries of book
*fw type= 995 catch </fw> forme-work (printer's
fol business)
page
rttop
sig
*gap extent= 996 </gap> gap in text (not damage)
gender type= m f n </gender> gender cited for
dictionary headword
*gloss target= 1001 </gloss> code for term defined
*group 1006 </group> collection of texts in a
book
*handShft new= 1008 1 </handShft> type of handwriting
shifted into
old= 2 type of hand shifted
from
heading </heading> title or phrase, etc.,
marking work's start
headngno </headngno> no. for section of work
i type= 3spr </i> inflection cited for
dictionary headword
img id= Hodnett 25 </img> identifier for a graphic
element like a woodcut
src= rumms.gif file in which image may
be found
*imprimat 1018 </imprimat> authority to publish
*imprint 1019 </imprint> publisher information
lang type= e f g gr </lang> language name
i l s w
*mileston ed= 1059 </mileston> standard edition cited
unit= type of reference unit
for edition
n= value(s) for unit
mode type= v p </mode> verse or prose
*name type= 1065 ps pl </name> proper name
*note place= 1072 lmargin </note> marginal, foot-, or end-
rmargin note
bottom
inline
end
interlinear
target= 154 location of <ptr>
tag where note belongs
notesymb id= 22 </notesymb> encloses note number or
letter in text
*num type= card ord cardinal, ordinal, etc.
*p 1090 </p> unnumbered paragraph
perform </perform> performance information
plydv type= argument </plydv> unstructural subdivision
dedication of play
dedicatory poem
drampers
epilogue
epistle
preface
epigraph
plydv1 type= act </plydv1> largest division of play
n= 1 number of act
plydv2 type= scene </plydv2> 2nd largest division
of play
n= 1 number of scene
classcen= 1 classical scene (marked
by entrance or exit of
characters)
plydv3 type= line </plydv3> 3rd largest division of
speech play
n= 1 number of line or speech
plydv4 type= line </plydv4> 4th largest division of
play
n= 1 number of line
pmdv type= argument </pmdv> nonstructural
dedication subdivision of poem
dedicatory poem
epilogue
epistle
preface
pmdv1 type= book </pmdv1> largest division of poem
canto
fitt
poem
n= 1 number of poem
pmdv2 type= stanza </pmdv2> 2nd largest division of
canto poem
paragraph
epigraph
motto
n= 1 number of division
rhyme= abab rhyme scheme
pmdv3 type= long </pmdv3> 3rd largest division of
bob poem
wheel
motto
line
n= 1 number of division
pmdv4 type= line </pmdv4> 4th largest division of poem
n= 1 number of division
price </price> cost of work
printer </printer> name of printer
*publish 1111 </publish> name of publisher
*pubPlace 1112 </pubPlace> place of publication
*quot work= 1116 Paradise Lost </quot> quotation from work
author= John Milton author of work
*rdg resp= 1119 </rdg> textual variant and
responsibility
status= clear clarity of reading
unclear
*ref target= 1124 </ref> location for erratum
item in text
resp= who made the pointer
type= what kind of thing it is
*restore type= 1134 overstrike </restore> scribe's cancellation of
erasure deletion
bracketed
subpunction
hand=
ret </ret> text in RET series
*role 1136 </role> role name in dramatis
personae
scene </scene>
script type= a anglicana </script> script used by scribe
af anglicana formata
s secretary
i italic, etc.
*sic corr= 1152 </sic> correct reading
resp= who made the correction
*signed 1154 </signed> writer's signature
*sp who= 1159 Hamlet </sp> speech
type= spoken how spoken
thought
direct= y n unspecified kind of speaking
*speaker 1162 </speaker> speech prefix
*stage type= 1163 entrance </stage> stage direction
exit
mixed
subhead </subhead> subheading, to be used
after <heading>
*supplied 1171 </supplied> string supplied by
reason= hung editor of e-text
*term type= 1186 elision </term> a word or phrase cited
expansion= as an object
*text 1189 </text> the entire work
ttdv type= argument </ttdv> unstructural subdivision
toreader of prose text
epigraph
epilogue
dedication
dedicatory poem
epistle
preface
close
ttdv1 type= novel </ttdv1> largest division of
treatise prose text
dictionary
n= 1 number of division
ttdv2 type= chapt </ttdv2> 2nd largest division
alpha of prose text
n= 1 number of division
ttdv3 type= par </ttdv3> 3rd largest division
entry of prose text
n= 1 number of division
lemma= normalized headword
(dictionary entry)
ttdv4 type= line </ttdv4> 4th largest division of
sent prose text
lemma
phrase
n= 1 number of division
*usg type= geo </usg> usage statement for
dictionary headword
*xref type= 31 </xref> cross-reference
5.2.2 COCOA FEATURE TAGS
COCOA feature tags attach attributes to the text that follows them: the through line number of the book or of the play-text, the act, the scene, the classical scene, the font, the language, and the current speaker. These are obvious to anyone who understands the conventions of a printed text. The two x-type tags <bkt> and <tt> enable us to distinguish play titles, speeches, speech prefixes, and stage directions from one another.
Various types of text appear in any book, and one of the functions of a tagging system is to enable someone to retrieve words from only one such type, or from several types in combination. The variable for the book-type is bkt. The variable for the text-type is tt. Each of these precedes the passage it modifies and is then turned off by the next occurrence of a <bkt> or a <tt> tag, sometimes directly thereafter. Note that no more than one <bkt> or <tt> tag can be used for one passage of text at a time. Once the tag <bkt text> appears, <tt> tags are inserted before the text that they characterize. Eventually they are turned off by another <bkt> tag.
Consider the beginning of the first folio King Lear, encoded with COCOA tags:
<genre play>
<page 283>
<speaker ->
283
<bkdiv3 1><f rl><tt title>THE TRAGEDIE OF
<bkdiv3 2>KING LEAR<f i>
<bkdiv3 3><tt act><lang l>A{ct}us Primus. <tt scene>Sc{oe}na Prima.<f r>
<lang e>
<playdiv1 act1>
<playdiv2 scene1>
<act 1><scene 1><classscene 1>
<bkdiv3 4><tt stage>Enter Kent, Glouce{{s}t}er, and Edmond.
<bkdiv3 5><tt sppfx>Kent.<speaker Kent>
<bkdiv3 6><tl 1><tt speech><f 5bk>I<f r> Thought the King had more a{ff}e{ct}ed the
<bkdiv3 7><tl 2>Duke of <f i>Albany<f r>, then <f i>Cornwall<f r>.<speaker ->
<bkdiv3 8><tl 3><f i>Glou.<f r> It did alwayes {s}eeme {s}o to vs : But
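Because each COCOA tag sets a variable that stays in force until the same variable is set again, retrieval amounts to tracking a small table of current values while reading the text. The Python sketch below illustrates that idea under stated assumptions; it is not the software used with RET editions. The names cocoa_segments and select_cocoa are hypothetical, the sample is adapted from the excerpt above, and a value of - is treated as turning the variable off.

```python
import re

# A COCOA tag: <variable value>, e.g. <tt speech> or <speaker ->.
COCOA_TAG = re.compile(r'<(\w+)\s+([^<>]*)>')

# Sample adapted from the King Lear excerpt (hypothetical test data).
SAMPLE = '''
<bkdiv3 4><tt stage>Enter Kent, Glouce{{s}t}er, and Edmond.
<bkdiv3 5><tt sppfx>Kent.<speaker Kent>
<bkdiv3 6><tl 1><tt speech><f 5bk>I<f r> Thought the King had more a{ff}e{ct}ed the
<bkdiv3 7><tl 2>Duke of <f i>Albany<f r>, then <f i>Cornwall<f r>.<speaker ->
<bkdiv3 8><tl 3><f i>Glou.<f r> It did alwayes {s}eeme {s}o to vs : But
'''


def cocoa_segments(source):
    """Return (state, text) pairs, where state maps each variable in force
    to its current value for that run of text."""
    state = {}
    segments = []
    pos = 0
    for match in COCOA_TAG.finditer(source):
        text = source[pos:match.start()].strip()
        if text:
            # Copy the state so later tags do not alter earlier segments.
            segments.append((dict(state), text))
        pos = match.end()
        variable, value = match.group(1), match.group(2).strip()
        if value == "-":
            state.pop(variable, None)   # null value: the variable is turned off
        else:
            state[variable] = value     # a new value replaces the old one
    tail = source[pos:].strip()
    if tail:
        segments.append((dict(state), tail))
    return segments


def select_cocoa(segments, **wanted):
    """Return the text runs whose state matches every variable in `wanted`."""
    return [text for state, text in segments
            if all(state.get(key) == value for key, value in wanted.items())]


if __name__ == "__main__":
    segments = cocoa_segments(SAMPLE)
    print(select_cocoa(segments, tt="speech", speaker="Kent"))
    print(select_cocoa(segments, f="i"))
```

Combining variables in one call, such as tt="speech" with speaker="Kent", corresponds to retrieving words from several types in combination. In this sample the prefix Glou. is not preceded by a new <tt> tag, so it inherits tt speech with no speaker, which shows why consistent tagging matters.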
Once you intend to use a feature tag, you must be careful to apply it consistently. Stage directions and speech prefixes, for example, have no speakers; hence <speaker ->, having the null or dummy value, may have to be used often. Here is a list of the main COCOA tags employed in RET editions.
FEATURE VARIABLE VALUES END TAG
act number act <act ->
book division bkdv <bkdv ->
book division 1 bkdv1 <bkdv1 ->
book division 2 bkdv2 <bkdv2 ->
book division 3 bkdv3 <bkdv3 ->
book division 4 bkdv4 <bkdv4 ->
book type bkt <bkt ->
addressee
author
catch
closing
colophon
datepub
dedication
doctitle
edition
epigram
errata
fol
heading
headingno
imprimatur
msnote
page
performance
placepub
price
printer
publisher
rtbot
rttop
sig
signed
tableofcontents
text
volno
xref
citation cit <cit ->
classical scene classcen <classscene ->
compositor change compshift <compshift ->
correction in errata correct <correct ->
error in errata list error <error ->
foliation gathering fol <fol ->
font f <f ->
bl (black letter)
bk (block letter)
i (italic)
r (roman)
l (lapidary)
sc (small capitals)
+ (superscript)
_ (subscript)
grammatical gender gend <gend ->
f feminine
m masculine
n neuter
genre genre <genre ->
graphic img <img ->
Hodnett25
handwriting change handshift none
morphol. inflection inflect <inflect ->
language lang <lang ->
e (English)
l (Latin)
f (French)
g (German)
gr (Greek)
i (Italian)
s (Spanish)
mode mode <mode ->
v
p
milestone ref. milestone <milestone ->
name name <name ->
page page <page ->
play division plydv <plydv ->
play division 1 plydv1 <plydv1 ->
play division 2 plydv2 <plydv2 ->
play division 3 plydv3 <plydv3 ->
play division 4 plydv4 <plydv4 ->
poem division pmdv <pmdv ->
poem division 1 pmdv1 <pmdv1 ->
poem division 2 pmdv2 <pmdv2 ->
poem division 3 pmdv3 <pmdv3 ->
poem division 4 pmdv4 <pmdv4 ->
quotation quotation <quotation ->
quoted author qauthor <qauthor ->
rhyme rhyme <rhyme ->
scenenumber scene <scene ->
script s <s ->
speaker speaker <speaker ->
grammat. role (word) syntact <syntact ->
term as object term <term ->
text division ttdv <ttdv ->
text division 1 ttdv1 <ttdv1 ->
text division 2 ttdv2 <ttdv2 ->
text division 3 ttdv3 <ttdv3 ->
text division 4 ttdv4 <ttdv4 ->
text type tt <tt ->
abstract
act
actor
addressee
closing
dedication
dedication poem
drampers (dramatis personae)
epigraph
epistle
heading
headingno
incipit
inspeech
nt (note)
nt:lmargin
nt:rmargin
nt:foot
nt:inline
nt:end
ntno (footnote number)
poemtitle
poemno
pmdv
quotation
role
scene
signed
speech
sppfx (speech prefix)
stage (stage direction)
subheading
text
title
Note that font information records the typeface rather than the pica size in most texts. See Shakespeare's sonnets for an attempt to overcome this limitation in part. Note also that some of these COCOA tags could have been replaced by counters, special characters occurring nowhere else in the text and so available for use. In a corpus, these characters are better saved for use in the alphabet and diacritics.