COCOA must register the features and structural relations of any unit implicitly, through divisional or x-type tags. In SGML, by contrast, any feature tag can carry information about its relationship to any other tag.
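The following minimal Python sketch shows how COCOA-style references behave. It assumes references of the form <X value>, in which a single category letter keeps its value until a new reference with the same letter resets it. The category letters (A for act, S for scene, P for speaker), the function name cocoa_words, and the sample text are chosen here for illustration only. Because nothing in the markup states that a scene belongs to an act, changing the act leaves the previous scene value in force.

```python
import re

# A COCOA-style reference: "<", one category letter, a space, a value, ">".
REF = re.compile(r"<([A-Z]) ([^>]*)>")

def cocoa_words(text: str) -> list[tuple[str, dict[str, str]]]:
    """Pair each word with the reference values in force where it occurs."""
    current: dict[str, str] = {}   # latest value seen for each category letter
    out = []
    pos = 0
    for m in REF.finditer(text):
        # Words before this reference carry the values set so far.
        out += [(w, dict(current)) for w in text[pos:m.start()].split()]
        current[m.group(1)] = m.group(2)   # only this one category changes
        pos = m.end()
    out += [(w, dict(current)) for w in text[pos:].split()]
    return out

sample = "<A 1> <S 2> <P EDMUND> Thou Nature art my goddess <A 2> Save thee Curan"
for word, refs in cocoa_words(sample)[4:6]:
    print(word, refs)
```

In this sample, "Save" is reported as act 2 with scene value 2. That scene value is simply carried over from Act 1, because no rule in the markup resets it when the act changes. The hierarchy exists only in the encoder's intentions.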
A TEI-conformant scheme for the folio text of King Lear might look as follows. Note that font is declared either in the rend ("rendition") attribute of a tag or in the <em> ("emphasis") tag, and that special characters such as long-s are declared with entities.
<text>
<group>
...
<text>
<front> ... </front>
<body>
<div0 type="play-tragedy">
<lb><fw type="page" rend="r">283</fw></lb> &ornament;
<lb><head rend="rl">THE TRAGEDIE OF</lb>
<lb>KING LEAR.</lb></head> &rule;
<lb><div1 type="act" n="1" rend="i"><foreign lang="l">A&ctlig;us Primus.
<div2 type="scene" n="1" rend="i">Scœna Prima.</foreign></lb>
<lb><fw type="col" n="1"><stage type="entrance" rend="i">Enter Kent, Glouce&longstlig;er, and Edmond.</stage></lb>
<lb><sp rend="r"><speaker who="Kent" rend="i">Kent.</speaker></lb>
<lb>&blockI; Thought the King had more affe&ctlig;ed the</lb>
<lb>Duke of <em rend="i">Albany</em>, then<em rend="i"> Cornwall.</em> </lb></sp>
<p><lb><sp rend="r"><speaker who="Gloucester" rend="i">Glou.</speaker> It did alwayes &longs;eeme &longs;o to vs : But</lb>
...
</sp>
...
</div2>
...
</div1>
...
</div0>
</body>
<back> ... </back>
</text>
...
</group>
</text>

The DTD for this scheme might specify a hierarchy eight levels deep: <sp> (speeches) can occur only within <div2> (scenes), <div2> only within <div1> (acts), <div1> only within <div0> (plays), <div0> only within <body> (the main part of the text, as distinguished from <front> [front matter or preliminaries] and <back> [back matter]), <body> only within <text>, <text> only within <group>, and <group> in turn only within a higher-level <text>. A ninth level may in fact be said to occur as well: the <speaker> tag occurs only inside the <sp> (speech) tag. The <group> tag is used because Lear is part of a collection of play-texts.
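The parent-child rules just listed can be checked mechanically. The following Python sketch does this with the standard library's xml.etree.ElementTree. It assumes a simplified, well-formed skeleton of the encoding, because the fragment above overlaps <lb> and <head> and uses undeclared entities, so it would not parse as XML. The table ALLOWED_PARENT and the function violations are hypothetical names. A real DTD expresses these constraints as content models rather than as a parent lookup.

```python
import xml.etree.ElementTree as ET

# Permitted parent of each element, taken from the hierarchy described above.
# Elements not listed (front, back, p, ...) are not checked.
ALLOWED_PARENT = {
    "speaker": "sp",
    "sp": "div2",
    "div2": "div1",
    "div1": "div0",
    "div0": "body",
    "body": "text",
    "text": "group",
    "group": "text",
}

def violations(root: ET.Element) -> list[tuple[str, str]]:
    """Return (child, parent) tag pairs that break the parent rules above."""
    bad = []
    for parent in root.iter():          # every element, root included
        for child in parent:            # its direct children
            expected = ALLOWED_PARENT.get(child.tag)
            if expected is not None and parent.tag != expected:
                bad.append((child.tag, parent.tag))
    return bad

# Simplified skeleton of the Lear encoding: no <lb>, <fw>, or entities.
SKELETON = """
<text><group><text><front/><body><div0 type="play-tragedy">
  <div1 type="act" n="1"><div2 type="scene" n="1">
    <sp><speaker who="Kent">Kent.</speaker> I Thought the King had more affected the</sp>
  </div2></div1>
</div0></body><back/></text></group></text>
""".strip()

print(violations(ET.fromstring(SKELETON)))
```

With this skeleton, the check reports no violations. A speech placed directly inside <div0> would be reported as ("sp", "div0").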
Each level in the hierarchy can carry a number of attributes; for example, each tag can be numbered with the n attribute. Most important, the lower levels of the hierarchy, sometimes called the children, inherit the characteristics of the levels above them, sometimes called the parents. Text-retrieval software could thus set boundaries during word selection: for example, it could retrieve only the words spoken by Cordelia in Act 3. SGML software could likewise report automatically, for any spoken word cited from a play, the hierarchy of elements characterizing that word (e.g., play.act.scene.speaker.speech.line). These two functions, bounded retrieval and automatic contextual reference, are important for searching within large distributed libraries of online texts, but less so for individual scholarly workstations. Overall, SGML supports a wider range of correlations between words and the features of all words in a text than COCOA does.
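Inheritance of this kind can be sketched in a few lines of Python. The sketch assumes a parsed element tree with tag names from the TEI example above (div1 for acts, sp for speeches). For brevity, it puts the who attribute on <sp> (which TEI also permits) rather than on <speaker>. The Element class and the helper functions are hypothetical names, not the interface of any particular SGML tool. Each word carries the full path of enclosing elements, so a query can be bounded by any ancestor, and the path doubles as a contextual reference.

```python
from dataclasses import dataclass, field

@dataclass
class Element:
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # nested Elements or text strings

def words_in_context(node, ancestors=()):
    """Yield (word, path) pairs, where path is every element enclosing the word."""
    path = ancestors + (node,)
    for child in node.children:
        if isinstance(child, Element):
            yield from words_in_context(child, path)
        else:
            for word in child.split():
                yield word, path

def inherited(path, tag, attr):
    """Value of attr on the nearest enclosing element with this tag, or None."""
    for el in reversed(path):
        if el.tag == tag and attr in el.attrs:
            return el.attrs[attr]
    return None

def words_spoken(root, who, act):
    """Words in speeches by `who` within act `act`, excluding speech prefixes."""
    return [w for w, path in words_in_context(root)
            if inherited(path, "sp", "who") == who
            and inherited(path, "div1", "n") == act
            and path[-1].tag != "speaker"]

def context(path):
    """Dotted element path such as div0.div1.div2.sp.p."""
    return ".".join(el.tag for el in path)

play = Element("div0", {"type": "play-tragedy"}, [
    Element("div1", {"type": "act", "n": "1"}, [
        Element("div2", {"type": "scene", "n": "1"}, [
            Element("sp", {"who": "Kent"}, [
                Element("speaker", {}, ["Kent."]),
                Element("p", {}, ["I Thought the King had more affected the"]),
            ]),
        ]),
    ]),
])

print(words_spoken(play, "Kent", "1"))
word, path = list(words_in_context(play))[1]
print(word, context(path))
```

A query for Kent in Act 3 returns an empty list here, because the sample tree contains only Act 1.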
There are, however, some relatively free-floating TEI tags, such as <lb> (line break) and <fw> (forme work), which belong under no structural tag but can occur anywhere, and these create difficulties for the tagging of early texts. The <fw> tag (II, 995-96), which handles bibliographical information in a text, is a flat tag without hierarchical relations. It has only three attributes, among them type (e.g., header, footer, pagination, signature, and catchword) and rend (e.g., roman, italic, etc.). This is unsatisfactory for transcribing the bibliographical structure of Renaissance printed books and manuscripts. The TEI guidelines allow for multiple hierarchies, noting that "in any kind of text, the encoder may wish to record the physical structure of the volume, page, column, and line, as well as the formal or logical structure of chapters and paragraphs or acts and scenes, etc." (751). Nevertheless, they argue that these hierarchies should be logical, content-oriented ones rather than physical ones.
RET editions are tagged to suggest simultaneous structures, but the tagging does not specify exactly how these structures are to be handled within SGML software. The nested structures generally anticipated for RET texts are the following seven:
<ttdv1 type="dictionary">
  <ttdv2 type="alpha">
    <ttdv3 type="entry">
      <ttdv4 type="lemma"> ... </ttdv4>
      <ttdv4 type="phrase"> ... </ttdv4>
    </ttdv3>
  </ttdv2>
</ttdv1>

<cit>
  <quot> ... </quot>
  <bibl>
    <author> ... </author>
    <title> ... </title>
    <pubDate> ... </pubDate>
  </bibl>
</cit>

Bibliographical elements are included or omitted according to whether they appear in the text.

<titlePage>
  <docTitle> ... </docTitle>
  <docImpr>
    <pubPlace> ... </pubPlace>
    <publish> ... </publish>
    <docDate> ... </docDate>
  </docImpr>
  <imprimat> ... </imprimat>
</titlePage>

Other bibliographical elements are used as needed.
All other tags may occur anywhere in the text.
Encoded in this way, the opening of King Lear reads as follows. Note that font is declared in an <f> ("font") tag and that special characters such as long-s are declared within braces.
<bkdv1 type="gathering" format="folio" n="65" in="6s">....
<bkdv2 type="page" n="283" sig="qq2r" side="outer" forme="2">
<bkdv3 type="col" n="0">
<bkdv4 type="line" n="1"><f type="r"><fw type="page">283</fw> {ornament}
<bkdv4 type="line" n="2"><head><f type="rl">THE TRAGEDIE OF
<bkdv4 type="line" n="3">KING LEAR</head> {rule}
<bkdv4 type="line" n="4"><plydv1 type="act" n="1"><f type="i"><lang type="l"><head> A{ct}us Primus.
<plydv2 type="scene" n="1"> Sc{oe}na Prima.</head><f type="r"></lang> {rule}
<bkdv3 type="col" n="1">
<bkdv4 type="line" n="5"><stage type="entrance">Enter Kent, Glouce{{s}t}er, and Edmond.</stage></bkdv4>
<bkdv4 type="line" n="6"><plydv3 type="speech" n="1"><speaker who="Kent"> <f type="i">Kent.</speaker></bkdv4>
<bkdv4 type="line" n="7"><f type="5bk">I<f type="r"> Thought the King had more a{ff}e{ct}ed the</bkdv4>
<bkdv4 type="line" n="8">Duke of <f type="i">Albany<f type="r">, then <f type="i">Cornwall.</plydv3></bkdv4>
<bkdv4 type="line" n="9"><plydv3 type="speech" n="2"><speaker who="Gloucester"> <p><f type="i">Glou.<f type="r"> </speaker> It did alwayes {s}eeme {s}o to vs : But</bkdv4>
...
</plydv3>
</bkdv3>
</bkdv2>
....
</plydv2>
....
</bkdv1>....
</plydv1>
...

The word a{ff}e{ct}ed in Kent's first speech, for example, is tagged
The page number 283, in contrast, has no place in the play structure: it lies simply on book-line 1 (bkdv4) of column 0 (bkdv3) on page 283 (bkdv2) of gathering 65 (bkdv1). (Column 0 denotes a page that is not divided into columns.) Headings, stage directions, and speech prefixes, however, belong to both the book and the play structures. They are tagged as what they are, so that they cannot be confused with speeches. Every piece of text has precise coordinates in at least one structure, for reference purposes.
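The idea of coordinates in two structures can be sketched as paired reference tuples. In the following Python sketch, the texts and numbers are read off the RET encoding of the Lear opening above, with the markup partly stripped. BookRef, PlayRef, Segment, and the two helper functions are hypothetical names. For simplicity, the sketch records whole printed lines rather than individual words, and it omits the headings and stage direction, which belong to an act or scene but not to a speech. A segment with no play coordinates, such as the page number, can be cited only by its place in the book.

```python
from typing import NamedTuple, Optional

class BookRef(NamedTuple):
    gathering: int  # bkdv1
    page: int       # bkdv2
    col: int        # bkdv3 (0 = page not divided into columns)
    line: int       # bkdv4

class PlayRef(NamedTuple):
    act: int        # plydv1
    scene: int      # plydv2
    speech: int     # plydv3

class Segment(NamedTuple):
    text: str
    book: BookRef
    play: Optional[PlayRef]  # None when the text has no place in the play

# Line-level coordinates taken from the RET example of the Lear opening.
SEGMENTS = [
    Segment("283", BookRef(65, 283, 0, 1), None),
    Segment("Kent.", BookRef(65, 283, 1, 6), PlayRef(1, 1, 1)),
    Segment("I Thought the King had more a{ff}e{ct}ed the", BookRef(65, 283, 1, 7), PlayRef(1, 1, 1)),
    Segment("Duke of Albany, then Cornwall.", BookRef(65, 283, 1, 8), PlayRef(1, 1, 1)),
    Segment("Glou. It did alwayes {s}eeme {s}o to vs : But", BookRef(65, 283, 1, 9), PlayRef(1, 1, 2)),
]

def book_lines_of(speech: PlayRef) -> list[BookRef]:
    """Map a speech in the play structure onto the printed lines that carry it."""
    return [s.book for s in SEGMENTS if s.play == speech]

def outside_play() -> list[str]:
    """Text that can be cited only by book coordinates, such as page numbers."""
    return [s.text for s in SEGMENTS if s.play is None]

print([b.line for b in book_lines_of(PlayRef(1, 1, 1))])
print(outside_play())
```

Kent's first speech maps onto book-lines 6 to 8 of column 1, including the speech-prefix line, because <plydv3 n="1"> opens on line 6.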
The book-structure codes identify elements in the making of the book itself: the line of type (bkdv4), the column in which it appears (bkdv3), the page and forme in which it is set (bkdv2), and the gathering in which it is bundled (bkdv1). These four form a true nested hierarchy: lines always fall within columns, columns within pages, and pages within gatherings. Folio gatherings consist of sheets with two pages on each side of the paper, printed from what are called the outer and inner formes, one for each side of each sheet.
Note that SGML, unfortunately, cannot represent the way in which pages were originally nested within an inner or an outer forme (i.e., one side of a forme), nor the way in which these sides were nested within a single forme. For example, the First Folio consists largely of folio sheets gathered in sixes (2° in 6s). Each sheet held two pages on each side. Three such sheets were placed on top of one another, giving six sides, and were then folded together to produce the twelve pages of the gathering. Assume that the printer imposed the formes in the order of the signatures (the one containing page 1, or signature A1r, first), and that he always printed the outer forme before the inner forme. On those assumptions, the book as structured during its making can be represented as follows:
Page   Right/Left on Forme Side:   Forme Side:    Forme:
No.    Signature/Page              Inner/Outer    Folio
1      A1r                         outer          1
12     A6v                         outer          1
2      A1v                         inner          1
11     A6r                         inner          1
3      A2r                         outer          2
10     A5v                         outer          2
4      A2v                         inner          2
9      A5r                         inner          2
5      A3r                         outer          3
8      A4v                         outer          3
6      A3v                         inner          3
7      A4r                         inner          3

The electronic text of course adopts the order of the sheets as they are gathered and folded, not as they were mounted on the press originally. Only if we re-sorted the electronic text by forme, then by side, and finally by left/right position on side, could we recognize this order within SGML; and then, of course, we could not represent the order of the pages as we read them.
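The imposition table above can be generated from the structure of a gathering rather than typed out. The following Python sketch makes the same assumptions as the paragraph: formes are imposed in signature order, and the outer forme is printed before the inner forme. FormePage and imposition_order are hypothetical names. Sheet k, counted from the outside of the gathering, carries leaves k and 2n+1-k. Its outer forme holds the recto of the first leaf and the verso of the second, and its inner forme holds the other two pages. The last lines show why the two orders compete for a single linear sequence: the generated list is in press order, and only re-sorting the same records by page number recovers reading order.

```python
from typing import NamedTuple

class FormePage(NamedTuple):
    page: int        # reading-order page number within the gathering
    signature: str   # e.g. "A1r"
    side: str        # "outer" or "inner" forme
    forme: int       # sheet number, counted from the outside of the gathering

def imposition_order(sheets: int, sig: str = "A") -> list[FormePage]:
    """Pages of a folio gathering in press order, under the stated assumptions."""
    leaves = 2 * sheets          # each folded folio sheet yields two leaves
    pages = 2 * leaves           # each leaf has a recto and a verso

    def label(page: int) -> str:
        leaf = (page + 1) // 2   # pages 1-2 -> leaf 1, pages 3-4 -> leaf 2, ...
        return f"{sig}{leaf}{'r' if page % 2 else 'v'}"

    rows = []
    for k in range(1, sheets + 1):       # sheet k holds leaves k and leaves+1-k
        outer = (2 * k - 1, pages + 2 - 2 * k)
        inner = (2 * k, pages + 1 - 2 * k)
        for side, pair in (("outer", outer), ("inner", inner)):
            for page in pair:
                rows.append(FormePage(page, label(page), side, k))
    return rows

rows = imposition_order(3)                        # a gathering in 6s
press_order = [r.page for r in rows]
reading_order = [r.signature for r in sorted(rows, key=lambda r: r.page)]
print(press_order)
print(reading_order)
```

For three sheets, the press order reproduces the page column of the table (1, 12, 2, 11, and so on), and the sorted signatures run A1r, A1v, A2r, through A6v.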
With TACT, this inheritance is managed by creating citation references that string these tags together in sequences such as $bkt $plydv1 $plydv2 $plydv3.
Note that COCOA offers us no way to state explicitly and unambiguously the hierarchical relationships ("parent-child") among tags or to manage cross-references among tags.