Electronic Edition and Encoding (Introduction to SHAKE-SPEARES SONNETS 1609)

4. Electronic Edition and Encoding

This edition is based on the Folger Shakespeare Library's Aspley imprint (the Jolley-Utterson-Tite-Locker-Lampson copy), but it has also been checked against the Folger's Wright imprint, which at some point was cropped, resulting in the loss of running titles and even parts of lines of text. Facsimiles of the Bodleian Library's Malone 34 copy and the Huntington Library's Bridgewater copy (both Aspley imprints) were further consulted. No corrections are introduced or normalization attempted, but all known press variants are encoded in the text, including the two question marks that are not impressed in that copy but that appear in all others, as well as the incorrect catchword at F3 and the incorrect number for Sonnet 116.

We are grateful to the Internet Shakespeare Editions, edited by Michael Best, for making available its images of pages in the facsimile of the Chalmers-Bridgewater Copy (Aspley imprint) in the Huntington Library published in London by Lovell Reeve in 1862.

In accordance with general RET guidelines, we keep closely to the original book copy or manuscript. We retain the original spelling, including many archaic letters, contracted or curtailed forms, marks of abbreviation, and ligatured typeface. That is, this edition should be regarded as a kind of type facsimile. We record both the bibliographical text (title-page, table of contents, running titles, foliation and pagination, catchwords, signatures, and colophons) and the contents themselves, what editors normally call the text. Insofar as diplomatic electronic editions are encoded to identify and classify textual objects on the page, however, we also unavoidably interpret the texts. For this reason, this edition includes images of the pages of an early published facsimile (the British Library copy reproduced by Noel Douglas and printed in Bradford and London by Percy Lund in 1926).

Each RET edition belongs to a corpus of English literature in the Renaissance. Encoding follows a basic set of guidelines for representing the Renaissance character set. No claim is made for the superiority of this method over others, except that it enables editors to encode all characters, whether available in ISO entity references or not.

In addition to the regular letters of the alphabet, the following characters appear in this electronic type facsimile:

æ or {ae} digraph

{ct} c/t ligature

{ff} f/f ligature

{ffi} f/f/i ligature

{fi} f/i ligature

{fl} f/l ligature

{s} long-s

{{s}h} long-s/h ligature

{{s}i} long-s/i ligature

{{s}l} long-s/l ligature

{{s}{s}} long-s/long-s ligature

{{s}t} long-s/t ligature

{w} w made with two v's

{ } blank space

|_e| e-macron, abbreviated "e+m/n"

|_o| o-macron, abbreviated "o+m/n"

Single braces enclose a code for a special character. Single vertical bars delimit an abbreviation, in this instance two common brevigraphs. Anything within either single braces or single vertical bars is one piece of type.
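The rule that anything within braces or vertical bars is one piece of type can be illustrated with a short Python sketch. It is offered only as an illustration under stated assumptions, not as part of the RET toolchain: the function name split_into_types is hypothetical, and the sketch assumes that brace groups may nest (as in {{s}h}) while bar groups do not.

```python
def split_into_types(encoded):
    """Split RET-encoded text into pieces of type.

    A brace group such as {{s}h} (braces may nest) or a bar group such as
    |_e| counts as one piece; every other character is its own piece.
    """
    pieces = []
    i = 0
    while i < len(encoded):
        ch = encoded[i]
        if ch == "{":
            # Walk forward to the brace that closes this group.
            depth = 0
            j = i
            while j < len(encoded):
                if encoded[j] == "{":
                    depth += 1
                elif encoded[j] == "}":
                    depth -= 1
                    if depth == 0:
                        break
                j += 1
            if j == len(encoded):
                raise ValueError(f"unclosed brace at offset {i}")
            pieces.append(encoded[i:j + 1])
            i = j + 1
        elif ch == "|":
            # A bar group runs to the next vertical bar.
            j = encoded.find("|", i + 1)
            if j == -1:
                raise ValueError(f"unclosed vertical bar at offset {i}")
            pieces.append(encoded[i:j + 1])
            i = j + 1
        else:
            pieces.append(ch)
            i += 1
    return pieces


print(split_into_types("wi{{s}h}"))  # ['w', 'i', '{{s}h}']
print(split_into_types("th|_e|"))    # ['t', 'h', '|_e|']
```

Counting the pieces returned for a word gives the number of pieces of type needed to set it, so the encoded form "wi{{s}h}" counts as three pieces, whereas "with" would count as four.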

For tagging of text features, however, this edition does not follow any one standard tagset. Instead, we offer three textually identical versions of the 1609 quarto, one minimally encoded in HTML for browsing, the second heavily encoded in non-HTML SGML (in which the Guidelines of the Text Encoding Initiative by Michael Sperberg-McQueen and Lou Burnard have been followed when feasible), and the third tagged in TACT COCOA markup for text-analysis purposes.

Unlike the old-spelling reference texts of the most recent MLA volumes of the New Variorum Edition of Shakespeare, these versions include all significant bibliographic and linguistic features of the text and decline to introduce emendations, even of probable errors. In his 1977 edition of As You Like It, Richard Knowles, general editor of the New Variorum, described the policy that the latest New Variorum editions have followed:

This edition differs somewhat from earlier volumes of the New Variorum Shakespeare. The text is not a type facsimile, but a modified diplomatic reprint of the First Folio text which ignores its significant typographical irregularities, corrects its obvious typographical errors, but retains its lineation. All significant departures from F1 are duly recorded. [emphasis added] (ix)

Under this editorial policy, many typographical features of the original text are ignored or silently regularized:

The reprint does not reproduce typographical features such as the long s, ligatures, display and swash letters, and ornaments; abbreviations printed as one letter above another are reproduced as two consecutive letters, the second one superscript. Minor typographical blemishes such as irregular spacing, printing spacetypes, and wrong font, damaged, turned, transposed, misprinted, or clearly erroneous or missing letters or punctuation marks have been corrected, usually silently. If, however, the anomaly is likely to have any bibliographical significance, its correction is recorded in the appendix. Where the error is not clearly typographical, or where the correction is not an obvious one, the text has been left unaltered and various emendations have been recorded in the textual notes. ... In general, the attempt has been made to omit and ignore all insignificant typographical peculiarities, but to retain or at least record any accidental details of possible textual significance. (xi-xii)

The RET electronic texts thus encode some features that the New Variorum editorial policy does not. The volumes of this series seek to provide scholars with texts that as closely as possible reproduce in electronic form the earliest printed editions of Shakespeare's works (the quartos and the First Folio), operating under the assumption that both a work's bibliographic and linguistic codes contribute to its cultural meaning (cf. D. C. Greetham in McGann xviii-xix). As Jerome McGann points out regarding orthographic variations in the 1609 quarto of the Sonnets and A Lover's Complaint, "As our scholarly knowledge increases ... we often discover that texts which had previously seemed corrupt are not so at all; that it is we (or our ignorance) who are at fault" (99). Not to reproduce bibliographic features, like the long s and ligatures, means that scholars, especially textual scholars, will not have information available that may be of use. For example, Randall McLeod has argued on grounds that include bibliographic ones that emending "wi{{s}h}" to "with" (where {{s}h} represents a long-s ligatured with h) in the first line of sonnet 111 ("Unemending Shakespeare Sonnet 111") and "{{s}t}ill" to "skill" (where {{s}t} represents a long-s ligatured with t) in line 12 of sonnet 106 (Clod "Information Upon Information") are not justified. Concerning the former, McLeod writes,

We have been wrong to talk of the reading of Q1 as "wish," for it is wi{{s}h}--to give it back its ligatured face and to countenance it for the first time. It is a three-sort word, with two of its letters, s and h, tied, printed as a single type, sh. The "with," however, is a four-sort word with no ligatures. The letters t + h never form a ligature in this fount (any that I have seen), whereas the letters s + h are never set without ligation in this text, though they can be occasionally found untied in contemporary English texts. It is strange but true that this simplest of physical facts, ligation, completely undercuts the only rationale editors have ever used to justify their emendations in Sonnet 111, that "wish"--as they put it--is an obvious typo. The editors have committed a blunder equivalent to saying "4 = 3." (82-83)

This edition thus attempts to provide as much bibliographic and linguistic information about the 1609 quarto as possible.

HTML Tags

A layer of line-break (<br>), paragraph (<p>), centering (<center>), and horizontal-rule (<hr>) tags over the text formats it for Web reading. Type-styling tags for italic (<i>) and superscript (<sup>) characters render the general look of the original; more detailed font information, as well as the existence of press variants, is recorded in the <f> and <variant> tags, which are found in the base document but are ignored by HTML browsers. Hung words are duplicated at the end of the line to which they belong and are there enclosed by a <supplied> tag, also not displayed by browsers; the original hung words appear in the text where they occur but are placed within HTML comment delimiters so that they do not show. Catchwords, signatures, and running titles are retained and identified by simple bold-face labels (<b>) preceding them within square brackets. Line numbers and the rhyme scheme for each sonnet are also added within square brackets in boldface. All tags in the HTML text are listed in the TEI header.

One more boldface label appears in the HTML text within square brackets: bibliographical information about the page, the gathering, the signature, the forme, and the supposed compositor. This label is linked to a digitized image of the page in question.

SGML Encoding

All tags in the SGML version of the text are visible in the HTML document because the less-than and greater-than symbols, < and >, are represented literally by entity references and thus are interpreted as plain text rather than as tags by Web browsers. Note that this SGML-encoded text has not been parsed. Anyone who wishes to use an enriched DTD, unlimited by HTML, will inevitably want to modify these tags.
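The mechanism described above, in which angle brackets are replaced by entity references so that a browser shows SGML tags as plain text, can be sketched in Python with the standard-library html module. This is an illustration only and makes no claim about how the RET files were actually prepared; the sample line is taken from the press-variant example in this section.

```python
import html

# One line of SGML-encoded text as it would stand in the source file.
sgml_line = '<rdg source="BL-Bright"> thee;</rdg>'

# Replace <, > and & with entity references; quote=False leaves the
# double quotation marks around attribute values untouched.
escaped = html.escape(sgml_line, quote=False)
print(escaped)  # &lt;rdg source="BL-Bright"&gt; thee;&lt;/rdg&gt;

# Reversing the substitution recovers the original SGML for further processing.
restored = html.unescape(escaped)
print(restored == sgml_line)  # True
```

Anyone wishing to process the SGML version from the HTML document would need to apply the reverse substitution first.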

The RET SGML tagset does not exactly correspond to what appears in the TEI Document-Type Definition (DTD), either in the choice of tags or in the hierarchical structures that they have. However, asterisked tags are taken from the TEI Guidelines, structural tags for the 1609 quarto include a basic model adopted by TEI, and this edition bears a TEI header. The authoritative guide for SGML is by Charles Goldfarb.

The SGML tag-set for the 1609 volume includes the following tags.

        Variable         Attributes      Attribute Values        Closing Tag 
        *<app>                                                   </app> [apparatus] 
        *<back>                                                  </back> 
        <bkdv1           gathering=      "1..11"                 </bkdv1> [book-division]
                         t=              "quarto" 
                         in=             "4s"> 
        <bkdv2           page=           "1..80"                 </bkdv2>
                         sig=            "A1r..L2v" 
                         side=           "inner | outer"
                         forme=          "1..11"> 
        <bkdv            t=              "ln"                    </bkdv>
                         n=              "1..2812"> 
        *<body>                                                  </body> 
        <bookseller>                                             </bookseller> 
        <closing>                                                </closing> 
        <compshift       name=           "A | B | A-like |       </compshift> [omitted] 
                                         B-like | A and/or B |   [compositor-shift] 
                                         unknown"> 
        *<docEdition>                                            </docEdition> 
        *<docTitle>                                              </docTitle> 
        <f               t=              "bk [block] |           </f> [omitted] 
                                         c [capitals] | 
                                         d [double]| 
                                         i [italic] | 
                                         l [lapidary] | 
                                         p [pica] | 
                                         r [roman] | 
                                         s [small]| 
                                         SC [superscript] | 
                                         t [titling] | 
                                         2 [2-line]"> 
        *<front>                                                 </front> 
        *<fw             t=              "catch | rttop |        </fw> [forme-work]
                                         sig"> ... 
        <gender          t=              "m">                    </gender>[omitted] 
        *<group>                                                 </group> 
        <heading>                                                </heading>
        <headingno>                                              </headingno> 
        *<lang           t=              "English">              </lang> [omitted] 
        <mode            t=              "p | v">                [prose or verse] 
        <pmdv1           t=              "complaint |            </pmdv1> [poem-division]
                                         sonnets" 
                         datecomp=       ""> 
        <pmdv2           t=              "sonnet| stanza"        </pmdv2> 
                         n=              "1..154" 
                         rhyme=          "ababbcc | etc."> 
        <pmdv3           t=              "ln"                    </pmdv3> 
                         n=              "1..327"> 
        <printer>                                                </printer> 
        *<publisher>                                             </publisher> 
        *<pubPlace>                                              </pubPlace> 
        *<rdg            source=         ""                      </rdg> [reading] 
                         status=         "unclear">
        <RETbook         author=         ""                      </RETbook> 
                         title= ""
                         date= "         "> 
        *<signed>                                                </signed> 
        *<text>                                                   </text>


Table 2: SGML Tag-set

SGML tags have at least three main features. First, they include both a starting tag and an ending tag and characterize the text that they surround. Only five of the above tags (<compshift>, <f>, <gender>, <lang>, and <mode>) lack ending tags; their ending tags are understood to fall immediately before the next appearance of the same tag. Second, SGML tags may have multiple attributes, each with its own value. (Attribute-value pairs are somewhat like COCOA variable-value tags.) Third, SGML tags may be nested into hierarchical structures. For example, press variants in text (but not layout) are represented by a nesting of a <rdg> (reading) tag within an <app> (apparatus) tag, as the following example shows.

<bkdv t="ln" n="440"><pmdv3 t="ln" n="6">Intend a zelous pilgrimage to: <app> thee,
<rdg source="BL-Bright"> thee;</rdg>
</app>

The reading of this edition, "thee,", appears immediately after the opening <app> tag, which indicates that a press variant follows within the <rdg> tag. The source attribute of the opening <rdg> tag indicates where the variant may be found. The closing tag, </rdg>, clearly defines where the intrusive reading ends. Other reading tags can follow (but in this edition are not needed). The closing tag </app> defines where the apparatus ends.
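A reader who wishes to pull the press variants out of the SGML version might proceed roughly as in the following Python sketch. Because the SGML-encoded text has not been parsed against a DTD, the sketch uses regular expressions rather than an XML parser. The function name variants and both patterns are hypothetical, and the patterns assume that source is the first attribute of each <rdg> tag, as it is in the example above.

```python
import re

# An <app> holds this edition's reading followed by one or more <rdg> elements.
APP_RE = re.compile(
    r"<app>(?P<lemma>.*?)(?P<readings>(?:<rdg\b[^>]*>.*?</rdg>\s*)+)</app>",
    re.DOTALL,
)
RDG_RE = re.compile(
    r'<rdg\s+source="(?P<source>[^"]*)"[^>]*>(?P<text>.*?)</rdg>',
    re.DOTALL,
)


def variants(sgml):
    """Return a list of (edition_reading, [(source, variant_reading), ...])."""
    results = []
    for app in APP_RE.finditer(sgml):
        lemma = app.group("lemma").strip()
        readings = [
            (rdg.group("source"), rdg.group("text").strip())
            for rdg in RDG_RE.finditer(app.group("readings"))
        ]
        results.append((lemma, readings))
    return results


sample = '''<bkdv t="ln" n="440"><pmdv3 t="ln" n="6">Intend a zelous pilgrimage to: <app> thee,
<rdg source="BL-Bright"> thee;</rdg>
</app>'''

print(variants(sample))  # [('thee,', [('BL-Bright', 'thee;')])]
```

The sketch strips surrounding white space from each reading, which is convenient for listing variants but would need to be changed if spacing itself were under study.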

The general encoding structure is as follows:

<text> 
        <front> 
        [titlepage and dedication] 
        </front>
        <body> 
                <group> 
                        <front> 
                        [heading for sonnets]
                        </front> 
                        <body> 
                        [sonnets] 
                        </body> 
                        <back>
                        [closing for sonnets] 
                        </back> 
                        <front> 
                        [heading for Complaint] 
                        </front> 
                        <body> 
                        [Complaint]
                        </body> 
                        <back> 
                        [closing for Complaint] 
                        </back>
                </group> 
                </body> 
                <back> 
                </back> 
</text>

The 1609 quarto is thus structured as a group of two works, each with front matter (a heading), body (a literary work), and back matter (a "Finis"), within a larger text that has a titlepage and dedication as its front matter, the group of two works as its body, and an empty slot for back matter. Two texts are thus nested within a larger text.

Two other RET structural encodings resemble the divisional tags used by TEI (<div0>, <div1>, etc.). The first exposes the bibliographical structure of the book itself. This structure is not hierarchically subordinate to <front>, <body>, and <back> tags. Pages within gatherings are encoded, respectively, by nested <bkdv2> and <bkdv1> tags. The bibliographical lineation also runs through the book, not nested within either gatherings or pages. Thus the tag for a book-line is the unnumbered <bkdv t="ln">. The second structure concerns the poetry itself. Verse-lines within a sonnet, and sonnets within a sonnet sequence or stanzas within a verse complaint, are encoded, respectively, by <pmdv3>, <pmdv2>, and <pmdv1> tags. The <pmdv> structure nests nicely within each of the two <body> sections under the <group> tag, but the bibliographical structure cuts across or overlaps both the basic TEI model and the <pmdv> structures. Thus there are two lineations, one for the book, and another for each poem. Because SGML does not permit the simultaneous use of overlapping structures, the RET document-type definition, which defines each of the tags in the edition, does not specify their hierarchies. However, they are implicit in the choice of numbered tags.
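The two lineations can be read side by side if every start tag is treated as a marker that holds until the next tag of the same name, rather than as a container. The Python sketch below illustrates the idea under stated assumptions: the function name dual_coordinates is hypothetical, and the sample string uses invented page, signature, and line values (with "..." standing in for the text) purely to show a sonnet running across a page break.

```python
import re

# Start tags of interest; the \b keeps <bkdv> from matching <bkdv1>.
TAG_RE = re.compile(r"<(bkdv2|pmdv2|bkdv|pmdv3)\b([^>]*)>")
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def dual_coordinates(sgml):
    """Return (signature, book_line, poem_unit, verse_line) per verse line."""
    state = {"sig": None, "book_line": None, "unit": None}
    rows = []
    for match in TAG_RE.finditer(sgml):
        name = match.group(1)
        attrs = dict(ATTR_RE.findall(match.group(2)))
        if name == "bkdv2":                       # new page
            state["sig"] = attrs.get("sig")
        elif name == "bkdv" and attrs.get("t") == "ln":   # through-book line
            state["book_line"] = attrs.get("n")
        elif name == "pmdv2":                     # new sonnet or stanza
            state["unit"] = attrs.get("n")
        elif name == "pmdv3":                     # verse line within the unit
            rows.append(
                (state["sig"], state["book_line"], state["unit"], attrs.get("n"))
            )
    return rows


# Invented values for illustration only, not taken from the quarto.
sample = (
    '<bkdv2 page="5" sig="B1r">'
    '<pmdv2 t="sonnet" n="1">'
    '<bkdv t="ln" n="30"><pmdv3 t="ln" n="13">...'
    '</bkdv2><bkdv2 page="6" sig="B1v">'
    '<bkdv t="ln" n="31"><pmdv3 t="ln" n="14">...'
)

for row in dual_coordinates(sample):
    print(row)
```

In the sample, the <pmdv2> unit is still open when one <bkdv2> closes and the next begins, which is exactly the overlap that a single SGML hierarchy cannot express.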

The recommended model for reference citations is as follows. The standard citation for the poetry is the value of the <stitle> tag (the book name), the value of the <pmdv1> tag (the name of the individual work, either the sonnets or the Complaint), the value of the <pmdv2> tag (the sonnet or stanza number), and the value of the <pmdv3> tag (the sonnet-stanza line-number). The standard citation for the physical book in this file is the value of the <stitle> tag (the book name), the value of the <bkdv1> tag (the gathering), the value of the <bkdv2 sig=""> tag and attribute (the signature), and the value of the <bkdv t="ln"> tag (the through-book line-number).

COCOA Encoding

COCOA tagging takes its name from the first widely available package for literary concordances produced by Oxford Computing Services in the 1970s.¹ COCOA-style tags have several implementations and no fixed set of rules. TACT, for example, can employ them but by relaxing the length of variable names in tags in effect makes its own version of COCOA tagging available. Its simple tag-grammar has three parts:

delimiter characters (the diamond brackets, or some other symbols not found in the text),
a variable or type name, and
a value or token name.

The variable names some general class of feature or attribute of the text that follows it, e.g., "author." The value gives the particular kind of this general class or category, e.g., "Edmund Spenser." The variable or type may take any form, but it always has the same spelling once employed. For example, other tags could be "title," "datepub," and "publisher," but these could not change into synonyms like "heading," "pubdate," and "publishers" and still remain the same tag. The value or token following it, "William Shakespeare," may change into other values such as "John Donne."

All COCOA tags apply to the text from wherever they occur and hold until another tag with the same variable appears. That is, every word in the text following "<author William Shakespeare>" would be tagged as being written by Shakespeare until a subsequent "<author>" tag occurred. The span of such COCOA tags, then, is always indefinite.
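The indefinite span of COCOA tags amounts to a simple rule: keep a table of the current value for each variable, and overwrite an entry whenever a tag with that variable recurs. The following Python sketch illustrates the rule; it is not TACT's implementation, the function name annotate is hypothetical, and the sample words are placeholders rather than text from the quarto.

```python
import re

# A COCOA tag: delimiter, variable name, white space, value, delimiter.
COCOA_TAG = re.compile(r"<(\w+)\s+([^>]*)>")


def annotate(cocoa_text):
    """Return (word, tags_in_force) pairs for a COCOA-tagged string."""
    state = {}
    tagged_words = []
    position = 0
    for match in COCOA_TAG.finditer(cocoa_text):
        # Words before this tag carry whatever values were in force so far.
        for word in cocoa_text[position:match.start()].split():
            tagged_words.append((word, dict(state)))
        # The new value replaces any earlier value for the same variable.
        state[match.group(1)] = match.group(2).strip()
        position = match.end()
    for word in cocoa_text[position:].split():
        tagged_words.append((word, dict(state)))
    return tagged_words


sample = "<author William Shakespeare><pmdv3 1> alpha beta <pmdv3 2> gamma"
for word, tags in annotate(sample):
    print(word, tags)
```

Here "alpha" and "beta" are tagged with pmdv3 value 1 and "gamma" with value 2, while the author value, never superseded, holds for all three words.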

There are a small number of reserved characters in the COCOA encoding that do not occur in the quarto and that have a special meaning:

< > : that is, diamond brackets, which serve as tag delimiters;
[[ ]] : that is, double square brackets, which serve as delimiters for ignored comments;
{ } : that is, single brace brackets, which serve as delimiters for special characters and ligatures;
| | : that is, double vertical bars, which serve as delimiters for parts of hung words moved to their proper line and position; and
% : that is, percent-sign, which separates two different words elided together (to enable one to retrieve them separately).

RET employs the following tags in this edition. The tags are listed in alphabetical order.

 
       Variable        Type of Tag     Value           Type of Value
                       (Explanation)                   (Explanation)

        <author         author's name   William Shakespeare>, etc.
        <bkt            book type       addressee>
                                        bookseller>
                                        catch>          catchword     
                                        closing>
                                        datepub>
                                        dedication>
                                        edition>
                                        placepub>
                                        printer>
                                        publisher>
                                        rttop>          running title
                                        sig>            bibliographical signature
                                        signed>         person's signature 
                                        title>          title of book
                                        ->              null
        <bkdv           minor section   titlepage>
                                        dedication>
                                        ->              null
        <bkdv1          gathering       gathering1>, etc.
        <bkdv2          forme           forme1>, etc.
        <bkdv3          side of forme   inner>
                                        outer> 
        <bkdv4          signature       sigA1r>, etc.
        <bkl            line-no.        1>, etc. 
        <bookseller     bookseller's 
                        name            W. Aspley>, etc. 
        <compshift      compositor 
                        shift           A>, etc.
        <datecomp       date of 
                        composition     1590-1609>, etc. 
        <datepub        date of 
                        publication     1609>, etc.
        <edate          date of 
                        publication
                        of electronic 
                        text            1997>, etc.
        <eeditor        editors of 
                        electronic 
                        text            H. M. Cook and I. Lancashire>, etc.
        <eplacepub      place of 
                        publication of 
                        electronic 
                        text            Toronto>, etc. 
        <f              font
                                        bk>             block
                                        c>              capitals
                                        d>              double
                                        i>              italic 
                                        l>              lapidary
                                        p>              pica 
                                        r>              roman
                                        s>              small
                                        SC>             superscript
                                        t>              titling
                                        2>              2-line
        <forme          forme           inner>
                                        outer>
        <gender         author's 
                        gender          m>
        <lang           language         lang> 
        <library        holder of 
                        source copy     Folger Shakespeare Library>
        <mode           prose or 
                        verse           p>
                                        v>
        <page           page            1>, etc. 
        <period         era of source 
                        text            Renaissance>, etc. 
        <placepub       place of 
                        publication     London>, etc.
        <pmdv1          poem            sonnets>
                                        complaint>
        <pmdv2          poem section    sonnet1>, etc.
                                        stanza1>, etc.
        <pmdv3          poem line no.   1>, etc.
        <printer        printer's name  W. Aspley>, etc.
        <publisher      publisher's 
                        name            T. T.>, etc. 
        <rhyme          rhyme scheme    ababcdcdefefgg>, etc.
        <shelfmark      catalogue number 
                        for source copy Folger STC 22353>, etc.
        <sig            signature       A1r>, etc.
        <STC            Short-Title 
                        Catalogue no.   22353>, etc.
        <stitle         short-title     WS.Sonnets>, etc.
        <tt             text-type       heading>
                                        headingno>
                                        text>
                                        title>


Table 1: COCOA Tag-set

These include four kinds of tags. Global tags set characteristics for the entire book. Global tags do not change through the course of the work. For this reason they are only useful for identifying the electronic file and for retrieving information from it when combined with other works with different global tags. The following global tags apply to the 1609 quarto:

        <author William Shakespeare> 
        <title Shakespeares Sonnets> 
        <stitle WS.Sonnets> 
        <placepub London> 
        <printer G. Eld> 
        <publisher T. T.> 
        <datepub 1609> 
        <datecomp ca. 1590-1609> 
        <lang e> 
        <STC 22353, 22353a> 
        <library Folger Shakespeare Library> 
        <shelfmark Folger STC 22353, Folger STC 22353a> 
        <eeditor Hardy M. Cook and Ian Lancashire> 
        <eplacepub CCH, University of Toronto> 
        <eseries RET3> 
        <edate 1997> 
        <gender m> 
        <period Renaissance>

Feature tags identify units in the book as being of a certain kind and often as having some attribute. The <mode> and <tt> tags exemplify this type. Structural tags classify passages of text according to whether they repeat themselves, or re-cycle, within the work, especially nested inside some other unit of text. The numbered <bkdv> and <pmdv> tags function in this way. Last, word-level tags mark a string with one of its attributes, such as its font.

Every electronic edition should have a default or recommended model or template for reference citations. For this edition, the template is the signature, followed by the identifier for the responsible compositor (in parentheses), a colon, the sonnet or stanza number, and the verse-line number. It appears in the TACT .MKS file as the reference template: "$bkdv4 ($compshift): $pmdv2. $pmdv3."
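The effect of the reference template can be tried outside TACT. Python's string.Template class happens to use the same $name placeholder syntax, so the template string can be filled in directly, as in the sketch below. This only imitates the result; it does not reproduce how TACT reads its .MKS file. The function name cite is hypothetical, and the tag values are an invented combination in the forms shown in the COCOA tag-set, not a statement about which compositor set which page.

```python
from string import Template

# The reference template quoted above, reused verbatim.
REFERENCE = Template("$bkdv4 ($compshift): $pmdv2. $pmdv3.")


def cite(tags_in_force):
    """Fill the reference template from the COCOA values in force at a word."""
    return REFERENCE.substitute(tags_in_force)


# Invented values, for illustration only.
example = {
    "bkdv4": "sigB1r",
    "compshift": "A",
    "pmdv2": "sonnet1",
    "pmdv3": "1",
}
print(cite(example))  # sigB1r (A): sonnet1. 1.
```

Template.substitute raises a KeyError if any of the four variables is missing, which is a convenient check that every word in the file falls within the scope of all four tags.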

Notes

¹ This section is adapted from "Bilingual Dictionaries in an English Renaissance Knowledge Base" by Ian Lancashire, in T. R. Wooldridge, ed., Historical Dictionary Databases, CCHWP 2 (Toronto: Centre for Computing in the Humanities, 1992), pp. 69-88.
