2. GENERAL RULES FOR TRANSCRIPTION
RET editions keep closely to the original physical book or manuscript. They retain the original spelling, including, in some cases, many archaic letters, contracted or curtailed forms, marks of abbreviation, and ligatured typeface. The editor's specific purposes dictate the degree to which the original character set is maintained. All RET editions record both the bibliographical text (title-page, table of contents, running titles, foliation and pagination, catchwords, signatures, and colophons) and the textual contents. Occasionally RET may produce critical editions, in which variant readings are recorded.
Each RET edition belongs to one corpus, English literature in the Renaissance. For this reason, the encoding of all texts follows a common set of guidelines. No claim is made for the superiority of these guidelines to other methods, except that they are informed by the characteristics of Renaissance texts. RET encodes all texts in SGML, the only international (ISO) standard for the exchange of electronic documents, and some also in the COCOA markup, employed in traditional text-analysis software and described in the TACT manual. The two encodings differ in syntax, but the texts are identical in all other respects, and the information found only in the tags is similar. The Guidelines of the Text Encoding Initiative, which employ SGML, are an exceptionally useful tool for anyone encoding literary texts. SGML-encoded RET texts, for example, always have a TEI header and adopt many TEI entities and elements.
The principles of RET editions are as follows:
- The entire book or manuscript is always transcribed.
- Encoding is governed by the need for economy, readability, accuracy, and faithfulness to the conventions of writing in the Renaissance.
- Each electronic transcription generally follows only one physical copy of a book or manuscript. Apparent errors are left as is, with suggested emendations being made within tags or ignore-brackets.
- All texts employ one character-encoding scheme and a common set of reference tags. Without this kind of uniformity, the library of texts could not be analyzed as a single corpus. Different characters in the text normally have their own unique representation in the electronic transcription. Where different characters have been conflated into one character (e.g., different forms of s or r), this practice is stated at the beginning of each electronic text. Otherwise, editions retain the spelling, capitalization, brevigraphs, special characters, and font of the original.
- Any character or group of characters that does not appear in lower ASCII is given a character code by which it is identified. Because different printers and scribes render the same character differently (e.g., roman {_a}), character representations in RET concern generic entities. For example, the italic ampersand takes many forms, but it is coded as |&| in all texts.
- Abbreviations are all tagged as such and are not expanded, except within tags or ignore-brackets. Damaged, interlineated, and canceled text is all tagged as such. Any changes to the text are managed either within SGML tags or COCOA ignore-brackets.
- Spacing between words is irregular in Renaissance texts. Word-separator spaces are normally rendered as single, no matter how small or large the physical gap between the words or between them and marks of punctuation. Words inappropriately joined are kept joined, but in SGML a tag records the apparent error, and in COCOA the character % separates the two words. Elisions are handled similarly, by a tag in SGML, and by the character # in COCOA. By interpreting % and # as unretained diacritics, one saves the original text; by treating them as word-separators, one distinguishes the merged words for analytic purposes.
- The layout of each book is kept, including word-separation, lineation, indentation, paragraphing, and pagination, but this is managed by tagging rather than by trying to replicate the visual look of the text on screen.
- Graphics--ornaments, woodcuts, engravings, lines, illustrations, leaf-designs, etc.--are noted by tags or within ignore-brackets.
- All text is normally entered at the left margin nearest to where it appears. Footnotes are tagged and entered after the last line of text on a page. Marginalia are tagged and entered at the start or at the end of the nearest line of the main text, depending on whether the marginal notes appear to the left or to the right. They may be keyed to letters or numbers in the text by means of comments or tags if the text itself explicitly keys a note to the text.
- Blank space is usually ignored. The electronic transcription of an empty page includes essential page tags but has no blank spaces. All spaces at the start or the end of a line are ignored except in verse, where indented lines in stanzas are indicated by tagging.
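
The two readings of the COCOA characters % and # described above, together with the single-space rule, can be illustrated with a minimal Python sketch. It is not part of RET or TACT: the function names and the sample line are hypothetical, invented only for illustration, and the sketch ignores tags, ignore-brackets, and the special treatment of indented verse lines.

```python
import re
from typing import List


def normalize_spaces(line: str) -> str:
    """Collapse runs of spaces to a single space and drop spaces at the
    start and end of the line (prose only; verse indentation is tagged)."""
    return re.sub(r" +", " ", line).strip()


def original_reading(line: str) -> str:
    """Read % and # as unretained marks: removing them restores the
    joined or elided forms as they stand in the source."""
    return line.replace("%", "").replace("#", "")


def analytic_reading(line: str) -> List[str]:
    """Read % and # as word-separators: the merged words are split
    apart so that each can be counted separately."""
    return [word for word in re.split(r"[%#\s]+", line) if word]


# Hypothetical sample line: "in%the" stands for two words printed as one,
# "th#end" for an elision; the spacing is deliberately irregular.
sample = "  in%the   th#end  of day "
clean = normalize_spaces(sample)

print(clean)                    # in%the th#end of day
print(original_reading(clean))  # inthe thend of day
print(analytic_reading(clean))  # ['in', 'the', 'th', 'end', 'of', 'day']
```

The same encoded line thus yields either the text as printed or a list of separate words, depending on how the two characters are interpreted.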