4.2 COMPUTER REPRESENTATION

The basic RET character-set consists of most letter-forms and special characters collected from Anthony G. Petti's English Literary Hands from Chaucer to Dryden (1977), the examples of secretary hands edited by Giles Dawson and Laetitia Yeandle, synopses of 16th-century fonts cast from materials at Oxford (in Philip Gaskell's A New Introduction to Bibliography, fig. 20), and the first plate in Moxon's Exercises, which depicts the early compositor's upper and lower case. I have also added representative brevigraphs found in the Hengwrt manuscript of the Canterbury Tales.

Ideally, each character in an early text should have a unique screen representation that looks like the original. Unfortunately, ISO 646 IRV (the lower ASCII character set, except that ISO 646 IRV replaces $ with the currency sign; see van Herwijnen, p. 262) offers only 97 visible characters that may be assumed to be available on most computers:

                     {space} {tab} {Enter/Return} 
                     ABCDEFGHIJKLMNOPQRSTUVWXYZ
                     abcdefghijklmnopqrstuvwxyz
                     0123456789 
                     . , : ; ! ? - /  \ "
                     ^ _ `   '  ~ 
                     # $ % & * + @  =
                     [ ]  { } ( )  |  < >
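
The count of 97 can be checked mechanically. The following Python sketch assumes that the inventory is the 95 printable ASCII positions (space through tilde) together with tab and the line-end character, and it ignores the ISO 646 IRV substitution of the currency sign for $. The names PRINTABLE, LAYOUT, and RET_BASE are illustrative only and are not part of RET.

```python
# Hypothetical reconstruction of the 97-character inventory described above.
# Assumption: 95 printable ASCII code points (0x20-0x7E) plus tab and newline.
PRINTABLE = [chr(code) for code in range(0x20, 0x7F)]  # space through tilde
LAYOUT = ["\t", "\n"]                                  # {tab} and {Enter/Return}
RET_BASE = PRINTABLE + LAYOUT

if __name__ == "__main__":
    # Report how many characters the inventory holds.
    print(len(RET_BASE))
```

On this assumption the vertical bar belongs to the inventory along with the other delimiter characters.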

Where these characters exist in the Renaissance, RET editions transcribe them simply as such, except for (1) five boundary characters (angle brackets < >, braces { }, and vertical bar |), (2) six characters that I have not found in Renaissance texts (the double quotation mark ", the underline by itself _, the number sign or hash #, the dollar sign $, the percent sign %, and the at-sign @), and (3) three characters with multiple uses in the period (opening single quotation mark and grave accent `, closing single quotation mark and acute accent ', and circumflex accent and caret ^).
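
The three exceptional classes can be written down as a small lookup. This is a minimal Python sketch of the classification just described, not RET software; the set names and the classify function are my own labels, and the sketch assumes its input is one of the base characters.

```python
# Character classes taken from the paragraph above; all names are illustrative.
BOUNDARY = set("<>{}|")      # (1) delimit RET tags and codes
UNATTESTED = set('"_#$%@')   # (2) not found in Renaissance texts
MULTIPLE_USE = set("`'^")    # (3) more than one function in the period


def classify(ch: str) -> str:
    """Return the handling class for one base character."""
    if ch in BOUNDARY:
        return "boundary"
    if ch in UNATTESTED:
        return "unattested"
    if ch in MULTIPLE_USE:
        return "multiple-use"
    # Every other base character is transcribed simply as itself.
    return "literal"
```

For instance, classify("|") gives "boundary" and classify("a") gives "literal".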

The five boundary characters must be used to delimit RET tags but fortunately occur only infrequently in Renaissance texts, where special character codes can be used instead. The six characters not found in Renaissance texts are an encoder's windfall; they may function as metacharacters (e.g., the double quotation mark indicating an umlaut accent). The multiple-use characters must be disambiguated by special character codes, which have to provide a means of representing the dozens of forms not found in ASCII.

Where no equivalent ASCII character exists for a Renaissance character, a code is devised from ASCII characters, and special delimiters surround the code to distinguish it from the individual characters of which it is made. RET uses two of the above sets of delimiters for this purpose: braces and vertical bars. Single braces enclose codes for joined letters or ligatures, and non-ASCII characters. Vertical bars enclose abbreviations, that is, characters that stand for some sequence of letter-numbers.
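
A program reading such a transcription has to separate delimited codes from ordinary characters. The following Python sketch shows one way this might be done with a regular expression; it assumes that codes do not nest and that an unpaired delimiter is treated as an ordinary character. The function name tokenize and the sample codes {ct} and |que| are hypothetical placeholders, not codes taken from the RET tables.

```python
import re

# One alternative per delimiter pair named above, then any single character.
TOKEN = re.compile(
    r"\{(?P<code>[^{}]*)\}"   # {..}: ligature or non-ASCII character code
    r"|\|(?P<abbr>[^|]*)\|"   # |..|: abbreviation
    r"|(?P<char>.)",          # anything else: one literal character
    re.DOTALL,
)


def tokenize(text):
    """Yield (kind, value) pairs, where kind is 'code', 'abbr', or 'char'."""
    for match in TOKEN.finditer(text):
        kind = match.lastgroup  # name of the alternative that matched
        yield kind, match.group(kind)


if __name__ == "__main__":
    # Hypothetical sample line; the codes are placeholders only.
    for token in tokenize("a{ct} |que|"):
        print(token)
```

Keeping the delimiters out of the yielded values lets later stages look codes up by name without stripping braces or bars again.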

Character representations employed in both SGML and COCOA encodings of RET texts are the same. Ideally RET would use ISO-defined entity references for the complete character set. Entities are unique strings of common ASCII characters, normally with an ampersand prefix & and a semi-colon ; suffix, that stand for and, in computer-mediated text exchange, replace characters not in ASCII, such as accented letters, rarely-used characters such as thorn, and symbols such as the paragraphus, the leaf, and the dagger. For example, the ampersand itself has the entity reference &amp;. Normally a text of ASCII characters and entities exists for exchange purposes, such as distribution over a network. Once someone wants to use this text off-line, a local system converts the entity references in it into its own character codes for the purposes of display. Users of exchange texts are not intended to read entities directly because they often obscure the words in which they appear.
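
The local conversion step might look something like the following Python sketch. It assumes a small local table holding only three entity names (amp, thorn, dagger) mapped to Unicode characters; the table, the pattern, and the function name expand_entities are illustrative and do not describe any particular SGML system. References with no local equivalent are left as they stand.

```python
import re

# Assumed local table: entity name -> character available on this system.
LOCAL_TABLE = {
    "amp": "&",
    "thorn": "\u00fe",   # lower-case thorn
    "dagger": "\u2020",  # dagger
}

# An entity reference: ampersand, a name, and a closing semi-colon.
ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9.]*);")


def expand_entities(text, table=LOCAL_TABLE):
    """Replace known entity references; leave unknown ones untouched."""
    return ENTITY.sub(lambda m: table.get(m.group(1), m.group(0)), text)


if __name__ == "__main__":
    # The exchange form is hard to read; the expanded form is not.
    print(expand_entities("&thorn;e booke &amp; &dagger;"))
```

A reference that is missing from the local table simply passes through unchanged, which is what a reader sees when a local system cannot display a character.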

Unfortunately, well under half of the characters and abbreviated forms in Renaissance texts now have ISO-defined entity references, and most of these have no local representation in any user's computer system. If local software existed that could display these characters for users of RET texts, or if entity references were inherently the most effective way of representing and displaying these characters, it would make sense to develop entity references for them. However, no software that I know of can display all these special characters -- although Peter Robinson's Collate comes closest to doing so -- and entity references make words unreadable. An alternative system of character representation is needed so that users can read the electronic texts as easily as they might read the original books and manuscripts.