5.4 Word-level Tags

The final group of tags applies to individual words. Although RET editions are seldom tagged at the word level, texts analysed in this way can prove useful for corpus linguistics, the empirical study of language by means of electronic text databases.

5.4.1 SGML Word-level Tags

Variable-Attribute      Attribute Values        End Tag

*form type=             simple                  </form>      
                        lemma
                        variant
                        compound
                        derivative 
                        inflected
                        phrase
   category=            colour
                        tool, etc.
   gender=              f m n
   inflect=             poss, etc.
   modernsp=
   originsp=
   pos=                 noun, det, etc.
   syntact=             subj, pred, etc.
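
As an illustration of how the attributes above combine on a single word, the following Python sketch builds a <form> element from keyword arguments. The function name tag_word, the quoting of attribute values, and the attribute order are assumptions made for this example only; they are not prescribed by the table.

```python
from xml.sax.saxutils import quoteattr

# Attribute names copied from the table above. The output order is an
# assumption of this sketch, not a requirement of the tag set.
FORM_ATTRIBUTES = ("type", "category", "gender", "inflect",
                   "modernsp", "originsp", "pos", "syntact")


def tag_word(word, **attrs):
    """Wrap one word in a <form> element (hypothetical helper)."""
    unknown = set(attrs) - set(FORM_ATTRIBUTES)
    if unknown:
        raise ValueError("unknown attribute(s): " + ", ".join(sorted(unknown)))
    # Emit only the attributes that were supplied, each with a quoted value.
    parts = ["form"] + [
        f"{name}={quoteattr(str(attrs[name]))}"
        for name in FORM_ATTRIBUTES
        if name in attrs
    ]
    # The word itself is passed through unchanged (no entity escaping).
    return "<" + " ".join(parts) + ">" + word + "</form>"


print(tag_word("loues", type="inflected", pos="verb", modernsp="loves"))
# prints: <form type="inflected" modernsp="loves" pos="verb">loues</form>
```

The sample word and its values are invented for the illustration and do not come from any RET edition.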

5.4.2 COCOA Word-level Tags

Some of these are employed by TACT.

Variable       Sample Value    Description

raw            <raw ->         original spelling of the word in the text
lemma          <lemma ->       dictionary word-form or headword
pos            <pos ->         part of speech
modern         <modern ->      normalized or modernized spelling 
concept        <concept ->     member of class
gend           <gend ->        gender
inflect        <inflect ->     morphological inflection
syntact        <syntact ->     grammatical role

Not every text in this corpus uses the whole of this tagging scheme, but whenever a text departs from this default, an explanation is given.
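
For readers who want to process such tags outside TACT, the following Python sketch collects COCOA-style word-level tags and attaches them to words. It rests on assumptions that the table does not state: that each tag is written as <variable value>, that tags precede the word they describe, and that their values apply to that one word only. The names WORD_VARIABLES, TOKEN, and parse_words are hypothetical.

```python
import re

# Variable names copied from the table above.
WORD_VARIABLES = {"raw", "lemma", "pos", "modern",
                  "concept", "gend", "inflect", "syntact"}

# Either a tag of the assumed form "<variable value>" or a plain word.
TOKEN = re.compile(r"<(\w+)\s+([^>]*)>|([^\s<]+)")


def parse_words(text):
    """Return a list of (word, tags) pairs, where tags is a dict."""
    words = []
    pending = {}  # tags seen since the previous word
    for match in TOKEN.finditer(text):
        variable, value, word = match.groups()
        if word is not None:
            words.append((word, pending))
            # Assumption: tags describe one word only. If values should
            # persist until changed, copy `pending` instead of resetting it.
            pending = {}
        elif variable in WORD_VARIABLES:
            pending[variable] = value.strip()
        # Tags with other variable names are ignored here.
    return words


sample = "<lemma love> <pos verb> loues <lemma the> the"
for word, tags in parse_words(sample):
    print(word, tags)
# prints: loues {'lemma': 'love', 'pos': 'verb'}
# prints: the {'lemma': 'the'}
```

The sample line is invented for the illustration; a text that departs from the default scheme would need the variable list adjusted to match its own explanation.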