Appendix 14: Type-Token and Letter Statistics


Word        Types at   Tokens at       Cum.    Cum.   Cum. %   Cum. %     % of
Freq.     this freq.  this freq.      Types  Tokens    Types   Tokens   tokens

     1          2746        2746       2746    2746    64.34    13.28    13.28
     2           564        1128       3310    3874    77.55    18.74     5.46
     3           255         765       3565    4639    83.53    22.44     3.70
     4           165         660       3730    5299    87.39    25.63     3.19
     5           103         515       3833    5814    89.81    28.12     2.49
     6            58         348       3891    6162    91.17    29.81     1.68
     7            38         266       3929    6428    92.06    31.09     1.29
     8            41         328       3970    6756    93.02    32.68     1.59
     9            24         216       3994    6972    93.58    33.73     1.04
    10            28         280       4022    7252    94.24    35.08     1.35
    11            19         209       4041    7461    94.68    36.09     1.01
    12            19         228       4060    7689    95.13    37.19     1.10
    13            13         169       4073    7858    95.43    38.01     0.82
    14            10         140       4083    7998    95.67    38.69     0.68
    15             9         135       4092    8133    95.88    39.34     0.65
    16            10         160       4102    8293    96.11    40.12     0.77
    17             7         119       4109    8412    96.27    40.69     0.58
    18             8         144       4117    8556    96.46    41.39     0.70
    19            10         190       4127    8746    96.70    42.31     0.92
    20             1          20       4128    8766    96.72    42.40     0.10
    21            11         231       4139    8997    96.98    43.52     1.12
    22             3          66       4142    9063    97.05    43.84     0.32
    23             4          92       4146    9155    97.14    44.28     0.45
    24             3          72       4149    9227    97.21    44.63     0.35
    25             4         100       4153    9327    97.31    45.12     0.48
    26             4         104       4157    9431    97.40    45.62     0.50
    27             3          81       4160    9512    97.47    46.01     0.39
    28             4         112       4164    9624    97.56    46.55     0.54
    29             4         116       4168    9740    97.66    47.11     0.56
    30             4         120       4172    9860    97.75    47.70     0.58
    31             2          62       4174    9922    97.80    47.99     0.30
    32             1          32       4175    9954    97.82    48.15     0.15
    33             4         132       4179   10086    97.91    48.79     0.64
    34             4         136       4183   10222    98.01    49.45     0.66
    35             2          70       4185   10292    98.06    49.78     0.34
    36             3         108       4188   10400    98.13    50.31     0.52
    37             1          37       4189   10437    98.15    50.49     0.18
    38             1          38       4190   10475    98.17    50.67     0.18
    39             3         117       4193   10592    98.24    51.24     0.57
    42             1          42       4194   10634    98.27    51.44     0.20
    44             4         176       4198   10810    98.36    52.29     0.85
    45             2          90       4200   10900    98.41    52.73     0.44
    46             1          46       4201   10946    98.43    52.95     0.22
    47             2          94       4203   11040    98.48    53.40     0.45
    48             4         192       4207   11232    98.57    54.33     0.93
    49             3         147       4210   11379    98.64    55.04     0.71
    50             1          50       4211   11429    98.66    55.28     0.24
    51             1          51       4212   11480    98.69    55.53     0.25
    53             2         106       4214   11586    98.73    56.04     0.51
    54             1          54       4215   11640    98.76    56.31     0.26
    55             2         110       4217   11750    98.81    56.84     0.53
    60             1          60       4218   11810    98.83    57.13     0.29
    62             1          62       4219   11872    98.85    57.43     0.30
    63             3         189       4222   12061    98.92    58.34     0.91
    68             1          68       4223   12129    98.95    58.67     0.33
    71             2         142       4225   12271    98.99    59.36     0.69
    74             1          74       4226   12345    99.02    59.72     0.36
    78             1          78       4227   12423    99.04    60.09     0.38
    81             1          81       4228   12504    99.06    60.48     0.39
    86             1          86       4229   12590    99.09    60.90     0.42
    88             2         176       4231   12766    99.13    61.75     0.85
    90             1          90       4232   12856    99.16    62.19     0.44
    93             1          93       4233   12949    99.18    62.64     0.45
    95             1          95       4234   13044    99.20    63.10     0.46
    96             1          96       4235   13140    99.23    63.56     0.46
   105             1         105       4236   13245    99.25    64.07     0.51
   107             1         107       4237   13352    99.27    64.59     0.52
   108             1         108       4238   13460    99.30    65.11     0.52
   111             1         111       4239   13571    99.32    65.65     0.54
   116             1         116       4240   13687    99.34    66.21     0.56
   122             2         244       4242   13931    99.39    67.39     1.18
   125             1         125       4243   14056    99.41    67.99     0.60
   127             1         127       4244   14183    99.44    68.61     0.61
   136             1         136       4245   14319    99.46    69.26     0.66
   138             1         138       4246   14457    99.48    69.93     0.67
   144             1         144       4247   14601    99.51    70.63     0.70
   154             1         154       4248   14755    99.53    71.37     0.74
   158             1         158       4249   14913    99.55    72.14     0.76
   162             1         162       4250   15075    99.58    72.92     0.78
   174             1         174       4251   15249    99.60    73.76     0.84
   178             1         178       4252   15427    99.63    74.62     0.86
   181             1         181       4253   15608    99.65    75.50     0.88
   188             2         376       4255   15984    99.70    77.32     1.82
   193             1         193       4256   16177    99.72    78.25     0.93
   197             1         197       4257   16374    99.74    79.20     0.95
   212             1         212       4258   16586    99.77    80.23     1.03
   240             1         240       4259   16826    99.79    81.39     1.16
   274             1         274       4260   17100    99.81    82.72     1.33
   370             2         740       4262   17840    99.86    86.30     3.58
   387             1         387       4263   18227    99.88    88.17     1.87
   423             1         423       4264   18650    99.91    90.21     2.05
   443             1         443       4265   19093    99.93    92.36     2.14
   483             1         483       4266   19576    99.95    94.69     2.34
   506             1         506       4267   20082    99.98    97.14     2.45
   591             1         591       4268   20673   100.00   100.00     2.86
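Each row of the table groups the word types that occur exactly the same number of times (a frequency spectrum), followed by running totals. The Python sketch below shows one way such rows could be derived from a list of word tokens. The helper name spectrum_rows and the simple lower-case, whitespace-split tokenization are assumptions for illustration, not the procedure that produced this appendix.

```python
from collections import Counter


def spectrum_rows(tokens):
    """Frequency-spectrum rows in the column order of the table above.

    Each row: (freq, types at freq, tokens at freq, cumulative types,
    cumulative tokens, cum. % types, cum. % tokens, % of tokens at freq).
    """
    word_freq = Counter(tokens)             # word type -> number of occurrences
    spectrum = Counter(word_freq.values())  # occurrences -> number of types
    n_types, n_tokens = len(word_freq), len(tokens)
    rows, cum_types, cum_tokens = [], 0, 0
    for f in sorted(spectrum):
        v = spectrum[f]
        cum_types += v
        cum_tokens += f * v
        rows.append((f, v, f * v, cum_types, cum_tokens,
                     round(100 * cum_types / n_types, 2),
                     round(100 * cum_tokens / n_tokens, 2),
                     round(100 * f * v / n_tokens, 2)))
    return rows


text = "the cat and the dog and the bird"
for row in spectrum_rows(text.lower().split()):
    print(row)
```

For the toy sentence the first row is (1, 3, 3, 3, 3, 60.0, 37.5, 37.5): three types (cat, dog, bird) occur once each, and the last row reaches 100% of both types and tokens.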


Number of Types   =     4268
Number of Tokens  =    20673
Type/Token ratio  =        0.206
Token/Type ratio  =        4.844
Hapax Legomena    =     2746
Hapax Dislegomena =      564
Hapax Legomena/Dislegomena ratio   =     4.8688
Hapax Legomena/Number of Types     =     0.6434
Hapax Legomena/Number of Tokens    =     0.1328
Hapax Legomena cubed/Types squared =  1136.7181
Variance (S.D. squared)            =   580.4314
Standard Deviation (S.D.)          =    24.0921
Coefficient of skewness            =    14.5953
Coefficient of kurtosis            =   261.4507
Herdan's characteristic            =     0.0761
Yule's characteristic              =   602.9476
Carroll TTR (Types / sqrt(2 x Tokens))   =    20.9898
Most frequent word "and" occurred 591 times
Repeat rate (Tokens / frequency of most frequent word) =    34.9797
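Most of these summary lines can be recomputed from the frequency spectrum alone. The sketch below transcribes the table into a dictionary and assumes that "Variance" uses the sample (n-1) divisor and that "Herdan's characteristic" is Herdan's Vm = sqrt(S2/N^2 - 1/V), where S2 is the sum of squared type frequencies. Under those assumptions it agrees with the printed values to the precision shown. The function name summary is hypothetical; skewness, kurtosis and Yule's characteristic are omitted because their exact formulas are not stated here.

```python
import math

# Frequency spectrum transcribed from the table above:
# {word frequency: number of types occurring that many times}
SPECTRUM = {
    1: 2746, 2: 564, 3: 255, 4: 165, 5: 103, 6: 58, 7: 38, 8: 41, 9: 24, 10: 28,
    11: 19, 12: 19, 13: 13, 14: 10, 15: 9, 16: 10, 17: 7, 18: 8, 19: 10, 20: 1,
    21: 11, 22: 3, 23: 4, 24: 3, 25: 4, 26: 4, 27: 3, 28: 4, 29: 4, 30: 4,
    31: 2, 32: 1, 33: 4, 34: 4, 35: 2, 36: 3, 37: 1, 38: 1, 39: 3, 42: 1,
    44: 4, 45: 2, 46: 1, 47: 2, 48: 4, 49: 3, 50: 1, 51: 1, 53: 2, 54: 1,
    55: 2, 60: 1, 62: 1, 63: 3, 68: 1, 71: 2, 74: 1, 78: 1, 81: 1, 86: 1,
    88: 2, 90: 1, 93: 1, 95: 1, 96: 1, 105: 1, 107: 1, 108: 1, 111: 1, 116: 1,
    122: 2, 125: 1, 127: 1, 136: 1, 138: 1, 144: 1, 154: 1, 158: 1, 162: 1, 174: 1,
    178: 1, 181: 1, 188: 2, 193: 1, 197: 1, 212: 1, 240: 1, 274: 1, 370: 2, 387: 1,
    423: 1, 443: 1, 483: 1, 506: 1, 591: 1,
}


def summary(spectrum):
    """Recompute summary lines from a frequency spectrum (hypothetical helper)."""
    V = sum(spectrum.values())                        # number of types
    N = sum(f * v for f, v in spectrum.items())       # number of tokens
    S2 = sum(f * f * v for f, v in spectrum.items())  # sum of squared type frequencies
    v1, v2 = spectrum.get(1, 0), spectrum.get(2, 0)   # hapax legomena, dislegomena
    mean = N / V
    variance = (S2 - V * mean ** 2) / (V - 1)         # assumed sample (n-1) variance
    return {
        "types": V,
        "tokens": N,
        "ttr": V / N,
        "token_type": N / V,
        "hapax_dis_ratio": v1 / v2,
        "hapax_types": v1 / V,
        "hapax_tokens": v1 / N,
        "hapax_cubed_types_sq": v1 ** 3 / V ** 2,
        "variance": variance,
        "sd": math.sqrt(variance),
        "herdan_vm": math.sqrt(S2 / N ** 2 - 1 / V),  # assumed Herdan's Vm reading
        "carroll_ttr": V / math.sqrt(2 * N),
        "repeat_rate": N / max(spectrum),             # tokens / frequency of top word
    }


stats = summary(SPECTRUM)
print(f"Types = {stats['types']}, Tokens = {stats['tokens']}")
print(f"Herdan's Vm = {stats['herdan_vm']:.4f}")
```

The usage lines print the type and token counts (4268 and 20673) and a Vm of 0.0761, matching the summary block.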


Word Length Statistics
----------------------

Word  Freq.    %                        Percentage
 Len                        10        20        30        40        50
                   +----+----+----+----+----+----+----+----+----+----+
   1    705   3.41 |***
   2   3523  17.04 |*****************
   3   4022  19.46 |*******************
   4   4494  21.74 |**********************
   5   3293  15.93 |****************
   6   1921   9.29 |*********
   7   1261   6.10 |******
   8    772   3.73 |****
   9    369   1.78 |**
  10    188   0.91 |*
  11     59   0.29 |
  12     52   0.25 |
  13      8   0.04 |
  14      3   0.01 |
  15      2   0.01 |
  16      1   0.00 |

Total letters (Tokens)   =    87453
Total Words (Types)      =    20673
Type/Token ratio         =        0.2364
Mean word length         =        4.2303
Variance (S.D. squared)  =        3.9467
Standard Deviation (S.D.)=        1.9866
Herdan's characteristic  =        0.0033
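In this block the roles shift: words are counted as types and letters as tokens, so the mean word length is simply letters / words (87453 / 20673). The sketch below recomputes the block from the length distribution, assuming a sample (n-1) variance and reading "Herdan's characteristic" as sqrt(sum of n_l * l^2 / letters^2 - 1 / words). Both readings are inferred from the printed values rather than documented; LENGTHS and length_stats are illustrative names.

```python
import math

# Word-length distribution transcribed from the table above: {length: number of words}
LENGTHS = {1: 705, 2: 3523, 3: 4022, 4: 4494, 5: 3293, 6: 1921, 7: 1261, 8: 772,
           9: 369, 10: 188, 11: 59, 12: 52, 13: 8, 14: 3, 15: 2, 16: 1}


def length_stats(dist):
    """Word-length summary from a {length: count} distribution (hypothetical helper)."""
    words = sum(dist.values())                                # "Types" in this block
    letters = sum(length * n for length, n in dist.items())   # "Tokens" in this block
    sq = sum(length * length * n for length, n in dist.items())  # sum of squared lengths
    mean = letters / words
    variance = (sq - words * mean ** 2) / (words - 1)   # assumed sample (n-1) variance
    herdan = math.sqrt(sq / letters ** 2 - 1 / words)   # assumed Herdan's Vm reading
    return {"words": words, "letters": letters, "ttr": words / letters, "mean": mean,
            "variance": variance, "sd": math.sqrt(variance), "herdan": herdan}


s = length_stats(LENGTHS)
print(f"mean = {s['mean']:.4f}, variance = {s['variance']:.4f}, S.D. = {s['sd']:.4f}")
```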


First letter in words statistics
--------------------------------

Letter Freq.    %                        Percentage
                            10        20        30        40        50
                   +----+----+----+----+----+----+----+----+----+----+
   a   1747   8.45 |********
   b   1193   5.77 |******
   c    560   2.71 |***
   d    789   3.82 |****
   e    408   1.97 |**
   _      0   0.00 |
   f    914   4.42 |****
   g    373   1.80 |**
   h   1020   4.93 |*****
   i   1424   6.89 |*******
   j      0   0.00 |
   k    119   0.58 |*
   l    769   3.72 |****
   m   1336   6.46 |******
   n    615   2.97 |***
   o   1043   5.05 |*****
   _      0   0.00 |
   p    572   2.77 |***
   q     30   0.15 |
   r    329   1.59 |**
   s   1816   8.78 |*********
   t   3403  16.46 |****************
   u      0   0.00 |
   v    298   1.44 |*
   w   1447   7.00 |*******
   x      0   0.00 |
   y    353   1.71 |**
   z      1   0.00 |
   0      0   0.00 |
   1      1   0.00 |
   2     22   0.11 |
   3     21   0.10 |
   4     13   0.06 |
   5     12   0.06 |
   6     11   0.05 |
   7     11   0.05 |
   8     11   0.05 |
   9     12   0.06 |
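The letter tables tally one character per word token and draw one asterisk per percentage point; the bar length appears to be the rounded percentage. A sketch of that tally and row layout follows, assuming the words are already lower-cased tokens. letter_rows is a hypothetical helper: position=-1 gives the final-letter variant, and by_freq=True gives the "Sorted by frequency" order (ties are broken by character here, an arbitrary choice). Unlike the appendix, it lists only characters that actually occur, in character-code order.

```python
from collections import Counter


def letter_rows(words, position=0, by_freq=False):
    """Tally the character at `position` of each word (0 = first, -1 = last) and
    format rows as: character, frequency, percentage, bar of '*' (hypothetical helper)."""
    counts = Counter(w[position] for w in words if w)
    total = sum(counts.values())
    if by_freq:
        order = sorted(counts, key=lambda c: (-counts[c], c))  # ties broken by character
    else:
        order = sorted(counts)                                 # character-code order
    rows = []
    for ch in order:
        pct = 100 * counts[ch] / total
        # One '*' per percentage point; Python's round() sends exact halves to even.
        rows.append(f"{ch:>4} {counts[ch]:6d} {pct:6.2f} |{'*' * round(pct)}")
    return rows


words = "the tree and a sea".split()
print("\n".join(letter_rows(words)))
print("\n".join(letter_rows(words, by_freq=True)))
```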

Sorted by frequency

Letter Freq.    %                       Percentage
                            10        20        30        40        50
                   +----+----+----+----+----+----+----+----+----+----+
   t   3403  16.46 |****************
   s   1816   8.78 |*********
   a   1747   8.45 |********
   w   1447   7.00 |*******
   i   1424   6.89 |*******
   m   1336   6.46 |******
   b   1193   5.77 |******
   o   1043   5.05 |*****
   h   1020   4.93 |*****
   f    914   4.42 |****
   d    789   3.82 |****
   l    769   3.72 |****
   n    615   2.97 |***
   p    572   2.77 |***
   c    560   2.71 |***
   e    408   1.97 |**
   g    373   1.80 |**
   y    353   1.71 |**
   r    329   1.59 |**
   v    298   1.44 |*
   k    119   0.58 |*
   q     30   0.15 |
   2     22   0.11 |
   3     21   0.10 |
   4     13   0.06 |
   5     12   0.06 |
   9     12   0.06 |
   7     11   0.05 |
   6     11   0.05 |
   8     11   0.05 |
   z      1   0.00 |
   1      1   0.00 |

Total initial letters (Tokens)   =    20673
Total different letters (Types)  =       38
Type/Token ratio                 =        0.0018
Arithmetic Mean                  =      544.0263
Standard Deviation (S.D.)        =      732.5250
Herdan's characteristic          =        0.2184
Repeat rate for initial letter "t" =          6.07
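The summary block treats each of the 38 listed characters, including those with a zero count, as a type, so Type/Token ratio = 38 / 20673 and Arithmetic Mean = 20673 / 38. The sketch below, with counts transcribed in table order, assumes a sample (n-1) standard deviation; on that assumption it matches the printed S.D. for both the initial-letter and the final-letter tables. letter_summary is a hypothetical name, and Herdan's characteristic is not recomputed.

```python
import statistics

# Counts transcribed from the initial- and final-letter tables, in table order
# (a-z including the two "_" rows, then digits 0-9): 38 categories each.
INITIAL_COUNTS = [1747, 1193, 560, 789, 408, 0, 914, 373, 1020, 1424, 0, 119, 769,
                  1336, 615, 1043, 0, 572, 30, 329, 1816, 3403, 0, 298, 1447, 0,
                  353, 1, 0, 1, 22, 21, 13, 12, 11, 11, 11, 12]
FINAL_COUNTS = [220, 9, 3, 1777, 5323, 1, 525, 493, 927, 385, 0, 57, 636,
                226, 1145, 893, 0, 40, 0, 1044, 2196, 2505, 362, 1, 281, 3,
                1454, 0, 16, 0, 27, 25, 21, 15, 14, 15, 17, 17]


def letter_summary(counts):
    """Summary lines for one letter table (hypothetical helper)."""
    tokens = sum(counts)
    types = len(counts)                       # zero-count categories still count as types
    return {
        "tokens": tokens,
        "types": types,
        "ttr": types / tokens,
        "mean": tokens / types,
        "sd": statistics.stdev(counts),       # sample standard deviation (n - 1)
        "repeat_rate": tokens / max(counts),  # tokens / count of the most common letter
    }


for name, counts in (("initial", INITIAL_COUNTS), ("final", FINAL_COUNTS)):
    s = letter_summary(counts)
    print(f"{name}: mean = {s['mean']:.4f}, S.D. = {s['sd']:.4f}, "
          f"repeat rate = {s['repeat_rate']:.2f}")
```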


Final letter in words statistics
--------------------------------

Letter Freq.    %                        Percentage
                            10        20        30        40        50
                   +----+----+----+----+----+----+----+----+----+----+
   a    220   1.06 |*
   b      9   0.04 |
   c      3   0.01 |
   d   1777   8.60 |*********
   e   5323  25.75 |**************************
   _      1   0.00 |
   f    525   2.54 |***
   g    493   2.38 |**
   h    927   4.48 |****
   i    385   1.86 |**
   j      0   0.00 |
   k     57   0.28 |
   l    636   3.08 |***
   m    226   1.09 |*
   n   1145   5.54 |******
   o    893   4.32 |****
   _      0   0.00 |
   p     40   0.19 |
   q      0   0.00 |
   r   1044   5.05 |*****
   s   2196  10.62 |***********
   t   2505  12.12 |************
   u    362   1.75 |**
   v      1   0.00 |
   w    281   1.36 |*
   x      3   0.01 |
   y   1454   7.03 |*******
   z      0   0.00 |
   0     16   0.08 |
   1      0   0.00 |
   2     27   0.13 |
   3     25   0.12 |
   4     21   0.10 |
   5     15   0.07 |
   6     14   0.07 |
   7     15   0.07 |
   8     17   0.08 |
   9     17   0.08 |

Sorted by frequency

Letter Freq.    %                       Percentage
                            10        20        30        40        50
                   +----+----+----+----+----+----+----+----+----+----+
   e   5323  25.75 |**************************
   t   2505  12.12 |************
   s   2196  10.62 |***********
   d   1777   8.60 |*********
   y   1454   7.03 |*******
   n   1145   5.54 |******
   r   1044   5.05 |*****
   h    927   4.48 |****
   o    893   4.32 |****
   l    636   3.08 |***
   f    525   2.54 |***
   g    493   2.38 |**
   i    385   1.86 |**
   u    362   1.75 |**
   w    281   1.36 |*
   m    226   1.09 |*
   a    220   1.06 |*
   k     57   0.28 |
   p     40   0.19 |
   2     27   0.13 |
   3     25   0.12 |
   4     21   0.10 |
   8     17   0.08 |
   9     17   0.08 |
   0     16   0.08 |
   5     15   0.07 |
   7     15   0.07 |
   6     14   0.07 |
   b      9   0.04 |
   x      3   0.01 |
   c      3   0.01 |
   _      1   0.00 |
   v      1   0.00 |

Total final letters (Tokens)     =    20673
Total different letters (Types)  =       38
Type/Token ratio                 =        0.0018
Arithmetic Mean                  =      544.0263
Standard Deviation (S.D.)        =     1025.4043
Herdan's characteristic          =        0.3058
Repeat rate for final letter "e" =          3.88


All letters in words statistics
-------------------------------

Letter  Freq.  % of all  Initial  % of Freq.  Final  % of Freq.
     a   5822     6.66    1747     30.01     220      3.78
     b   1421     1.62    1193     83.95       9      0.63
     c   1614     1.85     560     34.70       3      0.19
     d   3321     3.80     789     23.76    1777     53.51
     e  12354    14.13     408      3.30    5323     43.09
     _      1     0.00       0      0.00       1    100.00
     f   1950     2.23     914     46.87     525     26.92
     g   1601     1.83     373     23.30     493     30.79
     h   5883     6.73    1020     17.34     927     15.76
     i   5851     6.69    1424     24.34     385      6.58
     j      0     0.00       0      0.00       0      0.00
     k    693     0.79     119     17.17      57      8.23
     l   3664     4.19     769     20.99     636     17.36
     m   2429     2.78    1336     55.00     226      9.30
     n   5382     6.15     615     11.43    1145     21.27
     o   6618     7.57    1043     15.76     893     13.49
     _      1     0.00       0      0.00       0      0.00
     p   1255     1.44     572     45.58      40      3.19
     q     59     0.07      30     50.85       0      0.00
     r   4953     5.66     329      6.64    1044     21.08
     s   5896     6.74    1816     30.80    2196     37.25
     t   8357     9.56    3403     40.72    2505     29.97
     u   3360     3.84       0      0.00     362     10.77
     v    342     0.39     298     87.13       1      0.29
     w   2201     2.52    1447     65.74     281     12.77
     x     78     0.09       0      0.00       3      3.85
     y   2022     2.31     353     17.46    1454     71.91
     z     25     0.03       1      4.00       0      0.00
     0     27     0.03       0      0.00      16     59.26
     1      1     0.00       1    100.00       0      0.00
     2     48     0.05      22     45.83      27     56.25
     3     46     0.05      21     45.65      25     54.35
     4     41     0.05      13     31.71      21     51.22
     5     32     0.04      12     37.50      15     46.88
     6     25     0.03      11     44.00      14     56.00
     7     25     0.03      11     44.00      15     60.00
     8     27     0.03      11     40.74      17     62.96
     9     28     0.03      12     42.86      17     60.71
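In this table the first percentage is the letter's share of all 87453 letters, while the Initial and Final percentages are shares of that letter's own frequency (for example, 1747 / 5822 = 30.01% of all "a" occurrences are word-initial). A sketch of this cross-tabulation follows, assuming lower-cased word tokens. position_shares is a hypothetical helper, and counting a one-letter word as both initial and final is an assumption.

```python
from collections import Counter


def position_shares(words):
    """Per character: (frequency, % of all characters, initial count,
    % of this character's frequency that is initial, final count, same for final).
    Hypothetical helper; a one-character word counts as both initial and final."""
    total, initial, final = Counter(), Counter(), Counter()
    for w in words:
        if not w:
            continue
        total.update(w)          # count every character of the word
        initial[w[0]] += 1
        final[w[-1]] += 1
    n = sum(total.values())
    return {
        ch: (total[ch], round(100 * total[ch] / n, 2),
             initial[ch], round(100 * initial[ch] / total[ch], 2),
             final[ch], round(100 * final[ch] / total[ch], 2))
        for ch in sorted(total)
    }


for ch, row in position_shares(["see", "sea", "a"]).items():
    print(ch, *row)
```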

Sorted by frequency

Letter Freq.    %                       Percentage
                            10        20        30        40        50
                   +----+----+----+----+----+----+----+----+----+----+
   e  12354  14.13 |**************
   t   8357   9.56 |**********
   o   6618   7.57 |********
   s   5896   6.74 |*******
   h   5883   6.73 |*******
   i   5851   6.69 |*******
   a   5822   6.66 |*******
   n   5382   6.15 |******
   r   4953   5.66 |******
   l   3664   4.19 |****
   u   3360   3.84 |****
   d   3321   3.80 |****
   m   2429   2.78 |***
   w   2201   2.52 |***
   y   2022   2.31 |**
   f   1950   2.23 |**
   c   1614   1.85 |**
   g   1601   1.83 |**
   b   1421   1.62 |**
   p   1255   1.44 |*
   k    693   0.79 |*
   v    342   0.39 |
   x     78   0.09 |
   q     59   0.07 |
   2     48   0.05 |
   3     46   0.05 |
   4     41   0.05 |
   5     32   0.04 |
   9     28   0.03 |
   8     27   0.03 |
   0     27   0.03 |
   7     25   0.03 |
   z     25   0.03 |
   6     25   0.03 |
   _      1   0.00 |
   _      1   0.00 |
   1      1   0.00 |

Total all letters (Tokens)       =    87453
Total different letters (Types)  =       38
Type/Token ratio                 =        0.0004
Arithmetic Mean                  =     2301.3947
Standard Deviation (S.D.)        =     2939.7315
Herdan's characteristic          =        0.2072
Repeat rate for letter "e" (all positions) =          7.08