Text and Data Mining at University of Toronto Libraries

Text mining and data mining are related methods for identifying patterns in large bodies of text and data, respectively. Each encompasses a number of different techniques.

"What is Text Mining?" from Elsevier

"How does Text Mining Work?" from Elsevier

Resources and Training

Voyant Tools is a web-based platform for generating statistical information about text corpora that may offer preliminary information about your text(s). For text-wrangling and text mining skills, consult the University of Southern California's excellent list of training resources. Additionally, Programming Historian has excellent tutorials on working with text and textual data.
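As a rough illustration of the kind of preliminary statistical information a text analysis tool can report, the following minimal Python sketch counts the most frequent words in a short passage. It is not Voyant Tools' own implementation; the function name term_frequencies, the simple tokenization rule, and the sample sentence are assumptions made only for this example.

```python
import re
from collections import Counter


def term_frequencies(text, top_n=5):
    """Return the top_n most frequent lowercase word tokens in text.

    Tokenization here is deliberately naive (runs of letters and
    apostrophes); real corpora usually need more careful handling.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top_n)


# Hypothetical sample text, used only for illustration.
sample = "The whale surfaced. The crew watched the whale dive again."
top_terms = term_frequencies(sample, top_n=2)
print(top_terms)  # [('the', 3), ('whale', 2)]
```

Even on a small sample, common function words such as "the" dominate raw counts, which is why text mining workflows often filter out stop words before interpreting frequencies.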

Getting Textual Datasets

Some vendors, publishers, journals, and other organizations make text available via application programming interfaces (APIs); those available to University of Toronto community members are listed below. University of Toronto Libraries also has some locally loaded materials available for text mining. Some openly accessible collections may also be useful; the University of Illinois at Urbana-Champaign has compiled a list of open resources for text mining.

For help with using APIs or to inquire about available materials for text mining, contact us.

APIs

Scholarly Publishing APIs

Humanities Research APIs

Scientific Research APIs

Government and Institutional Data APIs

We acknowledge MIT Libraries, Berkeley Libraries, and CEU Libraries for providing some content and inspiration for this page.