Data Cleaning with OpenRefine

Presenter: David Kwasny
Location: The BRIDGE, UTSC

This workshop is part of the Digital Scholarship Unit's Fun Data Fridays Series.
These events begin with an hour-long introductory presentation on the topic, led by a U of T Scarborough librarian, followed by a peer-to-peer discussion and collaborative learning session. The program is a forum for exploring how data is used across all disciplines and for troubleshooting your data questions. Our goal is to work together to appreciate the unique issues and opportunities that arise when discussing data across disciplines.
 

About Data Cleaning with OpenRefine 

OpenRefine is a desktop application that uses your web browser as a graphical interface. It is described as “a power tool for working with messy data” (David Huynh, Creator of Google Refine, OpenRefine’s origin technology) - but what does this mean? It is probably easiest to describe the kinds of data OpenRefine is good at working with and the sorts of problems it can help you or your team solve. 

OpenRefine is most useful when your data is in a simple tabular format, such as a spreadsheet, a comma-separated values (CSV) file, or a tab-separated values (TSV) file, but contains internal inconsistencies in data formats, in where data appears, or in the terminology used. OpenRefine can be used to standardize and clean data across your file.
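
As a rough illustration of the terminology inconsistencies described above, here is a minimal Python sketch that groups spellings differing only in case, spacing, punctuation, accents, or word order. It is loosely modelled on the idea of key-collision ("fingerprint") clustering and is not OpenRefine's own code; the function names `fingerprint` and `cluster` and the sample values are assumptions made for this example.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a simplified key that ignores case, accents, punctuation and word order."""
    text = value.strip().lower()
    # Fold accented characters to their closest ASCII form (assumption: ASCII keys are acceptable).
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Drop punctuation so "Toronto, University of" and "University of Toronto" can match.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Sort the unique tokens so word order no longer matters.
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values sharing a key; return only groups with more than one distinct spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


if __name__ == "__main__":
    # Hypothetical sample column with inconsistent terminology.
    sample = [
        "University of Toronto",
        "university of toronto ",
        "Toronto, University of",
        "McGill University",
    ]
    for group in cluster(sample):
        print(group)
```

In OpenRefine itself this kind of grouping is done interactively through the graphical interface, where you review each suggested group before merging its values, so no programming is required for the workshop.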

Using OpenRefine to resolve the inconsistencies commonly found in research data makes it more likely that your data visualizations will turn out as you intend. OpenRefine is a leading open-source tool designed for cleaning messy data, and we are going to show you how to use it.

Overall Lesson Objectives 
- Explain what the OpenRefine software does 
- Explain how the OpenRefine software can help work with data files 
- Provide guidance in the use of OpenRefine to assist in standardizing datasets with controlled vocabularies and best practices 

Learning Outcomes 
- Learn to automate repetitive, boring, error-prone tasks and fix inconsistencies  
- Create, maintain, and analyze sustainable and reusable data 
- Work effectively with information technology systems  

Additional details:
OpenRefine is installed on The BRIDGE Lab computers. If you would prefer to use your own computer, you can download OpenRefine, a free, open-source Java application, from http://openrefine.org/download.html. This lesson has been tested with OpenRefine version 3.7.7.

Location: The BRIDGE, IC Building main floor

Facilitator:
David Kwasny, Data & Digital Literacy Librarian, UTSC Library