Summary and Setup

ATTENTION This is an experimental test of The Carpentries Workbench lesson infrastructure. It was automatically converted from the source lesson via the lesson transition script.

If anything seems off, please contact Zhian Kamvar (zkamvar@carpentries.org).

Part of the data workflow is preparing the data for analysis. Some of this involves data cleaning, where errors in the data are identified and corrected or formatting is made consistent. This step must be taken with the same care and attention to reproducibility as the analysis.

OpenRefine (formerly Google Refine) is a powerful free and open source tool for working with messy data: cleaning it and transforming it from one format into another.

This lesson will teach you to use OpenRefine to effectively clean and format data and automatically track any changes that you make. Many people comment that this tool saves them months of work they would otherwise spend making these edits by hand.

Getting Started

Data Carpentry’s teaching is hands-on, so participants are encouraged to use their own computers to ensure the proper setup of tools for an efficient workflow.

These lessons assume no prior knowledge of the skills or tools.

To get started, follow the directions in the “Setup” tab to download data to your computer and follow any installation instructions.

Prerequisites

This lesson requires a working copy of OpenRefine (formerly Google Refine).

To most effectively use these materials, please make sure to install everything before working through this lesson.

For Instructors

If you are teaching this lesson in a workshop, please see the Instructor notes.

Data

The data for this lesson is a part of the Data Carpentry Social Sciences workshop. It is a teaching version of the Studying African Farmer-Led Irrigation (SAFI) database. The SAFI dataset represents interviews of farmers in two countries in eastern sub-Saharan Africa (Mozambique and Tanzania). These interviews were conducted between November 2016 and June 2017 and probed household features (e.g. construction materials used, number of household members), agricultural practices (e.g. water usage), and assets (e.g. number and types of livestock).

The data used in this lesson is a subset of the teaching version that has been intentionally ‘messed up’ for this lesson.

Download the data file to your computer.

Software

For this lesson you will need OpenRefine (formerly Google Refine) and a web browser. Basic installation steps are provided on this page. The OpenRefine installation manual provides more details about installation, upgrades and configuration.

Note: OpenRefine is a Java program that runs on your machine (not in the cloud). Its interface opens in your browser, but no internet connection is needed for this lesson.

Callout

You do not need administrative rights on the computer to install OpenRefine. However, if anti-malware software blocks OpenRefine when you try to start it, you may need administrative rights to allow OpenRefine to run. OpenRefine is safe to run.

Windows

  • Check that you have Firefox, Edge, Opera, or Chrome installed and set as your default browser. OpenRefine runs in your default browser. It will not run correctly in Internet Explorer.

  • Download the software from openrefine.org.

  • Unzip the downloaded file into a directory by right-clicking and selecting “Extract…”. Name that directory something like OpenRefine.

    Callout

    The path to the directory you extract the application files into should be short, because some of OpenRefine’s files have very long names. If the path is too long, OpenRefine cannot start.

  • Go to your newly created OpenRefine directory.

  • Launch OpenRefine by opening openrefine.exe. This will launch a command prompt window, but you can ignore that and wait for the browser to launch.

  • If you see Internet Explorer start, or OpenRefine does not automatically open for you, point one of the supported browsers at http://127.0.0.1:3333/ or http://localhost:3333 to launch the program.
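
If OpenRefine fails to start on Windows, the path-length issue mentioned in the callout above is one thing to rule out. The following is a minimal Python sketch, not part of OpenRefine, that reports the longest path under the directory you extracted into. The function name `longest_path_length` and the `C:\OpenRefine` location are illustrative assumptions. The lesson does not state a specific limit; 260 characters is the traditional Windows path limit and is used here only as a rough point of comparison.

```python
import os


def longest_path_length(root):
    """Return the length of the longest absolute path found under root."""
    root = os.path.abspath(root)
    longest = len(root)
    # Walk every subdirectory and measure each file and directory path.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            longest = max(longest, len(os.path.join(dirpath, name)))
    return longest


if __name__ == "__main__":
    # Hypothetical extraction directory; replace with your own.
    install_dir = r"C:\OpenRefine"
    length = longest_path_length(install_dir)
    print(f"Longest path under {install_dir}: {length} characters")
    # 260 is the traditional Windows limit, used here as an assumption.
    if length >= 260:
        print("Consider extracting OpenRefine into a shorter path.")
```

If the reported length is close to the limit, extracting the files into a directory nearer the drive root is the simplest remedy.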

Mac

  • Check that you have Firefox, Edge, Opera, or Chrome installed and set as your default browser. OpenRefine runs in your default browser. It will not run correctly in Internet Explorer.
  • Download the software from openrefine.org.
  • Unzip the downloaded file into a directory by double-clicking it. Name that directory something like OpenRefine.
  • Go to your newly created OpenRefine directory.
  • Drag the OpenRefine app into the Applications folder.
  • Launch OpenRefine: Control-click the app icon, then choose “Open” from the shortcut menu. For troubleshooting help, see the Apple support page.
  • If you are using a different browser than listed above, or if OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to launch the program.

Linux

  • Check that you have Firefox or Chrome installed and set as your default browser. OpenRefine runs in your default browser.
  • Download the software from openrefine.org.
  • Unzip the downloaded file into a directory. Name that directory something like OpenRefine.
  • Go to your newly created OpenRefine directory.
  • Launch OpenRefine by typing ./refine into the terminal within the OpenRefine directory.
  • If you are using a different browser than listed above, or if OpenRefine does not automatically open for you, point your browser at http://127.0.0.1:3333/ or http://localhost:3333 to launch the program.
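
On any of the platforms above, if the browser does not open automatically, it can help to confirm that something is answering at the local address before troubleshooting further. The following is a minimal Python sketch, assuming the default address http://127.0.0.1:3333/ given in the steps above. The function name `openrefine_is_running` is illustrative, and the check only confirms that some HTTP server responds successfully at that address, not that the server is OpenRefine.

```python
import http.client
import urllib.error
import urllib.request


def openrefine_is_running(url="http://127.0.0.1:3333/", timeout=2.0):
    """Return True if an HTTP server answers at url with a success status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, HTTP error status, or a malformed
        # reply: treat all of these as "not running".
        return False


if __name__ == "__main__":
    if openrefine_is_running():
        print("A server is responding at http://127.0.0.1:3333/")
    else:
        print("Nothing is responding yet; check that OpenRefine has started.")
```

If the check succeeds but no browser window appeared, opening the address manually in a supported browser should bring up the OpenRefine interface.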