Summary and Schedule
ATTENTION: This is an experimental test of The Carpentries Workbench lesson infrastructure. It was automatically converted from the source lesson via the lesson transition script.
If anything seems off, please contact Zhian Kamvar (zkamvar@carpentries.org).
This Library Carpentry lesson introduces people working in library- and information-related roles to working with data in OpenRefine. By the end of the lesson you will understand what OpenRefine does and how to use it to work with data files.
Prerequisites
To complete this lesson you will need to:
- Install OpenRefine or use it through a cloud service
- Download a data file
- Use a supported browser
See our setup page for more information.
Setup Instructions | Download files required for the lesson | |
Duration: 00h 00m | 1. Introduction to OpenRefine | What is OpenRefine? What can it do? |
Duration: 00h 15m | 2. Importing data into OpenRefine | How do I get data into OpenRefine? |
Duration: 00h 30m | 3. Layout of OpenRefine, Rows vs Records | How is data organised in OpenRefine? How do I access options to amend data in OpenRefine? What is the difference between Rows and Records in OpenRefine? How do I work with single cells that contain multiple values in a list? |
Duration: 00h 45m | 4. Faceting and filtering | What is a facet in OpenRefine? What is a filter in OpenRefine? How can I use filters and facets to explore data in OpenRefine? How can I correct common issues in my data with OpenRefine? |
Duration: 01h 05m | 5. Clustering | What is clustering in OpenRefine and when would you use it? How does clustering work in OpenRefine? |
Duration: 01h 25m | 6. Working with columns and sorting | How do I move, rename or remove columns in OpenRefine? How do I sort data in OpenRefine? |
Duration: 01h 35m | 7. Introduction to Transformations | How do I use transformations to programmatically edit my data? What kinds of transformations does OpenRefine support? What is GREL? |
Duration: 01h 45m | 8. Writing Transformations | Where do I write GREL expressions in the OpenRefine interface? How do I write a valid GREL expression? |
Duration: 02h 00m | 9. Transformations - Undo and Redo | How do the Undo and Redo features work? |
Duration: 02h 05m | 10. Transforming Strings, Numbers, Dates and Booleans | How do I use transformations to programmatically edit my data? How do I transform the various data types? |
Duration: 02h 25m | 11. Transformations - Handling Arrays | How do I use Arrays in data transformation? |
Duration: 02h 45m | 12. Exporting data | How do I export data from OpenRefine? |
Duration: 02h 50m | 13. Looking Up Data | How do I fetch data from an Application Programming Interface (API) to be used in OpenRefine? How do I reconcile my data by comparing it to authoritative datasets? How do I install extensions for OpenRefine? |
Duration: 03h 20m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Getting ready
You need to install OpenRefine and download a data file to follow this lesson.
Installing and running OpenRefine
OpenRefine is a free, open-source Java application. You can download OpenRefine from http://openrefine.org/download.html. This lesson has been tested with all versions of OpenRefine up to 3.6.1, the latest tested version.
Packages for Windows, macOS, and Linux are available at https://openrefine.org/download.html. Please download the latest stable version, choosing the “kit” for your operating system. Current versions of the “Windows kit with embedded Java” and the “Mac kit” include everything you need to run OpenRefine. The “Linux kit” and the traditional “Windows kit” require a “Java Runtime Environment” (JRE) to be installed on your system (see the notes below).
If you are using an older version of OpenRefine, we recommend upgrading to the latest tested version.
Please follow OpenRefine’s manual to install and run it.
When you start OpenRefine, a command line window opens first. This is a window with a black background, and lines of text will appear in it while OpenRefine runs. The OpenRefine interface then opens in your default web browser. You do not need to interact with the command line window. Leave it open in the background, and work on your datasets in the web browser.
Notes:
- When you download OpenRefine for Windows or Linux from the address above, you are downloading an archive file (zip or tar). To install OpenRefine, unzip the downloaded file to a permanent location on your computer. This can be a personal directory or an applications or software directory; OpenRefine should run wherever you put the unzipped folder. The location must be on a local drive, because problems have been reported when running OpenRefine from a network drive.
- The options “Windows kit with embedded Java” and “Mac kit” include Java as part of the package. You do not need to install Java if you use one of these kits. This is the preferred method on Windows and Mac systems.
- On Windows, if you use the traditional “Windows kit” without embedded Java, you will need a “Java Runtime Environment” (JRE) on your system. If you do not already have a JRE or JDK installed, you can visit Adopt OpenJDK or Oracle Java to download an installer package. In 2019 Oracle significantly changed its license terms, limiting free use of Oracle Java to “personal use” without a paid license. If you use OpenRefine at work or in research, OpenJDK is preferred.
- On Linux, a “Java Runtime Environment” (JRE) is required to run OpenRefine. If you do not already have a JRE or JDK installed on your system, most distribution repositories contain OpenJRE / OpenJDK packages. Install the default version available from your distribution. For example, on Ubuntu/Debian:
    sudo apt install default-jre
- OpenRefine does not support Internet Explorer. Please use Firefox, Microsoft Edge, Chrome or Safari instead.
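You can script the check for an existing Java runtime described in the Linux note. The following is a minimal sketch, not part of the official OpenRefine instructions. It assumes a POSIX-compatible shell and assumes that any installed Java is on your PATH. The function name `java_status` is hypothetical and chosen only for illustration.

```shell
#!/usr/bin/env bash
# Minimal sketch: check whether a Java runtime is available before starting
# the Linux kit of OpenRefine. The kits with embedded Java
# ("Windows kit with embedded Java", "Mac kit") do not need this check.

# Hypothetical helper: prints "found" if a java executable is on the PATH,
# "missing" otherwise.
java_status() {
  if command -v java >/dev/null 2>&1; then
    echo "found"
  else
    echo "missing"
  fi
}

if [ "$(java_status)" = "found" ]; then
  # java -version writes its report to stderr, so redirect it to stdout.
  java -version 2>&1
else
  echo "No Java runtime found. On Ubuntu/Debian, try: sudo apt install default-jre"
fi
```

If the script prints a Java version, you can extract the Linux archive and start OpenRefine from the extracted folder by running its `refine` script (`./refine`).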
OpenRefine cloud services
If you are unable to install OpenRefine (due to IT restrictions, for example), please try openrefineder using MyBinder. It is free to use without registration, but it runs the older OpenRefine 3.4.1 and is limited to 1-2 GB of RAM. The server is also deleted after 10 minutes of inactivity.
Downloading the data
You can download doaj-article-sample.csv, a CSV file that opens in a new browser tab if you simply click its link. To save the file instead, right-click (or Control-click) the link. In Safari, choose “Download Linked File”; in Chrome and Firefox, choose “Save Link As…”. Make a note of the location where you save the file (e.g. a particular folder or your desktop).
Exiting OpenRefine
To exit OpenRefine, close all the OpenRefine browser tabs or windows, then switch to the command line window. To close this window and make sure OpenRefine exits properly, hold down [control] and press [c] on your keyboard. This saves all changes to your projects.
Getting help
If you encounter problems installing or running OpenRefine, the OpenRefine mailing list and user forum are good sources of support. When searching, include your operating system (Windows, macOS, or Linux) to find the answers most relevant to your issue.
You may also want to check the Stack Overflow OpenRefine tag or the OpenRefine Gitter room.
There are also general and specialist tutorials about using OpenRefine available on the web, including:
- List of OpenRefine External Resources (official wiki)
- Getting started with OpenRefine by Thomas Padilla
- Cleaning Data with OpenRefine by Seth van Hooland, Ruben Verborgh and Max De Wilde
- Blog posts on using OpenRefine from Owen Stephens
- Identifying potential headings for Authority work using III Sierra, MS Excel and OpenRefine
- Free your metadata website
- Data Munging Tools in Preparation for RDF: Catmandu and LODRefine by Christina Harlow
- Cleaning Data with OpenRefine by John Little
- OpenRefine Blog