Content from What is Wikidata?


Last updated on 2023-04-24

Overview

Questions

  • What are Items and Statements?
  • What does the Wikidata interface look like?
  • How is Wikidata linked to other Wiki projects?

Objectives

  • Feel comfortable describing Wikidata to colleagues.
  • Learn about Wikimedia projects (e.g. Wikipedia, Wikimedia Commons) and how Wikidata is related to them.
  • Know why linked open data is important in my work as a cataloging or teaching librarian.
  • Be able to identify the components of a Wikidata item page, and know how Wikidata is organized and how to navigate it.

What is Wikidata?


Wikidata’s description explains that “Wikidata is a free and open knowledge base that can be read and edited by both humans and machines.” Wikidata functions as the central database for a variety of Wikimedia projects, including Wikipedia, Wiktionary, and Wikisource, among others.

Most users will be familiar with Wikipedia, which describes itself as “a free encyclopedia, written collaboratively by the people who use it. It is a special type of website designed to make collaboration easy, called a wiki. Many people are constantly improving Wikipedia, making thousands of changes per hour. All of these changes are recorded in article histories and recent changes.”

Wikidata contains various data types (e.g. text, images, quantities, coordinates, geographic shapes, dates). The data can be queried with a query language called SPARQL, which we will cover later in this lesson. Data is published under the Creative Commons CC0 1.0 Universal Public Domain Dedication. It can be modified, copied, and distributed without permission.

Wikidata also contains authority files, bibliographic data, and other content normally managed in library databases.

Importantly, Wikidata can be interlinked to other open data sets on the linked data web.

1.1 Intro interface


  • Let’s try this out in the next section of this lesson and see if we as humans can simply read the data on Wikidata:

  • Explore a Wikidata Item page:

    • Start by going to the Wikidata Main Page by typing “www.wikidata.org” into your browser. This is what you should see:

    Screenshot of Wikidata Main Page

    • Now go to the search bar in the top right corner and enter “british library”. This will give you a list of search results. Click the entry that says: “British Library (Q23308) national library of the United Kingdom”. Now you should see the British Library’s item page: https://www.wikidata.org/wiki/Q23308

    • Let us explore the item British Library (Q23308). The top part of the item page serves for identifying the item. It has:

      • unique identifier (Q + a number)
      • label
      • description
      • aliases
    • The bottom part is the “Statements” section, which lists the statements made about the item. A statement has:

      • property (P + a number)
      • value
      • qualifier (optional)
      • references (optional)
      • Together, the item, the property, and the value form a so-called triple, which will be explained later.
      • As you can see, a property can have multiple values, for example “member of”. A value can be further specified by qualifiers (not shown on the British Library item).
  • All these new terms, such as statements and qualifiers, can be confusing. If you are not sure, you can check this overview graphic: https://upload.wikimedia.org/wikipedia/commons/a/ae/Datamodel_in_Wikidata.svg

Datamodel in Wikidata
  • Usually pages can be edited by anyone by clicking the pen icon on the upper right. The item British Library (Q23308) is semi-protected. Don’t worry if you make a mistake: you can always revert to an earlier version in the page history.

    • “View history” - more later
    • “Log in” and other things for registered users
  • All structured data is under the Creative Commons CC0 License: “The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.” from https://creativecommons.org/publicdomain/zero/1.0/

  • Further Link describing Wikidata in one page (visual)

1.2 Play games to open


  • Visit the Wikipedia page of the city you were born in, in two languages of your choice (you can choose a different language version on the left side of a Wikipedia page), and look at the size of the population. Are the numbers the same in the different languages? Then visit the city’s item in Wikidata.

1.3 Wikidata Item Eastereggs


1.4 Linking Wikidata to other Wiki resources


Key Points

  • Item
  • Statement

Content from Underlying concepts of Wikidata


Last updated on 2023-04-24

Overview

Questions

  • What is an RDF triple?
  • What are the underlying components of RDF?

Objectives

  • Know what a triple is, and relate structure of a Wikidata statement to traditional metadata field structure
  • Know how linked data can create more context for patrons/users in library catalogs
  • Know how linked data can improve recall in library catalogs? (TODO: Check if we want to address this here).

2.1 Concepts foundations: ways of storing data.


There are many types of databases; the most common types are:

2.1.1 Relational databases:

A relational database is a set of formally described, related tables from which data can be accessed or reassembled. This model organizes data into one or more tables (or “relations”) of columns and rows, with a unique key identifying each row. Each table/relation represents one “entity type”, and these entities are connected via constrained relationships. This model is fully structured and mostly uses SQL (Structured Query Language) to retrieve and manipulate data.
Examples:

relational database

2.1.2 Graph / Semantic databases

The Semantic Web is an extension of the World Wide Web through standards that promote common data formats and exchange protocols on the Web. The most fundamental of these is the Resource Description Framework (RDF), which is used to store data. RDF data is typically read with SPARQL (SPARQL Protocol and RDF Query Language), while relational databases use SQL (Structured Query Language) to do so. In relational database terms, RDF data can also be viewed as a table with only three columns: the subject column, the predicate column, and the object column.

data structure diagram
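
The "table with only three columns" view can be tried out directly. The following is a minimal sketch that uses Python's built-in sqlite3 module; the table name, the helper functions, and the sample triples are made up for illustration and are not part of Wikidata or of any triplestore.

```python
import sqlite3

# Made-up sample triples, used only for illustration.
TRIPLES = [
    ("John", "is a", "person"),
    ("John", "born in", "1980"),
    ("John", "works as", "Engineer"),
    ("Mary", "is a", "person"),
]


def build_triple_table(triples):
    """Load (subject, predicate, object) tuples into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
    conn.executemany("INSERT INTO triples VALUES (?, ?, ?)", triples)
    return conn


def facts_about(conn, subject):
    """Return all (predicate, object) pairs stored for one subject."""
    cursor = conn.execute(
        "SELECT predicate, object FROM triples WHERE subject = ? ORDER BY predicate",
        (subject,),
    )
    return cursor.fetchall()


conn = build_triple_table(TRIPLES)
for predicate, obj in facts_about(conn, "John"):
    print("John", predicate, obj)
```

Every fact is one row, so adding a new kind of fact needs no new column or table. A real triplestore is queried with SPARQL rather than SQL; the sketch only illustrates the shape of the data.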

2.2 Concepts foundations (RDF and RDF triples)


  • RDF is a conceptual data model. It is based on the idea of making statements about resources in expressions of the form subject–predicate–object, known as triples.

  • The subject denotes the resource. The predicate denotes a trait or aspect of the resource and expresses a relationship between the subject and the object. For example: John - is a - person; John - born in - 1980; John - works as - Engineer.

  • RDF data are stored in databases known as triplestores.

  • https://en.wikipedia.org/wiki/Semantic_triple

RDF Triple

2.3 Underlying components


  • Items
    Items represent subjects such as Douglas Adams and have identifiers that start with the letter “Q”, like Q42 for Douglas Adams.
    Each item must have a name (label) in one or more languages, and can optionally have alternative names (aliases) and a description.
  • Properties
    Properties represent attributes of the subject, such as occupation, and have identifiers that start with the letter “P”, like P106 for occupation.
  • Claims
    Claims are the triples. They combine an item, a property, and a value, such as: Douglas Adams (Q42) - occupation (P106) - comedian.
    Note: the value can itself be an item that is already stored in Wikidata; in that case the Q number of that item is used as the value.
  • Statement
    A claim is part of a statement. A statement also includes references, ranks, and qualifiers.
  • References
    Used to store the source of the claim, using properties such as “stated in” and “quote”.
  • Ranks
    A useful component for marking outdated claims.
  • Qualifiers
    Qualifiers are basically properties, but attached to claims rather than to items.
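
How these components nest can be sketched as a small data structure. The classes below are a hypothetical teaching model, not Wikidata's actual storage format or API; the class and field names are assumptions, the rank names follow the three ranks Wikidata offers (preferred, normal, deprecated), and the source in the reference is a placeholder.

```python
from dataclasses import dataclass, field

# The three ranks a Wikidata statement can have.
RANKS = ("preferred", "normal", "deprecated")


@dataclass
class Claim:
    """The triple at the core of a statement."""
    item: str    # Q identifier of the subject, e.g. "Q42"
    prop: str    # P identifier of the property, e.g. "P106"
    value: str   # a Q identifier or a literal value


@dataclass
class Statement:
    """A claim plus its optional qualifiers, references, and rank."""
    claim: Claim
    qualifiers: list = field(default_factory=list)   # (property, value) pairs on the claim
    references: list = field(default_factory=list)   # (property, value) pairs naming a source
    rank: str = "normal"

    def __post_init__(self):
        if self.rank not in RANKS:
            raise ValueError(f"unknown rank: {self.rank}")

    def as_triple(self):
        """Return only the subject, predicate, and object of the statement."""
        return (self.claim.item, self.claim.prop, self.claim.value)


# Douglas Adams (Q42) - occupation (P106) - comedian, with a placeholder source.
statement = Statement(
    claim=Claim("Q42", "P106", "comedian"),
    references=[("stated in", "an example source (hypothetical)")],
)
print(statement.as_triple())
```

The sketch makes one point visible: qualifiers, references, and rank describe the claim, while the claim alone is what corresponds to an RDF triple.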

Is data stored in the RDF triple format part of your work as a librarian?

Take some time to think about whether data stored in the RDF triple format is part of your work as a librarian. Can you give an example in the format of an RDF triple?

TO DO: PLEASE ADD A REAL LIFE EXAMPLE

Point out one RDF triple on the Wikidata item page of former astronaut Mae Jemison.

Go to the Wikidata page of Mae Jemison and point out one RDF triple. An RDF triple consists of a subject, a predicate and an object. Can you assign the three corresponding Wikidata terms?

Go to Wikidata and either search for “Mae Jemison” or enter the ID Q34091. In the picture below the statement “Mae C. Jemison - part of - NASA Astronaut Group 12” is an RDF triple.
Screenshot of Wikidata Main Page

2.4 Scholia - a web service with Wikidata as underlying database


  • Introduction with The Linked Open Data Cloud
  • the structure enables queries
  • reference to DBPedia
  • you can build your own web services with Wikidata as database > Scholia
    • e.g. search for Alex Bateman

2.5 Wikidata one pager


2.6 How Wikidata compares with other data sets


FIXME

Key Points

  • First key point. (FIXME)

Content from Introduction to editing


Last updated on 2023-04-24

Overview

Questions

  • How to create and edit a Wikidata item?

Objectives

  • Be able to create and edit a Wikidata entry.
  • Know properties and relations, and where to find lists of approved properties and relations.
  • Be able to add new statements that link to other items.
  • Be aware of property constraints.
  • Know community norms around Wikidata and why they are important.
  • Be able to add references appropriately.
  • Know what identifiers are and how to add them to a statement.
  • Know different stable identifiers (e.g. ORCID for authors, DOI for works) and why it makes sense to use them as properties.
  • Know the correct use of properties.

3.1 Introduction


Here we will work in the test instance of Wikidata so you will not break anything. Also keep in mind that Wikidata keeps the editing history, so errors can easily be fixed there as well. The test instance is cleaned regularly. You can quickly figure out whether you are on the production Wikidata instance (colored logo) or the test instance (black-and-white logo).

3.2 Create new items


In the following we will create new items. To avoid filling Wikidata with test entries, we will use the test instance (https://test.wikidata.org/) and not the official, production version (https://wikidata.org/).

Go to the test instance at https://test.wikidata.org/

Click the “Create a new Item” link on the left side. You will see a form that looks like this:

Front Page of the test instance
  • Please fill in the form. You can add an entry about anything you want, such as a book, a research article or an author. We will create an entry for Mae Jemison, an American engineer, physician and NASA astronaut. You can also add yourself (if you feel famous enough). We choose “en” in the Language drop-down menu, write “Mae Jemison” in the Label field, “an American engineer, physician and NASA astronaut.” in the Description field and “Mae Carol Jemison” in the Aliases field.
Empty create form of the test instance
  • Once we are done we click “Create”. You should see your newly created item. The URL (the address shown in your web browser) should end with “Q” followed by a number that is unique to this entry.
Freshly created Item of Mae Jemison
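
If you collect item URLs in a spreadsheet or script, the identifier at the end of the URL can be pulled out automatically. The following is a small sketch; the function name and the regular expression are our own assumptions about the common URL shapes (".../wiki/Q…", ".../wiki/Property:P…" and ".../entity/Q…"), not an official Wikidata utility.

```python
import re

# Matches a Q or P identifier at the very end of an item, property, or entity URL.
ENTITY_ID = re.compile(r"/(?:wiki|entity)/(?:Property:)?([QP][1-9]\d*)$")


def entity_id_from_url(url):
    """Return the Q or P identifier at the end of the URL, or None if there is none."""
    match = ENTITY_ID.search(url)
    return match.group(1) if match else None


print(entity_id_from_url("https://www.wikidata.org/wiki/Q34091"))
print(entity_id_from_url("https://test.wikidata.org/wiki/Wikidata:Main_Page"))
```

The first call prints the identifier of the Mae Jemison item on the production site; the second prints None because the main page is not an item page.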

You can compare the entry that you have generated on the test instance with the current version of the item in Wikidata (Q34091).

Wikidata Jemison

3.3 Add Statements - birth reference


Why Wikidata uses references: As in Wikipedia, it is important that content can be verified by others to make sure it is correct and comes from a reliable source of information, such as a book, scientific publication, or newspaper article. A reference (or source) is used to point to specific sources that back up a claim in Wikidata. A reference can be a link to a URL or an item; for example, a book. Wikidata does not aim to answer the question of whether a statement is correct, but only whether the statement appears in a reference.

Task:

  • Support a statement by reference

    • Add the birth date (October 17, 1956) of Mae Jemison as a statement using property P569 “date of birth” to the “Mae Jemison” item you created above.
    • Afterwards add a reference to the statement with the following URL as the source: https://www.biography.com/astronaut/mae-c-jemison

3.4 Add Statements - Add ID to Mae Jemison


Task: Support a person with their IDs. Give the participants the identifiers and the source page for an ID and let them add it to the Mae Jemison item on the test instance of Wikidata:

3.5 Norms and good practices


Key Points

  • create new items
  • add new statements

Content from Advanced editing


Last updated on 2023-04-24

Overview

Questions

  • How to automatically add statements with sourcemd and quickstatements?

Objectives

  • Be familiar with some tools for editing, e.g. TABernacle, Wikidata Games, QuickStatements, Source MetaData or Author Disambiguator/Author resolver.

4.1 Disclaimer


The tools are under heavy development, so they might change or not work as expected. If that happens, just move on to the next episode.

4.2 Introduction


Now we will work in the production version. We will use a DOI to automatically add an article to Wikidata via sourcemd. If you are familiar with the life sciences, you can use our example with PubMed for finding the DOIs of new articles; optionally, you can choose a journal related to your own field. Sourcemd gets its metadata from Crossref; see also sourcemd:instructions.

Potential open access journal:

4.3 Adding statements via sourcemd and quickstatements


Go to PubMed, scroll down to “latest literature” and select an article:

latest_articles

Save the DOI, PMID or PMCID of the article:

choose_doi

Go to sourcemd and paste the DOI or PMID into the search field:

paste_into_sourcemd

Click on “check source”. Now you can see automatically generated statements including meta data of the article like author names or date of publication. Click on “Open in QuickStatements”.

open quickstatements

A new window with QuickStatements will pop up. Now you’ll get an overview of the new item and its statements. Confirm the changes by hitting the “run” button:

run_editing
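
To get a feeling for what is handed over to QuickStatements, here is a rough sketch that builds a few commands in the tab-separated text format described on the QuickStatements help page (CREATE starts a new item, LAST refers to the item just created, Len and Den set the English label and description). The function name and the sample article are hypothetical, and the exact set of statements that sourcemd generates will differ.

```python
def quickstatements_for_article(label, description, doi):
    """Build tab-separated QuickStatements commands that create one article item."""
    def quoted(text):
        # Keep the sketch simple: refuse characters that would need escaping.
        if '"' in text or "\t" in text:
            raise ValueError("double quotes and tabs are not supported in this sketch")
        return '"' + text + '"'

    commands = [
        ["CREATE"],
        ["LAST", "Len", quoted(label)],          # English label
        ["LAST", "Den", quoted(description)],    # English description
        ["LAST", "P31", "Q13442814"],            # instance of: scholarly article
        ["LAST", "P356", quoted(doi)],           # DOI
    ]
    return "\n".join("\t".join(parts) for parts in commands)


# Hypothetical article data; the DOI below does not refer to a real publication.
commands = quickstatements_for_article(
    "An example article", "scholarly article (hypothetical)", "10.1234/EXAMPLE"
)
print(commands)
```

Please do not run made-up commands like these against the production version of Wikidata; they are only meant to show the shape of the input that the “run” button processes.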

4.4 (OPTIONAL) Converting “author strings” to “author”


Find Author Strings Author Disambiguator

Key Points

  • First key point. (FIXME)

Content from Introduction to querying


Last updated on 2023-04-24

Overview

Questions

  • What is SPARQL?
  • How to use SPARQL to query Wikidata?
  • How to use Wikidata querying tools?

Objectives

  • Know what a query language is, and how SPARQL differs from SQL.
  • Be able to use SPARQL to query Wikidata.
  • Potentially be able to use a tool like TABernacle to edit based on a query.
  • Have a cursory knowledge of the plethora of Wikidata querying tools and how they can be used by librarians.
  • Know the purpose and usefulness of maintenance queries for identifying missing information.
  • Be able to create maintenance queries.

FIXME

There are different ways to query information in Wikidata. The simplest way is to search for an entry in Wikidata and look up all the information for that entry, e.g. search for Richard Feynman. By default, this search looks in the Q-pages as well as the P-pages. However, we can restrict a search to the P-pages only, e.g. if we want to check whether there is a property for the ISBN. Moreover, for a given entry it is always possible to see the other pages that link to it (e.g. because they use it as an object), e.g. all pages linking to Richard Feynman: https://www.wikidata.org/wiki/Special:WhatLinksHere/Q39246

That is not much different from other searches you may be familiar with. However, the real potential of Wikidata as a huge knowledge graph can be experienced through more advanced querying with the Wikidata Query Service, where the queries have to be entered in SPARQL.

% To discover Wikidata objects nearby there is the nearby search: % https://www.wikidata.org/wiki/Special:Nearby

5.1 What is SPARQL?


SPARQL is a query language for RDF data and has been a W3C recommendation since 2008. The data has to be stored as triples, where the object of one triple can be the subject of another triple. Thus, one can think of the data as a huge knowledge graph in which nodes are connected to other nodes by predicates. For example, here we see all the information about the book “The Meaning of It All” from Wikidata as a graph:

Example for the knowledge graph spanned by one Wikidata item % source: http://tinyurl.com/y267yz5q

However, this is only the graph spanned by one item and its connected entries, which then itself also have more connections, e.g. we can open some links from the author Richard Feynman:

Example for the knowledge graph spanned by one Wikidata item and more details about Feynman % click on that node in the above query

To query data in this knowledge graph with SPARQL, we define graph patterns that we want to search for. The simplest form is a triple in which we replace one of the components with a variable, which is indicated by a string starting with a question mark:

  • Query for the publisher: { wd:Q7750812 wdt:P123 ?publisher . }
  • Query for the connection: { wd:Q7750812 ?property wd:Q353060 . }
  • Query for the publications from Addison-Wesley: { ?book wdt:P123 wd:Q353060 . }
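
What it means to match such a pattern against stored triples can be imitated in a few lines of Python. This is a toy sketch, not how the Wikidata Query Service is implemented: the function name is made up, and the tiny list of triples only reuses identifiers that appear in this lesson (the book Q7750812, its publisher Q353060 via P123, and its author Richard Feynman, Q39246, via P50).

```python
# A tiny, hand-picked excerpt of the knowledge graph, for illustration only.
TRIPLES = [
    ("wd:Q7750812", "wdt:P123", "wd:Q353060"),   # book - publisher - Addison-Wesley
    ("wd:Q7750812", "wdt:P50", "wd:Q39246"),     # book - author - Richard Feynman
]


def match_pattern(pattern, triples):
    """Return one dict of variable bindings for every triple that fits the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable matches anything, but must stay consistent if repeated.
                if binding.get(term, value) != value:
                    break
                binding[term] = value
            elif term != value:
                # A fixed term must be identical to the stored value.
                break
        else:
            results.append(binding)
    return results


print(match_pattern(("wd:Q7750812", "wdt:P123", "?publisher"), TRIPLES))
print(match_pattern(("wd:Q7750812", "?property", "wd:Q353060"), TRIPLES))
print(match_pattern(("?book", "wdt:P123", "wd:Q353060"), TRIPLES))
```

Each call corresponds to one of the three patterns listed above: fixed terms narrow down the triples, and every variable collects whatever value appears in its position.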

5.2 Wikidata Query Service


The Wikidata query service can be found at https://query.wikidata.org/. There is the main window on the right to formulate your query in SPARQL. On the left there is the query helper and at the bottom the result will show up.

We will only cover here SELECT-statements and start by typing

SELECT * WHERE {

}

Hint: It is enough to start typing “SELECT” and then use the auto-completion with Ctrl+Space.

Inside the curly braces you can then place the statements describing the graph pattern you are looking for.

Exercise: Your first SPARQL query

Write your first SPARQL query for the publisher of the above-mentioned book by copying the corresponding pattern from the list above into a SELECT statement.

SELECT * WHERE {
   wd:Q7750812 wdt:P123 ?publisher .
}
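
The same query can also be sent from a script instead of the web interface. The sketch below assumes the public SPARQL endpoint at https://query.wikidata.org/sparql with the format=json parameter; the helper names, the User-Agent string, and the sample response are our own. The sample response is hand-written in the shape of the standard SPARQL JSON results format and is not real output; run_query is defined but not called here because it needs network access.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """SELECT * WHERE {
   wd:Q7750812 wdt:P123 ?publisher .
}"""


def build_url(query):
    """Encode the query as URL parameters for a GET request."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


def values_of(results, variable):
    """Collect the values bound to one variable in a SPARQL JSON result."""
    rows = results["results"]["bindings"]
    return [row[variable]["value"] for row in rows if variable in row]


def run_query(query):
    """Send the query over the network and return the parsed JSON result."""
    # Placeholder User-Agent; replace it with a description of your own script.
    headers = {"User-Agent": "wikidata-lesson-example/0.1 (hypothetical)"}
    with urlopen(Request(build_url(query), headers=headers)) as response:
        return json.load(response)


# Hand-written sample in the SPARQL JSON results format (not real output).
SAMPLE_RESPONSE = {
    "head": {"vars": ["publisher"]},
    "results": {"bindings": [
        {"publisher": {"type": "uri", "value": "http://www.wikidata.org/entity/Q353060"}}
    ]},
}

print(build_url(QUERY))
print(values_of(SAMPLE_RESPONSE, "publisher"))
```

With network access, values_of(run_query(QUERY), "publisher") would apply the same extraction to the live result.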

Showing labels to Q-numbers

Namespaces and Prefixes

Prefixes are short abbreviations used in the Wikidata Query Service. Some prefixes in Wikidata are: wd, wdt, p, ps, bd, etc.

Example:

SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P50 wd:Q23434.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Items should be prefixed with wd: and properties with wdt: .

Namespaces in Wikidata are:

  • Main namespace
  • Property
  • Wikidata: for information and discussions about Wikidata itself, etc.

More conditions

  • publications from Addison-Wesley vs. books from Addison-Wesley vs. books authored by Richard Feynman from Addison-Wesley
  • LIMIT
  • ORDER
  • FILTER
  • OPTIONAL

Exercises

5.3 Try examples


Cats example

SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P31 wd:Q146. # Must be a cat
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } # Helps get the label in your language, if not, then en language
}

Map of libraries

SELECT distinct * WHERE {
  ?item wdt:P31/wdt:P279* wd:Q7075;
        wdt:P625 ?geo .
}

scholarly articles by Alex Bateman

SELECT ?item ?itemLabel ?journalLabel
WHERE
{
  ?item wdt:P31 wd:Q13442814.
  ?item wdt:P50 wd:Q18921408.
  ?item wdt:P1433 ?journal.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

Russian poets

SELECT ?item ?itemLabel ?place ?placeLabel ?coord
WHERE
{
  ?item wdt:P31 wd:Q5.
  ?item wdt:P106 wd:Q49757.
  ?item wdt:P19 ?place.
  ?place wdt:P17 wd:Q159.
  ?place wdt:P625 ?coord.

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

chemicals example

SELECT ?item ?itemLabel WHERE {

  ?item wdt:P31 wd:Q11173, wd:Q12140.

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

SELECT ?item ?itemLabel ?struc ?formula

WHERE {

  ?item wdt:P31 wd:Q11173, wd:Q12140.
  ?item wdt:P117 ?struc.
  ?item wdt:P274 ?formula.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }

}

SELECT ?item ?itemLabel ?formula ?mass ?struc

WHERE {

  ?item wdt:P31 wd:Q11173, wd:Q12140.
  ?item wdt:P117 ?struc.
  ?item wdt:P274 ?formula.
  ?item wdt:P2067 ?mass.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }

}

ORDER BY DESC(?mass)
LIMIT 10

People born in Berlin filtered by year 1970

SELECT ?item ?itemLabel ?dob
WHERE
{
  ?item wdt:P31 wd:Q5.
  ?item wdt:P19 wd:Q64.
  ?item wdt:P569 ?dob.

  FILTER(YEAR(?dob) = 1970)

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}

5.4 More Advanced queries


further links

https://commons.wikimedia.org/wiki/File:Wikidata_Query_Service_in_Brief.pdf
https://www.uni-mannheim.de/media/Einrichtungen/dws/Files_Teaching/Semantic_Web_Technologies/SWT05-SPARQL-v1.pdf
https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial

Key Points

  • First key point. (FIXME)

Content from Advanced bulk updating, bots


Last updated on 2023-04-24

Overview

Questions

  • How do you bulk upload to Wikidata?

Objectives

  • Know how to run a bulk import into Wikidata.
  • Be able to create items and/or claims using quickstatements.
  • Be familiar with the tools used for bulk edits and imports.
  • Be able to articulate how bulk import tools can apply to cataloging and digital science/archive projects.
  • Know how to write effective queries in terms of performance. (TODO: may refine or improve or delete)

FIXME

Learning outcomes


  • Understand how to run a bulk import into Wikidata

  • practice using quickstatements? (module 3 already includes QuickStatements)

  • Be familiar with the tools used for bulk edits and imports

  • Tools for bulk upload: QuickStatements (https://www.wikidata.org/wiki/Help:QuickStatements)

  • Connect bulk import possibilities to cataloging and digital science/archive projects?

  • Understand how to write good queries in terms of performance

6.1 Bulk uploads/harvests (lead to OpenRefine modules)


6.2 Bulk edits


6.3 Bulk creation/harvesting


6.4 Performance


Key Points

  • First key point. (FIXME)