Content from What is Wikidata?
Last updated on 2023-04-24
Overview
Questions
- What are Items and Statements?
- What does the Wikidata interface look like?
- How is Wikidata linked to other Wiki projects?
Objectives
- Feel comfortable describing Wikidata to colleagues.
- Learn about Wikimedia projects (e.g. Wikipedia, Wikimedia Commons) and how Wikidata is related to them.
- Know why linked open data is important in my work as a cataloging or teaching librarian.
- Be able to identify the components of a Wikidata item page, and know how Wikidata is organized and how to navigate it.
What is Wikidata?
Wikidata’s description explains that “Wikidata is a free and open knowledge base that can be read and edited by both humans and machines.” Wikidata functions as the central database for a variety of Wiki projects, including Wikipedia, Wiktionary, and Wikisource, among others.
Most users will be familiar with Wikipedia, which describes itself as “a free encyclopedia, written collaboratively by the people who use it. It is a special type of website designed to make collaboration easy, called a wiki. Many people are constantly improving Wikipedia, making thousands of changes per hour. All of these changes are recorded in article histories and recent changes.”
Wikidata contains various data types (e.g. text, images, quantities, coordinates, geographic shapes, dates). The data can be queried through the Wikidata Query Service using the query language SPARQL, which we will cover later in this lesson. Data is published under the Creative Commons CC0 1.0 Universal Public Domain Dedication. It can be modified, copied, and distributed without asking permission.
Wikidata also contains authority files, bibliographic data, and other content normally managed in library databases.
Importantly, Wikidata can be interlinked to other open data sets on the linked data web.
1.1 Intro interface
Let’s try this out in the next section of this lesson and see if we as humans can simply read the data on Wikidata:
Explore a Wikidata Item page:
- Start by going to the Wikidata Main Page by typing “www.wikidata.org” into your browser. This is what you should see:
Screenshot of Wikidata Main Page
- Now go to the search bar in the top right corner and enter “british library”. This will give you a list of search results. Click the entry that says: “British Library (Q23308) national library of the United Kingdom”. Now you should see the British Library’s item page: https://www.wikidata.org/wiki/Q23308
Let us explore the item British Library (Q23308). The top part of the item page serves to identify the item. It has:
- unique identifier (Q + a number)
- label
- description
- aliases
The bottom part is the “statement” section, which holds the statements about the item. A statement has:
- property (P + a number)
- value
- qualifier (optional)
- references (optional)
Item, property and value together form a so-called triple, which will be explained later.
As you can see, one property can have multiple values. A value, for example of “member of”, can be further specified by qualifiers (not shown on the item British Library).
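The parts of an item page listed above can be pictured as a small data structure. The following is only a minimal sketch for orientation, not Wikidata's actual storage format: the class names `Item` and `Statement` are invented for this lesson, and the property and value in the sample statement are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One statement: a property, a value, and the optional extras."""
    property_id: str                                  # "P" + a number
    value: str                                        # an item ID, a string, a date, ...
    qualifiers: dict = field(default_factory=dict)    # optional
    references: list = field(default_factory=list)    # optional


@dataclass
class Item:
    """The identifying top part of an item page plus its statements."""
    item_id: str                                      # "Q" + a number
    label: str
    description: str
    aliases: list = field(default_factory=list)
    statements: list = field(default_factory=list)


british_library = Item(
    item_id="Q23308",
    label="British Library",
    description="national library of the United Kingdom",
)
# Placeholder statement: "P0" and "Q0" are made-up IDs used only for illustration.
british_library.statements.append(Statement(property_id="P0", value="Q0"))

print(british_library.item_id, british_library.label)
```

Comparing this sketch with the real item page can help to spot which part of the page is the label, which is the description, and where the statements start.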
All these new terms, such as statements and qualifiers, can be confusing. If you are not sure, you can check this overview graphic: https://upload.wikimedia.org/wikipedia/commons/a/ae/Datamodel_in_Wikidata.svg
Usually pages can be edited by anyone by clicking the pen on the upper right. The item Q23308 (British Library) is semi-protected. Don’t worry if you make a mistake: you can always go back to an earlier version in the history.
- “View history” - more later
- “Log in” and other things for registered users
All structured data is under the Creative Commons CC0 License: “The person who associated a work with this deed has dedicated the work to the public domain by waiving all of his or her rights to the work worldwide under copyright law, including all related and neighboring rights, to the extent allowed by law. You can copy, modify, distribute and perform the work, even for commercial purposes, all without asking permission.” from https://creativecommons.org/publicdomain/zero/1.0/
Further link: a description of Wikidata on one page (visual)
1.2 Play games to open
- Visit the Wikipedia page of the city you were born in, in two languages of your choice (you can choose a different language version on the left side of a Wikipedia page), and look at the size of the population. Are the numbers the same in the different languages? Visit the item in Wikidata.
1.3 Wikidata Item Easter eggs
1.4 Linking Wikidata to other Wiki resources
- Link from Wikipedia to Wikidata
- e.g. https://en.wikipedia.org/wiki/On_the_Origin_of_Species
- => Follow the link “Wikidata item” on the left side under “tools”
- => https://www.wikidata.org/wiki/Q20124
- => the Wikipedia article is linked on the Wikidata’s item page. You can find it on the right side.
- => link to Wikimedia Commons and Wikisource
- e.g. https://en.wikipedia.org/wiki/On_the_Origin_of_Species
Key Points
- Item
- Statement
Content from Underlying concepts of Wikidata
Last updated on 2023-04-24
Overview
Questions
- What is an RDF triple?
- What are the underlying components of RDF?
Objectives
- Know what a triple is, and relate the structure of a Wikidata statement to the traditional metadata field structure.
- Know how linked data can create more context for patrons/users in library catalogs.
- Know how linked data can improve recall in library catalogs. (TODO: Check if we want to address this here).
2.1 Concepts foundations: ways of storing data.
There are many types of databases. The most common types are:
2.1.1 Relational databases:
A relational database is a set of formally described, related tables
from which data can be accessed or reassembled. This model organizes
data into one or more tables (or “relations”) of columns and rows, with
a unique key identifying each row. Each table/relation represents one
“entity type”, and these entities are connected via constrained
relationships. This model is fully structured and mostly uses SQL
(Structured Query Language) to retrieve and manipulate data.
Examples:
2.1.2 Graph / Semantic databases
The Semantic Web is an extension of the World Wide Web through standards that promote common data formats and exchange protocols on the Web. Most fundamentally, the Resource Description Framework (RDF) is used to store data. RDF data is usually read with SPARQL (SPARQL Protocol and RDF Query Language), while relational databases use SQL (Structured Query Language). In relational database terms, RDF data can also be viewed as a table with only three columns: the subject column, the predicate column, and the object column.
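The three-column view can be tried out with a few lines of Python. This is a minimal sketch that assumes an in-memory SQLite database as a stand-in for a relational database; the table name `triples` is invented here, and the rows are the John example used in section 2.2.

```python
import sqlite3

# An in-memory relational table with exactly three columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("John", "is a", "person"),
        ("John", "born in", "1980"),
        ("John", "works as", "Engineer"),
    ],
)

# "What does John work as?" expressed in SQL against the three columns.
rows = conn.execute(
    "SELECT object FROM triples WHERE subject = ? AND predicate = ?",
    ("John", "works as"),
).fetchall()
print(rows)
```

A real triplestore is not implemented this way, but the sketch shows why every fact fits the same subject, predicate, object shape regardless of what it describes.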
2.2 Concepts foundations (RDF and RDF triples)
RDF is a conceptual data model. It is based on the idea of making statements about resources in expressions of the form subject–predicate–object, known as triples.
The subject denotes the resource. The predicate denotes a trait or aspect of the resource and expresses a relationship between the subject and the object. For example: John - is a - person, John - born in - 1980, John - works as - Engineer.
RDF data is stored in databases known as triplestores.
2.3 Underlying components
- Items
Items represent subjects such as Douglas Adams and have identifiers that start with the letter “Q”, like Q42 for Douglas Adams.
Each item must have a name (label) in one or more languages, and can optionally have alternative names (aliases) and a description.
- Properties
Properties represent attributes of the subject, such as occupation, and have identifiers that start with the letter “P”, like P106 for occupation.
- Claims
Claims are the triples. A claim combines an item, a property and a value, such as: Douglas Adams (Q42) - occupation (P106) - comedian.
Note: the value can itself be an item already stored in Wikidata. In that case the Q number of that item is recorded as the value.
- Statement
A claim is part of a statement. A statement also includes references, ranks, and qualifiers.
- References
Used to store the source of the claim, using properties such as “stated in” and “quote”.
- Ranks
A useful component to mark outdated claims.
- Qualifiers
Qualifiers are basically properties, but applied to claims rather than to items.
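To make ranks more concrete, here is a minimal sketch of how they can be used to pick the current value among several claims. It assumes the three Wikidata rank names "preferred", "normal" and "deprecated". The class `RankedStatement`, the helper `best_statements`, the item ID "Q0" and the population figures are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class RankedStatement:
    item_id: str
    property_id: str
    value: str
    rank: str = "normal"   # "preferred", "normal" or "deprecated"


def best_statements(statements):
    """Return the preferred statements if there are any, otherwise the normal ones.

    Deprecated statements are never returned.
    """
    preferred = [s for s in statements if s.rank == "preferred"]
    if preferred:
        return preferred
    return [s for s in statements if s.rank == "normal"]


# Invented figures for a hypothetical city "Q0"; P1082 is the population property.
population = [
    RankedStatement("Q0", "P1082", "1000", rank="normal"),      # older figure
    RankedStatement("Q0", "P1082", "1200", rank="preferred"),   # current figure
    RankedStatement("Q0", "P1082", "9999", rank="deprecated"),  # known to be wrong
]

print([s.value for s in best_statements(population)])
```

The older figure stays on the item as a normal-rank claim, so history is kept, while the preferred rank marks the value that should be shown first.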
Is data stored in the RDF triple format part of your work as a librarian?
Take some time to think about whether data stored in the RDF triple format is part of your work as a librarian. Can you give an example in the format of an RDF triple?
TO DO: PLEASE ADD A REAL LIFE EXAMPLE
Point out one RDF triple on the Wikidata item page of former astronaut Mae Jemison.
Go to the Wikidata page of Mae Jemison and point out one RDF triple. An RDF triple consists of a subject, a predicate and an object. Can you assign the three corresponding Wikidata terms?
Go to Wikidata and either search for “Mae Jemison” or enter the ID
Q34091. In the picture below the statement “Mae C. Jemison -
part of - NASA Astronaut Group 12” is an RDF triple.
Screenshot of Wikidata Main Page
2.4 Scholia - a web service with Wikidata as underlying database
- Introduction with The Linked Open Data Cloud
- the structure enables queries
- reference to DBPedia
- you can build your own web services with Wikidata as database, e.g. Scholia
- e.g. search for Alex Bateman
2.5 Wikidata one pager
2.6 How Wikidata compares with other data sets
FIXME
Key Points
- First key point. (FIXME)
Content from Introduction to editing
Last updated on 2023-04-24
Overview
Questions
- How to create and edit a Wikidata item?
Objectives
- Be able to create and edit a Wikidata entry.
- Know properties and relations, and where to find lists of approved properties and relations.
- Be able to add new statements that link to other items.
- Be aware of property constraints.
- Know community norms around Wikidata and why they are important.
- Be able to add references appropriately.
- Know what identifiers are and how to add them to a statement.
- Know different stable identifiers (e.g. ORCID for authors, DOI for works) and why it makes sense to use them as properties.
- Know the correct use of properties.
3.1 Introduction
Here we will work in the test instance of Wikidata so you will not break anything. Also keep in mind that the editing history is kept in Wikidata, so errors can also be easily fixed there. The test instance is cleaned regularly. You can quickly figure out whether you are on the production Wikidata instance (colored logo) or the test version (black-and-white logo).
3.2 Create new items
In the following we will create new items. To avoid filling Wikidata with test entries, we will use the test instance (https://test.wikidata.org/) and not the official, production version (https://wikidata.org/).
Go to the test instance at https://test.wikidata.org/
Click the “Create a new Item” link on the left side. You will see a form that looks like this:
- Please fill in the form. You can add an entry about anything you want, like a book, a research article or an author. We will create an entry for Mae Jemison, an American engineer, physician and NASA astronaut. You can also add yourself (if you feel famous enough). We choose “en” in the Language drop-down menu, write “Mae Jemison” in the Label field, “an American engineer, physician and NASA astronaut.” in the Description field and “Mae Carol Jemison” in the Aliases field.
- Once we are done, we click “Create”. You should see your newly created item. The URL, the address shown in your web browser, should end with “Q” and a number that is unique for this entry.
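If you collect the URLs of the items you create, the identifier can be read off the end of each URL. A minimal sketch, assuming the URL ends with the identifier as described above; the function name `item_id_from_url` is made up for this example.

```python
import re


def item_id_from_url(url):
    """Return the trailing Q identifier of an item URL, or None if there is none."""
    match = re.search(r"/(Q[1-9][0-9]*)$", url.rstrip("/"))
    return match.group(1) if match else None


print(item_id_from_url("https://www.wikidata.org/wiki/Q34091"))  # Q34091
print(item_id_from_url("https://test.wikidata.org/"))            # None
```

Note that the number you get on the test instance will differ from the one on the production site, because the two instances number their items independently.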
You can compare the entry that you have generated on the test instance with the current version of the item in Wikidata (Q34091).
3.3 Add Statements - birth reference
Why Wikidata uses references: Like in Wikipedia it is important that content can be verified by others to make sure it is correct and comes from a reliable source of information, such as a book, scientific publication, or newspaper article. A Reference (or source) is used to point to specific sources that back up a claim in Wikidata. A reference can be a link to a URL or an item; for example, a book. Wikidata does not aim to answer the question of whether a statement is correct, but only whether the statement appears in a reference.
Task: Support a statement by a reference
- Add the birth date (October 17, 1956) of Mae Jemison as a statement using property P569 “date of birth” to the “Mae Jemison” item you created above.
- Afterwards add a reference to the statement with the following URL as the source: https://www.biography.com/astronaut/mae-c-jemison
3.4 Add Statements - Add ID to Mae Jemison
Task: Support a person by its IDs. Give the participants the identifiers and the source page for each ID and let them add it to the Mae Jemison item on the test instance of Wikidata:
- VIAF ID
- identifier: 33699121
- source page: https://viaf.org/viaf/33699121/
- Library of Congress authority ID
- identifier: n95004729
- source page: http://id.loc.gov/authorities/names/n95004729.html
- IMDb ID
- identifier: nm0420648
- source page: https://www.imdb.com/name/nm0420648/
Side note:
- ORCID is an often used ID. In this case Mae Jemison doesn’t have one, but it’s good to mention ORCID anyway.
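The relation between an identifier and its source page is regular: the identifier is inserted into a fixed URL pattern. The sketch below only reuses the three source pages listed above. The dictionary `URL_PATTERNS` and the function `source_page` are invented for this example, and the patterns are read off those URLs rather than taken from any official list.

```python
# URL patterns derived from the three source pages given in the task above.
URL_PATTERNS = {
    "VIAF ID": "https://viaf.org/viaf/{id}/",
    "Library of Congress authority ID": "http://id.loc.gov/authorities/names/{id}.html",
    "IMDb ID": "https://www.imdb.com/name/{id}/",
}


def source_page(id_type, identifier):
    """Build the source page URL for an identifier of the given type."""
    return URL_PATTERNS[id_type].format(id=identifier)


print(source_page("VIAF ID", "33699121"))
print(source_page("Library of Congress authority ID", "n95004729"))
```

This is also why stable identifiers are useful as statement values: storing the short identifier is enough to get to the external record.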
3.5 Norms and good practices
- Customization of languages for user interface
- Wikidata “item” vs. “article” vs. “entry”
- Policies for
- Books
Key Points
- create new items
- add new statements
Content from Advanced editing
Last updated on 2023-04-24
Overview
Questions
- How to automatically add statements with sourcemd and quickstatements?
Objectives
- Be familiar with some tools for editing, e.g. TABernacle, Wikidata Games, QuickStatements, Source MetaData or Author Disambiguator/Author resolver.
4.1 Disclaimer
The tools are under heavy development, so they might change or not work as expected. If that happens, just move on to the next episode.
4.2 Introduction
Now we will work in the production version of Wikidata. We will use a DOI to automatically add an article to Wikidata via sourcemd. If you are familiar with the life sciences, you can use our example with PubMed to find the DOIs of new articles. Optionally, you can choose a journal related to your own scientific field. Sourcemd gets its metadata from Crossref; see also sourcemd:instructions.
Potential open access journal:
4.3 Adding statements via sourcemd and quickstatements
Go to PubMed, scroll down to “latest literature” and select an article:
Save the DOI, PMID or PMCID of the article:
Go to sourcemd and paste the DOI or PMID into the search field:
Click on “check source”. Now you can see automatically generated statements including metadata of the article, like author names or the date of publication. Click on “Open in QuickStatements”.
A new window with QuickStatements will pop up. Now you’ll get an overview of the new item and its statements. Confirm the changes by hitting the “run” button.
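To get a feeling for what QuickStatements receives, the following sketch builds a small batch of commands as text. It assumes the tab-separated version 1 command syntax described on the QuickStatements help page (`CREATE`, then `LAST` lines, with `Len` and `Den` for the English label and description). The function name `quickstatements_v1` is made up, and the batch is only printed, never sent anywhere. Check the help page before running real batches, since the tool is under development.

```python
def quickstatements_v1(label, description, statements, lang="en"):
    """Build tab-separated commands that create one item and add statements to it."""
    lines = ["CREATE"]
    lines.append(f'LAST\tL{lang}\t"{label}"')        # label in the given language
    lines.append(f'LAST\tD{lang}\t"{description}"')  # description in the given language
    for property_id, value in statements:
        lines.append(f"LAST\t{property_id}\t{value}")
    return "\n".join(lines)


batch = quickstatements_v1(
    "Mae Jemison",
    "American engineer, physician and NASA astronaut",
    [("P31", "Q5")],  # instance of (P31): human (Q5), as used in the query episode
)
print(batch)
```

Sourcemd produces a much longer batch of this kind for an article, which is why it is worth reading the overview before confirming the run.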
4.4 (OPTIONAL) Converting “author strings” to “author”
Find Author Strings Author Disambiguator
Key Points
- First key point. (FIXME)
Content from Introduction to querying
Last updated on 2023-04-24
Overview
Questions
- What is SPARQL?
- How to use SPARQL to query Wikidata?
- How to use Wikidata querying tools?
Objectives
- Know what a query language is, and how SPARQL differs from SQL.
- Be able to use SPARQL to query Wikidata.
- Potentially be able to use a tool like TABernacle to edit based on a query.
- Have a cursory knowledge of the plethora of Wikidata querying tools and how they can be used by librarians.
- Know the purpose and usefulness of maintenance queries for identifying missing information.
- Be able to create maintenance queries.
FIXME
There are different ways to query information in Wikidata. The simplest way is to search for an entry in Wikidata and look up all the information for that entry, e.g. search for Richard Feynman. By default this search looks in the Q-pages as well as the P-pages. However, we can restrict a search to the P-pages only, e.g. if we want to find out whether there is a property for the ISBN. Moreover, for a given entry it is always possible to see the other pages which link to it (e.g. because they use it as an object), e.g. all pages linking to Richard Feynman: https://www.wikidata.org/wiki/Special:WhatLinksHere/Q39246
That is not much different from other searches you may be familiar with. However, the real potential of Wikidata as a huge knowledge graph can be experienced through more advanced querying with the Wikidata Query Service, where the queries have to be entered in SPARQL.
% To discover Wikidata objects nearby there is the nearby search: % https://www.wikidata.org/wiki/Special:Nearby
5.1 What is SPARQL?
SPARQL is a query language for RDF data and has been a W3C recommendation since 2008. The data has to be stored as triples, where the object of one triple can be the subject of another triple. Thus, one can think of a huge knowledge graph, where the nodes are connected to other nodes by the predicates. For example, here we see all the information about the book “The Meaning of It All” from Wikidata as a graph:
% source: http://tinyurl.com/y267yz5q
However, this is only the graph spanned by one item and its connected entries, which then itself also have more connections, e.g. we can open some links from the author Richard Feynman:
% click on that node in the above query
To query data in this knowledge graph with SPARQL, we define graph patterns that we want to search for. The simplest form is a triple in which we replace one of the components with a variable, which is indicated by a string starting with a question mark:
- Query for the publisher:
{ wd:Q7750812 wdt:P123 ?publisher . }
- Query for the connection:
{ wd:Q7750812 ?property wd:Q353060 . }
- Query for the publications from Addison-Wesley:
{ ?book wdt:P123 wd:Q353060 . }
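What the query service does with such a pattern can be imitated on a tiny scale. The following is a rough sketch, not how a SPARQL engine is really implemented: the function `match` is invented for this lesson, and the two triples are the facts about the book mentioned above (its publisher and its author, Richard Feynman, Q39246).

```python
def match(pattern, triples):
    """Return one dict of variable bindings for every triple that fits the pattern."""
    results = []
    for triple in triples:
        bindings = {}
        for want, have in zip(pattern, triple):
            if want.startswith("?"):
                # A variable fits anything, but must keep the same value if repeated.
                if bindings.get(want, have) != have:
                    break
                bindings[want] = have
            elif want != have:
                break  # a fixed component does not match this triple
        else:
            results.append(bindings)
    return results


triples = [
    ("wd:Q7750812", "wdt:P123", "wd:Q353060"),  # book - publisher - Addison-Wesley
    ("wd:Q7750812", "wdt:P50", "wd:Q39246"),    # book - author - Richard Feynman
]

print(match(("wd:Q7750812", "wdt:P123", "?publisher"), triples))
print(match(("wd:Q7750812", "?property", "wd:Q353060"), triples))
print(match(("?book", "wdt:P123", "wd:Q353060"), triples))
```

Each of the three calls corresponds to one of the three patterns above: the fixed components select the triples, and the variable collects whatever stands in its position.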
5.2 Wikidata Query Service
The Wikidata query service can be found at https://query.wikidata.org/. There is the main window on the right to formulate your query in SPARQL. On the left there is the query helper and at the bottom the result will show up.
We will only cover SELECT statements here and start by typing
SELECT * WHERE {
}
Hint: It is enough to start typing “SELECT” and then use the auto-completion with Ctrl+Space. % TODO what is this for on a Mac?
Inside the curly braces you can then place the statements describing the graph pattern you are looking for.
Exercise: Your first SPARQL query
Write your first SPARQL query for the publisher of the above-mentioned book by copying the publisher pattern from above into a SELECT statement.
SELECT * WHERE {
wd:Q7750812 wdt:P123 ?publisher .
}
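The same query can also be sent from a program instead of the web form. The sketch below only builds the request URL and does not contact the server. It assumes the SPARQL endpoint of the query service at https://query.wikidata.org/sparql with the parameters `query` and `format`; the helper name `request_url` is invented here.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """SELECT * WHERE {
  wd:Q7750812 wdt:P123 ?publisher .
}"""


def request_url(query):
    """Build a GET URL that asks the endpoint for JSON results of the query."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


url = request_url(QUERY)
print(url)

# Decoding the URL again gives back the original query text.
print(parse_qs(urlparse(url).query)["query"][0] == QUERY)
```

Opening the printed URL in a browser, or fetching it with an HTTP library of your choice, should return the result as JSON rather than as a table. When you send requests from a script, be considerate of the shared service and follow its usage guidelines.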
Namespaces and Prefixes
Prefixes are short abbreviations for namespaces in the Wikidata Query Service. Some prefixes in Wikidata are: wd, wdt, p, ps, bd, etc.
Example:
SELECT ?item ?itemLabel
WHERE
{
?item wdt:P50 wd:Q23434.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Items should be prefixed with wd: and properties with wdt: .
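Behind each prefix stands a full web address, and a prefixed name is just a shorthand for it. The sketch below expands a few prefixed names. The mapping lists the namespace addresses as they are predefined in the Wikidata Query Service, to the best of our knowledge; the function `expand` is made up for this example.

```python
# Prefix -> namespace address, as predefined in the Wikidata Query Service.
PREFIXES = {
    "wd": "http://www.wikidata.org/entity/",
    "wdt": "http://www.wikidata.org/prop/direct/",
    "p": "http://www.wikidata.org/prop/",
    "ps": "http://www.wikidata.org/prop/statement/",
    "bd": "http://www.bigdata.com/rdf#",
}


def expand(prefixed_name):
    """Turn a prefixed name such as 'wd:Q23434' into its full address."""
    prefix, local_name = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local_name


print(expand("wd:Q23434"))
print(expand("wdt:P50"))
```

The query service knows these prefixes already, which is why the example queries in this lesson work without any PREFIX declarations at the top.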
Namespaces in Wikidata are:
- Main namespace
- Property
- Wikidata: it is for information and discussions about Wikidata itself. etc.
5.3 Try examples
Cats example
SELECT ?item ?itemLabel
WHERE
{
?item wdt:P31 wd:Q146. # Must be a cat
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". } # Helps get the label in your language, if not, then en language
}
Map of libraries
SELECT distinct * WHERE {
?item wdt:P31/wdt:P279* wd:Q7075;
wdt:P625 ?geo .
}
scholarly articles by Alex Bateman
SELECT ?item ?itemLabel ?journalLabel
WHERE
{
?item wdt:P31 wd:Q13442814.
?item wdt:P50 wd:Q18921408.
?item wdt:P1433 ?journal.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE], en". }
}
Russian poets
SELECT ?item ?itemLabel ?place ?placeLabel ?coord
WHERE
{
?item wdt:P31 wd:Q5.
?item wdt:P106 wd:Q49757.
?item wdt:P19 ?place.
?place wdt:P17 wd:Q159.
?place wdt:P625 ?coord.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
chemicals example
SELECT ?item ?itemLabel WHERE {
?item wdt:P31 wd:Q11173, wd:Q12140.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE], en". }
}
SELECT ?item ?itemLabel ?struc ?formula
WHERE {
?item wdt:P31 wd:Q11173, wd:Q12140.
?item wdt:P117 ?struc.
?item wdt:P274 ?formula.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE], en". }
}
SELECT ?item ?itemLabel ?formula ?mass ?struc
WHERE {
?item wdt:P31 wd:Q11173, wd:Q12140.
?item wdt:P117 ?struc.
?item wdt:P274 ?formula.
?item wdt:P2067 ?mass.
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE], en". }
}
ORDER BY DESC(?mass)
LIMIT 10
People born in Berlin filtered by year 1970
SELECT ?item ?itemLabel ?dob
WHERE
{
?item wdt:P31 wd:Q5.
?item wdt:P19 wd:Q64.
?item wdt:P569 ?dob.
FILTER(YEAR(?dob) = 1970)
SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
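When a query like this is run by a program, the answer arrives as JSON in the standard SPARQL results format: a list of variable names under "head" and one "binding" per result row. The sketch below turns such an answer into plain rows. The response text is a hand-written sample with an invented person and the placeholder ID Q0, not real Wikidata output, and the function `rows` is made up for this example.

```python
import json

# Hand-written sample in the SPARQL JSON results format (not real Wikidata data).
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["item", "itemLabel", "dob"]},
  "results": {"bindings": [
    {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q0"},
     "itemLabel": {"type": "literal", "value": "Example Person"},
     "dob": {"type": "literal", "value": "1970-03-05T00:00:00Z"}}
  ]}
}
"""


def rows(response_text):
    """Convert a SPARQL JSON response into a list of {variable: value} dicts."""
    data = json.loads(response_text)
    names = data["head"]["vars"]
    return [
        {name: binding[name]["value"] for name in names if name in binding}
        for binding in data["results"]["bindings"]
    ]


for row in rows(SAMPLE_RESPONSE):
    print(row["itemLabel"], row["dob"][:4])
```

Variables that have no value in a given result are simply missing from that binding, which is why the sketch checks `if name in binding` before reading a value.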
5.4 More Advanced queries
further links
https://commons.wikimedia.org/wiki/File:Wikidata_Query_Service_in_Brief.pdf
https://www.uni-mannheim.de/media/Einrichtungen/dws/Files_Teaching/Semantic_Web_Technologies/SWT05-SPARQL-v1.pdf
https://www.wikidata.org/wiki/Wikidata:SPARQL_tutorial
Key Points
- First key point. (FIXME)
Content from Advanced bulk updating, bots
Last updated on 2023-04-24
Overview
Questions
- How do you bulk upload to Wikidata?
Objectives
- Know how to run a bulk import into Wikidata.
- Be able to create items and/or claims using quickstatements.
- Be familiar with the tools used for bulk edits and imports.
- Be able to articulate how bulk import tools can apply to cataloging and digital science/archive projects.
- Know how to write effective queries in terms of performance. (TODO: may refine or improve or delete)
FIXME
Learning outcomes
Understand how to run a bulk import into Wikidata
practice using quickstatements? (module 3 already includes QuickStatements)
Be familiar with the tools used for bulk edits and imports
Tools for bulk upload:
- QuickStatements (https://www.wikidata.org/wiki/Help:QuickStatements)
Connect bulk import possibilities to cataloging and digital science/archive projects?
Understand how to write good queries in terms of performance
6.1 Bulk uploads/harvests (lead to OpenRefine modules)
6.2 Bulk edits
6.3 Bulk creation/harvesting
6.4 Performance
Key Points
- First key point. (FIXME)