Code Handout - Introduction to R
Last updated on 2023-01-05
This document contains all of the functions that were covered in the Introduction to R workshop. Each function is presented alongside an example of how it can be used.
Creating Objects
- <- – the “assignment arrow”, which assigns a value (a single value, a vector, or a data frame) to a variable name
R
x <- 3
y <- c(1, 2, 3)
z <- x + y
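A quick check of how assignment behaves. This is a minimal sketch that reuses the names x, y and z from above. The right-hand side of <- is evaluated once, when the assignment runs, so changing x afterwards does not update z.

```r
x <- 3
y <- c(1, 2, 3)
z <- x + y   # x is added to each element of y, so z is 4 5 6
x <- 10      # reassigning x replaces its old value...
z            # ...but z keeps the value it was given: 4 5 6
```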
- c() – the “combine” (or “concatenate”) function joins its inputs into a vector. All elements of a vector must share one data type, so mixed inputs are converted (coerced) to a common type.
R
animals <- c("bird", "cat", "dog")
numbers <- c(1, 14, 57, 89)
logicals <- c(TRUE, FALSE, TRUE, TRUE)
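Because every element of a vector must have the same type, c() quietly coerces mixed inputs rather than raising an error. A small sketch; the names mixed_num and mixed_chr are made up for illustration:

```r
mixed_num <- c(1, TRUE, FALSE)   # logicals become numbers: 1 1 0
mixed_chr <- c(1, "cat", TRUE)   # everything becomes text: "1" "cat" "TRUE"
class(mixed_num)                 # "numeric"
class(mixed_chr)                 # "character"
```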
Inspecting Objects
- str() – displays a compact summary of the structure of an R object
R
str(animals)
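str() is most useful on objects with several parts, such as a data frame, where it lists each column with its type and first few values. A minimal sketch using a small, made-up data frame called pets:

```r
# pets is a hypothetical example data frame
pets <- data.frame(
  animal = c("bird", "cat", "dog"),
  legs = c(2, 4, 4)
)
# Prints one header line for the data frame, then one line per column
str(pets)
```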
- class() – returns the class of an R object (for example "numeric", "character", or "data.frame")
R
class(logicals)
- typeof() – returns the internal data type (storage mode) of an R object
R
typeof(numbers)
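class() and typeof() often agree, but not always: class() reports how R treats an object, while typeof() reports how it is stored internally. A short comparison, recreating numbers and logicals from above; the data frame df is made up for illustration:

```r
numbers <- c(1, 14, 57, 89)
logicals <- c(TRUE, FALSE, TRUE, TRUE)
class(numbers)    # "numeric"
typeof(numbers)   # "double": numbers typed without an L suffix are stored as doubles
class(logicals)   # "logical"
typeof(logicals)  # "logical"
df <- data.frame(numbers)
class(df)         # "data.frame"
typeof(df)        # "list": a data frame is stored as a list of columns
```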
Functions in R
- args() – shows the arguments a function accepts, along with their default values
R
args(round)
- named arguments – arguments supplied with the name the function expects (e.g. digits = 2)
- You can choose not to name your arguments if you know the exact order they should be in.
- However, we generally discourage this, because unnamed arguments are easy to put in the wrong order.
R
## Either of these works, since the digits argument is named explicitly.
round(3.14159, digits = 2)
round(digits = 2, 3.14159)
## This runs, but returns 2 instead of 3.14: the arguments are unnamed and in the wrong order, so x = 2 and digits = 3.14159.
round(2, 3.14159)
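R matches named arguments first and then fills the remaining arguments by position, which is why the first two calls above agree and the third does not. The sketch below makes this explicit; round(3.14159, 2) works only because the values happen to be in the order args(round) lists them (x, then digits). The variable names are illustrative.

```r
by_name     <- round(x = 3.14159, digits = 2)  # both arguments named
by_position <- round(3.14159, 2)               # unnamed, but in the expected order
swapped     <- round(2, 3.14159)               # unnamed and swapped: x = 2, digits = 3.14159
by_name       # 3.14
by_position   # 3.14
swapped       # 2, not the intended 3.14
```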
Functions to Summarize Data
- sqrt() – returns the square root of each element of a numeric vector
R
sqrt(numbers)
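Unlike the summary functions that follow, sqrt() works element by element, so it returns a vector as long as its input rather than a single value. A quick check with perfect squares:

```r
sqrt(c(4, 9, 16))          # 2 3 4
numbers <- c(1, 14, 57, 89)
length(sqrt(numbers))      # 4, one result per element of numbers
```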
- mean() – returns the mean of a numeric vector
  - You can add the argument na.rm = TRUE to remove NA values before calculating the mean.
R
mean(numbers)
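By default, a single NA makes mean() return NA, since the mean of an unknown value is itself unknown. A sketch of the difference na.rm = TRUE makes, using a made-up vector called scores:

```r
scores <- c(2, 4, NA, 6)    # hypothetical example data with one missing value
mean(scores)                # NA
mean(scores, na.rm = TRUE)  # 4, the mean of 2, 4 and 6
```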
- max() – returns the maximum of a numeric vector
  - You can add the argument na.rm = TRUE to remove NA values before calculating the maximum.
R
max(numbers)
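As with mean(), any NA in the input makes max() return NA unless na.rm = TRUE is set. A sketch with a made-up vector called scores:

```r
scores <- c(2, 4, NA, 6)    # hypothetical example data with one missing value
max(scores)                 # NA
max(scores, na.rm = TRUE)   # 6
```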
- sum() – returns the sum of a numeric vector
  - You can add the argument na.rm = TRUE to remove NA values before calculating the sum.
R
sum(numbers)
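sum() handles NA the same way as mean() and max(). It is also handy on logical vectors: TRUE counts as 1 and FALSE as 0, so sum() counts how many values are TRUE. A sketch; scores is made-up example data:

```r
scores <- c(2, 4, NA, 6)    # hypothetical example data with one missing value
sum(scores)                 # NA
sum(scores, na.rm = TRUE)   # 12
logicals <- c(TRUE, FALSE, TRUE, TRUE)
sum(logicals)               # 3 values are TRUE
```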
- length() – returns the number of elements in a vector (of any data type)
R
length(animals)
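length() counts elements, not characters, so a single string has length 1. To count the characters in a string, base R provides nchar(), which was not covered in the workshop and is shown here only for contrast.

```r
animals <- c("bird", "cat", "dog")
length(animals)   # 3 elements
length("bird")    # 1: a single string is one element
nchar("bird")     # 4 characters
```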
Subsetting Data
- [] – square brackets, used to select (subset) elements of a vector by their position
R
animals[3]
## selects the third element
animals[2:3]
## selects the second and third elements
animals[c(1, 3)]
## selects the first and third elements
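Positions in R start at 1, and a negative index drops elements instead of selecting them. A short sketch, recreating animals from above:

```r
animals <- c("bird", "cat", "dog")
animals[1]         # "bird": the first element is at position 1
animals[-1]        # "cat" "dog": everything except the first element
animals[-c(1, 3)]  # "cat": everything except the first and third elements
```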
- relational operators – return logical values indicating where a relation is satisfied. The most commonly used relational operators for data analysis are:
  - == means “equal to”
  - != means “not equal to”
  - > or < means “greater than” or “less than”
  - >= or <= means “greater than or equal to” or “less than or equal to”
R
animals == "dog"
animals != "cat"
numbers > 4
numbers <= 12
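The logical vectors returned by relational operators can go inside [] to keep only the elements where the comparison is TRUE. A sketch, recreating animals and numbers from above:

```r
animals <- c("bird", "cat", "dog")
numbers <- c(1, 14, 57, 89)
numbers[numbers > 4]        # 14 57 89
animals[animals != "cat"]   # "bird" "dog"
```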
- logical operators – combine subsetting criteria
  - & means “and”, where both criteria must be satisfied
  - | means “or”, where at least one criterion must be satisfied
R
numbers > 4 & numbers < 20
animals == "dog" | animals == "cat"
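Combined criteria can subset a vector in the same way: each comparison is evaluated element by element, the results are joined with & or |, and [] keeps the positions that come out TRUE.

```r
animals <- c("bird", "cat", "dog")
numbers <- c(1, 14, 57, 89)
numbers[numbers > 4 & numbers < 20]            # 14
animals[animals == "dog" | animals == "cat"]   # "cat" "dog"
```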
- %in% – the “inclusion operator”, tests whether each element of a search vector (on the left-hand side) is found in the target vector (on the right-hand side), returning one logical value per element of the search vector.
  - When you type the target values directly, they must be combined into a vector with c().
R
possessions <- c("car", "bicycle", "radio", "television", "mobile_phone")
possessions %in% c("car", "bicycle", "motorcycle")
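Because %in% returns one TRUE or FALSE per element of the left-hand vector, it can be used inside [] as well. It is also a shorter way to write several == comparisons joined with |. The names owned_vehicles and long_form are made up for illustration.

```r
possessions <- c("car", "bicycle", "radio", "television", "mobile_phone")
owned_vehicles <- possessions[possessions %in% c("car", "bicycle", "motorcycle")]
owned_vehicles   # "car" "bicycle"
# The same test written as a chain of == comparisons joined with |
long_form <- possessions == "car" | possessions == "bicycle" | possessions == "motorcycle"
identical(possessions %in% c("car", "bicycle", "motorcycle"), long_form)   # TRUE
```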
Missing Data
- is.na() – returns a vector of logical values indicating which elements of a vector are NA
  - Often combined with !, which negates the logical value that follows it (e.g. !TRUE is equal to FALSE).
R
missing <- c(1, 3, NA, 7, 12, NA)
is.na(missing)
!is.na(missing)
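Putting !is.na() inside [] keeps only the values that are not missing, and since TRUE counts as 1, sum(is.na()) counts the missing values. A sketch, recreating missing from above:

```r
missing <- c(1, 3, NA, 7, 12, NA)
missing[!is.na(missing)]   # 1 3 7 12
sum(is.na(missing))        # 2 missing values
```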
- na.omit() – returns the object with its NA values removed (for a data frame, every row that contains an NA is dropped)
R
na.omit(missing)
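na.omit() also works on data frames, where it drops every row containing at least one NA. On a vector, the result carries an extra na.action attribute recording which positions were removed; as.vector() strips it if you only want the values. A sketch with a small, made-up data frame called survey:

```r
# survey is a hypothetical example data frame
survey <- data.frame(
  id = c(1, 2, 3),
  score = c(10, NA, 7)
)
na.omit(survey)                  # keeps rows 1 and 3
nrow(na.omit(survey))            # 2
as.vector(na.omit(c(1, NA, 3)))  # 1 3, without the na.action attribute
```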
- complete.cases() – returns a vector of logical values indicating which cases (elements of a vector, or rows of a data frame) have no missing (NA) values
R
complete.cases(missing)
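complete.cases() is most useful on a data frame, where it returns one TRUE or FALSE per row; placing that result in the row position of [ , ] keeps only the complete rows. For a plain vector it gives the same result as !is.na(). A sketch with a small, made-up data frame called survey:

```r
# survey is a hypothetical example data frame
survey <- data.frame(
  id = c(1, 2, 3),
  score = c(10, NA, 7)
)
complete.cases(survey)             # TRUE FALSE TRUE
survey[complete.cases(survey), ]   # rows 1 and 3
missing <- c(1, 3, NA, 7, 12, NA)
identical(complete.cases(missing), !is.na(missing))   # TRUE
```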