Code Handout - Introduction to R
Last updated on 2023-01-05 | Edit this page
This document contains all of the functions that were covered in the Introduction to R workshop. Each function is presented alongside an example of how it can be used.
Creating Objects
-
<-
– “assignment arrow”, assigns a value (vector, dataframe, single value) to the name of a variable
R
x <- 3
y <- c(1, 2, 3)
z <- x + y
-
c()
– the “concatenate” function combines inputs to form a vector, the values have to be the same data type.
R
animals <- c("bird", "cat", "dog")
numbers <- c(1, 14, 57, 89)
logicals <- c(TRUE, FALSE, TRUE, TRUE)
Inspecting Objects
-
str()
– compact display of the structure of an R object
R
str(animals)
-
class()
– returns the type of element of any R object
R
class(logicals)
-
typeof()
– returns the data type or storage mode of any R object
R
typeof(numbers)
Functions in R
-
args()
– returns the arguments of a function
R
args(round)
- named arguments – the name of the argument the function expects
- You can choose to not name your arguments, if you know the exact order they should be in!
- However, we generally discourage this.
R
## Either of these work, since the digits argument is named explicitly.
round(3.14159, digits = 2)
round(digits = 2, 3.14159)
## This does not work, since the arguments are not named and in the incorrect order.
round(2, 3.14159)
Functions to Summarize Data
-
sqrt()
– returns the square root of a numeric variable
R
sqrt(numbers)
-
mean()
– returns the mean of a numeric variable- You can add the
na.rm
argument, to removeNA
values before calculating the mean.
- You can add the
R
sqrt(numbers)
-
max()
– returns the maximum of a numeric variable- You can add the
na.rm
argument, to removeNA
values before calculating the max.
- You can add the
R
sqrt(numbers)
-
sum()
– returns the sum of a numeric variable- You can add the
na.rm
argument, to removeNA
values before calculating the sum.
- You can add the
R
sqrt(numbers)
-
length()
– returns the length of a vector (of any datatype)
R
length(animals)
Subsetting Data
-
[]
– used to subset elements from a vector
R
animals[3]
## selects the third element
animals[2:3]
## selects the second and third element
animals[c(1, 3)]
## selects the first and third element
- relational operators – return logical values indicating where a
relation is satisfied. The most commonly used logical operators for data
analysis are as follows:
-
==
means “equal to” -
!=
means “not equal to” -
>
or<
means “greater than” or “less than” -
>=
or<=
means “greater than or equal to” or “less than or equal to”
-
R
animals == "dog"
animals != "cat"
numbers > 4
numbers <= 12
- logical operators – join subset criteria together
-
&
means “and” – where two criteria must both be satisfied -
|
means “or” – where at least one criteria must be satisfied
-
R
numbers > 4 & numbers < 20
animals == "dog" | animals == "cat"
-
%in%
– the “inclusion operator”, allows you to test if any of the elements of a search vector (on the left hand side) are found in the target vector (on the right hand side).- The levels of the target vector must be included in a vector
(
c()
).
- The levels of the target vector must be included in a vector
(
R
possessions <- c("car", "bicycle", "radio", "television", "mobile_phone")
possessions %in% c("car", "bicycle", "motorcycle")
Missing Data
-
is.na()
– returns a vector of logical values indicating which elements of a vector haveNA
values- Often combined with
!
, where the!
negates the previous statement (e.g.!TRUE
is equal toFALSE
).
- Often combined with
R
missing <- c(1, 3, NA, 7, 12, NA)
is.na(missing)
!is.na(missing)
-
na.omit()
– removes the observations withNA
values
R
na.omit(missing)
-
complete.cases()
– returns a vector of logical values indicating which elements of a vector are not missing (NA
) values
R
complete.cases(missing)