Preparation: First steps with R

For Session 3

Authors

Kim Antunez, François Briatte

Go through the code and exercises below.

Feel free to skip those exercises marked as harder.

Packages

Install a package (required only once). Comment out the line once it is done.

install.packages("remotes")

Once a package is installed, load it (required once per work session).

library(remotes)
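If you would rather not comment lines in and out, one possible pattern is to install a package only when it is missing. The sketch below assumes a helper called `install_if_missing`, which is a made-up name for illustration and not part of any package; `requireNamespace()` and `install.packages()` are regular R functions.

```r
# Hypothetical helper: install a package only if it is not installed yet
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # returns TRUE (invisibly) if the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# {stats} ships with R, so nothing gets installed here
install_if_missing("stats")
```

You would still need to call library() to load the package in each work session.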

Warnings and errors

Code will sometimes produce messages: text printed in colour to let you know what happened (or what did not happen) when you ran the code.

Some code will generate warnings:

remotes::install_cran("imaginary_package")
Installing 1 packages: imaginary_package
Installing package into '/Users/runner/work/_temp/Library'
(as 'lib' is unspecified)
Warning: package 'imaginary_package' is not available for this version of R

A version of this package for your version of R might be available elsewhere,
see the ideas at
https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages

Some code will generate errors:

remotes::imaginary_function("whatever")

All messages are printed in the same colour, but only errors will stop your code from running.

One of the most common errors is forgetting to set the working directory to the folder that actually contains the files that you are trying to use.

Functions, arguments

When you executed the install_cran function above, which comes from the {remotes} package, R tried to install a package named “imaginary_package”; that is because you passed “imaginary_package” to it as an argument. Passing “tidyverse” instead would install the {tidyverse} package.

The force argument is optional and defaults to FALSE; set it to TRUE to force the installation of the package even if it already exists in your “library” (folder) of packages.
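To see how an optional argument with a default value behaves, here is a minimal sketch using a made-up function. `greet` and its arguments `name` and `shout` are hypothetical names chosen for illustration; they play the same roles as the package name and `force` do for install_cran.

```r
# Hypothetical function: `name` is required, `shout` is optional (default: FALSE)
greet <- function(name, shout = FALSE) {
  greeting <- paste("hello", name)
  if (shout) {
    greeting <- toupper(greeting)
  }
  greeting
}

greet("world")               # uses the default: "hello world"
greet("world", shout = TRUE) # overrides the default: "HELLO WORLD"
```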

Example of the package::function(argument) syntax:

dplyr::nth(LETTERS, 5)
[1] "E"

Exercise 1

What do you need to do in order for the following line of code to work, and what does that line of code actually do?

nth(LETTERS, 6)

You need to load the package first.

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
nth(LETTERS, 6)
[1] "F"

Math and Logic

Try these lines of code on your computer!

TRUE == 1
[1] TRUE
FALSE == 0
[1] TRUE
2^3 >= 2 * 2 * 2
[1] TRUE
2^3 != 8 # note: `!` means `not` in R, as in many programming languages
[1] FALSE

Exercise 2

Compute the Body Mass Index (BMI) of the following person:

  • height: 1.76 m
  • weight: 67.5 kg

Is this person clinically overweight (BMI >= 25)?

The BMI is the following:

BMI <- 67.5 / (1.76^2)
BMI 
[1] 21.79106

This person is not overweight:

BMI <- 67.5 / (1.76^2)
BMI >= 25
[1] FALSE
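If you had to compute several BMIs, you could wrap the formula in a function. This is only a sketch: `bmi`, `weight_kg` and `height_m` are names made up for illustration.

```r
# Hypothetical helper: BMI = weight (in kg) divided by height (in m) squared
bmi <- function(weight_kg, height_m) {
  weight_kg / height_m^2
}

bmi(67.5, 1.76)       # same value as above
bmi(67.5, 1.76) >= 25 # FALSE: not overweight
```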

Other special values

Try these lines of code on your computer!

# two functions, one nested in the other
log(exp(1))
[1] 1
# special value: (negative) Infinity
log(0)
[1] -Inf
# special value: Not a Number
log(-1)
Warning in log(-1): NaNs produced
[1] NaN
# special value: Missing (Not Available)
log(0) > log(-1)
Warning in log(-1): NaNs produced
[1] NA

In practice, you will encounter a lot of NAs, some TRUE and FALSE values, and hopefully not much of the rest.

One last special value, rarely encountered, stands for absolute nothingness (neither zero nor missing): NULL.
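The difference between NULL and NA is easier to see by counting values. A short sketch, using only base R functions:

```r
length(NULL)   # 0: NULL holds nothing at all
length(NA)     # 1: NA is one value, which happens to be missing
is.null(NULL)  # TRUE
is.na(NA)      # TRUE
c(1, NULL, 3)  # NULL vanishes: the result has 2 values
c(1, NA, 3)    # NA stays: the result has 3 values
```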

Screen messages

Try these lines of code on your computer!

# innocuous
message("this is fine")
this is fine
# possibly problematic
sqrt(-1)
Warning in sqrt(-1): NaNs produced
[1] NaN
# definitely problematic
sqrt("Nicolas Cage")

Adjust your reaction accordingly: read messages, ask yourself whether everything is fine when you see warnings, and stop when you see errors, which are the only type of message that will stop your code from executing.
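For the curious, R can tell these three kinds of messages apart programmatically. The sketch below is more advanced than anything required at this stage; `what_happened` is a hypothetical helper written for illustration, built on the base R function `tryCatch()`.

```r
# Hypothetical helper: report which kind of condition an expression signals
what_happened <- function(expr) {
  tryCatch(
    {
      expr
      "no warning, no error"
    },
    warning = function(w) "warning",
    error   = function(e) "error"
  )
}

what_happened(message("this is fine")) # prints the message, then "no warning, no error"
what_happened(sqrt(-1))                # "warning"
what_happened(sqrt("Nicolas Cage"))    # "error"
```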

Objects

Try these lines of code on your computer!

# create a vector (sequence) of values
x <- 3:1
x
[1] 3 2 1
# the length of object `x` is its number of values
length(x)
[1] 3
# the object `x` holds numeric values (specifically, integer values)
class(x)
[1] "integer"
# text ('strings' of characters), recognizable by the "double quotes"
class("Nicolas Cage")
[1] "character"
# non-integer numbers (classes and types are not exactly the same thing)
typeof(9/2)
[1] "double"
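To see where class and type agree and where they differ, you can compare a few values side by side. A short sketch, using only base R:

```r
class(3:1)     # "integer"
typeof(3:1)    # "integer"
class(9/2)     # "numeric"
typeof(9/2)    # "double"
class(3)       # "numeric": a plain 3 is stored as a double
class(3L)      # "integer": the L suffix asks for an integer
is.numeric(3L) # TRUE: integers count as numeric too
```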

Exercise 3 [HARD]

Explain the results of the following lines (harder exercise)

#1.
as.character(x)
[1] "3" "2" "1"
#2.
as.integer(9/2)
[1] 4
#3.
as.numeric("Nicolas Cage")
Warning: NAs introduced by coercion
[1] NA
  1. as.character() turns each number into text, which R prints with quotes around it.
  2. 9/2 = 4.5, but an integer has no decimals, so as.integer() drops the decimal part and returns 4.
  3. This string cannot be interpreted as a number, so R returns NA (a missing value) and warns you about it.
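A few more conversions may help make these rules concrete. The values below are illustrative and use only base R functions:

```r
as.integer(4.9)   # 4: the decimals are dropped, not rounded
as.integer(-4.9)  # -4: truncation goes towards zero
round(4.9)        # 5: use round() if you want rounding
as.numeric("4.5") # 4.5: works because the text looks like a number
```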

Vector operations

Try these lines of code on your computer!

The only thing worth remembering here is the <- operator for assignment; the rest is just to show you some internals of the R programming language, which we will dive into only occasionally.

x_squared <- x^2
x_squared
[1] 9 4 1
# vectorized logical test
x_squared < 5
[1] FALSE  TRUE  TRUE
# vector element extraction
x_squared[ 1 ]
[1] 9
x_squared[ 2 ]
[1] 4
x_squared[ 3 ]
[1] 1
# [EXERCISE]: explain the results of the following lines
x_squared[ c(1, 3) ]
[1] 9 1
which(x_squared < 5)
[1] 2 3
# [EXERCISE]: explain the results of the following lines (harder exercise)
x_squared[ -1 ]
[1] 4 1
x_squared[ c(1, 5, NA) ]
[1]  9 NA NA
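If the exercises above feel opaque, it may help to try the same kinds of indexing on another vector first. In the sketch below, `v` and its values are made up for illustration:

```r
v <- c(10, 20, 30) # illustrative values

v[-2]            # 10 30: a negative index drops that position
v[c(3, 1)]       # 30 10: values come back in the order requested
v[4]             # NA: there is no fourth value
v[v > 15]        # 20 30: a logical test can be used directly as an index
v[which(v > 15)] # 20 30: same result, via the positions 2 and 3
```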

Last notes

Accept right now that the R internals exist, and that some of them will remain obscure as you learn the language.

Also accept that, as with any language, such as mathematics or English, there are multiple ways to express (to do) the same thing in R.

3^2 == 3 * 3
[1] TRUE
9^(1/2) == sqrt(9)
[1] TRUE
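The same goes for building objects. As a small sketch, here are four ways to write the values 3, 2, 1; the object names are made up for illustration:

```r
by_colon <- 3:1
by_c     <- c(3, 2, 1)
by_rev   <- rev(1:3)
by_seq   <- seq(3, 1)

# all four hold the same values
all(by_colon == by_c)   # TRUE
all(by_colon == by_rev) # TRUE
all(by_colon == by_seq) # TRUE
```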

Exercise 4

Explain the results of the following lines

#1.
logical_test <- c(1, 2, 3) == 3:1
#2.
all(logical_test)
[1] FALSE
#3.
any(logical_test)
[1] TRUE
  1. The comparison is done element by element: 1 is not equal to 3, 2 is equal to 2, and 3 is not equal to 1, so logical_test holds FALSE, TRUE, FALSE.
  2. Not all of the three equalities are TRUE.
  3. At least one of the three equalities is TRUE.
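Because TRUE == 1 and FALSE == 0, logical values can also be counted. The sketch below recreates `logical_test` and shows how sum() and which() relate to all() and any():

```r
logical_test <- c(1, 2, 3) == 3:1

sum(logical_test)     # 1: how many comparisons are TRUE
which(logical_test)   # 2: the position of the TRUE value
sum(logical_test) > 0 # TRUE: same answer as any(logical_test)
sum(logical_test) == length(logical_test) # FALSE: same answer as all(logical_test)
```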

Exercise 5 [HARD]

Explain the results of the following lines (much harder exercise)

#1.
c(1, 2, 3) >= 1:4 # hint: vector recycling
Warning in c(1, 2, 3) >= 1:4: longer object length is not a multiple of shorter
object length
[1]  TRUE  TRUE  TRUE FALSE
#2.
c(1, 2, 3, "Nicolas Cage") * 2 # hint: type coercion
#3.
paste(month.abb[ 1:5 ], substr(lubridate::today(), 1, 4))
[1] "Jan 2024" "Feb 2024" "Mar 2024" "Apr 2024" "May 2024"
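  1. 1:4 is the same as c(1, 2, 3, 4). The shorter vector c(1, 2, 3) is recycled to c(1, 2, 3, 1) to match that length, hence the warning, and the comparisons are: 1 >= 1 TRUE; 2 >= 2 TRUE; 3 >= 3 TRUE; 1 >= 4 FALSE.
  2. All elements of a vector must be of the same type. Here the type is character, because there is at least one string ("Nicolas Cage"), so the vector cannot be multiplied by 2 and R throws an error.
  3. month.abb[ 1:5 ] returns the abbreviated names of the first five months of the year; lubridate::today() returns today's date (default format: yyyy-mm-dd); substr() extracts characters 1 to 4 of that date, that is to say the year; paste() then joins each month to the year.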
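Recycling and coercion are easier to spot in small cases. A short sketch with illustrative values, using only base R:

```r
# recycling without a warning: 4 is a multiple of 2
c(1, 2, 3, 4) * c(10, 100)        # 10 200 30 400

# coercion: mixing numbers and text gives text
class(c(1, 2, 3, "Nicolas Cage")) # "character"

# text that looks like numbers can be converted back before doing math
as.numeric(c("1", "2", "3")) * 2  # 2 4 6
```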

Conclusion

  • R is a language; treat it as such
  • learning it will involve a lot of trial and error
  • understanding the whole language is not a requirement to use it
  • do your readings and homework, and you will be fine

What’s deliberately missing from this script?

  • project management (e.g. setting the working directory)
  • data frames (covered at length in several other sessions)
  • piping with either |> or %>% (also part of other sessions)
  • more keyboard shortcuts (those will come with practice)