3 Intermediate R

https://learn.datacamp.com/courses/intermediate-r


3.1 Conditionals and Control Flow

Relational Operators:

#Equality
TRUE == TRUE

## [1] TRUE

#Inequality
TRUE!=FALSE

## [1] TRUE

#Less than or equal to / Greater than
3<=5

## [1] TRUE

6>7

## [1] FALSE

"Hello">"Goodbye"

## [1] TRUE

We can use these operators directly on vectors to check whether each element of the vector satisfies the condition.

digits <- c(5,6,7,8,9)
big_digits <- c(55,66,77,88,89)
digits>7

## [1] FALSE FALSE FALSE  TRUE  TRUE

digits>big_digits

## [1] FALSE FALSE FALSE FALSE FALSE
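The logical vectors produced by these comparisons are often used to pick out the matching elements. A minimal sketch, reusing the digits vector from above (the name is_big is made up for illustration):

```r
digits <- c(5, 6, 7, 8, 9)

# A relational operator on a vector yields a logical vector of the same length
is_big <- digits > 7

# Use the logical vector to keep only the matching elements
digits[is_big]      # 8 9

# which() gives the positions of the TRUE values instead
which(is_big)       # 4 5
```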

Logical Operators

big_digits >70 & big_digits < 100

## [1] FALSE FALSE  TRUE  TRUE  TRUE

extreme <- digits < 6 | digits > 8

sum(extreme)

## [1] 2

Conditional Statements

We can use if, else if, and else to create conditional statements.

Example :

appleton_population <- 450988

if (appleton_population < 250000){
  print("Small City")
} else if (appleton_population < 400000){
  print("Big City")
} else {
  print("Huge City")
}

## [1] "Huge City"

3.2 Loops

While Loop :

A while loop repeats its body for as long as its condition evaluates to TRUE.

Example :

num_students <- 0
while(num_students <= 10) {
  print(paste("Occupancy =", num_students))
  num_students <- num_students + 1
}

## [1] "Occupancy = 0"
## [1] "Occupancy = 1"
## [1] "Occupancy = 2"
## [1] "Occupancy = 3"
## [1] "Occupancy = 4"
## [1] "Occupancy = 5"
## [1] "Occupancy = 6"
## [1] "Occupancy = 7"
## [1] "Occupancy = 8"
## [1] "Occupancy = 9"
## [1] "Occupancy = 10"

The line of code that increments num_students is crucial. Without it, the condition would never become FALSE and R would loop forever.

The Break statement:

The break statement exits the while loop as soon as a certain condition is met.

Example:

num_students <- 0
while(num_students <= 10){
  if (num_students == 5){
    print("Max Occupancy Reached")
    break
  }
  print(paste("Occupancy = ", num_students))
  num_students <- num_students + 1
}

## [1] "Occupancy =  0"
## [1] "Occupancy =  1"
## [1] "Occupancy =  2"
## [1] "Occupancy =  3"
## [1] "Occupancy =  4"
## [1] "Max Occupancy Reached"

For Loop:

We can use a for loop to iterate over the elements of a vector, matrix, list, or data frame.

Example :

students <- list("Jack", "Max", "Sam", " Jill", "John")

for(student in students) {
  print(student)
}

## [1] "Jack"
## [1] "Max"
## [1] "Sam"
## [1] " Jill"
## [1] "John"

As with while loops, we can use the break statement to exit a for loop early.

Example:

students <- list("Jak", "Max", "Sam", " Jill", "John")

for(student in students) {
  if(nchar(student) > 3){
    break
  }
  print(student)
}

## [1] "Jak"
## [1] "Max"
## [1] "Sam"

Next Statement:

The next statement skips the remainder of the code inside a for loop and proceeds to the next iteration.

Example :

students <- list("Jak", "Max", "Sam", " Jill", "Jon")

for(student in students) {
  if(nchar(student) > 3){
    next
  }
  print(student)
}

## [1] "Jak"
## [1] "Max"
## [1] "Sam"
## [1] "Jon"

We can manually control which element to select in a for loop by creating a looping index.

Example:

students <- list("Jak", "Max", "Sam", " Jill", "Jon")

# seq_along() builds the index; [[i]] extracts the element itself from the list
for(i in seq_along(students)) {
  print(paste(students[[i]], "is on position", i, "in the students vector"))
}

## [1] "Jak is on position 1 in the students vector"
## [1] "Max is on position 2 in the students vector"
## [1] "Sam is on position 3 in the students vector"
## [1] " Jill is on position 4 in the students vector"
## [1] "Jon is on position 5 in the students vector"
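A looping index is commonly built with either 1:length(x) or seq_along(x). The sketch below shows how the two differ when the input is empty (the empty list is made up for illustration), which is why seq_along() is often the safer choice:

```r
students <- list("Jak", "Max", "Sam")
empty <- list()

# 1:length(x) counts down from 1 to 0 when x is empty,
# so a loop over it would run twice with invalid indices
1:length(empty)      # 1 0

# seq_along(x) returns a zero-length sequence, so the loop body never runs
seq_along(empty)     # integer(0)

for (i in seq_along(students)) {
  # [[i]] extracts the element itself; [i] would return a one-element list
  print(paste(students[[i]], "is on position", i))
}
```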

Nested for loop:

jan_exp <- c(700, 400)
feb_exp <- c(800, 500)
mar_exp <- c(400, 600)

monthly_exp_matrix <- matrix(c(jan_exp, feb_exp, mar_exp), nrow = 3, byrow = TRUE)

for (i in 1:nrow(monthly_exp_matrix)) {
  for (j in 1:ncol(monthly_exp_matrix)) {
    print(paste("On row", i, "and column", j, "the matrix contains", monthly_exp_matrix[i,j]))
  }
}

## [1] "On row 1 and column 1 the matrix contains 700"
## [1] "On row 1 and column 2 the matrix contains 400"
## [1] "On row 2 and column 1 the matrix contains 800"
## [1] "On row 2 and column 2 the matrix contains 500"
## [1] "On row 3 and column 1 the matrix contains 400"
## [1] "On row 3 and column 2 the matrix contains 600"

As with the while loop, we can use the if and else statements inside the for loop.

number <- c(16, 9, 13, 5, 2, 17, 14)
for (n in number) {
  if (n > 10 ) {
    print(paste(n, "is greater than 10"))
  } else {
    print(paste(n, "is less than 10"))
  }
  print(n)
}

## [1] "16 is greater than 10"
## [1] 16
## [1] "9 is less than 10"
## [1] 9
## [1] "13 is greater than 10"
## [1] 13
## [1] "5 is less than 10"
## [1] 5
## [1] "2 is less than 10"
## [1] 2
## [1] "17 is greater than 10"
## [1] 17
## [1] "14 is greater than 10"
## [1] 14

3.3 Functions

To consult the documentation on any function, we can use one of the following R commands:

help(sd)
?sd

A quick hack to see the arguments of a function is the args() function.

args(sd)

## function (x, na.rm = FALSE) 
## NULL

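Many functions take optional arguments that change how the result is computed. For example, mean() accepts na.rm to drop missing values and trim to discard a fraction of the observations from each end before averaging.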

linkedin <- c(16, 9, 13, 5, NA, 17, 14)
facebook <- c(17, NA, 5, 16, 8, 13, 14)
avg_sum <- mean((facebook + linkedin), na.rm = TRUE)
avg_sum_trimmed <- mean((linkedin + facebook), trim = 0.2, na.rm = TRUE)
avg_sum

## [1] 26

avg_sum_trimmed

## [1] 26.33333

We can create our function using the format :

my_func <- function(arg1, arg2){ body }

five_times_squared <- function(x){
  y <- 5 * x^2
  return(y)
}
five_times_squared(3)

## [1] 45

We can make functions with no arguments.

scare_me <- function(){
    print("Boo!")
}

Function Scoping:

Function scoping implies that variables that are defined inside a function are not accessible outside that function.
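The scoping rule can be illustrated with a short sketch. The function triple() and the variable tripled are hypothetical names chosen for this example:

```r
triple <- function(x) {
  tripled <- 3 * x   # 'tripled' exists only while the function runs
  tripled
}

result <- triple(4)
result               # 12

# The local variable is not visible from the calling environment
exists("tripled")    # FALSE
```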

It is possible to add control-flow constructs, loops and even other functions to your function body. We can then call the function on a single element of a vector by selecting that element with [].

linkedin <- c(16, 9, 13, 5, 55, 17, 14)
facebook <- c(17, 2, 5, 16, 8, 13, 14)

interpret <- function(num_views) {
  if (num_views > 15) {
    print("You're popular!")
    return(num_views)

  } else {
    print("Try to be more visible!")
    return(0)

  }
}

interpret(linkedin[1])

## [1] "You're popular!"

## [1] 16

interpret(facebook[2])

## [1] "Try to be more visible!"

## [1] 0

R Packages:

Install packages: install.packages()
Load packages: library(), require()
Loading a package attaches it to the search list.
Google for cool R packages!
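A small sketch of how library() and require() differ when a package is missing. The package name "notARealPackage" is made up purely to show the failing case, and the install line is commented out because it needs an internet connection:

```r
# library() attaches a package to the search list; it stops with an
# error if the package is not installed
library(stats)
"package:stats" %in% search()    # TRUE

# require() returns TRUE or FALSE instead of failing, which can be handy
# for optional dependencies
has_pkg <- suppressWarnings(
  require("notARealPackage", character.only = TRUE, quietly = TRUE)
)
has_pkg                          # FALSE

# install.packages("ggplot2")    # run once per machine
```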

3.4 The apply family

lapply() takes a vector or list X and applies a function to each of its members.

Example:

linkedin <- c(16, 9, 13, 5, 55, 17, 14)
sqrt_math <- lapply(linkedin, sqrt)
sqrt_math

## [[1]]
## [1] 4
## 
## [[2]]
## [1] 3
## 
## [[3]]
## [1] 3.605551
## 
## [[4]]
## [1] 2.236068
## 
## [[5]]
## [1] 7.416198
## 
## [[6]]
## [1] 4.123106
## 
## [[7]]
## [1] 3.741657

We can also use our own function with lapply.

linkedin <- c(16, 9, 13, 5, 55, 17, 14)
linkedin_interpret <- lapply(linkedin, interpret)

## [1] "You're popular!"
## [1] "Try to be more visible!"
## [1] "Try to be more visible!"
## [1] "Try to be more visible!"
## [1] "You're popular!"
## [1] "You're popular!"
## [1] "Try to be more visible!"

linkedin_interpret

## [[1]]
## [1] 16
## 
## [[2]]
## [1] 0
## 
## [[3]]
## [1] 0
## 
## [[4]]
## [1] 0
## 
## [[5]]
## [1] 55
## 
## [[6]]
## [1] 17
## 
## [[7]]
## [1] 0

Lapply with additional arguments:

lapply() can also handle functions that require more than one argument: the extra arguments are passed after the function name, as with y = 2 below.

pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
char_list <- lapply(pioneers, nchar)
sqrt_char <- lapply(char_list, sqrt)

select_el <- function(x,y) {
  x^y
}

magic <- lapply (char_list, select_el, y = 2)
magic

## [[1]]
## [1] 100
## 
## [[2]]
## [1] 100
## 
## [[3]]
## [1] 121
## 
## [[4]]
## [1] 144

Sapply :

sapply() does the same job as lapply(), but tries to simplify the result to a vector or matrix; if that is not possible, it returns a list. As with lapply(), we can use our own functions with sapply().

avg_views <- function(x){
  # x is not used: every call returns the mean of the whole linkedin vector
  mean(linkedin[1:7])
}
sapply(linkedin, avg_views )

## [1] 18.42857 18.42857 18.42857 18.42857 18.42857 18.42857 18.42857
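To make the difference between the two return types concrete, here is a minimal sketch that reuses the pioneers vector from the lapply() example. The variable names len_list, len_vec and uneven are made up for illustration:

```r
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")

# lapply() always returns a list, one element per input
len_list <- lapply(pioneers, nchar)
class(len_list)                              # "list"

# sapply() simplifies the same result to a named integer vector
len_vec <- sapply(pioneers, nchar)
len_vec

# USE.NAMES = FALSE drops the names
sapply(pioneers, nchar, USE.NAMES = FALSE)   # 10 10 11 12

# When the results have different lengths they cannot be simplified,
# so sapply() falls back to returning a list
uneven <- sapply(1:3, seq_len)
class(uneven)                                # "list"
```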

Vapply:

The syntax for vapply() is:

vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)

The function FUN is applied over the elements of X. The FUN.VALUE argument expects a template for the return value of FUN, such as numeric(4) for a numeric vector of length 4. USE.NAMES is TRUE by default; in this case vapply() tries to generate a named array, if possible.

basics <- function(x) {
  c(min = min(x), mean = mean(x), median = median(x), max = max(x))
}
vapply(linkedin, basics, numeric(4))

##        [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## min      16    9   13    5   55   17   14
## mean     16    9   13    5   55   17   14
## median   16    9   13    5   55   17   14
## max      16    9   13    5   55   17   14
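The value of the FUN.VALUE template shows when the result does not match it. A sketch, assuming the same linkedin vector and basics() function as above; tryCatch() is used here only to capture the error message instead of stopping the script:

```r
linkedin <- c(16, 9, 13, 5, 55, 17, 14)

basics <- function(x) {
  c(min = min(x), mean = mean(x), median = median(x), max = max(x))
}

# The template matches: each call returns a numeric vector of length 4
ok <- vapply(linkedin, basics, numeric(4))
dim(ok)     # 4 7

# The template does not match (length 3 expected, length 4 returned),
# so vapply() raises an error rather than silently returning something else
msg <- tryCatch(
  vapply(linkedin, basics, numeric(3)),
  error = function(e) conditionMessage(e)
)
msg
```

This strictness is the main design difference from sapply(), whose output type depends on what the function happens to return.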

Summary:

lapply():

apply function over list or vector
output = list

sapply():

apply function over list or vector
try to simplify list to array

vapply():

apply function over list or vector
explicitly specify output format

3.5 Utilities

Mathematical Utilities:

abs(): Calculate the absolute value.
sum(): Calculate the sum of all the values in a data structure.
mean(): Calculate the arithmetic mean.
round(): Round the values to 0 decimal places by default. Try out ?round in the console for variations of round() and ways to change the number of digits to round to.

sum_round_abs <- sum(round(abs(linkedin)))

vec1 <- c(1.5, 2.5, 8.4, 3.7, 6.3)
vec2 <- rev(vec1)

mean_vec_abs <- mean(c(abs(vec1), abs(vec2)))

seq(): Generate sequences, by specifying the from, to, and by arguments.
rep(): Replicate elements of vectors and lists.
sort(): Sort a vector in ascending order. Works on numerics, but also on character strings and logicals.
rev(): Reverse the elements in a data structure for which reversal is defined.
str(): Display the structure of any R object.
append(): Merge vectors or lists.
is.*(): Check for the class of an R object.
as.*(): Convert an R object from one class to another.
unlist(): Flatten (possibly embedded) lists to produce a vector.

linkedin <- list(16, 9, 13, 5, 2, 17, 14)
facebook <- list(17, 7, 5, 16, 8, 13, 14)

# Convert linkedin and facebook to a vector: li_vec and fb_vec
li_vec <- unlist(linkedin)
fb_vec <- unlist(facebook)

# Append fb_vec to li_vec: social_vec
social_vec <- append(li_vec, fb_vec)

# Sort social_vec
sort(social_vec, decreasing = TRUE)

##  [1] 17 17 16 16 14 14 13 13  9  8  7  5  5  2

rep_seq <- rep(seq(1, 5, by = 2), times = 2)
rep_seq

## [1] 1 3 5 1 3 5
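The is.*(), as.*() and str() utilities from the list above are not shown in the examples so far. A minimal sketch, using a short made-up list called views:

```r
views <- list(16, 9, 13)

# is.*() functions test the class of an object
is.list(views)              # TRUE
is.numeric(views)           # FALSE: a list is not a numeric vector

views_vec <- unlist(views)
is.numeric(views_vec)       # TRUE

# as.*() functions convert between classes
as.character(views_vec)     # "16" "9" "13"
as.numeric("3.14")          # 3.14

# str() prints a compact summary of the structure
str(views_vec)
```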

Regular Expressions:

Regular expressions are commonly used to clean raw data before working with it. They let us check whether a pattern exists inside a character string or a vector of character strings.

[GREP,GREPL]

grepl() returns TRUE when a pattern is found in the corresponding character string. grep() returns a vector of indices of the character strings that contain the pattern.

emails <- c("john.doe@ivyleague.edu", "education@world.gov", "dalai.lama@peace.org",
            "invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")
edu_has <- grepl("edu", x = emails)
edu_emails <- emails[edu_has]
edu_emails

## [1] "john.doe@ivyleague.edu"   "education@world.gov"     
## [3] "invalid.edu"              "quant@bigdatacollege.edu"

We can combine the following building blocks to match .edu addresses more precisely:

i) The caret, ^, and the dollar sign, $, which match the content located at the start and end of a string, respectively.

ii) @, because a valid email must contain an at-sign.

iii) .*, which matches any character (.) zero or more times (*). Both the dot and the asterisk are metacharacters. We can use them to match any characters between the at-sign and the “.edu” portion of an email address.

iv) \\.edu$, to match the “.edu” part of the email at the end of the string. The \\ part escapes the dot: it tells R that you want to use the . as an actual character.

# Use grepl() to match for .edu addresses more robustly
grepl("@.*\\.edu$", x = emails)

## [1]  TRUE FALSE FALSE FALSE  TRUE FALSE

# Use grep() to match for .edu addresses more robustly, save result to hits
hits <- grep("@.*\\.edu$", x = emails)

# Subset emails using hits
emails[hits]

## [1] "john.doe@ivyleague.edu"   "quant@bigdatacollege.edu"
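The effect of each anchor and of escaping the dot can be checked in isolation. A small sketch on a shortened version of the emails vector (the string "ivyleague-edu" is made up to show what an unescaped dot matches):

```r
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "invalid.edu")

# "edu" anywhere in the string
grepl("edu", emails)          # TRUE TRUE TRUE

# ^ anchors the match to the start of the string
grepl("^edu", emails)         # FALSE TRUE FALSE

# $ anchors the match to the end of the string
grepl("edu$", emails)         # TRUE FALSE TRUE

# An unescaped dot matches any character; "\\." matches only a literal dot
grepl("ivyleague.edu", "ivyleague-edu")     # TRUE
grepl("ivyleague\\.edu", "ivyleague-edu")   # FALSE
```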

[SUB,GSUB]

sub() and gsub() replace matches of a pattern within each element of a character vector. sub() only replaces the first match in each element, whereas gsub() replaces all matches.

sub("@.*\\.edu$", "@datacamp.edu", emails)

## [1] "john.doe@datacamp.edu"    "education@world.gov"     
## [3] "dalai.lama@peace.org"     "invalid.edu"             
## [5] "quant@datacamp.edu"       "cookie.monster@sesame.tv"
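In the example above each address contains at most one match, so sub() and gsub() would give the same result. The difference shows up when a pattern occurs more than once, as in this sketch that replaces the dots in a single address:

```r
email <- "john.doe@ivyleague.edu"

# sub() replaces only the first literal dot
sub("\\.", "_", email)     # "john_doe@ivyleague.edu"

# gsub() replaces every literal dot
gsub("\\.", "_", email)    # "john_doe@ivyleague_edu"

# fixed = TRUE treats the pattern as plain text, so no escaping is needed
gsub(".", "_", email, fixed = TRUE)
```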

Times & Dates:

today_date <- Sys.Date()
now <-  Sys.time()
today_date

## [1] "2021-05-29"

now

## [1] "2021-05-29 23:42:29 CDT"

If we want to assign a specific date to a variable, we have to follow a certain format:

my_birthday <- as.Date("yyyy-mm-dd")

my_birthday <- as.Date("1999-06-07")
my_birthday

## [1] "1999-06-07"

Date Arithmetic:

Difference between today_date and my birthday.

dif <- today_date - my_birthday
dif

## Time difference of 8027 days
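A few more date calculations, sketched with fixed dates so the results are reproducible (today_date is set to the date shown in the output above rather than Sys.Date()):

```r
today_date <- as.Date("2021-05-29")
my_birthday <- as.Date("1999-06-07")

# Subtracting two dates gives a difftime object measured in days
dif <- today_date - my_birthday
class(dif)                        # "difftime"
as.numeric(dif)                   # 8027

# Adding a number to a Date moves it forward by that many days
my_birthday + 1                   # "1999-06-08"

# Internally, a Date is the number of days since 1970-01-01
unclass(as.Date("1970-01-02"))    # 1

# difftime() lets us choose the unit explicitly
difftime(today_date, my_birthday, units = "weeks")
```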

We can use the following format symbols to parse and format dates with as.Date() and format(). Arithmetic on dates works in days.

%Y: 4-digit year (1982)
%y: 2-digit year (82)
%m: 2-digit month (01)
%d: 2-digit day of the month (13)
%A: weekday (Wednesday)
%a: abbreviated weekday (Wed)
%B: month (January)
%b: abbreviated month (Jan)

Example:

# Definition of character strings representing dates
str1 <- "May 23, '96"
str2 <- "2012-03-15"
str3 <- "30/January/2006"

# Convert the strings to dates: date1, date2, date3
date1 <- as.Date(str1, "%b %d, '%y")
date2 <- as.Date(str2, "%Y-%m-%d")
date3 <- as.Date(str3, "%d/%B/%Y")


# Convert dates to formatted strings
format(date1, "%A")

## [1] "Thursday"

format(date2, "%d")

## [1] "15"

format(date3, "%b %Y")

## [1] "Jan 2006"

We can use the following format symbols to parse and format times with as.POSIXct() and format(). Arithmetic on times works in seconds.

%H: hours as a decimal number (00-23)
%I: hours as a decimal number (01-12)
%M: minutes as a decimal number
%S: seconds as a decimal number
%T: shorthand notation for the typical format %H:%M:%S
%p: AM/PM indicator

For a full list of conversion symbols, consult the strptime documentation by running ?strptime in the console.

# Definition of character strings representing times
str1 <- "May 23, '96 hours:23 minutes:01 seconds:45"
str2 <- "2012-3-12 14:23:08"

# Convert the strings to POSIXct objects: time1, time2
time1 <- as.POSIXct(str1, format = "%B %d, '%y hours:%H minutes:%M seconds:%S")
time2 <- as.POSIXct(str2)

# Convert times to formatted strings
format(time1, "%M")

## [1] "01"

format(time2, "%I:%M %p")

## [1] "02:23 PM"
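Because times increment in seconds, arithmetic on POSIXct objects works much like date arithmetic. A minimal sketch; tz = "UTC" is an assumption added here so that the result does not depend on the local time zone:

```r
time2 <- as.POSIXct("2012-3-12 14:23:08", tz = "UTC")

# Adding a number to a POSIXct object moves it forward by that many seconds
one_hour_later <- time2 + 3600
format(one_hour_later, "%H:%M:%S")     # "15:23:08"

# The difference between two times is a difftime object
difftime(one_hour_later, time2, units = "mins")

# Internally, POSIXct is the number of seconds since 1970-01-01 00:00:00 UTC
as.numeric(as.POSIXct("1970-01-01 00:01:00", tz = "UTC"))   # 60
```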