3 Intermediate R
https://learn.datacamp.com/courses/intermediate-r
3.1 Conditionals and Control Flow
Relational Operators:
## [1] TRUE
## [1] TRUE
## [1] TRUE
## [1] FALSE
## [1] TRUE
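The code behind the output above is not shown. As a separate sketch with made-up values, these are the relational operators (==, !=, <, >, <=, >=) applied to single values:

```r
# Equality and inequality work on numbers, strings and logicals
eq_num  <- 3 == (2 + 1)            # TRUE
neq_chr <- "intermediate" != "r"   # TRUE
# Ordering comparisons: -28 >= -9 is FALSE
lt <- -6 * 5 + 2 >= -10 + 1
# Strings are compared alphabetically
chr_lt <- "apple" < "banana"       # TRUE
# TRUE is treated as 1 and FALSE as 0
lgl_gt <- TRUE > FALSE             # TRUE
print(c(eq_num, neq_chr, lt, chr_lt, lgl_gt))
```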
Relational operators also work element-wise on vectors: each element is compared, and the result is a logical vector of the same length.
## [1] FALSE FALSE FALSE TRUE TRUE
## [1] FALSE FALSE FALSE FALSE FALSE
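A short sketch of element-wise comparison; the view counts below are borrowed from the Utilities section of this chapter and are not the data behind the two lines above.

```r
linkedin <- c(16, 9, 13, 5, 2, 17, 14)
# Each element is compared with 15; the result has the same length
popular <- linkedin > 15
popular      # TRUE FALSE FALSE FALSE FALSE TRUE FALSE
# Two vectors of the same length are compared position by position
facebook <- c(17, 7, 5, 16, 8, 13, 14)
more_on_li <- linkedin > facebook
more_on_li   # FALSE TRUE TRUE FALSE FALSE TRUE FALSE
```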
Logical Operators
## [1] FALSE FALSE TRUE TRUE TRUE
## [1] 2
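The code for the output above is not shown. As a separate sketch on the same hypothetical linkedin vector: & (and), | (or) and ! (not) combine logical vectors element by element, and sum() on a logical vector counts its TRUE elements.

```r
linkedin <- c(16, 9, 13, 5, 2, 17, 14)
# & is TRUE only where both conditions hold
mid_range <- linkedin > 10 & linkedin <= 15   # FALSE FALSE TRUE FALSE FALSE FALSE TRUE
# | is TRUE where at least one condition holds
extreme <- linkedin < 5 | linkedin > 15       # TRUE FALSE FALSE FALSE TRUE TRUE FALSE
# ! flips every element
not_extreme <- !extreme
# TRUE counts as 1 and FALSE as 0, so sum() counts the matches
n_mid <- sum(mid_range)                       # 2
```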
Conditional Statements
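We can use if, else if, and else statements to run different code depending on a condition. The conditions are checked in order, and only the branch of the first condition that is TRUE runs.
Example: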
appleton_population <- 450988
if (appleton_population < 250000){
print("Small City")
} else if (appleton_population < 400000){
print("Big City")
} else {
print("Huge City")
}
## [1] "Huge City"
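To check every branch, the same chain can be wrapped in a small function. city_size() is a hypothetical name used only for this sketch. Because the conditions are tested in order, a population of exactly 250000 fails the first test and lands in "Big City".

```r
# Hypothetical helper wrapping the if / else if / else chain above
city_size <- function(population) {
  if (population < 250000) {
    "Small City"
  } else if (population < 400000) {
    "Big City"
  } else {
    "Huge City"
  }
}
city_size(120000)   # "Small City"
city_size(300000)   # "Big City"
city_size(450988)   # "Huge City"
```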
3.2 Loops
While Loop:
A while loop keeps executing its body as long as its condition evaluates to TRUE.
Example:
num_students <- 0
while(num_students <= 10) {
print(paste("Occupancy =", num_students))
num_students <- num_students + 1
}
## [1] "Occupancy = 0"
## [1] "Occupancy = 1"
## [1] "Occupancy = 2"
## [1] "Occupancy = 3"
## [1] "Occupancy = 4"
## [1] "Occupancy = 5"
## [1] "Occupancy = 6"
## [1] "Occupancy = 7"
## [1] "Occupancy = 8"
## [1] "Occupancy = 9"
## [1] "Occupancy = 10"
The line that increments num_students is crucial. Without it, the condition num_students <= 10 never becomes FALSE and R would run an infinite loop.
The break statement:
The break statement exits the while loop immediately once a given condition is met.
Example:
num_students <- 0
while(num_students <= 10){
if (num_students == 5){
print("Max Occupancy Reached")
break
}
print(paste("Occupancy =", num_students))
num_students <- num_students + 1
}
## [1] "Occupancy = 0"
## [1] "Occupancy = 1"
## [1] "Occupancy = 2"
## [1] "Occupancy = 3"
## [1] "Occupancy = 4"
## [1] "Max Occupancy Reached"
For Loop:
We can use a for loop to go through the elements of vectors and lists (and, using indices, matrices and data frames), for example to print each one.
Example:
## [1] "Jack"
## [1] "Max"
## [1] "Sam"
## [1] " Jill"
## [1] "John"
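The code for the output above is not shown. A loop like the following, assuming the names are stored in a character vector called students, prints them one per line:

```r
students <- c("Jack", "Max", "Sam", " Jill", "John")
# student takes the value of each element in turn
for (student in students) {
  print(student)
}
```

After the loop finishes, student still holds the last element, "John".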
Just as with while loops, we can use the break statement to exit a for loop early.
Example:
students <- list("Jak", "Max", "Sam", " Jill", "John")
for(student in students) {
if(nchar(student) > 3){
break
}
print(student)
}
## [1] "Jak"
## [1] "Max"
## [1] "Sam"
Next Statement:
The next statement skips the rest of the current iteration and moves on to the next one.
Example:
students <- list("Jak", "Max", "Sam", " Jill", "Jon")
for(student in students) {
if(nchar(student) > 3){
next
}
print(student)
}
## [1] "Jak"
## [1] "Max"
## [1] "Sam"
## [1] "Jon"
Instead of looping over the elements directly, we can loop over an index and use it to select each element ourselves.
Example:
students <- c("Jak", "Max", "Sam", " Jill", "Jon")
for (i in seq_along(students)) {
print(paste(students[i], "is on position", i, "in the students vector"))
}
## [1] "Jak is on position 1 in the students vector"
## [1] "Max is on position 2 in the students vector"
## [1] "Sam is on position 3 in the students vector"
## [1] " Jill is on position 4 in the students vector"
## [1] "Jon is on position 5 in the students vector"
Nested for loop:
jan_exp <- c(700, 400)
feb_exp <- c(800, 500)
mar_exp <- c(400, 600)
monthly_exp_matrix <- matrix(c(jan_exp, feb_exp, mar_exp), nrow = 3, byrow = TRUE)
for (i in 1:nrow(monthly_exp_matrix)) {
for (j in 1:ncol(monthly_exp_matrix)) {
print(paste("On row", i, "and column", j, "the matrix contains", monthly_exp_matrix[i,j]))
}
}
## [1] "On row 1 and column 1 the matrix contains 700"
## [1] "On row 1 and column 2 the matrix contains 400"
## [1] "On row 2 and column 1 the matrix contains 800"
## [1] "On row 2 and column 2 the matrix contains 500"
## [1] "On row 3 and column 1 the matrix contains 400"
## [1] "On row 3 and column 2 the matrix contains 600"
As with the while loop, we can use the if and else statements inside the for loop.
number <- c(16, 9, 13, 5, 2, 17, 14)
for (n in number) {
if (n > 10 ) {
print(paste(n, "is greater than 10"))
} else {
print(paste(n, "is less than 10"))
}
print(n)
}
## [1] "16 is greater than 10"
## [1] 16
## [1] "9 is less than 10"
## [1] 9
## [1] "13 is greater than 10"
## [1] 13
## [1] "5 is less than 10"
## [1] 5
## [1] "2 is less than 10"
## [1] 2
## [1] "17 is greater than 10"
## [1] 17
## [1] "14 is greater than 10"
## [1] 14
3.3 Functions
To consult the documentation on any function, we can use help(), for example help(mean), or its shortcut ?mean.
A quick way to see only a function's arguments is the args() function.
## function (x, na.rm = FALSE)
## NULL
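The call behind the output above is not shown. sd() has exactly the signature function (x, na.rm = FALSE), so args(sd) is one call that prints it; whether sd() was the function originally used is an assumption. formals() returns the same arguments as a list.

```r
# Print the signature of sd(); the body is shown as NULL
args(sd)
# formals() gives the arguments and their default values
sd_args <- formals(sd)
names(sd_args)   # "x" "na.rm"
```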
A function's optional arguments can change its result. For example, mean() accepts na.rm to drop missing values, and trim to discard a fraction of the observations from each end before averaging.
linkedin <- c(16, 9, 13, 5, NA, 17, 14)
facebook <- c(17, NA, 5, 16, 8, 13, 14)
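# Mean of the element-wise sums, ignoring NAs
avg_sum <- mean(linkedin + facebook, na.rm = TRUE)
# Same, but first drop 20% of the observations from each end
avg_sum_trimmed <- mean(linkedin + facebook, trim = 0.2, na.rm = TRUE)
avg_sum
avg_sum_trimmed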
## [1] 26
## [1] 26.33333
We can create our own function using this format:
my_func <- function(arg1, arg2) {
  body
}
## [1] 45
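The function behind the output above is not shown. As a separate sketch, sum_squares() is a hypothetical function with two arguments, one of which has a default value; the last evaluated expression is returned.

```r
# Hypothetical function: sum of the squares of a and b, with b defaulting to 0
sum_squares <- function(a, b = 0) {
  a^2 + b^2   # returned automatically
}
sum_squares(3, 6)   # 45
sum_squares(4)      # 16, b uses its default
```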
We can make functions with no arguments.
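A minimal sketch of a function with no arguments; throw_die() is a hypothetical name. It is called with empty parentheses.

```r
# Hypothetical function with no arguments: one roll of a six-sided die
throw_die <- function() {
  sample(1:6, size = 1)
}
roll <- throw_die()
roll
```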
Function Scoping:
Function scoping implies that variables that are defined inside a function are not accessible outside that function.
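A small sketch of scoping, with hypothetical names: inner_total is created inside add_views() and no longer exists once the function returns.

```r
# Hypothetical function: inner_total is local to add_views()
add_views <- function(a, b) {
  inner_total <- a + b
  inner_total
}
result <- add_views(16, 17)   # 33
# The local variable did not leak into the global environment
exists("inner_total")         # FALSE
```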
A function body can contain control-flow constructs, loops, and even other functions. To call a function on a single element of a vector, we select that element with square brackets, e.g. linkedin[1].
linkedin <- c(16, 9, 13, 5, 55, 17, 14)
facebook <- c(17, 2, 5, 16, 8, 13, 14)
interpret <- function(num_views) {
if (num_views > 15) {
print("You're popular!")
return(num_views)
} else {
print("Try to be more visible!")
return(0)
}
}
interpret(linkedin[1])
## [1] "You're popular!"
## [1] 16
## [1] "Try to be more visible!"
## [1] 0
R Packages:
- Install packages: install.packages()
- Load packages: library() or require()
- Loading a package attaches it to the search list
- Google for cool R packages!
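A sketch of the search list and of how library() and require() differ: library() stops with an error when a package is missing, while require() returns FALSE. The name "notARealPackage123" is made up to trigger the FALSE case.

```r
# stats is attached by default, so it is already on the search list
"package:stats" %in% search()   # TRUE
# require() returns FALSE (with a warning, suppressed here) for a missing package
ok <- suppressWarnings(
  require("notARealPackage123", character.only = TRUE, quietly = TRUE)
)
ok   # FALSE
```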
3.4 The apply family
lapply() takes a vector or list X, applies a function to each of its elements, and returns the results as a list.
Example:
## [[1]]
## [1] 4
##
## [[2]]
## [1] 3
##
## [[3]]
## [1] 3.605551
##
## [[4]]
## [1] 2.236068
##
## [[5]]
## [1] 7.416198
##
## [[6]]
## [1] 4.123106
##
## [[7]]
## [1] 3.741657
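The code for the output above is not shown; the values are the square roots of the linkedin counts from the Functions section. A call such as lapply(linkedin, sqrt) returns that list:

```r
linkedin <- c(16, 9, 13, 5, 55, 17, 14)
# Apply sqrt() to every element; lapply() always returns a list
sqrt_views <- lapply(linkedin, sqrt)
class(sqrt_views)   # "list"
sqrt_views[[1]]     # 4
```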
We can also use our own function with lapply.
## [1] "You're popular!"
## [1] "Try to be more visible!"
## [1] "Try to be more visible!"
## [1] "Try to be more visible!"
## [1] "You're popular!"
## [1] "You're popular!"
## [1] "Try to be more visible!"
## [[1]]
## [1] 16
##
## [[2]]
## [1] 0
##
## [[3]]
## [1] 0
##
## [[4]]
## [1] 0
##
## [[5]]
## [1] 55
##
## [[6]]
## [1] 17
##
## [[7]]
## [1] 0
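Passing interpret() from the Functions section to lapply() is one way to get the messages and the list above. This sketch repeats both definitions so it runs on its own.

```r
linkedin <- c(16, 9, 13, 5, 55, 17, 14)
interpret <- function(num_views) {
  if (num_views > 15) {
    print("You're popular!")
    return(num_views)
  } else {
    print("Try to be more visible!")
    return(0)
  }
}
# One message is printed per element; the return values are collected in a list
views_kept <- lapply(linkedin, interpret)
unlist(views_kept)   # 16 0 0 0 55 17 0
```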
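Lapply with extra arguments:
If the function needs more than one argument, the extra arguments can be passed to lapply() by name after the function; below, y = 2 is forwarded to every call of select_el().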
pioneers <- c("GAUSS:1777", "BAYES:1702", "PASCAL:1623", "PEARSON:1857")
char_list <- lapply(pioneers, nchar)
sqrt_char <- lapply(char_list, sqrt)
select_el <- function(x,y) {
x^y
}
magic <- lapply(char_list, select_el, y = 2)
magic
## [[1]]
## [1] 100
##
## [[2]]
## [1] 100
##
## [[3]]
## [1] 121
##
## [[4]]
## [1] 144
Sapply:
sapply() works like lapply() but tries to simplify the result, so it returns a vector (or a matrix) instead of a list when it can. As with lapply(), we can use our own functions with sapply().
## [1] 18.42857 18.42857 18.42857 18.42857 18.42857 18.42857 18.42857
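The call behind the output above is not shown. As a separate sketch, sapply() with sqrt returns a plain numeric vector, and with a character input it names the result after the input strings (USE.NAMES = TRUE by default).

```r
linkedin <- c(16, 9, 13, 5, 55, 17, 14)
# The list of length-1 results is simplified into a numeric vector
sqrt_vec <- sapply(linkedin, sqrt)
is.vector(sqrt_vec)   # TRUE
# Character input: the result is named by the input strings
pioneers <- c("GAUSS:1777", "BAYES:1702")
name_lengths <- sapply(pioneers, nchar)
name_lengths          # GAUSS:1777 = 10, BAYES:1702 = 10
```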
Vapply:
The syntax for vapply() is:
vapply(X, FUN, FUN.VALUE, ..., USE.NAMES = TRUE)
vapply() applies the function FUN to each element of X. The FUN.VALUE argument is a template for what each call of FUN must return (its type and length), for example numeric(4) for four numbers. USE.NAMES is TRUE by default; in this case vapply() tries to generate a named array, if possible.
basics <- function(x) {
c(min = min(x), mean = mean(x), median = median(x), max = max(x))
}
vapply(linkedin, basics, numeric(4))
## [,1] [,2] [,3] [,4] [,5] [,6] [,7]
## min 16 9 13 5 55 17 14
## mean 16 9 13 5 55 17 14
## median 16 9 13 5 55 17 14
## max 16 9 13 5 55 17 14
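A sketch of why FUN.VALUE matters: vapply() checks every result against the template and stops with an error when one does not fit, whereas sapply() would silently return a different shape. The function returning two values is a made-up example of such a mismatch.

```r
linkedin <- c(16, 9, 13, 5, 55, 17, 14)
# Each call must return exactly one logical value
is_popular <- vapply(linkedin, function(x) x > 15, logical(1))
# Returning two numbers where the template expects one is an error
mismatch <- tryCatch(
  vapply(linkedin, function(x) c(x, x^2), numeric(1)),
  error = function(e) e
)
inherits(mismatch, "error")   # TRUE
```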
Summary:
lapply():
- apply function over list or vector
- output = list
sapply():
- apply function over list or vector
- try to simplify list to array
vapply():
- apply function over list or vector
- explicitly specify output format
3.5 Utilities
Mathematical Utilities:
- abs(): Calculate the absolute value.
- sum(): Calculate the sum of all the values in a data structure.
- mean(): Calculate the arithmetic mean.
- round(): Round the values to 0 decimal places by default. Try out ?round in the console for variations of round() and ways to change the number of digits to round to.
sum_round_abs <- sum(round(abs(linkedin)))
vec1 <- c(1.5, 2.5, 8.4, 3.7, 6.3)
vec2 <- rev(vec1)
mean_vec_abs <- mean(c(abs(vec1), abs(vec2)))
- seq(): Generate sequences, by specifying the from, to, and by arguments.
- rep(): Replicate elements of vectors and lists.
- sort(): Sort a vector in ascending order. Works on numerics, but also on character strings and logicals.
- rev(): Reverse the elements in a data structure for which reversal is defined.
- str(): Display the structure of any R object.
- append(): Merge vectors or lists.
- is.*(): Check for the class of an R object.
- as.*(): Convert an R object from one class to another.
- unlist(): Flatten (possibly embedded) lists to produce a vector.
linkedin <- list(16, 9, 13, 5, 2, 17, 14)
facebook <- list(17, 7, 5, 16, 8, 13, 14)
# Convert linkedin and facebook to a vector: li_vec and fb_vec
li_vec <- unlist(linkedin)
fb_vec <- unlist(facebook)
# Append fb_vec to li_vec: social_vec
social_vec <- append(li_vec, fb_vec)
# Sort social_vec
sort(social_vec, decreasing = TRUE)
## [1] 17 17 16 16 14 14 13 13 9 8 7 5 5 2
## [1] 1 3 5 1 3 5
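The code for the last output line is not shown; rep(seq(1, 5, by = 2), times = 2) is one call that yields it, but that is an assumption. A short sketch of seq() and rep():

```r
# seq(): from 1 to 5 in steps of 2
odds <- seq(1, 5, by = 2)            # 1 3 5
# rep() with times repeats the whole vector
odds_twice <- rep(odds, times = 2)   # 1 3 5 1 3 5
# rep() with each repeats every element in place
odds_each <- rep(odds, each = 2)     # 1 1 3 3 5 5
```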
Regular Expressions:
Regular expressions are useful for cleaning raw data so that it is ready to work with. They can be used to check whether a pattern exists inside a character string or a vector of character strings, and to replace the matched text.
[GREP,GREPL]
grepl() returns TRUE for each string in which the pattern is found, and FALSE otherwise. grep() returns a vector with the indices of the strings that contain the pattern.
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "dalai.lama@peace.org",
"invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")
edu_has <- grepl("edu", x = emails)
edu_emails <- emails[edu_has]
edu_emails
## [1] "john.doe@ivyleague.edu" "education@world.gov"
## [3] "invalid.edu" "quant@bigdatacollege.edu"
We can use the following to match contents in a string:
i) The caret, ^, and the dollar sign, $, match the start and the end of a string, respectively.
ii) @, because a valid email must contain an at-sign.
iii) .*, which matches any character (.) zero or more times. Both the dot and the asterisk are metacharacters. We can use them to match any character between the at-sign and the ".edu" portion of an email address.
iv) \.edu$, to match the ".edu" part of the email at the end of the string. The \ escapes the dot: it tells R to treat . as an actual character. Inside an R string the backslash itself must be escaped, so the pattern is written "\\.edu$".
## [1] TRUE FALSE FALSE FALSE TRUE FALSE
# Use grep() to match for .edu addresses more robustly, save result to hits
hits <- grep("@.*\\.edu$", x = emails)
# Subset emails using hits
emails[hits]
## [1] "john.doe@ivyleague.edu" "quant@bigdatacollege.edu"
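The grepl() counterpart of the grep() call above returns a logical vector instead of indices, matching the TRUE/FALSE line printed earlier. The second pattern is an extra illustration of the ^ anchor, not part of the original exercise.

```r
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "dalai.lama@peace.org",
            "invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")
# TRUE only where an @ is followed, anywhere later, by ".edu" at the very end
is_edu <- grepl("@.*\\.edu$", x = emails)   # TRUE FALSE FALSE FALSE TRUE FALSE
# ^ anchors the pattern to the start of the string
starts_with_edu <- grepl("^edu", x = emails)  # only "education@world.gov"
```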
[SUB,GSUB]
sub() and gsub() replace the text that matches a pattern within each element of a character vector. sub() replaces only the first match in each element, whereas gsub() replaces all matches.
## [1] "john.doe@datacamp.edu" "education@world.gov"
## [3] "dalai.lama@peace.org" "invalid.edu"
## [5] "quant@datacamp.edu" "cookie.monster@sesame.tv"
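The call behind the output above is not shown; sub() with the pattern "@.*\\.edu$" and the replacement "@datacamp.edu" produces exactly that vector. The "banana" lines are a separate toy example of sub() versus gsub().

```r
emails <- c("john.doe@ivyleague.edu", "education@world.gov", "dalai.lama@peace.org",
            "invalid.edu", "quant@bigdatacollege.edu", "cookie.monster@sesame.tv")
# Replace the domain of .edu addresses; other strings are left unchanged
new_emails <- sub("@.*\\.edu$", "@datacamp.edu", x = emails)
# sub() replaces only the first match, gsub() replaces every match
first_only  <- sub("a", "o", "banana")    # "bonana"
all_matches <- gsub("a", "o", "banana")   # "bonono"
```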
Times & Dates:
## [1] "2021-05-29"
## [1] "2021-05-29 23:42:29 CDT"
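The code for the two lines above is not shown. Sys.Date() and Sys.time() return the current date and date-time, so the printed values depend on when the chapter was rendered.

```r
today <- Sys.Date()   # current date, class "Date"
now <- Sys.time()     # current date-time, class "POSIXct" "POSIXt"
class(today)
class(now)
```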
To create a Date object for a specific date, we pass a character string in the default "yyyy-mm-dd" format to as.Date():
my_birthday <- as.Date("1999-06-07")
my_birthday
## [1] "1999-06-07"
Date Arithmetic:
Subtracting one date from another gives the number of days between them, here between today_date and my_birthday:
## Time difference of 8027 days
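The difference above can be reproduced with fixed dates. Here today_date is pinned to the date printed earlier in this chapter; that is an assumption, since the original most likely used Sys.Date().

```r
today_date <- as.Date("2021-05-29")
my_birthday <- as.Date("1999-06-07")
age_days <- today_date - my_birthday   # Time difference of 8027 days
# Dates are stored in days, so adding 1 moves the date forward one day
next_day <- my_birthday + 1            # "1999-06-08"
```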
We can use the following conversion symbols to create dates with as.Date() and to format them with format(). Dates are stored in days, so date arithmetic works in days.
- %Y: 4-digit year (1982)
- %y: 2-digit year (82)
- %m: 2-digit month (01)
- %d: 2-digit day of the month (13)
- %A: weekday (Wednesday)
- %a: abbreviated weekday (Wed)
- %B: month (January)
- %b: abbreviated month (Jan)
Example:
# Definition of character strings representing dates
str1 <- "May 23, '96"
str2 <- "2012-03-15"
str3 <- "30/January/2006"
# Convert the strings to dates: date1, date2, date3
date1 <- as.Date(str1, "%b %d, '%y")
date2 <- as.Date(str2, "%Y-%m-%d")
date3 <- as.Date(str3, "%d/%B/%Y")
# Convert dates to formatted strings
format(date1, "%A")
## [1] "Thursday"
format(date2, "%d")
## [1] "15"
format(date3, "%b %Y")
## [1] "Jan 2006"
We can use the following conversion symbols to create date-times with as.POSIXct() and to format them. POSIXct times are stored in seconds, so time arithmetic works in seconds.
- %H: hours as a decimal number (00-23)
- %I: hours as a decimal number (01-12)
- %M: minutes as a decimal number (00-59)
- %S: seconds as a decimal number
- %T: shorthand notation for the typical format %H:%M:%S
- %p: AM/PM indicator
For a full list of conversion symbols, consult the strptime documentation in the console with ?strptime.
# Definition of character strings representing times
str1 <- "May 23, '96 hours:23 minutes:01 seconds:45"
str2 <- "2012-3-12 14:23:08"
# Convert the strings to POSIXct objects: time1, time2
time1 <- as.POSIXct(str1, format = "%B %d, '%y hours:%H minutes:%M seconds:%S")
time2 <- as.POSIXct(str2)
# Convert times to formatted strings
format(time1, "%M")
## [1] "01"
format(time2, "%I:%M %p")
## [1] "02:23 PM"
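A sketch of time arithmetic with POSIXct, reusing the string from str2. Passing tz = "UTC" is an assumption made so the result does not depend on the local time zone.

```r
time2 <- as.POSIXct("2012-3-12 14:23:08", tz = "UTC")
# POSIXct values are stored in seconds, so + 60 moves forward one minute
later <- time2 + 60
format(later, "%H:%M:%S")   # "14:24:08"
# difftime() reports the gap in the requested units
gap <- difftime(later, time2, units = "secs")   # 60 secs
```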