2 Introduction to R
https://learn.datacamp.com/courses/free-introduction-to-r
2.1 Intro to basics
How it Works:
We can use ‘#’ to add comments in R. Comments are notes or reminders for the reader and are not executed when the program runs.
Example:
Arithmetic with R:
## [1] 9
## [1] 5
## [1] 25
## [1] 5
## [1] 8
## [1] 1
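The code chunk that produced the six results above is not shown. As a minimal sketch, the operands below are assumptions chosen only so that each line reproduces the corresponding result; the comments also show how ‘#’ is used.

```r
# Addition
4 + 5     # 9
# Subtraction
10 - 5    # 5
# Multiplication
5 * 5     # 25
# Division
10 / 2    # 5
# Exponentiation
2 ^ 3     # 8
# Modulo (the remainder after division)
5 %% 2    # 1
```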
Variable Assignment:
We can store a value in a variable using var_name <- 44. We can then do arithmetic with the variable names rather than retyping the values. This is most helpful when the value is a very long string or number.
Example:
## [1] "43567565343 is a huge number and is annoying to type everytime"
## [1] 33
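The assignments behind this output are not shown. A minimal sketch that would print the same two results, assuming hypothetical variable names (long_text, x, y) and hypothetical numeric values:

```r
# Hypothetical names and values, chosen to match the output above
long_text <- "43567565343 is a huge number and is annoying to type everytime"
long_text

# Store two numbers once, then reuse them by name
x <- 11
y <- 22
x + y   # 33
```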
Basic Data Types in R:
Decimal values like 4.5 are called numerics. Whole numbers like 4 are called integers. Integers are also numerics. Boolean values (TRUE or FALSE) are called logical. Text (or string) values are called characters.
We can check what kind of data we are working with by using the class() function. This is important because many operations only work on compatible data types; for example, adding a number to a character string raises an error.
## [1] "numeric"
## [1] "character"
## [1] "logical"
## [1] "numeric"
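The calls that printed these classes are not shown; the values below are assumptions that give the same four results. The last line is an addition showing that a whole number is only stored as an integer when it is written with the L suffix.

```r
class(4.5)      # "numeric"
class("hello")  # "character"
class(TRUE)     # "logical"
class(4)        # "numeric": a plain 4 is stored as a double
class(4L)       # "integer": the L suffix requests integer storage
```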
2.2 Vectors
Vectors are one-dimensional arrays that can hold numeric data, character data, or logical data.
Example:
## [1] "Country" "Blues" "Indie"
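Vectors are created with the combine function c(). A minimal sketch that would print the vector above; the variable names are assumptions, and the numeric and logical vectors are illustrative additions.

```r
# c() combines its arguments into a single vector
genre_vector <- c("Country", "Blues", "Indie")
genre_vector

# Numeric and logical vectors are built the same way
numeric_vector <- c(1, 10, 49)
logical_vector <- c(TRUE, FALSE, TRUE)
```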
We should think carefully before naming a vector and choose a name that gives an idea of what the vector contains. We can also store element names in a vector of their own and assign it to other vectors with names(), which saves retyping.
# Total Expenditure for the week 1
week1_exp <- c(140, 30, 20, 220, 40)
# Total Expenditure for the week 2
week2_exp <- c(247, 750, 108, 308, 810)
# The variable days_vector
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
# Assign the names of the day to week1_exp and week2_exp
names(week1_exp) <- days_vector
names(week2_exp) <- days_vector
week1_exp
## Monday Tuesday Wednesday Thursday Friday
## 140 30 20 220 40
week2_exp
## Monday Tuesday Wednesday Thursday Friday
## 247 750 108 308 810
We can then add the two vectors to find the total expenditure for 2 weeks.
## Monday Tuesday Wednesday Thursday Friday
## 387 780 128 528 850
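The addition itself is not shown above. A minimal sketch, assuming the result is stored as total_dailyexp (the name this chapter uses for it in the Lists section):

```r
days_vector <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday")
week1_exp <- c(140, 30, 20, 220, 40)
week2_exp <- c(247, 750, 108, 308, 810)
names(week1_exp) <- days_vector
names(week2_exp) <- days_vector

# Addition is element-wise: Monday + Monday, Tuesday + Tuesday, ...
total_dailyexp <- week1_exp + week2_exp
total_dailyexp
```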
We can use the sum() function to calculate the sum of all elements of a vector.
total_week1 <- sum(week1_exp)
total_week2 <- sum(week2_exp)
total_exp <- total_week1 + total_week2
total_exp
## [1] 2673
We can compare two values using > or <, which returns either TRUE or FALSE.
Example:
## [1] TRUE
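The comparison that returned TRUE is not shown. One comparison consistent with the totals above, given here as an assumption, checks whether more was spent in week 2 than in week 1:

```r
total_week1 <- sum(c(140, 30, 20, 220, 40))     # 450
total_week2 <- sum(c(247, 750, 108, 308, 810))  # 2223

total_week2 > total_week1   # TRUE
total_week2 < total_week1   # FALSE
```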
We can use the [] to select individual elements from a vector.
Example:
## Thursday
## 528
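A minimal sketch of the selection, assuming the output above came from the two-week total vector. An element can be selected by position (R counts from 1) or by name.

```r
total_dailyexp <- c(Monday = 387, Tuesday = 780, Wednesday = 128,
                    Thursday = 528, Friday = 850)

total_dailyexp[4]            # by position
total_dailyexp["Thursday"]   # by name, same element
```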
We can select multiple elements from a vector and make a new vector.
Example:
## Wednesday Thursday Friday
## 20 220 40
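To select several elements, we place a vector of positions inside the square brackets. A minimal sketch that would give the output above; late_week_exp is a hypothetical name.

```r
week1_exp <- c(Monday = 140, Tuesday = 30, Wednesday = 20,
               Thursday = 220, Friday = 40)

# c(3, 4, 5) picks the 3rd, 4th and 5th elements; 3:5 is a shortcut for it
late_week_exp <- week1_exp[c(3, 4, 5)]
late_week_exp
```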
We can check whether each element of a vector fulfils a certain condition.
For example:
## Monday Tuesday Wednesday Thursday Friday
## TRUE FALSE FALSE TRUE FALSE
We can get the desired elements from a vector by placing the logical selection vector between the square brackets that follow the main vector.
Example:
## Monday Thursday
## 140 220
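The condition used for the two outputs above is not shown. As a sketch, a threshold of 100 on week1_exp (an assumption) reproduces both: the comparison returns one TRUE or FALSE per element, and using that logical vector inside [] keeps only the TRUE positions.

```r
week1_exp <- c(Monday = 140, Tuesday = 30, Wednesday = 20,
               Thursday = 220, Friday = 40)

# Which days had an expenditure above the assumed threshold of 100?
high_spend <- week1_exp > 100
high_spend

# Keep only the elements where the condition is TRUE
week1_exp[high_spend]
```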
2.3 Matrices
Matrix format: matrix(1:9, byrow = TRUE, nrow = 3)
Here, we use 1:9, which is a shortcut for c(1, 2, 3, 4, 5, 6, 7, 8, 9).
The argument byrow indicates that the matrix is filled by the rows. If we want the matrix to be filled by the columns, we set byrow = FALSE.
The third argument nrow indicates that the matrix should have three rows.
Example:
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1 2 3 4 5
## [2,] 6 7 8 9 10
## [3,] 11 12 13 14 15
## [4,] 16 17 18 19 20
## [5,] 21 22 23 24 25
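A call that would print the 5 x 5 matrix above is sketched below; the variable names are assumptions. The second call is an illustrative addition showing the same numbers filled by columns instead.

```r
# Filled row by row: the first row is 1 2 3 4 5
by_row <- matrix(1:25, byrow = TRUE, nrow = 5)
by_row

# Filled column by column: the first column is 1 2 3 4 5
by_col <- matrix(1:25, byrow = FALSE, nrow = 5)
by_col
```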
The following code creates a matrix with row names and column names:
jan_exp <- c(700, 400)
feb_exp <- c(800, 500)
mar_exp <- c(400, 600)
monthly_exp_matrix <- matrix(c(jan_exp, feb_exp, mar_exp), nrow = 3, byrow = TRUE)
time_interval <- c("1st Half", "2nd Half")
months <- c("January", "February", "March")
colnames(monthly_exp_matrix) <- time_interval
rownames(monthly_exp_matrix) <- months
monthly_exp_matrix
## 1st Half 2nd Half
## January 700 400
## February 800 500
## March 400 600
We can use the rowSums() and colSums() functions to calculate the row-wise and column-wise sums of a matrix.
## January February March
## 1100 1300 1000
## 1st Half 2nd Half
## 1900 1500
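The two calls behind these outputs can be sketched as follows. The matrix is rebuilt here in one step with the dimnames argument so that the example runs on its own.

```r
monthly_exp_matrix <- matrix(
  c(700, 400, 800, 500, 400, 600),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("January", "February", "March"),
                  c("1st Half", "2nd Half"))
)

rowSums(monthly_exp_matrix)   # one total per month
colSums(monthly_exp_matrix)   # one total per half
```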
We can use rbind() to add new rows and cbind() to add new columns to a matrix.
April <- c(550, 300)
new_matrix <- rbind(monthly_exp_matrix, April)
Extra <- c(31, 45, 32, 12)
updated_matrix <- cbind(new_matrix, Extra)
updated_matrix
## 1st Half 2nd Half Extra
## January 700 400 31
## February 800 500 45
## March 400 600 32
## April 550 300 12
Similar to vectors, we can use [] to select one or more elements from a matrix. The format is my_matrix[row, column].
## [1] 32
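The selection that returned 32 is not shown; in updated_matrix that value sits in row 3 (March), column 3 (Extra), so the sketch below assumes that was the element selected. Leaving the row or the column blank selects a whole column or row.

```r
updated_matrix <- matrix(
  c(700, 400, 31,
    800, 500, 45,
    400, 600, 32,
    550, 300, 12),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("January", "February", "March", "April"),
                  c("1st Half", "2nd Half", "Extra"))
)

updated_matrix[3, 3]               # single element by position
updated_matrix["March", "Extra"]   # same element by name
updated_matrix[1, ]                # the whole first row
updated_matrix[, 2]                # the whole second column
```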
Standard arithmetic operators like +, -, /, *, etc. work in an element-wise way on matrices in R. We can even use the operators on two separate matrices of the same dimensions.
Example:
mean_spend_perday <- updated_matrix/30
mean_spend_perday_withtax <- updated_matrix * 1.10
mean_tax <- 0.10 * mean_spend_perday
mean_exp_withtax_perday <- mean_tax + mean_spend_perday
mean_spend_perday
## 1st Half 2nd Half Extra
## January 23.33333 13.33333 1.033333
## February 26.66667 16.66667 1.500000
## March 13.33333 20.00000 1.066667
## April 18.33333 10.00000 0.400000
mean_spend_perday_withtax
## 1st Half 2nd Half Extra
## January 770 440 34.1
## February 880 550 49.5
## March 440 660 35.2
## April 605 330 13.2
mean_exp_withtax_perday
## 1st Half 2nd Half Extra
## January 25.66667 14.66667 1.136667
## February 29.33333 18.33333 1.650000
## March 14.66667 22.00000 1.173333
## April 20.16667 11.00000 0.440000
2.4 Factors
There are two types of variables: quantitative (continuous) and categorical. Quantitative variables take numeric values that can be counted or measured. Ex: Number of students, Price, etc.
Categorical variables take values that are labels rather than quantities. Ex: Cities, Countries, Sex, Type, Month, etc.
The difference between a categorical variable and a continuous variable is that a categorical variable can belong to a limited number of categories. A continuous variable, on the other hand, can correspond to an infinite number of values.
We can convert a vector to a factor with the factor() function and then check its factor levels.
Example :
monthly_yearly <- c("Monthly", "Yearly", "Daily")
factor_monthly_yearly <- factor(monthly_yearly)
factor_monthly_yearly
## [1] Monthly Yearly Daily
## Levels: Daily Monthly Yearly
There are two types of categorical variables: a nominal categorical variable and an ordinal categorical variable.
A nominal variable is a categorical variable without an implied order. This means that it is impossible to say that ‘one is worth more than the other’. Ex: Animals, Planets, Cities.
Ordinal variables do have a natural ordering. Ex: Tall, Short or Spicy, Spicier, Spiciest.
We can change the names of the factor levels with the function levels(). The new names must be given in the same order as the existing levels, which factor() sorts alphabetically by default; here the levels are "f" and "j", so "February" comes first.
Example:
months <- c("j", "f", "j", "j", "f", "j")
factor_months <- factor(months)
# Levels are sorted alphabetically as "f", "j", so list February first
levels(factor_months) <- c("February", "January")
factor_months
## [1] January February January January February January
## Levels: February January
We can use the summary() function to get a quick overview of the contents of a variable.
Example:
summary(factor_months)
## February January
## 2 4
To turn a factor into an ordinal factor, i.e. a factor whose categories have a natural ordering, we use:
factor(some_vector, ordered = TRUE, levels = c("lev1", "lev2" …))
Example:
factor_month_ordinal <- factor(factor_months, ordered = TRUE, levels = c("January", "February"))
summary(factor_month_ordinal)
## January February
## 4 2
2.5 Data frames
A data frame has the variables of a data set as columns and the observations as rows.
The function head() enables you to show the first observations of a data frame.
The function tail() prints out the last observations in your data set.
The function str() shows you the structure of your data set.
To create a data frame:
day <- c("Sunday", "Monday", "Tuesday",
"Wednesday", "Thursday", "Friday",
"Saturday")
day_type <- c("Lazy",
"Hard Work",
"Hard Work",
"Hard Work", "Hard Work",
"Medium Work", "Lazy")
water_intake <- c(2.5, 5.9, 7.0, 6.5,
7.2, 4.1, 4.0)
excercise_time <- c(20, 100, 120, 90,
70, 60, 45)
morning_jog <- c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
fitness_data <- data.frame(day, day_type, water_intake, excercise_time, morning_jog)
str(fitness_data)
## 'data.frame': 7 obs. of 5 variables:
## $ day : chr "Sunday" "Monday" "Tuesday" "Wednesday" ...
## $ day_type : chr "Lazy" "Hard Work" "Hard Work" "Hard Work" ...
## $ water_intake : num 2.5 5.9 7 6.5 7.2 4.1 4
## $ excercise_time: num 20 100 120 90 70 60 45
## $ morning_jog : logi FALSE TRUE TRUE FALSE TRUE FALSE ...
We can use [] to extract the values in certain rows and columns of the data set.
Example :
## [1] 2.5
## day day_type water_intake excercise_time morning_jog
## 1 Sunday Lazy 2.5 20 FALSE
## [1] 2.5 5.9 7.0 6.5 7.2 4.1 4.0
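The three selections behind these outputs are not shown. A sketch that would reproduce them, assuming the single value is row 1 of the water_intake column, followed by the whole first row and the whole third column:

```r
fitness_data <- data.frame(
  day = c("Sunday", "Monday", "Tuesday", "Wednesday",
          "Thursday", "Friday", "Saturday"),
  day_type = c("Lazy", "Hard Work", "Hard Work", "Hard Work",
               "Hard Work", "Medium Work", "Lazy"),
  water_intake = c(2.5, 5.9, 7.0, 6.5, 7.2, 4.1, 4.0),
  excercise_time = c(20, 100, 120, 90, 70, 60, 45),
  morning_jog = c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

fitness_data[1, 3]   # row 1, column 3: a single value
fitness_data[1, ]    # all columns of row 1
fitness_data[, 3]    # all rows of column 3, same as fitness_data$water_intake
```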
We can also select the rows that meet a condition using the subset() function.
## day day_type water_intake excercise_time morning_jog
## 2 Monday Hard Work 5.9 100 TRUE
## 3 Tuesday Hard Work 7.0 120 TRUE
## 5 Thursday Hard Work 7.2 70 TRUE
## 7 Saturday Lazy 4.0 45 TRUE
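The rows printed above are exactly the days on which morning_jog is TRUE, so the call was presumably along these lines (an assumption, since the code is not shown; joggers is a hypothetical name):

```r
fitness_data <- data.frame(
  day = c("Sunday", "Monday", "Tuesday", "Wednesday",
          "Thursday", "Friday", "Saturday"),
  day_type = c("Lazy", "Hard Work", "Hard Work", "Hard Work",
               "Hard Work", "Medium Work", "Lazy"),
  water_intake = c(2.5, 5.9, 7.0, 6.5, 7.2, 4.1, 4.0),
  excercise_time = c(20, 100, 120, 90, 70, 60, 45),
  morning_jog = c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

# Keep only the rows where the logical column morning_jog is TRUE
joggers <- subset(fitness_data, subset = morning_jog)
joggers
```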
We can sort/order data according to a certain variable in the data set. In R, this is done with the help of the function order().
## day day_type water_intake excercise_time morning_jog
## 1 Sunday Lazy 2.5 20 FALSE
## 7 Saturday Lazy 4.0 45 TRUE
## 6 Friday Medium Work 4.1 60 FALSE
## 5 Thursday Hard Work 7.2 70 TRUE
## 4 Wednesday Hard Work 6.5 90 FALSE
## 2 Monday Hard Work 5.9 100 TRUE
## 3 Tuesday Hard Work 7.0 120 TRUE
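The output above is sorted by increasing excercise_time, so a sketch of the sorting step could look like this. order() returns the row positions in ascending order of the given variable, and those positions are then used inside [] to rearrange the rows; positions is a hypothetical name.

```r
fitness_data <- data.frame(
  day = c("Sunday", "Monday", "Tuesday", "Wednesday",
          "Thursday", "Friday", "Saturday"),
  day_type = c("Lazy", "Hard Work", "Hard Work", "Hard Work",
               "Hard Work", "Medium Work", "Lazy"),
  water_intake = c(2.5, 5.9, 7.0, 6.5, 7.2, 4.1, 4.0),
  excercise_time = c(20, 100, 120, 90, 70, 60, 45),
  morning_jog = c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

# Row positions from the smallest to the largest excercise_time
positions <- order(fitness_data$excercise_time)
positions

# Use the positions to reorder the rows
fitness_data[positions, ]
```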
Summary till now:
Vectors (one-dimensional array): can hold numeric, character or logical values. The elements in a vector all have the same data type.
Matrices (two-dimensional array): can hold numeric, character or logical values. The elements in a matrix all have the same data type.
Data frames (two-dimensional objects): can hold numeric, character or logical values. Within a column all elements have the same data type, but different columns can be of different data types.
2.6 Lists
We can construct a list with the list() function with matrices, vectors, or other lists.
## [[1]]
## Monday Tuesday Wednesday Thursday Friday
## 387 780 128 528 850
##
## [[2]]
## 1st Half 2nd Half
## January 700 400
## February 800 500
## March 400 600
##
## [[3]]
## day day_type water_intake excercise_time morning_jog
## 1 Sunday Lazy 2.5 20 FALSE
## 2 Monday Hard Work 5.9 100 TRUE
## 3 Tuesday Hard Work 7.0 120 TRUE
## 4 Wednesday Hard Work 6.5 90 FALSE
## 5 Thursday Hard Work 7.2 70 TRUE
## 6 Friday Medium Work 4.1 60 FALSE
## 7 Saturday Lazy 4.0 45 TRUE
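The list() call that produced this output is not shown. A minimal sketch, assuming the three components are the objects built earlier in the chapter (rebuilt here in compact form so the example runs on its own) and that the list is called intro_list:

```r
total_dailyexp <- c(Monday = 387, Tuesday = 780, Wednesday = 128,
                    Thursday = 528, Friday = 850)
monthly_exp_matrix <- matrix(
  c(700, 400, 800, 500, 400, 600), nrow = 3, byrow = TRUE,
  dimnames = list(c("January", "February", "March"),
                  c("1st Half", "2nd Half"))
)
fitness_data <- data.frame(
  day = c("Sunday", "Monday", "Tuesday", "Wednesday",
          "Thursday", "Friday", "Saturday"),
  day_type = c("Lazy", "Hard Work", "Hard Work", "Hard Work",
               "Hard Work", "Medium Work", "Lazy"),
  water_intake = c(2.5, 5.9, 7.0, 6.5, 7.2, 4.1, 4.0),
  excercise_time = c(20, 100, 120, 90, 70, 60, 45),
  morning_jog = c(FALSE, TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
)

# A list can hold objects of different types and sizes side by side
intro_list <- list(total_dailyexp, monthly_exp_matrix, fitness_data)
intro_list
```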
We can change the names of the list components using:
names(intro_list) <- c("daily_exp", "monthly_exp", "fitness")
# Or name the components when creating the list
intro_list <- list(daily_exp = total_dailyexp, monthly_exp = monthly_exp_matrix, fitness = fitness_data)
We can select components from a list using [[ ]] (or $ for a named component); single brackets [ ] return a sub-list instead. If we want to select the 3rd element (the water_intake column) from the 3rd component of the list:
## water_intake
## 1 2.5
## 2 5.9
## 3 7.0
## 4 6.5
## 5 7.2
## 6 4.1
## 7 4.0
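The selection itself is not shown. The output is a one-column data frame holding water_intake, which is what the first selection in the sketch below returns; the other lines are illustrative additions. The first two list components are rebuilt in compact form so that the example runs on its own.

```r
intro_list <- list(
  daily_exp = c(Monday = 387, Tuesday = 780, Wednesday = 128,
                Thursday = 528, Friday = 850),
  monthly_exp = matrix(c(700, 400, 800, 500, 400, 600),
                       nrow = 3, byrow = TRUE),
  fitness = data.frame(
    day = c("Sunday", "Monday", "Tuesday", "Wednesday",
            "Thursday", "Friday", "Saturday"),
    day_type = c("Lazy", "Hard Work", "Hard Work", "Hard Work",
                 "Hard Work", "Medium Work", "Lazy"),
    water_intake = c(2.5, 5.9, 7.0, 6.5, 7.2, 4.1, 4.0)
  )
)

intro_list[[3]][3]     # 3rd column of the 3rd component, kept as a data frame
intro_list[[3]][[3]]   # the same column as a plain numeric vector
intro_list$fitness     # the whole 3rd component, selected by name
intro_list[3]          # single brackets: a list of length 1
```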