Type values and mathematical formulas into R’s command prompt
1 + 1
## [1] 2
Assign values to symbols (variables)
x = 1
x + x
## [1] 2
Invoke functions such as c(), which takes any number of values and returns a single vector
x = c(1, 2, 3)
x
## [1] 1 2 3
R functions, such as sqrt(), often operate efficiently on vectors
y = sqrt(x)
y
## [1] 1.000000 1.414214 1.732051
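A minimal sketch of what "operating on vectors" means: one sqrt() call on the whole vector gives the same values as an explicit for loop that visits each element. The names `y_vectorized` and `y_loop` are illustrative only; the vectorized form is shorter and, for long vectors, typically much faster.

```r
x <- c(1, 2, 3)

# Vectorized: one call processes every element of x
y_vectorized <- sqrt(x)

# Explicit loop: allocate a result vector, then fill it one element at a time
y_loop <- numeric(length(x))
for (i in seq_along(x)) {
    y_loop[i] <- sqrt(x[i])
}

identical(y_vectorized, y_loop)   # TRUE
```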
There are often several ways to accomplish a task in R
x = c(1, 2, 3)
x
## [1] 1 2 3
x <- c(4, 5, 6)
x
## [1] 4 5 6
x <- 7:9
x
## [1] 7 8 9
10:12 -> x
x
## [1] 10 11 12
Sometimes R does ‘surprising’ things that can be fun to figure out
x <- c(1, 2, 3) -> y
x
## [1] 1 2 3
y
## [1] 1 2 3
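One way to figure this one out, assuming the operator precedence table in `?Syntax`: rightward assignment `->` binds more tightly than leftward `<-`, so the line is parsed as `x <- (c(1, 2, 3) -> y)`. The parser also stores `value -> name` as `name <- value`, which quote() lets us check directly.

```r
# If `->` binds more tightly than `<-`, these two expressions parse to the same call
identical(quote(x <- c(1, 2, 3) -> y), quote(x <- y <- c(1, 2, 3)))   # TRUE

# Spelled out step by step: y is assigned first, then its value is assigned to x
y <- c(1, 2, 3)
x <- y
```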
‘Atomic’ vectors
Types include integer, numeric (floating-point; real), complex, logical, character, raw (bytes)
people <- c("Brian", "Jim", "Herve", "Dan", "Val", "Martin")
people
## [1] "Brian" "Jim" "Herve" "Dan" "Val" "Martin"
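typeof() reports how a vector is stored. The sketch below makes a length-one vector of each atomic type listed above. Note that "numeric" values are stored as type `"double"`, and that combining different types with c() coerces every element to one common type.

```r
typeof(1L)            # "integer" (the L suffix makes an integer literal)
typeof(1)             # "double": numeric values are double-precision floats
typeof(1i)            # "complex"
typeof(TRUE)          # "logical"
typeof("Brian")       # "character"
typeof(as.raw(255))   # "raw"

# Mixing types in c() coerces all elements to a common type, here character
c(1, "a", TRUE)       # "1" "a" "TRUE"
```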
Atomic vectors can be named
population <- c(Buffalo=259000, Rochester=210000, `New York`=8400000)
population
## Buffalo Rochester New York
## 259000 210000 8400000
log10(population)
## Buffalo Rochester New York
## 5.413300 5.322219 6.924279
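Names make it possible to look values up by label rather than by position. A short sketch using the `population` vector from above: single brackets keep the name, double brackets return the bare value, and arithmetic carries the names along.

```r
population <- c(Buffalo=259000, Rochester=210000, `New York`=8400000)

names(population)            # "Buffalo" "Rochester" "New York"
population["Rochester"]      # single brackets keep the name
population[["New York"]]     # double brackets return the unnamed value 8400000

# Element-wise arithmetic keeps the names, e.g. population in thousands
population / 1000
```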
Statistical concepts like NA (not available)
truthiness <- c(TRUE, FALSE, NA)
truthiness
## [1] TRUE FALSE NA
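NA propagates: a calculation involving an unknown value is itself unknown. Many summary functions accept `na.rm = TRUE` to drop missing values first, and is.na() is the way to test for them. A small sketch with a hypothetical vector `values`:

```r
values <- c(1, NA, 3)

sum(values)                  # NA: the total is unknown if any input is unknown
sum(values, na.rm = TRUE)    # 4: drop missing values before summing
mean(values, na.rm = TRUE)   # 2
is.na(values)                # FALSE TRUE FALSE

# NA == NA is itself NA, so test for missingness with is.na(), not ==
NA == NA
```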
Logical concepts like ‘and’ (&), ‘or’ (|), and ‘not’ (!)
!truthiness
## [1] FALSE TRUE NA
truthiness | !truthiness
## [1] TRUE TRUE NA
truthiness & !truthiness
## [1] FALSE FALSE NA
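The same three-valued logic applies to functions that summarize a logical vector. In this sketch any() is TRUE because at least one element is TRUE, and all() is FALSE because at least one is FALSE; the NA cannot change either answer. When it could, the result is NA. Because TRUE and FALSE act as 1 and 0 in arithmetic, sum() counts the TRUE values.

```r
truthiness <- c(TRUE, FALSE, NA)

any(truthiness)               # TRUE
all(truthiness)               # FALSE
any(c(FALSE, NA))             # NA: the missing value might be TRUE
sum(truthiness, na.rm = TRUE) # 1: the number of TRUE values
```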
Numerical concepts like infinity (Inf) or not-a-number (NaN, e.g., 0 / 0)
undefined_numeric_values <- c(NA, 0/0, NaN, Inf, -Inf)
undefined_numeric_values
## [1] NA NaN NaN Inf -Inf
sqrt(undefined_numeric_values)
## Warning in sqrt(undefined_numeric_values): NaNs produced
## [1] NA NaN NaN Inf NaN
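These special values are distinguished with a family of `is.*()` functions. is.na() is TRUE for both NA and NaN, while is.nan() is TRUE only for NaN. Note that none of these values counts as finite.

```r
undefined_numeric_values <- c(NA, 0/0, NaN, Inf, -Inf)

is.na(undefined_numeric_values)       # TRUE  TRUE  TRUE  FALSE FALSE
is.nan(undefined_numeric_values)      # FALSE TRUE  TRUE  FALSE FALSE
is.infinite(undefined_numeric_values) # FALSE FALSE FALSE TRUE  TRUE
is.finite(undefined_numeric_values)   # FALSE FALSE FALSE FALSE FALSE
```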
Common string manipulations
toupper(people)
## [1] "BRIAN" "JIM" "HERVE" "DAN" "VAL" "MARTIN"
substr(people, 1, 3)
## [1] "Bri" "Jim" "Her" "Dan" "Val" "Mar"
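A few more base R string functions, all vectorized over `people` like toupper() and substr() above. paste0() joins with no separator, and paste() with `collapse` glues a whole vector into one string.

```r
people <- c("Brian", "Jim", "Herve", "Dan", "Val", "Martin")

nchar(people)                   # 5 3 5 3 3 6
paste0("Dr. ", people)          # "Dr. Brian" "Dr. Jim" ...
paste(people, collapse = ", ")  # one string: "Brian, Jim, Herve, Dan, Val, Martin"
grepl("^M", people)             # TRUE only for names starting with "M"
```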
R is a green consumer, recycling short vectors to align with long vectors
x <- 1:3
x * 2 # '2' (vector of length 1) recycled to c(2, 2, 2)
## [1] 2 4 6
truthiness | NA
## [1] TRUE NA NA
truthiness & NA
## [1] NA FALSE NA
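Recycling is not limited to length-one vectors. In this sketch, a length-2 vector is repeated to match a length-6 vector. When the longer length is not a multiple of the shorter one, R still recycles but emits a warning, which usually signals a bug.

```r
# c(10, 20) is recycled three times to match the length-6 vector
1:6 + c(10, 20)    # 11 22 13 24 15 26

# Lengths 5 and 2: R recycles anyway (11 22 13 24 15) but warns
1:5 + c(10, 20)
```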
It’s very common to nest operations, which can be simultaneously compact, confusing, and expressive ([: subset; <: less than)
substr(tolower(people), 1, 3)
## [1] "bri" "jim" "her" "dan" "val" "mar"
population[population < 1000000]
## Buffalo Rochester
## 259000 210000
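Nested calls are evaluated from the inside out. When a nest gets confusing, naming the intermediate steps helps. The variable names `lower`, `short`, and `is_small` are illustrative only. The last line assumes R 4.1 or later, which added the native pipe `|>` for writing the same chain left to right.

```r
people <- c("Brian", "Jim", "Herve", "Dan", "Val", "Martin")
population <- c(Buffalo=259000, Rochester=210000, `New York`=8400000)

# Same as substr(tolower(people), 1, 3), one step at a time
lower <- tolower(people)
short <- substr(lower, 1, 3)

is_small <- population < 1000000   # logical vector, one element per city
population[is_small]               # keep elements where is_small is TRUE

# R >= 4.1: the native pipe passes the left side as the first argument
people |> tolower() |> substr(1, 3)
```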
Lists
The list type can contain other vectors, including other lists
frenemies = list(
    friends=c("Larry", "Richard", "Vivian"),
    enemies=c("Dick", "Mik")
)
frenemies
## $friends
## [1] "Larry" "Richard" "Vivian"
##
## $enemies
## [1] "Dick" "Mik"
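Unlike atomic vectors, a list's elements can have different types and lengths, and can themselves be lists. A sketch using a hypothetical `record` list; str() gives a compact overview of such nested structures.

```r
# A hypothetical record mixing types, including a nested list
record <- list(
    name = "Val",
    visits = c(3L, 5L, 2L),
    active = TRUE,
    address = list(city = "Buffalo", state = "NY")
)

length(record)          # 4 top-level elements
record$address$city     # "Buffalo": $ reaches into nested lists
str(record)             # compact overview of the whole structure
```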
[ subsets one list to create another list; [[ extracts a list element
frenemies[1]
## $friends
## [1] "Larry" "Richard" "Vivian"
frenemies[c("enemies", "friends")]
## $enemies
## [1] "Dick" "Mik"
##
## $friends
## [1] "Larry" "Richard" "Vivian"
frenemies[["enemies"]]
## [1] "Dick" "Mik"
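The difference between [ and [[ shows up in the class of the result. The `$` operator is a convenient shorthand for [[ with a name.

```r
frenemies <- list(
    friends = c("Larry", "Richard", "Vivian"),
    enemies = c("Dick", "Mik")
)

class(frenemies[1])     # "list": [ always returns a list
class(frenemies[[1]])   # "character": [[ returns the element itself
frenemies$enemies       # $name is shorthand for [["name"]]
identical(frenemies$enemies, frenemies[["enemies"]])   # TRUE
```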
Factors
Character-like vectors, but with values restricted to specific levels
sex = factor(c("Male", "Male", "Female"),
             levels=c("Female", "Male", "Hermaphrodite"))
sex
## [1] Male Male Female
## Levels: Female Male Hermaphrodite
sex == "Female"
## [1] FALSE FALSE TRUE
table(sex)
## sex
## Female Male Hermaphrodite
## 1 2 0
sex[sex == "Female"]
## [1] Female
## Levels: Female Male Hermaphrodite
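Under the hood, a factor stores each value as an integer code that points into its levels, which is why the order given in `levels=` matters. A sketch using the `sex` factor from above: a value that is not one of the allowed levels becomes NA rather than a new category.

```r
sex <- factor(c("Male", "Male", "Female"),
              levels = c("Female", "Male", "Hermaphrodite"))

levels(sex)          # "Female" "Male" "Hermaphrodite"
as.integer(sex)      # 2 2 1: each value is a code into levels(sex)

# "male" (lower case) is not one of the levels, so it becomes NA
factor(c("Male", "male"), levels = levels(sex))
```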
Variables are often related to one another in a highly structured way, e.g., two ‘columns’ of data in a spreadsheet
x = rnorm(1000) # 1000 random normal deviates
y = x + rnorm(1000) # another 1000 deviates, as a function of x
plot(y ~ x) # relationship between x and y
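rnorm() draws different random numbers in every session, so the values printed below will not match a fresh run. set.seed() fixes the random number generator's starting state so that results are reproducible. In this sketch the seed 123 is an arbitrary choice.

```r
set.seed(123)        # any integer works; 123 is arbitrary
a <- rnorm(5)

set.seed(123)        # resetting the seed replays the same sequence
b <- rnorm(5)

identical(a, b)      # TRUE
```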
Convenient to manipulate them together
data.frame(): like columns in a spreadsheet
df = data.frame(X=x, Y=y)
head(df) # first 6 rows
## X Y
## 1 -1.7569371 -0.70884344
## 2 -1.6527157 -1.97487316
## 3 -0.5161684 -1.36055768
## 4 0.2218860 0.09724608
## 5 -0.6661832 -1.82587026
## 6 -0.5512824 0.71819197
plot(Y ~ X, df) # same as above
See all data with View(df). Summarize data with summary(df)
summary(df)
## X Y
## Min. :-3.27963 Min. :-5.20065
## 1st Qu.:-0.71917 1st Qu.:-1.02837
## Median :-0.06830 Median :-0.08605
## Mean :-0.06072 Mean :-0.09962
## 3rd Qu.: 0.64606 3rd Qu.: 0.90735
## Max. : 2.77080 Max. : 4.37988
Easy to manipulate data in a coordinated way, e.g., access column X with $ and subset for just those values greater than 0
positiveX = df[df$X > 0,]
head(positiveX)
## X Y
## 4 0.2218860 0.09724608
## 9 0.6701959 0.82361589
## 10 1.1216619 1.49955242
## 14 0.6156470 0.11297448
## 15 0.2805778 -1.84736727
## 16 0.7633320 -1.63962235
plot(Y ~ X, positiveX)
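base R's subset() expresses the same row selection without repeating `df$`, because it evaluates the condition using the data frame's own columns. This sketch simulates its own `df` with an arbitrary seed, so the numbers differ from those shown above.

```r
set.seed(123)
x <- rnorm(1000)
y <- x + rnorm(1000)
df <- data.frame(X = x, Y = y)

# Equivalent in intent to df[df$X > 0, ]
positiveX <- subset(df, X > 0)

nrow(positiveX) == sum(df$X > 0)   # TRUE: one row per positive X
mean(positiveX$Y)                  # average Y among rows with X > 0
```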
R is introspective – ask it about itself
class(df)
## [1] "data.frame"
dim(df)
## [1] 1000 2
colnames(df)
## [1] "X" "Y"
matrix() is a related class, where all elements have the same type (a data.frame() requires elements within a column to be the same type, but elements in different columns can have different types).
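The difference matters when a table mixes numbers and text. In this sketch, binding an integer column and a character column into a matrix coerces everything to character, while a data.frame keeps a type per column. The `name` column stays character only in R 4.0 or later. Earlier versions converted strings to factors by default.

```r
m <- matrix(1:6, nrow = 2)      # filled column by column
dim(m)                          # 2 3

# A matrix has one type, so the numbers are coerced to character
m2 <- cbind(id = 1:2, name = c("Brian", "Jim"))
typeof(m2)                      # "character"

# A data.frame keeps a separate type for each column
d <- data.frame(id = 1:2, name = c("Brian", "Jim"))
sapply(d, typeof)               # id "integer", name "character" (R >= 4.0)
```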
A scatterplot makes one want to fit a linear model (do a regression analysis)
The formula's variables (Y and X) are looked up in the data frame given as the second argument
fit <- lm(Y ~ X, df)
Visualize the points, and add the regression line
plot(Y ~ X, df)
abline(fit, col="red", lwd=3)
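Because the data were simulated as `y = x + noise`, the fitted intercept should be near 0 and the slope near 1. coef() and confint() extract the estimates and their 95% confidence intervals. This sketch re-simulates `df` with an arbitrary seed, so its exact numbers differ from the output shown here.

```r
set.seed(123)
x <- rnorm(1000)
y <- x + rnorm(1000)        # simulated with intercept 0 and slope 1
df <- data.frame(X = x, Y = y)

fit <- lm(Y ~ X, df)
coef(fit)                   # intercept close to 0, slope close to 1
confint(fit)                # 95% confidence intervals for both coefficients
```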
Summarize the fit as an ANOVA table
anova(fit)
## Analysis of Variance Table
##
## Response: Y
## Df Sum Sq Mean Sq F value Pr(>F)
## X 1 1040.0 1039.96 1022.2 < 2.2e-16 ***
## Residuals 998 1015.4 1.02
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
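The ANOVA table splits the variation in Y into a part explained by X and a residual part. Their ratio is the R-squared that summary(fit) reports. For the table above that is roughly 1040.0 / (1040.0 + 1015.4) ≈ 0.51. The sketch below checks the relationship on freshly simulated data, with an arbitrary seed.

```r
set.seed(123)
x <- rnorm(1000)
y <- x + rnorm(1000)
df <- data.frame(X = x, Y = y)
fit <- lm(Y ~ X, df)

tbl <- anova(fit)
ss <- tbl[["Sum Sq"]]              # c(sum of squares explained by X, residual)
r2_from_anova <- ss[1] / sum(ss)

# Should agree with the R-squared reported by summary()
all.equal(r2_from_anova, summary(fit)$r.squared)   # TRUE
```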
Introspection: what class is fit? What methods can I apply to an object of that class?
class(fit)
## [1] "lm"
methods(class=class(fit))
## [1] add1 alias anova case.names coerce confint
## [7] cooks.distance deviance dfbeta dfbetas drop1 dummy.coef
## [13] effects extractAIC family formula hatvalues influence
## [19] initialize kappa labels logLik model.frame model.matrix
## [25] nobs plot predict print proj qr
## [31] residuals rstandard rstudent show simulate slotsFromS3
## [37] summary variable.names vcov
## see '?methods' for accessing help and source code
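A sketch of two of the listed methods in use. predict() needs new data with the same column name used in the formula (`X`). residuals() pairs with fitted(): each observation is its fitted value plus its residual. The `new_points` data frame is a hypothetical example, and `df` is re-simulated with an arbitrary seed.

```r
set.seed(123)
x <- rnorm(1000)
y <- x + rnorm(1000)
df <- data.frame(X = x, Y = y)
fit <- lm(Y ~ X, df)

# Predicted Y at three hypothetical X values
new_points <- data.frame(X = c(-1, 0, 1))
predict(fit, newdata = new_points)

# Each observation = fitted value + residual
head(residuals(fit))
all.equal(unname(fitted(fit) + residuals(fit)), df$Y)   # TRUE
```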
Help is available in RStudio or interactively
Check out the help page for rnorm()
?rnorm
‘Usage’ section describes how the function can be used
rnorm(n, mean = 0, sd = 1)
Arguments, some with default values. Arguments are matched first by exact name, then by partial name, then by position
‘Arguments’ section describes what the arguments are supposed to be
‘Value’ section describes return value
‘Examples’ section illustrates use
Help pages often include citations to relevant technical documentation, references to related functions, and obscure details
They can be intimidating, but in the end are very useful
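A sketch tying the help page back to actual calls. args() prints the signature from the 'Usage' section. The three rnorm() calls below differ only in how arguments are matched: by position, by exact name in any order, and by partial name. Partial matching works but is best avoided in real code. Resetting the same arbitrary seed before each call makes the results comparable. example("rnorm") would run the code from the 'Examples' section.

```r
# args() prints the function's signature, as in the 'Usage' section
args(rnorm)

# The same call written three ways
set.seed(42); by_position <- rnorm(5, 10, 2)
set.seed(42); by_name     <- rnorm(sd = 2, n = 5, mean = 10)
set.seed(42); by_partial  <- rnorm(5, me = 10, s = 2)   # 'me' -> mean, 's' -> sd

identical(by_position, by_name)     # TRUE
identical(by_position, by_partial)  # TRUE
```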