A class is an R object with a formal structure; think of classes as nouns. A generic and associated methods are functions that transform nouns; think of generics and methods as verbs.
R has two main class systems (S3 and S4), and these differ from class systems in many programming languages. The primary difference is that in R methods are associated with generics, whereas in other programming languages methods are associated with classes.
Consider this simple workflow:
x <- rnorm(10)
y <- x + rnorm(10)
df <- data.frame(X=x, Y=y)
fit <- lm(Y ~ X, df)
x and y are examples of so-called ‘atomic’ vectors, the building blocks of R data representations. df is a data.frame, and is an example of an R class: an assembly of different atomic types (here, a list of numeric vectors) with an associated ‘class’ attribute.
class(df)
## [1] "data.frame"
attributes(df)
## $names
## [1] "X" "Y"
##
## $row.names
## [1] 1 2 3 4 5 6 7 8 9 10
##
## $class
## [1] "data.frame"
dput(df)
## structure(list(X = c(0.316466059374151, -0.0290990768980746,
## 0.233530610491406, -0.153293223624643, 0.159430839622362, 1.63223674585324,
## 0.859170096614589, -0.412265948468438, 0.0306311062978289, -1.03914199775336
## ), Y = c(0.778510942506688, -2.13310593734469, 0.0325801294032117,
## -0.566422028776406, 0.951729589533829, 2.29810146589824, 1.76793651271658,
## -1.58211435993906, -0.613110945276851, 0.81776333410081)), .Names = c("X",
## "Y"), row.names = c(NA, -10L), class = "data.frame")
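To make the "list plus class attribute" description concrete, here is a minimal sketch. The names df2 and raw are illustrative only and are not part of the workflow above.

```r
# A data.frame is a list of equal-length vectors plus a few attributes.
x <- rnorm(10)
df2 <- data.frame(X = x, Y = x + rnorm(10))   # 'df2' is an illustrative name

typeof(df2)                    # the underlying storage type
inherits(df2, "data.frame")    # the class attribute marks it as a data.frame

# Dropping the class attribute leaves a plain named list
raw <- unclass(df2)
is.data.frame(raw)
names(raw)
```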
There are several reasons to introduce classes, including
Enforcing constraints on class members, e.g., vectors in a data.frame must be of equal length.
Providing functionality that would otherwise be tedious to maintain, e.g., row.names.
Separating the implementation of the object from the way the user interacts with the object’s interface.
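As a small illustration of the first reason, the sketch below compares data.frame() with a plain list when the columns have different lengths. The variable names msg and ok are illustrative.

```r
# data.frame() refuses columns whose lengths cannot be reconciled...
msg <- tryCatch(
    data.frame(X = 1:3, Y = 1:2),
    error = function(e) conditionMessage(e)   # capture the error text
)
msg

# ...whereas a plain list accepts the same input without complaint
ok <- list(X = 1:3, Y = 1:2)
lengths(ok)
```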
The last of these reasons is a primary motivation for using classes, and can be seen in the fit object: it has a complicated internal structure that is computationally convenient, but not really the business of the end user.
str(fit)
## List of 12
## $ coefficients : Named num [1:2] -0.00979 1.15778
## ..- attr(*, "names")= chr [1:2] "(Intercept)" "X"
## $ residuals : Named num [1:10] 0.422 -2.09 -0.228 -0.379 0.777 ...
## ..- attr(*, "names")= chr [1:10] "1" "2" "3" "4" ...
## $ effects : Named num [1:10] -0.554 2.484 -0.363 -0.157 0.711 ...
## ..- attr(*, "names")= chr [1:10] "(Intercept)" "X" "" "" ...
## $ rank : int 2
## $ fitted.values: Named num [1:10] 0.3566 -0.0435 0.2606 -0.1873 0.1748 ...
## ..- attr(*, "names")= chr [1:10] "1" "2" "3" "4" ...
## $ assign : int [1:2] 0 1
## $ qr :List of 5
## ..$ qr : num [1:10, 1:2] -3.162 0.316 0.316 0.316 0.316 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ : chr [1:10] "1" "2" "3" "4" ...
## .. .. ..$ : chr [1:2] "(Intercept)" "X"
## .. ..- attr(*, "assign")= int [1:2] 0 1
## ..$ qraux: num [1:2] 1.32 1.11
## ..$ pivot: int [1:2] 1 2
## ..$ tol : num 1e-07
## ..$ rank : int 2
## ..- attr(*, "class")= chr "qr"
## $ df.residual : int 8
## $ xlevels : Named list()
## $ call : language lm(formula = Y ~ X, data = df)
## $ terms :Classes 'terms', 'formula' language Y ~ X
## .. ..- attr(*, "variables")= language list(Y, X)
## .. ..- attr(*, "factors")= int [1:2, 1] 0 1
## .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. ..$ : chr [1:2] "Y" "X"
## .. .. .. ..$ : chr "X"
## .. ..- attr(*, "term.labels")= chr "X"
## .. ..- attr(*, "order")= int 1
## .. ..- attr(*, "intercept")= int 1
## .. ..- attr(*, "response")= int 1
## .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. ..- attr(*, "predvars")= language list(Y, X)
## .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
## .. .. ..- attr(*, "names")= chr [1:2] "Y" "X"
## $ model :'data.frame': 10 obs. of 2 variables:
## ..$ Y: num [1:10] 0.7785 -2.1331 0.0326 -0.5664 0.9517 ...
## ..$ X: num [1:10] 0.3165 -0.0291 0.2335 -0.1533 0.1594 ...
## ..- attr(*, "terms")=Classes 'terms', 'formula' language Y ~ X
## .. .. ..- attr(*, "variables")= language list(Y, X)
## .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
## .. .. .. ..- attr(*, "dimnames")=List of 2
## .. .. .. .. ..$ : chr [1:2] "Y" "X"
## .. .. .. .. ..$ : chr "X"
## .. .. ..- attr(*, "term.labels")= chr "X"
## .. .. ..- attr(*, "order")= int 1
## .. .. ..- attr(*, "intercept")= int 1
## .. .. ..- attr(*, "response")= int 1
## .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
## .. .. ..- attr(*, "predvars")= language list(Y, X)
## .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
## .. .. .. ..- attr(*, "names")= chr [1:2] "Y" "X"
## - attr(*, "class")= chr "lm"
Instead, the user can manipulate the object through its interface, defined in part by the methods that operate on the class.
methods(class=class(fit))
## [1] add1 alias anova case.names
## [5] coerce confint cooks.distance deviance
## [9] dfbeta dfbetas drop1 dummy.coef
## [13] effects extractAIC family formula
## [17] hatvalues influence initialize kappa
## [21] labels logLik model.frame model.matrix
## [25] nobs plot predict print
## [29] proj qr residuals rstandard
## [33] rstudent show simulate slotsFromS3
## [37] summary variable.names vcov
## see '?methods' for accessing help and source code
anova(fit)
## Analysis of Variance Table
##
## Response: Y
## Df Sum Sq Mean Sq F value Pr(>F)
## X 1 6.1691 6.1691 4.1604 0.07571 .
## Residuals 8 11.8625 1.4828
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
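A brief sketch of what "using the interface" looks like in practice: coef() and residuals() return the same values as reaching into the list, but do not depend on how lm stores them. The seed is arbitrary and only there to make the sketch reproducible.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
x <- rnorm(10)
df <- data.frame(X = x, Y = x + rnorm(10))
fit <- lm(Y ~ X, df)

# Interface: works without knowing the internal layout of 'fit'
coef(fit)
head(residuals(fit), 3)

# Implementation: the same values, but tied to lm's internal structure
all.equal(coef(fit), fit$coefficients)
all.equal(residuals(fit), fit$residuals)
```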
Some aspects of S3 classes and methods
The class attribute determines what class an object is; there is no formal class definition.
Classes can have linear inheritance. All the methods that apply to an object of class lm can be used on fit1. There may be additional methods that apply only to class my.
fit1 <- fit
class(fit1)
## [1] "lm"
class(fit1) <- c("my", class(fit1))
class(fit1)
## [1] "my" "lm"
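One way to check this kind of linear inheritance is inherits(), which looks for a class name anywhere in the class vector. The sketch below rebuilds a small fit on made-up data so that it runs on its own.

```r
# Made-up data, only so the example is self-contained
fit <- lm(Y ~ X, data.frame(X = 1:10, Y = c(2, 1, 4, 3, 6, 5, 8, 7, 10, 9)))
fit1 <- fit
class(fit1) <- c("my", class(fit1))

inherits(fit1, "my")    # fit1 is a 'my'...
inherits(fit1, "lm")    # ...and still an 'lm'
inherits(fit, "my")     # the original object is unchanged

# which = TRUE reports the position of each name in class(fit1)
inherits(fit1, c("lm", "my"), which = TRUE)
```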
A generic is a plain old function that has UseMethod() in its body.
fun <- function(object, ...)
    UseMethod("fun")
A method is a plain old function whose name is constructed by pasting a generic function name and an S3 class name together, separated by a period.
fun.lm <- function(object, ...)
    message("fun.lm method")
fun(fit)
## fun.lm method
fun(fit1)
## fun.lm method
fun.my <- function(object, ...)
    message("fun.my method")
fun(fit)
## fun.lm method
fun(fit1)
## fun.my method
Inheritance can be exploited in the function body using NextMethod()
fun.my <- function(object, ...) {
    message("fun.my method")
    NextMethod()
}
fun(fit1)
## fun.my method
## fun.lm method
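The same pattern can be sketched without a fitted model. The generic describe and the classes base and derived below are hypothetical names chosen for illustration; the methods return values rather than emitting messages, so the effect of NextMethod() is visible in the result.

```r
# Hypothetical generic with methods for two hypothetical S3 classes
describe <- function(object, ...)
    UseMethod("describe")

describe.base <- function(object, ...)
    "base"

describe.derived <- function(object, ...)
    c("derived", NextMethod())   # prepend to whatever the parent method returns

b <- structure(list(), class = "base")
d <- structure(list(), class = c("derived", "base"))

describe(b)   # dispatches to describe.base
describe(d)   # dispatches to describe.derived, then on to describe.base
```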
Classes, generics, and methods introduce some complexity, for instance getting help…
?plot returns the help for the plot generic and is, well, quite generic.
?plot.lm returns the help for the plot.lm method, and is very informative.
… or finding source code
plot simply prints enough information to know that it is a generic.
plot.lm is the method, but the method has not been exported from the package (stats) where it is defined. To see the code, use stats:::plot.lm.
The S4 system introduces
Here’s an S4 class definition representing people with first and last names.
.Person <- setClass("Person",
    slots = c(
        first = "character",
        last = "character"
    )
)
setClass() defines the class. It returns a ‘generator’ function that can be used to create an instance of the class. My convention is to assign the generator to a variable named after the class and prefixed with a period (.). The reason is that the argument signature of the generator is not informative for the user; it consists of ..., rather than named arguments. Thus my convention is to write a user-facing constructor
Person <- function(firstname=character(), lastname=character()) {
    .Person(first=firstname, last=lastname)
}
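To see why the wrapper helps, one can compare the two signatures. This is a self-contained sketch that repeats the definitions above; the object p and its values are illustrative.

```r
library(methods)

.Person <- setClass("Person",
    slots = c(first = "character", last = "character")
)

Person <- function(firstname = character(), lastname = character()) {
    .Person(first = firstname, last = lastname)
}

args(.Person)            # inspect the generator's signature
names(formals(Person))   # the constructor documents what the user supplies

p <- Person("Ada", "Lovelace")   # illustrative values
is(p, "Person")
```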
Here’s a people instance
people <- Person(
firstname = c("George", "John", "Thomas"),
lastname = c("Washington", "Adams", "Jefferson")
)
A new class often requires methods to work with the data. To separate the implementation from the interface, we’ll write a couple of ‘accessor’ functions that extract relevant components of the data. The accessors use knowledge of the class structure, but we will strive to make all other operations ignorant of implementation details.
firstname <- function(x)
    slot(x, "first")
lastname <- function(x)
    slot(x, "last")
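A short sketch of the payoff: code written against the accessors keeps working if the slots are later renamed, because only the accessors would need to change. The definitions are repeated so the block runs on its own, and the variable full is illustrative.

```r
library(methods)

.Person <- setClass("Person",
    slots = c(first = "character", last = "character")
)
firstname <- function(x) slot(x, "first")
lastname <- function(x) slot(x, "last")

people <- .Person(
    first = c("George", "John"),
    last = c("Washington", "Adams")
)

# Client code uses the accessors, not the slot names
full <- paste(firstname(people), lastname(people))
full

# Direct slot access gives the same data but hard-codes the implementation
identical(firstname(people), people@first)
```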
We’ll now implement length() and show() methods, using existing generics. The generics can be discovered with getGeneric(). For instance,
getGeneric("length")
## standardGeneric for "length" defined from package "base"
##
## function (x)
## standardGeneric("length", .Primitive("length"))
## <bytecode: 0x36dbd80>
## <environment: 0x36d7e98>
## Methods may be defined for arguments: x
## Use showMethods("length") for currently available ones.
tells us that the method we write should have a single argument x. Thus
setMethod("length", "Person", function(x) {
    length(firstname(x))    # use length of first name vector
})
## [1] "length"
setMethod("show", "Person", function(object) {
    cat("class: ", class(object), "\n",
        "length: ", length(object), " individuals\n",
        sep="")
})
## [1] "show"
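With both methods in place, the object can be used like a built-in type. The sketch below repeats the definitions so it is self-contained; the variable out is illustrative and simply captures what show() prints.

```r
library(methods)

.Person <- setClass("Person",
    slots = c(first = "character", last = "character")
)
firstname <- function(x) slot(x, "first")

setMethod("length", "Person", function(x) length(firstname(x)))
setMethod("show", "Person", function(object) {
    cat("class: ", class(object), "\n",
        "length: ", length(object), " individuals\n",
        sep = "")
})

people <- .Person(
    first = c("George", "John", "Thomas"),
    last = c("Washington", "Adams", "Jefferson")
)

length(people)   # uses our method
people           # auto-printing an S4 object calls show()

out <- capture.output(show(people))   # the printed lines as a character vector
```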
Note that we use accessors rather than direct slot access.
Here we implement a derived class, with an additional slot and accessor
.President <- setClass("President",
    contains = "Person",
    slots = c(party = "character")
)
party <- function(x)
    slot(x, "party")
There are two ways in which one can construct an object of this class
.President(    # use base class to initialize...
    people,
    party = c("Unaffiliated", "Federalist", "Democratic-Republican")
)
## class: President
## length: 3 individuals
.President(    # ... or initialize each slot
    first = c("George", "John", "Thomas"),
    last = c("Washington", "Adams", "Jefferson"),
    party = c("Unaffiliated", "Federalist", "Democratic-Republican")
)
## class: President
## length: 3 individuals
We’ll choose to implement a constructor that matches the latter
President <- function(firstname=character(), lastname=character(),
                      party=character())
{
    .President(first=firstname, last=lastname, party=party)
}
Note that we did not need to define length() or show() methods for our derived class; President inherits them from Person.
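A self-contained sketch of that inheritance follows. Only a length() method for Person is defined here, yet it applies to a President instance as well; the object pres is illustrative.

```r
library(methods)

.Person <- setClass("Person",
    slots = c(first = "character", last = "character")
)
.President <- setClass("President",
    contains = "Person",
    slots = c(party = "character")
)

# A method defined for the base class only
setMethod("length", "Person", function(x) length(slot(x, "first")))

pres <- .President(
    first = c("George", "John"),
    last = c("Washington", "Adams"),
    party = c("Unaffiliated", "Federalist")
)

is(pres, "Person")               # a President is also a Person
extends("President", "Person")   # the same relationship, at the class level
length(pres)                     # dispatches to the Person method
```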
There are many additional features of S4 classes. A simple example is the ‘validity’ method, which can be used to impose constraints on the data.
setValidity("Person", function(object) {
    msg <- character()    # describe how the object is invalid
    if (length(firstname(object)) != length(lastname(object)))
        msg <- c(msg, "firstname() and lastname() lengths differ")
    if (anyNA(firstname(object)) || anyNA(lastname(object)))
        msg <- c(msg, "NA values not allowed in firstname() or lastname()")
    if (length(msg)) msg else TRUE
})
## Class "Person" [in ".GlobalEnv"]
##
## Slots:
##
## Name: first last
## Class: character character
##
## Known Subclasses: "President"
setValidity("President", function(object) {
    ## test only properties of President
    msg <- character()
    if (length(party(object)) != length(object))
        msg <- c(msg, "party() length differs from person lengths")
    if (anyNA(party(object)))
        msg <- c(msg, "NA values not allowed in party()")
    if (length(msg)) msg else TRUE
})
## Class "President" [in ".GlobalEnv"]
##
## Slots:
##
## Name: party first last
## Class: character character character
##
## Extends: "Person"
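To see a validity method at work, here is a reduced, self-contained sketch using only the length check for Person. Creating an object with mismatched slot lengths fails, and the message we wrote is part of the error. The names ok and bad are illustrative.

```r
library(methods)

.Person <- setClass("Person",
    slots = c(first = "character", last = "character")
)
firstname <- function(x) slot(x, "first")
lastname <- function(x) slot(x, "last")

setValidity("Person", function(object) {
    msg <- character()
    if (length(firstname(object)) != length(lastname(object)))
        msg <- c(msg, "firstname() and lastname() lengths differ")
    if (length(msg)) msg else TRUE
})

# A valid object is created as usual
ok <- .Person(first = "George", last = "Washington")

# An invalid one signals an error; capture its message instead of stopping
bad <- tryCatch(
    .Person(first = c("George", "John"), last = "Washington"),
    error = function(e) conditionMessage(e)
)
bad
```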