Reference classes

2013-05-09

Also sometimes wittily referred to as S5 or R5.

A more recent development in R is Reference classes. These promise, at last, full-blown objects and classes like those of C++ and Java. Naturally, this innovation comes with some downsides:

Reference classes are, as yet, lightly documented
They use their own form of syntax and idioms
The C++/Java concept of classes that self-mutate and hold state doesn't sit entirely well with R's functional nature of functions without side effects.

Basics

Let's sketch out a simple reference class, for recording sightings of wildlife:

Observation <- setRefClass ("Observation",
   fields = c (species = "character", count = "numeric"),
   methods = c (
      # The genus is the first word of the species name
      genus = function() {
         strsplit(species, " ")[[1]][1]
      },
      # TRUE if more than one individual was seen
      multiple = function() {
         1 < count
      }
   )
)

A few details to highlight:

As opposed to the class creation mechanisms of S4, setRefClass does not just register a class name but creates a class and returns a generator object for it, to be kept in a variable. That is, you work with the object Observation, not the name/string "Observation".
fields is much like representation in S4, a list of members and their defining data type
There are true methods, called as functions of the object and defined as the members of methods.

Note: fields and methods can be defined as a vector or a list. A vector is arguably shorter and just as easy to understand.
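
As a rough check of that equivalence, the sketch below declares the same fields once as a vector and once as a list. The class names ObsVec and ObsList are made up for this illustration; fields() is the generator method that reports the declared field names and types.

```r
library(methods)

# Hypothetical classes: identical fields, declared two different ways.
ObsVec <- setRefClass("ObsVec",
   fields = c(species = "character", count = "numeric"))
ObsList <- setRefClass("ObsList",
   fields = list(species = "character", count = "numeric"))

# Ask each generator which fields it declares.
vec_fields <- ObsVec$fields()
list_fields <- ObsList$fields()

# Compare the two declarations.
identical(names(vec_fields), names(list_fields))

# Objects are created the same way from either generator.
v <- ObsVec$new(species = "Pongo pongo", count = 5)
l <- ObsList$new(species = "Pongo pongo", count = 5)
```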

This toy class allows us to set the species name and how many examples of the species we saw. It also allows us to ask what the genus name is (the first word in the species name) and ask whether more than one example was seen:

my_obs <- Observation$new(species = "Pongo pongo", count = 5)
my_obs$species
## [1] "Pongo pongo"
my_obs$count
## [1] 5
my_obs$genus()
## [1] "Pongo"
my_obs$multiple()
## [1] TRUE

Two important points here:

Classes are instantiated with the new method being called on the class.
Class members and methods are accessed with the $ syntax: obj$member and obj$method().

Methods

As noted, Reference class methods are "true" methods: they are located in or attached to the object and they operate on the members of the object. Note how, in the example above, species and count as referred to in the method definitions are implicitly understood to be the members of that object, i.e. the object refers to itself.

A number of builtin methods exist on every object, including:

callSuper(): pass execution to the superclass's version of the current method
copy(): make a copy of this object
field(field_name, val): assign a value to the field of the given name (with val omitted, return the field's current value)
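
A minimal sketch of copy() and field() in use, assuming a throwaway class called Counter that exists only for this illustration. It also shows why copy() matters: plain assignment gives a second name for the same object, not a duplicate.

```r
library(methods)

# Hypothetical minimal class, used only to exercise the built-in methods.
Counter <- setRefClass("Counter",
   fields = c(n = "numeric"))

a <- Counter$new(n = 1)
b <- a            # plain assignment: b refers to the same object as a
d <- a$copy()     # copy(): d is an independent object

a$field("n", 10)  # two-argument form assigns to the named field

a$n
## [1] 10
b$n               # b and a are the same object
## [1] 10
d$n               # the copy is unaffected
## [1] 1
a$field("n")      # one-argument form returns the field's value
## [1] 10
```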

If initialize() and finalize() methods are provided on a class, these are used as automatic constructor and destructor methods.

(Note: there's a little oddity here in that Reference classes directly equate the arguments passed to a class constructor with the actual fields of an object, in a way mixing interface with implementation. But you can write an initialize method that takes arguments not listed as fields and uses them to set up the object.)
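
One way this might look is sketched below. The class Sighting and the constructor argument latin_name are invented for the example. The initialize method passes any named fields on to the default initializer via callSuper(...), then handles the extra argument itself.

```r
library(methods)

# Hypothetical class: 'latin_name' is a constructor argument, not a field.
Sighting <- setRefClass("Sighting",
   fields = c(species = "character", count = "numeric"),
   methods = list(
      initialize = function(..., latin_name = NULL) {
         # Let the default initializer fill any fields passed by name.
         callSuper(...)
         # Map the extra argument onto the species field.
         if (!is.null(latin_name)) {
            species <<- latin_name
         }
         # Supply a default when no count was given.
         if (length(count) == 0) {
            count <<- 1
         }
         invisible(.self)
      }
   )
)

s <- Sighting$new(latin_name = "Pan troglodytes")
s$species
## [1] "Pan troglodytes"
s$count
## [1] 1
```

Keeping every initialize argument optional means the class can still be instantiated with no arguments at all.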

Mutation

R likes functions to have no side effects, i.e. changing data actually means creating a new copy of the original data with modified values. But reference classes allow you to mutate the state of objects without duplicating them. How does that work?

Reference class methods can use the operator <<-. This modifies the value of a field in place. Were the usual <- operator to be used, this would just create a new local object, as it does normally in R. For example:

MyMut <- setRefClass ("MyMutCls",
   fields = c(foo = "character", bar = "character"),
   methods = c(
      initialize = function(a, b) {
         foo <<- a
         bar <<- b
      },
      mutate = function() {
         foo <- "new_foo"
         bar <<- "new_bar"
      }
   )
)
## Warning: local assignment to field name will not change the field: foo <-
## "new_foo" Did you mean to use "<<-"? ( in method "mutate" for class
## "MyMutCls")

Notice how R warns you against the deliberate mistake in mutate(). Now let's create an object and look at the starting values:

mut_obj <- MyMut$new(a = "old_val", b = "old_val")
mut_obj$foo
## [1] "old_val"
mut_obj$bar
## [1] "old_val"

Then let's call the faulty mutate() method and see what happens:

mut_obj$mutate()
mut_obj$foo
## [1] "old_val"
mut_obj$bar
## [1] "new_bar"

foo is only changed locally. The state of bar is changed permanently.
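
To contrast this with ordinary R values, here is a small sketch using a made-up class Tally and two made-up helper functions. A function that modifies a list only changes its own copy, whereas a function that calls a mutating method on a reference object changes the caller's object too.

```r
library(methods)

# Hypothetical class with one mutating method.
Tally <- setRefClass("Tally",
   fields = c(total = "numeric"),
   methods = list(
      add = function(x) {
         total <<- total + x
      }
   )
)

# Modifies a local copy of the list and returns it.
bump_list <- function(l) {
   l$total <- l$total + 1
   l
}

# Mutates the reference object it was handed.
bump_ref <- function(obj) {
   obj$add(1)
   invisible(NULL)
}

plain <- list(total = 0)
ref <- Tally$new(total = 0)

bumped <- bump_list(plain)
bump_ref(ref)

plain$total   # the original list is untouched
## [1] 0
bumped$total  # only the returned copy changed
## [1] 1
ref$total     # the reference object was changed in place
## [1] 1
```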

If you want to do something clever with the whole object, the variable .self can be used in methods to refer to the current (owning) object.
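
One common use of .self, sketched here with an invented class Basket, is to return the object from a mutating method so that calls can be chained:

```r
library(methods)

# Hypothetical class whose add() method returns the object itself.
Basket <- setRefClass("Basket",
   fields = c(items = "character"),
   methods = list(
      add = function(item) {
         items <<- c(items, item)
         # Returning .self invisibly allows chained calls.
         invisible(.self)
      }
   )
)

basket <- Basket$new()
basket$add("apple")$add("pear")
basket$items
## [1] "apple" "pear"
```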

Inheritance

A class can inherit from another by simple use of the contains argument to the class definition, as per S4. Note that the superclass is given by its name as a string, not by the generator object:

TimedObservation <- setRefClass ("TimedObservation",
   fields = c(time = "character"),
   contains = "Observation"
)

Methods in a subclass override those in a superclass and a subclass inherits all the fields of a superclass.
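
A short sketch of both behaviours, using hypothetical classes BaseObs and TimedObs rather than the ones above so that it runs on its own. The subclass picks up the species field, overrides describe(), and uses callSuper() to reuse the parent's version.

```r
library(methods)

# Hypothetical superclass with one field and one method.
Base <- setRefClass("BaseObs",
   fields = c(species = "character"),
   methods = list(
      describe = function() {
         paste("Saw", species)
      }
   )
)

# Hypothetical subclass: adds a field and overrides describe().
Timed <- setRefClass("TimedObs",
   contains = "BaseObs",
   fields = c(time = "character"),
   methods = list(
      describe = function() {
         # callSuper() runs the superclass's describe().
         paste(callSuper(), "at", time)
      }
   )
)

obs <- Timed$new(species = "Pongo pongo", time = "09:30")
obs$species          # field inherited from BaseObs
## [1] "Pongo pongo"
obs$describe()       # overridden method
## [1] "Saw Pongo pongo at 09:30"
is(obs, "BaseObs")
## [1] TRUE
```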

Summary

Reference classes are proper classes, for certain values of proper
They have attached methods
Ref classes can self-mutate with the aid of the <<- operator.

References

Some discussion of the "new" classes: http://stackoverflow.com/questions/5137199/what-is-the-significance-of-the-new-reference-classes
Hadley Wickham on R5: https://github.com/hadley/devtools/wiki/R5
Inside R and a doc page that is actually useful: http://www.inside-r.org/r-doc/methods/ReferenceClasses