An introduction to objects in R

The various types of OOP in R, whatever that is.


Doing object-oriented programming (OOP) in R can be complicated. This is because when you look in R for the usual idioms and features of OOP:

  • some are missing
  • some are present but implemented in unusual albeit valid ways
  • some are present but implemented in strange and hacky ways

A further source of confusion is the existence of several different implementations of OOP in R:

  • S3: the simplest and crudest method for constructing objects, which is relatively weak but nonetheless the basis for much of the older OOP code in R
  • S4: a more complicated and powerful method, used in many recent packages
  • Reference classes: a more recent method (introduced in R 2.12, released in 2010), as yet lightly documented, which produces objects that behave in a way similar to those in Java and C++
  • R.oo, OOP, etc.: a number of third-party packages that allow or ease the construction of classes and objects
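
To give a feel for how different the three built-in systems look, here is a minimal sketch that builds the same toy object in each. The names (person, Person, PersonRC, birthday) are made up for illustration and don't come from any package; the later articles cover each system properly.

```r
library(methods)  # provides setClass() and setRefClass()

# S3: an ordinary list with a class attribute attached
p_s3 <- structure(list(name = "Ada", age = 36), class = "person")

# S4: a formally declared class with typed slots
setClass("Person", representation(name = "character", age = "numeric"))
p_s4 <- new("Person", name = "Ada", age = 36)

# Reference class: fields plus methods that belong to the object
PersonRC <- setRefClass("PersonRC",
  fields = list(name = "character", age = "numeric"),
  methods = list(
    birthday = function() {
      age <<- age + 1      # modifies this object in place
      invisible(.self)
    }
  )
)
p_rc <- PersonRC$new(name = "Ada", age = 36)
p_rc$birthday()

inherits(p_s3, "person")  # TRUE
is(p_s4, "Person")        # TRUE
p_s4@age                  # 36 (slots are read with @)
p_rc$age                  # 37 (fields are read with $)
```

Note that only the reference class carries its own methods and changes in place; the S3 and S4 objects are plain values that functions act upon.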

Finally, as a functional language, R sits very uneasily with the usual notion of objects in C++ and Java, where objects mutate frequently and side effects are common.
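
A small sketch may make the mismatch concrete. Assuming a plain list stands in for an object (the names ada, birthday, counter and bump are hypothetical), a function that "modifies" its argument only changes a local copy, whereas an environment is one of the few R structures that changes in place:

```r
# A list has value semantics: the function works on its own copy
birthday <- function(person) {
  person$age <- person$age + 1
  person
}

ada <- list(name = "Ada", age = 36)
older <- birthday(ada)

ada$age     # 36: the caller's object is untouched
older$age   # 37: the change exists only in the returned value

# An environment has reference semantics: the change is visible to the caller
counter <- new.env()
counter$n <- 0

bump <- function(env) {
  env$n <- env$n + 1
  invisible(env)
}

bump(counter)
counter$n   # 1
```

Anyone expecting Java-style behaviour from the first half will be surprised; the second half is roughly the mechanism that mutable objects in R have to be built upon.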

As a result, OOP is comparatively underused in R and the way(s) it is implemented can be confusing and opaque to those coming from other programming languages. While OOP isn't a panacea (there are other valid and useful programming idioms, and several smart people have challenged OOP as a paradigm), it's still useful to understand how it can be achieved in R so as to allow the building of complex data structures and algorithms.

In this light, this and the accompanying articles are an outline of the various OOP systems in R, detailing what they can do and how they can be used, especially from the point of view of those coming from other, more mainstream programming languages. But first …

What is OOP?

Surprisingly, there isn't a single canonical definition of what constitutes OOP, but some features commonly appear:

  • Modelling: it is useful to model and implement a problem domain as objects containing related data
  • Modularity: bodies of code and data concerned with different aspects of the problem domain have minimal interactions through narrow interfaces, to allow both reuse and substitution
  • Dynamic dispatch: when a method is invoked on an object, the object itself determines which code is executed. It could be argued that generic functions are another means to the same end: a function call is dispatched to different functions depending on the arguments passed to it.
  • Subtype polymorphism: an object can be replaced with one of a derived type, which will fulfil the external obligations of the original object, while allowing the internal details to alter
  • Inheritance: derived types can be constructed that fulfil the interface of the parent class, while customizing the implementation
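  • Open recursion: objects can refer to themselves, so that one method can call another method of the same object, including one overridden in a derived type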
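
Several of these features can be seen together in a few lines of S3. This is only a sketch with made-up names (animal, dog, speak, describe), not a recommended design: speak() is a generic function that dispatches on the class of its argument, dog inherits from animal, and the inherited describe() method ends up calling the overriding speak() method.

```r
# Generic functions: the call is dispatched on the class of the first argument
speak <- function(x, ...) UseMethod("speak")
describe <- function(x, ...) UseMethod("describe")

# Methods for the parent class
speak.animal <- function(x, ...) paste(x$name, "makes a sound")
describe.animal <- function(x, ...) paste0("[", class(x)[1], "] ", speak(x))

# The derived class overrides speak() but inherits describe()
speak.dog <- function(x, ...) paste(x$name, "barks")

# Constructors: the class vector lists the derived class first, then the parent
new_animal <- function(name) structure(list(name = name), class = "animal")
new_dog <- function(name) structure(list(name = name), class = c("dog", "animal"))

generic <- new_animal("Generic")
rex <- new_dog("Rex")

speak(generic)   # "Generic makes a sound"
speak(rex)       # "Rex barks"
describe(rex)    # "[dog] Rex barks"
```

Because rex can be passed anywhere an animal is expected, this also shows subtype polymorphism, although nothing in S3 enforces that a derived class honours the parent's interface.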

I'll refer to these in the subsequent articles.