Rcpp: Using C++ to improve performance in R

Rcpp is an R library allowing for easy integration of C++ code in your R workflow. It allows you to create optimized functions for when R just isn’t fast enough. It can also be used as a bridge between R and C++ giving you the ability to access the existing C++ libraries.

Why use Rcpp?

There are many use cases for Rcpp, and of course many of them assume that you are interested in primarily working in R. Some possibilities include using Rcpp to:

  • call optimized C++ functions from within R
  • write your own C++ functions to eliminate slow code
  • interface R with external C++ libraries not otherwise available in R

We are going to look at a concrete example of optimizing a function using Rcpp, and calling that function within R. We will need the library Rcpp, and we will be comparing execution times using the library microbenchmark. Both of these libraries can be installed easily:

install.packages("Rcpp")
install.packages("microbenchmark")

And then loaded in R:

library(Rcpp)
library(microbenchmark)

Example

This example is adapted from the application of filling a transition probability matrix (such as is typical when working with hidden Markov models).

Consider that we need to fill a matrix with calculated values. This is an issue that comes up all the time, and has many solutions. A few potential solutions for this problem include using the function outer(), looping over matrix elements with for loops, calculating row-wise or column-wise using the apply() function, etc. The various solutions each have their own computational efficiencies, with some being significantly faster than others. For demonstration we will use nested for loops in the R implementation.

For loops in R?

Typically, it is recommended to avoid using for loops in R when speed or efficiency are required. For loops are often slower than other (vectorized) solutions. However they are also the easiest method to learn for new programmers coming to R from other languages such as C++, where for loops can be very efficient.

R Implementation

The function we will be optimizing takes exactly one parameter, the size of the square matrix we will be filling with values.

func_forloop <- function(n) {
  M = matrix(0, nrow=n, ncol=n)
  for(row in 1:n) {
    for(col in 1:n) {
      for(m in 0:min(row,col)) {
        M[row,col] = M[row,col] + 
          dpois(x = col-1-m, lambda = 1) *
          dbinom(x = m, size = row-1, prob = 0.75)
      }
    }
  }
  return(M)
}

We can calculate this matrix for any value of n, let’s try 3:

func_forloop(3)
##            [,1]      [,2]      [,3]
## [1,] 0.36787944 0.3678794 0.1839397
## [2,] 0.09196986 0.3678794 0.3218945
## [3,] 0.02299247 0.1609473 0.3563832

So, how fast is this function?

bm = microbenchmark(func_forloop(10),
                    func_forloop(30),
                    func_forloop(50),
                    unit = "s")
print(bm, unit = "ms")
## Unit: milliseconds
##              expr     min       lq      mean  median       uq      max neval
##  func_forloop(10)  0.7996  0.85105  0.918849  0.8637  0.92460   2.0313   100
##  func_forloop(30) 17.4439 18.72055 19.548081 19.3164 20.14695  24.6157   100
##  func_forloop(50) 79.8493 83.20825 86.787526 85.9114 88.69220 108.6536   100

As you can see, the function does not scale well as n increases (both dimensions of the matrix increase, and the upper bound of summation for m increase as n increases). Thus computing time can quickly grow out of hand for large n. Let’s see how we can implement this in C++ using Rcpp.

Rcpp Implementation

In a separate file (I will name the file my_func.cpp), we will have this implementation:

#include <Rcpp.h>
using namespace Rcpp;
using namespace std;

// [[Rcpp::export]]
NumericMatrix func_Rcpp(int x) {
  // NumericMatrix() is the Rcpp analogue to R's matrix()
  NumericMatrix M = NumericMatrix(x); 
  
  // In C++ indexing starts at zero, not one
  for(int row = 0; row < x; row++) {
    for(int col = 0; col < x; col++) {
      for(int m = 0; m <= min(row,col); m++) {
        // C++ allows: "a += b" as shorthand for "a = a + b"
        M(row,col) += R::dpois(col-m, 1, false) *
          R::dbinom(m, row, 0.75, false);
      }
    }  
  }
  
  return M;
}

We can then use the function in R after sourcing it with Rcpp:

Rcpp::sourceCpp("my_func.cpp")

We can see that the function returns the same matrix as the R implementation:

func_Rcpp(3)
##            [,1]      [,2]      [,3]
## [1,] 0.36787944 0.3678794 0.1839397
## [2,] 0.09196986 0.3678794 0.3218945
## [3,] 0.02299247 0.1609473 0.3563832
all.equal(func_forloop(3), func_Rcpp(3))
## [1] TRUE

And we can compare the computation times with the R implementation:

bm = microbenchmark(func_forloop(50),
                    func_Rcpp(50),
                    unit = "s")
print(bm, unit = "ms")
## Unit: milliseconds
##              expr     min      lq     mean   median       uq      max neval
##  func_forloop(50) 80.7933 85.1356 88.76877 87.81295 90.86310 126.9279   100
##     func_Rcpp(50) 11.1477 11.2432 11.34730 11.28715 11.39125  12.3793   100

The timings show that there is no contest, the for loop implementation in C++ is significantly faster. However, I would be remiss if I left you with the impression that this is the only way to speed up the code. This is meant as an example of implementing code using Rcpp, and calling that code from within R. Next I will show you a faster way of calculating this particular problem.

Convolution

The biggest bottleneck in this calculation is the summation over m. However, we are summing over the product of two functions, with one function input increasing and the other decreasing. In our case, this means we have a discrete convolution. It is well known that due to the existance of the Fast Fourier Transform, convolution can be calculated extremely quickly. Base R implements this fast convolution with the function convolve().

Using this fast convolution, the function looks like this:

func_convolution <- function(n) {
  M = matrix(0, nrow=n, ncol=n)
  for(row in 1:n) {
      M[row,] = convolve(dbinom(x = 0:n, size = row-1, prob = 0.75),
                         dpois(x = n:0, lambda = 1), 
                         type = "open")[1:n]
  }
  return(M)
}
func_convolution(3)
##            [,1]      [,2]      [,3]
## [1,] 0.36787944 0.3678794 0.1839397
## [2,] 0.09196986 0.3678794 0.3218945
## [3,] 0.02299247 0.1609473 0.3563832
all.equal(func_forloop(3), func_Rcpp(3), func_convolution(3))
## [1] TRUE
bm = microbenchmark(func_forloop(50),
                    func_Rcpp(50),
                    func_convolution(50),
                    unit = "s")
print(bm, unit = "ms")
## Unit: milliseconds
##                  expr     min       lq      mean   median       uq      max
##      func_forloop(50) 80.1737 82.31445 88.196659 83.88585 88.52630 143.7698
##         func_Rcpp(50) 11.1441 11.22830 11.513228 11.25875 11.41995  17.5220
##  func_convolution(50)  1.9157  2.07120  2.140127  2.11385  2.19880   3.2247
##  neval
##    100
##    100
##    100

So, the convolution method performs even better than the Rcpp implementation. What is the drawback? Well, the convolution method is extremely fast, but only very narrowly applicable.

Vectorized

We can also vectorize to remove the need for the for loop entirely:

func_vectorized <- function(n) {
  M = matrix(0, nrow=n, ncol=n)
  
  M = t(sapply(X = 1:n, FUN = function(X) {
    convolve(dbinom(x = 0:n, size = X-1, prob = 0.75),
                         dpois(x = n:0, lambda = 1), 
                         type = "open")[1:n]
  }))

  return(M)
}
func_vectorized(3)
##            [,1]      [,2]      [,3]
## [1,] 0.36787944 0.3678794 0.1839397
## [2,] 0.09196986 0.3678794 0.3218945
## [3,] 0.02299247 0.1609473 0.3563832
bm = microbenchmark(func_convolution(300),
                    func_vectorized(300),
                    unit = "s")
print(bm, unit = "ms")
## Unit: milliseconds
##                   expr      min       lq     mean   median       uq      max
##  func_convolution(300) 185.9818 189.2231 202.6476 191.6806 193.8182 340.8995
##   func_vectorized(300) 187.5353 190.1851 200.1388 192.5210 194.6368 280.7497
##  neval
##    100
##    100

However, you can see that in this case, the for loop is essentially the same speed as the vectorized solution.

Conclusions

Rcpp can be applied to many more types of problems to optimize your R code, but you should always be aware that small computational tricks such as convolution can vastly improve the performance of your code. Knowing how to take advantage of C++ to speed up portions of your code can save tremendous amounts of time for complicated or long-running programs. The goal should be to program in R, use code profiling to find bottlenecks, and optimize those bottlenecks (either using Rcpp, or refactoring the code, or using mathematical tricks).

Related