Today, we published the first version of a new R package at github.com/thomasWeise/functionComposeR for composing and canonicalizing functions. When we combine functions in R in the form of g(f(x)), the result is rarely human-readable. This stems from two problems. First, variables inside a function are evaluated in the function's environment, and even if they are constants, they remain variables in the function body. Thus, when printing a function f(x), I may see something like k*x inside, but not know the value of k, even though it is a constant that is perfectly known in the function's environment. Second, the same applies to nested functions: there may be something like f=function(x) x+g(x) where g is a well-defined function, but printing f will not reveal the nature of g. Both issues also make evaluating the functions slower: we could resolve the variables to constants and inline the bodies of the nested functions, but instead they are evaluated as variable lookups and function calls, respectively. With our new package, we try to solve all of these issues at once. We provide one tool for combining functions and one for canonicalizing functions, i.e., for resolving all resolvable components of a function.
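The first problem can be reproduced with plain R alone. The following small sketch uses only base R (it does not need our package) and shows that the body of a function only contains the symbol k, while the value is still reachable through the function's environment. The variable names body.text and k.value are just illustrative.

```r
k <- 23
f <- function(x) k * x

# Deparsing the body of f only shows the symbol k, not its value.
body.text <- deparse(body(f))
print(body.text)

# The value is nonetheless available via the environment of f.
k.value <- get("k", envir = environment(f))
print(k.value)
```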
Motivating Example
As an example, assume you have set k<-23 and now want to compose the functions f<-function(x) { (k*x) + 7 } and g<-function(x) { (k*k-x*x) / (x - sin(k)) } into a function h. You can do that by writing h<-function(x) g(f(x)). Of course, if you later inspect h by just typing h in the R console, you will see exactly this: function(x) g(f(x)).
This leads to two issues. First, if you do not know f and g, the output is opaque and you cannot interpret it. Second, evaluating h is unnecessarily slow: it performs two inner function calls and needs to look up the variable k at several locations, although the value of k is fixed to 23. In fact, k*k and sin(k) are also constants whose values could be known in advance.
The goal of our new function function.compose is to resolve these two issues. If you write h<-function.compose(f, g) instead of h<-function(x) g(f(x)), a new function composed of the bodies of both f and g is created. Furthermore, as many variables and expressions in the body as possible are replaced by their constant results. Printing the result of h<-function.compose(f, g) yields the readable and fast function:
    function (x)
    {
        x <- (23 * x) + 7
        (529 - x * x)/(x - -0.846220404175171)
    }
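To make the idea of merging two bodies more concrete, here is a minimal base-R sketch of how such a combined function can be built by hand. It is not the implementation behind function.compose: the helper name compose.sketch is hypothetical, and it assumes that both functions take a single argument named x and were defined in the same environment. It only stitches the bodies together and does not replace k by its value.

```r
k <- 23
f <- function(x) { (k*x) + 7 }
g <- function(x) { (k*k-x*x) / (x - sin(k)) }

# Hypothetical helper (not part of functionComposeR): build a function
# whose body first assigns the body of f to x and then runs the body of g.
# Assumption: f and g both have exactly one argument called x.
compose.sketch <- function(f, g) {
  h <- g
  body(h) <- call("{", call("<-", as.name("x"), body(f)), body(g))
  h
}

h <- compose.sketch(f, g)
print(body(h))
```

The result no longer calls f or g, but k, k*k and sin(k) are still evaluated on every call, which is exactly the part that the constant resolution takes care of.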
Detailed Example
Let us now look at some more complex composed functions and also check the performance of the composed functions.
Simple Composition
First, we again compose two functions which also access some variables from the environment.
    library(functionComposeR)

    i <- 45
    j <- 33
    k <- 23
    f <- function(x) { (x*(x-i)) - x/sinh(k*cos(j-atan(k+j))) }
    g <- function(x) { abs(x)^(abs(1/(3-i))) + (j - k*exp(-i)) / ((i*j) * x) }
    h.1.plain <- function(x) g(f(x))
    h.1.composed <- function.compose(f, g)
Printing h.1.plain and h.1.composed again reveals the difference between ordinary function composition in R and function composition using our functionComposeR package:
    h.1.plain
    # function(x) g(f(x))
    h.1.composed
    # function (x)
    # {
    #     x <- (x * (x - 45)) - x/4818399372.40284
    #     abs(x)^0.0238095238095238 + 33/(1485 * x)
    # }
Nested Function Composition
We can also compose more than two functions, e.g.,
    h.2.plain <- function(x) g(f(g(f(x))))
    h.2.composed <- function.compose(function.compose(function.compose(f, g), f), g)
which yields functions of the form
    h.2.plain
    # function(x) g(f(g(f(x))))
    h.2.composed
    # function (x)
    # {
    #     x <- {
    #         x <- {
    #             x <- (x * (x - 45)) - x/4818399372.40284
    #             abs(x)^0.0238095238095238 + 33/(1485 * x)
    #         }
    #         (x * (x - 45)) - x/4818399372.40284
    #     }
    #     abs(x)^0.0238095238095238 + 33/(1485 * x)
    # }
Benchmarking
Finally, let us compare the performance of the composed functions with that of their plain counterparts. For this, we use the package microbenchmark.
    x <- runif(1000)
    library(microbenchmark)
    microbenchmark(h.1.plain(x), h.1.composed(x), h.2.plain(x), h.2.composed(x))
    # Unit: microseconds
    #             expr     min       lq      mean   median       uq     max neval
    #     h.1.plain(x)  78.841  79.4880  83.05224  79.9775  85.8485 119.824   100
    #  h.1.composed(x)  75.890  76.4675  93.23504  76.8385  78.9615 896.681   100
    #     h.2.plain(x) 153.793 154.8100 166.31210 155.5855 164.3685 743.360   100
    #  h.2.composed(x) 149.035 149.4870 155.25070 149.8895 154.0960 213.395   100
From the result, it is clearly visible that the upper quartiles of the runtimes of the composed functions lie below the lower quartiles of the runtimes of their plain counterparts; the medians improve by roughly 4% in both cases. The higher mean of h.1.composed is caused by slow outlier runs (see the max column). Obviously, this depends very strongly on the example, but it is a clear indicator that our package can compose functions in a way that is both human-readable and quick to evaluate.
Canonicalizing Functions
Canonicalizing an existing function means resolving all of its resolvable components: variables that can be replaced by constants are replaced, and sub-expressions that can be evaluated to constants are evaluated.
    f <- function(x) { 5+3+x }
    function.canonicalize(f)
    # function (x)
    # 8 + x

    z <- 24
    g <- function(x) { tan(sin(z) + (z*27) / x) }
    function.canonicalize(g)
    # function (x)
    # tan(-0.905578362006624 + 648/x)
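The two steps described above, replacing known variables and then evaluating constant sub-expressions, can be illustrated with a few lines of base R. This is only a rough sketch and not the code behind function.canonicalize: the helper names inline.constants and fold.constants are hypothetical, the variables to replace must be listed by hand, and the sketch assumes that every call without variables is pure and safe to evaluate ahead of time (which would be wrong for, e.g., runif()).

```r
z <- 24
g <- function(x) { tan(sin(z) + (z*27) / x) }

# Hypothetical helper: replace the given variables in the body of fun
# by the values they currently have in the environment of fun.
inline.constants <- function(fun, names) {
  values <- mget(names, envir = environment(fun), inherits = TRUE)
  body(fun) <- do.call(substitute, list(body(fun), values))
  fun
}

# Hypothetical helper: recursively evaluate every call that contains
# no variables. Assumption: such calls are pure and cheap to evaluate.
fold.constants <- function(e) {
  if (is.call(e)) {
    e <- as.call(lapply(e, fold.constants))
    if (length(all.vars(e)) == 0L) {
      e <- eval(e, baseenv())
    }
  }
  e
}

g.sketch <- inline.constants(g, "z")
body(g.sketch) <- fold.constants(body(g.sketch))
print(body(g.sketch))
```

After both steps, the only variable left in the body of g.sketch is x, and sin(z) and z*27 have been replaced by numbers.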
Installation Instructions
You can install the package directly from GitHub by using the package devtools as follows:
    library(devtools)
    install_github("thomasWeise/functionComposeR")
If devtools is not yet installed on your machine, you first need to run
install.packages("devtools")
I hope our package can be useful for other R programmers. It is published under the LGPL v3 license.