Jonathan Dobres

compound inequalities in R

Suppose you’re writing some code and you want to check to see if a variable, say x, is between two other numbers. The standard way to do this comparison in most languages, including R, would be something like:

x >= 10 & x <= 20

Not too shabby, all things considered. Of course, the repetition of the variable name is unfortunate, especially if you’re checking a.very.long.variable.name. You also have to flip either the second comparison operator or the order of its arguments, which can be a little confusing to read. Utility functions like dplyr::between give you the ability to write:

between(x, 10, 20)

That’s more convenient, but between assumes that the comparison should be inclusive on both sides; it implements >= and <=, but not > or <. We could write our own version of between to allow flags for whether each side of the comparison should be inclusive, but at that point the syntax gets more cumbersome than the thing it’s meant to replace.
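To make that trade-off concrete, here is a rough sketch of what a flag-based version might look like. The name between_flags and its incl_lower and incl_upper arguments are hypothetical, made up for this illustration; they are not dplyr’s API.

```r
# Hypothetical helper (not part of dplyr): between() with inclusivity flags.
between_flags <- function(x, lower, upper, incl_lower = TRUE, incl_upper = TRUE) {
  # Pick the strict or non-strict comparison for each side
  above <- if (incl_lower) x >= lower else x > lower
  below <- if (incl_upper) x <= upper else x < upper
  above & below
}

x <- c(10, 15, 20)
inclusive   <- between_flags(x, 10, 20)                      # 10 <= x <= 20
excl_lower  <- between_flags(x, 10, 20, incl_lower = FALSE)  # 10 <  x <= 20
```

It works, but the call that excludes the lower bound is already longer and harder to scan than x > 10 & x <= 20.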

What we really want is the ability to write the comparison as a compound inequality, which you might remember from grade school. It looks like this:

10 < x <= 20

This succinctly expresses that x should be greater than 10 and less than or equal to 20. Expressing exclusive or inclusive comparisons on either side is as easy as changing the operators, and you type the compared variables and values only once.
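Base R does not accept this chained form directly. As far as I can tell, the parser rejects it because the built-in comparison operators cannot be chained. A quick way to check, without evaluating anything, is to ask parse() whether each spelling is valid; the parses helper below is just a throwaway name for this sketch.

```r
# Returns TRUE if the code parses, FALSE if the parser raises an error.
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

chained   <- parses("10 < x <= 20")      # the compound inequality
spelled   <- parses("10 < x & x <= 20")  # the usual two-comparison form
```

Here chained comes back FALSE and spelled comes back TRUE, which is why a custom operator is needed at all.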

It turns out that it’s possible to implement custom compound inequality operators in R, and it’s not hard. The main hurdle is that the custom operator needs to do one of two things: either store the result of a new comparison and return its right-hand value (to be passed along to the second comparison), or, when the incoming value already has a stored result attached, return the combined result of both comparisons. Since R allows us to set arbitrary attributes on data primitives, we can easily meet these requirements.

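'%<<%' <- function(lhs, rhs) {
  if (is.null(attr(lhs, 'compound-inequality-partial'))) {
    # First comparison in the chain: pass the right-hand value along,
    # with the result of this comparison attached as an attribute.
    out <- rhs
    attr(out, 'compound-inequality-partial') <- lhs < rhs
  } else {
    # Second comparison: combine it with the stored first result.
    out <- lhs < rhs & attr(lhs, 'compound-inequality-partial')
  }

  return(out)
}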

That’s all we need to implement compound “less-than” comparisons (we’ll also need similar definitions for <<=, >>, and >>=, but more on that in a minute). I’m using %<<% instead of just %<% because the complementary %>% operator would conflict with the widely used pipe operator. The doubled symbol also neatly reinforces that this operator is meant to evaluate two comparisons.

Whenever the custom %<<% is encountered, it checks whether the argument on the left-hand side has an attribute called compound-inequality-partial. If it doesn’t, the result of the comparison is attached to the right-hand value as an attribute with that name, and that modified value is returned so it can serve as the left-hand side of the next comparison. If the attribute exists, the function returns TRUE only where both the comparison stored in compound-inequality-partial and the second comparison are true. That’s all there is to it.
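It can help to walk through the two steps by hand. The sketch below repeats the operator definition so it runs on its own, then splits 2 %<<% 4 %<<% 6 into its two calls; step1 and step2 are just illustrative variable names. It relies on the fact that custom %...% operators in R are evaluated left to right.

```r
`%<<%` <- function(lhs, rhs) {
  if (is.null(attr(lhs, 'compound-inequality-partial'))) {
    out <- rhs
    attr(out, 'compound-inequality-partial') <- lhs < rhs
  } else {
    out <- lhs < rhs & attr(lhs, 'compound-inequality-partial')
  }
  out
}

# Step 1: 2 carries no attribute, so we get 4 back with the result of 2 < 4 attached
step1 <- 2 %<<% 4
partial <- attr(step1, 'compound-inequality-partial')

# Step 2: the attribute is present, so 4 < 6 is combined with the stored result
step2 <- step1 %<<% 6

# Written in one go, R groups the calls as (2 %<<% 4) %<<% 6
chained <- 2 %<<% 4 %<<% 6

# A chain whose first comparison fails stays FALSE even though 4 < 6 holds
failed <- 5 %<<% 4 %<<% 6
```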

Implementing this functionality for the remaining three comparisons is pretty repetitive, so let’s instead write a generalized comparison function. To do this, we’ll take advantage of the fact that in R, 1 + 2 can also be invoked as '+'(1, 2) or do.call('+', list(1, 2)). Here’s our generalized helper:

compound.inequality <- function(lhs, rhs, comparison) {
  if (is.null(attr(lhs, 'compound-inequality-partial'))) {
    # First comparison: return rhs with the partial result attached.
    out <- rhs
    attr(out, 'compound-inequality-partial') <- do.call(comparison, list(lhs, rhs))
  } else {
    # Second comparison: combine with the stored partial result.
    out <- do.call(comparison, list(lhs, rhs)) & attr(lhs, 'compound-inequality-partial')
  }

  return(out)
}
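The helper leans entirely on the fact that an operator can be looked up by name. A small sketch of that equivalence on its own, using throwaway variable names:

```r
# Three equivalent ways to add 1 and 2
a  <- 1 + 2
b  <- '+'(1, 2)
c3 <- do.call('+', list(1, 2))

# The same trick works when the operator is chosen at run time
op <- '<='
d  <- do.call(op, list(3, 5))   # same as 3 <= 5
```

Passing the operator as a string is what lets one function body serve all four comparisons.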

And then the definitions for all four operators become:

'%<<%' <- function(lhs, rhs) {
  return(compound.inequality(lhs, rhs, '<'))
}

'%<<=%' <- function(lhs, rhs) {
  return(compound.inequality(lhs, rhs, '<='))
}

'%>>%' <- function(lhs, rhs) {
  return(compound.inequality(lhs, rhs, '>'))
}

'%>>=%' <- function(lhs, rhs) {
  return(compound.inequality(lhs, rhs, '>='))
}

The comparison 2 %<<% 4 %<<% 6 evaluates to TRUE. If you only wrote the first comparison, you’d get back 4, with TRUE attached as an attribute, so these operators are only meant to be used in pairs. Now it’s easy, and very readable, to specify compound “between” comparisons that are inclusive or exclusive on either side.
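Because the underlying comparisons are vectorized, the operators should work on whole vectors as well as single numbers. Here is a self-contained sketch that repeats the definitions from above and tries a few mixed inclusive and exclusive ranges; the vector x and the result names are only for illustration.

```r
compound.inequality <- function(lhs, rhs, comparison) {
  if (is.null(attr(lhs, 'compound-inequality-partial'))) {
    out <- rhs
    attr(out, 'compound-inequality-partial') <- do.call(comparison, list(lhs, rhs))
  } else {
    out <- do.call(comparison, list(lhs, rhs)) & attr(lhs, 'compound-inequality-partial')
  }
  out
}

`%<<%`  <- function(lhs, rhs) compound.inequality(lhs, rhs, '<')
`%<<=%` <- function(lhs, rhs) compound.inequality(lhs, rhs, '<=')
`%>>%`  <- function(lhs, rhs) compound.inequality(lhs, rhs, '>')
`%>>=%` <- function(lhs, rhs) compound.inequality(lhs, rhs, '>=')

x <- c(5, 10, 15, 20, 25)

excl_incl <- 10 %<<% x %<<=% 20   # 10 <  x <= 20
incl_excl <- 10 %<<=% x %<<% 20   # 10 <= x <  20
downward  <- 20 %>>=% x %>>% 10   # 20 >= x >  10

# The logical result can be used for subsetting like any other
kept <- x[10 %<<% x %<<=% 20]
```

One caveat worth keeping in mind: custom %...% operators bind more tightly than + and *, so arithmetic on either side of a comparison would need parentheses.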

I’m not sure that I’d actually use this in “real” scripts, since these types of comparisons are relatively rare for me. But it was a fun problem to think about, and useful enough to share. This surprisingly simple solution demonstrates the flexibility of the R language. Code for this project is also available on GitHub.