findInterval {base} R Documentation

## Find Interval Numbers or Indices

### Description

Given a vector of non-decreasing breakpoints in `vec`, find the interval containing each element of `x`; i.e., writing `v` for `vec`, if `i <- findInterval(x, v)`, then for each index `j` in `x`, `v[i[j]] ≤ x[j] < v[i[j] + 1]`, where `v[0] := -Inf`, `v[N+1] := +Inf`, and `N <- length(v)`. At the two boundaries, the returned index may differ by 1, depending on the optional arguments `rightmost.closed` and `all.inside`.

### Usage

```r
findInterval(x, vec, rightmost.closed = FALSE, all.inside = FALSE,
             left.open = FALSE)
```

### Arguments

| Argument | Description |
|---|---|
| `x` | numeric. |
| `vec` | numeric, sorted (weakly) increasingly, of length `N`, say. |
| `rightmost.closed` | logical; if true, the rightmost interval, `vec[N-1] .. vec[N]`, is treated as closed, see below. |
| `all.inside` | logical; if true, the returned indices are coerced into `1,...,N-1`, i.e., `0` is mapped to `1` and `N` to `N-1`. |
| `left.open` | logical; if true, all the intervals are open at left and closed at right; in the formulas below, ≤ should be swapped with < (and > with ≥), and `rightmost.closed` means ‘leftmost is closed’. This may be useful, e.g., in survival analysis computations. |
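As a small illustration of how the three logical arguments change the result, the following sketch applies each one in turn to the same input. The values of `v` and `x` are made up for the example, and the expected indices in the comments follow from the definitions above.

```r
# Illustrative values only; `v` and `x` are made-up data.
v <- c(0, 5, 10)            # N = 3 breakpoints
x <- c(-1, 0, 5, 10, 12)

r_default <- findInterval(x, v)                           # 0 1 2 3 3
r_closed  <- findInterval(x, v, rightmost.closed = TRUE)  # 0 1 2 2 3
r_inside  <- findInterval(x, v, all.inside = TRUE)        # 1 1 2 2 2
r_left    <- findInterval(x, v, left.open = TRUE)         # 0 0 1 2 3

# Side-by-side view of the four variants
cbind(x, r_default, r_closed, r_inside, r_left)
```

Only the boundary value `x = 10` changes under `rightmost.closed = TRUE`, whereas `left.open = TRUE` shifts every value that coincides with a breakpoint into the interval to its left.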

### Details

The function `findInterval` finds the index of one vector `x` in another, `vec`, where the latter must be non-decreasing. Although the result is trivially equivalent to `apply(outer(x, vec, ">="), 1, sum)`, the internal algorithm uses interval search, ensuring O(n * log(N)) complexity, where `n <- length(x)` and `N <- length(vec)`. For (almost) sorted `x`, it will be even faster, basically O(n).
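The equivalence with the brute-force `outer()` formulation can be checked directly. This is a minimal sketch on random data; the seed, sizes, and variable names `brute` and `fast` are arbitrary choices for illustration.

```r
set.seed(1)                                # arbitrary seed, for reproducibility
vec <- sort(runif(10))                     # non-decreasing breakpoints
x   <- runif(25, min = -0.2, max = 1.2)    # some values fall outside range(vec)

# Brute force: for each x[j], count the breakpoints that are <= x[j]
brute <- apply(outer(x, vec, ">="), 1, sum)

# Interval search
fast <- findInterval(x, vec)

stopifnot(all(fast == brute))
```

The brute-force version builds an `n` by `N` logical matrix, so it is only practical for small inputs.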

This is the same computation as for the empirical distribution function, and indeed, `findInterval(t, sort(X))` is identical to n * Fn(t; X[1],..,X[n]) where Fn is the empirical distribution function of X[1],..,X[n].
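The relation to the empirical distribution function can be verified with `stats::ecdf`. The sketch below uses simulated data; the sample, the evaluation points `tt`, and the other variable names are assumptions made for the example.

```r
set.seed(42)                       # arbitrary seed, for reproducibility
X  <- rnorm(20)                    # simulated sample
n  <- length(X)
tt <- c(-3, X[3], 0, 0.5, 3)       # evaluation points, including a data value

Fn     <- stats::ecdf(X)           # empirical distribution function of X
counts <- findInterval(tt, sort(X))  # number of X[i] <= tt[j]

# counts / n should agree with the ECDF evaluated at tt
stopifnot(isTRUE(all.equal(counts / n, Fn(tt))))
```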

When `rightmost.closed = TRUE`, the result for `x[j] = vec[N]` (`= max(vec)`) is `N - 1`, as for all other values in the last interval.

`left.open = TRUE` is occasionally useful, e.g., for survival data. For (anti-)symmetry reasons, it is equivalent to using “mirrored” data, i.e., the following is always true:

```r
identical(
        findInterval( x,  v,      left.open= TRUE, ...) ,
    N - findInterval(-x, -v[N:1], left.open=FALSE, ...) )
```

where `N <- length(vec)` as above.

### Value

vector of length `length(x)` with values in `0:N` (and `NA`) where `N <- length(vec)`, or values coerced to `1:(N-1)` if and only if `all.inside = TRUE` (equivalently coercing all x values inside the intervals). Note that `NA`s are propagated from `x`, and `Inf` values are allowed in both `x` and `vec`.
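A short sketch of the `NA` and `Inf` behaviour described above, using made-up inputs. The expected result in the comment follows from the interval definition with `N = 4`.

```r
# Illustrative values only
vec <- c(-Inf, 0, 1, Inf)          # infinite breakpoints are allowed
x   <- c(-Inf, -2, NA, 0.5, Inf)   # NA in x is propagated to the result

res <- findInterval(x, vec)
res                                # 1 1 NA 2 4
```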

### Author(s)

Martin Maechler

### See Also

`approx(*, method = "constant")`, which is a generalization of `findInterval()`; `ecdf` for computing the empirical distribution function, which is (up to a factor of n) also basically the same as `findInterval(.)`.
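To make the connection with `approx` concrete, the following sketch reproduces `findInterval()` by constant interpolation of the index sequence. It assumes distinct breakpoints, and the choices `yleft = 0` and `yright = N` are assumptions made here so that out-of-range values match the `findInterval()` convention.

```r
# Illustrative values only; breakpoints are assumed to be distinct
vec <- c(1, 2.5, 4, 7)
x   <- c(0, 1, 2, 2.5, 6.9, 7, 10)
N   <- length(vec)

# Left-continuous step function through the points (vec[i], i)
via_approx <- stats::approx(vec, seq_len(N), xout = x, method = "constant",
                            yleft = 0, yright = N)$y
via_find   <- findInterval(x, vec)

stopifnot(all(via_approx == via_find))
```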

### Examples

```r
x <- 2:18
v <- c(5, 10, 15) # create two bins [5,10) and [10,15)
cbind(x, findInterval(x, v))

N <- 100
X <- sort(round(stats::rt(N, df = 2), 2))
tt <- c(-100, seq(-2, 2, length.out = 201), +100)
it <- findInterval(tt, X)
tt[it < 1 | it >= N] # only first and last are outside range(X)

##  'left.open = TRUE' means  "mirroring" :
N <- length(v)
stopifnot(identical(
                  findInterval( x,  v,  left.open=TRUE) ,
              N - findInterval(-x, -v[N:1])))
```

[Package base version 3.6.0 Index]