Indexing and subsetting

R has many powerful subset operators. Mastering them will allow you to easily perform complex operations on any kind of dataset.

There are many different ways we can subset any kind of object, and three different sub-setting operators for different data structures.

Sub-setting vectors

Let’s start by examining sub-setting in the simplest data structure, the vector.

Sub-setting a vector always returns another vector.

First let’s create a vector:

x <- 4:7
x
[1] 4 5 6 7

Sub-setting using [ and element indices

Extracting single elements

To extract elements of a vector we can use the square bracket operator ([) and the index of the target element. Indices start at one, because R is a 1-indexed language:

x[1]
[1] 4
x[4]
[1] 7

It may look different, but the square brackets operator is a function and means “get me the nth element”.

If we ask for an index beyond the length of the vector, R will return a missing value (NA):

x[6]
[1] NA

If we ask for the 0th element, we get an empty vector:

x[0]
integer(0)
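
Because out-of-range and zero indices return NA or an empty vector rather than an error, it can help to check an index before using it. The following is a minimal sketch in base R; the index i is a hypothetical value chosen for illustration.

```r
x <- 4:7
i <- 6                              # hypothetical index, larger than length(x)

# seq_along(x) lists the valid positions 1, 2, ..., length(x)
in_range <- i %in% seq_along(x)     # FALSE
value <- if (in_range) x[i] else NA

is.na(x[6])      # TRUE: asking beyond the end gives NA, not an error
length(x[0])     # 0: the zero index gives an empty vector
```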

Extracting multiple elements

We can also ask for multiple elements at once:

x[c(1, 3)]
[1] 4 6

Or slices of the vector:

x[2:4]
[1] 5 6 7

We can ask for the same element multiple times:

x[c(1,1,3)]
[1] 4 4 6

Excluding and removing elements

If we use a negative number as the index of a vector, R will return every element except for the one specified:

x[-2]
[1] 4 6 7

We can also skip multiple elements. Negative indices beyond the length of the vector (such as -5 here) are simply ignored:

x[c(-1, -5)]  # or x[-c(1,5)]
[1] 5 6 7

In general, be aware that the result of sub-setting using indices could change if the vector is reordered.

Sub-setting using element names

If the vector has a names attribute, we can subset the vector more precisely using the element’s name.

names(x) <- c("a", "b", "c", "d")

x[c("a", "c")]
a c 
4 6 

Sub-setting using names is the most robust way to extract elements. The position of an element can change when sub-setting operations are chained together, but its name always stays the same!
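
To make the difference concrete, here is a small sketch that removes an element and then looks up the same item by position and by name. The intermediate object y is introduced only for illustration.

```r
x <- 4:7
names(x) <- c("a", "b", "c", "d")

y <- x[-1]   # drop the first element; "b", "c" and "d" remain

x[2]         # b 5
y[2]         # c 6 : position 2 now points at a different element
x["b"]       # b 5
y["b"]       # b 5 : the name still finds the same element
```

This relies on the names being unique. If a name is repeated, indexing by that name returns only the first match.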

Sub-setting using logical vectors

We can also use any logical vector to subset:

x[c(FALSE, FALSE, TRUE, TRUE)]
c d 
6 7 

Since comparison operators (e.g. >, <, ==) evaluate to logical vectors, we can also use them to succinctly subset vectors: the following statement gives the same result as the previous one.

x[x > 5]
c d 
6 7 

Breaking it down, this statement first evaluates x > 5, generating a logical vector c(FALSE, FALSE, TRUE, TRUE), and then selects the elements of x corresponding to the TRUE values.
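
The two steps can also be written out explicitly. The sketch below stores the logical vector in a variable (keep, a name chosen here for illustration) and shows that conditions can be combined with & (and) and | (or).

```r
x <- 4:7
names(x) <- c("a", "b", "c", "d")

keep <- x > 5        # step 1: FALSE FALSE TRUE TRUE
x[keep]              # step 2: elements c and d (6 and 7)

which(keep)          # positions of the TRUE values: 3 and 4

x[x > 4 & x < 7]     # both conditions must hold: b and c (5 and 6)
x[x < 5 | x > 6]     # either condition may hold: a and d (4 and 7)
```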

We can use == to mimic the previous method of indexing by name (remember you have to use == rather than = for comparisons):

x[names(x) == "a"]
a 
4 

Avoid using == to compare numbers unless they are integers! Floating-point rounding can make two values that should be equal compare as different. Use dplyr::near() instead.
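
A short sketch of why == is risky for non-integer numbers. It uses only base R so that it runs without dplyr installed: all.equal() and an explicit tolerance stand in for dplyr::near(), and the tolerance 1e-8 is an assumption made for this example.

```r
a <- sqrt(2)^2

a == 2                     # FALSE, because of floating-point rounding
isTRUE(all.equal(a, 2))    # TRUE: comparison within a small tolerance
abs(a - 2) < 1e-8          # TRUE: the same idea written by hand

v <- c(0.1 + 0.2, 0.3, 0.5)
exact <- v[v == 0.3]               # finds only the second element
close <- v[abs(v - 0.3) < 1e-8]    # finds the first and second elements
```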

We also might want to subset using a vector of potential values, that might not necessarily have matches in x.

In this case we can use %in% (which stands for “is a member of”):

x[names(x) %in% c("a", "c", "e")]
a c 
4 6 

Excluding named elements

Excluding or removing named elements is a little more involved.

If we try to skip one named element by negating the string, R complains (slightly obscurely) that it doesn’t know how to take the negative of a string:

x[-"a"]
Error in -"a": invalid argument to unary operator

However, we can use the != (not-equal) operator to construct a logical vector that will do what we want:

x[names(x) != "a"]
b c d 
5 6 7 

Excluding multiple named indices requires a different tactic, though.

To perform such a subset robustly, we need to combine %in% and the negation operator !.

x[!names(x) %in% c("a","c")]
b d 
5 7 

Here names(x) %in% c("a","c") is evaluated first, returning TRUE for each element of x whose name appears in c("a","c"). The ! then negates that logical vector, so we keep only the elements whose names are not contained in c("a","c").
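
The intermediate logical vectors can be inspected one at a time. In this sketch, drop_names and is_dropped are names introduced for illustration. The last line shows an alternative using setdiff(), which gives the same result here because the names are unique.

```r
x <- 4:7
names(x) <- c("a", "b", "c", "d")
drop_names <- c("a", "c")

is_dropped <- names(x) %in% drop_names   # TRUE FALSE TRUE FALSE
!is_dropped                              # FALSE TRUE FALSE TRUE
x[!is_dropped]                           # b and d (5 and 7)

# Alternative: compute the names to keep, then subset by name
x[setdiff(names(x), drop_names)]         # b and d (5 and 7)
```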

Matrix sub-setting

As matrices are just 2D vectors, all the sub-setting operations that use [ can also be applied to matrices.

Sub-setting using element indices

Let’s create a matrix:

m <- matrix(1:12, ncol = 4, nrow = 3)
m
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12

Indexing matrices with [ takes two arguments: the first expression is applied to the rows, the second to the columns.

Say we want the 2nd and 3rd rows of the last and first columns (in that order) of our matrix. We can take the sub-setting techniques we learned for vectors and apply them to each dimension of our matrix.

m[2:3, c(4,1)]
     [,1] [,2]
[1,]   11    2
[2,]   12    3
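
Negative and logical indices work per dimension as well. The sketch below reuses the matrix m from above; the particular rows, columns and threshold are arbitrary choices for illustration.

```r
m <- matrix(1:12, ncol = 4, nrow = 3)

m[-1, ]                        # every row except the first
m[c(TRUE, FALSE, TRUE), 1:2]   # rows 1 and 3 of columns 1 and 2

# Columns whose first-row value is greater than 4 (columns 3 and 4)
wide <- m[, m[1, ] > 4]
wide
```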

Sub-setting whole rows or columns

We can leave the first or second arguments blank to retrieve all the rows or columns respectively:

m[, 2:3]
     [,1] [,2]
[1,]    4    7
[2,]    5    8
[3,]    6    9
m[2:3,]
     [,1] [,2] [,3] [,4]
[1,]    2    5    8   11
[2,]    3    6    9   12

If we only access one row or column of a matrix, R will automatically convert the result to a vector:

m[3,]
[1]  3  6  9 12

If we want to keep the output as a matrix, we need to specify a third argument, drop = FALSE:

m[3, , drop = FALSE]
     [,1] [,2] [,3] [,4]
[1,]    3    6    9   12
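
One way to see the difference is to compare the dimensions of the two results. This is a small sketch; row_vec and row_mat are names chosen for illustration.

```r
m <- matrix(1:12, ncol = 4, nrow = 3)

row_vec <- m[3, ]                 # dimensions dropped: a plain vector
row_mat <- m[3, , drop = FALSE]   # dimensions kept: a 1 x 4 matrix

dim(row_vec)          # NULL
dim(row_mat)          # 1 4
is.matrix(row_mat)    # TRUE
```

Keeping the dimensions matters when later code expects a matrix, for example when it calls nrow() or indexes with two arguments.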

Tip: Higher dimensional arrays

When dealing with multi-dimensional arrays, each argument to [ corresponds to a dimension. For example, in a 3D array, the three arguments correspond to the rows, columns, and depth dimension.
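
As a brief illustration, the following sketch builds a small 3D array (the dimensions 2 x 3 x 4 are an arbitrary choice) and indexes it with three arguments.

```r
a <- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 layers

a[1, 2, 3]                    # row 1, column 2, layer 3: the value 15
a[, , 1]                      # the whole first layer, a 2 x 3 matrix
dim(a[, 2, , drop = FALSE])   # 2 1 4: drop = FALSE keeps all three dimensions
```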

Sub-setting lists

There are three functions used to subset lists and extract individual elements: [, [[, and $.

Sub-setting list elements

Using [ will always return a list. If you want to subset a list, but not extract an element, then you will likely use [.

xlist <- list(a = "ACCE DTP Course", b = 1:10, data = head(iris))

Sub-setting by element indices

As with vectors, we can use element indices and [ to subset lists.

xlist[1]
$a
[1] "ACCE DTP Course"

This returns a list with one element.

We can use multiple indices to subset multiple list elements:

xlist[1:2]
$a
[1] "ACCE DTP Course"

$b
 [1]  1  2  3  4  5  6  7  8  9 10

Sub-setting by name

We can also use names:

xlist[c("a", "b")]
$a
[1] "ACCE DTP Course"

$b
 [1]  1  2  3  4  5  6  7  8  9 10

Using a single [ accesses the list as if it were a vector and returns a list.

Comparison operations on the contents of list elements, however, generally won’t work as they do for vectors, because the objects stored inside each element are not accessible at the level of [ indexing.
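
One common workaround, sketched here under the assumption that we want to select elements by a property of their contents, is to apply a test function to every element first and then subset with the resulting logical vector. The variable is_num is a name chosen for illustration.

```r
xlist <- list(a = "ACCE DTP Course", b = 1:10, data = head(iris))

# Apply is.numeric() to the contents of each element;
# vapply() guarantees one logical value per element
is_num <- vapply(xlist, is.numeric, logical(1))
is_num            # a: FALSE, b: TRUE, data: FALSE

xlist[is_num]     # a list containing only element b
```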

Extracting individual elements

Extracting individual elements allows us to access the objects contained in a list, which can be any type of object. Hence the result depends on the object each element contains.

To extract individual elements of a list, we use the double-square bracket operator: [[.

Extracting by element index

Again we can use element indices to extract the object contained in an element.

xlist[[2]]
 [1]  1  2  3  4  5  6  7  8  9 10

Notice that now the result is a vector, not a list, which is what the second element contained.
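
The contrast between [ and [[ can be checked directly with class() and length(). A small sketch:

```r
xlist <- list(a = "ACCE DTP Course", b = 1:10, data = head(iris))

class(xlist[2])      # "list": a list that holds one element
class(xlist[[2]])    # "integer": the vector stored in that element

length(xlist[2])     # 1: one list element
length(xlist[[2]])   # 10: ten numbers
```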

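You can’t extract more than one element at once. A vector of indices is instead applied recursively (here, the second element of the first element), which fails: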

xlist[[1:2]]
Error in xlist[[1:2]]: subscript out of bounds

Nor use it to skip elements:

xlist[[-1]]
Error in xlist[[-1]]: invalid negative subscript in get1index <real>

Extracting by element name

We can however use single names to extract elements:

xlist[["a"]]
[1] "ACCE DTP Course"

The $ operator

The $ operator is a shorthand for extracting single elements by name:

xlist$data
  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa

List sub-setting challenge

Given the following list:

xlist <- list(a = "ACCE DTP Course", b = 1:10, data = head(iris))

and using your knowledge of both list and vector sub-setting, extract the number 2 from xlist.

Hint: the number 2 is contained within the “b” item in the list.

xlist <- list(a = "ACCE DTP Course", b = 1:10, data = head(iris))

xlist$b[2]
[1] 2
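
The solution above is one of several equivalent ways to reach the same value. The alternatives below are a sketch combining the operators covered so far; the last one relies on the fact that [[ applies a vector of indices recursively.

```r
xlist <- list(a = "ACCE DTP Course", b = 1:10, data = head(iris))

xlist$b[2]         # extract by name with $, then index the vector
xlist[["b"]][2]    # extract by name with [[, then index the vector
xlist[[2]][2]      # extract by position, then index the vector
xlist[[c(2, 2)]]   # recursive indexing: second item of the second element
```

Each of the four expressions returns 2.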

Sub-setting data.frames

Data frames are lists under the hood, so similar sub-setting rules apply. However, they are also two-dimensional objects.

Sub-setting data.frames as a list

Using [ to subset

Let’s use the built-in data.frame called trees to experiment.

First let’s use str() on trees to examine its contents:

str(trees)
'data.frame':   31 obs. of  3 variables:
 $ Girth : num  8.3 8.6 8.8 10.5 10.7 10.8 11 11 11.1 11.2 ...
 $ Height: num  70 65 63 72 81 83 66 75 80 75 ...
 $ Volume: num  10.3 10.3 10.2 16.4 18.8 19.7 15.6 18.2 22.6 19.9 ...

Using the [ operator with one argument will act the same way as for lists, where each list element corresponds to a column. The resulting object will be a data.frame.

trees[1]
   Girth
1    8.3
2    8.6
3    8.8
4   10.5
5   10.7
6   10.8
7   11.0
8   11.0
9   11.1
10  11.2
11  11.3
12  11.4
13  11.4
14  11.7
15  12.0
16  12.9
17  12.9
18  13.3
19  13.7
20  13.8
21  14.0
22  14.2
23  14.5
24  16.0
25  16.3
26  17.3
27  17.5
28  17.9
29  18.0
30  18.0
31  20.6
trees["Girth"]
   Girth
1    8.3
2    8.6
3    8.8
4   10.5
5   10.7
6   10.8
7   11.0
8   11.0
9   11.1
10  11.2
11  11.3
12  11.4
13  11.4
14  11.7
15  12.0
16  12.9
17  12.9
18  13.3
19  13.7
20  13.8
21  14.0
22  14.2
23  14.5
24  16.0
25  16.3
26  17.3
27  17.5
28  17.9
29  18.0
30  18.0
31  20.6

Using [[ to extract

Similarly, [[ will act to extract a single column as a vector:

trees[[1]]
 [1]  8.3  8.6  8.8 10.5 10.7 10.8 11.0 11.0 11.1 11.2 11.3 11.4 11.4 11.7 12.0
[16] 12.9 12.9 13.3 13.7 13.8 14.0 14.2 14.5 16.0 16.3 17.3 17.5 17.9 18.0 18.0
[31] 20.6
trees[["Girth"]]
 [1]  8.3  8.6  8.8 10.5 10.7 10.8 11.0 11.0 11.1 11.2 11.3 11.4 11.4 11.7 12.0
[16] 12.9 12.9 13.3 13.7 13.8 14.0 14.2 14.5 16.0 16.3 17.3 17.5 17.9 18.0 18.0
[31] 20.6

And $ provides a convenient shorthand to extract columns by name:

trees$Girth
 [1]  8.3  8.6  8.8 10.5 10.7 10.8 11.0 11.0 11.1 11.2 11.3 11.4 11.4 11.7 12.0
[16] 12.9 12.9 13.3 13.7 13.8 14.0 14.2 14.5 16.0 16.3 17.3 17.5 17.9 18.0 18.0
[31] 20.6
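
One practical difference between $ and [[ shows up when the column name is stored in a variable. The sketch below uses a hypothetical variable called col. [[ uses the value of the variable, whereas $ looks for a column literally named "col".

```r
col <- "Girth"           # hypothetical variable holding a column name

head(trees[[col]])       # works: uses the value of col
trees$col                # NULL: trees has no column called "col"

identical(trees[[col]], trees$Girth)   # TRUE
```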

Sub-setting data.frames as a matrix

With two arguments, [ behaves the same way as for matrices:

trees[1:5, c("Girth", "Volume")]
  Girth Volume
1   8.3   10.3
2   8.6   10.3
3   8.8   10.2
4  10.5   16.4
5  10.7   18.8
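
The row argument also accepts the logical vectors and index vectors introduced for plain vectors. As a sketch, the following selects rows by a condition on one column and reorders rows by another. The threshold of 80 and the names tall and by_volume are arbitrary choices for illustration.

```r
# Rows where Height exceeds 80, keeping two columns
tall <- trees[trees$Height > 80, c("Girth", "Height")]
tall

# All columns, with rows ordered from largest to smallest Volume
by_volume <- trees[order(trees$Volume, decreasing = TRUE), ]
head(by_volume, 3)
```

Note the comma after the row condition. Without it, the single argument would be treated as a selection of columns, as in the list-style sub-setting above.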

If we subset a single row, the result will be a data.frame (because a row can contain columns of different types):

trees[3,]
  Girth Height Volume
3   8.8     63   10.2

But for a single column the result will be a vector.

trees[, "Girth"]
 [1]  8.3  8.6  8.8 10.5 10.7 10.8 11.0 11.0 11.1 11.2 11.3 11.4 11.4 11.7 12.0
[16] 12.9 12.9 13.3 13.7 13.8 14.0 14.2 14.5 16.0 16.3 17.3 17.5 17.9 18.0 18.0
[31] 20.6

This can be changed with the third argument, drop = FALSE:

trees[, "Girth", drop = FALSE]
   Girth
1    8.3
2    8.6
3    8.8
4   10.5
5   10.7
6   10.8
7   11.0
8   11.0
9   11.1
10  11.2
11  11.3
12  11.4
13  11.4
14  11.7
15  12.0
16  12.9
17  12.9
18  13.3
19  13.7
20  13.8
21  14.0
22  14.2
23  14.5
24  16.0
25  16.3
26  17.3
27  17.5
28  17.9
29  18.0
30  18.0
31  20.6

Advanced R Cheat Sheet