Data types, structures and classes

Base types

Every object has a base type. Overall there are 25 different base object types that form the core of the R language.

Base data types

Base data types form the building blocks of all data structures and objects in R.

There are 5 commonly used base data types: double, integer, complex, logical and character, as well as NULL. (A sixth, raw, holds bytes and is rarely needed.)

No matter how complicated your analyses become, all data in R is interpreted as one of these basic data types.

You can inspect the type of a value or object through function typeof().

typeof(3.14)
[1] "double"
typeof(1L) # The L suffix forces the number to be an integer, since by default R stores numbers as doubles
[1] "integer"
typeof(TRUE)
[1] "logical"
typeof("banana")
[1] "character"
typeof(NULL)
[1] "NULL"
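
The complex type is listed above but not shown. As a minimal sketch, the same check works for complex numbers, and the is.*() family of functions tests for one specific type. The variable name z is only for illustration.

```r
# A complex number is written with an imaginary part and the suffix i
z <- 2 + 3i
typeof(z)        # "complex"

# is.*() functions return TRUE or FALSE for a specific type
is.double(3.14)  # TRUE
is.integer(1)    # FALSE: without the L suffix, 1 is stored as a double
is.integer(1L)   # TRUE
is.null(NULL)    # TRUE
```
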

NA values

In R, NA stands for “Not Available” and is used to represent missing or undefined values. It serves as a placeholder to indicate that a value is not present or cannot be determined for a particular observation in a dataset.

There are different types of NA in R: NA_integer_, NA_real_, NA_complex_ and NA_character_, each representing a missing value of a specific data type. Plain NA is itself of type logical, so there is no separate constant for logical missing values.

typeof(NA)
[1] "logical"
typeof(NA_character_)
[1] "character"
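
A short sketch of how this plays out in practice: a plain NA takes on the type of the vector it is placed in, and is.na() is the usual way to find missing values. The vector name heights and its values are made up for illustration.

```r
# NA is coerced to match the other elements of the vector
typeof(c(1.5, NA))           # "double"
typeof(c("a", NA))           # "character"

# Comparing with == does not detect NA; use is.na() instead
heights <- c(1.62, NA, 1.80)
heights == NA                # NA NA NA
is.na(heights)               # FALSE TRUE FALSE

# Most summary functions return NA unless told to drop missing values
mean(heights)                # NA
mean(heights, na.rm = TRUE)  # 1.71
```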

Data Structures

In R, data structures are ways to organize and store data so that it can be efficiently accessed and manipulated. R provides several built-in data structures, each suited for different types of data and operations.

Arrays and type coercion

The distinguishing feature of arrays is that all values are of the same data type.

Arrays can take values of any base data type and span any number of dimensions. However, all values must be of the same base data type. This allows for efficient calculation and matrix mathematics. The strictness also has some important consequences, which introduce another key concept in R: type coercion.
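
As a small illustration, the array() function builds an array with any number of dimensions from a vector of values. The object name arr and the chosen dimensions are arbitrary.

```r
# A 3-dimensional array: 2 rows, 3 columns, 2 layers
arr <- array(1:12, dim = c(2, 3, 2))
dim(arr)         # 2 3 2
typeof(arr)      # "integer": one type shared by all 12 values

# Values fill the array column by column, layer by layer
arr[1, 2, 1]     # 3
arr[2, 3, 2]     # 12
```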

Vectors and Type Coercion

Vectors

Vectors are one-dimensional arrays.

To better understand the importance of data types and coercion, let’s look at the most basic data structure in R, the vector, which is a one-dimensional array.

All the variables we’ve created so far have been vectors of length 1 (i.e. they contain only a single element).

Another way to create a new vector is to use function vector(). You can specify the length of the vector with argument length.

my_vector <- vector(length = 3)
my_vector
[1] FALSE FALSE FALSE

A vector in R is essentially an ordered collection of things, with the special condition that everything in the vector must be the same basic data type.

If you don’t choose the datatype, it’ll default to logical.

typeof(my_vector)
[1] "logical"

Otherwise, you can declare an empty vector of whatever type you like using argument mode.

another_vector <- vector(mode = "character", length = 3)
another_vector
[1] "" "" ""
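
Base R also has shorthand constructors named after each type, which do the same job as vector() with a mode argument. A brief sketch:

```r
# Each call creates a vector of length 3 filled with that type's default value
logical(3)     # FALSE FALSE FALSE
integer(3)     # 0 0 0
numeric(3)     # 0 0 0 (stored as double)
character(3)   # "" "" ""

# These are equivalent to calling vector() with the matching mode
identical(character(3), vector(mode = "character", length = 3))  # TRUE
```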

You can also create a vector of a series of numbers:

1:10
 [1]  1  2  3  4  5  6  7  8  9 10
seq(10)
 [1]  1  2  3  4  5  6  7  8  9 10
seq(1, 10, by = 0.1)
 [1]  1.0  1.1  1.2  1.3  1.4  1.5  1.6  1.7  1.8  1.9  2.0  2.1  2.2  2.3  2.4
[16]  2.5  2.6  2.7  2.8  2.9  3.0  3.1  3.2  3.3  3.4  3.5  3.6  3.7  3.8  3.9
[31]  4.0  4.1  4.2  4.3  4.4  4.5  4.6  4.7  4.8  4.9  5.0  5.1  5.2  5.3  5.4
[46]  5.5  5.6  5.7  5.8  5.9  6.0  6.1  6.2  6.3  6.4  6.5  6.6  6.7  6.8  6.9
[61]  7.0  7.1  7.2  7.3  7.4  7.5  7.6  7.7  7.8  7.9  8.0  8.1  8.2  8.3  8.4
[76]  8.5  8.6  8.7  8.8  8.9  9.0  9.1  9.2  9.3  9.4  9.5  9.6  9.7  9.8  9.9
[91] 10.0
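
These approaches do not all return the same data type. A quick check, using only base R:

```r
# The colon operator and seq() with a single argument return integers
typeof(1:10)                   # "integer"
typeof(seq(10))                # "integer"

# A fractional step produces doubles
typeof(seq(1, 10, by = 0.1))   # "double"

# seq() can also take the desired number of elements instead of a step
seq(0, 1, length.out = 5)      # 0.00 0.25 0.50 0.75 1.00
```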

You can also create vectors by combining individual elements using function c() (for combine).

c(2, 6, 3)
[1] 2 6 3

Type coercion

Q: Given what we’ve learned so far, what do you think the following will produce?

c(2, 6, "3")
[1] "2" "6" "3"

This is something called type coercion, and it is the source of many surprises and the reason why we need to be aware of the basic data types and how R will interpret them.

When R encounters a mix of types (here numeric and character) to be combined into a single vector, it will force them all to be the same type.

Not all types can be coerced into one another freely. Instead, R follows a coercion hierarchy: all values are converted to the most flexible type present, i.e. the one furthest to the right in the chain below.

R coercion rules:

logical -> integer -> numeric -> complex -> character

where -> can be read as “are transformed into”.

In our case, the numeric values 2 and 6 were converted to character.
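
To see the hierarchy step by step, one can combine pairs of types and ask typeof() for the result. This is a sketch for exploration rather than an exhaustive list:

```r
typeof(c(TRUE, 1L))      # "integer":   logical -> integer
typeof(c(1L, 2.5))       # "double":    integer -> numeric
typeof(c(2.5, 1+0i))     # "complex":   numeric -> complex
typeof(c(1+0i, "a"))     # "character": complex -> character

# With several types mixed, the result is the type furthest along the chain
typeof(c(TRUE, 1L, 2.5, "a"))  # "character"
```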

Some other examples:

c("a", TRUE)
[1] "a"    "TRUE"
c("FALSE", TRUE)
[1] "FALSE" "TRUE" 
c(0, TRUE)
[1] 0 1

You can try to force coercion against this flow using the as. functions:

chars <- c("0", "2", "4")
as.numeric(chars)
[1] 0 2 4
as.logical(chars)
[1] NA NA NA
as.logical(as.numeric(chars))
[1] FALSE  TRUE  TRUE
as.logical(c(0, TRUE))
[1] FALSE  TRUE
as.logical(c("FALSE", TRUE))
[1] FALSE  TRUE
as.numeric(c("FALSE", TRUE))
Warning: NAs introduced by coercion
[1] NA NA
as.numeric(as.logical(c("FALSE", TRUE)))
[1] 0 1

As you can see, some surprising things can happen when R forces one basic data type into another!

If your data isn’t the data type you expected, type coercion may well be to blame; make sure everything is the same type in your vectors or you might get nasty surprises!
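
One defensive habit, sketched here with a made-up vector raw_values, is to convert explicitly and then check whether the conversion introduced any NA values:

```r
# Hypothetical input where one entry is not a valid number
raw_values <- c("1", "2", "three")

# suppressWarnings() hides the "NAs introduced by coercion" warning
converted <- suppressWarnings(as.numeric(raw_values))
converted                      # 1 2 NA

# Find which entries failed to convert
raw_values[is.na(converted)]   # "three"
```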

Inspecting vectors

We can ask a few questions about vectors:

sequence_example <- seq(10)

# Get the data type of a vector
typeof(sequence_example)
[1] "integer"
# Get the first six elements of the vector
head(sequence_example)
[1] 1 2 3 4 5 6
# Get the last 4 elements of the vector
tail(sequence_example, n = 4)
[1]  7  8  9 10
# Get the length of the vector
length(sequence_example)
[1] 10
# Get the structure of the vector
str(sequence_example)
 int [1:10] 1 2 3 4 5 6 7 8 9 10

The somewhat cryptic output from str() packs in three pieces of information: the basic data type found in the vector (here int, for integer); the range of indexes of the vector (here [1:10]), which tells you how many elements it holds; and the first few values actually in the vector (here ascending integers).

Naming vectors

Finally, you can give names to elements in your vector:

my_example <- 5:8
names(my_example) <- c("a", "b", "c", "d")
my_example
a b c d 
5 6 7 8 
names(my_example)
[1] "a" "b" "c" "d"
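
Names can then be used to extract elements, as an alternative to numeric positions. A short sketch, recreating the vector so that the block runs on its own:

```r
my_example <- 5:8
names(my_example) <- c("a", "b", "c", "d")

# Extract by name or by position
my_example["b"]           # element named b, value 6
my_example[2]             # same element
my_example[c("a", "d")]   # elements a and d, values 5 and 8

# unname() drops the names again
unname(my_example)        # 5 6 7 8
```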

Matrices

Matrices are two-dimensional arrays.

The lengths of each dimension are defined by the number of rows and columns.

We can declare a matrix full of zeros:

matrix_example <- matrix(0, ncol = 6, nrow = 3)
matrix_example
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    0    0    0    0    0    0
[2,]    0    0    0    0    0    0
[3,]    0    0    0    0    0    0

We can get the length of each dimension of a matrix (or of any array with more than one dimension) with function dim(). The number of values returned is the number of dimensions.

dim(matrix_example)
[1] 3 6
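
A matrix can also be built from an existing vector. The sketch below, using arbitrary example values, shows that values fill column by column unless byrow = TRUE is set, and that the same-type rule applies to matrices too:

```r
m <- matrix(1:6, nrow = 2)
m[1, 2]          # 3: values fill down the first column, then the second
nrow(m)          # 2
ncol(m)          # 3

m_by_row <- matrix(1:6, nrow = 2, byrow = TRUE)
m_by_row[1, 2]   # 2

# A matrix still holds a single data type, so mixed input is coerced
typeof(matrix(c(1, "a", TRUE, 4), nrow = 2))   # "character"
```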

Lists

Lists can store a mix of objects of any data type and class

Another key data structure is the list. Lists are the most flexible data structure because each element can hold any object, of any data type and dimension, including other lists.

Create lists using list() or coerce other objects to lists using as.list().

list(1, "a", TRUE)
[[1]]
[1] 1

[[2]]
[1] "a"

[[3]]
[1] TRUE
as.list(1:4)
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4

We can also name list elements:

a_list <- list(title = "Numbers", numbers = 1:10, data = TRUE)
a_list
$title
[1] "Numbers"

$numbers
 [1]  1  2  3  4  5  6  7  8  9 10

$data
[1] TRUE

Lists are a base type:

typeof(a_list)
[1] "list"
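
Elements of a list can be extracted with [[ ]] or, for named elements, with $. Single brackets return a shorter list rather than the element itself. A brief sketch, recreating a_list so that the block runs on its own (the nested list is made up for illustration):

```r
a_list <- list(title = "Numbers", numbers = 1:10, data = TRUE)

a_list$title              # "Numbers"
a_list[["numbers"]][3]    # 3
typeof(a_list[["data"]])  # "logical": the element itself
typeof(a_list["data"])    # "list": a list of length 1

# Lists can contain other lists
nested <- list(inner = list(x = 1, y = "a"))
nested$inner$y            # "a"
```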

Classes, S3, S4 and R6 type objects

Arrays and lists are all base types, which are built into R itself and cannot be extended by users. However, there are other types of objects in R.

These are S3, S4 and R6 type objects, with S3 being the most common.

Such objects have a class attribute (base types can have a class attribute too), enabling class specific functionality, a characteristic of object oriented programming. New classes can be created by users, allowing greater flexibility in the types of data structures available for analyses.
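
As a minimal sketch of how an S3 class works, the example below attaches a made-up class name, "temperature", to a plain double vector and defines a print method for it. The class name, the values and the print format are all invented for illustration.

```r
# Attach a class attribute to a base type
temps <- structure(c(21.5, 19.0), class = "temperature")
class(temps)      # "temperature"
typeof(temps)     # "double": the underlying base type is unchanged

# A method named <generic>.<class> is picked up by S3 dispatch
print.temperature <- function(x, ...) {
  cat(paste0(unclass(x), " degrees C"), sep = "\n")
  invisible(x)
}

print(temps)
# 21.5 degrees C
# 19 degrees C
```

Calling print() on any other object still uses that object's own method; only objects carrying the "temperature" class are affected.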

Data.frames

The most important S3 object class in R is the data.frame.

Data.frames are special types of lists.

Data.frames are used to store tabular data and are special types of lists where each element is a vector, each of equal length. So each column of a data.frame contains values of consistent data type but the data type can vary between columns (i.e. along rows).

df <- data.frame(
  id = 1:3,
  treatment = c("a", "b", "b"),
  complete = c(TRUE, TRUE, FALSE)
)
df
  id treatment complete
1  1         a     TRUE
2  2         b     TRUE
3  3         b    FALSE

We can check that our data.frame is a list under the hood:

typeof(df)
[1] "list"

As an S3 object, it also has a class attribute:

class(df)
[1] "data.frame"

We can check the dimensions of a data.frame:

dim(df)
[1] 3 3

We can get a certain number of rows from the top or bottom:

head(df, 1)
  id treatment complete
1  1         a     TRUE
tail(df, 1)
  id treatment complete
3  3         b    FALSE

Importantly, we can display the structure of a data.frame.

str(df)
'data.frame':   3 obs. of  3 variables:
 $ id       : int  1 2 3
 $ treatment: chr  "a" "b" "b"
 $ complete : logi  TRUE TRUE FALSE
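
Because a data.frame is a list of equal-length column vectors, list tools work on it directly. A short sketch, recreating df so that the block is self-contained; the stringsAsFactors argument is an addition here, set explicitly so that the result is the same in R versions before 4.0:

```r
df <- data.frame(
  id = 1:3,
  treatment = c("a", "b", "b"),
  complete = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE  # already the default from R 4.0 onwards
)

# Each column is a list element, so length() counts columns
length(df)            # 3
df$treatment          # "a" "b" "b"
df[["complete"]]      # TRUE TRUE FALSE

# sapply() visits each column in turn and reports its base type
sapply(df, typeof)
# id: "integer", treatment: "character", complete: "logical"
```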