Tibbles are a modern take on data frames. They keep the features that have stood the test of time, and drop the features that used to be convenient but are now frustrating (e.g. converting character vectors to factors).
tibble() is a nice way to create data frames. It encapsulates best practices for data frames:
It never changes an input’s type (i.e., no more stringsAsFactors = FALSE!).
This makes it easier to use with list-columns:
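A minimal sketch of a list-column built by hand; the column names x and y are illustrative only:

```r
library(tibble)

# y is a list-column: each element is an integer vector of a different length.
df <- tibble(x = 1:3, y = list(1:5, 1:10, 1:20))
df

# The list elements are stored as-is, so their lengths are preserved.
lengths(df$y)
```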
List-columns are most commonly created by do(), but they can be useful to create by hand.
It never adjusts the names of variables:
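As an illustration, the sketch below compares how base data.frame() and tibble() treat a non-syntactic name; the name used is just an example:

```r
library(tibble)

# data.frame() rewrites the name to make it syntactic (check.names = TRUE by default).
base_names <- names(data.frame(`crazy name` = 1))

# tibble() keeps the name exactly as written.
tibble_names <- names(tibble(`crazy name` = 1))

base_names
tibble_names
```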
It evaluates its arguments lazily and sequentially:
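A short sketch of sequential evaluation: the second argument can refer to a column defined by the first.

```r
library(tibble)

# y is computed from x, which was defined earlier in the same call.
df <- tibble(x = 1:5, y = x^2)
df
```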
It never uses row.names(). The whole point of tidy data is to store variables in a consistent way, so it never stores a variable as a special attribute.
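When existing row names carry information, one option is to move them into an ordinary column first. The sketch below uses rownames_to_column() from tibble on the built-in mtcars data; the column name model is an arbitrary choice:

```r
library(tibble)

# Move the row names of mtcars into a regular character column called "model",
# then coerce the result to a tibble.
cars <- as_tibble(rownames_to_column(mtcars, var = "model"))
cars
```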
It only recycles vectors of length 1. This is because recycling vectors of greater lengths is a frequent source of bugs.
To complement tibble(), tibble provides as_tibble() to coerce objects into tibbles. Generally, as_tibble() methods are much simpler than as.data.frame() methods. By comparison, as.data.frame() works roughly like do.call(cbind, lapply(x, data.frame)), i.e. it coerces each component to a data frame and then cbind()s them all together.
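A minimal sketch of coercion with as_tibble(), using a small hand-made list and the built-in iris data frame as example inputs:

```r
library(tibble)

# A named list of equal-length vectors becomes a tibble with one column per element.
l <- list(x = 1:3, y = c("a", "b", "c"))
from_list <- as_tibble(l)
from_list

# An existing data frame can be coerced the same way.
from_df <- as_tibble(iris)
from_df
```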
as_tibble() has been written with an eye for performance:
The speed of as.data.frame() is not usually a bottleneck when used interactively, but can be a problem when combining thousands of messy inputs into one tidy data frame.
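One rough way to compare the two on your own machine is sketched below. It uses base system.time() rather than a dedicated benchmarking package, and the input (26 columns of 100 values, repeated 100 times) is an arbitrary choice; timings will vary by machine and package version.

```r
library(tibble)

# Build a named list of 26 integer vectors to coerce.
l <- replicate(26, sample(100), simplify = FALSE)
names(l) <- letters

# Time 100 repeated coercions with each function.
t_tibble <- system.time(for (i in seq_len(100)) as_tibble(l))
t_base <- system.time(for (i in seq_len(100)) as.data.frame(l))

t_tibble
t_base
```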
There are three key differences between tibbles and data frames: printing, subsetting, and recycling rules.
When you print a tibble, it only shows the first ten rows and all the columns that fit on one screen. It also prints an abbreviated description of the column type:
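To see this behaviour, print a tibble with more rows than the default limit. The data below are arbitrary:

```r
library(tibble)

# 106 rows: printing shows only the first rows plus a summary of the rest.
df <- tibble(x = -5:100, y = 123.456 * (3^x))
df
```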
You can control the default appearance with options:
options(tibble.print_max = n, tibble.print_min = m): if there are more than n rows, print only the first m rows. Use options(tibble.print_max = Inf) to always show all rows.
options(tibble.width = Inf) will always print all columns, regardless of the width of the screen.
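A sketch of setting these options temporarily and then restoring the previous values, assuming a tibble version that honours the tibble.* option names described above. The values 20 and 5 are arbitrary:

```r
library(tibble)

# options() returns the previous values, which makes them easy to restore.
old <- options(tibble.print_max = 20, tibble.print_min = 5, tibble.width = Inf)
current_max <- getOption("tibble.print_max")

big <- tibble(x = 1:100)
print(big)

# Restore the previous settings.
options(old)
```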
Tibbles are quite strict about subsetting. [ always returns another tibble. Contrast this with a data frame: sometimes [ returns a data frame and sometimes it just returns a vector:
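The contrast can be sketched by checking the class of the result in each case; the data are illustrative:

```r
library(tibble)

df1 <- data.frame(x = 1:3, y = 3:1)
class(df1[, 1:2])  # two columns: still a data frame
class(df1[, 1])    # one column: dropped to a plain vector

df2 <- tibble(x = 1:3, y = 3:1)
class(df2[, 1:2])  # a tibble
class(df2[, 1])    # still a tibble
```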
To extract a single column use [[ or $:
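A minimal sketch of pulling a single column out as a vector with [[ and $, using illustrative data:

```r
library(tibble)

df <- tibble(x = 1:3, y = 3:1)

by_name <- df[["x"]]   # extract by name
by_dollar <- df$x      # same column via $
by_position <- df[[1]] # extract by position

by_name
```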
Tibbles are also stricter with $. Tibbles never do partial matching, and will throw a warning and return NULL if the column does not exist:
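The difference can be sketched with a column named abc and a lookup for a, which only matches partially:

```r
library(tibble)

# A base data frame partially matches "a" to the column "abc".
df <- data.frame(abc = 1)
base_result <- df$a
base_result

# A tibble does not: this emits a warning and returns NULL.
df2 <- tibble(abc = 1)
df2$a

# The same lookup with the warning silenced, to inspect the value.
tibble_result <- suppressWarnings(df2$a)
```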
As of version 1.4.1, tibbles no longer ignore the drop argument:
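One case where [ does return a vector is when drop = TRUE is passed explicitly. A minimal sketch, assuming tibble 1.4.1 or later:

```r
library(tibble)

# With drop = TRUE, selecting a single column returns the underlying vector.
v <- tibble(a = 1:3)[, "a", drop = TRUE]
v
```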
When constructing a tibble, only values of length 1 are recycled. The first column whose length differs from one determines the number of rows in the tibble; conflicting lengths lead to an error. This rule also extends to tibbles with zero rows, which is sometimes important for programming:
    tibble(a = 1, b = 1:3)
    #> # A tibble: 3 x 2
    #>       a     b
    #>   <dbl> <int>
    #> 1     1     1
    #> 2     1     2
    #> 3     1     3
    tibble(a = 1:3, b = 1)
    #> # A tibble: 3 x 2
    #>       a     b
    #>   <int> <dbl>
    #> 1     1     1
    #> 2     2     1
    #> 3     3     1
    tibble(a = 1:3, c = 1:2)
    #> Error: Column `c` must be length 1 or 3, not 2
    tibble(a = 1, b = integer())
    #> # A tibble: 0 x 2
    #> # ... with 2 variables: a <dbl>, b <int>
    tibble(a = integer(), b = 1)
    #> # A tibble: 0 x 2
    #> # ... with 2 variables: a <int>, b <dbl>