The tbl_df class is a subclass of data.frame, created in order to have different default behaviour. The colloquial term "tibble" refers to a data frame that has the tbl_df class. Tibble is the central data structure for the set of packages known as the tidyverse, including dplyr, ggplot2, tidyr, and readr.

The general ethos is that tibbles are lazy and surly: they do less and complain more than base data.frames. This forces problems to be tackled earlier and more explicitly, typically leading to code that is more expressive and robust.

Properties of tbl_df

Objects of class tbl_df have:

  • A class attribute of c("tbl_df", "tbl", "data.frame").

  • A base type of "list", where each element of the list has the same NROW().

  • A names attribute that is a character vector the same length as the underlying list.

  • A row.names attribute, included for compatibility with the base data.frame class. This attribute is only consulted to query the number of rows; any row names that might be stored there are ignored by most tibble methods.
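
These properties can be checked directly on a small tibble. The following is a minimal sketch, assuming the tibble package is installed; the object name df and its columns x and y are made up for illustration.

```r
library(tibble)

# A small illustrative tibble (column names are arbitrary)
df <- tibble(x = 1:3, y = c("a", "b", "c"))

# Class attribute: tbl_df first, data.frame last
class(df)
#> [1] "tbl_df"     "tbl"        "data.frame"

# The base type is a list, one element per column
typeof(df)
#> [1] "list"

# Every element has the same NROW()
vapply(df, NROW, integer(1))

# The names attribute has one entry per column
names(df)
#> [1] "x" "y"
length(names(df)) == length(unclass(df))
#> [1] TRUE
```

Because data.frame is part of the class vector, functions written for base data frames will generally accept a tibble, although they may see the different defaults described in the next section.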

Behavior of tbl_df

How the default behaviour of tibbles differs from that of data.frames, during creation and access:

  • Column data is not coerced. A character vector is not turned into a factor. List-columns are expressly anticipated and do not require special tricks. Read more in tibble().

  • Recycling only happens for a length 1 input.

  • Column names are not munged, although missing names are auto-populated. Empty and duplicated column names are strongly discouraged, but the user must indicate how to resolve them. Read more in name-repair.

  • Row names are not added and are strongly discouraged, in favor of storing that information as a column. Read more in rownames.

  • df[, j] returns a tibble; it does not automatically extract the column inside. In other words, drop = FALSE is the default, so df[, j] behaves like df[, j, drop = FALSE]. Read more in subsetting.

  • There is no partial matching when $ is used to index by name. df$name for a nonexistent name generates a warning. Read more in subsetting.

  • Printing and inspection are a very high priority. The goal is to convey as much information as possible, in a concise way, even for large and complex tibbles. Read more in formatting.
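
Several of the differences listed above can be seen side by side with base data.frames. This is a hedged sketch rather than an exhaustive demonstration: it assumes the tibble package is installed, and all object and column names (df, df2, value, and so on) are invented for the example.

```r
library(tibble)

# Column data is not coerced, and list-columns need no special tricks
df <- tibble(x = 1:3, y = c("a", "b", "c"), z = list(1, "two", 3:5))
is.character(df$y)
#> [1] TRUE
is.list(df$z)
#> [1] TRUE

# Only length-1 inputs are recycled; other length mismatches are an error
ok <- tibble(a = 1:4, b = 0)
bad <- tryCatch(tibble(a = 1:4, b = 1:2), error = function(e) "error")
bad
#> [1] "error"

# Column names are not munged (base data.frame() rewrites "a b" by default)
names(tibble(`a b` = 1))
#> [1] "a b"
names(data.frame(`a b` = 1))
#> [1] "a.b"

# df[, j] stays a tibble; the base equivalent drops to a bare vector
is.data.frame(df[, "x"])
#> [1] TRUE
is.data.frame(as.data.frame(df)[, "x"])
#> [1] FALSE

# No partial matching with $: a tibble warns and returns NULL,
# while a base data.frame silently matches "val" to "value"
df2 <- tibble(value = 1:3)
partial_tbl <- suppressWarnings(df2$val)
is.null(partial_tbl)
#> [1] TRUE
as.data.frame(df2)$val
#> [1] 1 2 3
```

When the column itself is wanted, df[[j]] or df$name extracts it from both tibbles and base data frames, which avoids relying on the drop default.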

See also