This vignette aims to give a comprehensive overview of the behavior of the subsetting operators $, [[ and [, highlighting where the tibble implementation differs from the data frame implementation.
Results of the same code for data frames and tibbles are presented side by side:
new_df()
#>   a b         cd
#> 1 1 e          9
#> 2 2 f     10, 11
#> 3 3 g 12, 13, 14
#> 4 4 h       text
new_tbl()
#> # A tibble: 4 × 3
#>       a b     cd
#>   <int> <chr> <list>
#> 1     1 e     <dbl [1]>
#> 2     2 f     <int [2]>
#> 3     3 g     <int [3]>
#> 4     4 h     <chr [1]>
In the following, if the results are identical (after converting to a data frame if necessary), only the tibble result is shown, as in the example below. This makes differences easier to spot.
new_tbl()
#> # A tibble: 4 × 3
#>       a b     cd
#>   <int> <chr> <list>
#> 1     1 e     <dbl [1]>
#> 2     2 f     <int [2]>
#> 3     3 g     <int [3]>
#> 4     4 h     <chr [1]>
Subsetting operations are read-only. The same objects are reused in all examples:
df <- new_df()
tbl <- new_tbl()
$
With $ subsetting, accessing a missing column gives a warning. Inexact matching is not supported:
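The difference can be sketched with stand-in objects. The construction of df and tbl below is an assumption, since the definitions of new_df() and new_tbl() are not shown here, and the name c is used because it is a prefix of cd without being a column itself. The sketch assumes the tibble package is installed.

```r
library(tibble)

# Stand-ins for new_df() and new_tbl(); this construction is an assumption.
df <- data.frame(a = 1:4)
df$b <- letters[5:8]
df$cd <- list(9, 10:11, 12:14, "text")
tbl <- as_tibble(df)

# Data frames match the prefix "c" to the column "cd".
df_result <- df$c

# Tibbles do not match inexactly: the result is NULL and a warning is raised.
warned <- FALSE
tbl_result <- withCallingHandlers(
  tbl$c,
  warning = function(w) {
    warned <<- TRUE
    invokeRestart("muffleWarning")
  }
)
```

Here df_result holds the cd column, while tbl_result is NULL and warned records that the tibble access raised a warning.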
With two indexes, a single element is returned. Tibbles do not unpack list columns; [[ only unpacks columns.
tbl[[2, "a"]]
#> [1] 2
df[[2, "cd"]]
#> [1] 10 11
tbl[[2, "cd"]]
#> [[1]]
#> [1] 10 11
df[[1:2, "cd"]]
#> Error in col[[i, exact = exact]]: subscript out of bounds
tbl[[1:2, "cd"]]
#> Error in `vectbl_as_row_location2()` at tibble/R/subsetting.R:128:4:
#> ! Must extract row with a single valid subscript.
#> ✖ Subscript `1:2` has size 2 but must be size 1.
tbl[[2, "c"]]
#> NULL
df[[1:2, "c"]]
#> NULL
tbl[[1:2, "c"]]
#> Error in `vectbl_as_row_location2()` at tibble/R/subsetting.R:128:4:
#> ! Must extract row with a single valid subscript.
#> ✖ Subscript `1:2` has size 2 but must be size 1.
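Because the tibble result for a list column stays wrapped in a list, code that needs the element itself has to unwrap it. The sketch below shows one way to do this, by extracting the column first and then indexing into it. The df and tbl objects are stand-ins built under the assumption that they mirror new_df() and new_tbl().

```r
library(tibble)

# Stand-ins for new_df() and new_tbl(); this construction is an assumption.
df <- data.frame(a = 1:4)
df$b <- letters[5:8]
df$cd <- list(9, 10:11, 12:14, "text")
tbl <- as_tibble(df)

# Data frame: the list column is unpacked, giving the integer vector itself.
df_cell <- df[[2, "cd"]]

# Tibble: the cell stays wrapped in a list of length one.
tbl_cell <- tbl[[2, "cd"]]

# Extracting the column first and then indexing it unwraps the element.
tbl_elem <- tbl[["cd"]][[2]]
```

The two-step form tbl[["cd"]][[2]] gives the same value for data frames and tibbles, which makes it a reasonable choice for code that must handle both.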
Exotic variants like recursive indexing are deprecated for tibbles.
df[, 4]
#> Error in `[.data.frame`(df, , 4): undefined columns selected
tbl[, 4]
#> Error in `tbl[, 4]`:
#> ! Can't subset columns past the end.
#> ℹ Location 4 doesn't exist.
#> ℹ There are only 3 columns.
df[, NA]
#> Error in `[.data.frame`(df, , NA): undefined columns selected
tbl[, NA]
#> Error:
#> ! Can't use NA as column index with `[` at positions 1, 2, and 3.
df[, NA_character_]
#> Error in `[.data.frame`(df, , NA_character_): undefined columns selected
tbl[, NA_character_]
#> Error:
#> ! Can't use NA as column index with `[` at position 1.
df[, NA_integer_]
#> Error in `[.data.frame`(df, , NA_integer_): undefined columns selected
tbl[, NA_integer_]
#> Error:
#> ! Can't use NA as column index with `[` at position 1.
Multiple columns can be queried by passing a vector of column indexes (names, positions, or even a logical vector). With a logical vector, tibbles are a tad stricter:
tbl[c("a", "b")]
#> # A tibble: 4 × 2
#>       a b
#>   <int> <chr>
#> 1     1 e
#> 2     2 f
#> 3     3 g
#> 4     4 h
tbl[1:2]
#> # A tibble: 4 × 2
#>       a b
#>   <int> <chr>
#> 1     1 e
#> 2     2 f
#> 3     3 g
#> 4     4 h
tbl[1:3]
#> # A tibble: 4 × 3
#>       a b     cd
#>   <int> <chr> <list>
#> 1     1 e     <dbl [1]>
#> 2     2 f     <int [2]>
#> 3     3 g     <int [3]>
#> 4     4 h     <chr [1]>
df[1:4]
#> Error in `[.data.frame`(df, 1:4): undefined columns selected
tbl[1:4]
#> Error in `tbl[1:4]`:
#> ! Can't subset columns past the end.
#> ℹ Location 4 doesn't exist.
#> ℹ There are only 3 columns.
tbl[0:2]
#> # A tibble: 4 × 2
#>       a b
#>   <int> <chr>
#> 1     1 e
#> 2     2 f
#> 3     3 g
#> 4     4 h
df[-1:2]
#> Error in `[.default`(df, -1:2): only 0's may be mixed with negative subscripts
tbl[-1:2]
#> Error in `tbl[-1:2]`:
#> ! Must subset columns with a valid subscript vector.
#> ✖ Negative and positive locations can't be mixed.
#> ℹ Subscript `-1:2` has 2 positive values at locations 3 and 4.
tbl[-1]
#> # A tibble: 4 × 2
#>   b     cd
#>   <chr> <list>
#> 1 e     <dbl [1]>
#> 2 f     <int [2]>
#> 3 g     <int [3]>
#> 4 h     <chr [1]>
tbl[-(1:2)]
#> # A tibble: 4 × 1
#>   cd
#>   <list>
#> 1 <dbl [1]>
#> 2 <int [2]>
#> 3 <int [3]>
#> 4 <chr [1]>
tbl[c(FALSE, TRUE)]
#> Error in `tbl[c(FALSE, TRUE)]`:
#> ! Must subset columns with a valid subscript vector.
#> ℹ Logical subscripts must match the size of the indexed input.
#> ✖ Input has size 3 but subscript `c(FALSE, TRUE)` has size 2.
tbl[c(FALSE, TRUE, FALSE, TRUE)]
#> Error in `tbl[c(FALSE, TRUE, FALSE, TRUE)]`:
#> ! Must subset columns with a valid subscript vector.
#> ℹ Logical subscripts must match the size of the indexed input.
#> ✖ Input has size 3 but subscript `c(FALSE, TRUE, FALSE, TRUE)` has size 4.
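The stricter handling of logical column indexes can be checked programmatically. The sketch below assumes stand-in objects that mirror new_df() and new_tbl(), and uses a hypothetical helper, error_message(), that is not part of tibble or base R.

```r
library(tibble)

# Stand-ins for new_df() and new_tbl(); this construction is an assumption.
df <- data.frame(a = 1:4)
df$b <- letters[5:8]
df$cd <- list(9, 10:11, 12:14, "text")
tbl <- as_tibble(df)

# Hypothetical helper: the error message if `expr` fails, otherwise NA.
error_message <- function(expr) {
  tryCatch(
    {
      expr
      NA_character_
    },
    error = function(e) conditionMessage(e)
  )
}

# Data frame: c(FALSE, TRUE) is recycled over the 3 columns, selecting "b".
df_cols <- names(df[c(FALSE, TRUE)])

# Tibble: a logical index of length 2 is rejected for 3 columns.
tbl_msg <- error_message(tbl[c(FALSE, TRUE)])

# A logical index with one entry per column is accepted.
tbl_cols <- names(tbl[c(FALSE, TRUE, FALSE)])
```

Spelling out one logical value per column avoids relying on recycling and gives the same selection for data frames and tibbles.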
The same examples are repeated for two-dimensional indexing when omitting the row index:
tbl[, c("a", "b")]
#> # A tibble: 4 × 2
#>       a b
#>   <int> <chr>
#> 1     1 e
#> 2     2 f
#> 3     3 g
#> 4     4 h
tbl[, 1:2]
#> # A tibble: 4 × 2
#>       a b
#>   <int> <chr>
#> 1     1 e
#> 2     2 f
#> 3     3 g
#> 4     4 h
tbl[, 1:3]
#> # A tibble: 4 × 3
#>       a b     cd
#>   <int> <chr> <list>
#> 1     1 e     <dbl [1]>
#> 2     2 f     <int [2]>
#> 3     3 g     <int [3]>
#> 4     4 h     <chr [1]>
df[, 1:4]
#> Error in `[.data.frame`(df, , 1:4): undefined columns selected
tbl[, 1:4]
#> Error in `tbl[, 1:4]`:
#> ! Can't subset columns past the end.
#> ℹ Location 4 doesn't exist.
#> ℹ There are only 3 columns.
tbl[, 0:2]
#> # A tibble: 4 × 2
#>       a b
#>   <int> <chr>
#> 1     1 e
#> 2     2 f
#> 3     3 g
#> 4     4 h
df[, -1:2]
#> Error in .subset(x, j): only 0's may be mixed with negative subscripts
tbl[, -1:2]
#> Error in `tbl[, -1:2]`:
#> ! Must subset columns with a valid subscript vector.
#> ✖ Negative and positive locations can't be mixed.
#> ℹ Subscript `-1:2` has 2 positive values at locations 3 and 4.
tbl[, -1]
#> # A tibble: 4 × 2
#>   b     cd
#>   <chr> <list>
#> 1 e     <dbl [1]>
#> 2 f     <int [2]>
#> 3 g     <int [3]>
#> 4 h     <chr [1]>
tbl[, c(FALSE, TRUE)]
#> Error in `tbl[, c(FALSE, TRUE)]`:
#> ! Must subset columns with a valid subscript vector.
#> ℹ Logical subscripts must match the size of the indexed input.
#> ✖ Input has size 3 but subscript `c(FALSE, TRUE)` has size 2.
tbl[, c(FALSE, TRUE, FALSE, TRUE)]
#> Error in `tbl[, c(FALSE, TRUE, FALSE, TRUE)]`:
#> ! Must subset columns with a valid subscript vector.
#> ℹ Logical subscripts must match the size of the indexed input.
#> ✖ Input has size 3 but subscript `c(FALSE, TRUE, FALSE, TRUE)` has size 4.
Row subsetting with integer indexes works almost identically. Out-of-bounds subsetting is not recommended and may lead to an error in future versions. Another special case is subsetting with [1, , drop = TRUE], where the data frame implementation returns a list.
tbl[1, ]
#> # A tibble: 1 × 3
#>       a b     cd
#>   <int> <chr> <list>
#> 1     1 e     <dbl [1]>
tbl[1, , drop = TRUE]
#> # A tibble: 1 × 3
#>       a b     cd
#>   <int> <chr> <list>
#> 1     1 e     <dbl [1]>
tbl[1:2, ]
#> # A tibble: 2 × 3
#>       a b     cd
#>   <int> <chr> <list>
#> 1     1 e     <dbl [1]>
#> 2     2 f     <int [2]>
tbl[0, ]
#> # A tibble: 0 × 3
#> # … with 3 variables: a <int>, b <chr>, cd <list>
#> # ℹ Use `colnames()` to see all variable names
tbl[integer(), ]
#> # A tibble: 0 × 3
#> # … with 3 variables: a <int>, b <chr>, cd <list>
#> # ℹ Use `colnames()` to see all variable names
tbl[5, ]
#> # A tibble: 1 × 3
#>       a b     cd
#>   <int> <chr> <list>
#> 1    NA NA    <NULL>
tbl[4:5, ]
#> # A tibble: 2 × 3
#>       a b     cd
#>   <int> <chr> <list>
#> 1     4 h     <chr [1]>
#> 2    NA NA    <NULL>
tbl[-1, ]
#> # A tibble: 3 × 3
#>       a b     cd
#>   <int> <chr> <list>
#> 1     2 f     <int [2]>
#> 2     3 g     <int [3]>
#> 3     4 h     <chr [1]>
df[-1:2, ]
#> Error in xj[i]: only 0's may be mixed with negative subscripts
tbl[-1:2, ]
#> Error in `vectbl_as_row_location()` at tibble/R/subsetting.R:318:4:
#> ! Must subset rows with a valid subscript vector.
#> ✖ Negative and positive locations can't be mixed.
#> ℹ Subscript `-1:2` has 2 positive values at locations 3 and 4.
tbl[NA, ]
#> # A tibble: 4 × 3
#>       a b     cd
#>   <int> <chr> <list>
#> 1    NA NA    <NULL>
#> 2    NA NA    <NULL>
#> 3    NA NA    <NULL>
#> 4    NA NA    <NULL>
tbl[NA_integer_, ]
#> # A tibble: 1 × 3
#>       a b     cd
#>   <int> <chr> <list>
#> 1    NA NA    <NULL>
tbl[c(NA, 1), ]
#> # A tibble: 2 × 3
#>       a b     cd
#>   <int> <chr> <list>
#> 1    NA NA    <NULL>
#> 2     1 e     <dbl [1]>
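The special case of a single row with drop = TRUE can be verified by inspecting the type of the result rather than its printed form. This is a minimal sketch using stand-in objects that are assumed to mirror new_df() and new_tbl().

```r
library(tibble)

# Stand-ins for new_df() and new_tbl(); this construction is an assumption.
df <- data.frame(a = 1:4)
df$b <- letters[5:8]
df$cd <- list(9, 10:11, 12:14, "text")
tbl <- as_tibble(df)

# Data frame: one row with drop = TRUE collapses to a plain named list.
df_row <- df[1, , drop = TRUE]

# Tibble: the result is still a one-row tibble.
tbl_row <- tbl[1, , drop = TRUE]

df_is_plain_list <- is.list(df_row) && !is.data.frame(df_row)
tbl_is_tibble <- is_tibble(tbl_row)
```

Code that passes drop = TRUE while selecting rows should therefore not assume the same return type for both classes.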
Row subsetting with logical indexes also works almost identically. With tibbles, however, the index vector must have length one or match the number of rows.
tbl[TRUE, ]
#> # A tibble: 4 × 3
#>       a b     cd
#>   <int> <chr> <list>
#> 1     1 e     <dbl [1]>
#> 2     2 f     <int [2]>
#> 3     3 g     <int [3]>
#> 4     4 h     <chr [1]>
tbl[FALSE, ]
#> # A tibble: 0 × 3
#> # … with 3 variables: a <int>, b <chr>, cd <list>
#> # ℹ Use `colnames()` to see all variable names
df[c(TRUE, FALSE), ]
#>   a b         cd
#> 1 1 e          9
#> 3 3 g 12, 13, 14
tbl[c(TRUE, FALSE), ]
#> Error in `vectbl_as_row_location()` at tibble/R/subsetting.R:320:4:
#> ! Must subset rows with a valid subscript vector.
#> ℹ Logical subscripts must match the size of the indexed input.
#> ✖ Input has size 4 but subscript `c(TRUE, FALSE)` has size 2.
df[c(TRUE, FALSE, TRUE), ]
#>   a b         cd
#> 1 1 e          9
#> 3 3 g 12, 13, 14
#> 4 4 h       text
tbl[c(TRUE, FALSE, TRUE), ]
#> Error in `vectbl_as_row_location()` at tibble/R/subsetting.R:320:4:
#> ! Must subset rows with a valid subscript vector.
#> ℹ Logical subscripts must match the size of the indexed input.
#> ✖ Input has size 4 but subscript `c(TRUE, FALSE, TRUE)` has size 3.
tbl[c(TRUE, FALSE, TRUE, FALSE), ]
#> # A tibble: 2 × 3
#>       a b     cd
#>   <int> <chr> <list>
#> 1     1 e     <dbl [1]>
#> 2     3 g     <int [3]>
df[c(TRUE, FALSE, TRUE, FALSE, TRUE), ]
#>     a    b         cd
#> 1   1    e          9
#> 3   3    g 12, 13, 14
#> NA NA <NA>       NULL
tbl[c(TRUE, FALSE, TRUE, FALSE, TRUE), ]
#> Error in `vectbl_as_row_location()` at tibble/R/subsetting.R:320:4:
#> ! Must subset rows with a valid subscript vector.
#> ℹ Logical subscripts must match the size of the indexed input.
#> ✖ Input has size 4 but subscript `c(TRUE, FALSE, TRUE, FALSE, TRUE)` has size 5.
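Code that relies on a short logical index being recycled over the rows can make the recycling explicit, so that it behaves the same for both classes. The sketch below uses rep_len() from base R for this. The df and tbl objects are stand-ins assumed to mirror new_df() and new_tbl().

```r
library(tibble)

# Stand-ins for new_df() and new_tbl(); this construction is an assumption.
df <- data.frame(a = 1:4)
df$b <- letters[5:8]
df$cd <- list(9, 10:11, 12:14, "text")
tbl <- as_tibble(df)

pattern <- c(TRUE, FALSE)

# Data frame: the pattern is silently recycled to rows 1 and 3.
df_rows <- df[pattern, ]

# Tibble: the same short index is an error.
tbl_failed <- inherits(try(tbl[pattern, ], silent = TRUE), "try-error")

# Recycling the index explicitly to nrow() is accepted by both.
full_index <- rep_len(pattern, nrow(tbl))
tbl_rows <- tbl[full_index, ]
```

With the explicit index, tbl_rows contains the same rows that the data frame selects through implicit recycling.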
Indexing both row and column works more or less the same, except for drop:
df[1, "a"]
#> [1] 1
tbl[1, "a"]
#> # A tibble: 1 × 1
#>       a
#>   <int>
#> 1     1
tbl[1, "a", drop = FALSE]
#> # A tibble: 1 × 1
#>       a
#>   <int>
#> 1     1
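The effect of drop on a single cell can be summarised by comparing return types. This is a sketch under the assumption that the stand-in objects below mirror new_df() and new_tbl(); it also shows two ways to obtain a bare value from a tibble.

```r
library(tibble)

# Stand-ins for new_df() and new_tbl(); this construction is an assumption.
df <- data.frame(a = 1:4)
df$b <- letters[5:8]
df$cd <- list(9, 10:11, 12:14, "text")
tbl <- as_tibble(df)

# Data frame: selecting one column drops to a bare vector by default.
df_cell <- df[1, "a"]

# Tibble: the default keeps a 1 x 1 tibble.
tbl_cell <- tbl[1, "a"]

# Asking for drop = TRUE, or using [[ with two indexes, gives the bare value.
tbl_dropped <- tbl[1, "a", drop = TRUE]
tbl_element <- tbl[[1, "a"]]
```

Passing drop explicitly in both directions is a simple way to write code whose result type does not depend on whether the input is a data frame or a tibble.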