Select (and optionally rename) variables in a data frame, using
a concise mini-language that makes it easy to refer to variables based on
their name. Note that unlike calling select() on a local tibble, this
implementation is only evaluated at the collapse() stage, meaning any errors
or messages will be triggered at the end of the pipe.
select() supports dplyr selection helpers, including:
- everything(): Matches all variables.
- last_col(): Select last variable, possibly with an offset.
Other helpers select variables by matching patterns in their names:
- starts_with(): Starts with a prefix.
- ends_with(): Ends with a suffix.
- contains(): Contains a literal string.
- matches(): Matches a regular expression.
- num_range(): Matches a numerical range like x01, x02, x03.
Or from variables stored in a character vector:
- all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.
- any_of(): Same as all_of(), except that no error is thrown for names that don't exist.
Or using a predicate function:
- where(): Applies a function to all variables and selects those for which the function returns TRUE.
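As an illustration of how these helpers behave, the following minimal sketch applies a few of them to a small local data frame with dplyr. The data frame occ and its values are made up for the example. On a data_request the same helpers would be placed inside select(), but they would only be evaluated at the collapse() stage.

```r
library(dplyr)

# Hypothetical local data frame; column names mimic common occurrence fields
occ <- data.frame(
  scientificName = c("Perameles nasuta", "Perameles gunnii"),
  decimalLatitude = c(-33.9, -41.5),
  decimalLongitude = c(151.2, 146.8),
  eventDate = c("2020-01-01", "2021-06-15"),
  stringsAsFactors = FALSE
)

# Columns whose names start with a prefix
by_prefix <- occ |> select(starts_with("decimal"))

# Columns whose names end with a suffix
by_suffix <- occ |> select(ends_with("Date"))

# any_of() silently skips names that are absent ("recordID" here);
# all_of() with the same vector would throw an error instead
wanted <- c("scientificName", "recordID")
by_any <- occ |> select(any_of(wanted))

# Columns for which a predicate function returns TRUE
by_type <- occ |> select(where(is.numeric))

# The last column in the data frame
last_var <- occ |> select(last_col())
```

Inspecting names(by_prefix), names(by_any) and so on shows which columns each helper kept.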
Usage
# S3 method for class 'data_request'
select(.data, ..., group)
galah_select(..., group)
Arguments
- .data
  An object of class data_request, created using galah_call().
- ...
  Zero or more individual column names to include.
- group
  string: (optional) name of one or more column groups to include. Valid
  options are "basic", "event", "taxonomy", "media" and "assertions".
Value
A tibble specifying the name and type of each column to include in the
call to atlas_counts() or atlas_occurrences().
Details
GBIF nodes store content in hundreds of different fields, and users often
require thousands or millions of records at a time. To reduce time taken to
download data, and limit complexity of the resulting tibble, it is sensible
to restrict the fields returned by occurrence queries. The full list of
available fields can be viewed with show_all(fields). Note that select()
and galah_select() are supported for all atlases that allow downloads, with
the exception of GBIF, for which all columns are returned.
Using group = "basic" returns the following columns:
decimalLatitude
decimalLongitude
eventDate
scientificName
taxonConceptID
recordID
dataResourceName
occurrenceStatus
Using group = "event" returns the following columns:
eventRemarks
eventTime
eventID
eventDate
samplingEffort
samplingProtocol
Using group = "media" returns the following columns:
multimedia
multimediaLicence
images
videos
sounds
Using group = "taxonomy" returns higher taxonomic information for a given
query. It is the only group that is accepted by atlas_species() as well
as atlas_occurrences().
Using group = "assertions" returns all quality assertion-related
columns. The list of assertions is shown by show_all_assertions().
For atlas_occurrences(), arguments passed to ... should be valid field
names, which you can check using show_all(fields). For atlas_species(),
they should be one or more of:
- counts to include counts of occurrences per species.
- synonyms to include any synonymous names.
- lists to include authoritative lists that each species is included on.
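The atlas_species() case described above might look like the sketch below. It is wrapped in a function (species_with_counts, a name invented here) because running the query needs the galah package, network access and possibly a configured email. The combination of counts with group = "taxonomy" is an assumption based on the description above, not a tested call.

```r
# Sketch only: defines the query without running it.
# `species_with_counts` is a hypothetical helper name, not part of galah.
species_with_counts <- function(taxon = "perameles") {
  galah::galah_call() |>
    galah::galah_identify(taxon) |>
    # Request per-species occurrence counts plus higher taxonomy columns
    galah::galah_select(counts, group = "taxonomy") |>
    galah::atlas_species()
}
```

Calling species_with_counts() would then send the request, and any errors from the selection would surface at that point.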
See also
filter(), st_crop() and identify() for other ways to restrict
the information returned; show_all(fields) to list available fields.
Examples
if (FALSE) { # \dontrun{
# Download occurrence records of *Perameles*,
# only return scientificName and eventDate columns
galah_config(email = "your-email@email.com")
galah_call() |>
  identify("perameles") |>
  select(scientificName, eventDate) |>
  collect()

# Only return the "basic" group of columns and the basisOfRecord column
galah_call() |>
  identify("perameles") |>
  select(basisOfRecord, group = "basic") |>
  collect()

# When used in a pipe, `galah_select()` and `select()` are synonymous.
# Hence the previous example can be rewritten as:
galah_call() |>
  galah_identify("perameles") |>
  galah_select(basisOfRecord, group = "basic") |>
  collect()
} # }