Keep or drop columns using their names

Select (and optionally rename) variables in a data frame, using a concise mini-language that makes it easy to refer to variables based on their name. Note that unlike calling select() on a local tibble, this implementation is only evaluated at the collapse() stage, meaning any errors or messages will be triggered at the end of the pipe.

select() supports dplyr selection helpers, including:

everything(): Matches all variables.
last_col(): Selects the last variable, possibly with an offset.

Other helpers select variables by matching patterns in their names:

starts_with(): Starts with a prefix.
ends_with(): Ends with a suffix.
contains(): Contains a literal string.
matches(): Matches a regular expression.
num_range(): Matches a numerical range like x01, x02, x03.

Or from variables stored in a character vector:

all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.
any_of(): Same as all_of(), except that no error is thrown for names that don't exist.

Or using a predicate function:

where(): Applies a function to all variables and selects those for which the function returns TRUE.
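To show what each helper matches, the sketch below applies them to a small local data frame with dplyr's own select(). This is not a galah query. It assumes dplyr 1.0.0 or later is installed, and the data frame `records` and its values are made up for illustration, with column names that mimic a few occurrence fields. Unlike the galah method, these calls run immediately rather than at the collapse() stage.

```r
library(dplyr)

# Hypothetical local data frame (invented values) whose column names
# mimic a few occurrence fields; this is NOT a galah query
records <- data.frame(
  decimalLatitude  = c(-35.3, -33.9),
  decimalLongitude = c(149.1, 151.2),
  eventDate        = c("2020-01-01", "2021-06-15"),
  eventID          = c("e1", "e2"),
  scientificName   = c("Perameles nasuta", "Perameles gunnii"),
  x01 = 1:2,
  x02 = 3:4,
  x03 = 5:6
)

# everything(): move scientificName to the front, keep all other columns
reordered <- names(select(records, scientificName, everything()))

# last_col(): the final column
last_cols <- names(select(records, last_col()))

# Pattern-matching helpers
prefix_cols   <- names(select(records, starts_with("decimal")))
suffix_cols   <- names(select(records, ends_with("Date")))
contains_cols <- names(select(records, contains("Name")))
regex_cols    <- names(select(records, matches("^event")))
range_cols    <- names(select(records, num_range("x", 1:2, width = 2)))

# Character-vector helpers: "recordID" is not a column of `records`
wanted   <- c("eventDate", "recordID")
any_cols <- names(select(records, any_of(wanted)))  # missing name is skipped
all_of_failed <- tryCatch(
  {
    select(records, all_of(wanted))
    FALSE
  },
  error = function(e) TRUE  # all_of() errors on the missing name
)

# Predicate helper: keep only numeric columns
numeric_cols <- names(select(records, where(is.numeric)))
```

In a galah pipe the same helpers go inside select(), and any resulting errors only surface once the query is evaluated.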

Usage

# S3 method for class 'data_request'
select(.data, ..., group)

galah_select(..., group)

Arguments

.data: An object of class data_request, created using galah_call().
...: Zero or more individual column names to include.
group: string: (optional) name of one or more column groups to include. Valid options are "basic", "event", "taxonomy", "media" and "assertions".

Value

A tibble specifying the name and type of each column to include in the call to atlas_counts() or atlas_occurrences().

Details

GBIF nodes store content in hundreds of different fields, and users often require thousands or millions of records at a time. To reduce the time taken to download data, and to limit the complexity of the resulting tibble, it is sensible to restrict the fields returned by occurrence queries. The full list of available fields can be viewed with show_all(fields). Note that select() and galah_select() are supported for all atlases that allow downloads, with the exception of GBIF, for which all columns are returned.

Setting group = "basic" returns the following columns:

decimalLatitude
decimalLongitude
eventDate
scientificName
taxonConceptID
recordID
dataResourceName
occurrenceStatus

Using group = "event" returns the following columns:

eventRemarks
eventTime
eventID
eventDate
samplingEffort
samplingProtocol

Using group = "media" returns the following columns:

multimedia
multimediaLicence
images
videos
sounds

Using group = "taxonomy" returns higher taxonomic information for a given query. It is the only group that is accepted by atlas_species() as well as atlas_occurrences().

Using group = "assertions" returns all quality assertion-related columns. The list of assertions is shown by show_all_assertions().

For atlas_occurrences(), arguments passed to ... should be valid field names, which you can check using show_all(fields). For atlas_species(), they should be one or more of:

counts: include counts of occurrences per species.
synonyms: include any synonymous names.
lists: include authoritative lists that each species is included on.

Examples

if (FALSE) { # \dontrun{
# Download occurrence records of *Perameles*,
# returning only the scientificName and eventDate columns
galah_config(email = "your-email@email.com")
galah_call() |>
  identify("perameles") |>
  select(scientificName, eventDate) |>
  collect()

# Only return the "basic" group of columns and the basisOfRecord column
galah_call() |>
  identify("perameles") |>
  select(basisOfRecord, group = "basic") |>
  collect()
  
# When used in a pipe, `galah_select()` and `select()` are synonymous.
# Hence the previous example can be rewritten as:
galah_call() |>
  galah_identify("perameles") |>
  galah_select(basisOfRecord, group = "basic") |>
  collect()
} # }
