Select (and optionally rename) variables in a data frame, using a concise mini-language that makes it easy to refer to variables based on their name. Note that unlike calling select() on a local tibble, this implementation is only evaluated at the collapse() stage, meaning any errors or messages will be triggered at the end of the pipe.

select() supports dplyr selection helpers, including:

  • everything: Matches all variables.

  • last_col: Select last variable, possibly with an offset.

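Other helpers select variables by matching patterns in their names:

  • starts_with: Starts with a prefix.

  • ends_with: Ends with a suffix.

  • contains: Contains a literal string.

  • matches: Matches a regular expression.

  • num_range: Matches a numerical range like x01, x02, x03.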

Or from variables stored in a character vector:

  • all_of: Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.

  • any_of: Same as all_of(), except that no error is thrown for names that don't exist.

Or using a predicate function:

  • where: Applies a function to all variables and selects those for which the function returns TRUE.
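To make the matching rules concrete, here is a minimal sketch that applies a few helpers to a small local tibble. The tibble `occurrences` and the vector `wanted` are hypothetical and exist only for illustration; starts_with() is one of the pattern-matching helpers provided by tidyselect. On a data_request the same expressions are only evaluated later in the pipe, so this local version simply shows which names each helper would pick.

```r
library(dplyr)

# Hypothetical local tibble with Darwin Core-style column names
occurrences <- tibble(
  recordID = c("a1", "b2"),
  scientificName = c("Perameles nasuta", "Perameles gunnii"),
  eventDate = as.Date(c("2020-01-15", "2021-06-30")),
  eventTime = c("10:00", "14:30"),
  decimalLatitude = c(-33.9, -41.5),
  decimalLongitude = c(151.2, 146.8)
)

# Hypothetical character vector; "notAColumn" is deliberately absent
wanted <- c("scientificName", "eventDate", "notAColumn")

# Pattern helper: columns whose names start with "event"
event_cols <- names(select(occurrences, starts_with("event")))

# any_of() silently skips names that do not exist
present_cols <- names(select(occurrences, any_of(wanted)))

# where() keeps columns for which the predicate returns TRUE
numeric_cols <- names(select(occurrences, where(is.numeric)))

# last_col() picks the final column
final_col <- names(select(occurrences, last_col()))
```

Swapping any_of(wanted) for all_of(wanted) in this sketch would raise an error, because "notAColumn" is not present in the tibble.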

Usage

# S3 method for class 'data_request'
select(.data, ..., group)

galah_select(..., group)

Arguments

.data

An object of class data_request, created using galah_call().

...

Zero or more individual column names to include.

group

string: (optional) name of one or more column groups to include. Valid options are "basic", "event", "taxonomy", "media" and "assertions".

Value

A tibble specifying the name and type of each column to include in the call to atlas_counts() or atlas_occurrences().

Details

GBIF nodes store content in hundreds of different fields, and users often require thousands or millions of records at a time. To reduce download time and limit the complexity of the resulting tibble, it is sensible to restrict the fields returned by occurrence queries. The full list of available fields can be viewed with show_all(fields). Note that select() and galah_select() are supported for all atlases that allow downloads, with the exception of GBIF, for which all columns are returned.
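Before building a query it can help to confirm that the fields you intend to select actually exist. The sketch below assumes a working internet connection and that the tibble returned by show_all(fields) stores field names in a column called `id`; that column name and the vector `wanted` are assumptions made for illustration.

```r
library(galah)

# Retrieve the full list of fields for the currently configured atlas
all_fields <- show_all(fields)

# Narrow the list to fields that mention "date"
date_fields <- search_all(fields, "date")

# Hypothetical set of fields we would like to request
wanted <- c("scientificName", "eventDate", "basisOfRecord")

# Assumption: field names are stored in the `id` column
is_available <- wanted %in% all_fields$id
```

Any entry of `is_available` that is FALSE points to a name that should be corrected or dropped before it is passed to select().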

Using group = "basic" returns the following columns:

  • decimalLatitude

  • decimalLongitude

  • eventDate

  • scientificName

  • taxonConceptID

  • recordID

  • dataResourceName

  • occurrenceStatus

Using group = "event" returns the following columns:

  • eventRemarks

  • eventTime

  • eventID

  • eventDate

  • samplingEffort

  • samplingProtocol

Using group = "media" returns the following columns:

  • multimedia

  • multimediaLicence

  • images

  • videos

  • sounds

Using group = "taxonomy" returns higher taxonomic information for a given query. It is the only group that is accepted by atlas_species() as well as atlas_occurrences().

Using group = "assertions" returns all quality assertion-related columns. The list of assertions is shown by show_all_assertions().
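Because group accepts one or more names, several groups can be combined with individual fields in a single call. The following is a minimal sketch rather than a definitive recipe; it assumes that group takes a character vector of group names, and it builds the request without downloading anything.

```r
library(galah)

# Build a request that combines two column groups with one extra field.
# Assumption: `group` accepts a character vector of group names.
request <- galah_call() |>
  identify("perameles") |>
  select(basisOfRecord, group = c("basic", "event"))

# The selection is only evaluated once the query is run.
# To download, register an email with galah_config() and then call:
# occurrences <- collect(request)
```

Keeping the download step separate makes it easier to inspect or adjust the request before any data are transferred.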

For atlas_occurrences(), arguments passed to ... should be valid field names, which you can check using show_all(fields). For atlas_species(), they should be one or more of:

  • counts to include counts of occurrences per species.

  • synonyms to include any synonymous names.

  • lists to include authoritative lists that each species is included on.
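As a rough sketch of the species case, the options above might be combined with the "taxonomy" group as follows. This is based only on the description in this section: it assumes that atlas_species() accepts a piped data_request and honours these select() options, which may depend on the installed version of galah.

```r
library(galah)

# Build a species request that asks for occurrence counts and synonyms
# alongside higher taxonomic information.
species_request <- galah_call() |>
  identify("perameles") |>
  select(counts, synonyms, group = "taxonomy")

# Assumption: atlas_species() accepts the request as its first argument.
# Running it requires a registered email set via galah_config():
# species <- atlas_species(species_request)
```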

See also

filter(), st_crop() and identify() for other ways to restrict the information returned; show_all(fields) to list available fields.

Examples

if (FALSE) { # \dontrun{
# Download occurrence records of *Perameles*,
# returning only the scientificName and eventDate columns
galah_config(email = "your-email@email.com")
galah_call() |>
  identify("perameles") |>
  select(scientificName, eventDate) |>
  collect()

# Only return the "basic" group of columns and the basisOfRecord column
galah_call() |>
  identify("perameles") |>
  select(basisOfRecord, group = "basic") |>
  collect()
  
# When used in a pipe, `galah_select()` and `select()` are synonymous.
# Hence the previous example can be rewritten as:
galah_call() |>
  galah_identify("perameles") |>
  galah_select(basisOfRecord, group = "basic") |>
  collect()
} # }