
GBIF nodes store content in hundreds of different fields, and users often require thousands or millions of records at a time. To reduce the time taken to download data, and to limit the complexity of the resulting tibble, it is sensible to restrict the fields returned by atlas_occurrences(). This function allows easy selection of individual fields, or of commonly requested groups of columns, following syntax shared with dplyr::select().

The full list of available fields can be viewed with show_all(fields). Note that select() and galah_select() are supported for all atlases that allow downloads, with the exception of GBIF, for which all columns are returned.
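As a minimal sketch of how field names might be looked up before building a selection, the block below lists all fields and then narrows them with a text search. It assumes the galah package is installed, that a network connection to the configured atlas is available, and that search_all() is present in the installed version; the search term "date" is only an illustration.

```r
library(galah)

# List every field exposed by the currently configured atlas
# (this queries the atlas, so it needs a network connection)
all_fields <- show_all(fields)

# Narrow the list to fields that mention "date"; the search term is
# illustrative and can be any string
search_all(fields, "date")
```

The field names returned this way are the values that can be passed to galah_select() or select().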

Usage

galah_select(..., group)

# S3 method for data_request
select(.data, ..., group)

Arguments

...

zero or more individual column names to include

group

string: (optional) name of one or more column groups to include. Valid options are "basic", "event", "taxonomy", "media" and "assertions".

.data

An object of class data_request, created using galah_call()

Value

A tibble specifying the name and type of each column to include in the call to atlas_counts() or atlas_occurrences().

Details

Using group = "basic" returns the following columns:

  • decimalLatitude

  • decimalLongitude

  • eventDate

  • scientificName

  • taxonConceptID

  • recordID

  • dataResourceName

  • occurrenceStatus

Using group = "event" returns the following columns:

  • eventRemarks

  • eventTime

  • eventID

  • eventDate

  • samplingEffort

  • samplingProtocol

Using group = "media" returns the following columns:

  • multimedia

  • multimediaLicence

  • images

  • videos

  • sounds

Using group = "taxonomy" returns higher taxonomic information for a given query. It is the only group that is accepted by atlas_species() as well as atlas_occurrences().

Using group = "assertions" returns all quality assertion-related columns. The list of assertions is shown by show_all_assertions().
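The "basic" and "event" lists above can also be written down as plain character vectors, which may help when checking that a downloaded tibble contains the columns you expected. The sketch below uses base R only and does not call galah; missing_columns() is a hypothetical helper introduced for illustration, and whether a given atlas returns exactly these names is taken from the lists above rather than verified here.

```r
# Column names copied from the "basic" and "event" lists above
basic_cols <- c(
  "decimalLatitude", "decimalLongitude", "eventDate", "scientificName",
  "taxonConceptID", "recordID", "dataResourceName", "occurrenceStatus"
)
event_cols <- c(
  "eventRemarks", "eventTime", "eventID", "eventDate",
  "samplingEffort", "samplingProtocol"
)

# eventDate appears in both groups; union() keeps a single copy
expected_cols <- union(basic_cols, event_cols)

# Hypothetical helper: names in `expected` that are absent from `result`
missing_columns <- function(result, expected = expected_cols) {
  setdiff(expected, names(result))
}
```

Calling missing_columns() on the tibble returned by atlas_occurrences() gives a character vector of any expected columns that were not returned; an empty vector means all of them are present.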

For atlas_occurrences(), arguments passed to ... should be valid field names, which you can check using show_all(fields). For atlas_species(), they should be one or more of:

  • counts to include counts of occurrences per species.

  • synonyms to include any synonymous names.

  • lists to include authoritative lists that each species is included on.
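As a rough sketch of how these options might be used with atlas_species(), the block below requests per-species occurrence counts alongside the "taxonomy" group. It assumes that counts and group = "taxonomy" can be combined in a single call and that the configured atlas supports species downloads; the email address is a placeholder and the genus is chosen only to match the other examples on this page.

```r
library(galah)

# Placeholder address; replace with an email registered with the atlas
galah_config(email = "your-email@email.com")

# Species list for *Perameles*, with higher taxonomy and a count of
# occurrence records per species (assumes both options can be combined)
galah_call() |>
  galah_identify("perameles") |>
  galah_select(counts, group = "taxonomy") |>
  atlas_species()
```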

See also

search_taxa(), galah_filter() and galah_geolocate() for other ways to restrict the information returned by atlas_occurrences() and related functions; atlas_counts() for how to get counts by levels of variables returned by galah_select(); show_all(fields) to list available fields.

Examples

if (FALSE) {
# Download occurrence records of *Perameles*,
# returning only the scientificName and eventDate columns
galah_config(email = "your-email@email.com")
galah_call() |>
  galah_identify("perameles") |>
  galah_select(scientificName, eventDate) |>
  atlas_occurrences()

# Only return the "basic" group of columns and the basisOfRecord column
galah_call() |>
  galah_identify("perameles") |>
  galah_select(basisOfRecord, group = "basic") |>
  atlas_occurrences()
  
# When used in a pipe, `galah_select()` and `select()` are synonymous.
# Hence the previous example can be rewritten as:
request_data() |>
  identify("perameles") |>
  select(basisOfRecord, group = "basic") |>
  collect()
}