"Filters" are arguments of the form field logical value that are used to narrow down the number of records returned by a specific query. For example, it is common for users to request records from a particular year (year == 2020), or to return all records except for fossils (basisOfRecord != "FossilSpecimen").

The result of galah_filter() can be passed to the filter argument in atlas_occurrences(), atlas_species(), atlas_counts() or atlas_media().

Usage

galah_filter(..., profile = NULL)

# S3 method for data_request
filter(.data, ...)

# S3 method for metadata_request
filter(.data, ...)

# S3 method for files_request
filter(.data, ...)

Arguments

...

filters, in the form field logical value

profile

[Deprecated] Use galah_apply_profile instead.

.data

An object of class data_request, metadata_request or files_request, created using request_data(), request_metadata() or request_files() respectively

Value

A tibble containing filter values.

Details

galah_filter uses non-standard evaluation (NSE), and is designed to be as compatible as possible with dplyr::filter() syntax.

All statements passed to galah_filter() (except the profile argument) take the form of field - logical - value. Permissible examples include:

  • = or == (e.g. year = 2020)

  • != (e.g. year != 2020)

  • > or >= (e.g. year >= 2020)

  • < or <= (e.g. year <= 2020)

  • OR statements (e.g. year == 2018 | year == 2020)

  • AND statements (e.g. year >= 2000 & year <= 2020)

In some cases R will fail to parse inputs with a single equals sign (=), particularly where statements are separated by & or |. This problem can be avoided by using a double-equals (==) instead.
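The parsing problem can be checked without contacting an atlas. The following is a minimal sketch in base R: `parses()` is a hypothetical helper written for this illustration and is not part of galah. It only asks R's parser whether each call is syntactically valid; nothing is evaluated, so galah does not need to be installed to run it.

```r
# Hypothetical helper (not part of galah): TRUE if `code` is valid R syntax.
# The code is parsed only, never evaluated.
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

# Double equals is an ordinary comparison operator, so it combines with `|`
ok_double <- parses("galah_filter(year == 2018 | year == 2020)")

# Inside a function call the first `=` is read as naming an argument,
# so the parser rejects the second `=` that follows `|`
ok_single <- parses("galah_filter(year = 2018 | year = 2020)")

print(c(double_equals = ok_double, single_equals = ok_single))
```

Because the failure happens while R parses the call, galah_filter() never receives the statement and cannot recover from it; writing == throughout avoids the issue.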

Notes on behaviour

Separating statements with a comma is equivalent to an AND statement; so galah_filter(year >= 2010 & year < 2020) is the same as galah_filter(year >= 2010, year < 2020).

All statements must include the field name; so galah_filter(year == 2010 | year == 2021) works, as does galah_filter(year == c(2010, 2021)), but galah_filter(year == 2010 | 2021) fails.
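One way to see why the last form lacks a field name is to look at how R itself splits the expression. This is a sketch in base R only; it illustrates R's operator precedence and makes no claim about how galah processes filters internally.

```r
# Capture the failing filter expression without evaluating it
expr <- quote(year == 2010 | 2021)

# `==` binds more tightly than `|`, so the top-level call is `|`
top_operator <- deparse(expr[[1]])

lhs <- expr[[2]]  # left operand of `|`:  year == 2010
rhs <- expr[[3]]  # right operand of `|`: the bare constant 2021

lhs_fields <- all.vars(lhs)      # variable names used on the left
rhs_is_bare <- is.numeric(rhs)   # TRUE when the right side is just a number

print(top_operator)
print(lhs_fields)
print(rhs_is_bare)
```

The right-hand operand is a number with no field attached, which is why each side of | needs to be a complete statement of its own.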

It is possible to use an object to specify required values, e.g. year_value <- 2010; galah_filter(year > year_value)

Solr supports range queries on text as well as numbers, so this is valid: galah_filter(cl22 >= "Tasmania")

It is possible to filter by 'assertions', which are statements about data validity, e.g. to remove records lacking critical spatial or taxonomic data: galah_filter(assertions != c("INVALID_SCIENTIFIC_NAME", "COORDINATE_INVALID")). Valid assertions can be found using show_all(assertions).

See also

search_taxa() and galah_geolocate() for other ways to restrict the information returned by atlas_occurrences() and related functions. Use search_all(fields) to find fields that you can filter by, and show_values() to find what values of those filters are available.

Examples

if (FALSE) {
# Filter query results to return records of interest
galah_call() |>
  galah_filter(year >= 2019,
               basisOfRecord == "HumanObservation") |>
  atlas_counts()

# Alternatively, the same call using `dplyr` functions:
request_data() |>
  filter(year >= 2019,
         basisOfRecord == "HumanObservation") |>
  count() |>
  collect()
}