The filter() function is used to subset data, retaining all rows that satisfy your conditions. To be retained, a row must produce a value of TRUE for all conditions. Unlike 'local' filters that act on a tibble, the galah implementations work by amending a query, which is then enacted by collect() or one of the atlas_ family of functions (such as atlas_counts() or atlas_occurrences()).
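As a minimal sketch of this "amend, then enact" pattern, the code below stores a filtered query in an object before sending it. The object name `query` is illustrative, and dplyr is attached only so that the filter(), count() and collect() generics are available; the final step needs internet access.

```r
library(galah)
library(dplyr) # provides the filter(), count() and collect() generics

# filter() amends the request object; no records are returned at this point
query <- galah_call() |>
  filter(year >= 2019)

# `query` is still a request object, not a tibble of records
class(query)

# The query is sent to the atlas only when it is enacted
query |>
  count() |>
  collect()
```

Keeping the query in an object like this makes it easy to inspect or reuse before any data are requested.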

Usage

# S3 method for class 'data_request'
filter(.data, ...)

# S3 method for class 'metadata_request'
filter(.data, ...)

# S3 method for class 'files_request'
filter(.data, ...)

galah_filter(..., profile = NULL)

Arguments

.data

An object of class data_request, metadata_request or files_request, created using galah_call() or related functions.

...

Expressions that return a logical value, and are defined in terms of the variables in the selected atlas (and checked using show_all(fields)). If multiple expressions are included, they are combined with the & operator. Only rows for which all conditions evaluate to TRUE are kept.

profile

[Deprecated] Use galah_apply_profile instead.
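A minimal sketch of the replacement pattern, in which the profile is applied as its own pipeline step. The profile name `ALA` and the object name `counts` are assumptions made for illustration; show_all(profiles) should list the profiles available in the selected atlas. Running this requires internet access.

```r
library(galah)
library(dplyr) # provides the filter() generic

# Apply a data-quality profile as a separate step,
# instead of passing `profile` to galah_filter()
counts <- galah_call() |>
  galah_apply_profile(ALA) |> # `ALA` is an assumed profile name
  filter(year >= 2019) |>
  atlas_counts()

counts
```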

Value

A tibble containing filter values.

Details

Syntax

filter.data_request() and galah_filter() use non-standard evaluation (NSE), and are designed to be as compatible as possible with dplyr::filter() syntax. Permissible examples include:

  • == (e.g. year == 2020) but not = (for consistency with dplyr)

  • != (e.g. year != 2020)

  • > or >= (e.g. year >= 2020)

  • < or <= (e.g. year <= 2020)

  • OR statements (e.g. year == 2018 | year == 2020)

  • AND statements (e.g. year >= 2000 & year <= 2020)

Some general tips:

  • Separating statements with a comma is equivalent to an AND statement; so filter(year >= 2010 & year < 2020) is the same as filter(year >= 2010, year < 2020).

  • All statements must include the field name; so filter(year == 2010 | year == 2021) works, as does filter(year == c(2010, 2021)), but filter(year == 2010 | 2021) fails.

  • It is possible to use an object to specify required values, e.g. year_value <- 2010; filter(year > year_value).

  • Solr supports range queries on text as well as numbers; so filter(cl22 >= "Tasmania") is valid.

  • It is possible to filter by 'assertions', which are statements about data validity, such as filter(assertions != c("INVALID_SCIENTIFIC_NAME", "COORDINATE_INVALID")). Valid assertions can be found using show_all(assertions).
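The tips above can be sketched as a handful of queries. The object names (`q_comma`, `q_and`, `q_object`, `q_or`) are illustrative only, and each query would still need to be enacted, for example with count() |> collect(), before anything is returned.

```r
library(galah)
library(dplyr) # provides the filter() generic

# Comma-separated conditions are combined with AND,
# so these two queries express the same filter
q_comma <- galah_call() |> filter(year >= 2010, year < 2020)
q_and   <- galah_call() |> filter(year >= 2010 & year < 2020)

# Required values can be supplied from an object
year_value <- 2010
q_object <- galah_call() |> filter(year > year_value)

# Every comparison names its field;
# `year == 2010 | 2021` would fail
q_or <- galah_call() |> filter(year == 2010 | year == 2021)
```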

Exceptions

When querying occurrences, species, or their respective counts (i.e. all of the above examples), field names are checked internally against show_all(fields). There are some cases where bespoke field names are required, as follows.

When requesting a data download from a DOI, the field doi is valid, i.e.:

galah_call() |>
  filter(doi == "a-long-doi-string") |>
  collect()

For taxonomic metadata, the taxa field is valid:

request_metadata() |>
  filter(taxa == "Chordata") |>
  unnest()

For building taxonomic trees, the rank field is valid:

request_data() |>
  identify("Chordata") |>
  filter(rank == "class") |>
  atlas_taxonomy()

Media queries are more involved, and break two rules: they accept the media field, and they accept a tibble on the right-hand side of the expression. For example, users wishing to break down media queries into their respective API calls should begin with an occurrence query:

occurrences <- galah_call() |>
   identify("Litoria peronii") |>
   select(group = c("basic", "media")) |>
   collect()

They can then use the media field to request media metadata:

media_metadata <- galah_call("metadata") |>
  filter(media == occurrences) |>
  collect()

And finally, the metadata tibble can be used to request files:

galah_call("files") |>
  filter(media == media_metadata) |>
  collect()

See also

select(), group_by() and geolocate() for other ways to amend the information returned by atlas_ functions. Use search_all(fields) to find fields that you can filter by, and show_values() to find what values of those filters are available.
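A short sketch of that lookup workflow follows. The search terms "basis" and "basisOfRecord", and the object names `matching_fields` and `field_values`, are chosen only for illustration; both calls require internet access.

```r
library(galah)

# Find fields that match a search term
matching_fields <- search_all(fields, "basis")
matching_fields

# List the values available for one field
field_values <- search_all(fields, "basisOfRecord") |>
  show_values()
field_values
```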

Examples

if (FALSE) { # \dontrun{
galah_call() |>
  filter(year >= 2019,
         basisOfRecord == "HumanObservation") |>
  count() |>
  collect()
} # }