
"Filters" are arguments of the form field - logical - value that narrow down the records returned by a query. For example, it is common for users to request records from a particular year (year == 2020), or to return all records except for fossils (basisOfRecord != "FossilSpecimen").

The result of galah_filter() can be passed to the filter argument in atlas_occurrences(), atlas_species(), atlas_counts() or atlas_media().

Usage

galah_filter(..., profile = NULL)

Arguments

...

Filters, in the form field - logical - value

profile

[Soft-deprecated] Use galah_apply_profile() instead.

If supplied, should be a string recording a data quality profile to apply to the query. See show_all_profiles() for valid profiles. By default no profile is applied.

Value

A tibble containing filter values.

Details

galah_filter() uses non-standard evaluation (NSE), and is designed to be as compatible as possible with dplyr::filter() syntax.
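As a rough illustration of what non-standard evaluation means in practice, the base R sketch below captures filter expressions without evaluating them. The helper capture_filters() is hypothetical and is not how galah is implemented internally; it only shows why a bare field name such as year does not need to exist as an object in your session.

```r
# Hypothetical helper (not part of galah): capture the expressions
# passed via `...` as text, without evaluating them.
capture_filters <- function(...) {
  exprs <- as.list(substitute(list(...)))[-1]  # drop the leading `list` symbol
  vapply(exprs, function(e) paste(deparse(e), collapse = " "), character(1))
}

# `year` and `basisOfRecord` are never looked up, so no objects
# with those names are required.
captured <- capture_filters(year >= 2019, basisOfRecord == "HumanObservation")
print(captured)
```

Because the expressions are captured rather than evaluated, the function receives the field name, the operator and the value as written, which is the general mechanism that makes dplyr-style syntax possible.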

All statements passed to galah_filter() (except the profile argument) take the form field - logical - value. Permissible operators and combinations include:

  • = or == (e.g. year = 2020)

  • != (e.g. year != 2020)

  • > or >= (e.g. year >= 2020)

  • < or <= (e.g. year <= 2020)

  • OR statements (e.g. year == 2018 | year == 2020)

  • AND statements (e.g. year >= 2000 & year <= 2020)

In some cases R will fail to parse inputs with a single equals sign (=), particularly where statements are separated by & or |. This problem can be avoided by using a double-equals (==) instead.
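The parsing problem described above can be checked in base R without calling galah at all. The sketch below uses a hypothetical helper, parses(), that reports whether a string is syntactically valid R; the single-equals version combined with | is expected to be rejected by the parser, while the double-equals version is accepted.

```r
# Hypothetical helper: returns TRUE if `code` is syntactically valid R,
# FALSE otherwise. parse() only checks syntax; nothing is evaluated,
# so galah does not need to be installed.
parses <- function(code) {
  tryCatch({
    parse(text = code)
    TRUE
  }, error = function(e) FALSE)
}

single_eq <- parses("galah_filter(year = 2010 | year = 2021)")
double_eq <- parses("galah_filter(year == 2010 | year == 2021)")
print(c(single_eq = single_eq, double_eq = double_eq))
```

The failure happens before galah_filter() is ever called: inside a function call, R reads the first = as naming an argument and does not allow a second = in the same argument, so the function cannot work around it.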

See also

search_taxa() and galah_geolocate() for other ways to restrict the information returned by atlas_occurrences() and related functions. Use search_all(fields) to find fields that you can filter by, and show_values() to find what values of those filters are available.

Examples

# Filter query results to return records of interest
galah_call() |>
  galah_filter(year >= 2019) |>
  atlas_counts()
#> # A tibble: 1 × 1
#>      count
#>      <int>
#> 1 20653942

galah_call() |>
  galah_filter(year >= 2019,
               basisOfRecord == "HumanObservation") |>
  atlas_counts()
#> # A tibble: 1 × 1
#>      count
#>      <int>
#> 1 20183443

galah_call() |>
  galah_filter(year >= 2019,
               basisOfRecord == "HumanObservation",
               stateProvince == "New South Wales") |>
  atlas_counts()
#> # A tibble: 1 × 1
#>     count
#>     <int>
#> 1 5791658
 
# Use filters to exclude particular values
galah_call() |> 
  galah_filter(year >= 2010 & year != 2021) |>
  atlas_counts()
#> # A tibble: 1 × 1
#>      count
#>      <int>
#> 1 47201699
if (FALSE) {
# Separating statements with a comma is equivalent to an `AND` statement
galah_filter(year >= 2010 & year < 2020) # is the same as:
galah_filter(year >= 2010, year < 2020)

# All statements must include the field name
galah_filter(year == 2010 | year == 2021) # this works (note double equals)
galah_filter(year == c(2010, 2021)) # same as above 
galah_filter(year == 2010 | 2021) # this fails
}
# It is possible to use an object to specify required values
# Numeric example
year_value <- 2010
galah_call() |>
  galah_filter(year > year_value) |>
  atlas_counts()
#> # A tibble: 1 × 1
#>      count
#>      <int>
#> 1 51961905

# Categorical example
basis_of_record <- c("HumanObservation", "MaterialSample")
galah_call() |>
  galah_filter(basisOfRecord == basis_of_record) |>
  atlas_counts()
#> # A tibble: 1 × 1
#>      count
#>      <int>
#> 1 87767333

# Solr supports range queries on text as well as numbers.
# e.g. count records from Australian states and territories whose names
# sort alphabetically at or after "Tasmania"
galah_call() |>
  galah_filter(cl22 >= "Tasmania") |>
  atlas_counts()
#> # A tibble: 1 × 1
#>      count
#>      <int>
#> 1 33750759

# Filter out specific records that could be unreliable using "assertions"
search_assertions("coordinate invalid")
#> # A tibble: 1 × 4
#>   id                 description        category type      
#>   <chr>              <chr>              <chr>    <chr>     
#> 1 COORDINATE_INVALID Coordinate invalid Warning  assertions
galah_call() |>
  galah_filter(COORDINATE_INVALID == FALSE) |>
  atlas_counts()
#> # A tibble: 1 × 1
#>       count
#>       <int>
#> 1 109160690