Skip to contents

About

galah is an R interface to biodiversity data hosted by the ‘living atlases’; a set of organisations that share a common codebase, and act as nodes of the Global Biodiversity Information Facility (GBIF). These organisations collate and store observations of individual life forms, using the ‘Darwin Core’ data standard.

galah enables users to locate and download species observations, taxonomic information, record counts, or associated media such as images or sounds. Users can restrict their queries to particular taxa or locations by specifying which columns and rows are returned by a query, or by restricting their results to observations that meet particular quality-control criteria. All functions return a tibble as their standard format.

Installation

To install from CRAN:

Or install the development version from GitHub:

install.packages("remotes")
remotes::install_github("AtlasOfLivingAustralia/galah")

Load the package

Choosing an atlas

By default, galah downloads information from the Atlas of Living Australia (ALA). To show the full list of Atlases currently supported by galah, use show_all(atlases).

show_all(atlases)
## # A tibble: 11 × 4
##    region         institution                                                             acronym url                         
##    <chr>          <chr>                                                                   <chr>   <chr>                       
##  1 Australia      Atlas of Living Australia                                               ALA     https://www.ala.org.au      
##  2 Austria        Biodiversitäts-Atlas Österreich                                         BAO     https://biodiversityatlas.at
##  3 Brazil         Sistemas de Informações sobre a Biodiversidade Brasileira               SiBBr   https://sibbr.gov.br        
##  4 Estonia        eElurikkus                                                              <NA>    https://elurikkus.ee        
##  5 France         Portail français d'accès aux données d'observation sur les espèces      OpenObs https://openobs.mnhn.fr     
##  6 Global         Global Biodiversity Information Facility                                GBIF    https://gbif.org            
##  7 Guatemala      Sistema Nacional de Información sobre Diversidad Biológica de Guatemala SNIBgt  https://snib.conap.gob.gt   
##  8 Portugal       GBIF Portugal                                                           GBIF.pt https://www.gbif.pt         
##  9 Spain          GBIF Spain                                                              GBIF.es https://www.gbif.es         
## 10 Sweden         Swedish Biodiversity Data Infrastructure                                SBDI    https://biodiversitydata.se 
## 11 United Kingdom National Biodiversity Network                                           NBN     https://nbn.org.uk

Use galah_config() to set the Atlas to use. This will automatically populate the server configuration for your selected Atlas. By default, the atlas is Australia.

galah_config(atlas = "United Kingdom")

Building queries

Functions that return data from the chosen atlas have the prefix atlas_; e.g. to find the total number of records in the atlas, use:

galah_config(atlas = "ALA")
atlas_counts()
## # A tibble: 1 × 1
##       count
##       <int>
## 1 132342396

To pass more complex queries, start with the galah_call() function and pipe additional arguments to modify the query. modifying functions have a galah_ prefix and support non-standard evaluation (NSE).

galah_call() |> 
  galah_filter(year >= 2020) |> 
  atlas_counts()
## # A tibble: 1 × 1
##      count
##      <int>
## 1 27234237

Alternatively, you can use a subset of dplyr verbs to pipe your queries, assuming you start with galah_call().

galah_call() |>
  filter(year >= 2020) |> 
  group_by(year) |>
  count() |>
  collect()
## # A tibble: 4 × 2
##   year    count
##   <chr>   <int>
## 1 2022  8409790
## 2 2021  8254306
## 3 2020  7124140
## 4 2023  3446001

To narrow the search to a particular taxonomic group, use galah_identify() or identify. Note that this function only accepts scientific names and is not case sensitive. It’s good practice to first use search_taxa() to check that the taxa you provide returns the correct taxonomic results.

search_taxa("reptilia") # Check whether taxonomic info is correct
## # A tibble: 1 × 9
##   search_term scientific_name taxon_concept_id                                              rank  match_type kingdom phylum class issues
##   <chr>       <chr>           <chr>                                                         <chr> <chr>      <chr>   <chr>  <chr> <chr> 
## 1 reptilia    REPTILIA        https://biodiversity.org.au/afd/taxa/682e1228-5b3c-45ff-833b… class exactMatch Animal… Chord… Rept… noIss…
galah_call() |>
  galah_filter(year >= 2020) |> 
  galah_identify("reptilia") |> 
  atlas_counts()
## # A tibble: 1 × 1
##    count
##    <int>
## 1 226794

Downloading records

The most common use case for galah is to download ‘occurrence’ records; observations of plants or animals made by contributors to the atlas. To download, first register with the relevant atlas, then provide your registration email. For GBIF queries, you will need to provide the email, username, and password that you have registered with GBIF.

galah_config(email = "email@email.com")

Search for fields and field IDs to filter your query.

search_all(fields, "australian states")
## # A tibble: 2 × 3
##   id     description                            type  
##   <chr>  <chr>                                  <chr> 
## 1 cl2013 ASGS Australian States and Territories fields
## 2 cl22   Australian States and Territories      fields

Then you can customise records you require and query the atlas in question.

result <- galah_call() |>
  galah_identify("Litoria") |>
  galah_filter(year >= 2020, cl22 == "Tasmania") |>
  galah_select(basisOfRecord, group = "basic") |>
  atlas_occurrences()
## Retrying in 1 seconds.
## Retrying in 2 seconds.
## Retrying in 4 seconds.
result |> head()
## # A tibble: 6 × 9
##   recordID          scientificName taxonConceptID decimalLatitude decimalLongitude eventDate           occurrenceStatus dataResourceName
##   <chr>             <chr>          <chr>                    <dbl>            <dbl> <dttm>              <chr>            <chr>           
## 1 00250163-ec50-4e… Litoria        https://biodi…           -41.2             147. 2023-08-23 01:49:28 PRESENT          iNaturalist Aus…
## 2 003e0f63-9f95-4a… Litoria ewing… https://biodi…           -42.9             148. 2022-12-23 19:27:00 PRESENT          iNaturalist Aus…
## 3 00410554-5289-41… Litoria ewing… https://biodi…           -41.7             147. 2021-05-06 00:00:00 PRESENT          FrogID          
## 4 0081e7ef-459b-42… Litoria ewing… https://biodi…           -43.2             147. 2020-08-02 00:00:00 PRESENT          FrogID          
## 5 0086def1-8415-4b… Litoria ewing… https://biodi…           -41.2             147. 2020-12-31 00:00:00 PRESENT          FrogID          
## 6 00b40ee7-074b-4d… Litoria ewing… https://biodi…           -41.5             147. 2020-11-01 00:00:00 PRESENT          FrogID          
## # ℹ 1 more variable: basisOfRecord <chr>

Check out our other vignettes for more detail on how to use these functions.