Quick start guide
Martin Westgate & Dax Kellie
2024-04-09
Source: vignettes/quick_start_guide.Rmd
galah is an R interface to biodiversity data hosted by the Global Biodiversity Information Facility (GBIF) and its subsidiary node organisations. GBIF and its partner nodes collate and store observations of individual life forms using the ‘Darwin Core’ data standard.
Installation
To install from CRAN:
install.packages("galah")
Or install the development version from GitHub:
install.packages("remotes")
remotes::install_github("AtlasOfLivingAustralia/galah")
Load the package:
library(galah)
Configuration
By default, galah downloads information from the Atlas of Living Australia (ALA). To show the full list of organisations currently supported by galah, use show_all(atlases).
show_all(atlases)
## # A tibble: 11 × 4
## region institution acronym url
## <chr> <chr> <chr> <chr>
## 1 Australia Atlas of Living Australia ALA https://www.ala.org.au
## 2 Austria Biodiversitäts-Atlas Österreich BAO https://biodiversityatlas.at
## 3 Brazil Sistemas de Informações sobre a Biodiversidade Brasileira SiBBr https://sibbr.gov.br
## 4 Estonia eElurikkus <NA> https://elurikkus.ee
## 5 France Portail français d'accès aux données d'observation sur les espèces OpenObs https://openobs.mnhn.fr
## 6 Global Global Biodiversity Information Facility GBIF https://gbif.org
## 7 Guatemala Sistema Nacional de Información sobre Diversidad Biológica de Guatemala SNIBgt https://snib.conap.gob.gt
## 8 Portugal GBIF Portugal GBIF.pt https://www.gbif.pt
## 9 Spain GBIF Spain GBIF.es https://www.gbif.es
## 10 Sweden Swedish Biodiversity Data Infrastructure SBDI https://biodiversitydata.se
## 11 United Kingdom National Biodiversity Network NBN https://nbn.org.uk
Use galah_config() to set the node organisation using its region, name, or acronym. Once set, galah will automatically populate the server configuration for your selected GBIF node. To download occurrence records from your chosen GBIF node, you will need to register an account with that node (on its website), then provide your registration email to galah. To download from GBIF itself, you will need to provide your email, username, and password.
galah_config(atlas = "GBIF",
             username = "user1",
             email = "email@email.com",
             password = "my_password")
You can find a full list of configuration options by running ?galah_config.
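Hard-coding credentials in a script makes them easy to leak. One option, sketched here, is to read them from environment variables (for example, set in ~/.Renviron) and pass them to galah_config(). The variable names GBIF_USER, GBIF_EMAIL and GBIF_PWD are chosen for illustration only; this sketch does not assume that galah reads them on its own.

```r
library(galah)

# Hypothetical variable names; define them in ~/.Renviron, e.g.
#   GBIF_USER=user1
gbif_user  <- Sys.getenv("GBIF_USER")
gbif_email <- Sys.getenv("GBIF_EMAIL")
gbif_pwd   <- Sys.getenv("GBIF_PWD")

# TRUE only if all three variables are non-empty
credentials_set <- all(nzchar(c(gbif_user, gbif_email, gbif_pwd)))

if (credentials_set) {
  galah_config(atlas = "GBIF",
               username = gbif_user,
               email = gbif_email,
               password = gbif_pwd)
} else {
  message("Set GBIF_USER, GBIF_EMAIL and GBIF_PWD before configuring galah.")
}
```

For nodes other than GBIF, the same pattern should work with only the email argument.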
Basic syntax
The standard method to construct queries in galah is via piped functions. Pipes in galah start with the galah_call() function and typically end with collect(), though collapse() and compute() are also supported. The development team uses the base pipe by default (|>), but the magrittr pipe (%>%) should work too.
galah_config(atlas = "ALA",
             verbose = FALSE)
galah_call() |>
  count() |>
  collect()
## # A tibble: 1 × 1
## count
## <int>
## 1 133616691
To pass more complex queries, you can use additional dplyr functions such as filter(), select(), and group_by().
galah_call() |>
  filter(year >= 2020) |>
  count() |>
  collect()
## # A tibble: 1 × 1
## count
## <int>
## 1 28235670
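As a sketch of how group_by() fits into the same pipe, the query below splits the count of records since 2020 by year. It assumes that year, which is used as a filter above, is also accepted as a grouping field by the ALA. The object name counts_by_year is illustrative, and the counts themselves will change as the atlas grows.

```r
library(galah)

galah_config(atlas = "ALA",
             verbose = FALSE)

# One row per year, with the number of records in each
counts_by_year <- galah_call() |>
  filter(year >= 2020) |>
  group_by(year) |>
  count() |>
  collect()

counts_by_year
```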
Each GBIF node allows you to query using its own set of in-built fields. You can investigate which fields are available using show_all() and search_all():
search_all(fields, "australian states")
## # A tibble: 2 × 3
## id description type
## <chr> <chr> <chr>
## 1 cl2013 ASGS Australian States and Territories fields
## 2 cl22 Australian States and Territories fields
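Once you know a field's id, it helps to see which values it holds before filtering on it. The following is a minimal sketch, assuming the ALA is the configured atlas and that show_values() is available in your installed version of galah. It looks up the categories stored in the cl22 field found above. The object name state_values is illustrative.

```r
library(galah)

galah_config(atlas = "ALA",
             verbose = FALSE)

# Look up the field, then list the values it can take
state_values <- search_all(fields, "cl22") |>
  show_values()

state_values
```

The values returned can then be used in filter(), for example cl22 == "Tasmania". If the search term matches more than one field, check which field the values were returned for.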
Taxonomic searches
To narrow your search to a particular taxonomic group, use identify(). Note that this function only accepts scientific names and is not case sensitive. It’s good practice to first use search_taxa() to check that the names you provide return the correct taxonomic results.
search_taxa("reptilia") # Check whether taxonomic info is correct
## # A tibble: 1 × 9
## search_term scientific_name taxon_concept_id rank match_type kingdom phylum class issues
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 reptilia REPTILIA https://biodiversity.org.au/afd/taxa/682e1228-5b3c-45ff-833b-550efd40c399 class exactMatch Animalia Chordata Reptilia noIssue
galah_call() |>
  identify("reptilia") |>
  filter(year >= 2020) |>
  count() |>
  collect()
## # A tibble: 1 × 1
## count
## <int>
## 1 252833
If you want to query something other than the number of records, modify the type argument in galah_call(). Here we’ll query the number of species:
galah_call(type = "species") |>
  identify("reptilia") |>
  filter(year >= 2020) |>
  count() |>
  collect()
## # A tibble: 1 × 1
## count
## <int>
## 1 866
Download
To download records, rather than count how many are available, simply remove the count() function from your pipe.
result <- galah_call() |>
  identify("Litoria") |>
  filter(year >= 2020, cl22 == "Tasmania") |>
  select(basisOfRecord, group = "basic") |>
  collect()
result |> head()
## # A tibble: 6 × 9
## recordID scientificName taxonConceptID decimalLatitude decimalLongitude eventDate occurrenceStatus dataResourceName basisOfRecord
## <chr> <chr> <chr> <dbl> <dbl> <dttm> <chr> <chr> <chr>
## 1 00168ca6-84d0-4af1-8fa8-875fd69d25da Litoria raniform… https://biodi… -41.2 146. 2023-12-20 23:20:19 PRESENT iNaturalist Aus… HUMAN_OBSERV…
## 2 00250163-ec50-4eda-a5d5-58ae98bc5834 Litoria raniform… https://biodi… -41.2 147. 2023-08-23 01:49:28 PRESENT iNaturalist Aus… HUMAN_OBSERV…
## 3 003e0f63-9f95-4af9-b272-10db6d7b6371 Litoria ewingii https://biodi… -42.9 148. 2022-12-23 19:27:00 PRESENT iNaturalist Aus… HUMAN_OBSERV…
## 4 00410554-5289-416f-9848-74df4a814b93 Litoria ewingii https://biodi… -41.7 147. 2021-05-06 00:00:00 PRESENT FrogID OCCURRENCE
## 5 0070521f-bb45-46fb-8385-1a542c3a81a5 Litoria ewingii https://biodi… -43.1 147. 2023-12-20 03:29:23 PRESENT iNaturalist Aus… HUMAN_OBSERV…
## 6 0081e7ef-459b-42a9-8f0b-b3664ec94d0e Litoria ewingii https://biodi… -43.2 147. 2020-08-02 00:00:00 PRESENT FrogID OCCURRENCE
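Because collect() returns a tibble, the downloaded records can go straight into ordinary dplyr code. The sketch below repeats the download with a registered email supplied first, then tallies records per species and record type. GALAH_EMAIL is a hypothetical environment variable name and frogs is an illustrative object name. The sketch assumes dplyr is installed.

```r
library(galah)

# GALAH_EMAIL is a hypothetical environment variable holding the email
# address registered with the ALA
galah_config(atlas = "ALA",
             email = Sys.getenv("GALAH_EMAIL"),
             verbose = FALSE)

frogs <- galah_call() |>
  identify("Litoria") |>
  filter(year >= 2020, cl22 == "Tasmania") |>
  select(basisOfRecord, group = "basic") |>
  collect()

# Number of records per species and record type, most common first
frogs |>
  dplyr::count(scientificName, basisOfRecord, sort = TRUE)
```

Calling dplyr::count() with an explicit namespace makes it clear that this step summarises the downloaded tibble locally rather than sending a new query to the atlas.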
Check out our other vignettes for more detail on how to use these functions.