Download data
Martin Westgate & Dax Kellie
2024-11-19
Source:vignettes/download_data.Rmd
download_data.Rmd
The atlas_
functions are used to return data from the
atlas chosen using galah_config()
. They are:
The final atlas_
function—atlas_citation()
—is unusual: It does not return
any new data, but instead provides a citation for an existing dataset
(downloaded using atlas_occurrences()
) with an associated
DOI. The other functions are described below.
It is equally permissable to use the type
argument of
galah_call()
to specify the kind of data you want, and then
retrieve the data using collect()
. Here we use the
atlas_
prefix for consistency with earlier versions of
galah, and because many atlas_
functions sometimes include
shortcuts to make life easier.
Record counts
atlas_counts()
provides summary counts of records in the
specified atlas without needing to download all the records first.
galah_config(atlas = "Australia")
# Total number of records in the ALA
atlas_counts()
## # A tibble: 1 × 1
## count
## <int>
## 1 146185520
Group and summarise record counts by specific fields using
galah_group_by()
.
galah_call() |>
galah_group_by(kingdom) |>
atlas_counts()
## # A tibble: 12 × 2
## kingdom count
## <chr> <int>
## 1 Animalia 113408280
## 2 Plantae 27572183
## 3 Fungi 2448600
## 4 Chromista 1057157
## 5 Protista 316541
## 6 Bacteria 113480
## 7 Archaea 4120
## 8 Virus 2382
## 9 Bamfordvirae 210
## 10 Orthornavirae 138
## 11 Viroid 104
## 12 Shotokuvirae 41
Species lists
A common use case of atlas data is to identify which species occur in
a specified region, time period, or taxonomic group.
atlas_species()
is similar to search_taxa()
,
in that it returns taxonomic information and unique identifiers, but
differs by returning information only on species and is far more
flexible by supporting filtering.
species <- galah_call() |>
galah_identify("Rodentia") |>
galah_filter(stateProvince == "Northern Territory") |>
atlas_species()
species |> head()
## # A tibble: 6 × 11
## taxon_concept_id species_name scientific_name_auth…¹ taxon_rank kingdom phylum class order family genus vernacular_name
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 https://biodive… Pseudomys d… (Gould, 1842) species Animal… Chord… Mamm… Rode… Murid… Pseu… Delicate Mouse
## 2 https://biodive… Mesembriomy… (J.E. Gray, 1843) species Animal… Chord… Mamm… Rode… Murid… Mese… Black-footed T…
## 3 https://biodive… Zyzomys arg… (Thomas, 1889) species Animal… Chord… Mamm… Rode… Murid… Zyzo… Common Rock-rat
## 4 https://biodive… Pseudomys h… (Waite, 1896) species Animal… Chord… Mamm… Rode… Murid… Pseu… Sandy Inland M…
## 5 https://biodive… Melomys bur… (Ramsay, 1887) species Animal… Chord… Mamm… Rode… Murid… Melo… Grassland Melo…
## 6 https://biodive… Notomys ale… Thomas, 1922 species Animal… Chord… Mamm… Rode… Murid… Noto… Spinifex Hoppi…
## # ℹ abbreviated name: ¹scientific_name_authorship
Occurrence data
To download occurrence data you will need to specify an email in
galah_config()
that has been registered to an account with
your selected GBIF node. See more information in the config section.
galah_config(email = "your_email@email.com", atlas = "Australia")
Download occurrence records for Eolophus roseicapilla.
occ <- galah_call() |>
galah_identify("Eolophus roseicapilla") |>
galah_filter(
stateProvince == "Australian Capital Territory",
year >= 2010,
profile = "ALA"
) |>
galah_select(institutionID, group = "basic") |>
atlas_occurrences()
## Retrying in 1 seconds.
## Retrying in 2 seconds.
## Retrying in 4 seconds.
occ |> head()
## # A tibble: 6 × 9
## recordID scientificName taxonConceptID decimalLatitude decimalLongitude eventDate occurrenceStatus
## <chr> <chr> <chr> <dbl> <dbl> <dttm> <chr>
## 1 0000a928-d756-42eb… Eolophus rose… https://biodi… -35.6 149. 2017-04-19 09:11:00 PRESENT
## 2 0001bc78-d2e9-48aa… Eolophus rose… https://biodi… -35.2 149. 2019-08-13 15:13:00 PRESENT
## 3 0002064f-08ea-425b… Eolophus rose… https://biodi… -35.3 149. 2014-03-16 06:48:00 PRESENT
## 4 00022dd2-9f85-4802… Eolophus rose… https://biodi… -35.3 149. 2022-05-08 08:20:00 PRESENT
## 5 0002cc35-8d5a-4d20… Eolophus rose… https://biodi… -35.3 149. 2015-11-01 08:00:00 PRESENT
## 6 00030a8c-082f-44f0… Eolophus rose… https://biodi… -35.3 149. 2022-01-06 11:47:00 PRESENT
## # ℹ 2 more variables: dataResourceName <chr>, institutionID <lgl>
Media metadata
In addition to text data describing individual occurrences and their
attributes, ALA stores images, sounds and videos associated with a given
record. Metadata on these records can be downloaded using
atlas_media()
.
media_data <- galah_call() |>
galah_identify("Eolophus roseicapilla") |>
galah_filter(
year == 2020,
cl22 == "Australian Capital Territory") |>
atlas_media()
media_data |> head()
## # A tibble: 6 × 19
## media_id recordID scientificName taxonConceptID decimalLatitude decimalLongitude eventDate occurrenceStatus
## <chr> <chr> <chr> <chr> <dbl> <dbl> <dttm> <chr>
## 1 ff8322d0-… 003a192… Eolophus rose… https://biodi… -35.3 149. 2020-09-12 16:11:00 PRESENT
## 2 c66fc819-… 015ee7c… Eolophus rose… https://biodi… -35.4 149. 2020-08-09 15:11:00 PRESENT
## 3 fe6d7b94-… 05e86b7… Eolophus rose… https://biodi… -35.4 149. 2020-11-13 22:29:00 PRESENT
## 4 2f4d32c0-… 063bb0f… Eolophus rose… https://biodi… -35.6 149. 2020-08-04 11:50:00 PRESENT
## 5 73407414-… 063bb0f… Eolophus rose… https://biodi… -35.6 149. 2020-08-04 11:50:00 PRESENT
## 6 89171c49-… 063bb0f… Eolophus rose… https://biodi… -35.6 149. 2020-08-04 11:50:00 PRESENT
## # ℹ 11 more variables: dataResourceName <chr>, multimedia <chr>, images <chr>, sounds <lgl>, videos <lgl>,
## # creator <chr>, license <chr>, mimetype <chr>, width <int>, height <int>, image_url <chr>
To actually download the media files to your computer, use [collect_media()].
media_data |>
collect_media()
Taxonomic trees
atlas_taxonomy()
provides a way to build taxonomic trees
from one clade down to another using each GBIF node’s internal taxonomy.
Specify which taxonomic level your tree will go down to with
galah_filter()
using the rank
argument.
galah_call() |>
galah_identify("chordata") |>
galah_filter(rank == class) |>
atlas_taxonomy()
## # A tibble: 19 × 4
## name rank parent_taxon_concept_id taxon_concept_id
## <chr> <chr> <chr> <chr>
## 1 Chordata phylum <NA> https://biodivers…
## 2 Cephalochordata subphylum https://biodiversity.org.au/afd/taxa/065f1da4-53cd-40b8-a396-80fa5c74dedd https://biodivers…
## 3 Tunicata subphylum https://biodiversity.org.au/afd/taxa/065f1da4-53cd-40b8-a396-80fa5c74dedd https://biodivers…
## 4 Appendicularia class https://biodiversity.org.au/afd/taxa/1c20ed62-d918-4e42-b625-8b86d533cc51 https://biodivers…
## 5 Ascidiacea class https://biodiversity.org.au/afd/taxa/1c20ed62-d918-4e42-b625-8b86d533cc51 https://biodivers…
## 6 Thaliacea class https://biodiversity.org.au/afd/taxa/1c20ed62-d918-4e42-b625-8b86d533cc51 https://biodivers…
## 7 Vertebrata subphylum https://biodiversity.org.au/afd/taxa/065f1da4-53cd-40b8-a396-80fa5c74dedd https://biodivers…
## 8 Agnatha informal https://biodiversity.org.au/afd/taxa/5d6076b1-b7c7-487f-9d61-0fea0111cc7e https://biodivers…
## 9 Myxini informal https://biodiversity.org.au/afd/taxa/66db22c8-891d-4b16-a1a2-b66feaeaa3e0 https://biodivers…
## 10 Petromyzontida informal https://biodiversity.org.au/afd/taxa/66db22c8-891d-4b16-a1a2-b66feaeaa3e0 https://biodivers…
## 11 Gnathostomata informal https://biodiversity.org.au/afd/taxa/5d6076b1-b7c7-487f-9d61-0fea0111cc7e https://biodivers…
## 12 Amphibia class https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 13 Aves class https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 14 Mammalia class https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 15 Reptilia class https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 16 Pisces informal https://biodiversity.org.au/afd/taxa/ef5515fd-a0a2-4e16-b61a-0f19f8900f76 https://biodivers…
## 17 Actinopterygii class https://biodiversity.org.au/afd/taxa/e22efeb4-2cb5-4250-8d71-61c48bdaa051 https://biodivers…
## 18 Chondrichthyes class https://biodiversity.org.au/afd/taxa/e22efeb4-2cb5-4250-8d71-61c48bdaa051 https://biodivers…
## 19 Sarcopterygii class https://biodiversity.org.au/afd/taxa/e22efeb4-2cb5-4250-8d71-61c48bdaa051 https://biodivers…
Configuring galah
Various aspects of the galah package can be customized.
To download occurrence records, species lists or media, you will need to provide an email address registered with the service that you want to use (e.g. for the ALA you can create an account here).
Once an email is registered, it should be stored in the config:
galah_config(email = "myemail@gmail.com")
Setting your directory
By default, galah stores downloads in a temporary folder, meaning that the local files are automatically deleted when the R session is ended. This behaviour can be altered so that downloaded files are preserved by setting the directory to a non-temporary location.
galah_config(directory = "example/dir")
Setting the download reason
ALA requires that you provide a reason when downloading occurrence
data (via the galah atlas_occurrences()
function).
reason
is set as “scientific research” by default, but you
can change this using galah_config()
. See
show_all(reasons)
for valid download reasons.
galah_config(download_reason_id = your_reason_id)
Debugging
If things aren’t working as expected, more detail (particularly about
web requests and caching behaviour) can be obtained by setting
verbose = TRUE
.
galah_config(verbose = TRUE)