
Download data
Martin Westgate & Dax Kellie
2023-01-18
Source: vignettes/download_data.Rmd
The atlas_ functions return data from the atlas chosen using galah_config(). They are:
atlas_counts
atlas_occurrences
atlas_species
atlas_media
atlas_taxonomy
The final atlas_ function, atlas_citation(), is unusual in that it does not return any new data. Instead, it provides a citation for an existing dataset (downloaded using atlas_occurrences()) that has an associated DOI. The other functions are described below.
Record counts
atlas_counts() provides summary counts of records in the specified atlas, without needing to download all the records.
galah_config(atlas = "Australia")
# Total number of records in the ALA
atlas_counts()
## # A tibble: 1 × 1
## count
## <int>
## 1 113038010
In addition to the filter arguments, atlas_counts() has an optional group_by argument (set with galah_group_by() in a piped call), which provides counts binned by the requested field.
galah_call() |>
  galah_group_by(kingdom) |>
  atlas_counts()
## # A tibble: 10 × 2
## kingdom count
## <chr> <int>
## 1 Animalia 85615774
## 2 Plantae 23758697
## 3 Fungi 2086631
## 4 Chromista 854754
## 5 Protista 145019
## 6 Bacteria 71523
## 7 Protozoa 3250
## 8 Eukaryota 1344
## 9 Archaea 1106
## 10 Virus 495
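Filters and grouping can be combined in a single call. The sketch below is illustrative only: the taxon and the year threshold are arbitrary choices, and it assumes year is a valid field in the selected atlas (it is used as a filter elsewhere in this article).

```r
library(galah)
galah_config(atlas = "Australia")

# Count records of one species since 2020, binned by year
galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(year >= 2020) |>
  galah_group_by(year) |>
  atlas_counts()
```

Because only counts are returned, this is a cheap way to explore how records are distributed before committing to a download.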
Species lists
A common use case of atlas data is to identify which species occur in a specified region, time period, or taxonomic group. atlas_species() is similar to search_taxa(), in that it returns taxonomic information and unique identifiers in a tibble. It differs in that it cannot return information on taxonomic levels other than species, but it is more flexible because it supports filtering:
species <- galah_call() |>
  galah_identify("Rodentia") |>
  galah_filter(stateProvince == "Northern Territory") |>
  atlas_species()
species |> head()
## # A tibble: 6 × 10
## kingdom phylum class order family genus species author species_guid verna…¹
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
## 1 Animalia Chordata Mammalia Rodentia Muridae Mesembriomys Mesembriomys gouldii (J.E. Gray, 1843) https://biodiversity.org.… Black-…
## 2 Animalia Chordata Mammalia Rodentia Muridae Zyzomys Zyzomys argurus (Thomas, 1889) https://biodiversity.org.… Common…
## 3 Animalia Chordata Mammalia Rodentia Muridae Pseudomys Pseudomys hermannsburgensis (Waite, 1896) https://biodiversity.org.… Sandy …
## 4 Animalia Chordata Mammalia Rodentia Muridae Notomys Notomys alexis Thomas, 1922 https://biodiversity.org.… Spinif…
## 5 Animalia Chordata Mammalia Rodentia Muridae Melomys Melomys burtoni (Ramsay, 1887) https://biodiversity.org.… Grassl…
## 6 Animalia Chordata Mammalia Rodentia Muridae Mus Mus musculus Linnaeus, 1758 https://biodiversity.org.… House …
## # … with abbreviated variable name ¹vernacular_name
Occurrence data
To download occurrence data you will need to specify your email in galah_config(). This email must be associated with an active ALA account. See the Configuring galah section below for more information.
galah_config(email = "your_email@email.com", atlas = "Australia")
Download occurrence records for Eolophus roseicapilla
occ <- galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(
    stateProvince == "Australian Capital Territory",
    year >= 2010,
    profile = "ALA"
  ) |>
  galah_select(institutionID, group = "basic") |>
  atlas_occurrences()
occ |> head()
## # A tibble: 6 × 9
## decimalLatitude decimalLongitude eventDate scientificName taxonConceptID recor…¹ dataR…² occur…³ insti…⁴
## <dbl> <dbl> <dttm> <chr> <chr> <chr> <chr> <chr> <lgl>
## 1 -35.9 149. 2020-09-12 14:00:00 Eolophus roseicapilla https://biodiversity.org.au/a… 17f46d… eBird … PRESENT NA
## 2 -35.9 149. 2021-09-27 14:00:00 Eolophus roseicapilla https://biodiversity.org.au/a… dbb711… eBird … PRESENT NA
## 3 -35.9 149. 2012-01-18 13:00:00 Eolophus roseicapilla https://biodiversity.org.au/a… 4f7cd7… BirdLi… PRESENT NA
## 4 -35.9 149. 2017-03-17 13:00:00 Eolophus roseicapilla https://biodiversity.org.au/a… 3236c4… eBird … PRESENT NA
## 5 -35.9 149. 2020-11-14 13:00:00 Eolophus roseicapilla https://biodiversity.org.au/a… ef2b90… eBird … PRESENT NA
## 6 -35.8 149. 2021-04-02 13:00:00 Eolophus roseicapilla https://biodiversity.org.au/a… 45a589… eBird … PRESENT NA
## # … with abbreviated variable names ¹recordID, ²dataResourceName, ³occurrenceStatus, ⁴institutionID
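Since atlas_counts() accepts the same filters as atlas_occurrences(), one possible pattern is to build the query once, check how many records it matches, and only then download. The sketch below is a suggestion rather than a required workflow; the object name query is hypothetical.

```r
library(galah)
galah_config(email = "your_email@email.com", atlas = "Australia")

# Build the query once so it can be reused
query <- galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(
    stateProvince == "Australian Capital Territory",
    year >= 2010
  )

# Check the size of the result before downloading
query |> atlas_counts()

# Download the matching records
occ <- query |> atlas_occurrences()
```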
Media metadata
In addition to text data describing individual occurrences and their attributes, ALA stores images, sounds and videos associated with a given record. Metadata on these media files can be downloaded to R using atlas_media() and the same set of filters as the other data download functions.
media_data <- galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(
    year == 2020,
    cl22 == "Australian Capital Territory"
  ) |>
  atlas_media()
media_data |> head()
## # A tibble: 6 × 20
## decima…¹ decim…² eventDate scien…³ taxon…⁴ recor…⁵ dataR…⁶ occur…⁷ multi…⁸ media…⁹ mime_…˟ size_…˟ date_…˟ date_…˟ height width
## <dbl> <dbl> <dttm> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <int> <chr> <chr> <int> <int>
## 1 -35.6 149. 2020-08-04 01:50:00 Eoloph… https:… 063bb0… iNatur… PRESENT Image 2f4d32… image/… 2654217 2020-0… 2020-0… 1200 1800
## 2 -35.6 149. 2020-08-04 01:50:00 Eoloph… https:… 063bb0… iNatur… PRESENT Image 734074… image/… 2422643 2020-0… 2020-0… 1200 1800
## 3 -35.6 149. 2020-08-04 01:50:00 Eoloph… https:… 063bb0… iNatur… PRESENT Image 89171c… image/… 2212660 2020-0… 2020-0… 1200 1800
## 4 -35.6 149. 2020-08-04 01:50:00 Eoloph… https:… 063bb0… iNatur… PRESENT Image e681d3… image/… 3414736 2020-0… 2020-0… 1200 1800
## 5 -35.5 149. 2020-08-26 01:53:00 Eoloph… https:… 286841… iNatur… PRESENT Image 1295c2… image/… 863158 2021-0… 2021-0… 1200 1800
## 6 -35.5 149. 2020-10-14 02:34:00 Eoloph… https:… 064a39… iNatur… PRESENT Image f97686… image/… 955916 2020-1… 2020-1… 1200 1800
## # … with 4 more variables: creator <chr>, license <chr>, data_resource_uid <chr>, occurrence_id <chr>, and abbreviated variable names
## # ¹decimalLatitude, ²decimalLongitude, ³scientificName, ⁴taxonConceptID, ⁵recordID, ⁶dataResourceName, ⁷occurrenceStatus, ⁸multimedia,
## # ⁹media_id, ˟mime_type, ˟size_in_bytes, ˟date_uploaded, ˟date_taken
To actually download the media files to your computer, use collect_media().
Taxonomic trees
atlas_taxonomy() provides a way to build taxonomic trees from one clade down to another using ALA’s internal taxonomy. Specify which taxonomic level your tree will go down to with galah_down_to().
classes <- galah_call() |>
  galah_identify("chordata") |>
  galah_down_to(class) |>
  atlas_taxonomy()
This function is unique within galah as it is the only function that returns a data.tree, rather than a tibble. Printing the object shows the tree:
## levelName
## 1 Chordata
## 2 ¦--Cephalochordata
## 3 ¦ °--Amphioxi
## 4 ¦--Craniata
## 5 ¦ °--Agnatha
## 6 ¦ ¦--Cephalasipidomorphi
## 7 ¦ °--Myxini
## 8 ¦--Tunicata
## 9 ¦ ¦--Appendicularia
## 10 ¦ ¦--Ascidiacea
## 11 ¦ °--Thaliacea
## 12 °--Vertebrata
## 13 °--Gnathostomata
## 14 ¦--Amphibia
## 15 ¦--Aves
## 16 ¦--Mammalia
## 17 ¦--Pisces
## 18 ¦ ¦--Actinopterygii
## 19 ¦ ¦--Chondrichthyes
## 20 ¦ ¦--Cephalaspidomorphi
## 21 ¦ °--Sarcopterygii
## 22 °--Reptilia
Although the tree format is useful, converting to a data.frame is straightforward.
data.tree::ToDataFrameTypeCol(classes, type = "rank") |> head()
## rank_phylum rank_subphylum rank_superclass rank_informal rank_class
## 1 Chordata Cephalochordata <NA> <NA> Amphioxi
## 2 Chordata Craniata Agnatha <NA> Cephalasipidomorphi
## 3 Chordata Craniata Agnatha <NA> Myxini
## 4 Chordata Tunicata <NA> <NA> Appendicularia
## 5 Chordata Tunicata <NA> <NA> Ascidiacea
## 6 Chordata Tunicata <NA> <NA> Thaliacea
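To see what this conversion does without querying the atlas, the sketch below builds a small tree by hand with the data.tree package. The tree is a hypothetical fragment that reuses a few names from the output above; it assumes, as in that output, that each node stores its taxonomic level in a field called rank.

```r
library(data.tree)

# Hand-built fragment of the tree above; `rank` holds each node's level
chordata <- Node$new("Chordata", rank = "phylum")

tunicata <- chordata$AddChild("Tunicata", rank = "subphylum")
tunicata$AddChild("Ascidiacea", rank = "class")
tunicata$AddChild("Thaliacea", rank = "class")

craniata <- chordata$AddChild("Craniata", rank = "subphylum")
agnatha <- craniata$AddChild("Agnatha", rank = "superclass")
agnatha$AddChild("Myxini", rank = "class")

# One row per leaf, one column per distinct value of `rank`
df <- ToDataFrameTypeCol(chordata, type = "rank")
df
```

Each leaf becomes a row, and ranks that are absent from a given path (here, the superclass for the Tunicata classes) are filled with NA.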
Configuring galah
Various aspects of the galah package can be customized. To preserve configuration for future sessions, set profile_path to the location of a .Rprofile file.
To download occurrence records, you will need to provide an email address registered with the ALA. You can create an account on the ALA website. Once an email is registered with the ALA, it should be stored in the config:
galah_config(email = "myemail@gmail.com")
Caching
galah can cache most results to local files. This means that if the same code is run multiple times, the second and subsequent iterations will be faster.
By default, this caching is session-based, meaning that the local files are stored in a temporary directory that is automatically deleted when the R session is ended. This behaviour can be altered so that caching is permanent, by setting the caching directory to a non-temporary location.
galah_config(cache_directory = "example/dir")
By default, caching is turned off. To turn caching on, run
galah_config(caching = TRUE)
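The two caching options can be set together. In the minimal sketch below the directory name is hypothetical, and creating it beforehand is a precaution based on the assumption that galah_config() expects the directory to exist already.

```r
library(galah)

# Hypothetical cache location, relative to the working directory
cache_dir <- "galah_cache"
dir.create(cache_dir, showWarnings = FALSE)

# Switch caching on and store cached results in a non-temporary directory
galah_config(caching = TRUE, cache_directory = cache_dir)
```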
Setting the download reason
ALA requires that you provide a reason when downloading occurrence data (via the galah atlas_occurrences() function). The reason is set as “scientific research” by default, but you can change this using galah_config(). See show_all_reasons() for valid download reasons.
galah_config(download_reason_id = your_reason_id)
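A possible way to choose a value for your_reason_id is to inspect the table of reasons first. In the sketch below, the id passed to galah_config() is purely illustrative and should be replaced with one taken from the table returned for your atlas.

```r
library(galah)

# List the valid download reasons and their ids
reasons <- show_all_reasons()
print(reasons)

# Illustrative id only; confirm it against the table printed above
galah_config(download_reason_id = 3)
```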
Debugging
If things aren’t working as expected, more detail (particularly about web requests and caching behaviour) can be obtained by setting the verbose configuration option:
galah_config(verbose = TRUE)