The living atlas community provides tools to enable users to find, access,
combine and visualise data on biodiversity. 'galah' enables the R community
to directly access data and resources hosted by the living atlases. The
basic unit of observation is an occurrence record, based on the
'Darwin Core' data standard (https://dwc.tdwg.org); however, galah
also enables users to locate and download taxonomic information or
associated media such as images or sounds, all while restricting their queries
to particular taxa or locations. Users can specify which columns are returned
by a query, or restrict their results to observations that meet particular
quality-control criteria.
Functions
Start a data query
galah_call(): Start to build a data query
Narrow your results
galah_identify() or identify(): Search for taxonomic identifiers
galah_filter() or filter(): Filter records
galah_select() or select(): Fields to report information for
galah_group_by() or group_by(): Fields to group counts by
galah_geolocate() or st_crop(): Specify a location
galah_apply_profile(): Restrict to data that pass predefined checks (ALA only)
galah_down_to(): Specify a taxonomic rank
slice_head(): Choose the first n rows of a download
Download data
atlas_occurrences(): Download occurrence records
atlas_counts() or count(): Count the number of records or species returned by a query
atlas_species(): Download species lists
atlas_taxonomy(): Return a section of the ALA taxonomic tree
atlas_media(): View images and sounds available to download
collect_media(): Download images and sounds
collect_occurrences(): Download previously-defined sets of occurrence records
Look up information
search_taxa(): Search for taxa using a text search
search_identifiers(): Search for taxa using taxonomic identifiers
show_all() & search_all(): Data for generating filter queries
show_values() & search_values(): Show or search for values within fields, profiles, lists, collections, datasets or providers
Manage cache
show_all_cached_files(): List previously cached files and their metadata
clear_cached_files(): Clear previously cached files and their metadata
Configure session
galah_config(): Package configuration options
Cite
atlas_citation(): Citation for a dataset
Terminology
To get the most value from galah, it is helpful to understand some
terminology. Each occurrence record contains taxonomic information, and
usually some information about the observation itself, such as its location.
In addition to this record-specific information, the living atlases append
contextual information to each record, particularly data from spatial layers
reflecting climate gradients or political boundaries. They also run a number
of quality checks against each record, resulting in assertions attached to
the record. Each piece of information associated with a given occurrence
record is stored in a field, which corresponds to a column when imported to
an R data.frame. See show_all(fields) to view valid fields, layers and
assertions, or conduct a search using search_all(fields).
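A minimal sketch of those lookups in R, assuming galah is installed, the session can reach the atlas web services, and the default atlas (Australia) is selected. The query term "year" is an arbitrary illustration.

```r
library(galah)

# List every field, layer and assertion known to the selected atlas.
# This queries the atlas web services, so it needs an internet connection.
fields <- show_all(fields)

# Narrow that list to entries that match a free-text term.
year_fields <- search_all(fields, "year")

print(year_fields)
```

The id values returned here are the names that can later be used inside galah_filter(), galah_select() and galah_group_by().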
Data fields are important because they provide a means to filter
occurrence records; i.e. to return only the information that you need, and
no more. Consequently, much of the architecture of galah
has been
designed to make filtering as simple as possible.
Functions with the galah_ prefix offer ways to shape your query call. Each
galah_ function allows the user to filter in a different way, and the
function suffix reveals what each one does. galah_filter(), galah_select()
and galah_group_by() intentionally match dplyr's filter(), select() and
group_by() functions, both in their names and in how they are used. For
example, you can use galah_select() to choose what information is returned
as columns. Alternatively, you can use galah_filter() to filter the rows.
You can also choose specific taxa with galah_identify() or choose a specific
location using galah_geolocate().
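As a sketch of combining these functions, the query below counts records of one taxon, restricted to a range of years and split by year. It assumes network access and the default Australian atlas; the taxon and the years are arbitrary choices, and atlas_counts() is used rather than atlas_occurrences() because counts do not require a registered email address.

```r
library(galah)

# Start a query, restrict it to one taxon, keep records from 2018 to 2020
# (multiple galah_filter() conditions are combined with AND), and split
# the resulting count by year.
counts <- galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(year >= 2018, year <= 2020) |>
  galah_group_by(year) |>
  atlas_counts()

print(counts)
```

Because each step returns the updated request, the same pattern extends to galah_select() or galah_geolocate() before the final atlas_ call.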
By combining different filters, it is possible to build complex
queries to return only the most valuable information for a given problem.
A notable extension of the filtering approach is to remove records with low
'quality'. All living atlases perform quality control checks on all records
that they store. These checks are used to generate new fields, which can then
be used to filter out records that are unsuitable for particular applications.
However, there are many possible data quality checks, and it is not always
clear which are most appropriate in a given instance. Therefore, galah
supports data quality profiles, which can be passed to galah_apply_profile()
to quickly remove undesirable records. A full list of data quality profiles
is returned by show_all(profiles). Note that this service is currently only
available for the Australian atlas (ALA).
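The sketch below, which assumes the default Australian atlas and network access, lists the available profiles and then compares counts for the same query with and without a profile applied. The profile short name ALA is an assumption based on the names returned by show_all(profiles); substitute another listed name if it is not present.

```r
library(galah)

# List the data quality profiles offered by the ALA.
profiles <- show_all(profiles)
print(profiles)

# A base query shared by both counts.
query <- galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(year == 2020)

# Count without a profile, then with one. "ALA" is assumed to be one of the
# profile short names listed above.
n_all <- atlas_counts(query)$count
n_profiled <- query |>
  galah_apply_profile(ALA) |>
  atlas_counts()
n_profiled <- n_profiled$count

cat("records:", n_all, "| after profile:", n_profiled, "\n")
```

A profile can only remove records, so the profiled count should be no larger than the unfiltered one.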
For those outside Australia, 'galah' is the common name of Eolophus roseicapilla, a widely distributed Australian bird species.
Package design
In most cases, users will be primarily interested in using galah to return
data from one of the living atlases. These functions are named with the
prefix atlas_, followed by a suffix describing the information that they
provide. For example, users who wish to download occurrence data can use the
function atlas_occurrences(). Alternatively, users who wish to download data
on each species (rather than on each occurrence record) can use
atlas_species(), or download media content (largely images) using
atlas_media(). Users can also assess how many records meet their particular
criteria using atlas_counts(), and use atlas_taxonomy() to return a taxonomic
tree for a specific clade from one level down to another level (e.g., from
family to genus). All atlas_ functions return a data.frame as their standard
format, except atlas_taxonomy(), which returns a data.tree.
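A hedged sketch of the counting and taxonomy functions described above, assuming network access and the default Australian atlas. The family Cacatuidae (the cockatoos, which include the galah) and the genus cut-off are illustrative choices. Newer galah releases may return a data.frame from atlas_taxonomy() rather than a data.tree, so the printed structure may vary by version.

```r
library(galah)

# One request reused for several atlas_ calls.
cockatoos <- galah_call() |>
  galah_identify("Cacatuidae")

# Number of occurrence records vs. number of distinct species.
n_records <- atlas_counts(cockatoos)$count
n_species <- atlas_counts(cockatoos, type = "species")$count
cat("records:", n_records, "| species:", n_species, "\n")

# Taxonomic tree for the same family, from family level down to genus.
tree <- cockatoos |>
  galah_down_to(genus) |>
  atlas_taxonomy()

print(tree)
```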
Functions in galah are designed according to a nested architecture. Users
who require data should begin by locating the relevant atlas_ function; the
arguments within that function then call correspondingly-named galah_
functions; specific values that can be interpreted by those galah_ functions
can be searched for or listed using the search_all() and show_all()
functions; and desired taxa can also be identified using search_taxa() and
passed within galah_identify() to the taxa argument of atlas_ functions.
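The nested workflow can be sketched as follows: check a name with search_taxa(), then pass it to galah_identify() inside a query that ends with an atlas_ function. Occurrence downloads need a registered email address, so that step only runs when one is supplied; reading it from an environment variable named ALA_EMAIL is a hypothetical convention for this example, not something galah looks for itself. The selected columns are standard Darwin Core terms assumed to be valid ALA fields.

```r
library(galah)

# Step 1: confirm that a name resolves to a single taxon in the atlas.
taxa <- search_taxa("Eolophus roseicapilla")
print(taxa)

# Step 2: pass the resolved name to galah_identify() and finish the query
# with atlas_occurrences(). ALA_EMAIL is a hypothetical environment variable
# used only in this example to hold a registered email address.
email <- Sys.getenv("ALA_EMAIL")
if (nzchar(email)) {
  galah_config(email = email)
  occ <- galah_call() |>
    galah_identify(taxa$scientific_name) |>
    galah_filter(year == 2020) |>
    galah_select(scientificName, eventDate, decimalLatitude, decimalLongitude) |>
    atlas_occurrences()
  print(head(occ))
}
```

Checking the name first makes it easier to spot ambiguous or misspelled search terms before a potentially large download is requested.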
References
For more information on the ALA API, visit https://docs.ala.org.au/. If you have any questions, comments or suggestions, please email support@ala.org.au.
Author
Maintainer: Martin Westgate martin.westgate@csiro.au
Authors:
Matilda Stevenson
Dax Kellie dax.kellie@csiro.au
Peggy Newman peggy.newman@csiro.au