Narrow Results

Each occurrence record contains taxonomic information and information about the observation itself, such as its location and the date of observation. These pieces of information are recorded and categorised into fields. When you import data using galah, the columns of the resulting pandas DataFrame correspond to these fields.

Data fields are important because they let you shape queries so that they return only the information you need, and no more. Consequently, much of the architecture of galah has been designed to make narrowing results as simple as possible. The arguments used for narrowing include:

  • taxa

  • filters

  • group_by

taxa

As its name suggests, galah.search_taxa() searches for taxonomic information. It uses fuzzy matching, much like the search bar on the Atlas of Living Australia website, and lets you search for taxa by their scientific name. Confirming your desired taxon with galah.search_taxa() is an important first step before using that taxon to download data with galah.

For example, to search for reptiles, we first check that our query matches the intended taxon:

>>> import galah
>>> galah.search_taxa(taxa="Reptilia")
  scientificName scientificNameAuthorship                                                             taxonConceptID   rank   matchType   kingdom    phylum    classs order family genus species   issues                            vernacularName
0       REPTILIA                           https://biodiversity.org.au/afd/taxa/682e1228-5b3c-45ff-833b-550efd40c399  class  exactMatch  Animalia  Chordata  Reptilia                             noIssue  Snakes, Lizards, Monitors And Crocodiles

Once we know that our search matches the correct taxon or taxa, we can use it as an argument to narrow the results of our queries:

>>> galah.atlas_counts(taxa="Reptilia")
   totalRecords
0       2095197
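
The check described above, confirming that a search matched the intended taxon before using it in a query, can be sketched in plain Python. This is a minimal sketch: is_confident_match is a hypothetical helper, not part of galah, and it assumes that the "exactMatch" and "noIssue" values shown in the output above are what a clean match looks like.

```python
# Hypothetical helper (not part of galah): decide whether one row of a
# taxonomic search result looks safe to use as the `taxa` argument.
def is_confident_match(row, expected_rank=None):
    """Return True if the row is an exact match with no reported issues."""
    if row.get("matchType") != "exactMatch":
        return False
    if row.get("issues") != "noIssue":
        return False
    # Optionally require a specific rank, e.g. "class" for Reptilia.
    if expected_rank is not None and row.get("rank") != expected_rank:
        return False
    return True


# Values copied from the search_taxa(taxa="Reptilia") output shown above.
reptilia_row = {
    "scientificName": "REPTILIA",
    "rank": "class",
    "matchType": "exactMatch",
    "issues": "noIssue",
}

print(is_confident_match(reptilia_row, expected_rank="class"))  # prints True
```

Assuming the search result is a pandas DataFrame, as the printed output suggests, its rows could be passed to this helper one at a time after converting them with to_dict("records").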

If you’re using an international atlas, galah.search_taxa() automatically switches to that atlas's local name-matching service. The example below uses the Brazilian atlas:

>>> galah.galah_config(atlas="Brazil")
>>> galah.atlas_counts(taxa="Ramphastos")
None

filters

Perhaps the most important argument in galah is filters, which restricts a query to the records (rows) that match one or more conditions on field values:

>>> # Get total record count since 2000
>>> galah.atlas_counts(filters="year>2000")
   totalRecords
0     135369690
>>> # Get total record count for iNaturalist in 2021
>>> galah.atlas_counts(filters=["dataResourceName=iNaturalist Australia","year=2021"])
   totalRecords
0       1055756
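
The filter strings in the examples above follow a "field, operator, value" pattern, so they can be assembled programmatically when the values come from variables. The following is only a sketch: make_filter is a hypothetical helper, not part of galah, and it deliberately accepts only the operators that appear on this page.

```python
# Hypothetical helper (not part of galah): build filter strings in the
# "field<operator>value" form used by the examples above.
OPERATORS_SEEN_HERE = ("=", "!=", ">")


def make_filter(field, operator, value):
    """Return a single filter string such as 'year>2000'."""
    if operator not in OPERATORS_SEEN_HERE:
        raise ValueError(f"unsupported operator: {operator!r}")
    return f"{field}{operator}{value}"


# Rebuild the two-condition filter list from the iNaturalist example.
filters = [
    make_filter("dataResourceName", "=", "iNaturalist Australia"),
    make_filter("year", "=", 2021),
]
print(filters)
# prints ['dataResourceName=iNaturalist Australia', 'year=2021']
```

The resulting list has the same form as the one passed to the filters argument of galah.atlas_counts() above.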

To find available fields and their valid values, use the field lookup functions galah.show_all(), galah.search_all() and galah.show_values().

Finally, filters can also express taxonomic queries that are more complex than those possible with galah.search_taxa() alone. Filtering on the taxonConceptID field makes it possible, for example, to build queries that exclude certain taxa. This is useful for paraphyletic concepts such as invertebrates, approximated here as all of Animalia except Chordata:

>>> animalia_id = galah.search_taxa(taxa="Animalia")["taxonConceptID"][0]
>>> chordata_id = galah.search_taxa(taxa="Chordata")["taxonConceptID"][0]
>>> galah.atlas_counts(filters=["taxonConceptID={}".format(animalia_id),"taxonConceptID!={}".format(chordata_id)],group_by="class")
                  class    count
0              Anthozoa   335768
1           Aplacophora     1053
2             Arachnida  1180817
3   Archiacanthocephala       56
4            Asteroidea    85131
..                  ...      ...
64             Symphyla      825
65        Tantulocarida        7
66          Tentaculata      943
67          Thecostraca    21855
68            Trematoda    35955

[69 rows x 2 columns]
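
The include-and-exclude pattern used above generalises to any number of excluded taxa. A minimal sketch follows; taxon_filters is a hypothetical helper, not part of galah, and the identifiers are placeholders standing in for the values that galah.search_taxa() returns in the taxonConceptID column.

```python
# Hypothetical helper (not part of galah): build a filter list that keeps
# one taxon while excluding any number of its sub-taxa.
def taxon_filters(include_id, exclude_ids=()):
    """Return taxonConceptID filters: one '=' entry, then '!=' entries."""
    filters = [f"taxonConceptID={include_id}"]
    filters.extend(f"taxonConceptID!={taxon_id}" for taxon_id in exclude_ids)
    return filters


# Placeholder identifiers; real ones come from search_taxa() results.
animalia_id = "ANIMALIA-ID-PLACEHOLDER"
chordata_id = "CHORDATA-ID-PLACEHOLDER"

print(taxon_filters(animalia_id, [chordata_id]))
# prints ['taxonConceptID=ANIMALIA-ID-PLACEHOLDER',
#         'taxonConceptID!=CHORDATA-ID-PLACEHOLDER'] (on one line)
```

With real identifiers, the returned list has the same shape as the filters argument in the invertebrates example.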

use_data_profile

When working with the ALA, you can specify a data profile to remove records that are suspect in some way. A profile is a group of data quality filters.

>>> galah.galah_config(data_profile="ALA")
>>> galah.atlas_counts(filters="year>2000",use_data_profile=True)
   totalRecords
0     112635922
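
To get a feel for how much a profile removes, the two "year>2000" counts on this page can be compared directly. This is a rough worked calculation only: it assumes both counts were taken against the same snapshot of the atlas, and the figures will differ whenever the queries are re-run.

```python
# Counts copied from the two "year>2000" examples on this page.
total_records = 135369690      # without a data profile
profiled_records = 112635922   # with the "ALA" data profile applied

# Records excluded by the profile, as a number and as a share of the total.
removed = total_records - profiled_records
share_removed = removed / total_records

print(removed)                 # prints 22733768
print(f"{share_removed:.1%}")  # prints 16.8%
```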

To see a full list of data quality profiles, use galah.show_all(profiles=True).