Taxonomic Filtering

Callum Waite, Shandiya Balasubramaniam

Taxonomic complexity can confound the process of searching, filtering, and downloading records using galah, but a few galah functions and arguments help ensure that records are not missed. Let’s start by configuring galah to use the Atlas of Living Australia (ALA).

>>> import galah
>>> galah.galah_config(atlas="Australia", email="your-email-here")

search_taxa()

search_taxa() lets users look up taxonomic names before downloading data, which helps to disambiguate homonyms and to check that the search term matches the taxon name used by the ALA. search_taxa() returns the scientific name, authorship, rank, and full classification for the taxon matched to the provided search term.

>>> galah.search_taxa(taxa="Petroica boodang")
                scientificName scientificNameAuthorship                                                             taxonConceptID     rank   matchType   kingdom    phylum classs          order       family     genus           species     issues vernacularName
0  Petroica (Petroica) boodang           (Lesson, 1838)  https://biodiversity.org.au/afd/taxa/a3e5376b-f9e6-4bdf-adae-1e7add9f5c29  species  exactMatch  Animalia  Chordata   Aves  Passeriformes  Petroicidae  Petroica  Petroica boodang  [noIssue]  Scarlet Robin

It can also return taxonomic information for multiple taxa at once, and it accepts synonyms and Indigenous names as search terms.

>>> # Muscicapa chrysoptera is a synonym for the Flame Robin, Petroica phoenicea
>>> # Guniibuu is the Yuwaalaraay Indigenous name for the Red-Capped Robin, Petroica goodenovii
>>> galah.search_taxa(taxa=["Muscicapa chrysoptera", "Guniibuu"])
                   scientificName    scientificNameAuthorship                                                             taxonConceptID     rank        matchType   kingdom    phylum classs          order       family     genus              species     issues    vernacularName
0   Petroica (Littlera) phoenicea                 Gould, 1837  https://biodiversity.org.au/afd/taxa/fe74e658-4848-437a-a23d-f1001a198552  species       exactMatch  Animalia  Chordata   Aves  Passeriformes  Petroicidae  Petroica   Petroica phoenicea  [noIssue]       Flame Robin
1  Petroica (Petroica) goodenovii  (Vigors & Horsfield, 1827)  https://biodiversity.org.au/afd/taxa/10dbd908-00f3-4ec2-9a9c-a2fd4782eaf1  species  vernacularMatch  Animalia  Chordata   Aves  Passeriformes  Petroicidae  Petroica  Petroica goodenovii  [noIssue]  Red-capped Robin
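The matchType column records how each search term was resolved, which is worth checking when a list mixes scientific and vernacular names. The following is a minimal sketch using only the standard library: the rows are copied by hand from the output above, and non_exact_matches is a hypothetical helper, not part of galah. It also assumes that rows are returned in the same order as the search terms. With a real result, equivalent rows could be obtained from the DataFrame with the pandas method to_dict("records").

```python
# Rows copied by hand from the search_taxa() output above (only the columns used here).
rows = [
    {"scientificName": "Petroica (Littlera) phoenicea", "matchType": "exactMatch"},
    {"scientificName": "Petroica (Petroica) goodenovii", "matchType": "vernacularMatch"},
]
search_terms = ["Muscicapa chrysoptera", "Guniibuu"]


def non_exact_matches(terms, rows):
    """Pair each search term with its match and keep those not matched exactly.

    Hypothetical helper; assumes rows are in the same order as the terms.
    """
    return [
        (term, row["scientificName"], row["matchType"])
        for term, row in zip(terms, rows)
        if row["matchType"] != "exactMatch"
    ]


for term, name, match_type in non_exact_matches(search_terms, rows):
    print(f"{term!r} matched {name!r} via {match_type}")
```

Note that the synonym Muscicapa chrysoptera is reported as an exactMatch in the output above, so this check flags the vernacular name only.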

Where homonyms exist, search_taxa() flags the match with a homonym issue rather than returning a single taxon. Users can then clarify the search term by providing one or more taxonomic ranks through the search_taxa() argument scientific_name. The genus name Morganella, for example, is used in three kingdoms; the example below selects the fungal genus:

>>> galah.search_taxa(taxa=["Morganella"])
  scientificName scientificNameAuthorship taxonConceptID rank matchType kingdom phylum classs order family genus species     issues vernacularName
0     Morganella                                                                                                          [homonym]
>>> galah.search_taxa(scientific_name={"kingdom": ["Fungi"], "scientificName": ["Morganella"]})
  scientificName scientificNameAuthorship                                      taxonConceptID   rank   matchType kingdom         phylum          classs       order       family       genus species   issues vernacularName
0     Morganella                   Zeller  https://id.biodiversity.org.au/name/fungi/60015036  genus  exactMatch   Fungi  Basidiomycota  Agaricomycetes  Agaricales  Agaricaceae  Morganella          noIssue
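The scientific_name argument in the call above is a dictionary that maps rank names to lists of values. As a rough sketch, a small helper can build that dictionary so the same query is easy to reuse; rank_query is a hypothetical name introduced here for illustration and is not part of galah.

```python
def rank_query(scientific_name, **ranks):
    """Build a dict shaped like the scientific_name= argument used above.

    Hypothetical helper, not part of galah: each rank (e.g. kingdom="Fungi")
    becomes a one-element list, followed by the scientificName itself.
    """
    query = {rank: [value] for rank, value in ranks.items()}
    query["scientificName"] = [scientific_name]
    return query


fungal_morganella = rank_query("Morganella", kingdom="Fungi")
print(fungal_morganella)

# The dictionary could then be passed on as in the example above, e.g.
# galah.search_taxa(scientific_name=fungal_morganella)
```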

The disambiguated Morganella query can then be used with atlas_counts(), atlas_occurrences(), atlas_species(), or atlas_media() by passing the same scientific_name argument to any of these functions.

>>> galah.atlas_counts(scientific_name={"kingdom": ["Fungi"], "scientificName": ["Morganella"]})
   totalRecords
0           182
>>> galah.atlas_occurrences(scientific_name={"kingdom": ["Fungi"], "scientificName": ["Morganella"]})
     decimalLatitude  decimalLongitude             eventDate           scientificName                                      taxonConceptID                              recordID                                                      dataResourceName occurrenceStatus
0         -47.000000        168.200000  2002-04-30T00:00:00Z      Morganella compacta                                       NZOR-6-128055  f2d0672f-0f8d-467f-a70d-af0ad257743a                       New Zealand Fungal and Plant Disease Collection          PRESENT
1         -46.879900        168.136500  2021-04-15T00:00:00Z      Morganella compacta                                       NZOR-6-128055  450d6089-a004-4a4b-b281-50f60e7596bf                       New Zealand Fungal and Plant Disease Collection          PRESENT
2         -46.874875        168.124660  1983-02-13T00:00:00Z      Morganella compacta                                       NZOR-6-128055  eaae280b-c1de-463a-8cc6-3ee72cc00e83                       New Zealand Fungal and Plant Disease Collection          PRESENT
3         -46.862757        168.116777  1985-04-23T00:00:00Z      Morganella compacta                                       NZOR-6-128055  a08519f3-221b-4006-9803-98d8eeb771a8                       New Zealand Fungal and Plant Disease Collection          PRESENT
4         -46.554617        169.479051  1990-05-24T00:00:00Z      Morganella compacta                                       NZOR-6-128055  4b2011f7-ef20-428d-af8f-dc17c9656df6                       New Zealand Fungal and Plant Disease Collection          PRESENT
..               ...               ...                   ...                      ...                                                 ...                                   ...                                                                   ...              ...
177       -15.831780        145.335830  2020-01-20T13:20:00Z  Morganella purpurascens  https://id.biodiversity.org.au/name/fungi/60022638  aaa77490-aee9-44be-9b68-4fc8efe240f3                                                 iNaturalist Australia          PRESENT
178       -13.200000        130.700000  2014-01-25T00:00:00Z  Morganella purpurascens  https://id.biodiversity.org.au/name/fungi/60022638  e022611a-52ea-4530-8bc1-bfcef1eee43c                                                       INSDC Sequences          PRESENT
179              NaN               NaN                   NaN  Morganella purpurascens  https://id.biodiversity.org.au/name/fungi/60022638  be5f2bef-5036-4817-b7aa-aebfa45f4260                      Royal Botanic Gardens, Kew - Fungarium Specimens          PRESENT
180       -13.197500        130.699722  2014-01-25T00:00:00Z  Morganella purpurascens  https://id.biodiversity.org.au/name/fungi/60022638  61495a3c-d32d-4690-8474-5b5d778c9d79                         National Herbarium of Victoria (MEL) AVH data          PRESENT
181        -8.916700        148.150000  1953-07-15T00:00:00Z  Morganella purpurascens  https://id.biodiversity.org.au/name/fungi/60022638  633433fe-a3b4-4b4f-af12-36b54c1fd9ff  Centre for Australian National Biodiversity Research (CANB) AVH data          PRESENT

[182 rows x 8 columns]

filters=

filters= subsets records by searching for exact matches to an expression, and may also be used for taxonomic filtering. For example, we can get record counts for a single species of Australian robin, or for several species at once. We can also group the counts by species name to compare the number of records for each robin.

>>> galah.atlas_counts(taxa="Petroica boodang")
   totalRecords
0        176037
>>> aus_petroica = ["Petroica boodang", "Petroica goodenovii",
...                 "Petroica phoenicea", "Petroica rosea",
...                 "Petroica rodinogaster", "Petroica multicolor"]
>>> galah.atlas_counts(
...     taxa=aus_petroica,
...     group_by=["species","vernacularName"]
... )
                  species               vernacularName   count
0        Petroica boodang        Eastern Scarlet Robin   29203
1        Petroica boodang                Scarlet Robin  113041
2        Petroica boodang  South-western Scarlet Robin   13275
3        Petroica boodang      Tasmanian Scarlet Robin   20518
4     Petroica goodenovii             Red-capped Robin  150217
5     Petroica multicolor                Pacific Robin    6870
6      Petroica phoenicea                  Flame Robin  103552
7   Petroica rodinogaster          Mainland Pink Robin    1104
8   Petroica rodinogaster                   Pink Robin   16916
9   Petroica rodinogaster         Tasmanian Pink Robin    3877
10         Petroica rosea                   Rose Robin   78564
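In the output above, grouping by both species and vernacularName splits some species across several rows. The following sketch, using only the standard library and rows copied by hand from that output, adds the counts back up per species. The total for Petroica boodang comes to 176037, which agrees with the single-species count shown earlier.

```python
from collections import defaultdict

# (species, vernacularName, count) rows copied by hand from the grouped output above.
rows = [
    ("Petroica boodang", "Eastern Scarlet Robin", 29203),
    ("Petroica boodang", "Scarlet Robin", 113041),
    ("Petroica boodang", "South-western Scarlet Robin", 13275),
    ("Petroica boodang", "Tasmanian Scarlet Robin", 20518),
    ("Petroica goodenovii", "Red-capped Robin", 150217),
    ("Petroica multicolor", "Pacific Robin", 6870),
    ("Petroica phoenicea", "Flame Robin", 103552),
    ("Petroica rodinogaster", "Mainland Pink Robin", 1104),
    ("Petroica rodinogaster", "Pink Robin", 16916),
    ("Petroica rodinogaster", "Tasmanian Pink Robin", 3877),
    ("Petroica rosea", "Rose Robin", 78564),
]

# Sum the counts of all vernacular-name rows belonging to the same species.
totals = defaultdict(int)
for species, _vernacular_name, count in rows:
    totals[species] += count

for species, total in sorted(totals.items()):
    print(f"{species}: {total}")
```

With a real DataFrame, the same aggregation could be written with pandas as a groupby on the species column followed by a sum of count.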

filters= can be useful when searching for paraphyletic or polyphyletic groups. For example, to get counts of non-chordate animals, grouped by phylum:

>>> non_chordates = galah.atlas_counts(
...     filters=["kingdom=Animalia","phylum!=Chordata"],
...     group_by=["phylum"],
...     expand=False
... )
>>> non_chordates.head()
           phylum     count
0  Acanthocephala       486
1        Annelida    355773
2      Arthropoda  11989382
3     Brachiopoda      3143
4         Bryozoa     36196
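Each filter in the example above is a string of the form field=value or field!=value. As a small sketch, such strings can be assembled programmatically; make_filter is a hypothetical helper introduced for illustration, not a galah function, and it only covers the two operators shown in this section.

```python
def make_filter(field, value, negate=False):
    """Return a filter string such as 'phylum!=Chordata'.

    Hypothetical helper, not part of galah; supports only the '=' and '!='
    forms that appear in the example above.
    """
    operator = "!=" if negate else "="
    return f"{field}{operator}{value}"


filters = [
    make_filter("kingdom", "Animalia"),
    make_filter("phylum", "Chordata", negate=True),
]
print(filters)

# The list could then be passed on as in the example above, e.g.
# galah.atlas_counts(filters=filters, group_by=["phylum"], expand=False)
```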

filters=, search_taxa(), and taxonomic ranks

Deciding between using filters= and search_taxa() in a query comes down to how a record has been classified, and whether or not you have the correct unique name and classification of the taxa of interest.

The ALA has fields for the primary taxonomic ranks (kingdom, phylum, class, order, family, genus, species) and some secondary ranks (e.g. subfamily, subgenus), all of which may be used with filters= and search_taxa(). Additionally, there is a field named scientificName, which holds the name at the lowest taxonomic rank to which a record has been identified. For example:

>>> pitta_ranks = galah.atlas_counts(
...     taxa="Pitta",
...     group_by=["scientificName","taxonRank"]
... )
>>> pitta_ranks = pitta_ranks.loc[pitta_ranks["scientificName"].notnull()]
>>> pitta_ranks
                                 scientificName   taxonRank  count
0                                         Pitta       genus    234
1                          Pitta (Erythropitta)    subgenus      6
2            Pitta (Erythropitta) erythrogaster     species    139
3   Pitta (Erythropitta) erythrogaster digglesi  subspecies   1045
4                            Pitta (Pitta) iris     species   7313
5                       Pitta (Pitta) iris iris  subspecies    821
6              Pitta (Pitta) iris johnstoneiana  subspecies     50
7                      Pitta (Pitta) versicolor     species  32752
8           Pitta (Pitta) versicolor intermedia  subspecies   2293
9            Pitta (Pitta) versicolor simillima  subspecies    503
10          Pitta (Pitta) versicolor versicolor  subspecies   4199

If, for instance, you have the correct species or subspecies name, then searching for matches against the species and subspecies fields, respectively, will provide more precise results. This is because the field scientificName may include subgenera, as in Pitta (Pitta) iris above, so a plain binomial will not match it exactly. If you’ve used search_taxa() to get the ALA-matched name of a taxon and only want records identified to a particular level of classification, searching for matches against scientificName is recommended.
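To compare scientificName values with plain binomials or trinomials, the parenthesised subgenus can be removed first. The following is a minimal sketch using the standard library; strip_subgenus is a hypothetical helper, and it assumes that any parenthesised part of the name is a subgenus, as in the scientificName values shown in the Pitta output above.

```python
import re


def strip_subgenus(scientific_name):
    """Remove a parenthesised subgenus, e.g. 'Pitta (Pitta) iris' -> 'Pitta iris'.

    Hypothetical helper; assumes every parenthesised part is a subgenus.
    """
    return re.sub(r"\s*\([^)]*\)", "", scientific_name).strip()


names = [
    "Pitta",
    "Pitta (Erythropitta)",
    "Pitta (Pitta) iris",
    "Pitta (Erythropitta) erythrogaster digglesi",
]
for name in names:
    print(f"{name} -> {strip_subgenus(name)}")
```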

Paraphyletic or polyphyletic groups may contain taxa identified to different taxonomic levels. In this case, it is simpler to use search_taxa(). In the example below, search_taxa() matches the search terms to one genus, three species, and two subspecies. The same list of names can then be passed to atlas_counts() to get counts for each scientific name.

>>> tas_endemic = ["Sarcophilus", # Tasmanian Devil
...                 "Bettongia gaimardi", # Tasmanian Bettong
...                 "Melanodryas vittata", # Dusky Robin
...                 "Platycercus caledonicus", # Green Rosella
...                 "Aquila audax fleayi", # Tasmanian Wedge-Tailed Eagle
...                 "Tyto novaehollandiae castanops" # Tasmanian Masked Owl
...               ]
>>> galah.search_taxa(taxa=tas_endemic)
                          scientificName scientificNameAuthorship                                                             taxonConceptID        rank   matchType   kingdom    phylum    classs            order        family        genus                  species     issues                vernacularName
0                            Sarcophilus             Cuvier, 1837  https://biodiversity.org.au/afd/taxa/aa40a75d-d499-4339-8a5d-c333a29cea1c       genus  exactMatch  Animalia  Chordata  Mammalia   Dasyuromorphia    Dasyuridae  Sarcophilus                           [noIssue]                              
1                     Bettongia gaimardi        (Desmarest, 1822)  https://biodiversity.org.au/afd/taxa/8f7da937-6338-4c39-8b11-4f83807afe11     species  exactMatch  Animalia  Chordata  Mammalia    Diprotodontia    Potoroidae    Bettongia       Bettongia gaimardi  [noIssue]             Tasmanian Bettong
2      Melanodryas (Amaurodryas) vittata   (Quoy & Gaimard, 1830)  https://biodiversity.org.au/afd/taxa/0f04889f-5489-4369-a545-8a041fba9f6d     species  exactMatch  Animalia  Chordata      Aves    Passeriformes   Petroicidae  Melanodryas      Melanodryas vittata  [noIssue]                   Dusky Robin
3  Platycercus (Platycercus) caledonicus           (Gmelin, 1788)  https://biodiversity.org.au/afd/taxa/c6e478fe-f199-463f-8576-a77108fd73e2     species  exactMatch  Animalia  Chordata      Aves   Psittaciformes   Psittacidae  Platycercus  Platycercus caledonicus  [noIssue]                 Green Rosella
4         Aquila (Uroaetus) audax fleayi    Condon & Amadon, 1954  https://biodiversity.org.au/afd/taxa/ac93f7f0-0686-4589-801a-5832378cb7c1  subspecies  exactMatch  Animalia  Chordata      Aves  Accipitriformes  Accipitridae       Aquila             Aquila audax  [noIssue]  Tasmanian Wedge-tailed Eagle
5         Tyto novaehollandiae castanops            (Gould, 1837)  https://biodiversity.org.au/afd/taxa/2c30d58b-572b-4dab-8644-b222c28eb0ec  subspecies  exactMatch  Animalia  Chordata      Aves     Strigiformes     Tytonidae         Tyto     Tyto novaehollandiae  [noIssue]          Tasmanian Masked Owl
>>> galah.atlas_counts(
...     taxa=tas_endemic,
...     group_by=["scientificName"],
...     expand=False
... )
                                       scientificName  count
0                      Aquila (Uroaetus) audax fleayi   9314
1                                  Bettongia gaimardi   2442
2                        Bettongia gaimardi cuniculus     54
3                         Bettongia gaimardi gaimardi      9
4                   Melanodryas (Amaurodryas) vittata  13743
5             Melanodryas (Amaurodryas) vittata kingi    729
6           Melanodryas (Amaurodryas) vittata vittata   7498
7               Platycercus (Platycercus) caledonicus  50397
8       Platycercus (Platycercus) caledonicus brownii    447
9   Platycercus (Platycercus) caledonicus caledonicus  30104
10                                        Sarcophilus    111
11                               Sarcophilus harrisii  36852
12                     Tyto novaehollandiae castanops    317
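The counts above include rows for taxa nested under the search terms, such as the subspecies of Bettongia gaimardi and the species Sarcophilus harrisii within Sarcophilus. As a rough sketch, the rows can be rolled up to one total per search term by dropping any parenthesised subgenus and matching on the start of the name. The rows are copied by hand from the output above, and roll_up is a hypothetical helper, not part of galah.

```python
import re
from collections import defaultdict

searched = [
    "Sarcophilus", "Bettongia gaimardi", "Melanodryas vittata",
    "Platycercus caledonicus", "Aquila audax fleayi",
    "Tyto novaehollandiae castanops",
]

# (scientificName, count) rows copied by hand from the atlas_counts() output above.
rows = [
    ("Aquila (Uroaetus) audax fleayi", 9314),
    ("Bettongia gaimardi", 2442),
    ("Bettongia gaimardi cuniculus", 54),
    ("Bettongia gaimardi gaimardi", 9),
    ("Melanodryas (Amaurodryas) vittata", 13743),
    ("Melanodryas (Amaurodryas) vittata kingi", 729),
    ("Melanodryas (Amaurodryas) vittata vittata", 7498),
    ("Platycercus (Platycercus) caledonicus", 50397),
    ("Platycercus (Platycercus) caledonicus brownii", 447),
    ("Platycercus (Platycercus) caledonicus caledonicus", 30104),
    ("Sarcophilus", 111),
    ("Sarcophilus harrisii", 36852),
    ("Tyto novaehollandiae castanops", 317),
]


def roll_up(searched, rows):
    """Sum counts per search term (hypothetical helper, not part of galah)."""
    totals = defaultdict(int)
    for name, count in rows:
        # Drop a parenthesised subgenus so names line up with the search terms.
        plain = re.sub(r"\s*\([^)]*\)", "", name).strip()
        for term in searched:
            # A row belongs to a term if it is that taxon or nested within it.
            if plain == term or plain.startswith(term + " "):
                totals[term] += count
                break
    return dict(totals)


for term, total in roll_up(searched, rows).items():
    print(f"{term}: {total}")
```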