Taxonomic Filtering

Callum Waite, Shandiya Balasubramaniam

Taxonomic complexity can confound the process of searching, filtering, and downloading records using galah, but galah provides several functions that help ensure records are not missed. Let’s start by configuring galah to use the Atlas of Living Australia (ALA).

>>> import galah
>>> # replace the placeholder with your own registered email address
>>> galah.galah_config(atlas="Australia", email="your-email-here")

search_taxa()

search_taxa() lets users look up taxonomic names before downloading data, which makes it possible to disambiguate homonyms and to check that a search term matches a taxon name in the ALA. search_taxa() returns the scientific name, authorship, rank, and full classification of the taxon matched to the search term.

>>> galah.search_taxa(taxa="Petroica boodang")
                scientificName scientificNameAuthorship                                                             taxonConceptID     rank   matchType   kingdom    phylum classs          order       family     genus           species   issues vernacularName
0  Petroica (Petroica) boodang           (Lesson, 1838)  https://biodiversity.org.au/afd/taxa/a3e5376b-f9e6-4bdf-adae-1e7add9f5c29  species  exactMatch  Animalia  Chordata   Aves  Passeriformes  Petroicidae  Petroica  Petroica boodang  noIssue  Scarlet Robin

It can also return taxonomic information for several taxa at once, and it matches synonyms and Indigenous names to the accepted taxon.

>>> # Muscicapa chrysoptera is a synonym for the Flame Robin, Petroica phoenicea
>>> # Guniibuu is the Yuwaalaraay Indigenous name for the Red-Capped Robin, Petroica goodenovii
>>> galah.search_taxa(taxa = ["Muscicapa chrysoptera", "Guniibuu"])
                   scientificName    scientificNameAuthorship                                                             taxonConceptID     rank        matchType   kingdom    phylum classs          order       family     genus              species   issues    vernacularName
0   Petroica (Littlera) phoenicea                 Gould, 1837  https://biodiversity.org.au/afd/taxa/fe74e658-4848-437a-a23d-f1001a198552  species       exactMatch  Animalia  Chordata   Aves  Passeriformes  Petroicidae  Petroica   Petroica phoenicea  noIssue       Flame Robin
1  Petroica (Petroica) goodenovii  (Vigors & Horsfield, 1827)  https://biodiversity.org.au/afd/taxa/10dbd908-00f3-4ec2-9a9c-a2fd4782eaf1  species  vernacularMatch  Animalia  Chordata   Aves  Passeriformes  Petroicidae  Petroica  Petroica goodenovii  noIssue  Red-capped Robin
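The matchType column records how each term was matched. In the output above, the synonym Muscicapa chrysoptera still reports exactMatch, while the Indigenous name Guniibuu reports vernacularMatch. The plain-Python sketch below flags any search term that was not an exact match so it can be checked by eye. It assumes one result row per search term, returned in the same order as the input list. `non_exact_matches` is a hypothetical helper, not part of galah, and the rows are copied by hand from the output above.

```python
# Rows copied by hand from the search_taxa() output above (only the columns needed).
rows = [
    {"scientificName": "Petroica (Littlera) phoenicea", "matchType": "exactMatch"},
    {"scientificName": "Petroica (Petroica) goodenovii", "matchType": "vernacularMatch"},
]
searched = ["Muscicapa chrysoptera", "Guniibuu"]


def non_exact_matches(search_terms, results):
    """Hypothetical helper: pair each search term with its result row (assumes
    one row per term, in input order) and keep the pairs whose matchType is
    not 'exactMatch'."""
    return [
        (term, row["scientificName"], row["matchType"])
        for term, row in zip(search_terms, results)
        if row["matchType"] != "exactMatch"
    ]


for term, name, match_type in non_exact_matches(searched, rows):
    print(f"{term!r} matched {name} via {match_type}")
```

Since search_taxa() returns a pandas DataFrame, the same check on a real result (here called `result`, a hypothetical name) could be written as `result.loc[result["matchType"] != "exactMatch"]`.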

Where homonyms exist, search_taxa() warns users and asks them to clarify the search term by supplying one or more taxonomic ranks through the scientific_name argument. In this example, the genus name Morganella is used in three kingdoms, so the fungal genus is selected by also specifying the kingdom:

>>> galah.search_taxa(taxa = ["Morganella"])
Warning: Search for ['Morganella'] returned multiple taxa due to a homonym issue.
Please use the `scientific_name` argument to clarify taxa.
We were not able to find ['Morganella'] in the Australia backbone.
Empty DataFrame
Columns: []
Index: []
>>> galah.search_taxa(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
  scientificName scientificNameAuthorship                                      taxonConceptID   rank   matchType kingdom         phylum          classs       order       family       genus   issues
0     Morganella                   Zeller  https://id.biodiversity.org.au/name/fungi/60015036  genus  exactMatch   Fungi  Basidiomycota  Agaricomycetes  Agaricales  Agaricaceae  Morganella  noIssue
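The scientific_name argument takes a dictionary that maps each rank name to a list of values, as in the call above. Because the same dictionary can be reused with other galah functions, it may help to build it once. `scientific_name_query` below is a hypothetical convenience function rather than part of galah, and using rank keys other than kingdom (for example phylum) is an assumption based on the pattern shown here.

```python
def scientific_name_query(scientific_name, **ranks):
    """Hypothetical helper: build a scientific_name-style dictionary in which
    every key maps to a list of values, as in the Morganella example above."""
    query = {rank: [value] for rank, value in ranks.items()}
    query["scientificName"] = [scientific_name]
    return query


fungi_morganella = scientific_name_query("Morganella", kingdom="Fungi")
print(fungi_morganella)  # {'kingdom': ['Fungi'], 'scientificName': ['Morganella']}
```

The resulting dictionary could then be passed as `scientific_name=fungi_morganella` to search_taxa() or to the atlas functions described next.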

The disambiguated query can then be passed to atlas_counts(), atlas_occurrences(), atlas_species() or atlas_media() through the same scientific_name argument.

>>> galah.atlas_counts(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
   totalRecords
0           180
>>> galah.atlas_occurrences(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
     decimalLatitude  decimalLongitude             eventDate           scientificName                                      taxonConceptID                              recordID                                                      dataResourceName occurrenceStatus
0         -47.000000        168.200000  2002-04-30T00:00:00Z      Morganella compacta                                       NZOR-6-128055  f2d0672f-0f8d-467f-a70d-af0ad257743a                       New Zealand Fungal and Plant Disease Collection          PRESENT
1         -46.879900        168.136500  2021-04-15T00:00:00Z      Morganella compacta                                       NZOR-6-128055  450d6089-a004-4a4b-b281-50f60e7596bf                       New Zealand Fungal and Plant Disease Collection          PRESENT
2         -46.874875        168.124660  1983-02-13T00:00:00Z      Morganella compacta                                       NZOR-6-128055  eaae280b-c1de-463a-8cc6-3ee72cc00e83                       New Zealand Fungal and Plant Disease Collection          PRESENT
3         -46.862757        168.116777  1985-04-23T00:00:00Z      Morganella compacta                                       NZOR-6-128055  a08519f3-221b-4006-9803-98d8eeb771a8                       New Zealand Fungal and Plant Disease Collection          PRESENT
4         -46.554617        169.479051  1990-05-24T00:00:00Z      Morganella compacta                                       NZOR-6-128055  4b2011f7-ef20-428d-af8f-dc17c9656df6                       New Zealand Fungal and Plant Disease Collection          PRESENT
..               ...               ...                   ...                      ...                                                 ...                                   ...                                                                   ...              ...
175       -15.831780        145.335830  2020-01-20T13:20:00Z  Morganella purpurascens  https://id.biodiversity.org.au/name/fungi/60022638  aaa77490-aee9-44be-9b68-4fc8efe240f3                                                 iNaturalist Australia          PRESENT
176       -13.200000        130.700000  2014-01-25T00:00:00Z  Morganella purpurascens  https://id.biodiversity.org.au/name/fungi/60022638  e022611a-52ea-4530-8bc1-bfcef1eee43c                                                       INSDC Sequences          PRESENT
177              NaN               NaN                   NaN  Morganella purpurascens  https://id.biodiversity.org.au/name/fungi/60022638  be5f2bef-5036-4817-b7aa-aebfa45f4260                      Royal Botanic Gardens, Kew - Fungarium Specimens          PRESENT
178       -13.197500        130.699722  2014-01-25T00:00:00Z  Morganella purpurascens  https://id.biodiversity.org.au/name/fungi/60022638  61495a3c-d32d-4690-8474-5b5d778c9d79                         National Herbarium of Victoria (MEL) AVH data          PRESENT
179        -8.916667        148.150000  1953-07-15T00:00:00Z  Morganella purpurascens  https://id.biodiversity.org.au/name/fungi/60022638  633433fe-a3b4-4b4f-af12-36b54c1fd9ff  Centre for Australian National Biodiversity Research (CANB) AVH data          PRESENT

[180 rows x 8 columns]

filters=

filters= subsets records by searching for exact matches to an expression, and can also be used for taxonomic filtering. For robins in Australia, for example, we can count records for a single species or for several species at once. The examples below pass species names to the taxa argument; grouping the multi-species query by species and vernacular name lets us compare the number of records for each robin.

>>> galah.atlas_counts(taxa="Petroica boodang")
   totalRecords
0        149525
>>> aus_petroica = ["Petroica boodang", "Petroica goodenovii",
...                 "Petroica phoenicea", "Petroica rosea",
...                 "Petroica rodinogaster", "Petroica multicolor"]
>>> galah.atlas_counts(
...     taxa=aus_petroica,
...     group_by=["species","vernacularName"]
... )
                  species               vernacularName   count
0        Petroica boodang        Eastern Scarlet Robin    3954
1        Petroica boodang                Scarlet Robin  134878
2        Petroica boodang  South-western Scarlet Robin    2963
3        Petroica boodang      Tasmanian Scarlet Robin    7730
4     Petroica goodenovii             Red-capped Robin  132810
5     Petroica multicolor                Pacific Robin    6889
6      Petroica phoenicea                  Flame Robin   93130
7   Petroica rodinogaster          Mainland Pink Robin      72
8   Petroica rodinogaster                   Pink Robin   18911
9   Petroica rodinogaster         Tasmanian Pink Robin      88
10         Petroica rosea                   Rose Robin   69328
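Grouping by vernacularName splits some species across several rows, such as the regional Scarlet Robin names. The plain-Python sketch below sums those rows per species, using counts copied from the output above. For Petroica boodang the four rows add up to 149525, the same total returned by the ungrouped atlas_counts() call.

```python
from collections import defaultdict

# (species, vernacularName, count) rows copied from the grouped output above
grouped = [
    ("Petroica boodang", "Eastern Scarlet Robin", 3954),
    ("Petroica boodang", "Scarlet Robin", 134878),
    ("Petroica boodang", "South-western Scarlet Robin", 2963),
    ("Petroica boodang", "Tasmanian Scarlet Robin", 7730),
    ("Petroica goodenovii", "Red-capped Robin", 132810),
    ("Petroica multicolor", "Pacific Robin", 6889),
    ("Petroica phoenicea", "Flame Robin", 93130),
    ("Petroica rodinogaster", "Mainland Pink Robin", 72),
    ("Petroica rodinogaster", "Pink Robin", 18911),
    ("Petroica rodinogaster", "Tasmanian Pink Robin", 88),
    ("Petroica rosea", "Rose Robin", 69328),
]

totals = defaultdict(int)
for species, _vernacular, count in grouped:
    totals[species] += count  # several vernacular names can belong to one species

for species, total in sorted(totals.items()):
    print(f"{species}: {total}")
```

On the DataFrame returned by atlas_counts(), a pandas equivalent would be `counts.groupby("species")["count"].sum()`, where `counts` is a hypothetical variable holding that result.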

filters= is especially useful for paraphyletic or polyphyletic groups, which cannot be selected with a single taxon name. For example, to count records of non-chordate animals by phylum:

>>> non_chordates = galah.atlas_counts(
...     filters=["kingdom=Animalia","phylum!=Chordata"],
...     group_by=["phylum"],
...     expand=False
... )
>>> non_chordates.head()
           phylum     count
0  Acanthocephala       486
1        Annelida    348674
2      Arthropoda  11587024
3     Brachiopoda      3133
4         Bryozoa     33614
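Because filters= works on exact matches, its behaviour can be modelled locally. The sketch below is a simplified, stdlib-only model and not galah's implementation: it parses only the = and != forms used in this section, and it treats a list of filters as AND-combined, since the non-chordate query only makes sense if both conditions must hold. The helper names and the toy records are hypothetical.

```python
import re

# Hypothetical parser for the two forms used above, "field=value" and
# "field!=value"; galah's own filter syntax may support more than this.
FILTER_PATTERN = re.compile(r"^(\w+)(!=|=)(.+)$")


def parse_filter(expression):
    """Split a filter string into (field, operator, value)."""
    match = FILTER_PATTERN.match(expression)
    if match is None:
        raise ValueError(f"unsupported filter: {expression!r}")
    return match.groups()


def satisfies(record, field, op, value):
    """Exact, whole-value comparison: a value of 'Chord' would not match 'Chordata'."""
    is_equal = record.get(field) == value
    return is_equal if op == "=" else not is_equal


def apply_filters(records, expressions):
    """Keep only records that satisfy every expression (AND-combined)."""
    parsed = [parse_filter(e) for e in expressions]
    return [r for r in records if all(satisfies(r, *p) for p in parsed)]


# Toy records built from taxa that appear elsewhere on this page
records = [
    {"kingdom": "Animalia", "phylum": "Chordata"},
    {"kingdom": "Animalia", "phylum": "Arthropoda"},
    {"kingdom": "Animalia", "phylum": "Annelida"},
    {"kingdom": "Fungi", "phylum": "Basidiomycota"},
]
non_chordates = apply_filters(records, ["kingdom=Animalia", "phylum!=Chordata"])
print([r["phylum"] for r in non_chordates])  # ['Arthropoda', 'Annelida']
```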

filters=, search_taxa(), and taxonomic ranks

Whether to use filters= or search_taxa() in a query depends on how a record has been classified, and on whether you have the correct, unique name and classification for the taxa of interest.

The ALA has fields for the primary taxonomic ranks (kingdom, phylum, class, order, family, genus, species) and some secondary ranks (e.g. subfamily, subgenus), all of which may be used with filters= and search_taxa(). There is also a field named scientificName, which holds the name at the lowest taxonomic rank to which a record has been identified. For example, records of the genus Pitta are identified to genus, subgenus, species, and subspecies level:

>>> pitta_ranks = galah.atlas_counts(
...     taxa="Pitta",
...     group_by=["scientificName","taxonRank"]
... )
>>> # drop rows with no scientificName value before displaying the counts
>>> pitta_ranks = pitta_ranks.loc[pitta_ranks["scientificName"].notnull()]
>>> pitta_ranks
                                 scientificName   taxonRank  count
0                                         Pitta       genus    111
1                          Pitta (Erythropitta)    subgenus      6
2            Pitta (Erythropitta) erythrogaster     species    178
3   Pitta (Erythropitta) erythrogaster digglesi  subspecies   1044
4                            Pitta (Pitta) iris     species   7652
5                       Pitta (Pitta) iris iris  subspecies     98
6              Pitta (Pitta) iris johnstoneiana  subspecies     29
7                      Pitta (Pitta) versicolor     species  35883
8           Pitta (Pitta) versicolor intermedia  subspecies     64
9            Pitta (Pitta) versicolor simillima  subspecies     57
10          Pitta (Pitta) versicolor versicolor  subspecies    456
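Because scientificName holds the lowest rank to which each record was identified, the taxonRank column shows how records are spread across ranks. The plain-Python sketch below totals the Pitta counts, copied from the output above, by rank. Records identified only to genus or subgenus cannot carry a species name, so those 117 records are ones that a match against the species field would not return.

```python
from collections import Counter

# (scientificName, taxonRank, count) rows copied from the Pitta output above
pitta_rows = [
    ("Pitta", "genus", 111),
    ("Pitta (Erythropitta)", "subgenus", 6),
    ("Pitta (Erythropitta) erythrogaster", "species", 178),
    ("Pitta (Erythropitta) erythrogaster digglesi", "subspecies", 1044),
    ("Pitta (Pitta) iris", "species", 7652),
    ("Pitta (Pitta) iris iris", "subspecies", 98),
    ("Pitta (Pitta) iris johnstoneiana", "subspecies", 29),
    ("Pitta (Pitta) versicolor", "species", 35883),
    ("Pitta (Pitta) versicolor intermedia", "subspecies", 64),
    ("Pitta (Pitta) versicolor simillima", "subspecies", 57),
    ("Pitta (Pitta) versicolor versicolor", "subspecies", 456),
]

records_by_rank = Counter()
for _name, rank, count in pitta_rows:
    records_by_rank[rank] += count

# Records identified only to genus or subgenus have no species-level name.
above_species = records_by_rank["genus"] + records_by_rank["subgenus"]
print(dict(records_by_rank))
print("identified above species level:", above_species)
```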

If you have the correct species or subspecies name, matching it against the species or subspecies field, respectively, gives more precise results. This is because scientificName values can include a subgenus in parentheses (for example, Pitta (Pitta) iris rather than Pitta iris), so an exact match against the plain name can fail. If you have used search_taxa() to get the ALA-matched name of a taxon and only want records identified to that particular rank, matching against scientificName is recommended.
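The subgenus issue can be seen by comparing names directly. The sketch below uses a hypothetical helper, `strip_subgenus`, that removes a parenthesised subgenus placed directly after the genus; it assumes that naming pattern, as seen in the Pitta table above, and is not a galah function.

```python
import re

# A leading "Genus (Subgenus) " followed by at least one more epithet. The
# trailing space in the pattern leaves a bare subgenus name such as
# "Pitta (Erythropitta)" unchanged.
SUBGENUS = re.compile(r"^(\S+) \([^)]*\) ")


def strip_subgenus(name):
    """Hypothetical helper: drop the parenthesised subgenus from a name."""
    return SUBGENUS.sub(r"\1 ", name)


print(strip_subgenus("Pitta (Pitta) iris"))                           # Pitta iris
print(strip_subgenus("Pitta (Erythropitta) erythrogaster digglesi"))  # Pitta erythrogaster digglesi
print(strip_subgenus("Pitta (Erythropitta)"))                         # Pitta (Erythropitta)
print("Pitta (Pitta) iris" == "Pitta iris")                           # False: exact match fails
```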

Paraphyletic or polyphyletic groups, such as the Tasmanian endemic taxa below, may contain taxa identified to different taxonomic ranks. In this case, it is simpler to check the names with search_taxa() and then pass the same list to the taxa argument. In the example below, search_taxa() matches the terms to one genus, three species, and two subspecies, and atlas_counts() then returns counts for each scientific name within those taxa.

>>> tas_endemic = ["Sarcophilus", # Tasmanian Devil
...                 "Bettongia gaimardi", # Tasmanian Bettong
...                 "Melanodryas vittata", # Dusky Robin
...                 "Platycercus caledonicus", # Green Rosella
...                 "Aquila audax fleayi", # Tasmanian Wedge-tailed Eagle
...                 "Tyto novaehollandiae castanops" # Tasmanian Masked Owl
...               ]
>>> galah.search_taxa(taxa=tas_endemic)
                          scientificName scientificNameAuthorship                                                             taxonConceptID        rank   matchType   kingdom    phylum    classs            order        family        genus   issues                  species                vernacularName
0                            Sarcophilus             Cuvier, 1837  https://biodiversity.org.au/afd/taxa/aa40a75d-d499-4339-8a5d-c333a29cea1c       genus  exactMatch  Animalia  Chordata  Mammalia   Dasyuromorphia    Dasyuridae  Sarcophilus  noIssue                      NaN                           NaN
1                     Bettongia gaimardi        (Desmarest, 1822)  https://biodiversity.org.au/afd/taxa/8f7da937-6338-4c39-8b11-4f83807afe11     species  exactMatch  Animalia  Chordata  Mammalia    Diprotodontia    Potoroidae    Bettongia  noIssue       Bettongia gaimardi             Tasmanian Bettong
2      Melanodryas (Amaurodryas) vittata   (Quoy & Gaimard, 1830)  https://biodiversity.org.au/afd/taxa/0f04889f-5489-4369-a545-8a041fba9f6d     species  exactMatch  Animalia  Chordata      Aves    Passeriformes   Petroicidae  Melanodryas  noIssue      Melanodryas vittata                   Dusky Robin
3  Platycercus (Platycercus) caledonicus           (Gmelin, 1788)  https://biodiversity.org.au/afd/taxa/c6e478fe-f199-463f-8576-a77108fd73e2     species  exactMatch  Animalia  Chordata      Aves   Psittaciformes   Psittacidae  Platycercus  noIssue  Platycercus caledonicus                 Green Rosella
4         Aquila (Uroaetus) audax fleayi    Condon & Amadon, 1954  https://biodiversity.org.au/afd/taxa/ac93f7f0-0686-4589-801a-5832378cb7c1  subspecies  exactMatch  Animalia  Chordata      Aves  Accipitriformes  Accipitridae       Aquila  noIssue             Aquila audax  Tasmanian Wedge-tailed Eagle
5         Tyto novaehollandiae castanops            (Gould, 1837)  https://biodiversity.org.au/afd/taxa/2c30d58b-572b-4dab-8644-b222c28eb0ec  subspecies  exactMatch  Animalia  Chordata      Aves     Strigiformes     Tytonidae         Tyto  noIssue     Tyto novaehollandiae          Tasmanian Masked Owl
>>> galah.atlas_counts(
...     taxa=tas_endemic,
...     group_by=["scientificName"],
...     expand=False
... )
                                       scientificName  count
0                      Aquila (Uroaetus) audax fleayi   5275
1                                  Bettongia gaimardi   2424
2                        Bettongia gaimardi cuniculus     54
3                         Bettongia gaimardi gaimardi      9
4                   Melanodryas (Amaurodryas) vittata  17770
5             Melanodryas (Amaurodryas) vittata kingi     16
6           Melanodryas (Amaurodryas) vittata vittata     78
7               Platycercus (Platycercus) caledonicus  62544
8       Platycercus (Platycercus) caledonicus brownii     24
9   Platycercus (Platycercus) caledonicus caledonicus     47
10                                        Sarcophilus    111
11                               Sarcophilus harrisii  36797
12                     Tyto novaehollandiae castanops     86
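Grouping by scientificName returns separate rows for the subspecies of each searched species and for the species within the searched genus, and many of these names include a subgenus. The sketch below rolls those rows back up to the six searched names, using counts copied from the output above and a hypothetical `roll_up` helper (not part of galah) that strips the parenthesised subgenus before comparing names. It assumes every returned name either equals a searched name or extends it with further epithets.

```python
import re
from collections import defaultdict

# Matches a leading "Genus (Subgenus) " so it can be dropped before comparing names.
SUBGENUS = re.compile(r"^(\S+) \([^)]*\) ")


def strip_subgenus(name):
    return SUBGENUS.sub(r"\1 ", name)


def roll_up(rows, targets):
    """Hypothetical helper: add each row's count to the searched taxon that the
    name equals or falls under (e.g. a subspecies under its species).
    Rows that match no target are skipped."""
    totals = defaultdict(int)
    for name, count in rows:
        plain = strip_subgenus(name)
        for target in targets:
            if plain == target or plain.startswith(target + " "):
                totals[target] += count
                break
    return dict(totals)


tas_endemic = ["Sarcophilus", "Bettongia gaimardi", "Melanodryas vittata",
               "Platycercus caledonicus", "Aquila audax fleayi",
               "Tyto novaehollandiae castanops"]

# (scientificName, count) rows copied from the atlas_counts() output above
rows = [
    ("Aquila (Uroaetus) audax fleayi", 5275),
    ("Bettongia gaimardi", 2424),
    ("Bettongia gaimardi cuniculus", 54),
    ("Bettongia gaimardi gaimardi", 9),
    ("Melanodryas (Amaurodryas) vittata", 17770),
    ("Melanodryas (Amaurodryas) vittata kingi", 16),
    ("Melanodryas (Amaurodryas) vittata vittata", 78),
    ("Platycercus (Platycercus) caledonicus", 62544),
    ("Platycercus (Platycercus) caledonicus brownii", 24),
    ("Platycercus (Platycercus) caledonicus caledonicus", 47),
    ("Sarcophilus", 111),
    ("Sarcophilus harrisii", 36797),
    ("Tyto novaehollandiae castanops", 86),
]

for taxon, total in roll_up(rows, tas_endemic).items():
    print(f"{taxon}: {total}")
```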