Taxonomic Filtering

Callum Waite, Shandiya Balasubramaniam

Taxonomic complexity can confound the process of searching, filtering, and downloading records with galah, but several galah functions help ensure that records are not missed. Let’s start by configuring galah to use the ALA (Atlas of Living Australia).

>>> import galah
>>> galah.galah_config(atlas="Australia", email="your-email-here")


search_taxa() lets users look up taxonomic names before downloading data, which helps to disambiguate homonyms and to check that a search term matches the taxon name used in the ALA. It returns the scientific name, authorship, rank, and full classification of the taxon matched to the search term.

>>> galah.search_taxa(taxa="Petroica boodang")
                scientificName scientificNameAuthorship                                                             taxonConceptID     rank   kingdom    phylum          order       family     genus           species vernacularName   issues
0  Petroica (Petroica) boodang           (Lesson, 1838)  species  Animalia  Chordata  Passeriformes  Petroicidae  Petroica  Petroica boodang  Scarlet Robin  noIssue

It can also return taxonomic information for multiple species, including synonyms and Indigenous names.

>>> # Muscicapa chrysoptera is a synonym for the Flame Robin, Petroica phoenicea
>>> # Guniibuu is the Yuwaalaraay Indigenous name for the Red-Capped Robin, Petroica goodenovii
>>> galah.search_taxa(taxa=["Muscicapa chrysoptera", "Guniibuu"])
                   scientificName    scientificNameAuthorship                                                             taxonConceptID     rank   kingdom    phylum          order       family     genus              species    vernacularName   issues
0   Petroica (Littlera) phoenicea                 Gould, 1837  species  Animalia  Chordata  Passeriformes  Petroicidae  Petroica   Petroica phoenicea       Flame Robin  noIssue
1  Petroica (Petroica) goodenovii  (Vigors & Horsfield, 1827)  species  Animalia  Chordata  Passeriformes  Petroicidae  Petroica  Petroica goodenovii  Red-capped Robin  noIssue

Where homonyms exist, search_taxa() will prompt users to clarify the search term by providing one or more taxonomic ranks through its scientific_name argument. The example below resolves the genus name Morganella, which occurs in three kingdoms, to the fungal genus:

>>> galah.search_taxa(taxa=["Morganella"])
Warning: Search returned multiple taxa due to a homonym issue.
Please use the `scientific_name` argument to clarify taxa.
  search_term   issues
0  Morganella  homonym
>>> galah.search_taxa(scientific_name={"kingdom": ["Fungi"], "scientificName": ["Morganella"]})
  scientificName scientificNameAuthorship                                      taxonConceptID   rank kingdom         phylum       order       family       genus   issues
0     Morganella                   Zeller  genus   Fungi  Basidiomycota  Agaricales  Agaricaceae  Morganella  noIssue
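
The dictionary passed to scientific_name follows a simple pattern: each key is a rank (or scientificName) and each value is a list of names. As a minimal sketch, a small helper can assemble it. Note that build_scientific_name is a hypothetical convenience function written for this illustration, not part of galah.

```python
def build_scientific_name(name, **ranks):
    """Assemble a dict in the shape used for the scientific_name argument.

    Hypothetical helper for illustration only; it is not a galah function.
    """
    # One single-item list per higher rank, e.g. {"kingdom": ["Fungi"]}
    query = {rank: [value] for rank, value in ranks.items()}
    # The ambiguous name itself goes under "scientificName"
    query["scientificName"] = [name]
    return query


fungal_morganella = build_scientific_name("Morganella", kingdom="Fungi")
print(fungal_morganella)
# {'kingdom': ['Fungi'], 'scientificName': ['Morganella']}

# The result could then be passed on, for example:
# galah.search_taxa(scientific_name=fungal_morganella)
```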

The disambiguated Morganella query can then be reused in atlas_counts(), atlas_occurrences(), atlas_species(), or atlas_media() by passing the same scientific_name keyword to any of these functions.

>>> galah.atlas_counts(scientific_name={"kingdom": ["Fungi"], "scientificName": ["Morganella"]})
0           156
>>> galah.atlas_occurrences(scientific_name={"kingdom": ["Fungi"], "scientificName": ["Morganella"]})
     decimalLatitude  decimalLongitude             eventDate           scientificName                                      taxonConceptID                              recordID                                                      dataResourceName occurrenceStatus
0         -47.000000        168.200000  2002-04-30T00:00:00Z      Morganella compacta                                       NZOR-6-128055  461f0148-7549-4a81-9495-46704b26924b                                         New Zealand Virtual Herbarium          PRESENT
1         -46.874868        168.124626  1983-02-13T00:00:00Z      Morganella compacta                                       NZOR-6-128055  569d52b1-2481-43c3-8450-154fd5a52df1                                         New Zealand Virtual Herbarium          PRESENT
2         -46.862750        168.116743  1985-04-23T00:00:00Z      Morganella compacta                                       NZOR-6-128055  5e633f17-193b-48d6-a0d9-ecd8251cc755                                         New Zealand Virtual Herbarium          PRESENT
3         -46.554626        169.479023  1990-05-24T00:00:00Z      Morganella compacta                                       NZOR-6-128055  a2b24275-5c64-4a52-bbab-1fc2d72e142d                                         New Zealand Virtual Herbarium          PRESENT
4         -46.054081        170.192709  1967-12-17T00:00:00Z      Morganella compacta                                       NZOR-6-128055  144f29e2-f06c-46b1-8792-eda5b38548f1                                         New Zealand Virtual Herbarium          PRESENT
..               ...               ...                   ...                      ...                                                 ...                                   ...                                                                   ...              ...
151       -26.779400        152.880300  2009-02-19T00:00:00Z  Morganella purpurascens  092b6f3e-ef27-4cbf-bb28-b172c1b200c5                         National Herbarium of Victoria (MEL) AVH data          PRESENT
152       -26.777741        152.880254  2012-02-15T00:00:00Z  Morganella purpurascens  8b7a0609-2c0b-42fc-9676-ba7ef9a73a5e                                                             BowerBird          PRESENT
153              NaN               NaN  1964-11-21T00:00:00Z  Morganella purpurascens  4c74c65a-d9b8-4e81-9101-a283eec56e5a                         National Herbarium of Victoria (MEL) AVH data          PRESENT
154       -26.405958        153.026140  2018-03-16T01:01:09Z  Morganella purpurascens  fc2816fd-4b65-4af1-b0a2-1b086c26d97f                                     ALA species sightings and OzAtlas          PRESENT
155        -8.916667        148.150000  1953-07-15T00:00:00Z  Morganella purpurascens  633433fe-a3b4-4b4f-af12-36b54c1fd9ff  Centre for Australian National Biodiversity Research (CANB) AVH data          PRESENT

[156 rows x 8 columns]


filters= subsets records by searching for exact matches to an expression, and may also be used for taxonomic filtering. For example, to count records of robins in Australia, we can query a single species or several species at once. For several species, we can also group the counts by species name to compare the number of records for each robin.

>>> galah.atlas_counts(taxa="Petroica boodang")
0        119717
>>> aus_petroica = ["Petroica boodang", "Petroica goodenovii",
...                 "Petroica phoenicea", "Petroica rosea",
...                 "Petroica rodinogaster", "Petroica multicolor"]
>>> galah.atlas_counts(
...     taxa=aus_petroica,
...     group_by=["species", "vernacularName"]
... )
                  species               vernacularName   count
0        Petroica boodang        Eastern Scarlet Robin    3501
1        Petroica boodang                Scarlet Robin  116029
2        Petroica boodang  South-western Scarlet Robin     161
3        Petroica boodang      Tasmanian Scarlet Robin      26
4     Petroica goodenovii             Red-capped Robin  110379
5     Petroica multicolor                Pacific Robin    6700
6      Petroica phoenicea                  Flame Robin   81940
7   Petroica rodinogaster          Mainland Pink Robin      60
8   Petroica rodinogaster                   Pink Robin   13501
9   Petroica rodinogaster         Tasmanian Pink Robin      14
10         Petroica rosea                   Rose Robin   52614
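
Grouping by both species and vernacularName splits some species across several rows. As a quick check, the rows printed above can be summed back to one total per species. This sketch works only on the counts copied from that output (no query is made); the Petroica boodang total should equal the single-species count of 119717 reported earlier.

```python
from collections import defaultdict

# (species, vernacularName, count) rows copied from the grouped output above
rows = [
    ("Petroica boodang", "Eastern Scarlet Robin", 3501),
    ("Petroica boodang", "Scarlet Robin", 116029),
    ("Petroica boodang", "South-western Scarlet Robin", 161),
    ("Petroica boodang", "Tasmanian Scarlet Robin", 26),
    ("Petroica goodenovii", "Red-capped Robin", 110379),
    ("Petroica multicolor", "Pacific Robin", 6700),
    ("Petroica phoenicea", "Flame Robin", 81940),
    ("Petroica rodinogaster", "Mainland Pink Robin", 60),
    ("Petroica rodinogaster", "Pink Robin", 13501),
    ("Petroica rodinogaster", "Tasmanian Pink Robin", 14),
    ("Petroica rosea", "Rose Robin", 52614),
]

# Add up the counts of every vernacular name belonging to the same species
totals = defaultdict(int)
for species, _vernacular, count in rows:
    totals[species] += count

for species, total in sorted(totals.items()):
    print(f"{species}: {total}")
```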

filters= can also be useful when searching for paraphyletic or polyphyletic groups. For example, to get counts of non-chordate animals by phylum:

>>> non_chordates = galah.atlas_counts(
...     filters=["kingdom=Animalia", "phylum!=Chordata"],
...     group_by=["phylum"],
...     expand=False
... )
>>> non_chordates.head()
           phylum    count
0  Acanthocephala      399
1        Annelida   317030
2      Arthropoda  8659300
3     Brachiopoda     2169
4         Bryozoa    24728
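
Each filter is a plain string of the form field=value or field!=value. When the inclusions and exclusions come from other data, the strings can be assembled programmatically. The sketch below assumes only the two operators used above; build_filters is a hypothetical helper, not a galah function.

```python
def build_filters(equals=None, not_equals=None):
    """Return filter strings such as 'kingdom=Animalia' or 'phylum!=Chordata'.

    Hypothetical helper for illustration; covers only the = and != forms.
    """
    filters = [f"{field}={value}" for field, value in (equals or {}).items()]
    filters += [f"{field}!={value}" for field, value in (not_equals or {}).items()]
    return filters


non_chordate_filters = build_filters(
    equals={"kingdom": "Animalia"},
    not_equals={"phylum": "Chordata"},
)
print(non_chordate_filters)
# ['kingdom=Animalia', 'phylum!=Chordata']

# The list could then be passed on, for example:
# galah.atlas_counts(filters=non_chordate_filters, group_by=["phylum"], expand=False)
```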

filters=, search_taxa(), and taxonomic ranks

Deciding between using filters= and search_taxa() in a query comes down to how a record has been classified, and whether or not you have the correct unique name and classification of the taxa of interest.

The ALA has fields for the primary taxonomic ranks (kingdom, phylum, class, order, family, genus, species) and some secondary ranks (e.g. subfamily, subgenus), all of which may be used with filters= and search_taxa(). Additionally, there is a field named scientificName, which refers to the lowest taxonomic rank to which a record has been identified, e.g.:

>>> import numpy as np
>>> pitta_ranks = galah.atlas_counts(
...     taxa="Pitta",
...     group_by=["scientificName", "taxonRank"]
... )
>>> pitta_ranks = pitta_ranks.loc[pitta_ranks["scientificName"].notnull()]
>>> pitta_ranks
                                 scientificName   taxonRank  count
0                                         Pitta       genus     76
1                          Pitta (Erythropitta)    subgenus    728
2            Pitta (Erythropitta) erythrogaster     species    190
3   Pitta (Erythropitta) erythrogaster digglesi  subspecies     21
4                            Pitta (Pitta) iris     species   5726
5                       Pitta (Pitta) iris iris  subspecies     76
6              Pitta (Pitta) iris johnstoneiana  subspecies     27
7                      Pitta (Pitta) versicolor     species  26106
8           Pitta (Pitta) versicolor intermedia  subspecies     42
9            Pitta (Pitta) versicolor simillima  subspecies     38
10          Pitta (Pitta) versicolor versicolor  subspecies    310
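
To see how these records are spread across identification levels, the counts can be summed per taxonRank. The sketch below uses only the numbers printed above (no query is made) and shows that each record is counted once, at the lowest rank to which it was identified.

```python
from collections import Counter

# (scientificName, taxonRank, count) rows copied from the output above
pitta_rows = [
    ("Pitta", "genus", 76),
    ("Pitta (Erythropitta)", "subgenus", 728),
    ("Pitta (Erythropitta) erythrogaster", "species", 190),
    ("Pitta (Erythropitta) erythrogaster digglesi", "subspecies", 21),
    ("Pitta (Pitta) iris", "species", 5726),
    ("Pitta (Pitta) iris iris", "subspecies", 76),
    ("Pitta (Pitta) iris johnstoneiana", "subspecies", 27),
    ("Pitta (Pitta) versicolor", "species", 26106),
    ("Pitta (Pitta) versicolor intermedia", "subspecies", 42),
    ("Pitta (Pitta) versicolor simillima", "subspecies", 38),
    ("Pitta (Pitta) versicolor versicolor", "subspecies", 310),
]

# Total number of records identified to each rank
by_rank = Counter()
for _name, rank, count in pitta_rows:
    by_rank[rank] += count

for rank in ["genus", "subgenus", "species", "subspecies"]:
    print(f"{rank}: {by_rank[rank]}")
# genus: 76
# subgenus: 728
# species: 32022
# subspecies: 514
```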

If, for instance, you have the correct species or subspecies name, then searching for matches against the species and subspecies fields, respectively, will provide more precise results. This is because the field scientificName may include subgenera. If you’ve used search_taxa() to get the ALA-matched name of a taxon and only want records identified to a particular level of classification, searching for matches against scientificName is recommended.

Paraphyletic or polyphyletic groups may contain taxa identified to different taxonomic levels. In this case, it is simpler to use search_taxa(). In the example below, search_taxa() matches terms to one genus, three species, and two subspecies. This can then be used in atlas_counts() to get counts for each scientific name.

>>> tas_endemic = ["Sarcophilus", # Tasmanian Devil
...                 "Bettongia gaimardi", # Tasmanian Bettong
...                 "Melanodryas vittata", # Dusky Robin
...                 "Platycercus caledonicus",# Green Rosella
...                 "Aquila audax fleayi", # Tasmanian Wedge-Tailed Eagle
...                 "Tyto novaehollandiae castanops" # Tasmanian Masked Owl
...               ]
>>> galah.search_taxa(taxa=tas_endemic)
                          scientificName scientificNameAuthorship                                                             taxonConceptID        rank   kingdom    phylum            order        family        genus   issues                  species                vernacularName
0                            Sarcophilus             Cuvier, 1837       genus  Animalia  Chordata   Dasyuromorphia    Dasyuridae  Sarcophilus  noIssue                      NaN                           NaN
1                     Bettongia gaimardi        (Desmarest, 1822)     species  Animalia  Chordata    Diprotodontia    Potoroidae    Bettongia  noIssue       Bettongia gaimardi             Tasmanian Bettong
2      Melanodryas (Amaurodryas) vittata   (Quoy & Gaimard, 1830)     species  Animalia  Chordata    Passeriformes   Petroicidae  Melanodryas  noIssue      Melanodryas vittata                   Dusky Robin
3  Platycercus (Platycercus) caledonicus           (Gmelin, 1788)     species  Animalia  Chordata   Psittaciformes   Psittacidae  Platycercus  noIssue  Platycercus caledonicus                 Green Rosella
4         Aquila (Uroaetus) audax fleayi    Condon & Amadon, 1954  subspecies  Animalia  Chordata  Accipitriformes  Accipitridae       Aquila  noIssue             Aquila audax  Tasmanian Wedge-tailed Eagle
5         Tyto novaehollandiae castanops            (Gould, 1837)  subspecies  Animalia  Chordata     Strigiformes     Tytonidae         Tyto  noIssue     Tyto novaehollandiae          Tasmanian Masked Owl
>>> galah.atlas_counts(
...     taxa=tas_endemic,
...     group_by=["scientificName"],
...     expand=False
... )
                                       scientificName  count
0                      Aquila (Uroaetus) audax fleayi   4950
1                                  Bettongia gaimardi   1944
2                        Bettongia gaimardi cuniculus     41
3                         Bettongia gaimardi gaimardi      9
4                   Melanodryas (Amaurodryas) vittata  14139
5             Melanodryas (Amaurodryas) vittata kingi     15
6           Melanodryas (Amaurodryas) vittata vittata     39
7               Platycercus (Platycercus) caledonicus  43488
8       Platycercus (Platycercus) caledonicus brownii     24
9   Platycercus (Platycercus) caledonicus caledonicus     33
10                                        Sarcophilus      3
11                               Sarcophilus harrisii  36310
12                     Tyto novaehollandiae castanops     63
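
Because the search terms sit at mixed ranks, the grouped counts include child taxa (for example, Sarcophilus harrisii under Sarcophilus) and names that carry a subgenus. A minimal sketch of rolling those rows back up to the original search terms follows, using only the counts printed above. strip_subgenus and matched_term are hypothetical local helpers, not galah functions, and the prefix matching assumes that a child name begins with its parent name once the subgenus is removed.

```python
import re

search_terms = [
    "Sarcophilus", "Bettongia gaimardi", "Melanodryas vittata",
    "Platycercus caledonicus", "Aquila audax fleayi",
    "Tyto novaehollandiae castanops",
]

# scientificName -> count, copied from the output above
counts = {
    "Aquila (Uroaetus) audax fleayi": 4950,
    "Bettongia gaimardi": 1944,
    "Bettongia gaimardi cuniculus": 41,
    "Bettongia gaimardi gaimardi": 9,
    "Melanodryas (Amaurodryas) vittata": 14139,
    "Melanodryas (Amaurodryas) vittata kingi": 15,
    "Melanodryas (Amaurodryas) vittata vittata": 39,
    "Platycercus (Platycercus) caledonicus": 43488,
    "Platycercus (Platycercus) caledonicus brownii": 24,
    "Platycercus (Platycercus) caledonicus caledonicus": 33,
    "Sarcophilus": 3,
    "Sarcophilus harrisii": 36310,
    "Tyto novaehollandiae castanops": 63,
}


def strip_subgenus(name):
    """Drop a parenthesised subgenus: 'Pitta (Pitta) iris' -> 'Pitta iris'."""
    return re.sub(r"\s*\([^)]*\)", "", name).strip()


def matched_term(name):
    """Return the search term that a scientificName falls under, or None."""
    plain = strip_subgenus(name)
    for term in search_terms:
        if plain == term or plain.startswith(term + " "):
            return term
    return None


# Sum every row into the search term it belongs to
totals = {term: 0 for term in search_terms}
for name, count in counts.items():
    term = matched_term(name)
    if term is not None:
        totals[term] += count

for term, total in totals.items():
    print(f"{term}: {total}")
```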