Taxonomic Filtering

Callum Waite, Shandiya Balasubramaniam

Taxonomic complexity can confound searching, filtering, and downloading records with galah, but a few galah functions and arguments help ensure that records are not missed. Let’s start by configuring galah to use the Atlas of Living Australia (ALA).

>>> import galah
>>> galah.galah_config(atlas="Australia", email="your-email-here")

search_taxa()

search_taxa() lets users look up taxonomic names before downloading data, which helps to disambiguate homonyms and to check that the search term matches the taxon name in the ALA. It returns the scientific name, authorship, rank, and full classification of the taxon matched to the provided search term.

>>> galah.search_taxa(taxa="Petroica boodang")
                scientificName scientificNameAuthorship                                                             taxonConceptID     rank   kingdom    phylum          order       family     genus           species vernacularName   issues
0  Petroica (Petroica) boodang           (Lesson, 1838)  https://biodiversity.org.au/afd/taxa/a3e5376b-f9e6-4bdf-adae-1e7add9f5c29  species  Animalia  Chordata  Passeriformes  Petroicidae  Petroica  Petroica boodang  Scarlet Robin  noIssue

It can also return taxonomic information for multiple species at once, and it accepts synonyms and Indigenous names as search terms.

>>> # Muscicapa chrysoptera is a synonym for the Flame Robin, Petroica phoenicea
>>> # Guniibuu is the Yuwaalaraay Indigenous name for the Red-Capped Robin, Petroica goodenovii
>>> galah.search_taxa(taxa = ["Muscicapa chrysoptera", "Guniibuu"])
                   scientificName    scientificNameAuthorship                                                             taxonConceptID     rank   kingdom    phylum          order       family     genus              species    vernacularName   issues
0   Petroica (Littlera) phoenicea                 Gould, 1837  https://biodiversity.org.au/afd/taxa/fe74e658-4848-437a-a23d-f1001a198552  species  Animalia  Chordata  Passeriformes  Petroicidae  Petroica   Petroica phoenicea       Flame Robin  noIssue
1  Petroica (Petroica) goodenovii  (Vigors & Horsfield, 1827)  https://biodiversity.org.au/afd/taxa/10dbd908-00f3-4ec2-9a9c-a2fd4782eaf1  species  Animalia  Chordata  Passeriformes  Petroicidae  Petroica  Petroica goodenovii  Red-capped Robin  noIssue

Where homonyms exist, search_taxa() will prompt users to clarify the search term by providing one or more taxonomic ranks through its scientific_name argument. The genus name Morganella is used in three kingdoms; this example selects the fungal genus:

>>> galah.search_taxa(taxa = ["Morganella"])
Warning: Search returned multiple taxa due to a homonym issue.
Please use the `scientific_name` argument to clarify taxa.
  search_term   issues
0  Morganella  homonym
>>> galah.search_taxa(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
  scientificName scientificNameAuthorship                                      taxonConceptID   rank kingdom         phylum       order       family       genus   issues
0     Morganella                   Zeller  https://id.biodiversity.org.au/node/fungi/60091999  genus   Fungi  Basidiomycota  Agaricales  Agaricaceae  Morganella  noIssue

This disambiguation of the Morganella taxa can then be used by atlas_counts, atlas_occurrences, atlas_species or atlas_media by providing the keyword scientific_name to any of these functions.

>>> galah.atlas_counts(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
   totalRecords
0           160
>>> galah.atlas_occurrences(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
     decimalLatitude  decimalLongitude             eventDate           scientificName                                      taxonConceptID                              recordID                                                      dataResourceName occurrenceStatus
0         -47.000000        168.200000  2002-04-30T00:00:00Z      Morganella compacta                                       NZOR-6-128055  f2d0672f-0f8d-467f-a70d-af0ad257743a                       New Zealand Fungal and Plant Disease Collection          PRESENT
1         -46.936200        168.126000  2021-04-15T00:00:00Z      Morganella compacta                                       NZOR-6-128055  450d6089-a004-4a4b-b281-50f60e7596bf                       New Zealand Fungal and Plant Disease Collection          PRESENT
2         -46.874875        168.124660  1983-02-13T00:00:00Z      Morganella compacta                                       NZOR-6-128055  eaae280b-c1de-463a-8cc6-3ee72cc00e83                       New Zealand Fungal and Plant Disease Collection          PRESENT
3         -46.862757        168.116777  1985-04-23T00:00:00Z      Morganella compacta                                       NZOR-6-128055  a08519f3-221b-4006-9803-98d8eeb771a8                       New Zealand Fungal and Plant Disease Collection          PRESENT
4         -46.554617        169.479051  1990-05-24T00:00:00Z      Morganella compacta                                       NZOR-6-128055  4b2011f7-ef20-428d-af8f-dc17c9656df6                       New Zealand Fungal and Plant Disease Collection          PRESENT
..               ...               ...                   ...                      ...                                                 ...                                   ...                                                                   ...              ...
155       -26.779400        152.880300  2009-02-19T00:00:00Z  Morganella purpurascens  https://id.biodiversity.org.au/node/fungi/60092001  092b6f3e-ef27-4cbf-bb28-b172c1b200c5                         National Herbarium of Victoria (MEL) AVH data          PRESENT
156       -26.777741        152.880254  2012-02-15T00:00:00Z  Morganella purpurascens  https://id.biodiversity.org.au/node/fungi/60092001  8b7a0609-2c0b-42fc-9676-ba7ef9a73a5e                                                             BowerBird          PRESENT
157              NaN               NaN  1964-11-21T00:00:00Z  Morganella purpurascens  https://id.biodiversity.org.au/node/fungi/60092001  4c74c65a-d9b8-4e81-9101-a283eec56e5a                         National Herbarium of Victoria (MEL) AVH data          PRESENT
158       -26.405958        153.026140  2018-03-16T01:01:09Z  Morganella purpurascens  https://id.biodiversity.org.au/node/fungi/60092001  fc2816fd-4b65-4af1-b0a2-1b086c26d97f                                     ALA species sightings and OzAtlas          PRESENT
159        -8.916667        148.150000  1953-07-15T00:00:00Z  Morganella purpurascens  https://id.biodiversity.org.au/node/fungi/60092001  633433fe-a3b4-4b4f-af12-36b54c1fd9ff  Centre for Australian National Biodiversity Research (CANB) AVH data          PRESENT

[160 rows x 8 columns]
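
When several homonyms need to be resolved, the scientific_name dictionaries shown above can be assembled with a small helper. The sketch below is only an illustration: homonym_query is a hypothetical convenience function, not part of galah, and it does nothing more than build a dictionary in the shape used in the examples above.

```python
def homonym_query(name, **ranks):
    """Build a dict in the shape passed to scientific_name= above.

    `homonym_query` is a hypothetical helper, not a galah function.
    Each rank (e.g. kingdom="Fungi") becomes a one-element list,
    mirroring the examples in this section.
    """
    if not ranks:
        raise ValueError("provide at least one rank, e.g. kingdom='Fungi'")
    query = {rank: [value] for rank, value in ranks.items()}
    query["scientificName"] = [name]
    return query


fungal_morganella = homonym_query("Morganella", kingdom="Fungi")
print(fungal_morganella)
```

The resulting dictionary can then be supplied wherever the examples above pass scientific_name=, for instance galah.atlas_counts(scientific_name=fungal_morganella).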

filters=

filters= subsets records by searching for exact matches to an expression, and may also be used for taxonomic filtering. For example, we can search for a single species of Australian robin or for several species at once. We can also group the results by species name to compare the number of records for each robin.

>>> galah.atlas_counts(taxa="Petroica boodang")
   totalRecords
0        132920
>>> aus_petroica = ["Petroica boodang", "Petroica goodenovii",
...                 "Petroica phoenicea", "Petroica rosea",
...                 "Petroica rodinogaster", "Petroica multicolor"]
>>> galah.atlas_counts(
...     taxa=aus_petroica,
...     group_by=["species","vernacularName"]
... )
                  species               vernacularName   count
0        Petroica boodang        Eastern Scarlet Robin    3593
1        Petroica boodang                Scarlet Robin  129112
2        Petroica boodang  South-western Scarlet Robin     166
3        Petroica boodang      Tasmanian Scarlet Robin      49
4     Petroica goodenovii             Red-capped Robin  119863
5     Petroica multicolor                Pacific Robin    6703
6      Petroica phoenicea                  Flame Robin   88405
7   Petroica rodinogaster          Mainland Pink Robin      60
8   Petroica rodinogaster                   Pink Robin   15629
9   Petroica rodinogaster         Tasmanian Pink Robin      21
10         Petroica rosea                   Rose Robin   60078
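
Because grouping by both species and vernacularName splits some species across several rows, it can help to add the rows back up per species. The following is a minimal sketch using plain Python tuples copied from the output above (not a live query); the variable names are illustrative only.

```python
from collections import defaultdict

# (species, vernacularName, count) rows copied from the output above
rows = [
    ("Petroica boodang", "Eastern Scarlet Robin", 3593),
    ("Petroica boodang", "Scarlet Robin", 129112),
    ("Petroica boodang", "South-western Scarlet Robin", 166),
    ("Petroica boodang", "Tasmanian Scarlet Robin", 49),
    ("Petroica goodenovii", "Red-capped Robin", 119863),
    ("Petroica multicolor", "Pacific Robin", 6703),
    ("Petroica phoenicea", "Flame Robin", 88405),
    ("Petroica rodinogaster", "Mainland Pink Robin", 60),
    ("Petroica rodinogaster", "Pink Robin", 15629),
    ("Petroica rodinogaster", "Tasmanian Pink Robin", 21),
    ("Petroica rosea", "Rose Robin", 60078),
]

# Sum the counts of every vernacular name belonging to the same species
totals = defaultdict(int)
for species, _vernacular, count in rows:
    totals[species] += count

for species in sorted(totals):
    print(species, totals[species])
```

The four Petroica boodang rows add up to 132920, which matches the single-species count returned at the start of this section.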

filters= can be useful when searching for paraphyletic or polyphyletic groups. For example, to get counts of animals that are not chordates, grouped by phylum:

>>> non_chordates = galah.atlas_counts(
...     filters=["kingdom=Animalia","phylum!=Chordata"],
...     group_by=["phylum"],
...     expand=False
... )
>>> non_chordates.head()
           phylum    count
0  Acanthocephala      403
1        Annelida   320633
2      Arthropoda  8970751
3     Brachiopoda     2181
4         Bryozoa    25617
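
The filter expressions above are plain strings of the form field=value or field!=value. As a rough sketch, a hypothetical helper (taxon_filters is not a galah function) could build such a list from the ranks to keep and the ranks to exclude:

```python
def taxon_filters(include=None, exclude=None):
    """Return filter strings in the 'field=value' / 'field!=value' form.

    `taxon_filters` is a hypothetical helper for illustration only;
    it just formats strings like those passed to filters= above.
    """
    include = include or {}
    exclude = exclude or {}
    filters = [f"{field}={value}" for field, value in include.items()]
    filters += [f"{field}!={value}" for field, value in exclude.items()]
    return filters


non_chordate_filters = taxon_filters(
    include={"kingdom": "Animalia"},
    exclude={"phylum": "Chordata"},
)
print(non_chordate_filters)
```

The list it prints is the same as the one written by hand in the non-chordate example, so it could be passed to filters= in the same way.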

filters=, search_taxa(), and taxonomic ranks

Deciding between using filters= and search_taxa() in a query comes down to how a record has been classified, and whether or not you have the correct unique name and classification of the taxa of interest.

The ALA has fields for the primary taxonomic ranks (kingdom, phylum, class, order, family, genus, species) and some secondary ranks (e.g. subfamily, subgenus), all of which may be used with filters= and search_taxa(). Additionally, there is a field named scientificName, which holds the name at the lowest taxonomic rank to which a record has been identified, for example:

>>> import numpy as np
>>> pitta_ranks = galah.atlas_counts(
...     taxa="Pitta",
...     group_by=["scientificName","taxonRank"]
... )
>>> pitta_ranks = pitta_ranks.loc[pitta_ranks["scientificName"].notnull()]
>>> pitta_ranks
                                 scientificName   taxonRank  count
0                                         Pitta       genus     68
1                          Pitta (Erythropitta)    subgenus    883
2            Pitta (Erythropitta) erythrogaster     species    181
3   Pitta (Erythropitta) erythrogaster digglesi  subspecies      6
4                            Pitta (Pitta) iris     species   6548
5                       Pitta (Pitta) iris iris  subspecies     83
6              Pitta (Pitta) iris johnstoneiana  subspecies     27
7                      Pitta (Pitta) versicolor     species  30024
8           Pitta (Pitta) versicolor intermedia  subspecies     42
9            Pitta (Pitta) versicolor simillima  subspecies     38
10          Pitta (Pitta) versicolor versicolor  subspecies    311

If, for instance, you have the correct species or subspecies name, then searching for matches against the species and subspecies fields, respectively, will provide more precise results. This is because the field scientificName may include subgenera (for example, Pitta (Pitta) iris), so a plain binomial will not match it exactly. If you’ve used search_taxa() to get the ALA-matched name of a taxon and only want records identified to a particular level of classification, searching for matches against scientificName is recommended.
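
To make the difference concrete, the sketch below works on rows copied from the pitta_ranks output above rather than on a live query. It assumes that the species field of a subspecies record holds the parent species (the binomial helper, which is hypothetical, imitates this by dropping the subgenus and any subspecies epithet), and compares that with an exact match on scientificName.

```python
import re

# (scientificName, taxonRank, count) rows copied from pitta_ranks above
pitta_rows = [
    ("Pitta", "genus", 68),
    ("Pitta (Erythropitta)", "subgenus", 883),
    ("Pitta (Erythropitta) erythrogaster", "species", 181),
    ("Pitta (Erythropitta) erythrogaster digglesi", "subspecies", 6),
    ("Pitta (Pitta) iris", "species", 6548),
    ("Pitta (Pitta) iris iris", "subspecies", 83),
    ("Pitta (Pitta) iris johnstoneiana", "subspecies", 27),
    ("Pitta (Pitta) versicolor", "species", 30024),
    ("Pitta (Pitta) versicolor intermedia", "subspecies", 42),
    ("Pitta (Pitta) versicolor simillima", "subspecies", 38),
    ("Pitta (Pitta) versicolor versicolor", "subspecies", 311),
]


def binomial(scientific_name):
    """Drop any '(Subgenus)' part and keep genus + specific epithet.

    Hypothetical stand-in for the species field; returns None for
    names above species rank (genus, subgenus).
    """
    words = re.sub(r"\([^)]*\)", " ", scientific_name).split()
    return " ".join(words[:2]) if len(words) >= 2 else None


# Exact match on scientificName: records identified to species level only
exact = sum(c for name, _rank, c in pitta_rows if name == "Pitta (Pitta) iris")

# Match on the (assumed) species field: species plus its subspecies
species_level = sum(c for name, _rank, c in pitta_rows if binomial(name) == "Pitta iris")

print(exact, species_level)
```

Under these assumptions the exact scientificName match covers only the records identified to species level, while the species-level match also picks up the two subspecies rows.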

Paraphyletic or polyphyletic groups may contain taxa identified to different taxonomic levels. In this case, it is simpler to use search_taxa(). In the example below, search_taxa() matches terms to one genus, three species, and two subspecies. This can then be used in atlas_counts() to get counts for each scientific name.

>>> tas_endemic = ["Sarcophilus", # Tasmanian Devil
...                 "Bettongia gaimardi", # Tasmanian Bettong
...                 "Melanodryas vittata", # Dusky Robin
...                 "Platycercus caledonicus",# Green Rosella
...                 "Aquila audax fleayi", # Tasmanian Wedge-Tailed Eagle
...                 "Tyto novaehollandiae castanops" # Tasmanian Masked Owl
...               ]
>>> galah.search_taxa(taxa=tas_endemic)
                          scientificName scientificNameAuthorship                                                             taxonConceptID        rank   kingdom    phylum            order        family        genus   issues                  species                vernacularName
0                            Sarcophilus             Cuvier, 1837  https://biodiversity.org.au/afd/taxa/06455b77-7d50-4ec7-9122-8ab48cfb0c1c       genus  Animalia  Chordata   Dasyuromorphia    Dasyuridae  Sarcophilus  noIssue                      NaN                           NaN
1                     Bettongia gaimardi        (Desmarest, 1822)  https://biodiversity.org.au/afd/taxa/8f7da937-6338-4c39-8b11-4f83807afe11     species  Animalia  Chordata    Diprotodontia    Potoroidae    Bettongia  noIssue       Bettongia gaimardi             Tasmanian Bettong
2      Melanodryas (Amaurodryas) vittata   (Quoy & Gaimard, 1830)  https://biodiversity.org.au/afd/taxa/0f04889f-5489-4369-a545-8a041fba9f6d     species  Animalia  Chordata    Passeriformes   Petroicidae  Melanodryas  noIssue      Melanodryas vittata                   Dusky Robin
3  Platycercus (Platycercus) caledonicus           (Gmelin, 1788)  https://biodiversity.org.au/afd/taxa/c6e478fe-f199-463f-8576-a77108fd73e2     species  Animalia  Chordata   Psittaciformes   Psittacidae  Platycercus  noIssue  Platycercus caledonicus                 Green Rosella
4         Aquila (Uroaetus) audax fleayi    Condon & Amadon, 1954  https://biodiversity.org.au/afd/taxa/ac93f7f0-0686-4589-801a-5832378cb7c1  subspecies  Animalia  Chordata  Accipitriformes  Accipitridae       Aquila  noIssue             Aquila audax  Tasmanian Wedge-tailed Eagle
5         Tyto novaehollandiae castanops            (Gould, 1837)  https://biodiversity.org.au/afd/taxa/2c30d58b-572b-4dab-8644-b222c28eb0ec  subspecies  Animalia  Chordata     Strigiformes     Tytonidae         Tyto  noIssue     Tyto novaehollandiae          Tasmanian Masked Owl
>>> galah.atlas_counts(
...     taxa=tas_endemic,
...     group_by=["scientificName"],
...     expand=False
... )
                                       scientificName  count
0                      Aquila (Uroaetus) audax fleayi   5026
1                                  Bettongia gaimardi   1960
2                        Bettongia gaimardi cuniculus     41
3                         Bettongia gaimardi gaimardi      9
4                   Melanodryas (Amaurodryas) vittata  15709
5             Melanodryas (Amaurodryas) vittata kingi     16
6           Melanodryas (Amaurodryas) vittata vittata     45
7               Platycercus (Platycercus) caledonicus  51142
8       Platycercus (Platycercus) caledonicus brownii     24
9   Platycercus (Platycercus) caledonicus caledonicus     33
10                                        Sarcophilus      3
11                               Sarcophilus harrisii  36355
12                     Tyto novaehollandiae castanops     67
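
The grouped counts above list child taxa (species under the genus, subspecies under each species) as separate rows. As a minimal sketch, again using values copied from the output above rather than a live query, the rows can be rolled up to the original search terms by stripping the subgenus from each scientificName and matching on the leading words. The strip_subgenus helper is hypothetical and relies on simple string matching, which may not hold for every name in the ALA.

```python
import re

tas_endemic = [
    "Sarcophilus", "Bettongia gaimardi", "Melanodryas vittata",
    "Platycercus caledonicus", "Aquila audax fleayi",
    "Tyto novaehollandiae castanops",
]

# (scientificName, count) rows copied from the output above
count_rows = [
    ("Aquila (Uroaetus) audax fleayi", 5026),
    ("Bettongia gaimardi", 1960),
    ("Bettongia gaimardi cuniculus", 41),
    ("Bettongia gaimardi gaimardi", 9),
    ("Melanodryas (Amaurodryas) vittata", 15709),
    ("Melanodryas (Amaurodryas) vittata kingi", 16),
    ("Melanodryas (Amaurodryas) vittata vittata", 45),
    ("Platycercus (Platycercus) caledonicus", 51142),
    ("Platycercus (Platycercus) caledonicus brownii", 24),
    ("Platycercus (Platycercus) caledonicus caledonicus", 33),
    ("Sarcophilus", 3),
    ("Sarcophilus harrisii", 36355),
    ("Tyto novaehollandiae castanops", 67),
]


def strip_subgenus(name):
    """Remove any '(Subgenus)' part, e.g. 'Aquila (Uroaetus) audax' -> 'Aquila audax'."""
    return " ".join(re.sub(r"\([^)]*\)", " ", name).split())


# Add each row to the search term that its name equals or starts with
totals = {term: 0 for term in tas_endemic}
for name, count in count_rows:
    plain = strip_subgenus(name)
    for term in tas_endemic:
        if plain == term or plain.startswith(term + " "):
            totals[term] += count
            break

for term, total in totals.items():
    print(term, total)
```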