Taxonomic Filtering

Callum Waite, Shandiya Balasubramaniam

Taxonomic complexity can confound the process of searching for, filtering, and downloading records with galah, but galah provides several functions that help ensure records are not missed. Let's start by configuring galah to use the Atlas of Living Australia (ALA).

>>> import galah
>>> galah.galah_config(atlas="Australia", email="your-email-here")

search_taxa()

search_taxa() enables users to look up taxonomic names before downloading data. This makes it possible to disambiguate homonyms and to check that a search term matches the taxon name used in the ALA. search_taxa() returns the scientific name, authorship, rank, and full classification of the taxon matched to each search term.

>>> galah.search_taxa(taxa="Petroica boodang")
                scientificName scientificNameAuthorship                                                             taxonConceptID     rank   matchType   kingdom    phylum classs          order       family     genus           species     issues vernacularName
0  Petroica (Petroica) boodang           (Lesson, 1838)  https://biodiversity.org.au/afd/taxa/a3e5376b-f9e6-4bdf-adae-1e7add9f5c29  species  exactMatch  Animalia  Chordata   Aves  Passeriformes  Petroicidae  Petroica  Petroica boodang  [noIssue]  Scarlet Robin

It can also return taxonomic information for multiple species at once, including species searched by synonyms and Indigenous names.

>>> # Muscicapa chrysoptera is a synonym for the Flame Robin, Petroica phoenicea
>>> # Guniibuu is the Yuwaalaraay Indigenous name for the Red-Capped Robin, Petroica goodenovii
>>> galah.search_taxa(taxa = ["Muscicapa chrysoptera", "Guniibuu"])
                   scientificName    scientificNameAuthorship                                                             taxonConceptID     rank        matchType   kingdom    phylum classs          order       family     genus              species     issues    vernacularName
0   Petroica (Littlera) phoenicea                 Gould, 1837  https://biodiversity.org.au/afd/taxa/fe74e658-4848-437a-a23d-f1001a198552  species       exactMatch  Animalia  Chordata   Aves  Passeriformes  Petroicidae  Petroica   Petroica phoenicea  [noIssue]       Flame Robin
1  Petroica (Petroica) goodenovii  (Vigors & Horsfield, 1827)  https://biodiversity.org.au/afd/taxa/10dbd908-00f3-4ec2-9a9c-a2fd4782eaf1  species  vernacularMatch  Animalia  Chordata   Aves  Passeriformes  Petroicidae  Petroica  Petroica goodenovii  [noIssue]  Red-capped Robin
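The matchType column records how each search term was resolved: the synonym above is an exactMatch, whereas the Indigenous name is resolved through a vernacularMatch. A minimal sketch of flagging non-exact matches for a manual check, working on plain dictionaries with values copied from the output above. The helper name non_exact_matches is hypothetical (not a galah function), and it assumes rows come back in the same order as the search terms, as they do here.

```python
# Hypothetical helper (not part of galah): list the search terms that were not
# resolved by an exact name match, so they can be checked by hand.
# Assumes rows are returned in the same order as the search terms.
def non_exact_matches(search_terms, rows):
    """Pair each search term with its matched row and keep non-exact matches."""
    return [
        (term, row["scientificName"], row["matchType"])
        for term, row in zip(search_terms, rows)
        if row["matchType"] != "exactMatch"
    ]

# Search terms and matched values copied from the search_taxa() output above
terms = ["Muscicapa chrysoptera", "Guniibuu"]
rows = [
    {"scientificName": "Petroica (Littlera) phoenicea", "matchType": "exactMatch"},
    {"scientificName": "Petroica (Petroica) goodenovii", "matchType": "vernacularMatch"},
]

for term, name, how in non_exact_matches(terms, rows):
    print(f"{term!r} -> {name} ({how})")
# 'Guniibuu' -> Petroica (Petroica) goodenovii (vernacularMatch)
```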

Where homonyms exist, search_taxa() flags the match with a homonym issue, and users need to clarify the search term by supplying one or more taxonomic ranks through the scientific_name argument of search_taxa(). The genus name Morganella is used in three kingdoms; this example specifies the kingdom to select the fungal genus:

>>> galah.search_taxa(taxa = ["Morganella"])
  scientificName scientificNameAuthorship taxonConceptID rank matchType kingdom phylum classs order family genus species     issues vernacularName
0     Morganella                                                                                                          [homonym]
>>> galah.search_taxa(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
  scientificName scientificNameAuthorship                                      taxonConceptID   rank   matchType kingdom         phylum          classs       order       family       genus species   issues vernacularName
0     Morganella                   Zeller  https://id.biodiversity.org.au/name/fungi/60015036  genus  exactMatch   Fungi  Basidiomycota  Agaricomycetes  Agaricales  Agaricaceae  Morganella          noIssue
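Because the homonym flag appears in the issues column, it can be checked programmatically to spot search terms that need more context. The sketch below uses two hypothetical helpers (not part of galah) that detect the flag and assemble a dictionary in the same shape as the scientific_name argument used above; it only builds the query and does not contact the ALA.

```python
# Hypothetical helpers (not part of galah)
def is_homonym(row):
    """True when search_taxa() flagged the match as a homonym."""
    return "homonym" in row.get("issues", [])

def build_scientific_name(name, **higher_ranks):
    """Build a dict shaped like the scientific_name argument used above,
    e.g. {"kingdom": ["Fungi"], "scientificName": ["Morganella"]}."""
    query = {rank: [value] for rank, value in higher_ranks.items()}
    query["scientificName"] = [name]
    return query

# Row values copied from the first search_taxa() output above
row = {"scientificName": "Morganella", "issues": ["homonym"]}
if is_homonym(row):
    query = build_scientific_name(row["scientificName"], kingdom="Fungi")
    print(query)  # {'kingdom': ['Fungi'], 'scientificName': ['Morganella']}
```

The resulting dictionary has the same form as the one passed to scientific_name= in the calls on this page.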

The same disambiguated query can then be used by atlas_counts(), atlas_occurrences(), atlas_species() or atlas_media() by passing it to the scientific_name keyword argument of any of these functions.

>>> galah.atlas_counts(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
   totalRecords
0           183
>>> galah.atlas_occurrences(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
     decimalLatitude  decimalLongitude             eventDate           scientificName                                      taxonConceptID                              recordID                                                      dataResourceName occurrenceStatus
0         -47.000000        168.200000  2002-04-30T00:00:00Z      Morganella compacta                                       NZOR-6-128055  f2d0672f-0f8d-467f-a70d-af0ad257743a                       New Zealand Fungal and Plant Disease Collection          PRESENT
1         -46.879900        168.136500  2021-04-15T00:00:00Z      Morganella compacta                                       NZOR-6-128055  450d6089-a004-4a4b-b281-50f60e7596bf                       New Zealand Fungal and Plant Disease Collection          PRESENT
2         -46.874875        168.124660  1983-02-13T00:00:00Z      Morganella compacta                                       NZOR-6-128055  eaae280b-c1de-463a-8cc6-3ee72cc00e83                       New Zealand Fungal and Plant Disease Collection          PRESENT
3         -46.862757        168.116777  1985-04-23T00:00:00Z      Morganella compacta                                       NZOR-6-128055  a08519f3-221b-4006-9803-98d8eeb771a8                       New Zealand Fungal and Plant Disease Collection          PRESENT
4         -46.554617        169.479051  1990-05-24T00:00:00Z      Morganella compacta                                       NZOR-6-128055  4b2011f7-ef20-428d-af8f-dc17c9656df6                       New Zealand Fungal and Plant Disease Collection          PRESENT
..               ...               ...                   ...                      ...                                                 ...                                   ...                                                                   ...              ...
178       -15.831780        145.335830  2020-01-20T13:20:00Z  Morganella purpurascens  https://id.biodiversity.org.au/name/fungi/60022638  aaa77490-aee9-44be-9b68-4fc8efe240f3                                                 iNaturalist Australia          PRESENT
179       -13.200000        130.700000  2014-01-25T00:00:00Z  Morganella purpurascens  https://id.biodiversity.org.au/name/fungi/60022638  e022611a-52ea-4530-8bc1-bfcef1eee43c                                                       INSDC Sequences          PRESENT
180              NaN               NaN                   NaN  Morganella purpurascens  https://id.biodiversity.org.au/name/fungi/60022638  be5f2bef-5036-4817-b7aa-aebfa45f4260                      Royal Botanic Gardens, Kew - Fungarium Specimens          PRESENT
181       -13.197500        130.699722  2014-01-25T00:00:00Z  Morganella purpurascens  https://id.biodiversity.org.au/name/fungi/60022638  61495a3c-d32d-4690-8474-5b5d778c9d79                         National Herbarium of Victoria (MEL) AVH data          PRESENT
182        -8.916700        148.150000  1953-07-15T00:00:00Z  Morganella purpurascens  https://id.biodiversity.org.au/name/fungi/60022638  633433fe-a3b4-4b4f-af12-36b54c1fd9ff  Centre for Australian National Biodiversity Research (CANB) AVH data          PRESENT

[183 rows x 8 columns]

filters=

filters= subsets records by searching for exact matches to an expression, and can also be used for taxonomic filtering. As a starting point, suppose we want to count records of robins in Australia. We can query a single species or several species at once with taxa=, and group the results by species and vernacular name to compare the number of records for each robin.

>>> galah.atlas_counts(taxa="Petroica boodang")
   totalRecords
0        176213
>>> aus_petroica = ["Petroica boodang", "Petroica goodenovii",
...                 "Petroica phoenicea", "Petroica rosea",
...                 "Petroica rodinogaster", "Petroica multicolor"]
>>> galah.atlas_counts(
...     taxa=aus_petroica,
...     group_by=["species","vernacularName"]
... )
             species               vernacularName   count
0   Petroica boodang        Eastern Scarlet Robin   29222
1   Petroica boodang                  Flame Robin  103634
2   Petroica boodang          Mainland Pink Robin    1105
3   Petroica boodang                Pacific Robin    6870
4   Petroica boodang                   Pink Robin   16924
..               ...                          ...     ...
61    Petroica rosea                   Rose Robin   78616
62    Petroica rosea                Scarlet Robin  113193
63    Petroica rosea  South-western Scarlet Robin   13275
64    Petroica rosea         Tasmanian Pink Robin    3877
65    Petroica rosea      Tasmanian Scarlet Robin   20523

[66 rows x 3 columns]

filters= is particularly useful when searching for paraphyletic or polyphyletic groups, which no single taxon name captures. For example, to get counts of animals that are not chordates:

>>> non_chordates = galah.atlas_counts(
...     filters=["kingdom=Animalia","phylum!=Chordata"],
...     group_by=["phylum"],
...     expand=False
... )
>>> non_chordates.head()
           phylum     count
0  Acanthocephala       486
1        Annelida    356369
2      Arthropoda  12056771
3     Brachiopoda      3145
4         Bryozoa     36260
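To make the semantics of these filter strings concrete, here is a local sketch that applies "field=value" and "field!=value" expressions to a few in-memory records. It assumes, as the example above implies, that all filters must hold at once (logical AND). It is an illustration only, not how galah or the ALA evaluate queries, and the function names are hypothetical.

```python
import re

# Local illustration only: evaluate ALA-style filter strings such as
# "kingdom=Animalia" or "phylum!=Chordata" against in-memory records.
# Assumes every filter must hold for a record to be kept (logical AND).
FILTER_PATTERN = re.compile(r"^(\w+)(!=|=)(.+)$")

def parse_filter(expression):
    """Split "field=value" or "field!=value" into (field, operator, value)."""
    match = FILTER_PATTERN.match(expression)
    if match is None:
        raise ValueError(f"unsupported filter: {expression!r}")
    return match.groups()

def apply_filters(records, filters):
    parsed = [parse_filter(f) for f in filters]
    return [
        record for record in records
        if all(
            (record.get(field) == value) if op == "=" else (record.get(field) != value)
            for field, op, value in parsed
        )
    ]

records = [
    {"scientificName": "Petroica boodang", "kingdom": "Animalia", "phylum": "Chordata"},
    {"scientificName": "Lumbricus terrestris", "kingdom": "Animalia", "phylum": "Annelida"},
    {"scientificName": "Apis mellifera", "kingdom": "Animalia", "phylum": "Arthropoda"},
    {"scientificName": "Morganella purpurascens", "kingdom": "Fungi", "phylum": "Basidiomycota"},
]

non_chordates = apply_filters(records, ["kingdom=Animalia", "phylum!=Chordata"])
names = [r["scientificName"] for r in non_chordates]
print(names)  # ['Lumbricus terrestris', 'Apis mellifera']
```

The kingdom filter removes the fungus and the negated phylum filter removes the robin, which mirrors how the non-chordate query combines an inclusion and an exclusion.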

filters=, search_taxa(), and taxonomic ranks

Choosing between filters= and search_taxa() in a query comes down to how records have been classified, and whether you have the correct unique name and classification for the taxon of interest.

The ALA has fields for the primary taxonomic ranks (kingdom, phylum, class, order, family, genus, species) and some secondary ranks (e.g. subfamily, subgenus), all of which may be used with filters= and search_taxa(). Additionally, the field scientificName holds the name of the lowest taxonomic rank to which a record has been identified, with the rank itself stored in taxonRank. For example:

>>> pitta_ranks = galah.atlas_counts(
...     taxa="Pitta",
...     group_by=["scientificName","taxonRank"]
... )
>>> pitta_ranks = pitta_ranks.loc[pitta_ranks["scientificName"].notnull()]
>>> pitta_ranks
                                 scientificName   taxonRank  count
0                                         Pitta     species  40239
1                                         Pitta  subspecies   8912
2                                         Pitta       genus    234
3                                         Pitta    subgenus      6
4                          Pitta (Erythropitta)     species  40239
5                          Pitta (Erythropitta)  subspecies   8912
6                          Pitta (Erythropitta)       genus    234
7                          Pitta (Erythropitta)    subgenus      6
8            Pitta (Erythropitta) erythrogaster     species  40239
9            Pitta (Erythropitta) erythrogaster  subspecies   8912
10           Pitta (Erythropitta) erythrogaster       genus    234
11           Pitta (Erythropitta) erythrogaster    subgenus      6
12  Pitta (Erythropitta) erythrogaster digglesi     species  40239
13  Pitta (Erythropitta) erythrogaster digglesi  subspecies   8912
14  Pitta (Erythropitta) erythrogaster digglesi       genus    234
15  Pitta (Erythropitta) erythrogaster digglesi    subgenus      6
16                           Pitta (Pitta) iris     species  40239
17                           Pitta (Pitta) iris  subspecies   8912
18                           Pitta (Pitta) iris       genus    234
19                           Pitta (Pitta) iris    subgenus      6
20                      Pitta (Pitta) iris iris     species  40239
21                      Pitta (Pitta) iris iris  subspecies   8912
22                      Pitta (Pitta) iris iris       genus    234
23                      Pitta (Pitta) iris iris    subgenus      6
24             Pitta (Pitta) iris johnstoneiana     species  40239
25             Pitta (Pitta) iris johnstoneiana  subspecies   8912
26             Pitta (Pitta) iris johnstoneiana       genus    234
27             Pitta (Pitta) iris johnstoneiana    subgenus      6
28                     Pitta (Pitta) versicolor     species  40239
29                     Pitta (Pitta) versicolor  subspecies   8912
30                     Pitta (Pitta) versicolor       genus    234
31                     Pitta (Pitta) versicolor    subgenus      6
32          Pitta (Pitta) versicolor intermedia     species  40239
33          Pitta (Pitta) versicolor intermedia  subspecies   8912
34          Pitta (Pitta) versicolor intermedia       genus    234
35          Pitta (Pitta) versicolor intermedia    subgenus      6
36           Pitta (Pitta) versicolor simillima     species  40239
37           Pitta (Pitta) versicolor simillima  subspecies   8912
38           Pitta (Pitta) versicolor simillima       genus    234
39           Pitta (Pitta) versicolor simillima    subgenus      6
40          Pitta (Pitta) versicolor versicolor     species  40239
41          Pitta (Pitta) versicolor versicolor  subspecies   8912
42          Pitta (Pitta) versicolor versicolor       genus    234
43          Pitta (Pitta) versicolor versicolor    subgenus      6

If, for instance, you have the correct species or subspecies name, then searching for matches against the species or subspecies field, respectively, will give more precise results. This is because the scientificName field may include a subgenus (e.g. Pitta (Pitta) iris). If you've used search_taxa() to get the ALA-matched name of a taxon and only want records identified to that particular level of classification, searching for matches against scientificName is recommended.
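A scientificName such as Pitta (Pitta) iris carries the subgenus in parentheses, whereas the species column returned by search_taxa() holds only the binomial (e.g. Petroica boodang for Petroica (Petroica) boodang). A minimal sketch with hypothetical helpers (not part of galah) that strips a parenthesised subgenus and builds a filter string in the field=value form used above. It assumes the subgenus is always a single capitalised word in parentheses directly after the genus, as in the outputs on this page; check the exact stored value with search_taxa() before relying on it.

```python
import re

# Hypothetical helpers (not part of galah).
# Assumes a subgenus appears as one parenthesised, capitalised word after the genus.
SUBGENUS = re.compile(r"\s+\([A-Z][a-z]+\)")

def strip_subgenus(scientific_name):
    """Drop the subgenus, e.g. "Pitta (Pitta) iris" -> "Pitta iris"."""
    return SUBGENUS.sub("", scientific_name)

def rank_filter(scientific_name, rank="species"):
    """Build a "field=value" filter string against the species or subspecies field."""
    return f"{rank}={strip_subgenus(scientific_name)}"

print(strip_subgenus("Pitta (Pitta) iris"))                  # Pitta iris
print(rank_filter("Pitta (Erythropitta) erythrogaster"))     # species=Pitta erythrogaster
print(rank_filter("Pitta (Pitta) iris johnstoneiana", rank="subspecies"))
# subspecies=Pitta iris johnstoneiana
```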

Paraphyletic or polyphyletic groups may contain taxa identified to different taxonomic levels. In this case, it is simpler to use search_taxa(). In the example below, search_taxa() matches the search terms to one genus, three species, and two subspecies. The same list can then be passed to atlas_counts() to get counts for each scientific name.

>>> tas_endemic = ["Sarcophilus",                     # Tasmanian Devil
...                "Bettongia gaimardi",              # Tasmanian Bettong
...                "Melanodryas vittata",             # Dusky Robin
...                "Platycercus caledonicus",         # Green Rosella
...                "Aquila audax fleayi",             # Tasmanian Wedge-tailed Eagle
...                "Tyto novaehollandiae castanops"]  # Tasmanian Masked Owl
>>> galah.search_taxa(taxa=tas_endemic)
                          scientificName scientificNameAuthorship                                                             taxonConceptID        rank   matchType   kingdom    phylum    classs            order        family        genus                  species     issues                vernacularName
0                            Sarcophilus             Cuvier, 1837  https://biodiversity.org.au/afd/taxa/aa40a75d-d499-4339-8a5d-c333a29cea1c       genus  exactMatch  Animalia  Chordata  Mammalia   Dasyuromorphia    Dasyuridae  Sarcophilus                           [noIssue]                              
1                     Bettongia gaimardi        (Desmarest, 1822)  https://biodiversity.org.au/afd/taxa/8f7da937-6338-4c39-8b11-4f83807afe11     species  exactMatch  Animalia  Chordata  Mammalia    Diprotodontia    Potoroidae    Bettongia       Bettongia gaimardi  [noIssue]             Tasmanian Bettong
2      Melanodryas (Amaurodryas) vittata   (Quoy & Gaimard, 1830)  https://biodiversity.org.au/afd/taxa/0f04889f-5489-4369-a545-8a041fba9f6d     species  exactMatch  Animalia  Chordata      Aves    Passeriformes   Petroicidae  Melanodryas      Melanodryas vittata  [noIssue]                   Dusky Robin
3  Platycercus (Platycercus) caledonicus           (Gmelin, 1788)  https://biodiversity.org.au/afd/taxa/c6e478fe-f199-463f-8576-a77108fd73e2     species  exactMatch  Animalia  Chordata      Aves   Psittaciformes   Psittacidae  Platycercus  Platycercus caledonicus  [noIssue]                 Green Rosella
4         Aquila (Uroaetus) audax fleayi    Condon & Amadon, 1954  https://biodiversity.org.au/afd/taxa/ac93f7f0-0686-4589-801a-5832378cb7c1  subspecies  exactMatch  Animalia  Chordata      Aves  Accipitriformes  Accipitridae       Aquila             Aquila audax  [noIssue]  Tasmanian Wedge-tailed Eagle
5         Tyto novaehollandiae castanops            (Gould, 1837)  https://biodiversity.org.au/afd/taxa/2c30d58b-572b-4dab-8644-b222c28eb0ec  subspecies  exactMatch  Animalia  Chordata      Aves     Strigiformes     Tytonidae         Tyto     Tyto novaehollandiae  [noIssue]          Tasmanian Masked Owl
>>> galah.atlas_counts(
...     taxa=tas_endemic,
...     group_by=["scientificName"],
...     expand=False
... )
                                       scientificName  count
0                      Aquila (Uroaetus) audax fleayi   9392
1                                  Bettongia gaimardi   2447
2                        Bettongia gaimardi cuniculus     54
3                         Bettongia gaimardi gaimardi      9
4                   Melanodryas (Amaurodryas) vittata  13747
5             Melanodryas (Amaurodryas) vittata kingi    729
6           Melanodryas (Amaurodryas) vittata vittata   7499
7               Platycercus (Platycercus) caledonicus  50443
8       Platycercus (Platycercus) caledonicus brownii    447
9   Platycercus (Platycercus) caledonicus caledonicus  30104
10                                        Sarcophilus    112
11                               Sarcophilus harrisii  36865
12                     Tyto novaehollandiae castanops    318
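Grouping by scientificName returns a separate row for each name, so records identified only to species are counted apart from those identified to subspecies. Assuming each row counts only the records identified to exactly that name, the rows can be rolled up to the six searched taxa. The sketch below does this with the counts copied from the output above; the helper names are hypothetical, and a parenthesised subgenus is removed before comparing names so that, for example, Melanodryas (Amaurodryas) vittata kingi is credited to Melanodryas vittata.

```python
import re

# Counts copied from the atlas_counts() output above
counts = {
    "Aquila (Uroaetus) audax fleayi": 9392,
    "Bettongia gaimardi": 2447,
    "Bettongia gaimardi cuniculus": 54,
    "Bettongia gaimardi gaimardi": 9,
    "Melanodryas (Amaurodryas) vittata": 13747,
    "Melanodryas (Amaurodryas) vittata kingi": 729,
    "Melanodryas (Amaurodryas) vittata vittata": 7499,
    "Platycercus (Platycercus) caledonicus": 50443,
    "Platycercus (Platycercus) caledonicus brownii": 447,
    "Platycercus (Platycercus) caledonicus caledonicus": 30104,
    "Sarcophilus": 112,
    "Sarcophilus harrisii": 36865,
    "Tyto novaehollandiae castanops": 318,
}

tas_endemic = ["Sarcophilus", "Bettongia gaimardi", "Melanodryas vittata",
               "Platycercus caledonicus", "Aquila audax fleayi",
               "Tyto novaehollandiae castanops"]

def strip_subgenus(name):
    """Hypothetical helper: 'Melanodryas (Amaurodryas) vittata' -> 'Melanodryas vittata'."""
    return re.sub(r"\s+\([A-Z][a-z]+\)", "", name)

def roll_up(counts, taxa):
    """Hypothetical helper: credit each name to the searched taxon it starts with."""
    totals = {taxon: 0 for taxon in taxa}
    for name, n in counts.items():
        words = strip_subgenus(name).split()
        for taxon in taxa:
            taxon_words = taxon.split()
            if words[:len(taxon_words)] == taxon_words:
                totals[taxon] += n
                break
    return totals

totals = roll_up(counts, tas_endemic)
for taxon, total in totals.items():
    print(f"{taxon}: {total}")
# Sarcophilus: 36977
# Bettongia gaimardi: 2510
# Melanodryas vittata: 21975
# Platycercus caledonicus: 80994
# Aquila audax fleayi: 9392
# Tyto novaehollandiae castanops: 318
```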