Taxonomic Filtering
Callum Waite, Shandiya Balasubramaniam
Taxonomic complexity can complicate searching, filtering, and downloading records with galah, but galah provides a few
functions that help ensure records are not missed. Let’s start by configuring galah to use the Atlas of Living Australia (ALA).
>>> import galah
>>> galah.galah_config(atlas="Australia", email="your-email-here")
search_taxa()
search_taxa() lets users look up taxonomic names before downloading data, making it possible to disambiguate
homonyms and to check that a search term matches a taxon name in the ALA. search_taxa() returns the scientific
name, authorship, rank, and full classification of the taxon matched to the provided search term.
>>> galah.search_taxa(taxa="Petroica boodang")
scientificName scientificNameAuthorship taxonConceptID rank matchType kingdom phylum classs order family genus species issues vernacularName
0 Petroica (Petroica) boodang (Lesson, 1838) https://biodiversity.org.au/afd/taxa/a3e5376b-f9e6-4bdf-adae-1e7add9f5c29 species exactMatch Animalia Chordata Aves Passeriformes Petroicidae Petroica Petroica boodang noIssue Scarlet Robin
It can also return taxonomic information for several search terms at once, including synonyms and Indigenous names.
>>> # Muscicapa chrysoptera is a synonym for the Flame Robin, Petroica phoenicea
>>> # Guniibuu is the Yuwaalaraay Indigenous name for the Red-Capped Robin, Petroica goodenovii
>>> galah.search_taxa(taxa=["Muscicapa chrysoptera", "Guniibuu"])
scientificName scientificNameAuthorship taxonConceptID rank matchType kingdom phylum classs order family genus species issues vernacularName
0 Petroica (Littlera) phoenicea Gould, 1837 https://biodiversity.org.au/afd/taxa/fe74e658-4848-437a-a23d-f1001a198552 species exactMatch Animalia Chordata Aves Passeriformes Petroicidae Petroica Petroica phoenicea noIssue Flame Robin
1 Petroica (Petroica) goodenovii (Vigors & Horsfield, 1827) https://biodiversity.org.au/afd/taxa/10dbd908-00f3-4ec2-9a9c-a2fd4782eaf1 species vernacularMatch Animalia Chordata Aves Passeriformes Petroicidae Petroica Petroica goodenovii noIssue Red-capped Robin
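Because synonyms and vernacular names resolve to a different accepted name, it can help to check which search terms were changed by the match. The sketch below copies the species and matchType values from the two outputs above into plain Python and compares them with the search terms. It assumes, as in the outputs shown here, that search_taxa() returns one row per search term in input order; with the real DataFrame you would read these columns directly rather than hard-coding them.

```python
# Values copied from the search_taxa() outputs above (species and matchType columns only).
# Assumption: search_taxa() returns one row per search term, in input order.
search_terms = ["Petroica boodang", "Muscicapa chrysoptera", "Guniibuu"]
matched_rows = [
    {"species": "Petroica boodang", "matchType": "exactMatch"},
    {"species": "Petroica phoenicea", "matchType": "exactMatch"},
    {"species": "Petroica goodenovii", "matchType": "vernacularMatch"},
]


def renamed_terms(terms, rows):
    """Return (term, accepted species, matchType) for every term whose
    accepted species name differs from the term that was searched."""
    return [
        (term, row["species"], row["matchType"])
        for term, row in zip(terms, rows)
        if term != row["species"]
    ]


for term, species, match_type in renamed_terms(search_terms, matched_rows):
    print(f"{term!r} -> {species} ({match_type})")
```

The species column is compared rather than scientificName because scientificName can include a subgenus (for example, Petroica (Petroica) boodang), which would make every name look renamed.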
Where homonyms exist, search_taxa() warns the user and asks them to clarify the search term by supplying one or
more higher taxonomic ranks through the scientific_name argument. For example, the genus name Morganella is used
in three kingdoms, so the second query below restricts the search to the fungal genus:
>>> galah.search_taxa(taxa=["Morganella"])
Warning: Search for ['Morganella'] returned multiple taxa due to a homonym issue.
Please use the `scientific_name` argument to clarify taxa.
We were not able to find ['Morganella'] in the Australia backbone.
Empty DataFrame
Columns: []
Index: []
>>> galah.search_taxa(scientific_name={"kingdom": ["Fungi"], "scientificName": ["Morganella"]})
scientificName scientificNameAuthorship taxonConceptID rank matchType kingdom phylum classs order family genus issues
0 Morganella Zeller https://id.biodiversity.org.au/name/fungi/60015036 genus exactMatch Fungi Basidiomycota Agaricomycetes Agaricales Agaricaceae Morganella noIssue
The same disambiguation of Morganella can then be used with atlas_counts(), atlas_occurrences(),
atlas_species() or atlas_media() by passing the dictionary as the scientific_name argument to any of these functions.
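To avoid retyping the disambiguation for each call, the scientific_name dictionary can be built once and reused. The helper below, make_scientific_name(), is hypothetical and not part of galah; it only assembles a dictionary in the same shape as the one used in the Morganella queries, with a one-element list per rank. Passing it on to galah requires network access and a configured session, so those calls are shown as comments.

```python
def make_scientific_name(name, **higher_ranks):
    """Hypothetical helper (not part of galah): build a scientific_name
    dictionary mapping each higher rank (e.g. kingdom="Fungi") and
    scientificName to a one-element list, as in the Morganella example."""
    query = {rank: [value] for rank, value in higher_ranks.items()}
    query["scientificName"] = [name]
    return query


morganella_fungi = make_scientific_name("Morganella", kingdom="Fungi")
print(morganella_fungi)  # {'kingdom': ['Fungi'], 'scientificName': ['Morganella']}

# Reuse the same dictionary across queries (requires a configured galah session):
# galah.atlas_counts(scientific_name=morganella_fungi)
# galah.atlas_occurrences(scientific_name=morganella_fungi)
```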
>>> galah.atlas_counts(scientific_name={"kingdom": ["Fungi"], "scientificName": ["Morganella"]})
totalRecords
0 180
>>> galah.atlas_occurrences(scientific_name={"kingdom": ["Fungi"], "scientificName": ["Morganella"]})
decimalLatitude decimalLongitude eventDate scientificName taxonConceptID recordID dataResourceName occurrenceStatus
0 -47.000000 168.200000 2002-04-30T00:00:00Z Morganella compacta NZOR-6-128055 f2d0672f-0f8d-467f-a70d-af0ad257743a New Zealand Fungal and Plant Disease Collection PRESENT
1 -46.879900 168.136500 2021-04-15T00:00:00Z Morganella compacta NZOR-6-128055 450d6089-a004-4a4b-b281-50f60e7596bf New Zealand Fungal and Plant Disease Collection PRESENT
2 -46.874875 168.124660 1983-02-13T00:00:00Z Morganella compacta NZOR-6-128055 eaae280b-c1de-463a-8cc6-3ee72cc00e83 New Zealand Fungal and Plant Disease Collection PRESENT
3 -46.862757 168.116777 1985-04-23T00:00:00Z Morganella compacta NZOR-6-128055 a08519f3-221b-4006-9803-98d8eeb771a8 New Zealand Fungal and Plant Disease Collection PRESENT
4 -46.554617 169.479051 1990-05-24T00:00:00Z Morganella compacta NZOR-6-128055 4b2011f7-ef20-428d-af8f-dc17c9656df6 New Zealand Fungal and Plant Disease Collection PRESENT
.. ... ... ... ... ... ... ... ...
175 -15.831780 145.335830 2020-01-20T13:20:00Z Morganella purpurascens https://id.biodiversity.org.au/name/fungi/60022638 aaa77490-aee9-44be-9b68-4fc8efe240f3 iNaturalist Australia PRESENT
176 -13.200000 130.700000 2014-01-25T00:00:00Z Morganella purpurascens https://id.biodiversity.org.au/name/fungi/60022638 e022611a-52ea-4530-8bc1-bfcef1eee43c INSDC Sequences PRESENT
177 NaN NaN NaN Morganella purpurascens https://id.biodiversity.org.au/name/fungi/60022638 be5f2bef-5036-4817-b7aa-aebfa45f4260 Royal Botanic Gardens, Kew - Fungarium Specimens PRESENT
178 -13.197500 130.699722 2014-01-25T00:00:00Z Morganella purpurascens https://id.biodiversity.org.au/name/fungi/60022638 61495a3c-d32d-4690-8474-5b5d778c9d79 National Herbarium of Victoria (MEL) AVH data PRESENT
179 -8.916667 148.150000 1953-07-15T00:00:00Z Morganella purpurascens https://id.biodiversity.org.au/name/fungi/60022638 633433fe-a3b4-4b4f-af12-36b54c1fd9ff Centre for Australian National Biodiversity Research (CANB) AVH data PRESENT
[180 rows x 8 columns]
filters=
filters= subsets records by searching for exact matches to an expression, and may also be used for taxonomic
filtering. For simple taxonomic queries, the taxa= argument is often enough: for example, we can count records
of robins in the genus Petroica for a single species or for a list of species. Grouping the results by species
and vernacularName then lets us compare the number of records for each robin.
>>> galah.atlas_counts(taxa="Petroica boodang")
totalRecords
0 149525
>>> aus_petroica = ["Petroica boodang", "Petroica goodenovii",
... "Petroica phoenicea", "Petroica rosea",
... "Petroica rodinogaster", "Petroica multicolor"]
>>> galah.atlas_counts(
... taxa=aus_petroica,
... group_by=["species","vernacularName"]
... )
species vernacularName count
0 Petroica boodang Eastern Scarlet Robin 3954
1 Petroica boodang Scarlet Robin 134878
2 Petroica boodang South-western Scarlet Robin 2963
3 Petroica boodang Tasmanian Scarlet Robin 7730
4 Petroica goodenovii Red-capped Robin 132810
5 Petroica multicolor Pacific Robin 6889
6 Petroica phoenicea Flame Robin 93130
7 Petroica rodinogaster Mainland Pink Robin 72
8 Petroica rodinogaster Pink Robin 18911
9 Petroica rodinogaster Tasmanian Pink Robin 88
10 Petroica rosea Rose Robin 69328
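Grouping by vernacularName as well as species splits some species across several rows, one per common name in use (Petroica boodang appears four times above). A minimal sketch, using the counts copied from the table above, sums those rows back to one total per species. With these figures, the four Petroica boodang rows add up to 149525, the same number that atlas_counts(taxa="Petroica boodang") returned.

```python
from collections import defaultdict

# (species, vernacularName, count) rows copied from the atlas_counts() output above
robin_counts = [
    ("Petroica boodang", "Eastern Scarlet Robin", 3954),
    ("Petroica boodang", "Scarlet Robin", 134878),
    ("Petroica boodang", "South-western Scarlet Robin", 2963),
    ("Petroica boodang", "Tasmanian Scarlet Robin", 7730),
    ("Petroica goodenovii", "Red-capped Robin", 132810),
    ("Petroica multicolor", "Pacific Robin", 6889),
    ("Petroica phoenicea", "Flame Robin", 93130),
    ("Petroica rodinogaster", "Mainland Pink Robin", 72),
    ("Petroica rodinogaster", "Pink Robin", 18911),
    ("Petroica rodinogaster", "Tasmanian Pink Robin", 88),
    ("Petroica rosea", "Rose Robin", 69328),
]


def totals_by_species(rows):
    """Sum record counts over all vernacular names of each species."""
    totals = defaultdict(int)
    for species, _vernacular_name, count in rows:
        totals[species] += count
    return dict(totals)


species_totals = totals_by_species(robin_counts)
print(species_totals["Petroica boodang"])  # 149525
```

On the DataFrame returned by atlas_counts(), the pandas equivalent would be `counts.groupby("species")["count"].sum()`, where `counts` is the returned table.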
filters= is useful for searching paraphyletic or polyphyletic groups, which do not correspond to a single taxon name. For example, to count records of non-chordate animals by phylum:
>>> non_chordates = galah.atlas_counts(
... filters=["kingdom=Animalia","phylum!=Chordata"],
... group_by=["phylum"],
... expand=False
... )
>>> non_chordates.head()
phylum count
0 Acanthocephala 486
1 Annelida 348674
2 Arthropoda 11587024
3 Brachiopoda 3133
4 Bryozoa 33614
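Because filters= takes a list of expressions, a larger paraphyletic group can be described by adding one exclusion per clade. The helper below, exclusion_filters(), is hypothetical and not part of galah; it only builds the strings using the "field!=value" syntax from the example above. It assumes, as the non-chordate example suggests, that galah requires a record to satisfy every expression in the list.

```python
def exclusion_filters(base, field, excluded_values):
    """Hypothetical helper (not part of galah): build a filters= list that
    keeps records matching `base` but excludes each value of `field`."""
    return [base] + [f"{field}!={value}" for value in excluded_values]


# Reproduces the non-chordate example above
print(exclusion_filters("kingdom=Animalia", "phylum", ["Chordata"]))
# ['kingdom=Animalia', 'phylum!=Chordata']

# Animals other than chordates and arthropods
non_chordate_non_arthropod = exclusion_filters(
    "kingdom=Animalia", "phylum", ["Chordata", "Arthropoda"]
)
# galah.atlas_counts(filters=non_chordate_non_arthropod, group_by=["phylum"], expand=False)
```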
filters=, search_taxa(), and taxonomic ranks
Whether to use filters= or search_taxa() in a query depends on how records have been classified, and on whether
you know the correct unique name and classification of the taxon of interest.
The ALA has fields for the primary taxonomic ranks (kingdom, phylum, class, order, family, genus, species) and some
secondary ranks (e.g. subfamily, subgenus), all of which may be used with filters= and search_taxa().
Additionally, the scientificName field holds the name at the lowest taxonomic rank to which a record has been
identified, for example:
>>> pitta_ranks = galah.atlas_counts(
... taxa="Pitta",
... group_by=["scientificName","taxonRank"]
... )
>>> # drop rows with no scientificName
>>> pitta_ranks = pitta_ranks.loc[pitta_ranks["scientificName"].notnull()]
>>> pitta_ranks
scientificName taxonRank count
0 Pitta genus 111
1 Pitta (Erythropitta) subgenus 6
2 Pitta (Erythropitta) erythrogaster species 178
3 Pitta (Erythropitta) erythrogaster digglesi subspecies 1044
4 Pitta (Pitta) iris species 7652
5 Pitta (Pitta) iris iris subspecies 98
6 Pitta (Pitta) iris johnstoneiana subspecies 29
7 Pitta (Pitta) versicolor species 35883
8 Pitta (Pitta) versicolor intermedia subspecies 64
9 Pitta (Pitta) versicolor simillima subspecies 57
10 Pitta (Pitta) versicolor versicolor subspecies 456
If you know the correct species or subspecies name, matching against the species or subspecies field,
respectively, gives more precise results, because the scientificName field may include subgenera (e.g. Pitta (Pitta)
iris rather than Pitta iris). If you’ve used search_taxa() to get the ALA-matched name of a taxon and want only records
identified to a particular level of classification, matching against scientificName is recommended.
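The subgenus matters when names are matched exactly. A minimal sketch using the Pitta counts above (copied into a list) compares an exact match on scientificName with a roll-up that ignores the subgenus and includes subspecies. The roll-up only approximates what matching on the species field would return, under the assumption that records identified to subspecies also carry their parent species in that field.

```python
import re

# (scientificName, taxonRank, count) rows copied from the Pitta table above
pitta_ranks = [
    ("Pitta", "genus", 111),
    ("Pitta (Erythropitta)", "subgenus", 6),
    ("Pitta (Erythropitta) erythrogaster", "species", 178),
    ("Pitta (Erythropitta) erythrogaster digglesi", "subspecies", 1044),
    ("Pitta (Pitta) iris", "species", 7652),
    ("Pitta (Pitta) iris iris", "subspecies", 98),
    ("Pitta (Pitta) iris johnstoneiana", "subspecies", 29),
    ("Pitta (Pitta) versicolor", "species", 35883),
    ("Pitta (Pitta) versicolor intermedia", "subspecies", 64),
    ("Pitta (Pitta) versicolor simillima", "subspecies", 57),
    ("Pitta (Pitta) versicolor versicolor", "subspecies", 456),
]


def drop_subgenus(name):
    """Remove a parenthesised subgenus, e.g. 'Pitta (Pitta) iris' -> 'Pitta iris'."""
    return re.sub(r"\s*\([^)]*\)", "", name)


def exact_count(rows, name):
    """Records whose scientificName is exactly `name`."""
    return sum(count for sci_name, _rank, count in rows if sci_name == name)


def species_rollup(rows, species):
    """Records identified to `species` or below, ignoring any subgenus."""
    total = 0
    for sci_name, _rank, count in rows:
        name = drop_subgenus(sci_name)
        if name == species or name.startswith(species + " "):
            total += count
    return total


print(exact_count(pitta_ranks, "Pitta iris"))          # 0: subgenus missing from the search term
print(exact_count(pitta_ranks, "Pitta (Pitta) iris"))  # 7652: species-level records only
print(species_rollup(pitta_ranks, "Pitta iris"))       # 7779: species plus subspecies
```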
Paraphyletic or polyphyletic groups may contain taxa identified to different taxonomic levels. In this case, it is
simpler to use search_taxa(). In the example below, search_taxa() matches the terms to one genus, three species,
and two subspecies. The same list of terms can then be passed to atlas_counts() to get counts for each scientific name.
>>> tas_endemic = ["Sarcophilus", # Tasmanian Devil
... "Bettongia gaimardi", # Tasmanian Bettong
... "Melanodryas vittata", # Dusky Robin
... "Platycercus caledonicus",# Green Rosella
... "Aquila audax fleayi", # Tasmanian Wedge-Tailed Eagle
... "Tyto novaehollandiae castanops" # Tasmanian Masked Owl
... ]
>>> galah.search_taxa(taxa=tas_endemic)
scientificName scientificNameAuthorship taxonConceptID rank matchType kingdom phylum classs order family genus issues species vernacularName
0 Sarcophilus Cuvier, 1837 https://biodiversity.org.au/afd/taxa/aa40a75d-d499-4339-8a5d-c333a29cea1c genus exactMatch Animalia Chordata Mammalia Dasyuromorphia Dasyuridae Sarcophilus noIssue NaN NaN
1 Bettongia gaimardi (Desmarest, 1822) https://biodiversity.org.au/afd/taxa/8f7da937-6338-4c39-8b11-4f83807afe11 species exactMatch Animalia Chordata Mammalia Diprotodontia Potoroidae Bettongia noIssue Bettongia gaimardi Tasmanian Bettong
2 Melanodryas (Amaurodryas) vittata (Quoy & Gaimard, 1830) https://biodiversity.org.au/afd/taxa/0f04889f-5489-4369-a545-8a041fba9f6d species exactMatch Animalia Chordata Aves Passeriformes Petroicidae Melanodryas noIssue Melanodryas vittata Dusky Robin
3 Platycercus (Platycercus) caledonicus (Gmelin, 1788) https://biodiversity.org.au/afd/taxa/c6e478fe-f199-463f-8576-a77108fd73e2 species exactMatch Animalia Chordata Aves Psittaciformes Psittacidae Platycercus noIssue Platycercus caledonicus Green Rosella
4 Aquila (Uroaetus) audax fleayi Condon & Amadon, 1954 https://biodiversity.org.au/afd/taxa/ac93f7f0-0686-4589-801a-5832378cb7c1 subspecies exactMatch Animalia Chordata Aves Accipitriformes Accipitridae Aquila noIssue Aquila audax Tasmanian Wedge-tailed Eagle
5 Tyto novaehollandiae castanops (Gould, 1837) https://biodiversity.org.au/afd/taxa/2c30d58b-572b-4dab-8644-b222c28eb0ec subspecies exactMatch Animalia Chordata Aves Strigiformes Tytonidae Tyto noIssue Tyto novaehollandiae Tasmanian Masked Owl
>>> galah.atlas_counts(
... taxa=tas_endemic,
... group_by=["scientificName"],
... expand=False
... )
scientificName count
0 Aquila (Uroaetus) audax fleayi 5275
1 Bettongia gaimardi 2424
2 Bettongia gaimardi cuniculus 54
3 Bettongia gaimardi gaimardi 9
4 Melanodryas (Amaurodryas) vittata 17770
5 Melanodryas (Amaurodryas) vittata kingi 16
6 Melanodryas (Amaurodryas) vittata vittata 78
7 Platycercus (Platycercus) caledonicus 62544
8 Platycercus (Platycercus) caledonicus brownii 24
9 Platycercus (Platycercus) caledonicus caledonicus 47
10 Sarcophilus 111
11 Sarcophilus harrisii 36797
12 Tyto novaehollandiae castanops 86
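Grouping by scientificName reports subspecies separately, so a genus or species in the list can be spread over several rows. A minimal sketch, with the counts above copied into a dictionary, rolls each row up to the search term it falls under by removing any subgenus and matching whole-word prefixes. This matching rule is an assumption that works for these names; it is not how galah itself assigns records.

```python
import re

tas_endemic = [
    "Sarcophilus",
    "Bettongia gaimardi",
    "Melanodryas vittata",
    "Platycercus caledonicus",
    "Aquila audax fleayi",
    "Tyto novaehollandiae castanops",
]

# scientificName -> count, copied from the atlas_counts() output above
endemic_counts = {
    "Aquila (Uroaetus) audax fleayi": 5275,
    "Bettongia gaimardi": 2424,
    "Bettongia gaimardi cuniculus": 54,
    "Bettongia gaimardi gaimardi": 9,
    "Melanodryas (Amaurodryas) vittata": 17770,
    "Melanodryas (Amaurodryas) vittata kingi": 16,
    "Melanodryas (Amaurodryas) vittata vittata": 78,
    "Platycercus (Platycercus) caledonicus": 62544,
    "Platycercus (Platycercus) caledonicus brownii": 24,
    "Platycercus (Platycercus) caledonicus caledonicus": 47,
    "Sarcophilus": 111,
    "Sarcophilus harrisii": 36797,
    "Tyto novaehollandiae castanops": 86,
}


def drop_subgenus(name):
    """Remove a parenthesised subgenus, e.g. 'Aquila (Uroaetus) audax' -> 'Aquila audax'."""
    return re.sub(r"\s*\([^)]*\)", "", name)


def rollup_to_terms(counts, terms):
    """Add each count to the first search term its scientificName falls within."""
    totals = {term: 0 for term in terms}
    for sci_name, count in counts.items():
        name = drop_subgenus(sci_name)
        for term in terms:
            if name == term or name.startswith(term + " "):
                totals[term] += count
                break  # first matching term wins
    return totals


for term, total in rollup_to_terms(endemic_counts, tas_endemic).items():
    print(f"{term}: {total}")
```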