Taxonomic Filtering
Callum Waite, Shandiya Balasubramaniam
Taxonomic complexity can confound the process of searching, filtering, and downloading records using galah, but galah
provides a few functions that help ensure records are not missed. Let’s start by configuring galah to use the ALA.
>>> import galah
>>> galah.galah_config(atlas="Australia", email="your-email-here")
search_taxa()
search_taxa() lets users look up taxonomic names before downloading data, which helps to disambiguate homonyms
and to check that a search term matches the taxon name used by the ALA. search_taxa() returns the scientific
name, authorship, rank, and full classification of the taxon matched to each search term.
>>> galah.search_taxa(taxa="Petroica boodang")
scientificName scientificNameAuthorship taxonConceptID rank matchType kingdom phylum classs order family genus species issues vernacularName
0 Petroica (Petroica) boodang (Lesson, 1838) https://biodiversity.org.au/afd/taxa/a3e5376b-f9e6-4bdf-adae-1e7add9f5c29 species exactMatch Animalia Chordata Aves Passeriformes Petroicidae Petroica Petroica boodang [noIssue] Scarlet Robin
It can also return taxonomic information for several taxa at once, including synonyms and Indigenous names.
>>> # Muscicapa chrysoptera is a synonym for the Flame Robin, Petroica phoenicea
>>> # Guniibuu is the Yuwaalaraay Indigenous name for the Red-Capped Robin, Petroica goodenovii
>>> galah.search_taxa(taxa = ["Muscicapa chrysoptera", "Guniibuu"])
scientificName scientificNameAuthorship taxonConceptID rank matchType kingdom phylum classs order family genus species issues vernacularName
0 Petroica (Littlera) phoenicea Gould, 1837 https://biodiversity.org.au/afd/taxa/fe74e658-4848-437a-a23d-f1001a198552 species exactMatch Animalia Chordata Aves Passeriformes Petroicidae Petroica Petroica phoenicea [noIssue] Flame Robin
1 Petroica (Petroica) goodenovii (Vigors & Horsfield, 1827) https://biodiversity.org.au/afd/taxa/10dbd908-00f3-4ec2-9a9c-a2fd4782eaf1 species vernacularMatch Animalia Chordata Aves Passeriformes Petroicidae Petroica Petroica goodenovii [noIssue] Red-capped Robin
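Because a synonym or an Indigenous name resolves to a different accepted name, it can be worth checking which search terms were not matched literally before downloading data. The sketch below does this in plain Python using the matchType and species values printed above; in practice search_taxa() returns a pandas DataFrame, and the same comparison could be made on its columns. The helper needs_review is hypothetical (not part of galah), and comparing against the species column only makes sense for species-level search terms.

```python
# (search term, matchType, species) copied from the search_taxa() outputs above
results = [
    ("Petroica boodang", "exactMatch", "Petroica boodang"),
    ("Muscicapa chrysoptera", "exactMatch", "Petroica phoenicea"),
    ("Guniibuu", "vernacularMatch", "Petroica goodenovii"),
]


def needs_review(query, match_type, species):
    """Hypothetical check: flag species-level search terms that matched a
    common or Indigenous name, or that resolved to a different accepted
    species (e.g. a synonym)."""
    return match_type != "exactMatch" or species != query


to_check = [query for query, match_type, species in results
            if needs_review(query, match_type, species)]
print(to_check)  # ['Muscicapa chrysoptera', 'Guniibuu']
```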
Where homonyms exist, search_taxa() flags the match as a homonym, and users can clarify the search term by
supplying one or more higher taxonomic ranks through the search_taxa() argument scientific_name. For example,
the genus name Morganella is used in three kingdoms; here it is restricted to the fungal genus:
>>> galah.search_taxa(taxa = ["Morganella"])
scientificName scientificNameAuthorship taxonConceptID rank matchType kingdom phylum classs order family genus species issues vernacularName
0 Morganella [homonym]
>>> galah.search_taxa(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
scientificName scientificNameAuthorship taxonConceptID rank matchType kingdom phylum classs order family genus species issues vernacularName
0 Morganella Zeller https://id.biodiversity.org.au/name/fungi/60015036 genus exactMatch Fungi Basidiomycota Agaricomycetes Agaricales Agaricaceae Morganella noIssue
The disambiguated Morganella query can then be passed to atlas_counts(), atlas_occurrences(), atlas_species()
or atlas_media() through the same scientific_name argument.
>>> galah.atlas_counts(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
totalRecords
0 183
>>> galah.atlas_occurrences(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
decimalLatitude decimalLongitude eventDate scientificName taxonConceptID recordID dataResourceName occurrenceStatus
0 -47.000000 168.200000 2002-04-30T00:00:00Z Morganella compacta NZOR-6-128055 f2d0672f-0f8d-467f-a70d-af0ad257743a New Zealand Fungal and Plant Disease Collection PRESENT
1 -46.879900 168.136500 2021-04-15T00:00:00Z Morganella compacta NZOR-6-128055 450d6089-a004-4a4b-b281-50f60e7596bf New Zealand Fungal and Plant Disease Collection PRESENT
2 -46.874875 168.124660 1983-02-13T00:00:00Z Morganella compacta NZOR-6-128055 eaae280b-c1de-463a-8cc6-3ee72cc00e83 New Zealand Fungal and Plant Disease Collection PRESENT
3 -46.862757 168.116777 1985-04-23T00:00:00Z Morganella compacta NZOR-6-128055 a08519f3-221b-4006-9803-98d8eeb771a8 New Zealand Fungal and Plant Disease Collection PRESENT
4 -46.554617 169.479051 1990-05-24T00:00:00Z Morganella compacta NZOR-6-128055 4b2011f7-ef20-428d-af8f-dc17c9656df6 New Zealand Fungal and Plant Disease Collection PRESENT
.. ... ... ... ... ... ... ... ...
178 -15.831780 145.335830 2020-01-20T13:20:00Z Morganella purpurascens https://id.biodiversity.org.au/name/fungi/60022638 aaa77490-aee9-44be-9b68-4fc8efe240f3 iNaturalist Australia PRESENT
179 -13.200000 130.700000 2014-01-25T00:00:00Z Morganella purpurascens https://id.biodiversity.org.au/name/fungi/60022638 e022611a-52ea-4530-8bc1-bfcef1eee43c INSDC Sequences PRESENT
180 NaN NaN NaN Morganella purpurascens https://id.biodiversity.org.au/name/fungi/60022638 be5f2bef-5036-4817-b7aa-aebfa45f4260 Royal Botanic Gardens, Kew - Fungarium Specimens PRESENT
181 -13.197500 130.699722 2014-01-25T00:00:00Z Morganella purpurascens https://id.biodiversity.org.au/name/fungi/60022638 61495a3c-d32d-4690-8474-5b5d778c9d79 National Herbarium of Victoria (MEL) AVH data PRESENT
182 -8.916700 148.150000 1953-07-15T00:00:00Z Morganella purpurascens https://id.biodiversity.org.au/name/fungi/60022638 633433fe-a3b4-4b4f-af12-36b54c1fd9ff Centre for Australian National Biodiversity Research (CANB) AVH data PRESENT
[183 rows x 8 columns]
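Since the scientific_name value is an ordinary dictionary of lists, one way to keep the disambiguation consistent across calls is to build it once and reuse it. This is a minimal sketch; scientific_name_query is a hypothetical helper, not part of galah, and it only reproduces the dictionary shape used in the examples above.

```python
def scientific_name_query(name, **higher_ranks):
    """Hypothetical helper: build a scientific_name dict in the shape used above,
    e.g. {"kingdom": ["Fungi"], "scientificName": ["Morganella"]}."""
    query = {rank: [value] for rank, value in higher_ranks.items()}
    query["scientificName"] = [name]
    return query


morganella_fungi = scientific_name_query("Morganella", kingdom="Fungi")
print(morganella_fungi)  # {'kingdom': ['Fungi'], 'scientificName': ['Morganella']}

# The same dict can then be passed to each function, e.g.
# galah.search_taxa(scientific_name=morganella_fungi)
# galah.atlas_counts(scientific_name=morganella_fungi)
# galah.atlas_occurrences(scientific_name=morganella_fungi)
```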
filters=
filters= subsets records by searching for exact matches to an expression, and may also be used for taxonomic
filtering. For simple cases, taxa= is often enough: to count records of robins in Australia, for example, we
can query a single species or a list of species, and group the results by species name to compare the number
of records for each robin.
>>> galah.atlas_counts(taxa="Petroica boodang")
totalRecords
0 176213
>>> aus_petroica = ["Petroica boodang", "Petroica goodenovii",
... "Petroica phoenicea", "Petroica rosea",
... "Petroica rodinogaster", "Petroica multicolor"]
>>> galah.atlas_counts(
... taxa=aus_petroica,
... group_by=["species","vernacularName"]
... )
species vernacularName count
0 Petroica boodang Eastern Scarlet Robin 29222
1 Petroica boodang Flame Robin 103634
2 Petroica boodang Mainland Pink Robin 1105
3 Petroica boodang Pacific Robin 6870
4 Petroica boodang Pink Robin 16924
.. ... ... ...
61 Petroica rosea Rose Robin 78616
62 Petroica rosea Scarlet Robin 113193
63 Petroica rosea South-western Scarlet Robin 13275
64 Petroica rosea Tasmanian Pink Robin 3877
65 Petroica rosea Tasmanian Scarlet Robin 20523
[66 rows x 3 columns]
filters= is particularly useful for paraphyletic or polyphyletic groups, which cannot be described by a single taxon name. For example, to get counts of non-chordate animals by phylum:
>>> non_chordates = galah.atlas_counts(
... filters=["kingdom=Animalia","phylum!=Chordata"],
... group_by=["phylum"],
... expand=False
... )
>>> non_chordates.head()
phylum count
0 Acanthocephala 486
1 Annelida 356369
2 Arthropoda 12056771
3 Brachiopoda 3145
4 Bryozoa 36260
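The filter strings follow the "field=value" and "field!=value" pattern shown above, so a group defined by exclusion can be assembled programmatically. This sketch assumes, as the non-chordate example suggests, that galah combines the strings in a filters= list with AND; exclude_phyla is a hypothetical helper, not part of galah.

```python
def exclude_phyla(kingdom, excluded):
    """Hypothetical helper: keep one kingdom and drop the listed phyla.

    Assumes galah ANDs the filter strings together, as in the
    non-chordate example above.
    """
    return [f"kingdom={kingdom}"] + [f"phylum!={phylum}" for phylum in excluded]


non_chordate_filters = exclude_phyla("Animalia", ["Chordata"])
print(non_chordate_filters)  # ['kingdom=Animalia', 'phylum!=Chordata']

# Animals that are neither chordates nor arthropods
print(exclude_phyla("Animalia", ["Chordata", "Arthropoda"]))

# e.g. galah.atlas_counts(filters=non_chordate_filters, group_by=["phylum"], expand=False)
```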
filters=, search_taxa(), and taxonomic ranks
Choosing between filters= and search_taxa() in a query comes down to how records have been classified, and
whether you know the correct unique name and classification of the taxa of interest.
The ALA has fields for the primary taxonomic ranks (kingdom, phylum, class, order, family, genus, species) and some
secondary ranks (e.g. subfamily, subgenus), all of which may be used with filters= and search_taxa().
Additionally, there is a field named scientificName, which holds the name of the lowest taxonomic rank to which a
record has been identified. For example, for the genus Pitta:
>>> pitta_ranks = galah.atlas_counts(
... taxa="Pitta",
... group_by=["scientificName","taxonRank"]
... )
>>> pitta_ranks = pitta_ranks.loc[pitta_ranks["scientificName"].notnull()]
>>> pitta_ranks
scientificName taxonRank count
0 Pitta species 40239
1 Pitta subspecies 8912
2 Pitta genus 234
3 Pitta subgenus 6
4 Pitta (Erythropitta) species 40239
5 Pitta (Erythropitta) subspecies 8912
6 Pitta (Erythropitta) genus 234
7 Pitta (Erythropitta) subgenus 6
8 Pitta (Erythropitta) erythrogaster species 40239
9 Pitta (Erythropitta) erythrogaster subspecies 8912
10 Pitta (Erythropitta) erythrogaster genus 234
11 Pitta (Erythropitta) erythrogaster subgenus 6
12 Pitta (Erythropitta) erythrogaster digglesi species 40239
13 Pitta (Erythropitta) erythrogaster digglesi subspecies 8912
14 Pitta (Erythropitta) erythrogaster digglesi genus 234
15 Pitta (Erythropitta) erythrogaster digglesi subgenus 6
16 Pitta (Pitta) iris species 40239
17 Pitta (Pitta) iris subspecies 8912
18 Pitta (Pitta) iris genus 234
19 Pitta (Pitta) iris subgenus 6
20 Pitta (Pitta) iris iris species 40239
21 Pitta (Pitta) iris iris subspecies 8912
22 Pitta (Pitta) iris iris genus 234
23 Pitta (Pitta) iris iris subgenus 6
24 Pitta (Pitta) iris johnstoneiana species 40239
25 Pitta (Pitta) iris johnstoneiana subspecies 8912
26 Pitta (Pitta) iris johnstoneiana genus 234
27 Pitta (Pitta) iris johnstoneiana subgenus 6
28 Pitta (Pitta) versicolor species 40239
29 Pitta (Pitta) versicolor subspecies 8912
30 Pitta (Pitta) versicolor genus 234
31 Pitta (Pitta) versicolor subgenus 6
32 Pitta (Pitta) versicolor intermedia species 40239
33 Pitta (Pitta) versicolor intermedia subspecies 8912
34 Pitta (Pitta) versicolor intermedia genus 234
35 Pitta (Pitta) versicolor intermedia subgenus 6
36 Pitta (Pitta) versicolor simillima species 40239
37 Pitta (Pitta) versicolor simillima subspecies 8912
38 Pitta (Pitta) versicolor simillima genus 234
39 Pitta (Pitta) versicolor simillima subgenus 6
40 Pitta (Pitta) versicolor versicolor species 40239
41 Pitta (Pitta) versicolor versicolor subspecies 8912
42 Pitta (Pitta) versicolor versicolor genus 234
43 Pitta (Pitta) versicolor versicolor subgenus 6
If, for instance, you have the correct species or subspecies name, then matching against the species or
subspecies field, respectively, will give more precise results. This is because scientificName may include a
subgenus (e.g. Pitta (Pitta) iris), so it may not exactly match the plain species or subspecies name. If you’ve used
search_taxa() to get the ALA-matched name of a taxon and only want records identified to that particular level of
classification, matching against scientificName is recommended.
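To see why an exact match on scientificName can miss, compare a plain binomial with the subgenus-qualified names shown above. The sketch below strips a parenthesised subgenus so that a scientificName value can be compared with a plain species or subspecies name. It assumes that parentheses in scientificName only ever enclose a subgenus, which holds for the outputs in this guide because authorship is returned in a separate scientificNameAuthorship column; drop_subgenus is a hypothetical helper, not part of galah.

```python
import re


def drop_subgenus(scientific_name):
    """Hypothetical helper: remove a parenthesised subgenus,
    e.g. 'Pitta (Pitta) iris' -> 'Pitta iris'."""
    return re.sub(r"\s*\([^)]*\)", "", scientific_name).strip()


# An exact match of the plain binomial against scientificName fails...
print("Pitta iris" == "Pitta (Pitta) iris")  # False

# ...but succeeds once the subgenus is removed
names = ["Pitta (Erythropitta) erythrogaster digglesi", "Pitta (Pitta) iris", "Pitta"]
print([drop_subgenus(name) for name in names])
# ['Pitta erythrogaster digglesi', 'Pitta iris', 'Pitta']
```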
Paraphyletic or polyphyletic groups may contain taxa identified to different taxonomic levels. In this case, it is
simpler to use search_taxa(). In the example below, search_taxa() matches the terms to one genus, three species,
and two subspecies. The same list can then be passed to atlas_counts() to get counts for each scientific name.
>>> tas_endemic = ["Sarcophilus", # Tasmanian Devil
... "Bettongia gaimardi", # Tasmanian Bettong
... "Melanodryas vittata", # Dusky Robin
...                "Platycercus caledonicus", # Green Rosella
... "Aquila audax fleayi", # Tasmanian Wedge-Tailed Eagle
... "Tyto novaehollandiae castanops" # Tasmanian Masked Owl
... ]
>>> galah.search_taxa(taxa=tas_endemic)
scientificName scientificNameAuthorship taxonConceptID rank matchType kingdom phylum classs order family genus species issues vernacularName
0 Sarcophilus Cuvier, 1837 https://biodiversity.org.au/afd/taxa/aa40a75d-d499-4339-8a5d-c333a29cea1c genus exactMatch Animalia Chordata Mammalia Dasyuromorphia Dasyuridae Sarcophilus [noIssue]
1 Bettongia gaimardi (Desmarest, 1822) https://biodiversity.org.au/afd/taxa/8f7da937-6338-4c39-8b11-4f83807afe11 species exactMatch Animalia Chordata Mammalia Diprotodontia Potoroidae Bettongia Bettongia gaimardi [noIssue] Tasmanian Bettong
2 Melanodryas (Amaurodryas) vittata (Quoy & Gaimard, 1830) https://biodiversity.org.au/afd/taxa/0f04889f-5489-4369-a545-8a041fba9f6d species exactMatch Animalia Chordata Aves Passeriformes Petroicidae Melanodryas Melanodryas vittata [noIssue] Dusky Robin
3 Platycercus (Platycercus) caledonicus (Gmelin, 1788) https://biodiversity.org.au/afd/taxa/c6e478fe-f199-463f-8576-a77108fd73e2 species exactMatch Animalia Chordata Aves Psittaciformes Psittacidae Platycercus Platycercus caledonicus [noIssue] Green Rosella
4 Aquila (Uroaetus) audax fleayi Condon & Amadon, 1954 https://biodiversity.org.au/afd/taxa/ac93f7f0-0686-4589-801a-5832378cb7c1 subspecies exactMatch Animalia Chordata Aves Accipitriformes Accipitridae Aquila Aquila audax [noIssue] Tasmanian Wedge-tailed Eagle
5 Tyto novaehollandiae castanops (Gould, 1837) https://biodiversity.org.au/afd/taxa/2c30d58b-572b-4dab-8644-b222c28eb0ec subspecies exactMatch Animalia Chordata Aves Strigiformes Tytonidae Tyto Tyto novaehollandiae [noIssue] Tasmanian Masked Owl
>>> galah.atlas_counts(
... taxa=tas_endemic,
... group_by=["scientificName"],
... expand=False
... )
scientificName count
0 Aquila (Uroaetus) audax fleayi 9392
1 Bettongia gaimardi 2447
2 Bettongia gaimardi cuniculus 54
3 Bettongia gaimardi gaimardi 9
4 Melanodryas (Amaurodryas) vittata 13747
5 Melanodryas (Amaurodryas) vittata kingi 729
6 Melanodryas (Amaurodryas) vittata vittata 7499
7 Platycercus (Platycercus) caledonicus 50443
8 Platycercus (Platycercus) caledonicus brownii 447
9 Platycercus (Platycercus) caledonicus caledonicus 30104
10 Sarcophilus 112
11 Sarcophilus harrisii 36865
12 Tyto novaehollandiae castanops 318
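Grouping by scientificName reports records identified to a subspecies separately from records identified only to the species (or, for Sarcophilus, only to the genus). If one total per search term is wanted, the rows can be rolled up to the term they fall under. This is a minimal sketch in plain Python using the counts printed above; in practice atlas_counts() returns a pandas DataFrame, and the same mapping could be applied to its scientificName column. The helpers are hypothetical, not part of galah, and the code assumes any parenthesised word in scientificName is a one-word subgenus.

```python
from collections import Counter

# Search terms from tas_endemic above
queries = ["Sarcophilus", "Bettongia gaimardi", "Melanodryas vittata",
           "Platycercus caledonicus", "Aquila audax fleayi",
           "Tyto novaehollandiae castanops"]

# (scientificName, count) pairs copied from the atlas_counts() output above
counts = [
    ("Aquila (Uroaetus) audax fleayi", 9392),
    ("Bettongia gaimardi", 2447),
    ("Bettongia gaimardi cuniculus", 54),
    ("Bettongia gaimardi gaimardi", 9),
    ("Melanodryas (Amaurodryas) vittata", 13747),
    ("Melanodryas (Amaurodryas) vittata kingi", 729),
    ("Melanodryas (Amaurodryas) vittata vittata", 7499),
    ("Platycercus (Platycercus) caledonicus", 50443),
    ("Platycercus (Platycercus) caledonicus brownii", 447),
    ("Platycercus (Platycercus) caledonicus caledonicus", 30104),
    ("Sarcophilus", 112),
    ("Sarcophilus harrisii", 36865),
    ("Tyto novaehollandiae castanops", 318),
]


def without_subgenus(name):
    """Drop a one-word parenthesised subgenus such as '(Uroaetus)'."""
    return " ".join(word for word in name.split() if not word.startswith("("))


def query_for(name):
    """Return the longest search term that equals the name or is a leading part of it."""
    plain = without_subgenus(name)
    hits = [q for q in queries if plain == q or plain.startswith(q + " ")]
    return max(hits, key=len) if hits else None


totals = Counter()
for name, n in counts:
    totals[query_for(name)] += n

for query in queries:
    print(f"{query}: {totals[query]}")
```

Records identified only to genus (the 112 Sarcophilus records) are folded into the Sarcophilus total here; keep them separate if records identified to species are all that is wanted.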