Taxonomic Filtering
Callum Waite, Shandiya Balasubramaniam
Taxonomic complexity can confound searching, filtering, and downloading records with galah, but a few galah functions help ensure that records are not missed. Let’s start by configuring galah to use the Atlas of Living Australia (ALA).
>>> import galah
>>> galah.galah_config(atlas="Australia",email="your-email-here")
search_taxa()
search_taxa() enables users to look up taxonomic names before downloading data, which helps to disambiguate homonyms and to check that the search term matches the taxon name in the ALA. search_taxa() returns the scientific name, authorship, rank, and full classification for the taxon matched to the provided search term.
>>> galah.search_taxa(taxa="Petroica boodang")
scientificName scientificNameAuthorship taxonConceptID rank kingdom phylum order family genus species vernacularName issues
0 Petroica (Petroica) boodang (Lesson, 1838) https://biodiversity.org.au/afd/taxa/a3e5376b-f9e6-4bdf-adae-1e7add9f5c29 species Animalia Chordata Passeriformes Petroicidae Petroica Petroica boodang Scarlet Robin noIssue
It can also return taxonomic information for multiple species, including synonyms and Indigenous names.
>>> # Muscicapa chrysoptera is a synonym for the Flame Robin, Petroica phoenicea
>>> # Guniibuu is the Yuwaalaraay Indigenous name for the Red-Capped Robin, Petroica goodenovii
>>> galah.search_taxa(taxa = ["Muscicapa chrysoptera", "Guniibuu"])
scientificName scientificNameAuthorship taxonConceptID rank kingdom phylum order family genus species vernacularName issues
0 Petroica (Littlera) phoenicea Gould, 1837 https://biodiversity.org.au/afd/taxa/fe74e658-4848-437a-a23d-f1001a198552 species Animalia Chordata Passeriformes Petroicidae Petroica Petroica phoenicea Flame Robin noIssue
1 Petroica (Petroica) goodenovii (Vigors & Horsfield, 1827) https://biodiversity.org.au/afd/taxa/10dbd908-00f3-4ec2-9a9c-a2fd4782eaf1 species Animalia Chordata Passeriformes Petroicidae Petroica Petroica goodenovii Red-capped Robin noIssue
Where homonyms exist, search_taxa() will prompt users to clarify the search term by providing one or more taxonomic ranks using the scientific_name argument. This example disambiguates the genus name Morganella, which occurs in three kingdoms:
>>> galah.search_taxa(taxa = ["Morganella"])
Warning: Search returned multiple taxa due to a homonym issue.
Please use the `scientific_name` argument to clarify taxa.
search_term issues
0 Morganella homonym
>>> galah.search_taxa(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
scientificName scientificNameAuthorship taxonConceptID rank kingdom phylum order family genus issues
0 Morganella Zeller https://id.biodiversity.org.au/node/fungi/60091999 genus Fungi Basidiomycota Agaricales Agaricaceae Morganella noIssue
This disambiguation of the Morganella taxa can then be used by atlas_counts(), atlas_occurrences(), atlas_species() or atlas_media() by passing the same scientific_name keyword to any of these functions.
>>> galah.atlas_counts(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
totalRecords
0 149
>>> galah.atlas_occurrences(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
decimalLatitude decimalLongitude eventDate scientificName taxonConceptID recordID dataResourceName occurrenceStatus
0 -47.000000 168.200000 2002-04-30T00:00:00Z Morganella compacta NZOR-6-128055 f2d0672f-0f8d-467f-a70d-af0ad257743a New Zealand Fungal and Plant Disease Collection PRESENT
1 -46.879900 168.136500 2021-04-15T00:00:00Z Morganella compacta NZOR-6-128055 450d6089-a004-4a4b-b281-50f60e7596bf New Zealand Fungal and Plant Disease Collection PRESENT
2 -46.874875 168.124660 1983-02-13T00:00:00Z Morganella compacta NZOR-6-128055 eaae280b-c1de-463a-8cc6-3ee72cc00e83 New Zealand Fungal and Plant Disease Collection PRESENT
3 -46.862757 168.116777 1985-04-23T00:00:00Z Morganella compacta NZOR-6-128055 a08519f3-221b-4006-9803-98d8eeb771a8 New Zealand Fungal and Plant Disease Collection PRESENT
4 -46.554617 169.479051 1990-05-24T00:00:00Z Morganella compacta NZOR-6-128055 4b2011f7-ef20-428d-af8f-dc17c9656df6 New Zealand Fungal and Plant Disease Collection PRESENT
.. ... ... ... ... ... ... ... ...
144 NaN NaN 1910-04-01T00:00:00Z Morganella purpurascens https://id.biodiversity.org.au/node/fungi/60092001 6ec9d2c2-238e-45b6-ae43-bcf0eda37555 Royal Botanic Gardens, Kew - Fungarium Specimens PRESENT
145 NaN NaN NaN Morganella purpurascens https://id.biodiversity.org.au/node/fungi/60092001 4302254c-0f4d-44bb-a017-ade0c5007dff Royal Botanic Gardens, Kew - Fungarium Specimens PRESENT
146 -22.500000 145.000000 NaN Morganella purpurascens https://id.biodiversity.org.au/node/fungi/60092001 4269a5b2-c102-4688-8d1d-cf4790459675 USDA United States National Fungus Collections PRESENT
147 NaN NaN NaN Morganella purpurascens https://id.biodiversity.org.au/node/fungi/60092001 e72cb4f1-7cd9-4720-a345-1f32869a34d1 USDA United States National Fungus Collections PRESENT
148 -8.916667 148.150000 1953-07-15T00:00:00Z Morganella purpurascens https://id.biodiversity.org.au/node/fungi/60092001 633433fe-a3b4-4b4f-af12-36b54c1fd9ff Centre for Australian National Biodiversity Research (CANB) AVH data PRESENT
[149 rows x 8 columns]
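Because the same dictionary is passed to several functions above, it can be convenient to build it once and reuse it. The following is a minimal sketch: `build_scientific_name` is a hypothetical helper (it is not part of galah) that only reproduces the dictionary shape used in the examples above.

```python
def build_scientific_name(name, **ranks):
    """Build a dict shaped like the `scientific_name` argument shown above.

    Hypothetical helper for illustration only; not part of galah.
    Each rank (e.g. kingdom="Fungi") becomes a one-element list.
    """
    query = {rank: [value] for rank, value in ranks.items()}
    query["scientificName"] = [name]
    return query


fungal_morganella = build_scientific_name("Morganella", kingdom="Fungi")
print(fungal_morganella)

# The result could then be reused across calls, for example:
# galah.atlas_counts(scientific_name=fungal_morganella)
# galah.atlas_occurrences(scientific_name=fungal_morganella)
```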
filters=
filters= subsets records by searching for exact matches to an expression, and may also be used for taxonomic filtering. For example, we can count records for a single robin species in Australia or for several species at once. We can also group the counts by species name to compare the number of records for each robin.
>>> galah.atlas_counts(taxa="Petroica boodang")
totalRecords
0 132526
>>> aus_petroica = ["Petroica boodang", "Petroica goodenovii",
... "Petroica phoenicea", "Petroica rosea",
... "Petroica rodinogaster", "Petroica multicolor"]
>>> galah.atlas_counts(
... taxa=aus_petroica,
... group_by=["species","vernacularName"]
... )
                  species               vernacularName   count
0        Petroica boodang        Eastern Scarlet Robin    3846
1        Petroica boodang                Scarlet Robin  128368
2        Petroica boodang  South-western Scarlet Robin     212
3        Petroica boodang      Tasmanian Scarlet Robin     100
4     Petroica goodenovii             Red-capped Robin  121141
5     Petroica multicolor                Pacific Robin    6860
6      Petroica phoenicea                  Flame Robin   82845
7   Petroica rodinogaster          Mainland Pink Robin      69
8   Petroica rodinogaster                   Pink Robin   15601
9   Petroica rodinogaster         Tasmanian Pink Robin      50
10         Petroica rosea                   Rose Robin   60583
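Grouping by vernacularName as well as species splits some species across several rows. As a rough sketch, the rows can be summed back to one total per species. The rows below are copied by hand from the output above rather than fetched, so no query is made. For Petroica boodang the four rows add up to 132526, the same figure returned by the single-species count earlier.

```python
from collections import defaultdict

# (species, vernacularName, count) rows copied from the grouped output above.
grouped_rows = [
    ("Petroica boodang", "Eastern Scarlet Robin", 3846),
    ("Petroica boodang", "Scarlet Robin", 128368),
    ("Petroica boodang", "South-western Scarlet Robin", 212),
    ("Petroica boodang", "Tasmanian Scarlet Robin", 100),
    ("Petroica goodenovii", "Red-capped Robin", 121141),
    ("Petroica multicolor", "Pacific Robin", 6860),
    ("Petroica phoenicea", "Flame Robin", 82845),
    ("Petroica rodinogaster", "Mainland Pink Robin", 69),
    ("Petroica rodinogaster", "Pink Robin", 15601),
    ("Petroica rodinogaster", "Tasmanian Pink Robin", 50),
    ("Petroica rosea", "Rose Robin", 60583),
]

# Sum the counts of every vernacular-name row belonging to the same species.
species_totals = defaultdict(int)
for species, _vernacular_name, count in grouped_rows:
    species_totals[species] += count

for species, total in sorted(species_totals.items()):
    print(f"{species}: {total}")
```

With the counts held in a pandas DataFrame, the same roll-up could be done with a groupby on the species column.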
filters= can be useful when searching for paraphyletic or polyphyletic groups. For example, to get counts of animals that are not chordates:
>>> non_chordates = galah.atlas_counts(
... filters=["kingdom=Animalia","phylum!=Chordata"],
... group_by=["phylum"],
... expand=False
... )
>>> non_chordates.head()
           phylum     count
0  Acanthocephala       482
1        Annelida    332489
2      Arthropoda  10197787
3     Brachiopoda     11635
4         Bryozoa     32982
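The filter expressions above are plain strings of the form field=value or field!=value. When there are many inclusions and exclusions, they can be assembled programmatically. This is a minimal sketch: `build_filters` is a hypothetical helper (not part of galah) and it only covers the two operators that appear in the example above.

```python
def build_filters(include=None, exclude=None):
    """Return filter strings in the "field=value" / "field!=value" form.

    Hypothetical helper for illustration only; not part of galah.
    `include` and `exclude` map field names to a single value each.
    """
    filters = [f"{field}={value}" for field, value in (include or {}).items()]
    filters += [f"{field}!={value}" for field, value in (exclude or {}).items()]
    return filters


non_chordate_filters = build_filters(
    include={"kingdom": "Animalia"},
    exclude={"phylum": "Chordata"},
)
print(non_chordate_filters)

# The list could then be passed on, for example:
# galah.atlas_counts(filters=non_chordate_filters, group_by=["phylum"], expand=False)
```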
filters=, search_taxa(), and taxonomic ranks
Deciding between using filters= and search_taxa() in a query comes down to how a record has been classified, and whether or not you have the correct unique name and classification of the taxa of interest.
The ALA has fields for the primary taxonomic ranks (kingdom, phylum, class, order, family, genus, species) and some secondary ranks (e.g. subfamily, subgenus), all of which may be used with filters= and search_taxa(). Additionally, there is a field named scientificName, which refers to the lowest taxonomic rank to which a record has been identified, e.g.
>>> import numpy as np
>>> pitta_ranks = galah.atlas_counts(
... taxa="Pitta",
... group_by=["scientificName","taxonRank"]
... )
>>> pitta_ranks = pitta_ranks.loc[pitta_ranks["scientificName"].notnull()]
>>> pitta_ranks
                                 scientificName   taxonRank  count
0                                         Pitta       genus     70
1                          Pitta (Erythropitta)    subgenus    882
2            Pitta (Erythropitta) erythrogaster     species    190
3   Pitta (Erythropitta) erythrogaster digglesi  subspecies      6
4                            Pitta (Pitta) iris     species   6597
5                       Pitta (Pitta) iris iris  subspecies     91
6              Pitta (Pitta) iris johnstoneiana  subspecies     27
7                      Pitta (Pitta) versicolor     species  30306
8           Pitta (Pitta) versicolor intermedia  subspecies     64
9            Pitta (Pitta) versicolor simillima  subspecies     53
10          Pitta (Pitta) versicolor versicolor  subspecies    424
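To see how the Pitta records are spread across levels of identification, the counts can be tallied by taxonRank. The sketch below uses only the rows printed above, copied by hand, and makes no new query.

```python
from collections import Counter

# (scientificName, taxonRank, count) rows copied from the output above.
pitta_rows = [
    ("Pitta", "genus", 70),
    ("Pitta (Erythropitta)", "subgenus", 882),
    ("Pitta (Erythropitta) erythrogaster", "species", 190),
    ("Pitta (Erythropitta) erythrogaster digglesi", "subspecies", 6),
    ("Pitta (Pitta) iris", "species", 6597),
    ("Pitta (Pitta) iris iris", "subspecies", 91),
    ("Pitta (Pitta) iris johnstoneiana", "subspecies", 27),
    ("Pitta (Pitta) versicolor", "species", 30306),
    ("Pitta (Pitta) versicolor intermedia", "subspecies", 64),
    ("Pitta (Pitta) versicolor simillima", "subspecies", 53),
    ("Pitta (Pitta) versicolor versicolor", "subspecies", 424),
]

# Add up the record counts for each level of identification.
records_by_rank = Counter()
for _scientific_name, rank, count in pitta_rows:
    records_by_rank[rank] += count

for rank in ("genus", "subgenus", "species", "subspecies"):
    print(f"{rank}: {records_by_rank[rank]}")
```

In this output most records are identified to species, while a smaller number stop at genus or subgenus or go down to subspecies.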
If, for instance, you have the correct species or subspecies name, then searching for matches against the species and subspecies fields, respectively, will provide more precise results. This is because the field scientificName may include subgenera. If you’ve used search_taxa() to get the ALA-matched name of a taxon and only want records identified to a particular level of classification, searching for matches against scientificName is recommended.
Paraphyletic or polyphyletic groups may contain taxa identified to different taxonomic levels. In this case, it is simpler to use search_taxa(). In the example below, search_taxa() matches terms to one genus, three species, and two subspecies. This can then be used in atlas_counts() to get counts for each scientific name.
>>> tas_endemic = ["Sarcophilus", # Tasmanian Devil
... "Bettongia gaimardi", # Tasmanian Bettong
... "Melanodryas vittata", # Dusky Robin
... "Platycercus caledonicus",# Green Rosella
... "Aquila audax fleayi", # Tasmanian Wedge-Tailed Eagle
... "Tyto novaehollandiae castanops" # Tasmanian Masked Owl
... ]
>>> galah.search_taxa(taxa=tas_endemic)
scientificName scientificNameAuthorship taxonConceptID rank kingdom phylum order family genus issues species vernacularName
0 Sarcophilus Cuvier, 1837 https://biodiversity.org.au/afd/taxa/06455b77-7d50-4ec7-9122-8ab48cfb0c1c genus Animalia Chordata Dasyuromorphia Dasyuridae Sarcophilus noIssue NaN NaN
1 Bettongia gaimardi (Desmarest, 1822) https://biodiversity.org.au/afd/taxa/8f7da937-6338-4c39-8b11-4f83807afe11 species Animalia Chordata Diprotodontia Potoroidae Bettongia noIssue Bettongia gaimardi Tasmanian Bettong
2 Melanodryas (Amaurodryas) vittata (Quoy & Gaimard, 1830) https://biodiversity.org.au/afd/taxa/0f04889f-5489-4369-a545-8a041fba9f6d species Animalia Chordata Passeriformes Petroicidae Melanodryas noIssue Melanodryas vittata Dusky Robin
3 Platycercus (Platycercus) caledonicus (Gmelin, 1788) https://biodiversity.org.au/afd/taxa/c6e478fe-f199-463f-8576-a77108fd73e2 species Animalia Chordata Psittaciformes Psittacidae Platycercus noIssue Platycercus caledonicus Green Rosella
4 Aquila (Uroaetus) audax fleayi Condon & Amadon, 1954 https://biodiversity.org.au/afd/taxa/ac93f7f0-0686-4589-801a-5832378cb7c1 subspecies Animalia Chordata Accipitriformes Accipitridae Aquila noIssue Aquila audax Tasmanian Wedge-tailed Eagle
5 Tyto novaehollandiae castanops (Gould, 1837) https://biodiversity.org.au/afd/taxa/2c30d58b-572b-4dab-8644-b222c28eb0ec subspecies Animalia Chordata Strigiformes Tytonidae Tyto noIssue Tyto novaehollandiae Tasmanian Masked Owl
>>> galah.atlas_counts(
... taxa=tas_endemic,
... group_by=["scientificName"],
... expand=False
... )
                                       scientificName  count
0                      Aquila (Uroaetus) audax fleayi   5103
1                                  Bettongia gaimardi   2286
2                        Bettongia gaimardi cuniculus     54
3                         Bettongia gaimardi gaimardi      9
4                   Melanodryas (Amaurodryas) vittata  15807
5             Melanodryas (Amaurodryas) vittata kingi     16
6           Melanodryas (Amaurodryas) vittata vittata     63
7               Platycercus (Platycercus) caledonicus  51527
8       Platycercus (Platycercus) caledonicus brownii     24
9   Platycercus (Platycercus) caledonicus caledonicus     50
10                                        Sarcophilus    131
11                               Sarcophilus harrisii  36611
12                     Tyto novaehollandiae castanops     85
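The counts above are reported per scientificName, so a single search term can be spread over several rows (for example, a species and its subspecies). One possible way to roll them back up to the original search terms is sketched below. It works on the printed counts, copied by hand, and assumes that removing the parenthesised subgenus from a scientificName leaves a name that starts with the search term. That assumption holds for these rows but is not guaranteed in general.

```python
import re

tas_endemic = [
    "Sarcophilus",
    "Bettongia gaimardi",
    "Melanodryas vittata",
    "Platycercus caledonicus",
    "Aquila audax fleayi",
    "Tyto novaehollandiae castanops",
]

# scientificName -> count, copied from the output above.
name_counts = {
    "Aquila (Uroaetus) audax fleayi": 5103,
    "Bettongia gaimardi": 2286,
    "Bettongia gaimardi cuniculus": 54,
    "Bettongia gaimardi gaimardi": 9,
    "Melanodryas (Amaurodryas) vittata": 15807,
    "Melanodryas (Amaurodryas) vittata kingi": 16,
    "Melanodryas (Amaurodryas) vittata vittata": 63,
    "Platycercus (Platycercus) caledonicus": 51527,
    "Platycercus (Platycercus) caledonicus brownii": 24,
    "Platycercus (Platycercus) caledonicus caledonicus": 50,
    "Sarcophilus": 131,
    "Sarcophilus harrisii": 36611,
    "Tyto novaehollandiae castanops": 85,
}


def strip_subgenus(name):
    """Drop a parenthesised subgenus, e.g. 'Aquila (Uroaetus) audax' -> 'Aquila audax'."""
    return re.sub(r"\s*\([^)]*\)", "", name)


# Assign each scientificName to the first search term it equals or extends.
term_totals = {term: 0 for term in tas_endemic}
for name, count in name_counts.items():
    plain = strip_subgenus(name)
    for term in tas_endemic:
        if plain == term or plain.startswith(term + " "):
            term_totals[term] += count
            break

for term, total in term_totals.items():
    print(f"{term}: {total}")
```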