Taxonomic Filtering
Callum Waite, Shandiya Balasubramaniam
Taxonomic complexity can confound the process of searching, filtering, and downloading records using galah, but a few galah functions help ensure that records are not missed. Let’s start by configuring galah to use the Atlas of Living Australia (ALA).
>>> import galah
>>> galah.galah_config(atlas="Australia",email="your-email-here")
search_taxa()
search_taxa() enables users to look up taxonomic names before downloading data, which helps to disambiguate homonyms and to check that the search term matches the taxon name in the ALA. search_taxa() returns the scientific name, authorship, rank, and full classification for the taxon matched to the provided search term.
>>> galah.search_taxa(taxa="Petroica boodang")
scientificName scientificNameAuthorship taxonConceptID rank kingdom phylum order family genus species vernacularName issues
0 Petroica (Petroica) boodang (Lesson, 1838) https://biodiversity.org.au/afd/taxa/a3e5376b-f9e6-4bdf-adae-1e7add9f5c29 species Animalia Chordata Passeriformes Petroicidae Petroica Petroica boodang Scarlet Robin noIssue
It can also return taxonomic information for multiple species, including synonyms and Indigenous names.
>>> # Muscicapa chrysoptera is a synonym for the Flame Robin, Petroica phoenicea
>>> # Guniibuu is the Yuwaalaraay Indigenous name for the Red-Capped Robin, Petroica goodenovii
>>> galah.search_taxa(taxa = ["Muscicapa chrysoptera", "Guniibuu"])
scientificName scientificNameAuthorship taxonConceptID rank kingdom phylum order family genus species vernacularName issues
0 Petroica (Littlera) phoenicea Gould, 1837 https://biodiversity.org.au/afd/taxa/fe74e658-4848-437a-a23d-f1001a198552 species Animalia Chordata Passeriformes Petroicidae Petroica Petroica phoenicea Flame Robin noIssue
1 Petroica (Petroica) goodenovii (Vigors & Horsfield, 1827) https://biodiversity.org.au/afd/taxa/10dbd908-00f3-4ec2-9a9c-a2fd4782eaf1 species Animalia Chordata Passeriformes Petroicidae Petroica Petroica goodenovii Red-capped Robin noIssue
Where homonyms exist, search_taxa() will prompt users to clarify the search term by providing one or more taxonomic ranks through the scientific_name argument. This example disambiguates the genus name Morganella, which occurs in three kingdoms:
>>> galah.search_taxa(taxa = ["Morganella"])
Warning: Search returned multiple taxa due to a homonym issue.
Please use the `scientific_name` argument to clarify taxa.
search_term issues
0 Morganella homonym
>>> galah.search_taxa(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
scientificName scientificNameAuthorship taxonConceptID rank kingdom phylum order family genus issues
0 Morganella Zeller https://id.biodiversity.org.au/node/fungi/60091999 genus Fungi Basidiomycota Agaricales Agaricaceae Morganella noIssue
The same scientific_name dictionary used to disambiguate Morganella can then be passed to atlas_counts(), atlas_occurrences(), atlas_species(), or atlas_media() through the scientific_name keyword.
>>> galah.atlas_counts(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
totalRecords
0 160
>>> galah.atlas_occurrences(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
decimalLatitude decimalLongitude eventDate scientificName taxonConceptID recordID dataResourceName occurrenceStatus
0 -47.000000 168.200000 2002-04-30T00:00:00Z Morganella compacta NZOR-6-128055 f2d0672f-0f8d-467f-a70d-af0ad257743a New Zealand Fungal and Plant Disease Collection PRESENT
1 -46.936200 168.126000 2021-04-15T00:00:00Z Morganella compacta NZOR-6-128055 450d6089-a004-4a4b-b281-50f60e7596bf New Zealand Fungal and Plant Disease Collection PRESENT
2 -46.874875 168.124660 1983-02-13T00:00:00Z Morganella compacta NZOR-6-128055 eaae280b-c1de-463a-8cc6-3ee72cc00e83 New Zealand Fungal and Plant Disease Collection PRESENT
3 -46.862757 168.116777 1985-04-23T00:00:00Z Morganella compacta NZOR-6-128055 a08519f3-221b-4006-9803-98d8eeb771a8 New Zealand Fungal and Plant Disease Collection PRESENT
4 -46.554617 169.479051 1990-05-24T00:00:00Z Morganella compacta NZOR-6-128055 4b2011f7-ef20-428d-af8f-dc17c9656df6 New Zealand Fungal and Plant Disease Collection PRESENT
.. ... ... ... ... ... ... ... ...
155 -26.779400 152.880300 2009-02-19T00:00:00Z Morganella purpurascens https://id.biodiversity.org.au/node/fungi/60092001 092b6f3e-ef27-4cbf-bb28-b172c1b200c5 National Herbarium of Victoria (MEL) AVH data PRESENT
156 -26.777741 152.880254 2012-02-15T00:00:00Z Morganella purpurascens https://id.biodiversity.org.au/node/fungi/60092001 8b7a0609-2c0b-42fc-9676-ba7ef9a73a5e BowerBird PRESENT
157 NaN NaN 1964-11-21T00:00:00Z Morganella purpurascens https://id.biodiversity.org.au/node/fungi/60092001 4c74c65a-d9b8-4e81-9101-a283eec56e5a National Herbarium of Victoria (MEL) AVH data PRESENT
158 -26.405958 153.026140 2018-03-16T01:01:09Z Morganella purpurascens https://id.biodiversity.org.au/node/fungi/60092001 fc2816fd-4b65-4af1-b0a2-1b086c26d97f ALA species sightings and OzAtlas PRESENT
159 -8.916667 148.150000 1953-07-15T00:00:00Z Morganella purpurascens https://id.biodiversity.org.au/node/fungi/60092001 633433fe-a3b4-4b4f-af12-36b54c1fd9ff Centre for Australian National Biodiversity Research (CANB) AVH data PRESENT
[160 rows x 8 columns]
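The scientific_name value in the calls above is a plain dictionary that maps each rank to a list of names. As a minimal sketch, a small helper can build that dictionary once so it can be reused across calls. The helper name build_scientific_name is hypothetical and is not part of galah.

```python
def build_scientific_name(**ranks):
    """Return a dict in the {"rank": ["name"]} shape used by scientific_name above.

    build_scientific_name is a hypothetical helper, not a galah function.
    """
    query = {}
    for rank, names in ranks.items():
        # Accept a single string or a list of strings for convenience.
        if isinstance(names, str):
            names = [names]
        query[rank] = list(names)
    return query


morganella_fungi = build_scientific_name(kingdom="Fungi", scientificName="Morganella")
print(morganella_fungi)
```

The resulting dictionary has the same shape as the one passed to atlas_counts() and atlas_occurrences() above, so it could be supplied as scientific_name=morganella_fungi.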
filters=
filters= subsets records by searching for exact matches to an expression, and may also be used for taxonomic filtering. For example, we can count records for a single species of Australian robin or for several species at once. We can also group the results by species name to compare the number of records for each robin.
>>> galah.atlas_counts(taxa="Petroica boodang")
totalRecords
0 132920
>>> aus_petroica = ["Petroica boodang", "Petroica goodenovii",
... "Petroica phoenicea", "Petroica rosea",
... "Petroica rodinogaster", "Petroica multicolor"]
>>> galah.atlas_counts(
... taxa=aus_petroica,
... group_by=["species","vernacularName"]
... )
species vernacularName count
0 Petroica boodang Eastern Scarlet Robin 3593
1 Petroica boodang Scarlet Robin 129112
2 Petroica boodang South-western Scarlet Robin 166
3 Petroica boodang Tasmanian Scarlet Robin 49
4 Petroica goodenovii Red-capped Robin 119863
5 Petroica multicolor Pacific Robin 6703
6 Petroica phoenicea Flame Robin 88405
7 Petroica rodinogaster Mainland Pink Robin 60
8 Petroica rodinogaster Pink Robin 15629
9 Petroica rodinogaster Tasmanian Pink Robin 21
10 Petroica rosea Rose Robin 60078
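Because the grouped output lists several vernacular names for some species, the per-species totals are spread over multiple rows. The sketch below sums them, using rows copied from the output above as plain tuples so that it runs without a network connection. With the DataFrame that galah returns, a pandas groupby on the species column would do the same job.

```python
from collections import defaultdict

# (species, vernacularName, count) rows copied from the grouped output above.
grouped = [
    ("Petroica boodang", "Eastern Scarlet Robin", 3593),
    ("Petroica boodang", "Scarlet Robin", 129112),
    ("Petroica boodang", "South-western Scarlet Robin", 166),
    ("Petroica boodang", "Tasmanian Scarlet Robin", 49),
    ("Petroica goodenovii", "Red-capped Robin", 119863),
    ("Petroica multicolor", "Pacific Robin", 6703),
    ("Petroica phoenicea", "Flame Robin", 88405),
    ("Petroica rodinogaster", "Mainland Pink Robin", 60),
    ("Petroica rodinogaster", "Pink Robin", 15629),
    ("Petroica rodinogaster", "Tasmanian Pink Robin", 21),
    ("Petroica rosea", "Rose Robin", 60078),
]


def counts_per_species(rows):
    """Sum the counts of all vernacular-name rows belonging to each species."""
    totals = defaultdict(int)
    for species, _vernacular, count in rows:
        totals[species] += count
    return dict(totals)


species_totals = counts_per_species(grouped)
for species, total in sorted(species_totals.items()):
    print(species, total)
```

The four Petroica boodang rows sum to 132920, which matches the single-species count returned by atlas_counts(taxa="Petroica boodang") above.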
filters= can be useful when searching for paraphyletic or polyphyletic groups. For example, to get counts of non-chordate animals:
>>> non_chordates = galah.atlas_counts(
... filters=["kingdom=Animalia","phylum!=Chordata"],
... group_by=["phylum"],
... expand=False
... )
>>> non_chordates.head()
phylum count
0 Acanthocephala 403
1 Annelida 320633
2 Arthropoda 8970751
3 Brachiopoda 2181
4 Bryozoa 25617
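To make the meaning of the two expressions concrete, here is a small local stand-in that applies "field=value" and "field!=value" expressions to a few records. It is only an illustration: it is not how galah or the ALA evaluates filters, it handles only = and !=, and it assumes that multiple filters are combined with AND, which is what the non-chordate query above suggests. The function names and the sample records are hypothetical; the classifications are taken from outputs on this page.

```python
def parse_filter(expression):
    """Split 'field=value' or 'field!=value' into (field, negate, value)."""
    if "!=" in expression:
        field, value = expression.split("!=", 1)
        return field.strip(), True, value.strip()
    field, value = expression.split("=", 1)
    return field.strip(), False, value.strip()


def matches(record, expressions):
    """Return True when the record satisfies every expression (exact match only)."""
    for expression in expressions:
        field, negate, value = parse_filter(expression)
        is_equal = record.get(field) == value
        # '=' requires equality, '!=' requires inequality.
        if is_equal == negate:
            return False
    return True


# Illustrative records only; not real ALA occurrence data.
records = [
    {"kingdom": "Animalia", "phylum": "Chordata"},
    {"kingdom": "Animalia", "phylum": "Arthropoda"},
    {"kingdom": "Animalia", "phylum": "Annelida"},
    {"kingdom": "Fungi", "phylum": "Basidiomycota"},
]

filters = ["kingdom=Animalia", "phylum!=Chordata"]
kept = [record["phylum"] for record in records if matches(record, filters)]
print(kept)  # ['Arthropoda', 'Annelida']
```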
filters=, search_taxa(), and taxonomic ranks
Deciding between using filters= and search_taxa() in a query comes down to how a record has been classified, and whether or not you have the correct unique name and classification of the taxa of interest.
The ALA has fields for the primary taxonomic ranks (kingdom, phylum, class, order, family, genus, species) and some secondary ranks (e.g. subfamily, subgenus), all of which may be used with filters= and search_taxa().
Additionally, there is a field named scientificName, which refers to the lowest taxonomic rank to which a record has been identified, e.g.
>>> import numpy as np
>>> pitta_ranks = galah.atlas_counts(
... taxa="Pitta",
... group_by=["scientificName","taxonRank"]
... )
>>> pitta_ranks = pitta_ranks.loc[pitta_ranks["scientificName"].notnull()]
>>> pitta_ranks
scientificName taxonRank count
0 Pitta genus 68
1 Pitta (Erythropitta) subgenus 883
2 Pitta (Erythropitta) erythrogaster species 181
3 Pitta (Erythropitta) erythrogaster digglesi subspecies 6
4 Pitta (Pitta) iris species 6548
5 Pitta (Pitta) iris iris subspecies 83
6 Pitta (Pitta) iris johnstoneiana subspecies 27
7 Pitta (Pitta) versicolor species 30024
8 Pitta (Pitta) versicolor intermedia subspecies 42
9 Pitta (Pitta) versicolor simillima subspecies 38
10 Pitta (Pitta) versicolor versicolor subspecies 311
If, for instance, you have the correct species or subspecies name, then searching for matches against the species and subspecies fields, respectively, will provide more precise results. This is because the field scientificName may include subgenera. If you’ve used search_taxa() to get the ALA-matched name of a taxon and only want records identified to a particular level of classification, searching for matches against scientificName is recommended.
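The Pitta table above shows why the distinction matters: scientificName holds names at genus, subgenus, species, and subspecies level side by side. As a rough sketch, the rows can be narrowed to a single level of identification. The rows are copied from the table above as plain tuples, and rows_at_rank is a hypothetical helper rather than a galah function.

```python
# (scientificName, taxonRank, count) rows copied from the Pitta output above.
pitta_ranks = [
    ("Pitta", "genus", 68),
    ("Pitta (Erythropitta)", "subgenus", 883),
    ("Pitta (Erythropitta) erythrogaster", "species", 181),
    ("Pitta (Erythropitta) erythrogaster digglesi", "subspecies", 6),
    ("Pitta (Pitta) iris", "species", 6548),
    ("Pitta (Pitta) iris iris", "subspecies", 83),
    ("Pitta (Pitta) iris johnstoneiana", "subspecies", 27),
    ("Pitta (Pitta) versicolor", "species", 30024),
    ("Pitta (Pitta) versicolor intermedia", "subspecies", 42),
    ("Pitta (Pitta) versicolor simillima", "subspecies", 38),
    ("Pitta (Pitta) versicolor versicolor", "subspecies", 311),
]


def rows_at_rank(rows, rank):
    """Keep only the names identified to exactly the given rank."""
    return [(name, count) for name, taxon_rank, count in rows if taxon_rank == rank]


species_rows = rows_at_rank(pitta_ranks, "species")
species_total = sum(count for _name, count in species_rows)

for name, count in species_rows:
    print(name, count)
print("records identified to species level:", species_total)
```

Only three of the eleven names are identified to exactly species level. Records identified to subspecies, subgenus, or genus level are counted separately under their own scientificName.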
Paraphyletic or polyphyletic groups may contain taxa identified to different taxonomic levels. In this case, it is simpler to use search_taxa(). In the example below, search_taxa() matches terms to one genus, three species, and two subspecies. This can then be used in atlas_counts() to get counts for each scientific name.
>>> tas_endemic = ["Sarcophilus", # Tasmanian Devil
... "Bettongia gaimardi", # Tasmanian Bettong
... "Melanodryas vittata", # Dusky Robin
... "Platycercus caledonicus",# Green Rosella
... "Aquila audax fleayi", # Tasmanian Wedge-Tailed Eagle
... "Tyto novaehollandiae castanops" # Tasmanian Masked Owl
... ]
>>> galah.search_taxa(taxa=tas_endemic)
scientificName scientificNameAuthorship taxonConceptID rank kingdom phylum order family genus issues species vernacularName
0 Sarcophilus Cuvier, 1837 https://biodiversity.org.au/afd/taxa/06455b77-7d50-4ec7-9122-8ab48cfb0c1c genus Animalia Chordata Dasyuromorphia Dasyuridae Sarcophilus noIssue NaN NaN
1 Bettongia gaimardi (Desmarest, 1822) https://biodiversity.org.au/afd/taxa/8f7da937-6338-4c39-8b11-4f83807afe11 species Animalia Chordata Diprotodontia Potoroidae Bettongia noIssue Bettongia gaimardi Tasmanian Bettong
2 Melanodryas (Amaurodryas) vittata (Quoy & Gaimard, 1830) https://biodiversity.org.au/afd/taxa/0f04889f-5489-4369-a545-8a041fba9f6d species Animalia Chordata Passeriformes Petroicidae Melanodryas noIssue Melanodryas vittata Dusky Robin
3 Platycercus (Platycercus) caledonicus (Gmelin, 1788) https://biodiversity.org.au/afd/taxa/c6e478fe-f199-463f-8576-a77108fd73e2 species Animalia Chordata Psittaciformes Psittacidae Platycercus noIssue Platycercus caledonicus Green Rosella
4 Aquila (Uroaetus) audax fleayi Condon & Amadon, 1954 https://biodiversity.org.au/afd/taxa/ac93f7f0-0686-4589-801a-5832378cb7c1 subspecies Animalia Chordata Accipitriformes Accipitridae Aquila noIssue Aquila audax Tasmanian Wedge-tailed Eagle
5 Tyto novaehollandiae castanops (Gould, 1837) https://biodiversity.org.au/afd/taxa/2c30d58b-572b-4dab-8644-b222c28eb0ec subspecies Animalia Chordata Strigiformes Tytonidae Tyto noIssue Tyto novaehollandiae Tasmanian Masked Owl
>>> galah.atlas_counts(
... taxa=tas_endemic,
... group_by=["scientificName"],
... expand=False
... )
scientificName count
0 Aquila (Uroaetus) audax fleayi 5026
1 Bettongia gaimardi 1960
2 Bettongia gaimardi cuniculus 41
3 Bettongia gaimardi gaimardi 9
4 Melanodryas (Amaurodryas) vittata 15709
5 Melanodryas (Amaurodryas) vittata kingi 16
6 Melanodryas (Amaurodryas) vittata vittata 45
7 Platycercus (Platycercus) caledonicus 51142
8 Platycercus (Platycercus) caledonicus brownii 24
9 Platycercus (Platycercus) caledonicus caledonicus 33
10 Sarcophilus 3
11 Sarcophilus harrisii 36355
12 Tyto novaehollandiae castanops 67
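The counts above include names below the level of the matched taxon, such as Sarcophilus harrisii under the genus Sarcophilus. A minimal sketch of rolling these rows back up to the six matched names follows. It relies on simple prefix matching of the scientificName strings, which is an assumption that happens to work for these names and is not a general taxonomic rule. roll_up is a hypothetical helper, and the rows are copied from the outputs above.

```python
# Matched names copied from the search_taxa() output above.
matched_names = [
    "Sarcophilus",
    "Bettongia gaimardi",
    "Melanodryas (Amaurodryas) vittata",
    "Platycercus (Platycercus) caledonicus",
    "Aquila (Uroaetus) audax fleayi",
    "Tyto novaehollandiae castanops",
]

# (scientificName, count) rows copied from the atlas_counts() output above.
counts = [
    ("Aquila (Uroaetus) audax fleayi", 5026),
    ("Bettongia gaimardi", 1960),
    ("Bettongia gaimardi cuniculus", 41),
    ("Bettongia gaimardi gaimardi", 9),
    ("Melanodryas (Amaurodryas) vittata", 15709),
    ("Melanodryas (Amaurodryas) vittata kingi", 16),
    ("Melanodryas (Amaurodryas) vittata vittata", 45),
    ("Platycercus (Platycercus) caledonicus", 51142),
    ("Platycercus (Platycercus) caledonicus brownii", 24),
    ("Platycercus (Platycercus) caledonicus caledonicus", 33),
    ("Sarcophilus", 3),
    ("Sarcophilus harrisii", 36355),
    ("Tyto novaehollandiae castanops", 67),
]


def roll_up(matched, rows):
    """Add each row's count to the matched name that it equals or extends."""
    totals = {name: 0 for name in matched}
    for sci_name, count in rows:
        for name in matched:
            if sci_name == name or sci_name.startswith(name + " "):
                totals[name] += count
                break
    return totals


taxon_totals = roll_up(matched_names, counts)
for name, total in taxon_totals.items():
    print(name, total)
```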