Taxonomic Filtering
Callum Waite, Shandiya Balasubramaniam
Taxonomic complexity can confound searching, filtering, and downloading records with galah, but a few of
galah's functions help ensure that records are not missed. Let’s start by configuring galah to use the
Atlas of Living Australia (ALA).
>>> import galah
>>> galah.galah_config(atlas="Australia",email="your-email-here")
search_taxa()
search_taxa() enables users to look up taxonomic names before downloading data, which helps to
disambiguate homonyms and to check that the search term matches the taxon name in the ALA. search_taxa()
returns the scientific name, authorship, rank, and full classification for the taxon matched to the provided
search term.
>>> galah.search_taxa(taxa="Petroica boodang")
scientificName scientificNameAuthorship taxonConceptID rank kingdom phylum order family genus species vernacularName issues
0 Petroica (Petroica) boodang (Lesson, 1838) https://biodiversity.org.au/afd/taxa/a3e5376b-f9e6-4bdf-adae-1e7add9f5c29 species Animalia Chordata Passeriformes Petroicidae Petroica Petroica boodang Scarlet Robin noIssue
It can also return taxonomic information for multiple species, including synonyms and Indigenous names.
>>> # Muscicapa chrysoptera is a synonym for the Flame Robin, Petroica phoenicea
>>> # Guniibuu is the Yuwaalaraay Indigenous name for the Red-Capped Robin, Petroica goodenovii
>>> galah.search_taxa(taxa = ["Muscicapa chrysoptera", "Guniibuu"])
scientificName scientificNameAuthorship taxonConceptID rank kingdom phylum order family genus species vernacularName issues
0 Petroica (Littlera) phoenicea Gould, 1837 https://biodiversity.org.au/afd/taxa/fe74e658-4848-437a-a23d-f1001a198552 species Animalia Chordata Passeriformes Petroicidae Petroica Petroica phoenicea Flame Robin noIssue
1 Petroica (Petroica) goodenovii (Vigors & Horsfield, 1827) https://biodiversity.org.au/afd/taxa/10dbd908-00f3-4ec2-9a9c-a2fd4782eaf1 species Animalia Chordata Passeriformes Petroicidae Petroica Petroica goodenovii Red-capped Robin noIssue
Where homonyms exist, search_taxa() will prompt users to clarify the search term by providing one or more
taxonomic ranks through its scientific_name argument. This example disambiguates the genus name Morganella,
which is used in three kingdoms:
>>> galah.search_taxa(taxa = ["Morganella"])
Warning: Search returned multiple taxa due to a homonym issue.
Please use the `scientific_name` argument to clarify taxa.
search_term issues
0 Morganella homonym
>>> galah.search_taxa(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
scientificName scientificNameAuthorship taxonConceptID rank kingdom phylum order family genus issues
0 Morganella Zeller https://id.biodiversity.org.au/node/fungi/60091999 genus Fungi Basidiomycota Agaricales Agaricaceae Morganella noIssue
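When the same disambiguation is needed in several calls, it can help to build the scientific_name dictionary
once and reuse it. The following is a minimal sketch in plain Python; build_scientific_name is a hypothetical
helper written for this example, not a galah function, and it only assumes the dictionary shape used in the
call above.

```python
def build_scientific_name(name, **ranks):
    """Return a dict shaped like the scientific_name argument shown above.

    Hypothetical helper (not part of galah): each higher rank and the
    scientific name itself are wrapped in single-item lists.
    """
    query = {rank: [value] for rank, value in ranks.items()}
    query["scientificName"] = [name]
    return query


fungi_morganella = build_scientific_name("Morganella", kingdom="Fungi")
print(fungi_morganella)
# {'kingdom': ['Fungi'], 'scientificName': ['Morganella']}
```

The resulting dictionary could then be passed as scientific_name=fungi_morganella to search_taxa().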
The same scientific_name dictionary can then be passed to atlas_counts, atlas_occurrences, atlas_species
or atlas_media through the keyword scientific_name.
>>> galah.atlas_counts(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
totalRecords
0 156
>>> galah.atlas_occurrences(scientific_name={"kingdom": ["Fungi"],"scientificName": ["Morganella"]})
decimalLatitude decimalLongitude eventDate scientificName taxonConceptID recordID dataResourceName occurrenceStatus
0 -47.000000 168.200000 2002-04-30T00:00:00Z Morganella compacta NZOR-6-128055 461f0148-7549-4a81-9495-46704b26924b New Zealand Virtual Herbarium PRESENT
1 -46.874868 168.124626 1983-02-13T00:00:00Z Morganella compacta NZOR-6-128055 569d52b1-2481-43c3-8450-154fd5a52df1 New Zealand Virtual Herbarium PRESENT
2 -46.862750 168.116743 1985-04-23T00:00:00Z Morganella compacta NZOR-6-128055 5e633f17-193b-48d6-a0d9-ecd8251cc755 New Zealand Virtual Herbarium PRESENT
3 -46.554626 169.479023 1990-05-24T00:00:00Z Morganella compacta NZOR-6-128055 a2b24275-5c64-4a52-bbab-1fc2d72e142d New Zealand Virtual Herbarium PRESENT
4 -46.054081 170.192709 1967-12-17T00:00:00Z Morganella compacta NZOR-6-128055 144f29e2-f06c-46b1-8792-eda5b38548f1 New Zealand Virtual Herbarium PRESENT
.. ... ... ... ... ... ... ... ...
151 -26.779400 152.880300 2009-02-19T00:00:00Z Morganella purpurascens https://id.biodiversity.org.au/node/fungi/60092001 092b6f3e-ef27-4cbf-bb28-b172c1b200c5 National Herbarium of Victoria (MEL) AVH data PRESENT
152 -26.777741 152.880254 2012-02-15T00:00:00Z Morganella purpurascens https://id.biodiversity.org.au/node/fungi/60092001 8b7a0609-2c0b-42fc-9676-ba7ef9a73a5e BowerBird PRESENT
153 NaN NaN 1964-11-21T00:00:00Z Morganella purpurascens https://id.biodiversity.org.au/node/fungi/60092001 4c74c65a-d9b8-4e81-9101-a283eec56e5a National Herbarium of Victoria (MEL) AVH data PRESENT
154 -26.405958 153.026140 2018-03-16T01:01:09Z Morganella purpurascens https://id.biodiversity.org.au/node/fungi/60092001 fc2816fd-4b65-4af1-b0a2-1b086c26d97f ALA species sightings and OzAtlas PRESENT
155 -8.916667 148.150000 1953-07-15T00:00:00Z Morganella purpurascens https://id.biodiversity.org.au/node/fungi/60092001 633433fe-a3b4-4b4f-af12-36b54c1fd9ff Centre for Australian National Biodiversity Research (CANB) AVH data PRESENT
[156 rows x 8 columns]
filters=
filters= subsets records by searching for exact matches to an expression, and may also be used for taxonomic
filtering. For example, we can count records of Australian robins for a single species or for several species
at once. We can also group the results by species name to compare the number of records for each robin.
>>> galah.atlas_counts(taxa="Petroica boodang")
totalRecords
0 119420
>>> aus_petroica = ["Petroica boodang", "Petroica goodenovii",
... "Petroica phoenicea", "Petroica rosea",
... "Petroica rodinogaster", "Petroica multicolor"]
>>> galah.atlas_counts(
... taxa=aus_petroica,
... group_by=["species","vernacularName"]
... )
                  species               vernacularName   count
0        Petroica boodang        Eastern Scarlet Robin    3496
1        Petroica boodang                Scarlet Robin  115738
2        Petroica boodang  South-western Scarlet Robin     160
3        Petroica boodang      Tasmanian Scarlet Robin      26
4     Petroica goodenovii             Red-capped Robin  110285
5     Petroica multicolor                Pacific Robin    6699
6      Petroica phoenicea                  Flame Robin   81573
7   Petroica rodinogaster          Mainland Pink Robin      60
8   Petroica rodinogaster                   Pink Robin   13417
9   Petroica rodinogaster         Tasmanian Pink Robin      14
10         Petroica rosea                   Rose Robin   52415
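As a quick sanity check on grouped output, the rows for a species should add up to that species' total. The
sketch below uses plain Python on counts copied by hand from the output above (it does not query the ALA);
the variable names are illustrative only.

```python
from collections import defaultdict

# (species, vernacularName, count) rows copied from the grouped output above
grouped_rows = [
    ("Petroica boodang", "Eastern Scarlet Robin", 3496),
    ("Petroica boodang", "Scarlet Robin", 115738),
    ("Petroica boodang", "South-western Scarlet Robin", 160),
    ("Petroica boodang", "Tasmanian Scarlet Robin", 26),
    ("Petroica goodenovii", "Red-capped Robin", 110285),
    ("Petroica multicolor", "Pacific Robin", 6699),
    ("Petroica phoenicea", "Flame Robin", 81573),
    ("Petroica rodinogaster", "Mainland Pink Robin", 60),
    ("Petroica rodinogaster", "Pink Robin", 13417),
    ("Petroica rodinogaster", "Tasmanian Pink Robin", 14),
    ("Petroica rosea", "Rose Robin", 52415),
]

# Sum the vernacular-name rows within each species
species_totals = defaultdict(int)
for species, _vernacular, count in grouped_rows:
    species_totals[species] += count

for species, total in sorted(species_totals.items()):
    print(f"{species}: {total}")
```

The four Petroica boodang rows sum to 119420, the same total returned by the single-species query above.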
filters= is also useful when searching for paraphyletic or polyphyletic groups. For example, to get counts of non-chordate animals:
>>> non_chordates = galah.atlas_counts(
... filters=["kingdom=Animalia","phylum!=Chordata"],
... group_by=["phylum"],
... expand=False
... )
>>> non_chordates.head()
            phylum    count
0   Acanthocephala      398
1         Annelida   316621
2       Arthropoda  8568339
3      Brachiopoda     2157
4          Bryozoa    24682
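Filter expressions like the ones above are plain strings, so they can be assembled programmatically when a
query has many conditions. A minimal sketch, assuming only the "field=value" and "field!=value" forms shown
in this section; rank_filters is a hypothetical helper for this example, not part of galah.

```python
def rank_filters(include=None, exclude=None):
    """Build filter strings from {field: value} mappings.

    Hypothetical helper: `include` pairs become 'field=value' and
    `exclude` pairs become 'field!=value', in insertion order.
    """
    include = include or {}
    exclude = exclude or {}
    filters = [f"{field}={value}" for field, value in include.items()]
    filters += [f"{field}!={value}" for field, value in exclude.items()]
    return filters


non_chordate_filters = rank_filters(
    include={"kingdom": "Animalia"},
    exclude={"phylum": "Chordata"},
)
print(non_chordate_filters)
# ['kingdom=Animalia', 'phylum!=Chordata']
```

The list could then be supplied as filters=non_chordate_filters in the atlas_counts() call above.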
filters=, search_taxa(), and taxonomic ranks
Deciding between filters= and search_taxa() in a query comes down to how a record has been classified,
and whether or not you have the correct unique name and classification of the taxa of interest.
The ALA has fields for the primary taxonomic ranks (kingdom, phylum, class, order, family, genus, species) and some
secondary ranks (e.g. subfamily, subgenus), all of which may be used with filters= and search_taxa().
Additionally, there is a field named scientificName, which refers to the lowest taxonomic rank to which a record
has been identified, e.g.
>>> import numpy as np
>>> pitta_ranks = galah.atlas_counts(
... taxa="Pitta",
... group_by=["scientificName","taxonRank"]
... )
>>> pitta_ranks = pitta_ranks.loc[pitta_ranks["scientificName"].notnull()]
>>> pitta_ranks
                                 scientificName   taxonRank  count
0                                         Pitta       genus     76
1                          Pitta (Erythropitta)    subgenus    728
2            Pitta (Erythropitta) erythrogaster     species    190
3   Pitta (Erythropitta) erythrogaster digglesi  subspecies     21
4                            Pitta (Pitta) iris     species   5722
5                       Pitta (Pitta) iris iris  subspecies     76
6              Pitta (Pitta) iris johnstoneiana  subspecies     27
7                      Pitta (Pitta) versicolor     species  26086
8           Pitta (Pitta) versicolor intermedia  subspecies     42
9            Pitta (Pitta) versicolor simillima  subspecies     38
10          Pitta (Pitta) versicolor versicolor  subspecies    310
If, for instance, you have the correct species or subspecies name, then searching for matches against the species
and subspecies fields, respectively, will provide more precise results. This is because values in the
scientificName field may include subgenera, e.g. Pitta (Pitta) iris rather than Pitta iris. If you’ve used
search_taxa() to get the ALA-matched name of a taxon and only want records identified to a particular level of
classification, searching for matches against scientificName is recommended.
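The Pitta counts above give a feel for the difference. The following is a minimal sketch in plain Python, using
rows copied by hand from that output rather than a live query. Matching scientificName exactly keeps only
records identified to that rank, whereas adding the lower-rank rows approximates what a match on the species
field might return. That second reading is an assumption: it presumes the species field is also populated for
records identified to subspecies. The function names are hypothetical.

```python
# (scientificName, taxonRank, count) rows copied from the Pitta output above
pitta_rows = [
    ("Pitta", "genus", 76),
    ("Pitta (Erythropitta)", "subgenus", 728),
    ("Pitta (Erythropitta) erythrogaster", "species", 190),
    ("Pitta (Erythropitta) erythrogaster digglesi", "subspecies", 21),
    ("Pitta (Pitta) iris", "species", 5722),
    ("Pitta (Pitta) iris iris", "subspecies", 76),
    ("Pitta (Pitta) iris johnstoneiana", "subspecies", 27),
    ("Pitta (Pitta) versicolor", "species", 26086),
    ("Pitta (Pitta) versicolor intermedia", "subspecies", 42),
    ("Pitta (Pitta) versicolor simillima", "subspecies", 38),
    ("Pitta (Pitta) versicolor versicolor", "subspecies", 310),
]


def identified_exactly_as(rows, name):
    """Count records whose lowest identified rank is exactly `name`."""
    return sum(count for sci_name, _rank, count in rows if sci_name == name)


def identified_within(rows, name):
    """Count records identified as `name` or as a taxon nested beneath it."""
    return sum(
        count
        for sci_name, _rank, count in rows
        if sci_name == name or sci_name.startswith(name + " ")
    )


species_name = "Pitta (Pitta) iris"
print(identified_exactly_as(pitta_rows, species_name))  # 5722
print(identified_within(pitta_rows, species_name))      # 5825
```

Here 5722 records were identified to species level only, and a further 103 were identified to one of its two
subspecies.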
Paraphyletic or polyphyletic groups may contain taxa identified to different taxonomic levels. In this case, it is
simpler to use search_taxa(). In the example below, search_taxa() matches terms to one genus, three species,
and two subspecies. The same list of names can then be passed to atlas_counts() to get counts for each
scientific name.
>>> tas_endemic = ["Sarcophilus", # Tasmanian Devil
... "Bettongia gaimardi", # Tasmanian Bettong
... "Melanodryas vittata", # Dusky Robin
... "Platycercus caledonicus",# Green Rosella
... "Aquila audax fleayi", # Tasmanian Wedge-Tailed Eagle
... "Tyto novaehollandiae castanops" # Tasmanian Masked Owl
... ]
>>> galah.search_taxa(taxa=tas_endemic)
scientificName scientificNameAuthorship taxonConceptID rank kingdom phylum order family genus issues species vernacularName
0 Sarcophilus Cuvier, 1837 https://biodiversity.org.au/afd/taxa/06455b77-7d50-4ec7-9122-8ab48cfb0c1c genus Animalia Chordata Dasyuromorphia Dasyuridae Sarcophilus noIssue NaN NaN
1 Bettongia gaimardi (Desmarest, 1822) https://biodiversity.org.au/afd/taxa/8f7da937-6338-4c39-8b11-4f83807afe11 species Animalia Chordata Diprotodontia Potoroidae Bettongia noIssue Bettongia gaimardi Tasmanian Bettong
2 Melanodryas (Amaurodryas) vittata (Quoy & Gaimard, 1830) https://biodiversity.org.au/afd/taxa/0f04889f-5489-4369-a545-8a041fba9f6d species Animalia Chordata Passeriformes Petroicidae Melanodryas noIssue Melanodryas vittata Dusky Robin
3 Platycercus (Platycercus) caledonicus (Gmelin, 1788) https://biodiversity.org.au/afd/taxa/c6e478fe-f199-463f-8576-a77108fd73e2 species Animalia Chordata Psittaciformes Psittacidae Platycercus noIssue Platycercus caledonicus Green Rosella
4 Aquila (Uroaetus) audax fleayi Condon & Amadon, 1954 https://biodiversity.org.au/afd/taxa/ac93f7f0-0686-4589-801a-5832378cb7c1 subspecies Animalia Chordata Accipitriformes Accipitridae Aquila noIssue Aquila audax Tasmanian Wedge-tailed Eagle
5 Tyto novaehollandiae castanops (Gould, 1837) https://biodiversity.org.au/afd/taxa/2c30d58b-572b-4dab-8644-b222c28eb0ec subspecies Animalia Chordata Strigiformes Tytonidae Tyto noIssue Tyto novaehollandiae Tasmanian Masked Owl
>>> galah.atlas_counts(
... taxa=tas_endemic,
... group_by=["scientificName"],
... expand=False
... )
                                       scientificName  count
0                      Aquila (Uroaetus) audax fleayi   4935
1                                  Bettongia gaimardi   1941
2                        Bettongia gaimardi cuniculus     41
3                         Bettongia gaimardi gaimardi      9
4                   Melanodryas (Amaurodryas) vittata  14131
5             Melanodryas (Amaurodryas) vittata kingi     15
6           Melanodryas (Amaurodryas) vittata vittata     39
7               Platycercus (Platycercus) caledonicus  43463
8       Platycercus (Platycercus) caledonicus brownii     24
9   Platycercus (Platycercus) caledonicus caledonicus     33
10                                        Sarcophilus      3
11                               Sarcophilus harrisii  36302
12                     Tyto novaehollandiae castanops     63
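If per-species totals are more useful than one row per scientific name, the subspecies rows can be folded into
their parent species. The following is a minimal sketch in plain Python, using counts copied by hand from the
output above; binomial is a hypothetical helper that assumes names follow the "Genus (Subgenus) species
subspecies" pattern seen here.

```python
import re
from collections import defaultdict

# (scientificName, count) rows copied from the output above
tas_rows = [
    ("Aquila (Uroaetus) audax fleayi", 4935),
    ("Bettongia gaimardi", 1941),
    ("Bettongia gaimardi cuniculus", 41),
    ("Bettongia gaimardi gaimardi", 9),
    ("Melanodryas (Amaurodryas) vittata", 14131),
    ("Melanodryas (Amaurodryas) vittata kingi", 15),
    ("Melanodryas (Amaurodryas) vittata vittata", 39),
    ("Platycercus (Platycercus) caledonicus", 43463),
    ("Platycercus (Platycercus) caledonicus brownii", 24),
    ("Platycercus (Platycercus) caledonicus caledonicus", 33),
    ("Sarcophilus", 3),
    ("Sarcophilus harrisii", 36302),
    ("Tyto novaehollandiae castanops", 63),
]


def binomial(scientific_name):
    """Drop any '(Subgenus)' part and keep at most genus + species epithet."""
    without_subgenus = re.sub(r"\s*\([^)]*\)", "", scientific_name)
    return " ".join(without_subgenus.split()[:2])


# Fold subspecies rows into their parent species
totals = defaultdict(int)
for name, count in tas_rows:
    totals[binomial(name)] += count

for name, total in sorted(totals.items()):
    print(f"{name}: {total}")
```

Genus-only records stay under Sarcophilus, separate from Sarcophilus harrisii. The totals for Aquila audax and
Tyto novaehollandiae cover only the subspecies that were queried, so they are not counts for the whole species.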