API Docs
- galah.atlas_counts(taxa=None, filters=None, group_by=None, expand=True, verbose=False, use_data_profile=False)
Prior to downloading data it is often valuable to have some estimate of how many records are available, both for deciding if the query is feasible, and for estimating how long it will take to download. Alternatively, for some kinds of reporting, the count of observations may be all that is required, for example for understanding how observations are growing or shrinking in particular locations, or for particular taxa.
To this end, galah.atlas_counts() takes arguments in the same format as galah.atlas_occurrences(), and provides either a total count of records matching the criteria, or a pandas.DataFrame of counts matching the criteria supplied to the group_by argument.
- Parameters:
taxa (string) – one or more scientific names. Use galah.search_taxa() to search for valid scientific names.
filters (string / list) – filters, in the form field logical value (e.g. "year=2021")
group_by (string) – zero or more individual column names (i.e. fields) to include. See galah.show_all() and galah.search_all() to see valid fields.
expand (logical) – When using the group_by argument of galah.atlas_counts(), controls whether counts for each row value are combined or calculated separately. Defaults to True.
verbose (logical) – If True, galah gives more information like progress bars. Defaults to False.
use_data_profile (string) – A profile name: the name or abbreviation of a data quality profile to apply to the query. Valid values can be seen using galah.show_all(profiles=True).
- Return type:
An object of class pandas.DataFrame.
Examples
Return total records in your chosen atlas
galah.atlas_counts()
   totalRecords
0  114489442
Return records from 2020 onwards, grouped by year
galah.atlas_counts(filters="year>2019",group_by="year")
   year  count
0  2020  6655859
1  2021  7349495
2  2022  1716751
3  2023  493458
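For the reporting use case described above (tracking how observations grow or shrink), grouped counts can be post-processed directly. The following is a minimal standard-library sketch that treats the year counts from the example output as plain Python data; year_on_year_change is a hypothetical helper written for this illustration and is not part of galah.

```python
# Year counts copied from the example output above (year -> count).
counts_by_year = {2020: 6655859, 2021: 7349495, 2022: 1716751, 2023: 493458}


def year_on_year_change(counts):
    """Return {year: change in count relative to the previous year in the data}."""
    years = sorted(counts)
    return {
        year: counts[year] - counts[prev]
        for prev, year in zip(years, years[1:])
    }


changes = year_on_year_change(counts_by_year)
for year, delta in changes.items():
    print(year, f"{delta:+d}")
```

With a real result, the dictionary could be built from the returned table first, for example with dict(zip(df["year"], df["count"])), assuming the columns are named as in the example output and hold integers.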
- galah.atlas_media(taxa=None, filters=None, fields=None, verbose=False, multimedia=None, assertions=None, use_data_profile=False, collect=False, path=None)
In addition to text data describing individual occurrences and their attributes, ALA stores images, sounds and videos associated with a given record. galah.atlas_media() displays metadata for any and all of the media types.
- Parameters:
taxa (string / list) – one or more scientific names. Use galah.search_taxa() to search for valid scientific names.
filters (string / list) – filters, in the form field logical value (e.g. "year=2021")
fields (string / list) – Name of one or more column groups to include. Valid options are "basic", "event", "media" and "assertions". Default is fields="basic", which returns: decimalLatitude, decimalLongitude, eventDate, scientificName, taxonConceptID, recordID, dataResourceName, occurrenceStatus
Using fields="event" returns: eventRemarks, eventTime, eventID, eventDate, samplingEffort, samplingProtocol
Using fields="media" returns: multimedia, multimediaLicence, images, videos, sounds
See galah.show_all() and galah.search_all() to see all valid fields.
verbose (logical) – If True, galah gives more information like progress bars. Defaults to False.
multimedia (string / list) – The types of multimedia to return, e.g. "images". Defaults to ['images','videos','sounds'].
assertions (string) – Using "assertions" returns all quality assertion-related columns. These columns are data quality checks run by each living atlas. The list of assertions is shown by galah.show_all(assertions=True).
use_data_profile (logical) – If True, uses the data profile set in galah_config(). Valid values can be seen using galah.show_all(profiles=True). Default is False.
collect (logical) – If True, downloads the full-sized images and media files returned to a local directory.
path (string) – path to the directory where downloaded media will be stored. Defaults to the current directory.
- Return type:
An object of class pandas.DataFrame. If collect=True, available image & media files are downloaded to a user local directory.
Examples
filters = ["year=2020","decimalLongitude>153.0"]
galah.atlas_media(taxa="Ornithorhynchus anatinus",filters=filters)
    decimalLatitude  ...  occurrenceID
0   -30.395442       ...  42912016-2409-4125-a61f-13ffd4c32fcb
1   -30.348831       ...  6882dfe3-5295-493d-a005-66f8392e9d5b
2   -30.279557       ...  72ad295f-112d-442f-83e1-fd385ef9163a
3   -30.247747       ...  dd885218-bdb9-404a-8441-e5cd329f6605
4   -30.218699       ...  072bc4f0-7581-44c6-b0ec-e3fe58d251fc
5   -28.794918       ...  2a701513-b21d-4baa-bac4-5e6a3a65252b
6   -28.794918       ...  2a701513-b21d-4baa-bac4-5e6a3a65252b
7   -28.656448       ...  12d8c7f9-9d66-4312-9245-87a0a1021557
8   -28.213785       ...  808b0c19-9616-45f1-994e-e3b33fc32836
9   -28.213785       ...  808b0c19-9616-45f1-994e-e3b33fc32836
10  -28.765905       ...  NaN
[11 rows x 19 columns]
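The filters argument takes strings of the form field logical value. To make that format concrete, the sketch below splits such strings into their three parts. The helper parse_filter and the set of operators it accepts are assumptions made for illustration only; galah does its own parsing internally and may accept a different set of operators.

```python
import re

# Comparison operators assumed for this sketch; longer operators are listed
# first so that ">=" is not read as ">" followed by "=".
_FILTER_PATTERN = re.compile(r"^\s*(\w+)\s*(>=|<=|!=|==|=|>|<)\s*(.+?)\s*$")


def parse_filter(text):
    """Split a 'field logical value' string into (field, operator, value)."""
    match = _FILTER_PATTERN.match(text)
    if match is None:
        raise ValueError(f"not in 'field logical value' form: {text!r}")
    return match.groups()


filters = ["year=2020", "decimalLongitude>153.0"]
parsed = [parse_filter(f) for f in filters]
print(parsed)
```

A check like this could be used to catch a malformed filter string before a query is sent, although the atlas itself remains the authority on which fields and values are valid.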
- galah.atlas_occurrences(taxa=None, filters=None, test=False, verbose=False, fields=None, assertions=None, use_data_profile=False)
The most common form of data stored by living atlases is observations of individual life forms, known as 'occurrences'. This function allows the user to search for occurrence records that match their specific criteria, and return them as a pandas.DataFrame for analysis. Optionally, the user can also request a DOI for a given download to facilitate citation and re-use of specific data resources.
- Parameters:
taxa (string) – one or more scientific names. Use galah.search_taxa() to search for valid scientific names.
filters (string / list) – filters, in the form field logical value (e.g. "year=2021")
test (logical) – Test if the API is up and running correctly. Prints status of Atlas and returns.
verbose (logical) – If True, galah gives more information like URLs of your queries. Defaults to False.
fields (string / list) – Name of one or more column groups to include. Valid options are "basic", "event", "media" and "assertions". Default is fields="basic", which returns: decimalLatitude, decimalLongitude, eventDate, scientificName, taxonConceptID, recordID, dataResourceName, occurrenceStatus
Using fields="event" returns: eventRemarks, eventTime, eventID, eventDate, samplingEffort, samplingProtocol
Using fields="media" returns: multimedia, multimediaLicence, images, videos, sounds
See galah.show_all() and galah.search_all() to see all valid fields.
assertions (string / list) – Using "assertions" returns all quality assertion-related columns. These columns are data quality checks run by each living atlas. The list of assertions is shown by galah.show_all(assertions=True).
use_data_profile (string) – A profile name: the name or abbreviation of a data quality profile to apply to the query. Valid values can be seen using galah.show_all(profiles=True).
- Return type:
An object of class pandas.DataFrame.
Examples
Download records of Vulpes vulpes in 2023
import galah
galah.galah_config(atlas="Australia",email="your-email@example.com")
galah.atlas_occurrences(taxa="Vulpes vulpes",filters="year=2023")
      decimalLatitude  ...  occurrenceStatus
0     -39.083564       ...  PRESENT
1     -39.021544       ...  PRESENT
2     -38.668361       ...  PRESENT
3     -38.639731       ...  PRESENT
4     -38.631438       ...  PRESENT
...   ...              ...  ...
1370  -24.071733       ...  PRESENT
1371  -23.915604       ...  PRESENT
1372  -23.295037       ...  PRESENT
1373  37.808811        ...  PRESENT
1374  51.100000        ...  PRESENT
[1375 rows x 8 columns]
Download records of Vulpes vulpes in 2023, returning only the eventDate field
import galah
galah.galah_config(atlas="Australia",email="your-email@example.com")
galah.atlas_occurrences(taxa="Vulpes vulpes",filters="year=2023",fields="eventDate")
      eventDate
0     2022-12-31T14:56:00Z
1     2022-12-31T21:21:27Z
2     2023-01-01T00:00:00Z
3     2023-01-01T00:00:00Z
4     2023-01-01T00:00:00Z
...   ...
1370  2023-04-18T23:35:35Z
1371  2023-04-19T00:13:40Z
1372  2023-04-19T01:08:00Z
1373  2023-04-21T13:52:00Z
1374  2023-04-21T22:54:38Z
[1375 rows x 1 columns]
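As the atlas_counts() description notes, it can be useful to check how many records a query matches before downloading them. The sketch below shows one possible way to structure that check. To keep it runnable without network access, the counting and downloading functions are passed in as arguments and simple stand-ins are used in the demonstration. The names download_if_feasible and max_records are hypothetical, and reading the total from a totalRecords column assumes the output shape shown in the atlas_counts() example.

```python
def download_if_feasible(count_fn, download_fn, max_records, **query):
    """Run download_fn(**query) only if count_fn(**query) reports few enough records.

    count_fn is expected to return a table with a 'totalRecords' column whose
    first row holds the total (an assumption based on the example output).
    """
    total = int(count_fn(**query)["totalRecords"][0])
    if total > max_records:
        raise ValueError(f"{total} records exceed the limit of {max_records}")
    return download_fn(**query)


# Stand-ins so the sketch runs offline; with galah installed and configured,
# galah.atlas_counts and galah.atlas_occurrences would be passed instead.
def fake_counts(**query):
    return {"totalRecords": [1375]}


def fake_occurrences(**query):
    return f"downloaded records for {query['taxa']}"


result = download_if_feasible(
    fake_counts, fake_occurrences, max_records=50000,
    taxa="Vulpes vulpes", filters="year=2023",
)
print(result)
```

Passing the functions in keeps the guard independent of any particular atlas configuration; the limit itself is a judgment call that depends on the connection and the analysis.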
- galah.atlas_species(taxa=None, filters=None, verbose=False)
While there are reasons why users may need to check every record meeting their search criteria (i.e. using galah.atlas_occurrences()), a common use case is to simply identify which species occur in a specified region, time period, or taxonomic group. This function returns a pandas.DataFrame with one row per species, and columns giving associated taxonomic information.
- Parameters:
taxa (string / list) – one or more scientific names. Use galah.search_taxa() to search for valid scientific names.
filters (string) – filters, in the form field logical value (e.g. "year=2021")
verbose (logical) – If True, galah gives you the URLs used to query all the data. Defaults to False.
- Return type:
An object of class pandas.DataFrame.
Examples
galah.atlas_species(taxa="Heleioporus")
   species                    ...  vernacular_name
0  Heleioporus eyrei          ...  Moaning Frog
1  Heleioporus australiacus   ...  Giant Burrowing Frog
2  Heleioporus albopunctatus  ...  Western Spotted Frog
3  Heleioporus psammophilus   ...  Sand Frog
4  Heleioporus inornatus      ...  Plains Frog
5  Heleioporus barycragus     ...  Hooting Frog
[6 rows x 9 columns]
- galah.galah_config(email=None, email_notify=None, atlas=None, data_profile=None, ranks=None, reason=None)
The galah package supports large data downloads, and also interfaces with the ALA, which requires that users of some services provide a registered email address and reason for downloading data. The galah_config() function provides a way to manage these issues as simply as possible.
- Parameters:
email (string) – An email address that has been registered with the chosen atlas. For the ALA, you can register on the ALA website.
email_notify (string) – Used to receive an email for each query to galah.atlas_occurrences(). Defaults to None, but can be useful in some instances, for example for tracking DOIs assigned to specific downloads for later citation.
atlas (string) – Living Atlas to point to, Australia by default. Can be an organisation name, acronym, or region (see show_all(atlases=True) for admissible values).
data_profile (string) – A profile name: the name or abbreviation of a data quality profile to apply to the query. Valid values can be seen using galah.show_all(profiles=True).
ranks (string) – A string letting galah know what taxonomic ranks to show. Use "all" to see all 69 possible ranks, and "gbif" to see the 9 most common ranks.
reason (integer) – A number (integer) providing the reason you are downloading data. Default is set to 4 (scientific research). For a list of all possible reasons run galah.show_all(reasons=True).
- Returns:
No arguments – A pandas.DataFrame of all current configuration options.
>=1 arguments – None
Examples
import galah
galah.galah_config(email="yourname@example.com")
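Because galah_config() needs a registered email address, a script may prefer not to hard-code it. One possible approach, sketched below, reads the settings from environment variables and builds the keyword arguments to pass to galah_config(). The variable names GALAH_EMAIL and GALAH_ATLAS and the helper config_from_env are invented for this example; galah itself does not read them.

```python
import os


def config_from_env(environ=os.environ):
    """Build keyword arguments for galah_config() from environment variables."""
    email = environ.get("GALAH_EMAIL")  # hypothetical variable name
    if not email:
        raise KeyError("GALAH_EMAIL is not set")
    # "Australia" mirrors the documented default atlas.
    return {"email": email, "atlas": environ.get("GALAH_ATLAS", "Australia")}


kwargs = config_from_env({"GALAH_EMAIL": "your-email@example.com"})
print(kwargs)
# With galah installed: galah.galah_config(**kwargs)
```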
- galah.search_all(assertions=None, atlases=None, apis=None, collection=None, datasets=None, fields=None, licences=None, lists=None, profiles=None, providers=None, ranks=None, reasons=None, column_name=None)
The living atlases store a huge amount of information, above and beyond the occurrence records that are their main output. In galah, one way that users can investigate this information is by searching for a specific option or category for the type of information they are interested in. search_all() is a helper function that can do searches within multiple types of information.
- Parameters:
assertions (string) – Search for results of data quality checks run by each atlas
atlases (string) – Search for what atlases are available
apis (string) – Search for what APIs & functions are available for each atlas
collection (string) – Search for the specific collections within those institutions
datasets (string) – Search for the data groupings within those collections
fields (string) – Search for fields that are stored in an atlas
licences (string) – Search for copyright licences applied to media
lists (string) – Search for what species lists are available
profiles (string) – Search for what data profiles are available
providers (string) – Search for which institutions have provided data
ranks (string) – Search for valid taxonomic ranks (e.g. Kingdom, Class, Order, etc.)
reasons (string) – Search for what values are acceptable as ‘download reasons’ for a specified atlas
column_name (string) – Determines which column of the table is searched for the string supplied as the argument
- Return type:
An object of class pandas.DataFrame containing all data of interest.
Examples
import galah
galah.search_all(apis="Australia")
    atlas      system         ...  called_by                         functional
0   Australia  collections    ...  show_all-collections              True
21  Australia  spatial        ...  show_all-fields                   True
20  Australia  records        ...  atlas_species                     True
19  Australia  records        ...  atlas_occurrences                 True
18  Australia  records        ...  atlas_occurrences                 True
17  Australia  records        ...  show_all-fields                   True
16  Australia  records        ...  atlas_counts, show_values-fields  True
15  Australia  records        ...  atlas_counts                      True
14  Australia  records        ...  show_all-assertions               True
13  Australia  name-matching  ...  search_taxa                       True
12  Australia  name-matching  ...  search_taxa                       True
11  Australia  name-matching  ...  search_identifiers                True
10  Australia  logger         ...  show_all-reasons                  True
9   Australia  lists          ...  show_values-lists                 True
8   Australia  lists          ...  show_all-lists                    True
7   Australia  images         ...  media_metadata                    True
6   Australia  images         ...  show_all-licences                 True
5   Australia  doi            ...  doi_download                      True
4   Australia  data-quality   ...  show_values-profiles              True
3   Australia  data-quality   ...  show_all-profiles                 True
2   Australia  collections    ...  show_all-providers                True
1   Australia  collections    ...  show_all-datasets                 True
22  Australia  species        ...  atlas_taxonomy                    True
23  Australia  species        ...  atlas_taxonomy                    True
[24 rows x 6 columns]
- galah.search_taxa(taxa)
Look up taxonomic names before downloading data from the ALA, using atlas_occurrences(), atlas_species() or atlas_counts(). Taxon information returned by search_taxa() may be passed to the taxa argument of atlas functions. search_taxa() allows users to disambiguate homonyms (i.e. where the same name refers to taxa in different clades) prior to downloading data.
- Parameters:
taxa (string) – one or more scientific names to search.
- Return type:
An object of class pandas.DataFrame.
Examples
import galah
galah.search_taxa(taxa="Vulpes vulpes")
   scientificName  scientificNameAuthorship  ...  vernacularName  issues
0  Vulpes vulpes   Linnaeus, 1758            ...  Fox             noIssue
[1 rows x 12 columns]
- galah.search_values(field=None, value=None, column_name=None)
Users may wish to see the specific values within a chosen field, profile or list to narrow queries or understand more about the information of interest. search_values() allows users to search for specific values within a specified field.
- Parameters:
field (string) – A string to specify what type of parameters should be searched.
value (string) – A string specifying a search term. Not case sensitive.
verbose (logical) – This option is available for users who want to know what URLs this function is using to get the value. Defaults to False.
- Return type:
An object of class pandas.DataFrame.
Examples
import galah
galah.search_values(field="basisOfRecord",value="OBS")
   field          category
2  basisOfRecord  OBSERVATION
0  basisOfRecord  HUMAN_OBSERVATION
4  basisOfRecord  MACHINE_OBSERVATION
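The search term is matched without regard to case, which is why "OBS" returns the three observation categories above. As a rough illustration of that behaviour only (galah's own matching and result ordering may differ), the standard-library sketch below filters the basisOfRecord values listed in the show_values() example; match_values is a hypothetical helper.

```python
# basisOfRecord categories as listed in the show_values() example output.
BASIS_OF_RECORD = [
    "HUMAN_OBSERVATION", "PRESERVED_SPECIMEN", "OBSERVATION", "OCCURRENCE",
    "MACHINE_OBSERVATION", "MATERIAL_SAMPLE", "LIVING_SPECIMEN",
    "MATERIAL_CITATION", "FOSSIL_SPECIMEN",
]


def match_values(values, term):
    """Return the values that contain term, ignoring case."""
    needle = term.casefold()
    return [v for v in values if needle in v.casefold()]


print(match_values(BASIS_OF_RECORD, "obs"))
```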
- galah.show_all(assertions=False, atlases=False, apis=False, collection=False, datasets=False, fields=False, licences=False, lists=False, profiles=False, providers=False, ranks=False, reasons=False)
The living atlases store a huge amount of information, above and beyond the occurrence records that are their main output. In galah, one way that users can investigate this information is by showing all the available options or categories for the type of information they are interested in. show_all() is a helper function that can display multiple types of information, displaying all valid options for the information specified.
- Parameters:
assertions (logical) – Show results of data quality checks run by each atlas
atlases (logical) – Show what atlases are available
apis (logical) – Show what APIs & functions are available for each atlas
collection (logical) – Show the specific collections within those institutions
datasets (logical) – Show all the data groupings within those collections
fields (logical) – Show fields that are stored in an atlas
licences (logical) – Show what copyright licences are applied to media
lists (logical) – Show what species lists are available
profiles (logical) – Show what data profiles are available
providers (logical) – Show which institutions have provided data
ranks (logical) – Show valid taxonomic ranks (e.g. Kingdom, Class, Order, etc.)
reasons (logical) – Show what values are acceptable as ‘download reasons’ for a specified atlas
- Return type:
An object of class pandas.DataFrame containing all data of interest.
Examples
import galah
galah.show_all(datasets=True)
       name                                               ...  uid
0      River Torrens Linear Park Species List             ...  dr14140
1      Warringah Species List                             ...  dr4047
2      'A Genome' Oryza species                           ...  dr8221
3      *Brigalow                                          ...  dr14507
4      0662409                                            ...  dr19938
...    ...                                                ...  ...
13189  [C4OC] Torres Strait Regional Landcare Facilit...  ...  dr3529
13190  [C4OC] Traditional Ecological Knowledge Record...  ...  dr3199
13191  [C4OC] Trialling and demonstrating farmer inno...  ...  dr3519
13192  [C4OC] Urban and Coastal Protection                ...  dr6802
13193  _River_Torrens_Linear_Park_Species_List.csv        ...  dr19265
[13194 rows x 3 columns]
- galah.show_values(field=None, verbose=False)
Users may wish to see the specific values within a chosen field, profile or list to narrow queries or understand more about the information of interest. show_values() provides users with these values.
- Parameters:
field (string) – A string to specify what type of parameters should be shown.
verbose (logical) – This option is available for users who want to know what URLs this function is using to get the value. Default is False.
- Return type:
An object of class pandas.DataFrame.
Examples
import galah
galah.show_values(field="basisOfRecord")
   field          category
0  basisOfRecord  HUMAN_OBSERVATION
1  basisOfRecord  PRESERVED_SPECIMEN
2  basisOfRecord  OBSERVATION
3  basisOfRecord  OCCURRENCE
4  basisOfRecord  MACHINE_OBSERVATION
5  basisOfRecord  MATERIAL_SAMPLE
6  basisOfRecord  LIVING_SPECIMEN
7  basisOfRecord  MATERIAL_CITATION
8  basisOfRecord  FOSSIL_SPECIMEN