API Docs

galah.atlas_counts(taxa=None, filters=None, group_by=None, expand=True, verbose=False, use_data_profile=False)

Prior to downloading data, it is often valuable to have some estimate of how many records are available, both to decide whether the query is feasible and to estimate how long the download will take. Alternatively, for some kinds of reporting the count of observations may be all that is required, for example to understand how observations are growing or shrinking in particular locations or for particular taxa.

To this end, galah.atlas_counts() takes arguments in the same format as galah.atlas_occurrences() and returns a pandas.DataFrame containing either the total count of records matching the criteria or, when group_by is supplied, the count for each value of the field(s) given to group_by.

Parameters:
  • taxa (string) – one or more scientific names. Use galah.search_taxa() to search for valid scientific names.

  • filters (string / list) – filters, in the form field, logical operator, value (e.g. "year=2021")

  • group_by (string / list) – zero or more individual column names (i.e. fields) to group counts by. See galah.show_all() and galah.search_all() to see valid fields.

  • expand (logical) – When using the group_by argument of galah.atlas_counts(), controls whether counts for each row value are combined or calculated separately. Defaults to True.

  • verbose (logical) – If True, galah gives more information like progress bars. Defaults to False.

  • use_data_profile (string) – The name or abbreviation of a data quality profile to apply to the query. Valid values can be seen using galah.show_all(profiles=True).

Return type:

An object of class pandas.DataFrame.
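Filters are plain strings made of a field name, a logical operator and a value, and several can be supplied together as a list (as in the galah.atlas_media() example). As a minimal sketch, assuming you want to assemble such strings in code, the hypothetical helper make_filter below joins the three parts and rejects operators outside an assumed set; neither the helper nor the operator list comes from galah itself.

```python
# Hypothetical helper, not part of galah: builds one filter string in the
# "field, logical operator, value" form used by the filters argument.
# The operator set is an assumption made for this sketch.
ALLOWED_OPERATORS = {"=", "!=", ">", ">=", "<", "<="}


def make_filter(field, operator, value):
    """Return a filter string such as 'year>2019'."""
    if operator not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")
    return f"{field}{operator}{value}"


# Reproduce the filter strings used in the examples in this document.
single = make_filter("year", ">", 2019)
several = [make_filter("year", "=", 2020), make_filter("decimalLongitude", ">", 153.0)]

print(single)   # year>2019
print(several)  # ['year=2020', 'decimalLongitude>153.0']
```

The resulting string can be passed straight to filters, for example galah.atlas_counts(filters=single, group_by="year").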

Examples

Return total records in your chosen atlas

galah.atlas_counts()
   totalRecords
0     114489442

Return records from 2020 onwards, grouped by year

galah.atlas_counts(filters="year>2019", group_by="year")
   year    count
0  2020  6655859
1  2021  7349495
2  2022  1716751
3  2023   493458
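A grouped result is an ordinary pandas.DataFrame, so it can be summarised with standard pandas operations. The sketch below rebuilds the table above by hand (rather than querying the atlas) and computes the total across the grouped years and the year with the most records.

```python
import pandas as pd

# Counts copied from the grouped example above (group_by="year").
counts = pd.DataFrame(
    {
        "year": [2020, 2021, 2022, 2023],
        "count": [6655859, 7349495, 1716751, 493458],
    }
)

# Total number of records across the grouped years.
total = counts["count"].sum()

# Year with the largest count: idxmax() gives the row label of the maximum.
busiest_year = counts.loc[counts["count"].idxmax(), "year"]

print(total)         # 16215563
print(busiest_year)  # 2021
```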

galah.atlas_media(taxa=None, filters=None, fields=None, verbose=False, multimedia=None, assertions=None, use_data_profile=False, collect=False, path=None)

In addition to text data describing individual occurrences and their attributes, the ALA stores images, sounds and videos associated with a given record. galah.atlas_media() returns metadata for any of these media types and can optionally download the files themselves.

Parameters:
  • taxa (string / list) – one or more scientific names. Use galah.search_taxa() to search for valid scientific names.

  • filters (string / list) – filters, in the form field, logical operator, value (e.g. "year=2021")

  • fields (string / list) –

    Name of one or more column groups to include. Valid options include "basic", "event", "media" and "assertions". The default, fields="basic", returns:

    • decimalLatitude, decimalLongitude, eventDate, scientificName, taxonConceptID, recordID, dataResourceName, occurrenceStatus

    Using fields="event" returns:

    • eventRemarks, eventTime, eventID, eventDate, samplingEffort, samplingProtocol

    Using fields="media" returns:

    • multimedia, multimediaLicence, images, videos, sounds

    See galah.show_all() and galah.search_all() to see all valid fields.

  • verbose (logical) – If True, galah gives more information like progress bars. Defaults to False.

  • multimedia (string / list) – The types of multimedia to return, e.g. "images". Defaults to ["images", "videos", "sounds"].

  • assertions (string) – Using “assertions” returns all quality assertion-related columns. These columns are data quality checks run by each living atlas. The list of assertions is shown by galah.show_all(assertions=True).

  • use_data_profile (logical) – If True, uses the data profile set in galah.galah_config(). Valid values can be seen using galah.show_all(profiles=True). Defaults to False.

  • collect (logical) – If True, downloads the full-sized image and media files returned by the query to a local directory (see path).

  • path (string) – path to directory where downloaded media will be stored. Defaults to current directory.

Return type:

An object of class pandas.DataFrame. If collect=True, the available image and media files are also downloaded to a local directory (see path).

Examples

import galah
filters = ["year=2020", "decimalLongitude>153.0"]
galah.atlas_media(taxa="Ornithorhynchus anatinus", filters=filters)
    decimalLatitude  ...                          occurrenceID
0        -30.395442  ...  42912016-2409-4125-a61f-13ffd4c32fcb
1        -30.348831  ...  6882dfe3-5295-493d-a005-66f8392e9d5b
2        -30.279557  ...  72ad295f-112d-442f-83e1-fd385ef9163a
3        -30.247747  ...  dd885218-bdb9-404a-8441-e5cd329f6605
4        -30.218699  ...  072bc4f0-7581-44c6-b0ec-e3fe58d251fc
5        -28.794918  ...  2a701513-b21d-4baa-bac4-5e6a3a65252b
6        -28.794918  ...  2a701513-b21d-4baa-bac4-5e6a3a65252b
7        -28.656448  ...  12d8c7f9-9d66-4312-9245-87a0a1021557
8        -28.213785  ...  808b0c19-9616-45f1-994e-e3b33fc32836
9        -28.213785  ...  808b0c19-9616-45f1-994e-e3b33fc32836
10       -28.765905  ...                                   NaN

[11 rows x 19 columns]
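In the output above some rows repeat the same occurrenceID (rows 5 and 6, and rows 8 and 9), likely because one occurrence record can have more than one media file attached. If you only need one row per occurrence, pandas.DataFrame.drop_duplicates() can reduce the table. The sketch rebuilds two of the 19 columns shown above by hand; the other columns are omitted, and keeping the first row per ID is an arbitrary choice for illustration.

```python
import pandas as pd

# Two of the 19 columns from the atlas_media() example above, copied by hand.
media = pd.DataFrame(
    {
        "decimalLatitude": [
            -30.395442, -30.348831, -30.279557, -30.247747, -30.218699,
            -28.794918, -28.794918, -28.656448, -28.213785, -28.213785,
            -28.765905,
        ],
        "occurrenceID": [
            "42912016-2409-4125-a61f-13ffd4c32fcb",
            "6882dfe3-5295-493d-a005-66f8392e9d5b",
            "72ad295f-112d-442f-83e1-fd385ef9163a",
            "dd885218-bdb9-404a-8441-e5cd329f6605",
            "072bc4f0-7581-44c6-b0ec-e3fe58d251fc",
            "2a701513-b21d-4baa-bac4-5e6a3a65252b",
            "2a701513-b21d-4baa-bac4-5e6a3a65252b",
            "12d8c7f9-9d66-4312-9245-87a0a1021557",
            "808b0c19-9616-45f1-994e-e3b33fc32836",
            "808b0c19-9616-45f1-994e-e3b33fc32836",
            float("nan"),  # the last row has no occurrenceID
        ],
    }
)

# Keep the first row for each occurrenceID. drop_duplicates treats missing
# values as equal, so several NaN IDs would collapse to one; here there is one.
one_per_occurrence = media.drop_duplicates(subset="occurrenceID")

print(len(media), len(one_per_occurrence))  # 11 9
```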

galah.atlas_occurrences(taxa=None, filters=None, test=False, verbose=False, fields=None, assertions=None, use_data_profile=False)

The most common kind of data stored by living atlases is observations of individual life forms, known as ‘occurrences’. This function allows the user to search for occurrence records that match their specific criteria and return them as a pandas.DataFrame for analysis. Optionally, the user can also request a DOI for a given download to facilitate citation and re-use of specific data resources.

Parameters:
  • taxa (string) – one or more scientific names. Use galah.search_taxa() to search for valid scientific names.

  • filters (string / list) – filters, in the form field, logical operator, value (e.g. "year=2021")

  • test (logical) – If True, tests whether the API is up and running correctly, prints the status of the atlas and returns.

  • verbose (logical) – If True, galah gives more information like the URLs of your queries. Defaults to False.

  • fields (string / list) –

    Name of one or more column groups to include. Valid options include "basic", "event", "media" and "assertions". The default, fields="basic", returns:

    • decimalLatitude, decimalLongitude, eventDate, scientificName, taxonConceptID, recordID, dataResourceName, occurrenceStatus

    Using fields="event" returns:

    • eventRemarks, eventTime, eventID, eventDate, samplingEffort, samplingProtocol

    Using fields="media" returns:

    • multimedia, multimediaLicence, images, videos, sounds

    See galah.show_all() and galah.search_all() to see all valid fields.

  • assertions (string / list) – Using “assertions” returns all quality assertion-related columns. These columns are data quality checks run by each living atlas. The list of assertions is shown by galah.show_all(assertions=True).

  • use_data_profile (string) – The name or abbreviation of a data quality profile to apply to the query. Valid values can be seen using galah.show_all(profiles=True).

Return type:

An object of class pandas.DataFrame.
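To make the column groups above concrete, the sketch below maps each documented group name to the columns it returns and expands a fields argument into a flat, de-duplicated column list. The function expand_fields is hypothetical, and the rule that any other string is kept as an individual field name is an assumption (it mirrors the fields="eventDate" example below, which returns a single column). The "assertions" group is left out because its columns depend on the atlas; galah's own handling may differ.

```python
# Column groups as listed in the fields parameter description above.
FIELD_GROUPS = {
    "basic": [
        "decimalLatitude", "decimalLongitude", "eventDate", "scientificName",
        "taxonConceptID", "recordID", "dataResourceName", "occurrenceStatus",
    ],
    "event": [
        "eventRemarks", "eventTime", "eventID", "eventDate",
        "samplingEffort", "samplingProtocol",
    ],
    "media": ["multimedia", "multimediaLicence", "images", "videos", "sounds"],
}


def expand_fields(fields="basic"):
    """Hypothetical helper: expand group names into a de-duplicated column list.

    Accepts a single string or a list of strings. Names that are not a known
    group are kept as individual field names.
    """
    if isinstance(fields, str):
        fields = [fields]
    columns = []
    for name in fields:
        for column in FIELD_GROUPS.get(name, [name]):
            if column not in columns:  # eventDate appears in two groups
                columns.append(column)
    return columns


print(len(expand_fields()))                    # 8
print(len(expand_fields(["basic", "event"])))  # 13
print(expand_fields("eventDate"))              # ['eventDate']
```

The default group yields 8 columns, which matches the 8 columns returned in the first example below.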

Examples

Download records of Vulpes vulpes in 2023

import galah
galah.galah_config(atlas="Australia", email="your-email@example.com")
galah.atlas_occurrences(taxa="Vulpes vulpes", filters="year=2023")
      decimalLatitude  ...  occurrenceStatus
0          -39.083564  ...           PRESENT
1          -39.021544  ...           PRESENT
2          -38.668361  ...           PRESENT
3          -38.639731  ...           PRESENT
4          -38.631438  ...           PRESENT
...               ...  ...               ...
1370       -24.071733  ...           PRESENT
1371       -23.915604  ...           PRESENT
1372       -23.295037  ...           PRESENT
1373        37.808811  ...           PRESENT
1374        51.100000  ...           PRESENT

[1375 rows x 8 columns]
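The last two rows printed above have positive (northern-hemisphere) latitudes, which lie outside Australia. When a download is meant to describe Australian records, a quick check is to flag such rows before analysis. The sketch below uses only the ten rows visible in the printed output and an illustrative rule (latitude must be negative); a real check would use spatial bounds suited to your study.

```python
import pandas as pd

# The ten rows visible in the printed output above (the rest are elided there).
occ = pd.DataFrame(
    {
        "decimalLatitude": [
            -39.083564, -39.021544, -38.668361, -38.639731, -38.631438,
            -24.071733, -23.915604, -23.295037, 37.808811, 51.100000,
        ],
        "occurrenceStatus": ["PRESENT"] * 10,
    }
)

# Illustrative rule: Australian records should lie in the southern hemisphere.
outside = occ[occ["decimalLatitude"] >= 0]
southern = occ[occ["decimalLatitude"] < 0]

print(len(outside), len(southern))  # 2 8
```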

Download records of Vulpes vulpes in 2023, returning only eventDate field

import galah
galah.galah_config(atlas="Australia", email="your-email@example.com")
galah.atlas_occurrences(taxa="Vulpes vulpes", filters="year=2023", fields="eventDate")
                 eventDate
0     2022-12-31T14:56:00Z
1     2022-12-31T21:21:27Z
2     2023-01-01T00:00:00Z
3     2023-01-01T00:00:00Z
4     2023-01-01T00:00:00Z
...                    ...
1370  2023-04-18T23:35:35Z
1371  2023-04-19T00:13:40Z
1372  2023-04-19T01:08:00Z
1373  2023-04-21T13:52:00Z
1374  2023-04-21T22:54:38Z

[1375 rows x 1 columns]
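The first two eventDate values above fall on 31 December 2022 even though the query filtered on year=2023. The timestamps are in UTC (the trailing Z), so one plausible explanation is that year reflects the local date of the observation. Assuming, for illustration only, that these records were made under Australian Eastern Daylight Time (a fixed UTC+11 offset in December), converting them with Python's standard datetime module moves both into 2023.

```python
from datetime import datetime, timedelta, timezone

# Two eventDate strings from the output above (UTC, marked by the trailing "Z").
event_dates = ["2022-12-31T14:56:00Z", "2022-12-31T21:21:27Z"]

# Assumption for illustration: observations made under Australian Eastern
# Daylight Time, modelled here as a fixed offset of UTC+11.
aedt = timezone(timedelta(hours=11), "AEDT")

local_dates = []
for text in event_dates:
    # Parse as UTC; the "Z" suffix is matched literally by the format string.
    utc = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    local_dates.append(utc.astimezone(aedt))

for d in local_dates:
    print(d.isoformat())
# 2023-01-01T01:56:00+11:00
# 2023-01-01T08:21:27+11:00
```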

galah.atlas_species(taxa=None, filters=None, verbose=False)

While there are reasons why users may need to check every record meeting their search criteria (i.e. using galah.atlas_occurrences()), a common use case is to simply identify which species occur in a specified region, time period, or taxonomic group. This function returns a pandas.DataFrame with one row per species, and columns giving associated taxonomic information.

Parameters:
  • taxa (string / list) – one or more scientific names. Use galah.search_taxa() to search for valid scientific names.

  • filters (string / list) – filters, in the form field, logical operator, value (e.g. "year=2021")

  • verbose (logical) – If True, galah gives you the URLs used to query all the data. Defaults to False.

Return type:

An object of class pandas.DataFrame.

Examples

galah.atlas_species(taxa="Heleioporus")
                     species  ...       vernacular_name
0          Heleioporus eyrei  ...          Moaning Frog
1   Heleioporus australiacus  ...  Giant Burrowing Frog
2  Heleioporus albopunctatus  ...  Western Spotted Frog
3   Heleioporus psammophilus  ...             Sand Frog
4      Heleioporus inornatus  ...           Plains Frog
5     Heleioporus barycragus  ...          Hooting Frog

[6 rows x 9 columns]

galah.galah_config(email=None, email_notify=None, atlas=None, data_profile=None, ranks=None, reason=None)

The galah package supports large data downloads and interfaces with the ALA, which requires users of some services to provide a registered email address and a reason for downloading data. The galah_config() function provides a way to manage these settings as simply as possible.

Parameters:
  • email (string) – An email address that has been registered with the chosen atlas. For the ALA, you can register on the ALA website.

  • email_notify (string) – Used to receive an email for each query to galah.atlas_occurrences(). Defaults to None, but can be useful in some instances, for example for tracking DOIs assigned to specific downloads for later citation.

  • atlas (string) – Living Atlas to point to, Australia by default. Can be an organisation name, acronym, or region (see galah.show_all(atlases=True) for admissible values).

  • data_profile (string) – The name or abbreviation of a data quality profile to apply to the query. Valid values can be seen using galah.show_all(profiles=True).

  • ranks (string) – A string letting galah know what taxonomic ranks to show. Use “all” to see all 69 possible ranks, and “gbif” to see the 9 most common ranks.

  • reason (integer) – An integer code giving the reason you are downloading data. Defaults to 4 (scientific research). For a list of all possible reasons, run galah.show_all(reasons=True).

Returns:

  • With no arguments: a pandas.DataFrame of all current configuration options.

  • With one or more arguments: None.

Examples

import galah
galah.galah_config(email="yourname@example.com")

galah.search_all(assertions=None, atlases=None, apis=None, collection=None, datasets=None, fields=None, licences=None, lists=None, profiles=None, providers=None, ranks=None, reasons=None, column_name=None)

The living atlases store a huge amount of information, above and beyond the occurrence records that are their main output. In galah, one way that users can investigate this information is by searching for a specific option or category for the type of information they are interested in. search_all() is a helper function that can search within multiple types of information.

Parameters:
  • assertions (string) – Search for results of data quality checks run by each atlas

  • atlases (string) – Search for what atlases are available

  • apis (string) – Search for what APIs & functions are available for each atlas

  • collection (string) – Search for the specific collections held by data-providing institutions

  • datasets (string) – Search for the data groupings within those collections

  • fields (string) – Search for fields that are stored in an atlas

  • licences (string) – Search for copyright licences applied to media

  • lists (string) – Search for what species lists are available

  • profiles (string) – Search for what data profiles are available

  • providers (string) – Search for which institutions have provided data

  • ranks (string) – Search for valid taxonomic ranks (e.g. Kingdom, Class, Order, etc.)

  • reasons (string) – Search for what values are acceptable as ‘download reasons’ for a specified atlas

  • column_name (string) – The column of the returned table in which to search for the supplied string

Return type:

An object of class pandas.DataFrame containing all data of interest.

Examples

import galah
galah.search_all(apis="Australia")
        atlas         system  ...                         called_by functional
0   Australia    collections  ...              show_all-collections       True
21  Australia        spatial  ...                   show_all-fields       True
20  Australia        records  ...                     atlas_species       True
19  Australia        records  ...                 atlas_occurrences       True
18  Australia        records  ...                 atlas_occurrences       True
17  Australia        records  ...                   show_all-fields       True
16  Australia        records  ...  atlas_counts, show_values-fields       True
15  Australia        records  ...                      atlas_counts       True
14  Australia        records  ...               show_all-assertions       True
13  Australia  name-matching  ...                       search_taxa       True
12  Australia  name-matching  ...                       search_taxa       True
11  Australia  name-matching  ...                search_identifiers       True
10  Australia         logger  ...                  show_all-reasons       True
9   Australia          lists  ...                 show_values-lists       True
8   Australia          lists  ...                    show_all-lists       True
7   Australia         images  ...                    media_metadata       True
6   Australia         images  ...                 show_all-licences       True
5   Australia            doi  ...                      doi_download       True
4   Australia   data-quality  ...              show_values-profiles       True
3   Australia   data-quality  ...                 show_all-profiles       True
2   Australia    collections  ...                show_all-providers       True
1   Australia    collections  ...                 show_all-datasets       True
22  Australia        species  ...                    atlas_taxonomy       True
23  Australia        species  ...                    atlas_taxonomy       True

[24 rows x 6 columns]

galah.search_taxa(taxa)

Look up taxonomic names before downloading data from the ALA, using atlas_occurrences(), atlas_species() or atlas_counts(). Taxon information returned by search_taxa() may be passed to the taxa argument of atlas functions.

search_taxa() allows users to disambiguate homonyms (i.e. where the same name refers to taxa in different clades) prior to downloading data.

Parameters:

  • taxa (string) – one or more scientific names to search.

Return type:

An object of class pandas.DataFrame.

Examples

import galah
galah.search_taxa(taxa="Vulpes vulpes")
  scientificName scientificNameAuthorship  ... vernacularName   issues
0  Vulpes vulpes           Linnaeus, 1758  ...            Fox  noIssue

[1 rows x 12 columns]

galah.search_values(field=None, value=None, column_name=None)

Users may wish to see the specific values within a chosen field, profile or list to narrow queries or understand more about the information of interest. search_values() allows users to search for specific values within a specified field.

Parameters:
  • field (string) – The name of the field (or profile or list) whose values should be searched.

  • value (string) – A string specifying a search term. Not case sensitive.

  • verbose (logical) – This option is available for users who want to know what URLs this function is using to get the value. Defaults to False.

Return type:

An object of class pandas.DataFrame.

Examples

import galah
galah.search_values(field="basisOfRecord", value="OBS")
           field             category
2  basisOfRecord          OBSERVATION
0  basisOfRecord    HUMAN_OBSERVATION
4  basisOfRecord  MACHINE_OBSERVATION
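The example above behaves like a case-insensitive substring search: "OBS" matches every basisOfRecord value that contains "obs" anywhere. A minimal sketch of the same idea with pandas, applied to the basisOfRecord values listed in the galah.show_values() example, returns the same three categories with the same index labels. It is an illustration only, not galah's implementation, and it does not reproduce the row order shown above.

```python
import pandas as pd

# basisOfRecord values as listed in the galah.show_values() example.
values = pd.DataFrame(
    {
        "field": "basisOfRecord",  # scalar is broadcast to every row
        "category": [
            "HUMAN_OBSERVATION", "PRESERVED_SPECIMEN", "OBSERVATION",
            "OCCURRENCE", "MACHINE_OBSERVATION", "MATERIAL_SAMPLE",
            "LIVING_SPECIMEN", "MATERIAL_CITATION", "FOSSIL_SPECIMEN",
        ],
    }
)


def search_categories(table, term):
    """Illustrative only: keep rows whose category contains term, ignoring case."""
    mask = table["category"].str.contains(term, case=False, regex=False)
    return table[mask]


result = search_categories(values, "OBS")
print(result)
# rows 0, 2 and 4: HUMAN_OBSERVATION, OBSERVATION, MACHINE_OBSERVATION
```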

galah.show_all(assertions=False, atlases=False, apis=False, collection=False, datasets=False, fields=False, licences=False, lists=False, profiles=False, providers=False, ranks=False, reasons=False)

The living atlases store a huge amount of information, above and beyond the occurrence records that are their main output. In galah, one way that users can investigate this information is by showing all the available options or categories for the type of information they are interested in. show_all() is a helper function that can display multiple types of information, listing all valid options for each type specified.

Parameters:
  • assertions (logical) – Show results of data quality checks run by each atlas

  • atlases (logical) – Show what atlases are available

  • apis (logical) – Show what APIs & functions are available for each atlas

  • collection (logical) – Show the specific collections held by data-providing institutions

  • datasets (logical) – Show all the data groupings within those collections

  • fields (logical) – Show fields that are stored in an atlas

  • licences (logical) – Show what copyright licences are applied to media

  • lists (logical) – Show what species lists are available

  • profiles (logical) – Show what data profiles are available

  • providers (logical) – Show which institutions have provided data

  • ranks (logical) – Show valid taxonomic ranks (e.g. Kingdom, Class, Order, etc.)

  • reasons (logical) – Show what values are acceptable as ‘download reasons’ for a specified atlas

Return type:

An object of class pandas.DataFrame containing all data of interest.

Examples

import galah
galah.show_all(datasets=True)
                                                    name  ...      uid
0                 River Torrens Linear Park Species List  ...  dr14140
1                                 Warringah Species List  ...   dr4047
2                               'A Genome' Oryza species  ...   dr8221
3                                              *Brigalow  ...  dr14507
4                                                0662409  ...  dr19938
...                                                  ...  ...      ...
13189  [C4OC] Torres Strait Regional Landcare Facilit...  ...   dr3529
13190  [C4OC] Traditional Ecological Knowledge Record...  ...   dr3199
13191  [C4OC] Trialling and demonstrating farmer inno...  ...   dr3519
13192                [C4OC] Urban and Coastal Protection  ...   dr6802
13193        _River_Torrens_Linear_Park_Species_List.csv  ...  dr19265

[13194 rows x 3 columns]

galah.show_values(field=None, verbose=False)

Users may wish to see the specific values within a chosen field, profile or list to narrow queries or understand more about the information of interest. show_values() provides users with these values.

Parameters:
  • field (string) – The name of the field (or profile or list) whose values should be shown.

  • verbose (logical) – This option is available for users who want to know what URLs this function is using to get the value. Default is False.

Return type:

An object of class pandas.DataFrame.

Examples

import galah
galah.show_values(field="basisOfRecord")
           field             category
0  basisOfRecord    HUMAN_OBSERVATION
1  basisOfRecord   PRESERVED_SPECIMEN
2  basisOfRecord          OBSERVATION
3  basisOfRecord           OCCURRENCE
4  basisOfRecord  MACHINE_OBSERVATION
5  basisOfRecord      MATERIAL_SAMPLE
6  basisOfRecord      LIVING_SPECIMEN
7  basisOfRecord    MATERIAL_CITATION
8  basisOfRecord      FOSSIL_SPECIMEN