Skip to contents

galah has been designed to support a piped workflow that mimics workflows made popular by tidyverse packages such as dplyr. Although piping in galah is optional, it can make things much easier to understand, and so we use it in (nearly) all our examples.

To see what we mean, let’s look at an example of how dplyr::filter() works. Notice how dplyr::filter and galah_filter both require logical arguments to be added by using the == sign:

library(dplyr)

mtcars |> 
  filter(mpg == 21)
##               mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4
galah_call() |>
  galah_filter(year == 2021) |>
  atlas_counts()
## # A tibble: 1 × 1
##     count
##     <int>
## 1 8254306

As another example, notice how galah_group_by() + atlas_counts() works very similarly to dplyr::group_by() + dplyr::count():

mtcars |>
  group_by(vs) |> 
  count()
## # A tibble: 2 × 2
## # Groups:   vs [2]
##      vs     n
##   <dbl> <int>
## 1     0    18
## 2     1    14
## # A tibble: 2 × 2
##   biome           count
##   <chr>           <int>
## 1 TERRESTRIAL 120889729
## 2 MARINE        6043879

We made this move towards tidy evaluation to make it possible to use piping for building queries to the Atlas of Living Australia. In practice, this means that data queries can be filtered just like how you might filter a data.frame with the tidyverse suite of functions.

Piping with galah_call()

You may have noticed in the above examples that dplyr pipes begin with some data, while galah pipes all begin with galah_call() (be sure to add the parentheses!). This function tells galah that you will be using pipes to construct your query. Follow this with your preferred pipe (|> from base or %>% from magrittr). You can then narrow your query line-by-line using galah_ functions. Finally, end with an atlas_ function to identify what type of data you want from your query.

Here is an example using counts of bandicoot records:

galah_call() |>
  galah_identify("perameles") |>
  galah_filter(year >= 2020) |>
  galah_group_by(species, year) |>
  atlas_counts()
## # A tibble: 15 × 3
##    species                year  count
##    <chr>                  <chr> <int>
##  1 Perameles nasuta       2021   3475
##  2 Perameles nasuta       2022   1701
##  3 Perameles nasuta       2020   1576
##  4 Perameles nasuta       2023    714
##  5 Perameles gunnii       2023    122
##  6 Perameles gunnii       2021     71
##  7 Perameles gunnii       2022     64
##  8 Perameles gunnii       2020     49
##  9 Perameles bougainville 2021     84
## 10 Perameles bougainville 2022     72
## 11 Perameles bougainville 2020      1
## 12 Perameles pallescens   2022     30
## 13 Perameles pallescens   2023     26
## 14 Perameles pallescens   2021     25
## 15 Perameles pallescens   2020     11

And a second example, to download occurrence records of bandicoots in 2021, and also to include information on which records had zero coordinates:

galah_call() |>
  galah_identify("perameles") |>
  galah_filter(year == 2021) |>
  galah_select(group = "basic", ZERO_COORDINATE) |>
  atlas_occurrences() |>
  head()
## Retrying in 1 seconds.
## # A tibble: 6 × 9
##   recordID          scientificName taxonConceptID decimalLatitude decimalLongitude eventDate           occurrenceStatus dataResourceName
##   <chr>             <chr>          <chr>                    <dbl>            <dbl> <dttm>              <chr>            <chr>           
## 1 00108221-afc6-42… Perameles nas… https://biodi…           -28.8             153. 2021-09-29 00:00:00 PRESENT          NSW BioNet Atlas
## 2 001e914d-0281-41… Perameles nas… https://biodi…           -33.8             151. 2021-04-19 00:00:00 PRESENT          NSW BioNet Atlas
## 3 00233c1e-66df-4d… Perameles nas… https://biodi…           -33.8             151. 2021-02-27 00:00:00 PRESENT          NSW BioNet Atlas
## 4 003064b3-490a-49… Perameles nas… https://biodi…           -27.5             152. 2021-11-05 12:06:00 PRESENT          iNaturalist Aus…
## 5 004fd28b-a899-4a… Perameles nas… https://biodi…           -33.8             151. 2021-07-24 00:00:00 PRESENT          NSW BioNet Atlas
## 6 0068547b-b091-4a… Perameles nas… https://biodi…           -33.8             151. 2021-01-28 00:00:00 PRESENT          NSW BioNet Atlas
## # ℹ 1 more variable: ZERO_COORDINATE <lgl>

Note that the order in which galah_ functions are added doesn’t matter, as long as galah_call() goes first, and an atlas_ function comes last.

Using dplyr functions in galah

As of version 1.5.1, it is possible to call dplyr functions natively within galah to amend how queries are processed, i.e.:

# galah syntax
galah_call() |>
  galah_filter(year >= 2020) |>
  galah_group_by(year) |>
  atlas_counts()
## # A tibble: 4 × 2
##   year    count
##   <chr>   <int>
## 1 2022  8409790
## 2 2021  8254306
## 3 2020  7124140
## 4 2023  3446001
# dplyr syntax
galah_call() |>
  filter(year >= 2020) |>
  group_by(year) |>
  count()
## Object of type `data_request` containing:
## • type occurrences-count
## • filter year >= 2020
## • group_by year

The full list of masked functions is: