galah has been designed to support a piped workflow that mimics workflows made popular by tidyverse packages such as dplyr. Although piping in galah is optional, it can make things much easier to understand, and so we use it in (nearly) all our examples.
To see what we mean, let’s look at an example of how dplyr::filter() works. Notice how dplyr::filter() and galah_filter() both take logical expressions built with operators such as ==:
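The dplyr call behind the first output that follows is not shown in the text. A minimal sketch, assuming it filters the built-in mtcars dataset on mpg == 21 (an inference from the two rows printed, not something the text states), might look like this:

```r
library(dplyr)

# Assumption: the example data is the built-in mtcars dataset,
# and the filter keeps rows where mpg is exactly 21
mazda <- mtcars |>
  filter(mpg == 21)

mazda
```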
##               mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21   6  160 110  3.9 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21   6  160 110  3.9 2.875 17.02  0  1    4    4
galah_call() |>
  galah_filter(year == 2021) |>
  atlas_counts()
## # A tibble: 1 × 1
## count
## <int>
## 1 8254306
As another example, notice how galah_group_by() + atlas_counts() works very similarly to dplyr::group_by() + dplyr::count():
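The dplyr half of this comparison is also missing its code. A minimal sketch, again assuming the built-in mtcars dataset and grouping on its vs column (inferred from the "Groups: vs [2]" header in the output that follows), could be:

```r
library(dplyr)

# Assumption: mtcars is the example data; vs is the grouping column
vs_counts <- mtcars |>
  group_by(vs) |>
  count()

vs_counts
```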
## # A tibble: 2 × 2
## # Groups: vs [2]
## vs n
## <dbl> <int>
## 1 0 18
## 2 1 14
galah_call() |>
  galah_group_by(biome) |>
  atlas_counts()
## # A tibble: 2 × 2
## biome count
## <chr> <int>
## 1 TERRESTRIAL 120889729
## 2 MARINE 6043879
We made this move towards tidy evaluation to make it possible to use piping for building queries to the Atlas of Living Australia. In practice, this means that data queries can be filtered just as you might filter a data.frame with the tidyverse suite of functions.
Piping with galah_call()
You may have noticed in the above examples that dplyr pipes begin with some data, while galah pipes all begin with galah_call() (be sure to add the parentheses!). This function tells galah that you will be using pipes to construct your query. Follow this with your preferred pipe (|> from base or %>% from magrittr). You can then narrow your query line-by-line using galah_ functions. Finally, end with an atlas_ function to identify what type of data you want from your query.
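On the choice of pipe, the sketch below uses a plain data frame rather than a galah query (an assumption made so that it runs offline) to illustrate that the base and magrittr pipes can be used interchangeably for this kind of chained call:

```r
library(dplyr)  # dplyr re-exports the magrittr pipe %>%

# Base R pipe (requires R 4.1.0 or later)
base_pipe <- mtcars |>
  filter(cyl == 6)

# magrittr pipe
magrittr_pipe <- mtcars %>%
  filter(cyl == 6)

# Compare the two results
identical(base_pipe, magrittr_pipe)
```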
Here is an example using counts of bandicoot records:
galah_call() |>
  galah_identify("perameles") |>
  galah_filter(year >= 2020) |>
  galah_group_by(species, year) |>
  atlas_counts()
## # A tibble: 15 × 3
## species year count
## <chr> <chr> <int>
## 1 Perameles nasuta 2021 3475
## 2 Perameles nasuta 2022 1701
## 3 Perameles nasuta 2020 1576
## 4 Perameles nasuta 2023 714
## 5 Perameles gunnii 2023 122
## 6 Perameles gunnii 2021 71
## 7 Perameles gunnii 2022 64
## 8 Perameles gunnii 2020 49
## 9 Perameles bougainville 2021 84
## 10 Perameles bougainville 2022 72
## 11 Perameles bougainville 2020 1
## 12 Perameles pallescens 2022 30
## 13 Perameles pallescens 2023 26
## 14 Perameles pallescens 2021 25
## 15 Perameles pallescens 2020 11
A second example downloads occurrence records of bandicoots from 2021 and includes information on which records had zero coordinates:
galah_call() |>
  galah_identify("perameles") |>
  galah_filter(year == 2021) |>
  galah_select(group = "basic", ZERO_COORDINATE) |>
  atlas_occurrences() |>
  head()
## Retrying in 1 seconds.
## # A tibble: 6 × 9
## recordID scientificName taxonConceptID decimalLatitude decimalLongitude eventDate occurrenceStatus dataResourceName
## <chr> <chr> <chr> <dbl> <dbl> <dttm> <chr> <chr>
## 1 00108221-afc6-42… Perameles nas… https://biodi… -28.8 153. 2021-09-29 00:00:00 PRESENT NSW BioNet Atlas
## 2 001e914d-0281-41… Perameles nas… https://biodi… -33.8 151. 2021-04-19 00:00:00 PRESENT NSW BioNet Atlas
## 3 00233c1e-66df-4d… Perameles nas… https://biodi… -33.8 151. 2021-02-27 00:00:00 PRESENT NSW BioNet Atlas
## 4 003064b3-490a-49… Perameles nas… https://biodi… -27.5 152. 2021-11-05 12:06:00 PRESENT iNaturalist Aus…
## 5 004fd28b-a899-4a… Perameles nas… https://biodi… -33.8 151. 2021-07-24 00:00:00 PRESENT NSW BioNet Atlas
## 6 0068547b-b091-4a… Perameles nas… https://biodi… -33.8 151. 2021-01-28 00:00:00 PRESENT NSW BioNet Atlas
## # ℹ 1 more variable: ZERO_COORDINATE <lgl>
Note that the order in which galah_ functions are added doesn’t matter, as long as galah_call() goes first and an atlas_ function comes last.
Using dplyr functions in galah
As of version 1.5.1, it is possible to call dplyr functions natively within galah to amend how queries are processed, for example:
# galah syntax
galah_call() |>
  galah_filter(year >= 2020) |>
  galah_group_by(year) |>
  atlas_counts()
## # A tibble: 4 × 2
## year count
## <chr> <int>
## 1 2022 8409790
## 2 2021 8254306
## 3 2020 7124140
## 4 2023 3446001
# dplyr syntax
galah_call() |>
  filter(year >= 2020) |>
  group_by(year) |>
  count()
## Object of type `data_request` containing:
## • type occurrences-count
## • filter year >= 2020
## • group_by year
The full list of masked functions is:
- identify() ({graphics}) as a synonym for galah_identify()
- select() ({dplyr}) as a synonym for galah_select()
- group_by() ({dplyr}) as a synonym for galah_group_by()
- slice_head() ({dplyr}) as a synonym for the limit argument in atlas_counts()
- st_crop() ({sf}) as a synonym for galah_polygon()
- count() ({dplyr}) as a synonym for atlas_counts()