I am a former physicist trying to create order in the world of taxonomy and the like. I do so by writing elegant software functions to help you speed up your research.
Climate change, environmental degradation and invasive species represent imminent threats to biodiversity. Decision makers urgently need accurate and reliable information about status, trends and impacts. To meet that need, data must be presented in an actionable and understandable format, including measures of uncertainty.
The emerging challenge is to produce synthesised data products that ecologists can reuse for purposes such as distribution modelling and risk mapping, and that can be combined with other environmental data, such as climate and land use data. Within the Group on Earth Observations Biodiversity Observation Network (GEO BON), it has been proposed to create aggregated biodiversity “data cubes” with taxonomic (what), spatial (where) and temporal (when) dimensions (Kissling et al. 2018). The Biodiversity Building Blocks for Policy project (B-Cubed) will generate biodiversity data cubes at the desired scale, automatically, as often as needed and with minimal manual intervention. These cubes will be made available and citable using the Global Biodiversity Information Facility (GBIF) infrastructure. Aside from the technological challenges, there is a conceptual challenge to solve: how to deal with the taxonomic, temporal and spatial uncertainty of biodiversity occurrence data?
Taxonomic uncertainty manifests itself in the form of synonymy. By relying on a taxonomic backbone such as the GBIF Backbone Taxonomy, we reduce this source of uncertainty. The temporal uncertainty is typically smaller than the granularity used for aggregation (e.g. year) and can therefore usually be neglected. The spatial uncertainty, in contrast, cannot be neglected.
Most commonly, occurrences are either collected in square grids of various dimensions or as points with an uncertainty radius (Bloom et al. 2018). Occurrences are therefore not defined as points but as two-dimensional shapes, typically squares or circles. These shapes rarely align with the geographic grid systems for which environmental and landscape data are available. A common solution is to upscale the data to a coarse grid. However, this inevitably reduces the spatial resolution of the data, which may result in a loss of accuracy when the data are used to build indicators and models.
To account for spatial uncertainty, we developed an algorithm within the Tracking Invasive Alien Species project (TrIAS) (Oldoni et al. 2020) that randomly chooses a point within the square or the circle and assigns the occurrence to the grid cell that point falls in. Because the assignment is random, however, each newly generated occurrence cube can differ slightly from the previous one. By creating an ensemble of cubes, we can propagate the uncertainty from the raw occurrence data to summary statistics, such as the number of grid cells occupied by a species (observed occupancy). Using Monte Carlo simulations with synthetic data, we aim to determine the ensemble size, i.e. the minimum number of cubes needed to robustly infer the average observed occupancy and its uncertainty.
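The random assignment and the ensemble idea described above can be illustrated with a minimal Python sketch. It is not the TrIAS implementation from Oldoni et al. (2020): the function names, the planar coordinates in metres, the grid anchored at the origin, and the restriction to circular uncertainty (no squares) are all assumptions made for illustration.

```python
import math
import random
import statistics


def random_point_in_circle(x, y, radius, rng):
    """Draw a point uniformly from the disc centred on (x, y)."""
    # The square root keeps the point density uniform over the disc area.
    r = radius * math.sqrt(rng.random())
    theta = rng.uniform(0.0, 2.0 * math.pi)
    return x + r * math.cos(theta), y + r * math.sin(theta)


def cell_of(px, py, cell_size):
    """Index of the square grid cell (anchored at the origin) containing a point."""
    return math.floor(px / cell_size), math.floor(py / cell_size)


def observed_occupancy(occurrences, cell_size, rng):
    """One cube realisation: the number of distinct grid cells occupied."""
    cells = set()
    for x, y, radius in occurrences:
        px, py = random_point_in_circle(x, y, radius, rng)
        cells.add(cell_of(px, py, cell_size))
    return len(cells)


def ensemble_occupancy(occurrences, cell_size, n_cubes, seed=0):
    """Mean and standard deviation of observed occupancy over n_cubes (>= 2) cubes."""
    rng = random.Random(seed)
    counts = [observed_occupancy(occurrences, cell_size, rng) for _ in range(n_cubes)]
    return statistics.mean(counts), statistics.stdev(counts)


# Hypothetical occurrences of one species: (x, y, uncertainty radius) in metres.
occurrences = [(500, 500, 100), (1000, 1000, 300), (2500, 500, 0)]
mean_occ, sd_occ = ensemble_occupancy(occurrences, cell_size=1000, n_cubes=200)
print(f"observed occupancy: {mean_occ:.2f} +/- {sd_occ:.2f}")
```

In this toy example the second occurrence sits on a grid corner, so it can land in any of four cells and the observed occupancy varies between two and three from cube to cube. Repeating the call with increasing `n_cubes` and watching the mean and standard deviation stabilise is one simple way to explore the ensemble size question.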