2023-08-29, 11:00–12:30, Room 21 (Sala 21)
A common challenge with raster datasets is not only that they come in large files (a single Sentinel-2 tile is around 1 GB), but that many of these files, potentially thousands or millions, are needed to cover the area and time period of interest. In 2022, Copernicus, the programme that runs all Sentinel satellites, published 160 TB of images per day. This means that the classic pattern of using R, consisting of downloading data to local disk, loading the data in memory, and analysing it, is not going to work. This lecture describes how large spatial and spatiotemporal datasets can be handled with R, with a focus on the packages sf and stars. For practical use, we classify datasets as too large:
- to fit in working memory,
- to fit on the local hard drive, or
- to download to locally managed infrastructure (such as network attached storage).
These three categories may (today) correspond very roughly to Gigabyte-, Terabyte- and Petabyte-sized datasets. Besides size, access and processing speed also play a role, in particular for larger datasets and interactive applications. Cloud-native geospatial formats are optimised for processing on cloud infrastructure, where the costs of computing and storage need to be considered and balanced.
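As a sketch of the approach the lecture covers: the stars package can open a raster as a proxy object, reading only metadata up front and deferring pixel access until values are actually needed. The file name below is a placeholder, not a real dataset.

```r
# A minimal sketch, assuming a local GeoTIFF file; with proxy = TRUE,
# read_stars() reads only metadata, not the pixel values themselves
library(stars)
r <- read_stars("sentinel2_tile.tif", proxy = TRUE)

# operations on a stars_proxy object are recorded lazily and evaluated
# only for the pixels actually required, e.g. when plotting:
plot(r)  # reads the file downsampled to the plotting device's resolution
```

This lazy-evaluation pattern is what makes datasets in the "too large for working memory" category workable from R without changing the familiar sf/stars workflow.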
Affiliation: University of Münster
Research interests: Spatial Statistics, Geoinformatics, Spatial Data Science, Reproducible Research, R
About: I lead the spatio-temporal modelling laboratory at the Institute for Geoinformatics, and am currently head of the institute.