Fabian Gans
Fabian Gans is a project group leader at the Max-Planck-Institute for Biogeochemistry in Jena, Germany. For the last 10 years he has worked as a research software engineer, dealing with data cubes that come from a variety of sources and keep growing in size. He is excited about the Julia programming language and creates, maintains, and contributes to several Julia packages for geospatial data analysis.
Sessions
Spatiotemporal data cubes are becoming ever more abundant and are a widely used tool in the Earth System Science community to handle geospatial raster data.
Sophisticated frameworks in high-level programming languages like R and Python allow scientists to draft and run their data analysis pipelines and to scale them in HPC or cloud environments.
While many data cube frameworks can handle harmonized analysis-ready data cubes very well, we repeatedly experienced problems when running complex analyses on multi-source data that was not homogenized. The problems arise when different datasets need to be resampled on the fly to a common resolution and have non-aligning chunk boundaries, which leads to very complex and often unresolvable task graphs in frameworks like xarray+dask.
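The effect of non-aligning chunk boundaries can be illustrated with a small one-dimensional sketch. It is not taken from xarray, dask, or any JuliaDataCubes package; the helper names `chunk_bounds` and `overlaps` and the chunk sizes are hypothetical and chosen only for illustration. It counts how many source chunks have to be read to fill each target chunk.

```python
def chunk_bounds(length, chunk):
    """Return (start, stop) index pairs for chunks of size `chunk` along one axis."""
    return [(s, min(s + chunk, length)) for s in range(0, length, chunk)]


def overlaps(src, dst):
    """For each target chunk, list the indices of the source chunks it intersects."""
    return [
        [i for i, (a, b) in enumerate(src) if a < d1 and b > d0]
        for (d0, d1) in dst
    ]


length = 24  # hypothetical axis length

# Source and target share the same chunk size: one read per target chunk.
aligned = overlaps(chunk_bounds(length, 6), chunk_bounds(length, 6))

# Source chunks of 5, target chunks of 6: every target chunk straddles a boundary.
misaligned = overlaps(chunk_bounds(length, 5), chunk_bounds(length, 6))

n_aligned = sum(len(deps) for deps in aligned)
n_misaligned = sum(len(deps) for deps in misaligned)
print(n_aligned, n_misaligned)
```

In this toy case the number of chunk-to-chunk dependencies doubles along a single axis. With several axes and several datasets that each need resampling, the dependencies multiply, which is one way the task graphs described above can grow large.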
In this workshop we present the emerging ecosystem of large-scale geodata processing in the Julia programming language under the JuliaDataCubes GitHub umbrella.
Julia is an interactive scientific programming language, designed for HPC applications, with primitives for multi-threaded and distributed computations built into the language.
We will demonstrate an example analysis where data from different sources (global fields of daily MODIS, hourly ERA5, high-resolution land cover), summing to multiple TBs of data, can interoperate on-the-fly and scale well when run on different computing environments.
Central Europe experienced a series of droughts and heat waves between 2018 and 2020 which severely affected the forest ecosystems. The canopy cover loss has been mapped for Germany by [1] using high spatial resolution optical images from the Sentinel-2 and Landsat-8 satellites. In this contribution we present the results of assessing deforestation with a complementary approach using Sentinel-1 C-Band SAR data. We use Recurrence Quantification Analysis (RQA) to derive a change metric which takes the order of the time series into account [2]. This approach provides high-resolution yearly forest loss maps based on a continuous data stream.
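As a rough illustration of the idea behind recurrence metrics, the sketch below builds a recurrence matrix for a short time series and computes two simple quantities from it. This is a simplified example and not the processing chain used in [2]; the function names, the threshold `eps`, and the sample values are all assumptions made for illustration. Two time steps count as recurrent when their values differ by at most `eps`.

```python
def recurrence_matrix(series, eps):
    """R[i][j] is 1 when values i and j lie within eps of each other, else 0."""
    return [[1 if abs(a - b) <= eps else 0 for b in series] for a in series]


def recurrence_rate(matrix):
    """Fraction of recurrent points in the whole matrix."""
    n = len(matrix)
    return sum(map(sum, matrix)) / (n * n)


def lag_recurrence_rate(matrix, lag):
    """Fraction of recurrent points on the diagonal `lag` steps off the main one."""
    n = len(matrix)
    return sum(matrix[i][i + lag] for i in range(n - lag)) / (n - lag)


# Hypothetical signals: one stable, one with an abrupt level shift halfway through.
stable = [0.0, 0.1, -0.1, 0.0, 0.1, 0.0, -0.1, 0.1]
step = [0.0, 0.1, -0.1, 0.0, 5.0, 5.1, 4.9, 5.0]

eps = 0.5  # assumed threshold
rr_stable = recurrence_rate(recurrence_matrix(stable, eps))
rr_step = recurrence_rate(recurrence_matrix(step, eps))
rr_step_lag4 = lag_recurrence_rate(recurrence_matrix(step, eps), 4)
print(rr_stable, rr_step, rr_step_lag4)
```

The overall recurrence rate does not change when the series is shuffled, whereas the per-lag rate does, because it depends on which time steps are a given distance apart. Metrics built on the diagonal structure of the matrix are therefore the ones that are sensitive to the order of the observations.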
In addition to the scientific results we showcase the processing pipeline on the European Open Science Cloud. The amount of high resolution earth observation data processed in this study was too large to do all analysis on local computers or even local cluster systems. To achieve high performance computations for out-of-memory datasets we develop the YAXArrays.jl package in the Julia programming language. YAXArrays.jl provides both an abstraction over chunked n-dimensional arrays with labelled axes and efficient multi-threaded and multi-process computation on these arrays.
Citation:
[1]: Thonfeld, F.; Gessner, U.; Holzwarth, S.; Kriese, J.; da Ponte, E.; Huth, J.; Kuenzer, C. A First Assessment of Canopy Cover Loss in Germany’s Forests after the 2018–2020 Drought Years.
Remote Sens. 2022, 14, 562. https://doi.org/10.3390/rs14030562
[2]: F. Cremer, M. Urbazaev, J. Cortés, J. Truckenbrodt, C. Schmullius and C. Thiel,
"Potential of Recurrence Metrics from Sentinel-1 Time Series for Deforestation Mapping,"
in IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 13, pp. 5233-5240, 2020, https://doi.org/10.1109/JSTARS.2020.3019333