Open Earth Monitor — Global Workshop 2023

Distributed computing on large geodata from multiple sources using the Julia programming language
2023-10-04, 16:30–17:15, EURAC Seminar room 2 & 3

Spatiotemporal data cubes are becoming increasingly common and are widely used in the Earth System Science community to handle geospatial raster data.
Sophisticated frameworks in high-level programming languages such as R and Python allow scientists to draft and run their data analysis pipelines and to scale them to HPC or cloud environments.

While many data cube frameworks handle harmonized, analysis-ready data cubes very well, we have repeatedly run into problems when running complex analyses on multi-source data that has not been homogenized. These problems arise when different datasets must be resampled on the fly to a common resolution and have non-aligned chunk boundaries, which leads to very complex and often intractable task graphs in frameworks such as xarray+dask.

In this workshop we present the emerging ecosystem for large-scale geodata processing in the Julia programming language under the JuliaDataCubes GitHub umbrella.
Julia is an interactive scientific programming language designed for HPC applications, with primitives for multi-threaded and distributed computation built into the language.
We will demonstrate an example analysis in which data from different sources (global fields of daily MODIS, hourly ERA5, and high-resolution land cover), totalling several terabytes, interoperate on the fly and scale well across different computing environments.

OEMC Grant agreement ID: 101059548

Felix Cremer received his diploma in mathematics from the University of Leipzig in 2014. In 2016 he began his PhD studies on time series analysis of hypertemporal Sentinel-1 radar data.
He is interested in applying irregular time series methods to Synthetic Aperture Radar (SAR) data to derive more robust information from these datasets.
He has worked on the development of deforestation mapping algorithms and on flood mapping in the Amazon using Sentinel-1 data.
He currently works at the Max Planck Institute for Biogeochemistry on the development of the JuliaDataCubes ecosystem within the NFDI4Earth project. The JuliaDataCubes organisation provides easy-to-use interfaces for working with multidimensional raster data.

2023 - today Data Scientist and PostDoc, Max Planck Institute for Biogeochemistry
2018 - 2023 Doctoral researcher Bioinformatics, Leibniz Institute for Natural Product Research and Infection Biology
2015 - 2017 Master Bioinformatics, Friedrich Schiller University Jena
2012 - 2015 Bachelor Molecular Life Science, University of Lübeck

