Open Earth Monitor — Global Workshop 2024

Spatiotemporal Machine Learning: fitting models and generating predictions using time-series data
2024-10-03, 16:00–16:45, Wodak Room (IIASA)

Machine learning is commonly used to map environmental variables in 2D, but what about generating predictions of dynamic variables such as above-ground biomass, forest species, soil carbon and similar? Spatiotemporal mapping differs from purely 2D / 3D mapping in three main aspects: (1) points and covariate layers are matched in spacetime (usually by month-year period, or at least by year), (2) covariate layers are based on time-series data and also include accumulative indices (e.g. cumulative rainfall, cumulative snow cover, cumulative cropping fraction and similar) and derivatives, (3) during model training and validation, points are subset in both space and time to avoid overfitting and bias in predictions. The rationale for using spatiotemporal machine learning is fitness of the data for reliable time-series analysis: predictions anywhere in the spacetime cube need to be unbiased, with objectively quantified prediction errors (uncertainty), so that changes can be derived without a risk of serious over- or under-estimation. We have tested this framework on local and regional data sets (e.g. LUCAS soil samples covering 2009, 2012, 2015 and 2018 for Europe), and it can now potentially be applied using global compilations of soil points (https://opengeohub.github.io/SoilSamples/). Spatiotemporal machine learning could also potentially be used for predicting future states of soil, e.g. by extrapolating models to future climate scenarios and future land use systems (Bonannella et al., 2023).
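
The three aspects listed above can be illustrated with a minimal Python sketch. It is not the workshop's own code: the rainfall cube, the point samples, the 5 x 5 cell tile size and all variable names are hypothetical, and grouping points by tile and year is only one possible way to block the data (grouping by tile alone would be stricter).

```python
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(42)

# Hypothetical monthly rainfall cube: 2 years x 12 months on a 20 x 20 grid.
n_years, n_months, n_rows, n_cols = 2, 12, 20, 20
rain = rng.gamma(shape=2.0, scale=30.0, size=(n_years, n_months, n_rows, n_cols))

# Aspect 2: an accumulative index, here rainfall accumulated since January of each year.
cum_rain = np.cumsum(rain, axis=1)

# Hypothetical point samples, each with a grid position and an observation year and month.
n_points = 200
rows = rng.integers(0, n_rows, n_points)
cols = rng.integers(0, n_cols, n_points)
years = rng.integers(0, n_years, n_points)
months = rng.integers(0, n_months, n_points)

# Aspect 1: space-time overlay, each point takes covariate values from its own year and month.
X = np.column_stack([
    rain[years, months, rows, cols],
    cum_rain[years, months, rows, cols],
])

# Aspect 3: group points by spatial tile and year so that whole blocks are held out together.
tile_size = 5
tiles_per_side = n_cols // tile_size
tile_id = (rows // tile_size) * tiles_per_side + (cols // tile_size)
groups = years * tiles_per_side ** 2 + tile_id

folds = list(GroupKFold(n_splits=5).split(X, groups=groups))
for train_idx, test_idx in folds:
    # No space-time block contributes points to both sides of a split.
    assert not set(groups[train_idx]) & set(groups[test_idx])
```

With real data, the cube would come from gridded time-series layers and the tile size would be chosen to reflect the spatial autocorrelation range of the target variable; both choices are left open here.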


The workshop will provide tutorials in R (https://opengeohub.github.io/spatial-prediction-eml/) and Python (https://github.com/openlandmap/scikit-map) and will focus primarily on the mlr3 and scikit-learn frameworks for ML and on ensemble methods based on model stacking (3-4 base learners and 1 meta-learner).
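
As a rough illustration of model stacking with scikit-learn, the sketch below combines three base learners with one meta-learner. The choice of learners, the synthetic data and the block identifiers are assumptions made for this example only; they are not taken from the workshop tutorials.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import LassoCV, LinearRegression
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)

# Synthetic stand-in for a point data set already overlaid with covariates.
n = 300
X = rng.normal(size=(n, 5))
y = 2.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.3, size=n)
groups = rng.integers(0, 10, n)  # hypothetical space-time block ids

# Three base learners (an illustrative selection).
base_learners = [
    ("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
    ("gbm", GradientBoostingRegressor(random_state=0)),
    ("lasso", LassoCV(cv=3)),
]

# Precomputed grouped splits, so the meta-learner is trained on
# predictions made for blocks the base learners did not see.
cv_splits = list(GroupKFold(n_splits=5).split(X, y, groups))

stack = StackingRegressor(
    estimators=base_learners,
    final_estimator=LinearRegression(),  # the single meta-learner
    cv=cv_splits,
)
stack.fit(X, y)
pred = stack.predict(X[:5])
```

The coefficients in `stack.final_estimator_.coef_` indicate how much weight the meta-learner gives to each base learner's predictions.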

Cited references:
- Bonannella, C., et al. (2023). Biomes of the world under climate change scenarios: increasing aridity and higher temperatures lead to significant shifts in natural vegetation. PeerJ, 11, e15593. https://doi.org/10.7717/peerj.15593
- Hackländer, J., et al. (2023). Land potential assessment and trend-analysis using 2000–2021 FAPAR monthly time-series at 250 m spatial resolution. PeerJ, in review. https://doi.org/10.21203/rs.3.rs-3415685/v1
- Witjes, M., et al. (2023). Ecodatacube.eu: Analysis-ready open environmental data cube for Europe. PeerJ, 11, e15478. https://doi.org/10.7717/peerj.15478


What are your current associations with EU Horizon projects (if any)?

Open-Earth-Monitor Cyberinfrastructure (Grant agreement ID: 101059548)

Please provide the URL that you plan to use to distribute your materials (if available).

https://opengeohub.github.io/spatial-prediction-eml/

Tom is the Director of the OpenGeoHub foundation. He has more than 20 years of experience as an environmental modeler, data scientist and spatial analyst, with a background in soil mapping and geo-information science. He regularly runs hands-on R training courses to promote the use of Open Source software for spatial analysis and spatial modeling. He is currently the project leader of OpenLandMap, a system for automated global soil and vegetation mapping at fine spatial resolutions (100 m, 250 m to 1 km), which aspires to be recognized as an “OpenStreetMap-type” system for environmental data. Tom’s core philosophy is outlined in this document (see also our Medium article on OpenLandMap.org). Tom is a recipient of the Clarivate Highly Cited Researchers award for 2021/2022/2023.