2025-04-08, 10:00–10:30, HugoTECH
Machine learning (ML) is promoted as a game-changer for soil health assessment, offering new ways to model complex relationships and generate high-resolution soil property maps. However, while ML has shown promise, its application in soil science is often met with overstated expectations and underappreciated limitations. This keynote critically examines the role of ML in space-time soil mapping for soil health, highlighting both its strengths and pitfalls.
ML has certainly advanced soil mapping to an unprecedented degree, enabling continental and even global maps at high resolution for numerous soil properties. Entangled soil processes and the variability of locations, nearly all of which have an individual set of soil-forming factors, result in complex space-time soil patterns. In the commonly used mapping approach, ML has to learn all this complexity in a fully data-driven way from the surveyed soil samples and environmental predictors such as remote sensing data or elevation models. Where no environmental predictor can differentiate the observed soil property patterns, ML predictions will play it safe and predict the average observed value for similar locations. From a soil process knowledge perspective, the mean is often not the best prediction. For example, a forest topsoil may be buffered by carbonates and have a pH around 8, or its pH may already have dropped into the aluminum buffer range of around 4. A mean pH of 5-6, which ML is likely to predict, is rarely observed in unfertilized forests and is hence rather unlikely.
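The forest pH example can be illustrated with a small simulation. This is only a sketch under stated assumptions: the pH values are synthetic, the two groups are equally sized, and the spread of 0.3 pH units is chosen purely for illustration. It shows that the mean of a two-cluster distribution can land in a range where almost no site is actually observed.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic (hypothetical) forest topsoil pH values: half the sites are
# carbonate-buffered (around pH 8), half have reached the aluminum buffer
# range (around pH 4). The spread of 0.3 is an illustrative assumption.
n = 1000
carbonate = rng.normal(8.0, 0.3, n)
aluminum = rng.normal(4.0, 0.3, n)
ph = np.concatenate([carbonate, aluminum])

# With no predictor that separates the two groups, a model that minimizes
# squared error can do no better than predicting the overall mean.
mean_prediction = ph.mean()

# Share of simulated sites whose pH lies within 0.5 units of that prediction.
share_near_mean = np.mean(np.abs(ph - mean_prediction) < 0.5)

print(f"mean prediction: {mean_prediction:.2f}")
print(f"share of sites within 0.5 pH units: {share_near_mean:.3f}")
```

Under these assumptions the predicted value sits between the two buffer ranges, while hardly any simulated site has a pH close to it.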
Similar limitations appear when quantifying prediction uncertainty at each location. ML-based prediction intervals often contain value ranges that, from a soil process viewpoint, we already know are very unlikely. While more field surveying is certainly needed to support unbiased mapping, it does not resolve the challenge. The marginal benefit of additional data points for fully data-driven ML often decreases rapidly, even more so in the presence of measurement errors. Soil sampling will only ever provide a tiny fraction of the total 3D soil continuum we are interested in. Data-hungry ML techniques such as deep learning are therefore unlikely to excel in space-time soil mapping of soil health indicators.
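The point about prediction intervals can be sketched in the same spirit. The example below assumes synthetic pH observations from two clusters (around pH 4 and pH 8, an illustrative choice) and uses a simple empirical quantile interval as a stand-in for an ML-based prediction interval; it is not the method of any particular mapping product.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic (hypothetical) pH observations for locations the predictors cannot
# tell apart: an acidic group (about pH 4) and a carbonate-buffered group
# (about pH 8). Group sizes and spread are illustrative assumptions.
ph = np.concatenate([rng.normal(4.0, 0.3, 1000), rng.normal(8.0, 0.3, 1000)])

# Empirical 90% prediction interval from the 5% and 95% quantiles.
lower, upper = np.quantile(ph, [0.05, 0.95])

# Compare how much of the interval's width the pH 5 to 7 range takes up with
# how many observations actually fall into that range.
gap_low, gap_high = 5.0, 7.0
width_share = (gap_high - gap_low) / (upper - lower)
obs_share = np.mean((ph > gap_low) & (ph < gap_high))

print(f"90% interval: {lower:.2f} to {upper:.2f}")
print(f"share of interval width between pH 5 and 7: {width_share:.2f}")
print(f"share of observations between pH 5 and 7: {obs_share:.4f}")
```

In this sketch a large part of the interval covers values that are almost never observed, which is the kind of range soil process knowledge would rule out in advance.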