Patric is a sustainability scientist (PhD) at the Institute for Crop and Soil Science, part of the Julius Kühn Institute, the German Federal Research Centre for Cultivated Plants. He leads research on model-based crop yield estimation at the Institute’s Research Centre for Agricultural Remote Sensing, leveraging earth observation and geospatial analytics.
Timely crop yield estimates are pivotal for official statistics on agricultural productivity, which inform policy-making on sustainable food production. However, existing approaches that collect yield data annually from a large number of farms are resource-intensive. Official crop yield statistics in Germany, for instance, rely heavily on extensive and time-consuming farm surveys and on-farm measurements.
The EU’s Copernicus earth observation (EO) program provides a plethora of satellite data, enabling remote monitoring of agricultural land at high spatio-temporal resolution. EO imagery, open geospatial data on meteorological conditions and soil properties, and advances in machine learning (ML) provide huge opportunities for model-based crop yield estimation, covering large spatial scales with unprecedented granularity. Managing the vast amounts of multi-source data required for yield modelling remains a challenge, though, particularly for public authorities. We present a model-based approach that employs ML ensembles and a cloud-integrated spatial data infrastructure (SDI) to estimate yields of multiple major crops cultivated in Germany. Our SDI is built on interconnected components that link EO cloud computation and data storage on the CODE-DE platform with internal data cubes through web services.
Our model-based yield estimation approach integrates a number of dynamic and static predictors. Analysis-ready multi-spectral Sentinel-2 imagery is used for space-borne retrieval of crop traits such as leaf area index and above-ground biomass. Meteorological time series are queried from our data cube, providing daily variables such as temperature, precipitation, and global radiation. External geospatial data on soil moisture and physicochemical soil properties are obtained from the Copernicus Global Land Service and SoilGrids 2.0 data portals, respectively. Crop-specific ML models are trained on multi-annual data (2018 to 2022) collected at agricultural parcel level for three crops: winter wheat, winter barley, and winter rapeseed. The ensemble of ML regressors includes gradient-boosted trees (CatBoost, LightGBM, XGBoost), Partial Least Squares, Random Forest, and Support Vector Machines. Parcel geometries obtained from the Integrated Administration and Control System (IACS) enable the spatially scaled application of trained yield models, covering larger administrative regions, represented here by two federal states.
R² values of the best-performing models, inferred from cross-validation at parcel level, range from 0.67 to 0.74, with corresponding normalized RMSE (nRMSE) values of 12 to 19%. District-level aggregated yield estimates, compared against mean district yields from official yield statistics for 2020 and 2021, show R² values from 0.57 to 0.85 for the best-performing models, with corresponding nRMSE values of 5 to 10%.
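To make the reported metrics concrete, the following minimal Python sketch shows one way the coefficient of determination (R², RSQ) and nRMSE could be computed, and how parcel-level estimates might be aggregated to district means before comparison with official statistics. It is illustrative only and rests on assumptions not stated in the abstract: nRMSE is normalized here by the mean observed yield (normalizing by the range is another common convention), districts are aggregated as area-weighted means, and all function names and numbers are hypothetical.

```python
from collections import defaultdict
from math import sqrt


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def nrmse_percent(observed, predicted):
    """RMSE as a percentage of the mean observed value (assumed normalization)."""
    n = len(observed)
    rmse = sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return 100.0 * rmse / (sum(observed) / n)


def district_means(parcels):
    """Area-weighted mean yield per district (assumed aggregation rule).

    `parcels` is an iterable of (district, area_ha, yield_t_per_ha) tuples.
    """
    area = defaultdict(float)
    production = defaultdict(float)
    for district, area_ha, yield_t_ha in parcels:
        area[district] += area_ha
        production[district] += area_ha * yield_t_ha
    return {d: production[d] / area[d] for d in area}


# Made-up parcel estimates and official district means, for illustration only.
parcel_estimates = [
    ("A", 10.0, 8.0),
    ("A", 30.0, 6.0),
    ("B", 5.0, 7.0),
    ("B", 5.0, 9.0),
]
official = {"A": 7.0, "B": 8.0}

estimated = district_means(parcel_estimates)  # {"A": 6.5, "B": 8.0}
districts = sorted(official)
obs = [official[d] for d in districts]
est = [estimated[d] for d in districts]

print(r_squared(obs, est), nrmse_percent(obs, est))
```

In this sketch the official district means are treated as the reference values, so both metrics describe how closely the aggregated model estimates reproduce the official statistics.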
Preliminary results are promising, suggesting several advantages over traditional yield estimation approaches in area coverage, cost-effectiveness, and timeliness. Our cloud-integrated SDI serves as the backbone and enables full scalability of crop yield estimation to the national scale. However, high-quality training data from representative sampling across the country and open access to IACS parcel geometries are required to lift current scalability barriers.