Open-Earth-Monitor Global Workshop 2026

Transformer-Based Adaptive Multimodal Fusion Model for Remote Sensing Large-scale Winter Wheat Yield Prediction
2026-10-07, 18:25–18:30 (Europe/Amsterdam), Aula Magna

Large-scale and highly accurate wheat yield prediction is of great importance for
ensuring food security, supporting agricultural policymaking, and guiding grain
allocation. In recent years, the rapid development of remote sensing technologies and
deep learning algorithms has provided powerful tools for large-scale crop yield
prediction. However, crop yield is jointly influenced by multiple environmental factors,
such as climate, soil, and topography. Existing studies often adopt simple feature
concatenation or fixed-weight fusion strategies, lacking adaptive modeling of relative
modality importance, which limits further improvement in prediction accuracy. To
address this issue, this study proposes a Transformer-based multi-modal adaptive Gated
Fusion model (TMMGF). The model employs Transformers to model dynamic time
series of remote sensing spectral data and climate variables, and applies multilayer
perceptrons (MLPs) to handle static environmental factors including soil and topography.
Multiple modalities are then integrated through a gated fusion mechanism to achieve
adaptive weighted fusion. This study was conducted across the conterminous United
States, based on county-level winter wheat yield records from 2008 to 2023. The
TMMGF was systematically compared with an LSTM-based multimodal adaptive
Gated Fusion model (MMGF), a Transformer single-modal remote sensing model,
a Transformer single-modal climate model, an MLP single-modal soil model, and an MLP
single-modal topography model. The results show that TMMGF achieves the best
performance, with an average R² of 0.813, RMSE of 0.571 t/ha, and MAPE of 14.49%
in 10-fold cross-validation, significantly outperforming the baseline models. In
particular, compared with the LSTM-based multimodal model MMGF (R² = 0.796,
RMSE = 0.598 t/ha, MAPE = 15.11%), TMMGF shows clear advantages in both
accuracy and stability. This study demonstrates that a Transformer-based adaptive
multimodal fusion framework can effectively integrate heterogeneous data sources and
provides a promising technical pathway for high-accuracy large-scale wheat yield
prediction.
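
To make the idea of adaptive weighted fusion concrete, the following is a minimal sketch of one common form of gated fusion, not the TMMGF implementation itself. It assumes that each modality has already been encoded into an embedding of the same length (by a Transformer or an MLP in the model described above), that the gate is a single linear score per modality, and that the scores are normalized with a softmax. The abstract does not specify the gate's form, so the function name gated_fusion, the gate parameters, the embedding size, and the random inputs are all hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(embeddings, gate_weights, gate_biases):
    """Fuse per-modality embeddings with sample-specific weights.

    embeddings:   dict of modality name -> list of floats (same length for all)
    gate_weights: dict of modality name -> list of floats (hypothetical gate)
    gate_biases:  dict of modality name -> float (hypothetical gate)
    Returns the fused embedding and the weight assigned to each modality.
    """
    names = list(embeddings)
    # One scalar relevance score per modality, computed from its own embedding.
    scores = [
        sum(w * x for w, x in zip(gate_weights[n], embeddings[n])) + gate_biases[n]
        for n in names
    ]
    # Softmax turns the scores into weights that are positive and sum to 1.
    weights = softmax(scores)
    dim = len(embeddings[names[0]])
    # The fused vector is the weighted sum of the modality embeddings.
    fused = [
        sum(weights[i] * embeddings[n][k] for i, n in enumerate(names))
        for k in range(dim)
    ]
    return fused, dict(zip(names, weights))


# Illustrative inputs only: random stand-ins for the encoder outputs.
random.seed(0)
DIM = 4  # assumed embedding size
MODALITIES = ["remote_sensing", "climate", "soil", "topography"]
embeddings = {n: [random.gauss(0, 1) for _ in range(DIM)] for n in MODALITIES}
gate_weights = {n: [random.gauss(0, 0.5) for _ in range(DIM)] for n in MODALITIES}
gate_biases = {n: 0.0 for n in MODALITIES}

fused, weights = gated_fusion(embeddings, gate_weights, gate_biases)
print(weights)
```

Because the weights depend on the embeddings themselves, they can differ from one county and year to the next, which is what separates this approach from concatenation or fixed-weight fusion. In a trained model the gate parameters would be learned jointly with the encoders rather than drawn at random.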


This research presents an adaptive multimodal deep learning model for more accurate large-scale wheat yield prediction. Most studies either use a single type of data for yield prediction or simply combine several types of data. This study uses adaptive fusion within a multimodal model to better integrate the different datasets and further improve the performance of the prediction model.


What are your current associations with EU Horizon projects (if any)?

I am a PhD student at the University of Barcelona. My research focuses on applying deep learning algorithms and remote sensing data to crop yield prediction. I look forward to productive exchanges with the other scholars at the workshop.