Haoran Meng Open-Earth-Monitor Global Workshop 2026

Haoran Meng

I am a PhD student at the University of Barcelona. My research focuses on applying deep learning algorithms and remote sensing data to crop yield prediction. I look forward to exchanging ideas with the excellent scholars at this workshop.

Do you accept that a video-recording of your talk is published under CC-BY license via https://av.tib.eu? – yes

Sessions

10-07

18:25

5min

Transformer-Based Adaptive Multimodal Fusion Model for Remote Sensing Large-scale Winter Wheat Yield Prediction

Haoran Meng

Large-scale and highly accurate wheat yield prediction is of great importance for
ensuring food security, supporting agricultural policymaking, and guiding grain
allocation. In recent years, the rapid development of remote sensing technologies and
deep learning algorithms has provided powerful tools for large-scale crop yield
prediction. However, crop yield is jointly influenced by multiple environmental factors,
such as climate, soil, and topography. Existing studies often adopt simple feature
concatenation or fixed-weight fusion strategies, lacking adaptive modeling of relative
modality importance, which limits further improvement in prediction accuracy. To
address this issue, this study proposes a Transformer-based multimodal adaptive Gated
Fusion model (TMMGF). The model uses Transformers to model the dynamic time
series of remote sensing spectral data and climate variables, and applies multilayer
perceptrons (MLPs) to static environmental factors, including soil and topography.
The modalities are then integrated through a gated fusion mechanism to achieve
adaptive weighted fusion. This study was conducted across the conterminous United
States, based on county-level winter wheat yield records from 2008 to 2023. The
TMMGF was systematically compared with an LSTM-based multimodal adaptive
Gated Fusion model (MMGF), a Transformer single-modal remote sensing model,
a Transformer single-modal climate model, an MLP single-modal soil model, and an MLP
single-modal topography model. The results show that TMMGF achieves the best
performance, with an average R² of 0.813, RMSE of 0.571 t/ha, and MAPE of 14.49%
in 10-fold cross-validation, significantly outperforming the baseline models. In
particular, compared with the LSTM-based multimodal model MMGF (R² = 0.796,
RMSE = 0.598 t/ha, MAPE = 15.11%), TMMGF shows clear advantages in both
accuracy and stability. This study demonstrates that a Transformer-based adaptive
multimodal fusion framework can effectively integrate heterogeneous data sources and
provides a promising technical pathway for high-accuracy large-scale wheat yield
prediction.
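
The gated fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the gate's exact form, so everything here is an assumption made for illustration: each modality is taken to be already encoded into an embedding of the same length, a hypothetical linear gate scores each embedding, a softmax turns the scores into per-modality weights, and the fused vector is the weighted sum. The names `gated_fusion`, `gate_weights`, and `gate_biases` are illustrative and do not come from the TMMGF implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def gated_fusion(embeddings, gate_weights, gate_biases):
    """Fuse per-modality embeddings with input-dependent weights.

    embeddings:   dict mapping modality name -> list of floats (same length each)
    gate_weights: dict mapping modality name -> list of floats (assumed linear gate)
    gate_biases:  dict mapping modality name -> float
    Returns (fused_vector, weights_by_modality).
    """
    names = list(embeddings)
    # One scalar gate score per modality: w_m . h_m + b_m (assumed form).
    scores = [
        sum(w * x for w, x in zip(gate_weights[n], embeddings[n])) + gate_biases[n]
        for n in names
    ]
    # Softmax makes the modality weights positive and sum to 1.
    alphas = softmax(scores)
    dim = len(embeddings[names[0]])
    # Weighted sum of the modality embeddings, one coordinate at a time.
    fused = [
        sum(a * embeddings[n][i] for a, n in zip(alphas, names))
        for i in range(dim)
    ]
    return fused, dict(zip(names, alphas))


# Toy inputs standing in for encoder outputs (random, not real data).
rng = random.Random(0)
dim = 4
modalities = ["remote_sensing", "climate", "soil", "topography"]
embeddings = {n: [rng.uniform(-1, 1) for _ in range(dim)] for n in modalities}
gate_weights = {n: [rng.uniform(-1, 1) for _ in range(dim)] for n in modalities}
gate_biases = {n: 0.0 for n in modalities}

fused, weights = gated_fusion(embeddings, gate_weights, gate_biases)
print(weights)
```

Because the weights depend on the embeddings themselves, each sample (here, each county-year) can weight the modalities differently, which is the contrast with simple concatenation or fixed-weight fusion drawn in the abstract. In a trained model the gate parameters would be learned jointly with the encoders rather than drawn at random.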

Soil, water and agriculture

Aula Magna
