2026-10-07, 15:15–15:30 (Europe/Amsterdam), Rooms 12+14
Accurate characterization of tropical forest vertical structure is critical for carbon accounting and ecosystem monitoring, yet most machine-learning pipelines reduce GEDI's rich waveform information to a single scalar, typically canopy height or a high relative-height percentile. This simplification discards the ordered height distribution that GEDI encodes across its full relative height (RH) profile, and that its own biomass algorithms depend on. We introduce Biomazon, an open, ML-ready multimodal benchmark dataset at 20 m resolution over the Amazon Basin, designed to support joint prediction of the full GEDI RH profile (RH0 to RH100) together with above-ground biomass density (AGBD). The dataset pairs GEDI-derived targets with multi-sensor predictors including Sentinel-1, Sentinel-2, ALOS-2 PALSAR-2, Copernicus DEM, Dynamic World land cover, and geospatial foundation model embeddings, all co-registered on a common grid with standardized spatial splits and evaluation protocols to enable reproducible comparison of methods. We formulate RH prediction as structured output learning with a monotonicity constraint that enforces physical consistency across percentiles, and we provide baseline results from systematic ablations over model scale, sensor contributions, and the role of AlphaEarth embeddings, both as standalone predictors and in fusion with raw modalities. Results are contextualized against existing gridded products to assess practical relevance. Biomazon addresses a gap in current benchmarking by shifting the task formulation from scalar regression toward structure-aware modeling, and by providing the community with an open, multi-sensor dataset and protocol for investigating when and how different data sources, including learned representations, contribute to forest structure and biomass retrieval in tropical forests.
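As an illustration of the monotonicity constraint mentioned above, one common way to guarantee a non-decreasing set of percentiles is to predict a base value plus strictly positive increments and accumulate them. The NumPy sketch below is a minimal example under that assumption; it is not necessarily the formulation used in Biomazon, and the function names (`softplus`, `monotone_rh_profile`) and the five-element toy input are hypothetical.

```python
import numpy as np


def softplus(x):
    # Numerically stable log(1 + exp(x)); strictly positive for finite x.
    return np.logaddexp(0.0, x)


def monotone_rh_profile(raw):
    """Map unconstrained model outputs of shape (..., K) to a
    non-decreasing profile of the same shape.

    raw[..., 0]  is used directly as the lowest percentile (left unconstrained).
    raw[..., 1:] are passed through softplus to obtain positive increments,
    which are accumulated on top of the base value.
    """
    raw = np.asarray(raw, dtype=float)
    base = raw[..., :1]
    increments = softplus(raw[..., 1:])
    higher = base + np.cumsum(increments, axis=-1)
    return np.concatenate([base, higher], axis=-1)


# Toy example: one sample with five percentiles instead of the full RH0 to RH100 range.
raw = np.array([[-1.5, 0.3, -2.0, 1.0, 0.0]])
profile = monotone_rh_profile(raw)
print(profile)
```

Because every increment is positive, any loss computed on `profile` is evaluated on an ordered set of heights by construction, rather than relying on a penalty term to discourage crossing percentiles.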
This talk presents Biomazon, an open multimodal benchmark dataset for predicting forest vertical structure and biomass in the Amazon Basin from multi-sensor Earth Observation data. The dataset and protocols will be publicly released, supporting reproducible comparison of methods for full GEDI relative height profile and AGBD prediction.
The talk is relevant to researchers and practitioners working on forest monitoring pipelines, foundation model evaluation, and carbon-relevant mapping. We will show controlled comparisons between raw sensor inputs (Sentinel-1/2, ALOS-2 PALSAR-2) and AlphaEarth embeddings under identical training conditions, providing evidence on when and how learned representations help. We will further contextualize Biomazon baseline performance through regionally aligned comparisons with existing gridded products, including GEDI L4D, Wagner et al., Lang et al., Potapov et al., Tolan et al., and ESA CCI AGBD. Finally, we will discuss why treating vertical structure as a structured prediction target, rather than a scalar canopy height proxy, matters for biomass estimation and carbon accounting at scale.
All data, code, and evaluation protocols are designed to be open and reusable.
Other
Sayan Mandal received his B.Tech. in Computer Science from the University of Petroleum and Energy Studies, India, in 2017 and his M.Sc. in Computer Science (major: Machine Learning, minor: Visual Computing), with distinction, from Technische Universität Graz, Austria, in 2024. Before starting his master's studies, he worked in computer vision for over four years at two leading startups in India. For his M.Sc. thesis, he worked as a Student Project Assistant on the FutureWoods project at ICG, TU Graz, Austria, funded by FFG (Austrian Research Promotion Agency) and the Vienna Scientific Cluster supercomputer. He is currently pursuing a Ph.D. in Electrical and Computer Engineering at the University of Iceland in conjunction with the “AI and ML for Remote Sensing” Simulation and Data Lab, JSC, Forschungszentrum Jülich, Germany. His main research interests include developing robust deep learning models for remote sensing applications, foundation models, and AI efficiency on HPC systems.