Sajed
Sajed Sarabandi is a software engineer and junior researcher specializing in remote sensing at the OpenGeoHub Foundation. He holds a Master’s degree in Computer Science from Leiden University.
Sessions
Time series and spatial modeling are commonly used to generate cloud- and gap-free satellite imagery. Most existing approaches reconstruct the entire dataset using advanced models, which requires substantial computational resources and time. In this study, we introduce a new, computationally efficient pipeline to reconstruct monthly Landsat data without gaps or clouds. The pipeline fills gaps in four successive steps. In the first step, we apply a clean mask to biweekly Landsat data and create a 7-image weighted window spanning the current and neighbouring months. For each band and month across the 28-year period, we generate 25th and 75th percentile thresholds and calculate a weighted median, giving 50% weight to the current month and 25% to neighbouring months, using only values within the 25th–75th percentiles. In the second step, remaining gaps are filled using an annual land cover classification derived from the GLAD dataset and Landsat data from up to ten previous years, restricted to pixels in the same land cover class. The third step fills small gaps of up to 2×2 pixels using a 4×4 averaging kernel. These three steps fill approximately 40–60% of land pixels depending on tile location. Finally, a pretrained temporal model is applied to fill the remaining gaps. We tested this pipeline on a CPU server with 96 threads and 1 TB RAM. Each tile can be processed in under 2000 seconds. Parallelization across tiles and bands enables global processing in under six weeks, significantly reducing the computational time compared to full dataset reconstruction, which would take approximately six months. The resulting dataset provides clean, gap- and cloud-free monthly Landsat imagery suitable for a variety of research applications. Limitations remain, mostly related to input/output operations, and future work could apply embedding models to reduce dataset size and produce abstract representations for faster access.
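The first gap-filling step can be illustrated for a single pixel and band. This is only a minimal sketch, not the pipeline's actual code: the function names, the 2/3/2 split of the seven images across the previous, current, and next month, and the way each month's weight is divided among its images are all assumptions made for the example. Masked observations are represented as NaN.

```python
import numpy as np

def weighted_median(values, weights):
    """Weighted median of 1-D values, ignoring NaNs; NaN if nothing is valid."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    valid = ~np.isnan(values)
    values, weights = values[valid], weights[valid]
    if values.size == 0 or weights.sum() <= 0:
        return np.nan
    order = np.argsort(values)
    values, weights = values[order], weights[order]
    cum = np.cumsum(weights)
    # First sorted value whose cumulative weight reaches half of the total.
    idx = np.searchsorted(cum, 0.5 * cum[-1])
    return values[idx]

def fill_pixel(window, weights, p25, p75):
    """Weighted median over the window, using only values within [p25, p75]."""
    window = np.asarray(window, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # NaN comparisons evaluate to False, so masked observations are dropped too.
    keep = (window >= p25) & (window <= p75)
    return weighted_median(window[keep], weights[keep])

# Assumed layout: 2 images from the previous month, 3 from the current month,
# 2 from the next month. Each month's share (25% / 50% / 25%) is split evenly
# among its images.
WEIGHTS = np.array([0.125, 0.125, 1 / 6, 1 / 6, 1 / 6, 0.125, 0.125])

# Hypothetical reflectance values: one masked observation and one outlier.
window = [100.0, np.nan, 120.0, 130.0, 5000.0, 110.0, 115.0]
filled = fill_pixel(window, WEIGHTS, p25=100.0, p75=130.0)
```

In this sketch the percentile thresholds are passed in as plain numbers; in the pipeline described above they are computed per band and month over the full period. A pixel with no usable observation in the window stays NaN and is left for the later steps.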
One of the most significant deliverables of the OEMC project is a global, cloud-free Landsat monthly time series for 2000–2025 at 30 m resolution. The Landsat global mosaics (V1) are explained in detail in Consoli et al. (2025; https://peerj.com/articles/18585/). Landsat V2 is an order of magnitude more ambitious, aiming at monthly products in 16-bit format with significantly fewer artifacts. The pipeline uses a four-step process for improved quality, including gap filling using spatial and temporal neighbours, data fusion, and final gap filling using global models. Cross-validation results show improvements in accuracy and consistency. Major project challenges include the need for 1 PB of storage and securing commercial services after 2025. Landsat V2 can also be used to derive embeddings for 2000–2025.