YieldSAT: A Multimodal Benchmark Dataset for High-Resolution Crop Yield Prediction
Miro Miranda ⋅ Deepak Pathak ⋅ Patrick Helber ⋅ Benjamin Bischke ⋅ Hiba Najjar ⋅ Francisco Mena ⋅ Cristhian Sanchez ⋅ Akshay Pai ⋅ Diego Arenas ⋅ Matias Valdenegro ⋅ Marcela Charfuelan ⋅ Marlon Nuske ⋅ Andreas Dengel
Abstract
Crop yield prediction requires substantial data to train data-driven models. However, creating yield prediction datasets is constrained by high acquisition costs, heterogeneous data quality, and data privacy regulations. Consequently, existing datasets are scarce, low in quality, or limited to a single region or crop type. In this work, we release YieldSAT, a large, high-quality, multimodal dataset for high-resolution crop yield prediction. YieldSAT spans multiple climate zones across two continents and several countries, including Argentina, Brazil, Uruguay, and Germany. The dataset was collected between 2016 and 2024 and comprises four major crop types—corn, rapeseed, soybeans, and wheat—across 2,176 curated fields. In total, over 12.2 million yield samples are available, each with a spatial resolution of 10 m. Each field is paired with multispectral satellite imagery, resulting in 113,630 labeled satellite images, complemented by auxiliary environmental data. We demonstrate the potential of large-scale, high-resolution crop yield prediction as an image regression task by comparing various deep learning models and data fusion architectures. Furthermore, we highlight open challenges arising from severe distribution shifts in the ground truth data. To mitigate these shifts, we explore a domain-informed Deep Ensemble that exhibits greater diversity in the weight space.