2023-10-05, 10:40–10:45, Poster presentation
Earth System Models and Earth Observations are crucial for studying the Earth, providing scientific insight into fundamental dynamics and valuable predictions about Earth’s future. However, they generate huge amounts of data at different temporal and spatial scales, so accessing these data seamlessly and efficiently is essential for scientific analysis. Earth Science datasets are usually distributed as hundreds or thousands of files, which places a heavy management burden on users: they must set up computational and storage resources to access and retrieve the data, and write code to load and prepare the data into in-memory data structures for analysis.
In this talk, we describe in detail the architectural design, implementation and deployment of a data management and analytics system that facilitates cataloguing, accessing and processing Earth Science data. The system follows a cloud-native architecture based on containerized microservices, which simplifies its development, deployment and maintenance. It has been implemented by integrating different open source frameworks, tools and libraries, and has been deployed on the Kubernetes platform using related tools such as kubectl and kustomize.
The Data Platform consists of different components that will be introduced and described together with the technologies adopted: (a) the Catalog, based on Intake and MongoDB, for cataloguing and indexing the datasets published and managed in the system; (b) the Analytics Engine, based on the geokube and dask Python libraries, where geokube is used for specialised geospatial operations (such as extracting a bounding box or a multipolygon) on different types of geoscientific datasets and dask is used for parallel and distributed processing; (c) the Broker, implemented using the RabbitMQ framework, for managing user workload requests; and finally, (d) the REST APIs and the OGC standard interfaces (i.e., WPS) for accessing data and submitting analytics workflows.
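The flow between these components can be illustrated with a minimal, standard-library-only sketch. It is not the platform's implementation and does not use the Intake, geokube, dask or RabbitMQ APIs: the in-memory `CATALOG` dictionary, the `BBoxRequest` type, the `extract_bbox` and `worker` functions, and the `demo-temperature` dataset are all hypothetical names introduced here, and a `queue.Queue` stands in for the Broker.

```python
from dataclasses import dataclass
from queue import Queue


@dataclass(frozen=True)
class BBoxRequest:
    """Hypothetical user request: subset a dataset to a bounding box."""
    dataset: str
    north: float
    south: float
    west: float
    east: float


# Hypothetical stand-in for the Catalog: dataset name -> (lat, lon, value) points.
# Latitudes 30..50 in steps of 5, longitudes 0..30 in steps of 10.
CATALOG = {
    "demo-temperature": [
        (lat, lon, 10.0 + lat / 10 + lon / 100)
        for lat in range(30, 51, 5)
        for lon in range(0, 31, 10)
    ]
}


def extract_bbox(points, req):
    """Keep only the points that fall inside the requested bounding box."""
    return [
        (lat, lon, value)
        for lat, lon, value in points
        if req.south <= lat <= req.north and req.west <= lon <= req.east
    ]


def worker(broker, results):
    """Stand-in for the Analytics Engine: drain the queue and process requests."""
    while not broker.empty():
        req = broker.get()
        points = CATALOG[req.dataset]          # look the dataset up in the catalog
        results[req] = extract_bbox(points, req)
        broker.task_done()


# The API layer would enqueue requests; here we do it directly.
broker = Queue()  # in-process stand-in for the message broker
request = BBoxRequest("demo-temperature", north=45, south=35, west=0, east=20)
broker.put(request)

results = {}
worker(broker, results)
subset = results[request]
print(len(subset), "grid points inside the bounding box")
```

Placing a queue between the API layer and the processing code decouples request submission from execution, which is the role the abstract assigns to the Broker; in a real deployment the worker would run in a separate service rather than in the same process.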
An instance of the Data Platform has been deployed in production at the Euro-Mediterranean Centre on Climate Change (CMCC) for the delivery and analysis of data produced by the CMCC Research Divisions. In this talk, we will showcase different use cases, in sectors such as climate change and wildfire management, that demonstrate how the system has been used at CMCC, within different projects and initiatives, to build downstream products and services that need to access, analyse and process Earth Science data.