Dominik Weckmüller is a data scientist at the Knowledge Centre on Earth Observation of the European Commission.
He develops innovative solutions for large-scale data analysis, using advanced natural language processing (NLP) and machine learning methods to support EU policy-making.
The Knowledge Centre on Earth Observation grounds its activities in sound knowledge management practices and cutting-edge NLP technologies. It aims to create a common scaffolding that connects research projects with policy needs.
The User Requirement Database (URDB) stores and validates the requirements of core Copernicus users for Earth Observation (EO) products and applications. It supports automated gap analysis and screening across diverse data spaces to find the best-matching existing solution, searching first the Copernicus Services product catalogues and then research results from the EU Horizon programme. The Text Mining Application (TMA) uses Transformer-based machine learning models for precise semantic document retrieval within an EO-specific subset of research outcomes funded by the European Union's research and innovation framework programmes. These programmes, and the corresponding EO project data, range from FP1 in 1984 to the current programme, Horizon Europe. The TMA's primary objective is to give users rapid access to research findings for highly specific queries, while also offering a user-friendly database, an internal microservice exposed as an API, and a graphical user interface (GUI) with more advanced metrics and visualizations. In the future, the URDB and TMA will be closely interlinked and integrated, giving users of either platform rapid access to the actual Copernicus datasets, as well as richer metadata metrics and insight into research outcomes.
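Semantic retrieval of this kind typically works by encoding queries and documents as dense vectors with a Transformer sentence encoder and ranking documents by cosine similarity. The sketch below shows only the ranking step. The three-dimensional vectors and project identifiers are made-up placeholders standing in for real encoder outputs, and `rank_documents` is a hypothetical helper, not part of the TMA.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_documents(query_vec, doc_vecs, top_k=2):
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical placeholder embeddings; a real system would obtain these
# by applying a Transformer encoder to project titles and abstracts.
documents = {
    "sea-ice-monitoring": [0.9, 0.1, 0.0],
    "urban-heat-islands": [0.1, 0.8, 0.3],
    "crop-yield-forecasting": [0.0, 0.3, 0.9],
}
query = [0.8, 0.2, 0.1]  # stands in for an embedded query about polar ice

for doc_id, score in rank_documents(query, documents):
    print(f"{doc_id}: {score:.3f}")
```

This exhaustive loop scores every document; a vector database such as Qdrant performs the same nearest-neighbour search at scale using an approximate index (HNSW) instead.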
In addition to supporting gap and fit-for-purpose analyses, the main purpose of the URDB is to enable requirement traceability across the components of the EO value chain, from policy needs to observations, thereby supporting and tracking the evolution of the Copernicus Programme. URDB records are technology-agnostic quantitative requirements, expressed as verifiable, unambiguous and actionable technical specifications (horizontal resolution, measurement uncertainty, tasking time, etc.). The URDB data model builds on experience with existing requirement databases from the Copernicus Core Services and international partners (e.g., USGS and NASA). A core design principle of both the URDB and the TMA is semantic interoperability: entities, relationships and attributes are clearly defined in a terminology and, where applicable, follow international standards (ISO, OGC) as well as recommendations and best practices (CEOS, GEO).
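Because each requirement is a quantitative, technology-agnostic threshold, a basic fit-for-purpose check reduces to comparing a product's specifications against each threshold. The sketch below assumes a hypothetical record layout (parameter name, threshold, unit, and whether lower values are better); the field names and figures are illustrative and do not reflect the actual URDB data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    parameter: str                # e.g. "horizontal_resolution" (hypothetical name)
    threshold: float              # value the product must meet
    unit: str
    lower_is_better: bool = True  # lower is better for resolution, uncertainty, tasking time


def find_gaps(requirements, product_spec):
    """Return the requirements that a product's specification does not satisfy.

    A parameter missing from the specification is treated as a gap.
    """
    gaps = []
    for req in requirements:
        value = product_spec.get(req.parameter)
        if value is None:
            gaps.append(req)
        elif req.lower_is_better and value > req.threshold:
            gaps.append(req)
        elif not req.lower_is_better and value < req.threshold:
            gaps.append(req)
    return gaps


# Illustrative requirement set and candidate product (made-up numbers).
requirements = [
    Requirement("horizontal_resolution", 10.0, "m"),
    Requirement("measurement_uncertainty", 0.5, "K"),
    Requirement("tasking_time", 24.0, "h"),
]
product = {"horizontal_resolution": 10.0, "measurement_uncertainty": 0.8}

for gap in find_gaps(requirements, product):
    print(f"gap: {gap.parameter} (threshold {gap.threshold} {gap.unit})")
```

Keeping thresholds numeric and unit-explicit is what makes the comparison automatic; a production system would also need unit conversion, which this sketch omits.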
From a technical perspective, both the URDB and the TMA are self-hosted, open-source databases with a GUI and an application layer for querying and analysis. While the URDB is based on PostgreSQL, the TMA uses Qdrant, a recent open-source vector database for similarity search over embeddings, which fulfils the TMA's specific AI requirements and offers a user-friendly API.
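On the relational side, requirement traceability becomes a matter of joins. The sketch below uses Python's built-in `sqlite3` module as a lightweight stand-in for PostgreSQL; the three-table schema (`policy_need`, `requirement`, `observation`), its contents and the `trace_observation` helper are hypothetical and only illustrate how an observation can be retraced to the requirement and policy need it serves.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE policy_need (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE requirement (
    id INTEGER PRIMARY KEY,
    policy_need_id INTEGER NOT NULL REFERENCES policy_need(id),
    parameter TEXT NOT NULL,
    threshold REAL NOT NULL,
    unit TEXT NOT NULL
);
CREATE TABLE observation (
    id INTEGER PRIMARY KEY,
    requirement_id INTEGER NOT NULL REFERENCES requirement(id),
    name TEXT NOT NULL
);
INSERT INTO policy_need VALUES (1, 'Monitor coastal erosion');
INSERT INTO requirement VALUES (1, 1, 'horizontal_resolution', 10.0, 'm');
INSERT INTO observation VALUES (1, 1, 'Optical imagery of coastline');
""")


def trace_observation(conn, observation_name):
    """Follow an observation back through its requirement to the policy need."""
    return conn.execute(
        """
        SELECT o.name, r.parameter, r.threshold, r.unit, p.title
        FROM observation AS o
        JOIN requirement AS r ON r.id = o.requirement_id
        JOIN policy_need AS p ON p.id = r.policy_need_id
        WHERE o.name = ?
        """,
        (observation_name,),
    ).fetchall()


print(trace_observation(conn, "Optical imagery of coastline"))
```

With a PostgreSQL driver such as psycopg, the same query works apart from the parameter placeholder (`%s` instead of `?`).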
The vision for both databases is to link them to online data catalogues through web APIs and to integrate them tightly into the existing Copernicus (meta)data landscape.