Data Pipelines in a PolyDBMS (Master Thesis, Ongoing)
Author
Description
In our interconnected society, the growing demand for ever more data, delivered as quickly as possible, has led to increasingly diverse and distributed data management landscapes that span a wide range of applications and services. To extract meaningful information from such a landscape, centralizing data from a variety of external systems has become a topic of great interest.
However, ingesting data into such systems often involves substantial manual effort and depends heavily on both the source and the target systems. Additionally, any significant change to the data structure exposed by a source system requires manual handling to avoid ending up with incorrect or missing data.
To address these challenges, data pipelines have become a popular solution. They automate the transfer and transformation of data between systems by providing a set of predefined operators and the means to easily build use-case-specific pipelines. However, these pipelines rely heavily on the specific characteristics and data models of both the source and the destination systems, which often makes additional data transformations necessary. This reliance can obscure valuable information, because transforming data into a different data model can lose important details.
With the introduction of fully-fledged multi-model database systems, called PolyDBMSs, the rigid constraint of a single, destination-specific data model has been relaxed, significantly reducing the system-specific requirements for data pipelines. This flexibility opens up new possibilities for improving and simplifying the data integration process.
This Master’s thesis will examine current trends and methodologies in data pipelines and propose novel approaches for modeling and integrating a data pipeline framework within a PolyDBMS. The focus will be on extending and adapting current research in the field and on leveraging the unique capabilities of a PolyDBMS. The resulting functionality will be implemented within the PolyDBMS Polypheny to demonstrate and evaluate its effectiveness.
Start / End Dates
2024/11/04 - 2025/05/04