Schema Recognition and Integration in a Polystore System using Machine Learning Methods (Master Thesis, Finished)
Author
Description
Extending the polystore database system Polypheny with schema integration. Schema integration consists of schema recognition (recognizing the data types of fields), schema matching (quantifying the relationships between fields), and schema mapping (mapping fields to each other according to the relationships provided by schema matching). Multiple tools will be implemented for each of these steps. A cost optimization will decide which of these tools are used for a given schema integration task and in which order.

Additionally, new GUI elements for Polypheny are discussed. Users can manually adjust settings for schema integration, such as whether speed or thoroughness is favored. Users can also be asked to confirm whether a specific match is correct, which provides ground truth for the schema integration tools.

The schema integration results will be evaluated on quality and speed, and appropriate quality and speed metrics are discussed. Both synthetic and real-world data will be used for the evaluation. This allows comparison with existing tools and provides evidence of the approach's suitability for real-world data.
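The following is a minimal Python sketch of the three-stage pipeline (recognition, matching, mapping). It relies on heavy simplifying assumptions:

- Schema recognition is a rule-based type guess from sample string values.
- Schema matching combines name similarity (`difflib.SequenceMatcher`) with type equality, using an arbitrary 0.7/0.3 weighting.
- Schema mapping is a greedy one-to-one assignment above a hypothetical threshold.

The function names, type labels (`INTEGER`, `DOUBLE`, `VARCHAR`), weights, and threshold are illustrative only. This is neither Polypheny's API nor the machine-learning tools the thesis refers to, and it omits the cost optimization that selects and orders the tools.

```python
from difflib import SequenceMatcher


def recognize_type(values):
    """Schema recognition (hypothetical rules): guess a field's type from sample strings."""
    def parses(cast, v):
        try:
            cast(v)
            return True
        except ValueError:
            return False

    if all(parses(int, v) for v in values):
        return "INTEGER"
    if all(parses(float, v) for v in values):
        return "DOUBLE"
    return "VARCHAR"


def match_score(name_a, type_a, name_b, type_b):
    """Schema matching: score the relationship between two fields in [0, 1]."""
    name_sim = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    type_sim = 1.0 if type_a == type_b else 0.0
    # Hypothetical weighting: field names count more than recognized types.
    return 0.7 * name_sim + 0.3 * type_sim


def map_fields(source, target, threshold=0.6):
    """Schema mapping: greedily pair fields one-to-one, highest score first."""
    src_types = {name: recognize_type(vals) for name, vals in source.items()}
    tgt_types = {name: recognize_type(vals) for name, vals in target.items()}
    candidates = sorted(
        ((match_score(s, src_types[s], t, tgt_types[t]), s, t)
         for s in source for t in target),
        reverse=True,
    )
    mapping, used = {}, set()
    for score, s, t in candidates:
        # Accept a pair only if it clears the threshold and neither side is taken yet.
        if score >= threshold and s not in mapping and t not in used:
            mapping[s] = t
            used.add(t)
    return mapping


source = {"customer_id": ["1", "2"], "full_name": ["Ann", "Bob"], "price": ["9.5", "3"]}
target = {"CustomerID": ["7", "8"], "name": ["Eve", "Dan"], "cost": ["1.25", "2.0"]}
print(map_fields(source, target))
# expected: {'customer_id': 'CustomerID', 'full_name': 'name'}
```

In this toy example, `price` and `cost` are both recognized as `DOUBLE`, but their names are too dissimilar to clear the threshold. This is the kind of uncertain case where asking the user to confirm a match could supply ground truth.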
Start / End Dates
2022/05/09 - 2022/11/08