Conceptualization and prototyping of a multi-vendor, multi-format data lake for network analytics (Bachelor Thesis, Finished)
Author
Description
The goal of this project is to develop and test a concept for how data gathered from different network devices can be structured and normalized into a data lake, and how this data can then be used for network analytics.
The input data can consist of unstructured data retrieved through screen scraping, semi-structured data, and structured data retrieved, for example, through a REST API. The data types include metrics, logs, configuration data, and operational state data.
Several technologies and data schemas are potentially suitable for structuring, storing, querying, and presenting or visualizing heterogeneous data. At the beginning of the project, the student is expected to evaluate and compare different options and select the most suitable one(s). The project consists of two parts: (1) concept and evaluation of different data schemas and tools, and (2) implementation of a prototype using the previously selected data structure(s) and tool(s). The context will be provided by the data and GUI from Narrowin's network explorer. In the concept and evaluation, the student will investigate:
- different data formats regarding their interoperability with industry standards
- how the storage format influences the possibilities for flexible queries, aggregations, and cross-vendor comparisons
- the pros and cons of schema on read vs. schema on write (normalizing to e.g. YANG, Elastic Common Schema, or a custom format) within the context at hand
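The difference between schema on write and schema on read can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the vendor names, the raw field names, the common schema (`interface`, `rx_bytes`), and the helper functions are hypothetical and do not come from the thesis or from Narrowin's tooling.

```python
import json

# Hypothetical raw records from two vendors; all field names are invented.
RAW_RECORDS = [
    {"vendor": "vendor_a", "payload": '{"ifName": "ether1", "rx-bytes": 1200}'},
    {"vendor": "vendor_b", "payload": '{"interface": "Gi0/1", "in_octets": 3400}'},
]

# Per-vendor mapping from raw field names to a hypothetical common schema.
FIELD_MAP = {
    "vendor_a": {"ifName": "interface", "rx-bytes": "rx_bytes"},
    "vendor_b": {"interface": "interface", "in_octets": "rx_bytes"},
}


def normalize(record):
    """Map one raw record onto the common schema."""
    raw = json.loads(record["payload"])
    mapping = FIELD_MAP[record["vendor"]]
    normalized = {common: raw[source] for source, common in mapping.items()}
    normalized["vendor"] = record["vendor"]
    return normalized


def ingest_schema_on_write(records):
    """Schema on write: normalize at ingestion, store only the normalized rows."""
    return [normalize(r) for r in records]


def ingest_schema_on_read(records):
    """Schema on read: store the raw records unchanged."""
    return list(records)


def total_rx_bytes_on_write(store):
    """Query against already normalized rows."""
    return sum(row["rx_bytes"] for row in store)


def total_rx_bytes_on_read(store):
    """Query against raw records; the mapping is applied at query time."""
    return sum(normalize(r)["rx_bytes"] for r in store)


if __name__ == "__main__":
    print(total_rx_bytes_on_write(ingest_schema_on_write(RAW_RECORDS)))
    print(total_rx_bytes_on_read(ingest_schema_on_read(RAW_RECORDS)))
```

Both queries return the same total, but the trade-off differs: with schema on write, a change to the mapping requires re-ingesting the data, whereas with schema on read the raw data stays available and only the query-time mapping changes, at the cost of repeating the normalization work on every query.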
The thesis is done in cooperation with narrowin gmbh, Liestal.
The main focus of the project should be on flexibility and fast iteration. Therefore, lock-in effects should be avoided (no strict or tight dependencies on a storage format or software). It is also important that fast prototyping is possible (sophisticated analytics and visual exploration of ingested data early on). Additionally, the data querying capabilities should be highly flexible, since at this point the questions to ask and the analyses to perform are not fully known and will evolve. Finally, the data should be easy to transform so that it can be fed, for example, into the GUI.
The student should evaluate their conceptual framework and the selected data structure(s) and tool(s) using the prototype and derive recommendations for further research and development. Depending on the student's progress, the project can be extended by looking into one or more of the following topics:
- Feeding data from the prototype into the GUI
- Identifying first use cases for analytics based on the prototype
- Evaluating options for data enrichment
Start / End Dates
2022/10/14 - 2023/02/13