PAN - A P2P Approach for Scalable Complex Event Detection in Distributed Data Streams (Master Thesis, Finished)
In the last decade, the number of data streams and the volume of streamed data have increased enormously. With this trend, the importance of detecting complex events in data streams in real time has increased as well. Solving this problem is important for many economic as well as entertainment (e.g., sports analysis) use cases.
In this thesis, we present PAN (P2P Analysis Network). PAN is a generic real-time complex event detection system which is able to analyze multiple distributed input data streams and handle several client requests.
In order to be scalable, PAN distributes its workload across several workers hosted on peers in a P2P network, which are combined into a workflow. This general idea is not novel; it is used by many distributed complex event processing (CEP) systems. However, PAN connects the workers with a pull-based publish/subscribe approach instead of the common push-based one, and thereby inverts the direction in which the workflow is defined. This fundamental difference allows the workflow to be extended dynamically at runtime without changing the existing workflow. As a consequence, PAN is able to handle clients as sinks of a workflow and to balance the load across multiple publishers. This makes PAN scalable not only in terms of data but also with respect to the number of client requests.
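The pull-based idea can be illustrated with a minimal sketch. It is not PAN's actual implementation: the `Worker` class, the `subscribe_to` method, and the "fewest subscribers" balancing rule are assumptions made only for this example. The point it shows is that the consumer chooses its publisher, so a new client can attach as a sink at runtime while the existing workers stay unchanged.

```python
from typing import Callable, List


class Worker:
    """Hypothetical worker: transforms events and serves its subscribers."""

    def __init__(self, name: str, transform: Callable[[int], int]):
        self.name = name
        self.transform = transform
        self.subscribers: List["Worker"] = []
        self.received: List[int] = []

    def subscribe_to(self, replicas: List["Worker"]) -> "Worker":
        # Pull-based: the consumer selects its publisher. The rule used here
        # (fewest current subscribers) is an assumed balancing strategy.
        publisher = min(replicas, key=lambda w: len(w.subscribers))
        publisher.subscribers.append(self)
        return publisher

    def on_event(self, event: int) -> None:
        # Process the event, record the result, and forward it downstream.
        result = self.transform(event)
        self.received.append(result)
        for subscriber in self.subscribers:
            subscriber.on_event(result)


def identity(x: int) -> int:
    return x


def double(x: int) -> int:
    return 2 * x


# One source stream and two replicated workers that both pull from it.
source = Worker("source", identity)
replica_a = Worker("replica_a", double)
replica_b = Worker("replica_b", double)
replica_a.subscribe_to([source])
replica_b.subscribe_to([source])

# Two clients attach as sinks and are spread over the replicas.
client_1 = Worker("client_1", identity)
client_2 = Worker("client_2", identity)
publisher_1 = client_1.subscribe_to([replica_a, replica_b])
publisher_2 = client_2.subscribe_to([replica_a, replica_b])

source.on_event(3)

# A third client joins at runtime; no existing worker is reconfigured.
client_3 = Worker("client_3", identity)
client_3.subscribe_to([replica_a, replica_b])

source.on_event(5)
```

In this sketch the workflow is defined from the sinks towards the sources: each new subscriber names the publishers it wants to pull from, and the upstream workers only keep a list of whoever has subscribed. A push-based design would instead require each publisher to be told about its consumers in advance.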
Evaluations based on an extended version of the ACM DEBS 2013 Grand Challenge scenario confirm that the PAN approach works well, i.e., that it is possible to combine the workers of a real-time complex event detection system into a workflow by means of a pull-based publish/subscribe system.
Start / End Dates
2014/02/17 - 2014/08/16