Pull-based Real-Time Complex Event Detection in Multiple Data Streams - the PAN Approach
With the proliferation of embedded sensors, the number of data streams and the volume of streamed data have increased enormously. This has strongly influenced both our business and our private lives and has given rise to a wide variety of monitoring applications in different domains. In all these applications, analyzing data streams in real time is essential. One of the main challenges in data stream analysis is the detection of complex events from raw streaming data. In this report, we present PAN, a generic middleware for distributed real-time complex event detection (CED) that is able to analyze multiple distributed data streams. In PAN, CED applications are defined as workflows and are executed in a distributed way by dedicated workers in a P2P network. These workers communicate via pull-based publish/subscribe. This allows analysis workflows to be extended dynamically at run-time and the load to be balanced between workers. As a consequence, PAN is scalable not only in terms of the number of data streams and the number and complexity of the analyses, but also in terms of the number of clients that retrieve the analysis results. Evaluations based on an extended version of the ACM DEBS 2013 Grand Challenge scenario show the effectiveness and efficiency of PAN. This technical report is an extended version of our previous work [19], which was published by Springer International Publishing in the Proceedings of the 16th IFIP WG 6.1 International Conference on Distributed Applications and Interoperable Systems (DAIS 2016). It adds details on the inner workings of PAN, especially on the pull-based publish/subscribe interaction between workers.
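To make the idea of pull-based publish/subscribe more concrete, the following is a minimal, generic sketch. It is not PAN's implementation. It assumes a single in-process topic that keeps an append-only event log and tracks one read offset per subscription group. The names `Topic`, `publish`, `subscribe`, `pull`, and the group names are hypothetical. The sketch shows the two properties claimed above. First, a new consumer can be attached at run-time without changing the publisher. Second, workers that share a group's offset split the work, because each one fetches events only when it is ready for more.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Topic:
    """Append-only event log that subscribers read by pulling (hypothetical sketch)."""
    events: List[Any] = field(default_factory=list)
    # One read offset per subscription group. Workers in the same group share it,
    # so whichever worker pulls first takes the next events (simple load balancing).
    offsets: Dict[str, int] = field(default_factory=dict)

    def publish(self, event: Any) -> None:
        # The publisher only appends; it does not know who will consume the event.
        self.events.append(event)

    def subscribe(self, group: str, from_start: bool = False) -> None:
        # A new group can be added at run-time; by default it only sees future events.
        self.offsets.setdefault(group, 0 if from_start else len(self.events))

    def pull(self, group: str, max_items: int = 1) -> List[Any]:
        # The subscriber decides when and how much to fetch (pull-based delivery).
        start = self.offsets[group]
        batch = self.events[start:start + max_items]
        self.offsets[group] = start + len(batch)
        return batch


# Usage: one analysis group exists from the start, another is added later.
readings = Topic()
readings.subscribe("speed-analysis")
for value in [3.1, 4.7, 5.2]:
    readings.publish(value)
readings.subscribe("late-analysis")  # added at run-time, sees only new events
readings.publish(6.0)

# Two workers of the same group share one offset and split the events between them.
w1 = readings.pull("speed-analysis", 2)
w2 = readings.pull("speed-analysis", 2)
late = readings.pull("late-analysis", 10)
print(w1, w2, late)
```

Here the first worker receives `[3.1, 4.7]`, the second `[5.2, 6.0]`, and the late subscriber only `[6.0]`. In a distributed setting, such as the P2P network described above, the log and offsets would live on remote nodes and `pull` would be a network request. This in-memory version only illustrates the consumer-driven control flow.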