Shepherd: Node Monitors for Fault-Tolerant Distributed Process Execution in OSIRIS

Authors
Diego Milano and Nenad Stojnić
Type
In Proceedings
Date
2010/12
Appears in
Proceedings of the 5th Workshop on Enhanced Web Service Technologies (WEWST 2010)
Location
Ayia Napa, Cyprus
Publisher
ACM
Abstract
OSIRIS is a middleware for the composition and orchestration of distributed web services. It follows a decentralized, peer-to-peer (P2P) approach to process execution and already provides some degree of fault resilience, as well as high performance in large-scale computational clusters. In this paper, we present ongoing work aimed at improving OSIRIS's fault-tolerance capabilities. We introduce into OSIRIS new architectural elements for maintaining a virtual stable storage and for monitoring the activities of service instances. We also introduce algorithms that allow execution to survive failures that the system currently cannot cope with.