Shepherd: Node Monitors for Fault-Tolerant Distributed Process Execution in OSIRIS
Diego Milano and Nenad Stojnić
Proceedings of the 5th Workshop on Enhanced Web Service Technologies (WEWST 2010)
Ayia Napa, Cyprus
OSIRIS is a middleware for the composition and orchestration of distributed web services. It follows a decentralized P2P approach to process execution, which already provides some degree of resilience to faults and high performance in large-scale computational clusters. In this paper, we present ongoing work aimed at improving the fault tolerance of OSIRIS. We introduce new architectural elements for maintaining a virtual stable storage and for monitoring the activities of service instances, together with algorithms that allow execution to survive failures that the system currently cannot cope with.