Dependable Computing with Proactive Failure Avoidance, Recovery, and Maintenance (2016)
This book presents a set of dependable-computing techniques, known collectively as Proactive Failure Avoidance, Recovery, and Maintenance (PFARM), that can substantially improve the availability and performance of computer systems. It focuses on the main steps of proactive fault management: runtime monitoring, failure avoidance and prediction algorithms and technologies, proactive recovery, and preventive maintenance. Coverage includes runtime monitoring techniques, long-term and short-term prediction techniques, and an introduction to prediction quality measures. It also demonstrates how preventive measures triggered by short-term failure prediction mechanisms can increase the availability of software and hardware systems.