Fail Over Strategy for Fault Tolerance in Cloud Computing Environment
Publication date
2017-09
Keyword
Fault tolerance
Checkpointing
Virtualisation
Load balancing
Virtual machine
Cloud computing
Rights
© 2017 Wiley. This is the peer reviewed version of the following article: Mohammed B, Kiran M, Maiyama KM et al (2017) Failover strategy for fault tolerance in cloud computing environment. Software: Practice and Experience. 47(9): 1243-1274, which has been published in final form at https://doi.org/10.1002/spe.2491. This article may be used for noncommercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.
Peer-Reviewed
Yes
Open Access status
openAccess
Accepted for publication
2017-01-20
Abstract
Cloud fault tolerance is an important issue in cloud computing platforms and applications. In the event of an unexpected system failure or malfunction, a robust fault-tolerant design may allow the cloud to continue functioning correctly, possibly at a reduced level, instead of failing completely. To ensure high availability of critical cloud services, application execution and hardware performance, various fault-tolerant techniques exist for building self-autonomous cloud systems. In comparison to current approaches, this paper proposes a more robust and reliable architecture that uses an optimal checkpointing strategy to ensure high system availability and a reduced task service finish time. Using pass rates and virtualised mechanisms, the proposed Smart Failover Strategy (SFS) scheme uses components such as a Cloud fault manager, a Cloud controller, a Cloud load balancer and a selection mechanism, providing fault tolerance via redundancy, optimized selection and checkpointing. In our approach, the Cloud fault manager repairs faults generated before the task time deadline is reached, blocking unrecoverable faulty nodes as well as their virtual nodes. This scheme is also able to remove temporary software faults from recoverable faulty nodes, thereby making them available for future requests. We argue that the proposed SFS algorithm makes the system highly fault tolerant by considering forward and backward recovery using diverse software tools. Compared to existing approaches, preliminary experiments with the SFS algorithm indicate an increase in pass rates and a consequent decrease in failure rates, showing overall good performance in task allocation. We present these results using experimental validation tools with comparison to other techniques, laying a foundation for a fully fault-tolerant IaaS Cloud environment.
Version
Accepted manuscript
Citation
Mohammed B, Kiran M, Maiyama KM et al (2017) Failover strategy for fault tolerance in cloud computing environment. Software: Practice and Experience. 47(9): 1243-1274.
Link to Version of Record
https://doi.org/10.1002/spe.2491
Type
Article