Publication date
2017
Rights
© 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Peer-Reviewed
n/a
Abstract
A failure in a cloud system is defined as an event that occurs when the delivered service deviates from its correct, intended behavior. As cloud computing systems continue to grow in scale and complexity, cloud service providers (CSPs) urgently need to guarantee reliable on-demand resources to their customers in the presence of faults, thereby fulfilling their service level agreements (SLAs). Component failures in cloud systems are a familiar phenomenon. Nevertheless, large cloud service providers’ data centers should be designed to provide a certain level of availability to the business system. The Infrastructure-as-a-Service (IaaS) cloud delivery model presents computational resources (CPU and memory), storage resources and networking capacity that ensure high availability in the presence of such failures. In-production fault data recorded over a two-year period at the National Energy Research Scientific Computing Center (NERSC) have been studied and analyzed. Using the real-time data collected from the Computer Failure Data Repository (CFDR), this paper presents the performance of two machine learning (ML) algorithms, a Linear Regression (LR) model and a Support Vector Machine (SVM) with a Linear Gaussian kernel, for predicting hardware failures in a real-time cloud environment to improve system availability. The performance of the two algorithms has been rigorously evaluated using the K-fold cross-validation technique. Furthermore, steps and procedures for future studies are presented. This research will aid computer hardware companies and cloud service providers in designing reliable fault-tolerant systems by informing better device selection, thereby improving system availability and minimizing unscheduled system downtime.
Version
Accepted Manuscript
Citation
Adamu H, Bashir M, Bukar AM, Cullen A and Awan I (2017) An approach to failure prediction in a cloud based environment. Presented at the IEEE 5th International Conference on Future Internet of Things and Cloud (FiCloud 2017), 21-23 August 2017, Prague, Czech Republic. pp. 191-197.
Link to Version of Record
https://doi.org/10.1109/FiCloud.2017.56
Type
Conference paper
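Illustrative note: the K-fold cross-validation mentioned in the abstract can be sketched roughly as below. This is a minimal, dependency-free sketch and not the authors' pipeline. The data are synthetic, the model is a single-feature least-squares line standing in for the Linear Regression model, the SVM is left out to avoid external libraries, and the names `k_fold_indices`, `fit_line` and `cross_validate` are hypothetical.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k near-equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validate(xs, ys, k=5, seed=0):
    """Return the mean squared error measured on each held-out fold."""
    folds = k_fold_indices(len(xs), k, seed)
    errors = []
    for held_out in folds:
        held = set(held_out)
        # Train on every sample that is not in the held-out fold.
        train = [i for i in range(len(xs)) if i not in held]
        slope, intercept = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        mse = sum((ys[i] - (slope * xs[i] + intercept)) ** 2
                  for i in held_out) / len(held_out)
        errors.append(mse)
    return errors


# Synthetic, noise-free data (an assumption for illustration only).
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
fold_errors = cross_validate(xs, ys, k=5)
print(fold_errors)
```

Each sample is held out exactly once, so averaging the per-fold errors gives an estimate of out-of-sample error that does not depend on a single train/test split.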