Publication

Early Diagnosis and Personalised Treatment: Classification Modelling of Immunotherapy Data Utilising Machine Learning and Deep Learning

Mahmoud, Ahsanullah Y.
Rights
Creative Commons License
The University of Bradford theses are licensed under a Creative Commons Licence.
Institution
University of Bradford
Department
School of Computer Science, Artificial Intelligence and Electronics. Faculty of Engineering and Digital Technologies
Awarded
2023
Additional title
A Focus on Data Challenges, Artificial Intelligence (AI), Experiments, Big Data, Visual Learning and Computational Health
Abstract
Early diagnosis and personalised treatment are emerging in healthcare, as machine learning and deep learning play a vital role in the treatment of cancer, infections and immunotherapies. However, immunotherapy faces obstacles because medical datasets are typically small, imbalanced and contain irrelevant features, resulting in suboptimal classification performance. The following contributions are therefore proposed to address these data challenges. A comprehensive immunotherapy literature review is presented to uncover gaps in published studies by exploring application domains, datasets, algorithms and software tools. Studies on imbalanced immunotherapy datasets are reproduced to identify gaps in applications. Novel personalised experiments are conducted by converting the original data to artificially big data, in order to assess the impact on classification by evaluating simulations of observations and features, manual classification, visualisations and correlations. Random Forest and Generative Adversarial Network are mainly used for classification and synthetic data generation, respectively, because of their better performance. A visual learning approach is proposed that considers the data, algorithm and human levels to improve the quality of a dataset relative to the expected classification performance. Numerous experiments, including statistical features, cumulative sums, histograms, correlation matrices, mean squared error and principal component analysis, are performed to compare visualisations of original and synthetic data. An adaptable synergy between data quality and classification performance is obtained while preserving statistical characteristics. For both original and synthetic immunotherapy data, Random Forest performs best in terms of precision, recall, f-measure, accuracy, sensitivity and specificity.
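The six evaluation measures named in the abstract can all be derived from the four counts of a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the thesis's own code; the function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
def binary_metrics(tp, fp, fn, tn):
    """Standard binary classification metrics from confusion-matrix counts.

    tp, fp, fn, tn are the numbers of true positives, false positives,
    false negatives and true negatives (hypothetical inputs here).
    """
    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)          # recall and sensitivity are the same quantity
    specificity = ratio(tn, tn + fp)
    f_measure = ratio(2 * precision * recall, precision + recall)
    accuracy = ratio(tp + tn, tp + fp + fn + tn)

    return {
        "precision": precision,
        "recall": recall,
        "sensitivity": recall,
        "specificity": specificity,
        "f_measure": f_measure,
        "accuracy": accuracy,
    }


# Illustrative counts only; these are not results from the thesis.
metrics = binary_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that recall and sensitivity are two names for the same ratio, so on any given dataset they always coincide; specificity is the corresponding ratio for the negative class.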
Type
Thesis
Qualification name
PhD