Publication

Machine learning experiments with artificially generated big data from small immunotherapy datasets

Mahmoud, Ahsanullah Y.
Abdullatif, Amr A.A.
Publication Date
2023-03
Rights
© 2022 IEEE. Reproduced in accordance with the publisher's self-archiving policy.
Peer-Reviewed
Yes
Open Access status
openAccess
Accepted for publication
2022
Abstract
Big data and machine learning can make healthcare more agile and robust by turning raw data into useful patterns for data-enhanced decision support. However, the available datasets are mostly small and unbalanced, which leads to suboptimal classification when these algorithms are applied. In this study, five novel machine learning experiments are conducted to address the challenges of small datasets by expanding them into big data and then applying Random Forests. The experiments are based on personalised, adaptable strategies for both balanced and unbalanced datasets. Multiple datasets from cryotherapy and immunotherapy were considered; however, only the immunotherapy data is used here. In the first experiment, artificial data is generated by increasing the number of observations in the dataset, with each new dataset four times larger than the previous one; this results in better classification. In the second experiment, the effect of data volume on classification is examined in terms of the number of attributes, with the attributes of each new dataset built from conditional probabilities. Increasing the number of attributes beyond 879 made no difference to the classification obtained. In the third (simulation) experiment, the classes are separated manually by dividing the data on a two-dimensional plane. This experiment is performed first on the small data and then on the expanded big data; increasing the number of observations yields an accuracy of 73.68%. In the fourth experiment, visualising the enlarged data did not provide better insights. In the fifth experiment, the impact of correlations among the datasets’ attributes on classification is examined; however, no improvement in performance is achieved. Overall, comparing the classification results obtained with the original and the artificial data shows that the experiments generally improved performance.
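The abstract does not describe how the enlarged datasets are generated. As a rough illustration of the fourfold growth schedule in the first experiment, the standard-library Python sketch below keeps the original observations and appends 3n synthetic rows, each a randomly chosen original row with Gaussian noise added to its numeric attributes. This makes every new dataset four times the size of the previous one. The resampling-with-jitter scheme, the `noise` parameter, the toy data and the names `expand_fourfold` and `growth_schedule` are all hypothetical, chosen for illustration. They are not the authors' method.

```python
import random
import statistics


def expand_fourfold(rows, labels, noise=0.05, seed=0):
    """Return (rows, labels) enlarged to four times the input size.

    Hypothetical scheme, not the paper's: the n original observations are kept
    and 3 * n synthetic ones are appended. Each synthetic row copies a randomly
    chosen original row (and its class label) and perturbs every numeric
    attribute with Gaussian noise of scale `noise` times that attribute's
    population standard deviation.
    """
    rng = random.Random(seed)
    n_attrs = len(rows[0])
    # Per-attribute spread used to scale the jitter (0.0 for a constant column).
    spreads = [statistics.pstdev([r[j] for r in rows]) for j in range(n_attrs)]
    new_rows, new_labels = list(rows), list(labels)
    for _ in range(3 * len(rows)):
        i = rng.randrange(len(rows))
        new_rows.append([v + rng.gauss(0.0, noise * s) for v, s in zip(rows[i], spreads)])
        new_labels.append(labels[i])
    return new_rows, new_labels


def growth_schedule(rows, labels, steps):
    """Yield `steps` successively enlarged datasets, each 4x the previous one."""
    current = (rows, labels)
    for step in range(steps):
        current = expand_fourfold(*current, seed=step)
        yield current


# Toy two-class data with illustrative values only (not the immunotherapy dataset).
X = [[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [6.0, 9.0]]
y = [0, 0, 1, 1]
for big_X, big_y in growth_schedule(X, y, steps=3):
    print(len(big_X), "observations")  # 16, then 64, then 256
```

In a fuller sketch, each enlarged dataset could then be passed to a Random Forest classifier, for example scikit-learn's `RandomForestClassifier`. Because rows are drawn uniformly, class proportions are preserved only approximately. A class-stratified draw would be needed to keep an unbalanced dataset's ratio exact or to rebalance it.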
Version
Accepted manuscript
Citation
Mahmoud AY, Neagu CD, Scrimieri D et al (2022) Machine learning experiments with artificially generated big data from small immunotherapy datasets. 21st IEEE International Conference on Machine Learning and Applications (ICMLA). 12-14 December 2022, Nassau, Bahamas.
Type
Conference paper

Version History

Version 2* (2025-04-09 13:00:39): Edited author entries
Version 1 (2022-12-13 17:07:18)
* Selected version