Machine learning experiments with artificially generated big data from small immunotherapy datasets
Publication date
2023-03
Keyword
Immunotherapy
Big data
Machine learning
Classification
Random Forests
Warts
Cryotherapy
Health-care
Rights
© 2022 IEEE. Reproduced in accordance with the publisher's self-archiving policy.
Peer-Reviewed
Yes
Open Access status
openAccess
Abstract
Big data and machine learning result in agile and robust healthcare by expanding raw data into useful patterns for data-enhanced decision support. The available datasets are mostly small and unbalanced, resulting in non-optimal classification when the algorithms are implemented. In this study, five novel machine learning experiments are conducted to address the challenges of small datasets by expanding these into big data and then utilising Random Forests. The experiments are based on personalised adaptable strategies for both balanced and unbalanced datasets. Multiple datasets from cryotherapy and immunotherapy are considered; however, only immunotherapy is used here. In the first experiment, artificial data is generated by increasing the number of observations in the dataset; each new dataset is four times larger than the previous one, which results in better classification. In the second experiment, the effect of volume on classification is considered based on the number of attributes. The attributes of each new dataset are built based on conditional probabilities. Increasing the number of attributes beyond 879 made no difference to the classification obtained. In the third simulation experiment, classes of data are classified manually by dividing the data into a two-dimensional plane. This experiment is first performed on small data and then on expanded big data: by increasing observations, an accuracy of 73.68% is attained. In the fourth experiment, the visualisation of the enlarged data did not provide better insights. In the fifth experiment, the impact of correlations among the datasets’ attributes on classification is observed; however, no improvements in performance are achieved. Overall, comparing the classification results on the original and artificial data shows that the experiments generally improved performance.
Version
Accepted manuscript
Citation
Mahmoud AY, Neagu D, Scrimieri D et al (2022) Machine learning experiments with artificially generated big data from small immunotherapy datasets. 21st IEEE International Conference on Machine Learning and Applications (ICMLA). 12-14 December 2022, Nassau, Bahamas.
Link to Version of Record
https://doi.org/10.1109/ICMLA55696.2022.00165
Type
Conference paper
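Illustrative sketch
The abstract describes a first experiment in which the dataset is enlarged repeatedly, each new dataset being four times larger than the previous one. The record does not state how the artificial observations are generated, so the following is only a minimal sketch under a stated assumption: each synthetic row draws a class label from the empirical class frequencies and then draws every attribute independently from the values observed for that class. The function name `expand_dataset`, its parameters, and the toy values are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def expand_dataset(rows, labels, factor=4, seed=0):
    """Return (new_rows, new_labels) holding factor * len(rows) synthetic rows.

    Assumption (not from the paper): a class label is drawn in proportion to
    its observed frequency, then each attribute is drawn independently from
    the values seen for that class.
    """
    rng = random.Random(seed)

    # Group the observed rows by class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    n_attributes = len(rows[0])
    new_rows, new_labels = [], []
    for _ in range(factor * len(rows)):
        # Choosing uniformly from the label list samples by class frequency.
        label = rng.choice(labels)
        pool = by_class[label]
        # Each attribute comes from a (possibly different) row of that class.
        new_rows.append([rng.choice(pool)[j] for j in range(n_attributes)])
        new_labels.append(label)
    return new_rows, new_labels


# Toy stand-in data (hypothetical values, not the immunotherapy dataset).
rows = [[25, 1.0], [40, 3.5], [31, 2.0], [52, 8.0]]
labels = [1, 0, 1, 0]

sizes = [len(rows)]
for _ in range(3):
    rows, labels = expand_dataset(rows, labels)
    sizes.append(len(rows))
print(sizes)  # [4, 16, 64, 256]
```

Each enlarged dataset could then be passed to a Random Forest implementation (for example scikit-learn's `RandomForestClassifier`) and its accuracy compared with that obtained on the original data, which is the kind of comparison the abstract reports.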