Machine learning experiments with artificially generated big data from small immunotherapy datasets
dc.contributor.author | Mahmoud, Ahsanullah Y. | |
dc.contributor.author | Neagu, Daniel | |
dc.contributor.author | Scrimieri, Daniele | |
dc.contributor.author | Abdullatif, Amr A.A. | |
dc.date.accessioned | 2022-12-13T17:07:18Z | |
dc.date.accessioned | 2023-01-18T14:43:37Z | |
dc.date.available | 2022-12-13T17:07:18Z | |
dc.date.available | 2023-01-18T14:43:37Z | |
dc.date.issued | 2023-03 | |
dc.identifier.citation | Mahmoud AY, Neagu D, Scrimieri D et al. (2022) Machine learning experiments with artificially generated big data from small immunotherapy datasets. 21st IEEE International Conference on Machine Learning and Applications (ICMLA). 12-14 December 2022, Nassau, Bahamas. | |
dc.identifier.uri | http://hdl.handle.net/10454/19291 | |
dc.description | Yes | |
dc.description.abstract | Big data and machine learning enable agile and robust healthcare by turning raw data into useful patterns for data-enhanced decision support. Available datasets are mostly small and unbalanced, which leads to suboptimal classification when machine learning algorithms are applied. In this study, five novel machine learning experiments are conducted to address the challenges of small datasets by expanding them into big data and then applying Random Forests. The experiments are based on personalised, adaptable strategies for both balanced and unbalanced datasets. Multiple cryotherapy and immunotherapy datasets were considered; however, only the immunotherapy dataset is used here. In the first experiment, artificial data are generated by increasing the number of observations in the dataset, with each new dataset four times larger than the previous one; this results in better classification. In the second experiment, the effect of data volume on classification is examined in terms of the number of attributes, with the attributes of each new dataset built from conditional probabilities. Increasing the number of attributes beyond 879 made no difference to the classification results. In the third, simulation experiment, the data classes are separated manually by partitioning the data in a two-dimensional plane. This experiment is performed first on the small data and then on the expanded big data; increasing the number of observations yields an accuracy of 73.68%. In the fourth experiment, visualising the enlarged data did not provide better insights. In the fifth experiment, the impact of correlations among the datasets’ attributes on classification is examined; however, no performance improvement is achieved. Overall, comparing the classification results obtained with the original and the artificial data shows that the experiments generally improved performance. | |
dc.language.iso | en | en |
dc.publisher | IEEE | |
dc.rights | © 2022 IEEE. Reproduced in accordance with the publisher's self-archiving policy. | |
dc.subject | Immunotherapy | |
dc.subject | Big data | |
dc.subject | Machine learning | |
dc.subject | Classification | |
dc.subject | Random Forests | |
dc.subject | Warts | |
dc.subject | Cryotherapy | |
dc.subject | Health-care | |
dc.title | Machine learning experiments with artificially generated big data from small immunotherapy datasets | |
dc.status.refereed | Yes | |
dc.type | Conference paper | |
dc.type.version | Accepted manuscript | |
dc.identifier.doi | https://doi.org/10.1109/ICMLA55696.2022.00165 | |
dc.rights.license | Unspecified | |
dc.date.updated | 2022-12-13T17:07:19Z | |
refterms.dateFOA | 2023-01-18T14:44:11Z | |
dc.openaccess.status | openAccess | |
dc.date.accepted | 2022 |