Classification of heterogeneous data based on data type impact on similarity
Ali, N.; Neagu, Daniel; Trundle, Paul R.
Publication Date
2019
Rights
© Springer Nature Switzerland AG 2019. Reproduced in accordance with the publisher's self-archiving policy.
The final publication is available at Springer via https://doi.org/10.1007/978-3-319-97982-3_21.
Peer-Reviewed
Yes
Open Access status
openAccess
Accepted for publication
Abstract
Real-world datasets are increasingly heterogeneous, containing a mixture of numerical, categorical and other feature types. The main challenge in mining heterogeneous datasets is how to deal with the heterogeneity present in the dataset records. Although some existing classifiers (such as decision trees) can handle heterogeneous data in specific circumstances, the performance of such models may still be improved, because heterogeneity involves specific adjustments to similarity measurements and calculations. Moreover, heterogeneous data is still treated inconsistently and in an ad hoc manner. In this paper, we study the problem of heterogeneous data classification: our purpose is to use heterogeneity as a positive feature of the data classification effort by consistently using the similarity between data objects. We address the heterogeneity issue by studying the impact of mixing data types in the calculation of data objects' similarity. To reach our goal, we propose an algorithm that divides the initial data records into classification subtasks based on pairwise similarity, with the aim of increasing the quality of the data subsets, and then applies specialized classifier models to them. The performance of the proposed approach is evaluated on 10 publicly available heterogeneous datasets. The results show that the models achieve better performance on heterogeneous datasets when using the proposed similarity process.
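The abstract does not specify which similarity measure the paper uses. Purely as an illustration of how mixing data types affects a pairwise similarity, the sketch below uses Gower's coefficient, a standard measure for records with both numerical and categorical features, and reports the numerical and categorical contributions separately. This is not the authors' algorithm: the choice of Gower's measure, the helper names (`feature_ranges`, `type_similarity`, `gower_similarity`) and the toy records are all assumptions made for this example.

```python
from typing import Dict, List, Optional, Sequence, Tuple, Union

Value = Union[float, str]
Record = Sequence[Value]


def feature_ranges(records: List[Record], is_numeric: List[bool]) -> List[float]:
    """Range (max - min) of each numerical column; 0.0 placeholder for categorical ones."""
    ranges = []
    for j, numeric in enumerate(is_numeric):
        if numeric:
            column = [float(row[j]) for row in records]
            ranges.append(max(column) - min(column))
        else:
            ranges.append(0.0)
    return ranges


def _components(a: Record, b: Record, is_numeric: List[bool],
                ranges: List[float]) -> Tuple[float, int, float, int]:
    """Sum and count of per-feature similarities, split by data type."""
    num_sum, num_n, cat_sum, cat_n = 0.0, 0, 0.0, 0
    for x, y, numeric, r in zip(a, b, is_numeric, ranges):
        if numeric:
            # Range-normalised similarity; a constant column (r == 0) counts as identical.
            num_sum += (1.0 - abs(float(x) - float(y)) / r) if r > 0 else 1.0
            num_n += 1
        else:
            # Categorical features: 1 if the categories match, else 0.
            cat_sum += 1.0 if x == y else 0.0
            cat_n += 1
    return num_sum, num_n, cat_sum, cat_n


def type_similarity(a: Record, b: Record, is_numeric: List[bool],
                    ranges: List[float]) -> Dict[str, Optional[float]]:
    """Similarity computed on each data type alone (None if that type is absent)."""
    num_sum, num_n, cat_sum, cat_n = _components(a, b, is_numeric, ranges)
    return {
        "numeric": num_sum / num_n if num_n else None,
        "categorical": cat_sum / cat_n if cat_n else None,
    }


def gower_similarity(a: Record, b: Record, is_numeric: List[bool],
                     ranges: List[float]) -> float:
    """Gower similarity: mean of all per-feature similarities, in [0, 1]."""
    num_sum, num_n, cat_sum, cat_n = _components(a, b, is_numeric, ranges)
    total = num_n + cat_n
    return (num_sum + cat_sum) / total if total else 0.0


# Hypothetical toy records: (age, income, colour, subscribed)
records = [
    (25.0, 50000.0, "red", "yes"),
    (30.0, 60000.0, "red", "no"),
    (45.0, 90000.0, "blue", "no"),
]
is_numeric = [True, True, False, False]
ranges = feature_ranges(records, is_numeric)

if __name__ == "__main__":
    print(gower_similarity(records[0], records[1], is_numeric, ranges))  # 0.625
    print(type_similarity(records[0], records[1], is_numeric, ranges))
```

For the first two records the numerical features alone give a similarity of 0.75 and the categorical features alone give 0.5, while the combined Gower value is 0.625. The gap between the per-type scores is the kind of data-type effect the abstract refers to; how the paper actually uses such information to split records into subtasks is not described here.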
Version
Accepted manuscript
Citation
Ali N, Neagu D and Trundle P (2019) Classification of heterogeneous data based on data type impact on similarity. In: Lotfi A, Bouchachia H, Gegov A et al (eds) Advances in Computational Intelligence Systems. UK Workshop on Computational Intelligence, 2018. Advances in Intelligent Systems and Computing. Springer: Cham. 840: 252-263.
Type
Conference paper