Classification of heterogeneous data based on data type impact on similarity
Publication date
2019
Keyword
Heterogeneous datasets
Similarity measures
Two-dimensional similarity space
Classification algorithms
Rights
© Springer Nature Switzerland AG 2019. Reproduced in accordance with the publisher's self-archiving policy. The final publication is available at Springer via https://doi.org/10.1007/978-3-319-97982-3_21.
Peer-Reviewed
Yes
Open Access status
openAccess
Abstract
Real-world datasets are increasingly heterogeneous, showing a mixture of numerical, categorical and other feature types. The main challenge for mining heterogeneous datasets is how to deal with the heterogeneity present in the dataset records. Although some existing classifiers (such as decision trees) can handle heterogeneous data in specific circumstances, the performance of such models can still be improved, because heterogeneity involves specific adjustments to similarity measurements and calculations. Moreover, heterogeneous data is still treated inconsistently and in an ad hoc manner. In this paper, we study the problem of heterogeneous data classification: our purpose is to use heterogeneity as a positive feature of the data classification effort by consistently using the similarity between data objects. We address the heterogeneity issue by studying the impact of mixing data types in the calculation of data objects’ similarity. To reach our goal, we propose an algorithm that divides the initial data records, based on pairwise similarity, into subsets for classification subtasks, with the aim of increasing the quality of the data subsets and applying specialized classifier models to them. The performance of the proposed approach is evaluated on 10 publicly available heterogeneous datasets. The results show that the models achieve better performance for heterogeneous datasets when using the proposed similarity process.
Version
Accepted manuscript
Citation
Ali N, Neagu D and Trundle P (2019) Classification of heterogeneous data based on data type impact on similarity. In: Lotfi A, Bouchachia H, Gegov A et al (eds) Advances in Computational Intelligence Systems. UK Workshop on Computational Intelligence, 2018. Advances in Intelligent Systems and Computing. Springer: Cham. 840: 252-263.
Link to Version of Record
https://doi.org/10.1007/978-3-319-97982-3_21
Type
Conference paper