
dc.contributor.author: Ali, N. [*]
dc.contributor.author: Neagu, Daniel [*]
dc.contributor.author: Trundle, Paul R. [*]
dc.date.accessioned: 2019-01-22T15:02:57Z
dc.date.available: 2019-01-22T15:02:57Z
dc.date.issued: 2019
dc.identifier.citation: Ali N, Neagu D and Trundle P (2019) Classification of heterogeneous data based on data type impact on similarity. In: Lotfi A, Bouchachia H, Gegov A et al (eds) Advances in Computational Intelligence Systems. UK Workshop on Computational Intelligence, 2018. Advances in Intelligent Systems and Computing. Springer: Cham. 840: 252-263. [en_US]
dc.identifier.uri: http://hdl.handle.net/10454/16760
dc.description: Yes [en_US]
dc.description.abstract: Real-world datasets are increasingly heterogeneous, showing a mixture of numerical, categorical and other feature types. The main challenge for mining heterogeneous datasets is how to deal with the heterogeneity present in the dataset records. Although some existing classifiers (such as decision trees) can handle heterogeneous data in specific circumstances, the performance of such models may still be improved, because heterogeneity involves specific adjustments to similarity measurements and calculations. Moreover, heterogeneous data is still treated inconsistently and in an ad hoc manner. In this paper, we study the problem of heterogeneous data classification: our purpose is to use heterogeneity as a positive feature of the data classification effort by consistently using the similarity between data objects. We address the heterogeneity issue by studying the impact of mixing data types in the calculation of data objects’ similarity. To reach our goal, we propose an algorithm to divide the initial data records based on pairwise similarity for classification subtasks, with the aim of increasing the quality of the data subsets and applying specialized classifier models to them. The performance of the proposed approach is evaluated on 10 publicly available heterogeneous datasets. The results show that the models achieve better performance for heterogeneous datasets when using the proposed similarity process. [en_US]
dc.language.iso: en [en_US]
dc.relation.isreferencedby: https://doi.org/10.1007/978-3-319-97982-3_21 [en_US]
dc.rights: © Springer Nature Switzerland AG 2019. Reproduced in accordance with the publisher's self-archiving policy. The final publication is available at Springer via https://doi.org/10.1007/978-3-319-97982-3_21. [en_US]
dc.subject: Heterogeneous datasets [en_US]
dc.subject: Similarity measures [en_US]
dc.subject: Two-dimensional similarity space [en_US]
dc.subject: Classification algorithms [en_US]
dc.title: Classification of heterogeneous data based on data type impact on similarity [en_US]
dc.status.refereed: Yes [en_US]
dc.date.application: 2018-08-11
dc.type: Conference paper [en_US]
dc.type.version: Accepted Manuscript [en_US]
refterms.dateFOA: 2019-01-22T15:02:57Z


Item file(s)

Name: Neagu_ACIS_Chapter.pdf
Size: 574.6Kb
Format: PDF
