dc.contributor.author: Micic, Natasha
dc.contributor.author: Neagu, Daniel
dc.contributor.author: Torgunov, Denis
dc.contributor.author: Campean, Felician
dc.date.accessioned: 2018-08-28T16:12:34Z
dc.date.available: 2018-08-28T16:12:34Z
dc.date.issued: 2018-06-28
dc.identifier.citation: Micic N, Neagu D, Torgunov D and Campean F (2018) Exploring Methods for Comparing Similarity of Dimensionally Inconsistent Multivariate Numerical Data. [Proceedings of the] 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems. IEEE. ISBN: 978-1-5386-6614-2. pp. 1530-1537. [en_US]
dc.identifier.uri: http://hdl.handle.net/10454/16556
dc.description: no [en_US]
dc.description.abstract: When developing multivariate data classification and clustering methodologies for data mining, most contributions in the literature consider only data sets that consistently contain the same attributes. There are, however, many cases in current big data analytics applications where data sets on the same topic, and even from the same source, measure differing attributes, for a multitude of reasons (whether the specific design of an experiment or poor data quality and consistency). We define this class of data as dimensionally inconsistent multivariate data, a topic that can be considered a subclass of Big Data Variety research. This paper explores some methodologies commonly used in multivariate classification and clustering tasks and considers how these traditional methodologies could be adapted to compare dimensionally inconsistent data sets. The study focuses on adapting two similarity measures, the Robinson-Foulds tree distance metric and Variation of Information, for comparing the clusterings produced by hierarchical clustering algorithms (such clusters are derived from the raw multivariate data). The results from experiments on engineering data highlight that adapting pairwise measures to exclude non-common attributes from the traditional distance metrics may not be the best method of classification. We suggest that more specialised metrics of similarity are required to address the challenges presented by dimensionally inconsistent multivariate data, with specific applications for big engineering data analytics. [en_US]
dc.description.sponsorship: Jaguar Land-Rover [en_US]
dc.language.iso: en [en_US]
dc.relation.isreferencedby: https://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=8605812
dc.subject: Big data [en_US]
dc.subject: Clustering [en_US]
dc.subject: Heterogeneous data sets [en_US]
dc.subject: Classification methodologies [en_US]
dc.subject: Inconsistent multivariate data [en_US]
dc.title: Exploring Methods for Comparing Similarity of Dimensionally Inconsistent Multivariate Numerical Data [en_US]
dc.status.refereed: Yes [en_US]
dc.type: Conference paper [en_US]
dc.type.version: No full-text in the repository [en_US]
refterms.dateFOA: 2018-08-28T16:12:48Z
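
Illustrative note: the abstract names Variation of Information as one of the two similarity measures studied. The record does not include the paper's adapted version, so the following is only a minimal sketch of the standard, unadapted measure for two flat clusterings of the same items, VI(A, B) = H(A|B) + H(B|A), using natural logarithms. The function name `variation_of_information` and the label-list input format are assumptions made for this illustration, not the authors' implementation.

```python
from collections import Counter
from math import log


def variation_of_information(labels_a, labels_b):
    """Standard Variation of Information between two flat clusterings.

    labels_a[i] and labels_b[i] are the cluster labels assigned to item i
    by clustering A and clustering B (hypothetical input format).
    Returns H(A|B) + H(B|A) in nats; 0.0 means identical partitions.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    n = len(labels_a)
    if n == 0:
        return 0.0

    size_a = Counter(labels_a)               # cluster sizes in A
    size_b = Counter(labels_b)               # cluster sizes in B
    overlap = Counter(zip(labels_a, labels_b))  # sizes of pairwise intersections

    vi = 0.0
    for (a, b), n_ab in overlap.items():
        p_ab = n_ab / n
        # log(p_ab / p_a) + log(p_ab / p_b), written with raw counts
        vi -= p_ab * (log(n_ab / size_a[a]) + log(n_ab / size_b[b]))
    return vi
```

Under this definition, two clusterings that differ only in label names score 0, while two balanced two-cluster partitions that are statistically independent of each other score 2 ln 2. Comparing hierarchical clusterings this way would first require cutting each tree into a flat partition, which is a further assumption not covered by this record.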


Item file(s)

Name: EDMA_Micic_June18.pdf
Size: 4.625Mb
Format: PDF
Description: Keep suppressed - published version
