BRADFORD SCHOLARS

    Exploring Methods for Comparing Similarity of Dimensionally Inconsistent Multivariate Numerical Data

    Publication date
    2018-06-28
    Author
    Micic, Natasha
    Neagu, Daniel
    Torgunov, Denis
    Campean, I. Felician
    Keyword
    Big data
    Clustering
    Heterogeneous data sets
    Classification methodologies
    Inconsistent multivariate data
    Peer-Reviewed
    Yes
    Abstract
    When developing multivariate data classification and clustering methodologies for data mining, most contributions in the literature consider only data sets that consistently contain the same attributes. However, in many current big data analytics applications, data sets on the same topic, and even from the same source, measure differing attributes for a multitude of reasons (whether the specific design of an experiment or poor data quality and consistency). We define this class of data as dimensionally inconsistent multivariate data, a topic that can be considered a subclass of Big Data Variety research. This paper explores some methodologies commonly used in multivariate classification and clustering tasks and considers how these traditional methodologies could be adapted to compare dimensionally inconsistent data sets. The study focuses on adapting two similarity measures, Robinson-Foulds tree distance metrics and Variation of Information, for comparing the clusterings produced by hierarchical clustering algorithms (such clusters are derived from the raw multivariate data). The results from experiments on engineering data highlight that adapting pairwise measures to exclude non-common attributes from the traditional distance metrics may not be the best method of classification. We suggest that more specialised metrics of similarity are required to address the challenges presented by dimensionally inconsistent multivariate data, with specific applications for big engineering data analytics.
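    As a rough illustration of one of the measures named in the abstract, the sketch below computes the standard Variation of Information between two flat clusterings of the same items. It is not the authors' adapted measure: the function names `common_attributes` and `variation_of_information` are hypothetical, and the sketch assumes that each hierarchical clustering has already been cut into flat cluster labels and that both clusterings label the same items in the same order.

```python
from collections import Counter
from math import log


def common_attributes(columns_a, columns_b):
    """Return the attributes present in both data sets, in the order of columns_a.

    Hypothetical helper: restricting both data sets to these columns is the
    'exclude non-common attributes' adaptation mentioned in the abstract.
    """
    shared = set(columns_b)
    return [name for name in columns_a if name in shared]


def variation_of_information(labels_a, labels_b):
    """Standard Variation of Information, VI = H(A|B) + H(B|A), in nats.

    labels_a and labels_b are flat cluster labels for the same items,
    listed in the same order. VI is 0 when the two partitions are identical
    up to a renaming of the clusters.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both clusterings must label the same items")
    n = len(labels_a)
    if n == 0:
        raise ValueError("clusterings must not be empty")

    size_a = Counter(labels_a)               # cluster sizes in clustering A
    size_b = Counter(labels_b)               # cluster sizes in clustering B
    joint = Counter(zip(labels_a, labels_b))  # sizes of the pairwise overlaps

    vi = 0.0
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / n
        # log(p_ab / p_a) + log(p_ab / p_b), written with raw counts
        vi -= p_ab * (log(n_ab / size_a[a]) + log(n_ab / size_b[b]))
    return vi


if __name__ == "__main__":
    print(common_attributes(["speed", "torque", "temp"], ["temp", "speed", "load"]))
    print(variation_of_information([0, 0, 1, 1], [1, 1, 0, 0]))
    print(variation_of_information([0, 0, 1, 1], [0, 1, 0, 1]))
```

    Identical partitions give a value of zero even when the cluster labels differ, while statistically independent partitions give the sum of their entropies. The abstract reports that simply restricting the comparison to common attributes may not be the best approach, so this sketch should be read only as the baseline that the paper questions.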
    URI
    http://hdl.handle.net/10454/16556
    Version
    No full-text in the repository
    Citation
    Micic N, Neagu D, Torgunov D and Campean F (2018) Exploring Methods for Comparing Similarity of Dimensionally Inconsistent Multivariate Numerical Data. [Proceedings of the] 2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems. IEEE. ISBN: 978-1-5386-6614-2. pp. 1530-1537.
    Link to publisher’s version
    https://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=8605812
    Type
    Conference paper
    Collections
    Engineering and Informatics Publications