BRADFORD SCHOLARS

    Contributions for Handling Big Data Heterogeneity. Using Intuitionistic Fuzzy Set Theory and Similarity Measures for Classifying Heterogeneous Data

    Publication date
    2019
    Author
    Ali, Najat
    Supervisor
    Neagu, Daniel
    Trundle, Paul R.
    Keyword
    Big data
    Heterogeneous data
    Similarity measures
    Intuitionistic fuzzy set theory
    Numeric features
    Categorical features
    Intuitionistic fuzzy features
    Classification model
    Rights
    Creative Commons License
    University of Bradford theses are licensed under a Creative Commons Licence.
    Institution
    University of Bradford
    Department
    Department of Computer Science
    Awarded
    2019
    
    Abstract
    A huge amount of data is generated daily by digital technologies such as social media, web logs, traffic sensors, online transactions, tracking data and videos. This has led to the archiving and storage of ever larger datasets, many of which are multi-modal or contain different types of data, contributing to the problem now known as “Big Data”. In the area of Big Data, the volume, variety and velocity problems remain difficult to solve. The work presented in this thesis focuses on the variety aspect of Big Data. For example, data can come in various and mixed formats for the same feature (attribute) or for different features, and can be identified mainly by one of the following data types: real-valued, crisp and linguistic values. The increasing variety and ambiguity of such data make it particularly challenging to process the data and to build accurate machine learning models. Therefore, data heterogeneity requires new methods of analysis and modelling techniques to enable useful information extraction and the modelling of achievable tasks. In this thesis, new approaches are proposed for handling heterogeneous Big Data. These include two techniques for filtering heterogeneous data objects: Two-Dimensional Similarity Space (2DSS), for data described by numeric and categorical features, and Three-Dimensional Similarity Space (3DSS), for real-valued, crisp and linguistic data. Both filtering techniques are used in this research to reduce the noise in the initial dataset and make the dataset more homogeneous. Furthermore, a new similarity measure based on intuitionistic fuzzy set theory is proposed. The proposed measure is used to handle the heterogeneity and ambiguity within crisp and linguistic data. In addition, new combined similarity models are proposed, which allow for a comparison between heterogeneous data objects represented by a combination of crisp and linguistic values.
    Diverse examples are used to illustrate and discuss the efficiency of the proposed similarity models. The thesis also presents a modification of the k-Nearest Neighbour classifier, called k-Nearest Neighbour Weighted Average (k-NNWA), to classify heterogeneous datasets described by real-valued, crisp and linguistic data. Finally, the thesis introduces a novel classification model, called FCCM (Filter Combined Classification Model), for heterogeneous data classification. The proposed model combines the advantages of the 3DSS and the k-NNWA classifier and outperforms the latter algorithm. All the proposed models and techniques have been applied to weather datasets and evaluated using accuracy, F-score and ROC area measures. The experiments revealed that the proposed filtering techniques are an efficient approach for removing noise from heterogeneous data and improving the performance of classification models. Moreover, the experiments showed that the proposed similarity measure for intuitionistic fuzzy data is capable of handling the fuzziness of heterogeneous data, and that intuitionistic fuzzy set theory offers some promise in solving some Big Data problems by handling the uncertainties and the heterogeneity of the data.
    URI
    http://hdl.handle.net/10454/19418
    Type
    Thesis
    Qualification name
    PhD
    Collections
    Theses
