dc.contributor.advisor: Neagu, Daniel
dc.contributor.advisor: Ridley, Mick J.
dc.contributor.author: Makhtar, Mokhairi
dc.date.accessioned: 2012-10-18T16:31:06Z
dc.date.available: 2012-10-18T16:31:06Z
dc.date.issued: 2012-10-18
dc.identifier.uri: http://hdl.handle.net/10454/5478
dc.description.abstract: The increasing variety of data mining tools offers a large palette of types and representation formats for predictive models. Managing the models then becomes a big challenge, as does reusing them and keeping the model and data repositories consistent. Sustainable access to these models, and assessment of their quality, become limited for researchers. The Data and Model Governance (DMG) approach makes it easier to process and support complex solutions. In this thesis, contributions are proposed towards ensembles of models, with a focus on model representation, comparison and usage. Predictive Toxicology was chosen as an application field to demonstrate the proposed approach to representing predictive models linked to data for DMG. Further analysis methods, such as predictive model comparison and predictive model combination, were studied for reusing models from a collection of models. Thus, in this thesis, an original structure for the pool of models, called Predictive Toxicology Markup Language (PTML), was proposed to represent predictive toxicology models. PTML offers a representation scheme for predictive toxicology data and models generated by data mining tools. In this research, the proposed representation makes it possible to compare models and to select the relevant models based on different performance measures, using proposed similarity measuring techniques. The relevant models were selected using a proposed cost function which is a composite of performance measures such as Accuracy (Acc), False Negative Rate (FNR) and False Positive Rate (FPR). The cost function ensures that only quality models are selected as candidate models for an ensemble.
The proposed algorithm for optimising and combining the Acc, FNR and FPR of ensemble models, using the double fault measure as the diversity measure, improves Acc by between 0.01 and 0.30 for all toxicology data sets compared to other ensemble methods such as Bagging, Stacking, Bayes and Boosting. The highest improvements in Acc were for the data sets Bee (0.30), Oral Quail (0.13) and Daphnia (0.10). A small improvement (of about 0.01) in Acc was achieved for Dietary Quail and Trout. Combining all three performance measures also reduced the distance between FNR and FPR for the Bee, Daphnia, Oral Quail and Trout data sets by about 0.17 to 0.28. For the Dietary Quail data set the improvement was only about 0.01, but this data set is well known as a difficult learning exercise. For the five UCI data sets tested, similar results were achieved, with an Acc improvement of between 0.10 and 0.11 and a further narrowing of the gap between FNR and FPR. In conclusion, the results show that by combining performance measures (Acc, FNR and FPR), as proposed within this thesis, Acc increased and the distance between FNR and FPR decreased. [en_US]
dc.language.iso: en [en_US]
dc.rights: The University of Bradford theses are licenced under a Creative Commons Licence (http://creativecommons.org/licenses/by-nc-nd/3.0/). [eng]
dc.subject: Predictive toxicology [en_US]
dc.subject: Model representation [en_US]
dc.subject: Model comparison [en_US]
dc.subject: Ensembles of models [en_US]
dc.subject: Classifiers [en_US]
dc.title: Contributions to Ensembles of Models for Predictive Toxicology Applications. On the Representation, Comparison and Combination of Models in Ensembles. [en_US]
dc.type.qualificationlevel: doctoral [en_US]
dc.publisher.institution: University of Bradford [eng]
dc.publisher.department: School of Computing, Informatics and Media [en_US]
dc.type: Thesis [eng]
dc.type.qualificationname: PhD [en_US]
dc.date.awarded: 2012
refterms.dateFOA: 2018-07-19T11:05:42Z


Item file(s)

Name: MokhairiPhDThesisFinal.pdf
Size: 1.117Mb
Format: PDF
