Contributions to Ensembles of Models for Predictive Toxicology Applications. On the Representation, Comparison and Combination of Models in Ensembles.

Makhtar, Mokhairi

This is not the latest version of this item. The latest version can be found here.

Publication Date

2012-10-18

Supervisor

Neagu, Daniel
Ridley, Mick J.

Keywords

Predictive toxicology, Model representation, Model comparison, Ensembles of models, Classifiers

Rights

The University of Bradford theses are licenced under a Creative Commons Licence.

Institution

University of Bradford

Department

School of Computing, Informatics and Media

Awarded

2012

Collections

Theses

Files

MokhairiPhDThesisFinal.pdf

Adobe PDF, 1.12 MB

Abstract

The increasing variety of data mining tools offers a large palette of types and representation formats for predictive models. Managing the models then becomes a major challenge, as do reusing them and keeping model and data repositories consistent. Sustainable access to these models and assessment of their quality become limited for researchers. The Data and Model Governance (DMG) approach makes it easier to process and support complex solutions. In this thesis, contributions are proposed towards ensembles of models, with a focus on model representation, comparison and usage. Predictive Toxicology was chosen as the application field to demonstrate the proposed approach to representing predictive models linked to data for DMG. Further analysis methods, namely the comparison and the combination of predictive models for reusing models from a collection, were also studied. An original structure for the pool of models, called Predictive Toxicology Markup Language (PTML), is proposed to represent predictive toxicology models. PTML offers a representation scheme for predictive toxicology data and for models generated by data mining tools. The proposed representation makes it possible to compare models and to select the relevant ones based on different performance measures, using the proposed similarity measuring techniques. The relevant models were selected using a proposed cost function that is a composite of performance measures such as Accuracy (Acc), False Negative Rate (FNR) and False Positive Rate (FPR). The cost function ensures that only quality models are selected as candidate models for an ensemble. The proposed algorithm for the optimisation and combination of Acc, FNR and FPR of ensemble models, using the double fault measure as the diversity measure, improves Acc by between 0.01 and 0.30 for all toxicology data sets compared to other ensemble methods such as Bagging, Stacking, Bayes and Boosting.
The highest improvements in Acc were for the data sets Bee (0.30), Oral Quail (0.13) and Daphnia (0.10). A small improvement (of about 0.01) in Acc was achieved for Dietary Quail and Trout. Combining all three performance measures also reduced the distance between FNR and FPR by about 0.17 to 0.28 for the Bee, Daphnia, Oral Quail and Trout data sets. For the Dietary Quail data set the improvement was only about 0.01, but this data set is well known as a difficult learning exercise. For the five UCI data sets tested, similar results were achieved, with Acc improvements between 0.10 and 0.11 and a further narrowing of the gap between FNR and FPR. In conclusion, the results show that by combining performance measures (Acc, FNR and FPR), as proposed within this thesis, Acc increased and the distance between FNR and FPR decreased.
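As an illustration of the quantities named in the abstract, the following is a minimal Python sketch of Acc, FNR and FPR for a binary classifier, a composite cost built from them, and the pairwise double fault measure (the fraction of samples that two classifiers both misclassify). The abstract does not give the exact form of the thesis's cost function, so the weighted sum used here, its weights, and all function names are assumptions made for illustration only.

```python
from typing import Sequence, Tuple


def confusion_rates(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Return (Acc, FNR, FPR) for binary labels, where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    acc = (tp + tn) / len(pairs)
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return acc, fnr, fpr


def composite_cost(acc: float, fnr: float, fpr: float,
                   w_acc: float = 1.0, w_fnr: float = 1.0, w_fpr: float = 1.0) -> float:
    """Hypothetical composite cost (lower is better).

    Assumption: a weighted sum of the error rate (1 - Acc), FNR and FPR.
    This is not the cost function defined in the thesis.
    """
    return w_acc * (1.0 - acc) + w_fnr * fnr + w_fpr * fpr


def double_fault(y_true: Sequence[int], pred_a: Sequence[int], pred_b: Sequence[int]) -> float:
    """Fraction of samples misclassified by both classifiers (lower means more diverse)."""
    both_wrong = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b != t)
    return both_wrong / len(y_true)


# Toy labels and predictions, invented for this example.
y_true = [1, 1, 0, 0]
model_a = [1, 0, 0, 1]
model_b = [0, 0, 0, 0]

acc_a, fnr_a, fpr_a = confusion_rates(y_true, model_a)
cost_a = composite_cost(acc_a, fnr_a, fpr_a)
df_ab = double_fault(y_true, model_a, model_b)
```

Under these assumptions, a model with a low composite cost would be a candidate for the ensemble, and pairs of candidates with a low double fault value would be preferred because they tend not to fail on the same samples.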

URI

http://hdl.handle.net/10454/5478

Type

Thesis

Qualification name

PhD

Version History

You are currently viewing version 1 of the item.

Version	Date	Summary
2	2025-04-09 09:09:43	Edited author entries
1*	2012-10-18 16:31:06

* Selected version
