Using random forest and decision tree models for a new vehicle prediction approach in computational toxicology
Publication date
2016-08
Keyword
Big data in toxicology
Computational toxicology
Classification
Vehicle-toxicity modelling
Area under the curve
Decision tree
Random forest
Data mining
Rights
© 2015 The authors. Published by Springer. Reproduced in accordance with the publisher's self-archiving policy.
Peer-Reviewed
Yes
Open Access status
openAccess
Abstract
Drug vehicles are chemical carriers that provide beneficial aid to the drugs they bear. Taking advantage of their favourable properties can potentially allow the safer use of drugs that are considered highly toxic. A means of selecting vehicles without experimental trials would therefore save the industry time and money. Although machine learning is increasingly used in predictive toxicology, to our knowledge there is no reported work on using machine learning techniques to model drug-vehicle relationships for vehicle selection to minimise toxicity. In this paper we demonstrate the use of data mining and machine learning techniques to process and extract data and to build models based on classifiers (decision trees and random forests) that allow us to predict which vehicle would be best suited to reducing a drug’s toxicity. Using data acquired from the National Institutes of Health’s (NIH) Developmental Therapeutics Program (DTP), we propose a methodology using an area under the curve (AUC) approach that allows us to distinguish which vehicle provides the best toxicity profile for a drug and to build classification models based on this knowledge. Our results show that we can achieve prediction accuracies of 80 % using random forest models, whilst the decision tree models produce accuracies in the 70 % region. We consider our methodology widely applicable within the scientific domain and beyond for comprehensively building classification models for the comparison of functional relationships between two variables.
Version
Published version
Citation
Mistry P, Neagu D, Trundle PR and Vessey JD (2016) Using random forest and decision tree models for a new vehicle prediction approach in computational toxicology. Journal of Soft Computing. 20(8): 2967-2979.
Link to Version of Record
https://doi.org/10.1007/s00500-015-1925-9
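
Illustrative sketch
The abstract describes using an area under the curve (AUC) comparison to decide which vehicle gives a drug the best toxicity profile, and then training classifiers on that decision. The record does not give the details, so the following is only a minimal sketch of the labelling step under stated assumptions: each vehicle has a dose-response curve of (dose, toxicity) points, the area is estimated with the trapezoidal rule, and a smaller area is taken to mean a better profile. The names `trapezoid_auc` and `best_vehicle` and the data are hypothetical and are not taken from the paper or the DTP database.

```python
from typing import Dict, List, Tuple

# A curve is a list of (dose, toxicity response) points.
Curve = List[Tuple[float, float]]


def trapezoid_auc(curve: Curve) -> float:
    """Area under a piecewise-linear curve, by the trapezoidal rule."""
    pts = sorted(curve)  # order the points by dose
    return sum(
        (x1 - x0) * (y0 + y1) / 2.0
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )


def best_vehicle(curves: Dict[str, Curve]) -> str:
    """Return the vehicle whose curve has the smallest area.

    Assumption: the response measures toxicity, so lower area is better.
    """
    return min(curves, key=lambda vehicle: trapezoid_auc(curves[vehicle]))


# Made-up illustrative data for one drug in two vehicles.
example = {
    "saline": [(0.0, 0.0), (10.0, 0.4), (20.0, 0.8)],
    "vehicle_b": [(0.0, 0.0), (10.0, 0.2), (20.0, 0.5)],
}
label = best_vehicle(example)
```

A label produced this way for each drug could then serve as the target class for a decision tree or random forest trained on drug descriptors. The comparison is only meaningful when the curves cover the same dose range, which this sketch assumes rather than checks.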