Publication

Prediction of Large for Gestational Age Infants in Ethnically Diverse Datasets Using Machine Learning Techniques. Development of 3rd Trimester Machine Learning Prediction Models and Identification of Important Features Using Dimensionality Reduction Techniques

Sabouni, Sumaia
Publication Date
2023
Rights
Creative Commons License
University of Bradford theses are licensed under a Creative Commons Licence.
Peer-Reviewed
Open Access status
Accepted for publication
Institution
University of Bradford
Department
School of Chemistry & Biosciences. Faculty of Life Sciences
Awarded
2023
Abstract
Background: Large-for-gestational-age (LGA) is a common pregnancy complication associated with high maternal BMI and diabetes. Despite its gravity, standard prediction methods, such as ultrasound, are inaccurate.
Objective: To apply machine learning methods to develop LGA prediction models for ethnically diverse datasets and to provide a benchmark for future LGA prediction work.
Methods: Two retrospective datasets were used, Born In Bradford (BiB) and NHS, each including a large percentage of women of South Asian ethnicity. After appropriate data preparation, LGA classification models were developed and imbalanced learning strategies were applied. Additionally, data reduction was used to identify and report the important features within the datasets.
Results: Baseline BiB models achieved 9% sensitivity, 56% precision, and 26% F0.5; BiB-GDM models (dataset containing only women with gestational diabetes mellitus, GDM) achieved 41% sensitivity, 60% precision, and 55% F0.5. Applying random undersampling increased sensitivity to 72% for BiB models and 80% for BiB-GDM models. Cost-sensitive learning methods achieved an F0.5 score of 36% for BiB models and 57% for BiB-GDM models. Threshold tuning improved the models' F0.5 scores to 47% and 66% for BiB and BiB-GDM, respectively. Using data reduction, the important features in the BiB-GDM dataset were the minimum and maximum Abdominal Circumference (AC) and Estimated Foetal Weight (EFW) ultrasound measurements, whereas in the NHS dataset they were the mean and the proportion of high Blood Glucose (BG) measurements.
Conclusions: Machine learning algorithms are not superior to Logistic Regression in LGA predictive performance. Threshold tuning was an appropriate method for handling data imbalance and maximising F0.5 scores. Finally, ultrasound measurements and BG self-monitoring data were important features in LGA prediction.
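For readers unfamiliar with the F0.5 metric and threshold tuning named in the abstract, the following is a minimal illustrative sketch in plain Python. It is not the thesis's implementation: the function names, the grid of candidate thresholds, and the toy labels and probabilities are all assumptions made for illustration. F-beta with beta = 0.5 weights precision more heavily than sensitivity (recall), and threshold tuning here simply picks the probability cut-off that maximises that score.

```python
def f_beta(precision, recall, beta=0.5):
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def precision_recall(y_true, y_prob, threshold):
    """Precision and recall when predicting positive at prob >= threshold."""
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for t, p in pairs if p >= threshold and t == 1)
    fp = sum(1 for t, p in pairs if p >= threshold and t == 0)
    fn = sum(1 for t, p in pairs if p < threshold and t == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def tune_threshold(y_true, y_prob, beta=0.5):
    """Return (threshold, score) maximising F-beta over a fixed grid.

    The grid 0.01..0.99 is an assumption made for this sketch.
    """
    candidates = [i / 100 for i in range(1, 100)]

    def score_at(t):
        return f_beta(*precision_recall(y_true, y_prob, t), beta)

    best = max(candidates, key=score_at)
    return best, score_at(best)


# Toy, made-up labels (1 = LGA) and predicted probabilities; not thesis data.
y_true = [0, 0, 0, 0, 0, 0, 1, 0, 1, 1]
y_prob = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.45, 0.60, 0.80]
threshold, score = tune_threshold(y_true, y_prob)
```

As a check against the reported figures, `f_beta(0.60, 0.41)` evaluates to about 0.55, which agrees with the BiB-GDM baseline of 60% precision, 41% sensitivity, and 55% F0.5. In practice the threshold would be selected on held-out validation data rather than on the data used for the final evaluation.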
Type
Thesis
Qualification name
PhD
Version History

Version | Date | Summary
2* | 2025-04-08 09:25:41 | Edited author entries
1 | 2024-04-24 14:33:42 |
* Selected version