Integrated data analytics of germline mutation classes in human cancers. An integrated bioinformatics analysis to investigate associations between germline mutation classes and human cancers.
|Tobin, Desmond J.
|Al-Shammari, Mohamad H.
|Biological and environmental factors together contribute to the development of human cancers. The primary focus of this research project was to investigate the impact of germline gene mutations, as a significant biological factor, on 29 major primary human cancers. For this, I obtained data from multiple sources, including the Genetic Association Database (GAD), the Sanger COSMIC database, the HGMD database, OMIM and the PubMed literature. Using an Extract, Transform and Load (ETL) process, 424 genes with 8,879 cancer mutation records were obtained. By integrating these gene mutation records, a Human Cancer Map (HCM) was constructed, from which several sub-maps were derived based on particular mutation classes. Furthermore, a Protein-Protein Interaction Map (PPIM) was constructed from the proteins encoded by the 424-gene set. Several key questions were addressed using the HCM and its sub-maps, including the following: (i) Are individual groups of primary cancers associated with a specific subset of genes (within the full set of 424)? (ii) Are groups of primary cancers associated with particular mutation classes? (iii) If both are true, are groups of cancers associated with particular mutation classes of target genes? This project also explored whether a corresponding PPIM, derived from the Missense/Nonsense mutation portion of the HCM gene set, would provide further information on gene associations between primary cancers in terms of the resulting identical amino acid changes. Results showed that: (1) closely connected human cancers in the HCM exhibited a strong association with a particular mutation class; (2) Missense/Nonsense and Regulatory mutations played a central role in connecting cancers (i.e. via primary nodes) and so significantly influenced the construction of the HCM; (3) genes with Missense/Nonsense and Regulatory mutations tended to be involved in cancer-associated pathways; (4) using the kappa test to measure the extent of agreement between two connected primary cancers in the sub-HCMs, BRCA1, BRCA2, PALB2, MSH2, MSH6, MLH1, CDKN2A and TP53 showed the highest agreement for 5 of 10 mutation classes; (5) from the PPIM, it was evident that the BRCA1, MSH6, BARD1, TP53, MSH2 and CHEK2 proteins best connected Breast, Ovarian, Prostate and Bowel primary cancers, and so these proteins could represent "driver proteins" for these cancers. In summary, this project has approached the analysis of gene involvement in human primary cancers from the starting position of the mutation class that harbours the specific gene mutation. Combined with the resulting downstream alterations in the encoded proteins, this analysis can provide insights into the relatedness of primary human cancers and their potential gene hierarchies. These data may therefore help us to understand more fully the etiology and diagnosis of cancer and, potentially, to inform personalized treatments.
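The abstract describes linking cancers through shared genes, deriving sub-maps per mutation class, and scoring agreement between two connected cancers with a kappa test, but does not spell out the construction. The Python sketch below illustrates one plausible reading under loudly stated assumptions: the toy records are hypothetical (not thesis data), two cancers are assumed to be connected when they share at least one gene, and Cohen's kappa is assumed to be computed over presence/absence of each gene in the two cancers. The names `gene_sets`, `build_map` and `cohen_kappa` are illustrative only.

```python
from collections import defaultdict
from itertools import combinations

# HYPOTHETICAL toy records (cancer, gene, mutation class); NOT data from the thesis.
RECORDS = [
    ("Breast", "BRCA1", "Missense/Nonsense"),
    ("Breast", "BRCA2", "Missense/Nonsense"),
    ("Breast", "TP53", "Missense/Nonsense"),
    ("Ovarian", "BRCA1", "Missense/Nonsense"),
    ("Ovarian", "BRCA2", "Missense/Nonsense"),
    ("Bowel", "MLH1", "Regulatory"),
    ("Bowel", "MSH2", "Missense/Nonsense"),
    ("Prostate", "BRCA2", "Missense/Nonsense"),
    ("Prostate", "CHEK2", "Small deletions"),
]


def gene_sets(records, mutation_class=None):
    """Map each cancer to its set of genes, optionally keeping one mutation class only (a sub-map)."""
    sets = defaultdict(set)
    for cancer, gene, mclass in records:
        if mutation_class is None or mclass == mutation_class:
            sets[cancer].add(gene)
    return dict(sets)


def build_map(records, mutation_class=None):
    """ASSUMED rule: connect two cancers if they share a gene; edge weight = number of shared genes."""
    sets = gene_sets(records, mutation_class)
    edges = {}
    for a, b in combinations(sorted(sets), 2):
        shared = sets[a] & sets[b]
        if shared:
            edges[(a, b)] = len(shared)
    return edges


def cohen_kappa(genes_a, genes_b, universe):
    """Cohen's kappa for gene presence/absence in two cancers, over a fixed gene universe."""
    n = len(universe)
    a = genes_a & universe
    b = genes_b & universe
    both = len(a & b)
    only_a = len(a - b)
    only_b = len(b - a)
    neither = n - both - only_a - only_b
    p_observed = (both + neither) / n
    p_a = (both + only_a) / n  # fraction of genes "present" for cancer A
    p_b = (both + only_b) / n  # fraction of genes "present" for cancer B
    p_expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_expected == 1:
        return float("nan")  # kappa is undefined when both codings are constant
    return (p_observed - p_expected) / (1 - p_expected)


if __name__ == "__main__":
    universe = {gene for _, gene, _ in RECORDS}
    sub = gene_sets(RECORDS, "Missense/Nonsense")
    print(build_map(RECORDS, "Missense/Nonsense"))
    print(round(cohen_kappa(sub["Breast"], sub["Ovarian"], universe), 3))
```

Restricting `build_map` to a single mutation class mirrors the idea of a sub-HCM. For real analyses, scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same statistic from two equal-length label vectors and could replace the hand-written function.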
|The University of Bradford theses are licensed under a Creative Commons Licence (CC BY-NC-ND 3.0: http://creativecommons.org/licenses/by-nc-nd/3.0/).
|University of Bradford
|School of Computing, Informatics and Media