Bioinformatics Seminar

Time:
Venue: Na

9 April 2019

Genome-wide protein structure prediction for functional annotation of a neglected human pathogen

Brendan Ansell
WEHI Population Health & Immunity

Protein structure prediction can be a useful tool for understanding the biology of genetically divergent organisms, especially human pathogens. Because protein structure is more conserved than sequence, comparing the predicted structure of a protein of unknown function with similar solved crystal structures in the Protein Data Bank (PDB) can assist with pathogen genome annotation. The quality of a single predicted structure can be assessed by manual inspection using structure-viewing software and multiple sequence alignment. Now that whole-genome structural proteome prediction has become computationally feasible, how do we assign confidence to the quality of thousands of predicted structures at once?

In this project we used the I-TASSER software to predict the structures of 5000 proteins encoded by the human parasite Giardia. We used discrete protein sequence annotations (Pfam codes), assigned both to the peptides underlying the predicted structures and to their closest empirically determined homologues in the PDB, to bin the predicted structures into a high-confidence category (matching IPR codes) or a lower-confidence category (i.e., no matching IPR codes). Continuous metrics output by I-TASSER were used to construct a random forest model that predicted the high-confidence category, yielding structural insight into ~1000 proteins, including enzymes important for antibiotic drug metabolism and redox maintenance. The classifier also produced a second tier of predicted structures that have features of the high-confidence structures but lack matching Pfam domains with their closest crystal structure homologue (i.e., false positives).
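The labelling and classification steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the project. The function names are hypothetical. Each model is assumed to carry a set of annotation codes for the query protein and for its closest PDB homologue. Each feature row is assumed to hold per-model metrics such as I-TASSER's C-score and estimated TM-score; the exact columns are an assumption. To keep the example dependency-free, it uses a small hand-written random forest instead of a library implementation. That forest combines bootstrap resampling, a random feature subset at each split and depth-limited Gini trees.

```python
import random
from collections import Counter


def label_by_shared_codes(query_codes, homologue_codes):
    """Hypothetical labelling rule: 1 = high confidence (query protein and its
    closest PDB homologue share at least one annotation code), 0 = lower."""
    return 1 if set(query_codes) & set(homologue_codes) else 0


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Return (weighted impurity, feature, threshold) with the lowest weighted
    Gini impurity over the given features, or None if no valid split exists."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(X, y, depth, max_depth, n_sub, rng):
    """Grow a depth-limited tree. Leaves are class labels; inner nodes are
    (feature, threshold, left_subtree, right_subtree) tuples."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    # Consider only a random subset of features at each node, as in a random forest.
    split = best_split(X, y, rng.sample(range(len(X[0])), n_sub))
    if split is None:
        return majority
    _, f, t = split
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t,
            build_tree([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth, n_sub, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth, n_sub, rng))


def predict_tree(node, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def train_forest(X, y, n_trees=25, max_depth=3, seed=0):
    """Fit each tree on a bootstrap resample of the rows. Each row holds assumed
    per-model metrics, e.g. [C-score, estimated TM-score, TM-score to PDB hit]."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(X[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_sub, rng))
    return forest


def predict_proba(forest, row):
    """Fraction of trees voting 'high confidence' (label 1)."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)
```

In use, each model would first be labelled with `label_by_shared_codes`. The forest would then be trained on the models' metrics, and `predict_proba` applied to every model. Models that score as high confidence but carry label 0 would correspond to the second tier of structures described above. In practice a tested library implementation would normally replace the hand-written forest.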

Proteins with high-confidence models exhibited greater transcriptional abundance, and the classifier generalized across species, indicating the broad utility of this approach for automatically stratifying predicted structures. This work provides a method for assigning confidence to predicted protein structures en masse in a software-agnostic manner, and can be used to help prioritise limited resources for follow-up wet-lab experiments.

Actual venue: Davis Auditorium

The Walter and Eliza Hall Institute of Medical Research