Bioinformatics Seminars

Bioinformatics Seminar

Time: 10:45am Tuesdays.
Venue: Level 7 Seminar Room 2, WEHI1

9 April 2019

Actual Venue: Davis Auditorium

Genome-wide protein structure prediction for functional annotation of a neglected human pathogen

Brendan Ansell
WEHI Population Health & Immunity

Protein structure prediction can be a useful tool for understanding the biology of genetically divergent organisms, especially human pathogens. Protein structure is more conserved than sequence. Comparing the predicted structure of a protein of unknown function to similar solved crystal structures in the Protein Data Bank (PDB) can therefore assist with pathogen genome annotation. The quality of a single predicted structure can be assessed by manual inspection using structure viewing software and multiple sequence alignment. Now that whole-genome structural proteome prediction has become computationally feasible, how do we assign confidence in the quality of thousands of predicted structures at once?

In this project we used I-TASSER software to predict the structures of 5000 proteins encoded by the human parasite Giardia. We used discrete protein sequence annotations (Pfam codes) assigned to the peptides whose structures were predicted, and to their closest empirically determined homologues in the PDB, to bin the predicted structures into a high-confidence category (matching IPR code) or a lower-confidence category (no matching IPR codes). Continuous metrics output by I-TASSER were then used to construct a random forest model that predicted membership of the high-confidence category, yielding structural insight into ~1000 proteins, including enzymes important for antibiotic drug metabolism and redox maintenance. The classifier also produced a second tier of predicted structures that have features of the high-confidence structures but lack Pfam domains matching those of their closest crystal structure homologue (i.e., false positives).
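As a rough illustration of the binning step described above, the following sketch labels each predicted structure by whether its annotation codes overlap those of its closest PDB homologue. It is a minimal sketch under stated assumptions, not the speaker's implementation: the function name, the protein names, and the placeholder codes are all hypothetical, and "at least one shared code" is an assumed definition of a match.

```python
from typing import Dict, Set, Tuple


def bin_by_shared_annotation(query_codes: Set[str], homologue_codes: Set[str]) -> str:
    """Label a predicted structure by annotation overlap with its homologue.

    Assumption: a single shared code counts as a match. The abstract does
    not state the exact matching rule.
    """
    return "high" if query_codes & homologue_codes else "lower"


# Hypothetical toy input: protein name -> (codes for the query protein,
# codes for its closest solved homologue). The codes are placeholders,
# not real Pfam or InterPro identifiers.
predictions: Dict[str, Tuple[Set[str], Set[str]]] = {
    "protein_A": ({"code_1"}, {"code_1", "code_2"}),  # shared code
    "protein_B": ({"code_3"}, {"code_4"}),            # no shared code
    "protein_C": (set(), {"code_5"}),                 # query has no annotation
}

labels = {
    name: bin_by_shared_annotation(query, homologue)
    for name, (query, homologue) in predictions.items()
}

for name, label in labels.items():
    print(f"{name}: {label}-confidence")
```

Labels of this kind could then serve as the training target for a classifier fitted on the continuous I-TASSER metrics, as the abstract describes.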

Proteins with high-confidence models exhibited greater transcriptional abundance, and the classifier generalised across species, indicating the broad utility of this approach for automatically stratifying predicted structures. This work provides a method for assigning confidence in predicted protein structures en masse in a software-agnostic manner, and can be used to help prioritise limited resources for follow-up wet-lab experiments.

