Bioinformatics Seminars

Current Bioinformatics Seminar

Time: 11AM Tuesdays.
Venue: Davis Auditorium and Online

3 October 2023

From peptides to proteins: missingness-informed protein quantification in bottom-up proteomics

Mengbo Li
WEHI Bioinformatics

Mass spectrometry (MS) based proteomics is a powerful tool in biomedical research, but its usefulness is limited by the frequent occurrence of missing values. We argue that missing values should always be viewed as missing not at random (MNAR) in MS-based proteomics data, because the probability of detection is related to the underlying intensity. We propose a statistical model for non-ignorable missing values in proteomics data, termed the detection probability curve (DPC). The DPC model demonstrates that missing values are informative and quantifies the bias of missing intensities relative to observed ones. We also discuss applying the DPC model to single-cell proteomics data, where over 80% of observations can be missing. Importantly, the DPC model provides a probabilistic model for the missing values and can be used to inform downstream differential expression analysis. To this end, we introduce the DPC-quantification model, in which missing values are taken into account when peptides are summarized into proteins. For each protein (or protein group), we use the DPC to represent missing values. An additive linear model is then fitted to estimate the protein-level intensity in each sample by maximizing the posterior distribution with empirical priors. Uncertainty in the protein-level estimates is incorporated into differential expression testing via a customized limma analysis analogous to voom for RNA-seq. The proposed methods are tested and evaluated on an in-house calibration dataset that I generated with our collaborators. We show that the DPC-quantification model eliminates missing values in protein-level data and improves statistical power for differential expression in proteome-wide experiments while maintaining correct control of the false discovery rate.
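To illustrate why intensity-dependent detection makes missing values informative, here is a minimal simulation sketch. It is not the speaker's implementation: the logistic form of the curve, the parameters `beta0` and `beta1`, and the simulated log-intensity distribution are all assumptions chosen for illustration only.

```python
import math
import random


def detection_probability(intensity, beta0=-8.0, beta1=0.5):
    """Hypothetical detection probability curve.

    Assumes (for illustration) a logistic relationship between the
    log-intensity and the probability that it is detected.
    """
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * intensity)))


def simulate(n=20000, mean=16.0, sd=2.0, seed=1):
    """Draw log-intensities, then drop each one with probability 1 - DPC."""
    rng = random.Random(seed)
    true_vals = [rng.gauss(mean, sd) for _ in range(n)]
    # Missing not at random: low intensities are less likely to be kept.
    observed = [y for y in true_vals if rng.random() < detection_probability(y)]
    return true_vals, observed


true_vals, observed = simulate()
true_mean = sum(true_vals) / len(true_vals)
observed_mean = sum(observed) / len(observed)
missing_fraction = 1 - len(observed) / len(true_vals)

print(f"fraction missing:      {missing_fraction:.2f}")
print(f"mean of all values:    {true_mean:.2f}")
print(f"mean of observed only: {observed_mean:.2f}")
```

Under these assumptions the observed values average higher than the full set, because low intensities are preferentially lost. This is the kind of bias the abstract says the DPC quantifies, and it is why simply ignoring missing values can distort protein-level summaries.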
