Bioinformatics Seminar

Time: 11AM
Venue: Davis Auditorium and Teams

4 October 2022

Linear models and empirical Bayes methods for proteome-wide label-free quantification and differential expression in mass spectrometry-based proteomics experiments

Mengbo Li
WEHI Bioinformatics

Mass spectrometry-based proteomics is a powerful tool in biomedical research, but its usefulness is limited by the frequent occurrence of missing values for peptides that cannot be reliably quantified. Many analysis strategies have been proposed for missing values, and the discussion often focuses on whether values are missing completely at random (MCAR), missing at random (MAR) or missing not at random (MNAR). We argue that missing values should always be viewed as MNAR in label-free proteomics because physical missing value mechanisms cannot be identified for individual points, and because the probability of detection is related to the underlying intensity. We propose a statistical method for estimating the detection probability curve as a function of the underlying intensity, whether observed or not. The model demonstrates that missing values are informative and quantifies the bias of missing intensities compared with those that are observed. The distribution of missing intensities is estimated from the observed values at the peptide level, following which a new protein-level quantification method based on linear models is introduced. The empirical Bayes method in limma is also revised to account for the uncertainty caused by imputed values. The performance of the proposed pipeline is evaluated on real proteomics data sets with a mixture design, in which two distinct samples are mixed in known proportions. We show that the proposed method eliminates missing values in protein-level quantification and improves the statistical power for differential expression in proteome-wide experiments.
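
The claim that missing values are informative when detection depends on intensity can be illustrated with a small simulation. The sketch below is not the speaker's method: the logistic form of the detection curve, the function names and all parameter values (b0, b1, mean, sd) are assumptions chosen only for illustration.

```python
import math
import random

def detection_prob(intensity, b0=-8.0, b1=1.0):
    """Hypothetical detection probability curve (logistic form is an assumption).

    The probability of observing a value rises with the underlying intensity.
    """
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * intensity)))

def simulate(n=20000, mean=8.0, sd=2.0, seed=1):
    """Draw log-intensities and split them into observed and missing values."""
    rng = random.Random(seed)
    observed, missing = [], []
    for _ in range(n):
        y = rng.gauss(mean, sd)              # underlying log-intensity
        if rng.random() < detection_prob(y):
            observed.append(y)               # detected by the instrument
        else:
            missing.append(y)                # would be recorded as missing
    return observed, missing

observed, missing = simulate()
mean_obs = sum(observed) / len(observed)
mean_mis = sum(missing) / len(missing)

# Under this assumed model the missing intensities are lower on average,
# so ignoring them would bias estimates based on observed values upward.
print(f"mean observed intensity: {mean_obs:.2f}")
print(f"mean missing intensity:  {mean_mis:.2f}")
```

In this toy setting the gap between the two means plays the role of the bias described above: values are missing not at random because low intensities are less likely to be detected.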

The Walter and Eliza Hall Institute of Medical Research