Bioinformatics Seminars

Time: 11AM
Venue: Davis Auditorium and Teams

23 August 2022

edgeR quasi-likelihood: correcting the deviance for bias

Lizhong Chen
WEHI Bioinformatics

edgeR is an R package for analyzing sequence read count data from genomic sequencing technologies such as RNA-seq, ChIP-seq and ATAC-seq, using negative binomial generalized linear models. The quasi-likelihood differential expression pipeline of edgeR is recommended because it provides the most rigorous FDR control. The quasi-likelihood pipeline has trouble, however, when particular genes and treatment groups have a preponderance of very small counts, which leads to small fitted values. In this talk, I will show that the generalized linear model deviance underestimates the true variability in the data when the counts are small. I will then develop an adjusted deviance that follows an accurate chi-square approximation even when the counts are small. The adjusted deviance leads to a new edgeR quasi-likelihood pipeline. The new pipeline agrees with the current edgeR approach for bulk RNA-seq datasets when low counts are filtered, but it proves very accurate for a much wider range of datasets, including small count datasets, even without low count filtering. The new method has the potential to analyze any sequencing data with small counts, including ChIP-seq and single-cell RNA-seq.
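The claim that the deviance underestimates variability for small counts can be illustrated with a small numerical sketch. This is not the adjusted deviance from the talk, and it is not edgeR code: it assumes a plain Poisson model (the negative binomial with zero dispersion) and a known mean, and the function names are hypothetical. It computes the exact expected value of the Poisson unit deviance, which the chi-square approximation on one degree of freedom would put at 1.

```python
import math


def poisson_unit_deviance(y, mu):
    """Unit deviance of a Poisson count y with mean mu (assumed known)."""
    # y * log(y / mu) is taken as 0 when y == 0, its limiting value.
    term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (term - (y - mu))


def expected_unit_deviance(mu):
    """Expected unit deviance for Y ~ Poisson(mu), summed over the pmf."""
    # Upper limit chosen generously so the neglected tail is negligible.
    upper = int(mu + 20.0 * math.sqrt(mu) + 50.0)
    total = 0.0
    for y in range(upper + 1):
        # Poisson pmf computed on the log scale for numerical stability.
        log_pmf = -mu + y * math.log(mu) - math.lgamma(y + 1)
        total += math.exp(log_pmf) * poisson_unit_deviance(y, mu)
    return total


if __name__ == "__main__":
    for mu in (0.1, 0.5, 5.0, 50.0):
        print(f"mu = {mu:5.1f}  E[deviance] = {expected_unit_deviance(mu):.4f}")
```

Under these assumptions the expected unit deviance is close to 1 for a mean of 50 but falls well below 1 for a mean of 0.1 (about 0.47), so treating it as chi-square on one degree of freedom would understate the variability of very small counts.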
