Bioinformatics Seminars

Time: 11AM
Venue: Hybrid

23 May 2023

Extending edgeR for small counts and large sample sizes

Lizhong Chen
WEHI Bioinformatics

edgeR is an R package for analyzing sequence read count data from genomic sequencing technologies such as RNA-seq, ChIP-seq and ATAC-seq, using negative binomial generalized linear models. The quasi-likelihood differential expression pipeline of edgeR has been recommended for the past decade or so because it provides the most rigorous FDR control. Quasi-likelihood, however, relies on a classic chi-square approximation for the generalized linear model deviance, which does not hold well when particular genes and treatment groups have a preponderance of very small counts, leading to small fitted values. In this talk I develop an adjusted deviance that follows an accurate chi-square approximation even when the counts are small. I show that quasi-dispersion estimation is improved using the adjusted deviance, providing a better estimate of the mean-variance relation and a new edgeR quasi-likelihood pipeline. The new pipeline agrees with the current edgeR approach for bulk RNA-seq datasets when low counts are filtered, but it proves very accurate for a much wider range of datasets, including small-count datasets without low-count filtering, highly over-dispersed count datasets such as mini-bulk RNA-seq and transcript count datasets. The new method has the potential to analyze any sequencing data with small counts, including single-cell RNA-seq.
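As a rough illustration of why the chi-square approximation struggles at small fitted values, the sketch below computes the exact expected unit deviance of a single count. It makes a simplifying assumption: it uses a Poisson model (the zero-dispersion limit of the negative binomial) rather than edgeR's negative binomial model, and it does not implement the adjusted deviance from the talk. The function names are hypothetical. A chi-square variable on 1 degree of freedom has mean 1, so an expected deviance far from 1 signals a poor approximation.

```python
import math


def poisson_unit_deviance(y, mu):
    """Unit deviance of a Poisson count y with fitted mean mu (y*log(y) taken as 0 at y = 0)."""
    term = y * math.log(y / mu) if y > 0 else 0.0
    return 2.0 * (term - (y - mu))


def expected_unit_deviance(mu):
    """Exact expectation of the unit deviance under Poisson(mu), by summing over the pmf."""
    # Sum far enough into the upper tail that the remaining probability mass is negligible.
    upper = int(mu + 20 * math.sqrt(mu) + 50)
    total = 0.0
    for y in range(upper + 1):
        log_pmf = -mu + y * math.log(mu) - math.lgamma(y + 1)
        total += math.exp(log_pmf) * poisson_unit_deviance(y, mu)
    return total


# Compare a very small, a moderate and a large fitted value with the chi-square mean of 1.
for mu in (0.1, 1.0, 50.0):
    print(f"mu = {mu:5.1f}  expected unit deviance = {expected_unit_deviance(mu):.3f}")
```

Under this simplified model the expected deviance is close to 1 for the large fitted value but falls well below 1 for the smallest, which is the kind of discrepancy an adjusted deviance aims to correct.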