Supplementary Information for

Differential expression analysis of complex RNA-seq experiments using edgeR

Yunshun Chen, Aaron T.L. Lun and Gordon K. Smyth

Statistical Analysis of Next Generation Sequence Data, Springer, New York, pages 51-74
Preprint 31 January 2014

Summarized counts for the RNA-seq data

This page provides the summarized count data used for the case study in the book chapter by Chen et al. (Preprint 31 January 2014). The data are from a study of the transcription factor IRF4 by Man et al. (PubMed). The raw sequence reads are available either in SRA format from the Gene Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo) as series GSE49929 or in FastQ format from the European Nucleotide Archive (ENA) (http://www.ebi.ac.uk/ena) as series SRP028864. There are 11 samples in total, of which the first 9 are used in the analysis.

The study can be viewed as a 2x2 factorial experiment with 2-3 replicates for each combination of IRF4 and affinity peptide conditions. The target information is provided below:

An Illumina HiSeq 2000 was used to create a FastQ file of 100 bp paired-end sequence reads for each sample. Library sizes ranged from 4.7 to 7.3 million across samples. To obtain gene-level counts, fragments were mapped to the mm10 mouse genome using the Subread aligner, and fragment counts were summarized by Entrez Gene ID using the featureCounts function of the Bioconductor package Rsubread:
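
As a rough illustration of the mapping and counting steps described above, a minimal Rsubread sketch might look like the following. It is not the authors' exact script: the index basename, the reference FASTA path, and the FastQ and BAM file names are placeholders, and the argument choices (in-built mm10 annotation, paired-end counting) are assumptions based on the description in the text.

```r
# Sketch only: file names and the index basename are placeholders,
# not the files used by the authors.
library(Rsubread)

# Build a Subread index for the mm10 genome (one-off step).
# "mm10.fa" is an assumed path to the reference FASTA file.
buildindex(basename = "mm10", reference = "mm10.fa")

# Hypothetical paired-end FastQ files and output BAM for one sample.
fastq1 <- "sample1_R1.fastq.gz"
fastq2 <- "sample1_R2.fastq.gz"
bam    <- "sample1.bam"

# Map read pairs (fragments) to mm10 with the Subread aligner.
align(index = "mm10", readfile1 = fastq1, readfile2 = fastq2,
      output_file = bam)

# Summarize fragments per gene using the in-built mm10 annotation,
# whose gene identifiers are Entrez Gene IDs.
fc <- featureCounts(files = bam, annot.inbuilt = "mm10",
                    isPairedEnd = TRUE)

# fc$counts is a genes-by-samples matrix of fragment counts.
head(fc$counts)
```

In practice, vectors of file names (one entry per sample) can be passed to align and featureCounts so that all samples are processed in one call, giving a count matrix with one column per sample.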

 


Comments/Questions? Contact Yunshun Chen.
Last modified: 3 July 2016