Supplementary Information:
Why weight? Modelling sample and observational level variability improves power in RNA-seq analyses

Ruijie Liu^1, Aliaksei Z. Holik^2,4, Shian Su^1, Natasha Jansz^1,4, Kelan Chen^1,4, Huei San Leong^1,4,
Marnie E. Blewitt^1,4, Marie-Liesse Asselin-Labat^2,4, Gordon K. Smyth^3,5 and Matthew E. Ritchie^1,4,5

1. Molecular Medicine Division,
2. Stem Cells and Cancer Division,
3. Bioinformatics Division, The Walter and Eliza Hall Institute of Medical Research.
4. Department of Medical Biology,
5. School of Mathematics and Statistics, The University of Melbourne.

Software

The voomWithQualityWeights function, which implements the method described in this paper, is available in the limma package.
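As a rough illustration of how the function might be called, the sketch below runs voomWithQualityWeights on a small toy count matrix and passes the result to the usual limma linear-model steps. The toy counts, the group labels and the object names (counts, group, design, v, fit) are assumptions made up for this example and are not taken from the paper's data; library sizes are left at the default (column sums) and no extra normalisation is applied.

```r
# Requires the Bioconductor package limma.
library(limma)

set.seed(1)

# Toy data (assumption): 1000 genes, 2 groups of 3 samples each.
ngenes <- 1000
group  <- factor(rep(c("A", "B"), each = 3))
counts <- matrix(rnbinom(ngenes * 6, mu = 100, size = 5), nrow = ngenes)
rownames(counts) <- paste0("gene", seq_len(ngenes))
colnames(counts) <- paste0("sample", 1:6)

# Design matrix: intercept plus a group B vs group A effect.
design <- model.matrix(~ group)

# Combine voom observation-level weights with sample-level quality weights.
v <- voomWithQualityWeights(counts, design = design, plot = FALSE)

# Standard limma pipeline on the weighted log-CPM values.
fit <- lmFit(v, design)
fit <- eBayes(fit)
topTable(fit, coef = 2)
```

In a real analysis the counts would typically be filtered and normalised first; setting plot = TRUE displays the mean-variance trend together with the estimated sample weights.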

Simulation R code and Additional Figures

The code to simulate RNA-seq data with outlier samples and a range of true fold-changes, and to generate the accompanying figures, is provided below.

Note that running these simulations may take a few days: data for more than 500 million genes across 51,100 experiments were simulated.
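To give a feel for the kind of data being simulated without running the full scripts, here is a much smaller sketch in base R. It is not the simulation code linked on this page: it simply draws negative binomial counts for 2 groups of 3 samples and makes one sample more variable by inflating its dispersion. All parameter values and names (ngenes, n_de, logfc, bcv, inflate, outlier) are hypothetical choices for illustration.

```r
set.seed(2015)

ngenes  <- 2000                     # illustrative; far fewer than the real simulations
group   <- rep(c(1, 2), each = 3)   # 2 groups, 3 samples per group
nsamp   <- length(group)
n_de    <- 100                      # hypothetical number of truly DE genes
logfc   <- 1                        # hypothetical true log2 fold-change
bcv     <- 0.2                      # hypothetical baseline biological CV
inflate <- 3                        # hypothetical dispersion multiplier for the outlier
outlier <- 3                        # index of the more variable sample

# Baseline expected count for each gene (assumed gamma distribution, mean 200).
base_mu <- rgamma(ngenes, shape = 2, rate = 0.01)

# Expected counts (genes x samples); the first n_de genes are up in group 2.
mu <- matrix(base_mu, nrow = ngenes, ncol = nsamp)
de <- seq_len(n_de)
mu[de, group == 2] <- mu[de, group == 2] * 2^logfc

# Per-sample dispersion: the outlier sample gets an inflated value.
disp <- rep(bcv^2, nsamp)
disp[outlier] <- disp[outlier] * inflate

# Negative binomial counts with variance mu + disp * mu^2 (size = 1 / disp).
counts <- sapply(seq_len(nsamp), function(j) {
  rnbinom(ngenes, mu = mu[, j], size = 1 / disp[j])
})
colnames(counts) <- paste0("sample", seq_len(nsamp))

# Spread of each sample's log-ratio to its expected value;
# the outlier sample should show the largest spread.
spread <- apply(log2((counts + 0.5) / (mu + 0.5)), 2, sd)
round(spread, 2)
```

A count matrix generated this way could be passed to voomWithQualityWeights to check whether the more variable sample receives a lower weight.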

2 groups, 3 samples per group, 1 more variable sample: [R code] [Supp Figures] (merge of Figures 5 and 6 with boxplots) [FDR < 0.05] (Figure 6 using an FDR cut-off of 0.05) [Figure 6 with boxplots to show distributions of simulation results rather than averages]

2 groups, 4 samples per group, 1 more variable sample: [R code] [Supp Figures]

2 groups, 5 samples per group, 1 more variable sample: [R code] [Supp Figures]

2 groups, 4 samples in group 1, 3 samples in group 2, 1 more variable sample per group: [R code] [Supp Figures]

[R code to make fold-change and MDS plots] (Figures 3 and 4 in paper)

[R code to make results Figures from paper]

[R code to make Supp Figures] (from simulation settings not shown in the main paper)

[Counts used to start simulations]

Control Experiment

The full data set is available as GEO series GSE64098.

The summarised counts are available as an R object from here.

The code to analyse this data is available from here, and a PDF of the output can be found here. The PDF was compiled using the knitr package.

Smchd1 Experiment

The full data set is available as GEO series GSE64099.

The summarised counts are available as an R object from here and the list of imprinted genes from here.