This web page provides the read count tables and sample profiles for two public data sets: the Pickrell data set [1] and the Montgomery data set [2], both published in Nature in 2010.

Data sets

The two data sets were downloaded from public data repositories: the Pickrell data set was downloaded from the NCBI website, and the Montgomery data set was downloaded from DNAnexus.com.

Both data sets contain RNA-Seq data. The Pickrell data set contains 161 libraries derived from 69 samples, with two or three replicates per sample. The Montgomery data set contains 60 libraries, each from a different sample. For Pickrell samples with more than two replicates, we used only two of the replicates.
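The replicate selection described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the page does not say which two replicates were kept, so the sketch assumes the first two in listing order, and the library and sample identifiers are hypothetical.

```python
from collections import defaultdict


def select_replicates(libraries, max_per_sample=2):
    """Keep at most `max_per_sample` libraries per sample, in input order.

    `libraries` is an iterable of (library_id, sample_id) pairs.
    Assumption: the first libraries listed for a sample are the ones kept.
    """
    kept = defaultdict(list)
    for library_id, sample_id in libraries:
        if len(kept[sample_id]) < max_per_sample:
            kept[sample_id].append(library_id)
    return dict(kept)


# Hypothetical identifiers, for illustration only.
libraries = [
    ("lib_a1", "sample_A"),
    ("lib_a2", "sample_A"),
    ("lib_a3", "sample_A"),  # third replicate, dropped
    ("lib_b1", "sample_B"),
    ("lib_b2", "sample_B"),
]
selected = select_replicates(libraries)
print(selected)
```

Applied to the Pickrell data set, a rule of this kind would leave two libraries for every sample.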

Read alignment and summarization

We used our own tools, including Subread [3] and featureCounts [4], to perform read alignment and summarization. The Montgomery data set contains paired-end reads, but we used only the first FASTQ file of each pair.
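Treating the paired-end data as single-end amounts to passing only the first file of each pair to the aligner. A minimal sketch of that file selection is shown here; the `_1.fastq` / `_2.fastq` naming scheme and the file names are assumptions made for illustration, since the page does not describe how the downloaded files were named.

```python
def first_mate_files(filenames, first_suffix="_1.fastq"):
    """Return only the first-mate FASTQ files, sorted by name.

    Assumption: first-mate files end with `first_suffix` (hypothetical naming).
    """
    return sorted(name for name in filenames if name.endswith(first_suffix))


# Hypothetical file names, for illustration only.
files = ["run_X_1.fastq", "run_X_2.fastq", "run_Y_1.fastq", "run_Y_2.fastq"]
single_end_inputs = first_mate_files(files)
print(single_end_inputs)
```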

The reference genome used in read alignment was the GRCh37/hg19 human genome, and the annotation used in read summarization was NCBI RefSeq Build 37.2. Both the aligner and the read summarization tool came from the subread-1.3.5-p5 package. The Subread package can be downloaded from here. Reads that could be mapped to multiple locations with the same confidence level were excluded, and the Hamming distances between the reads and the reference genome were used to determine the best mapping locations.
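The idea of choosing a best location by Hamming distance and discarding ambiguous reads can be illustrated with a small sketch. This is not Subread's implementation: it simplifies "same confidence level" to "same Hamming distance", and the function names, positions, and sequences are hypothetical.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_unique_location(read, candidates):
    """Pick the candidate position with the smallest Hamming distance.

    `candidates` maps a position to the reference substring at that position.
    Returns None when there are no candidates or when two or more candidates
    tie for the smallest distance (the read is treated as ambiguous).
    """
    if not candidates:
        return None
    distances = {pos: hamming_distance(read, ref) for pos, ref in candidates.items()}
    best = min(distances.values())
    winners = [pos for pos, dist in distances.items() if dist == best]
    return winners[0] if len(winners) == 1 else None


read = "ACGTAC"
# One candidate matches exactly, so it wins.
unique = best_unique_location(read, {100: "ACGTAC", 250: "ACGTTT"})
# Both candidates differ at one base, so the read is excluded.
tied = best_unique_location(read, {100: "ACGTAA", 250: "ACGTAG"})
print(unique, tied)
```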

Download the tables

  • Click here to download the read count table for the Montgomery data set. (1.5 Mb)
  • Click here to download the read count table for the Pickrell data set. (3.2 Mb)
  • Click here to download the sample profile table. (7 Kb)

References

  1. Pickrell JK, Marioni JC, Pai AA, Degner JF, Engelhardt BE, Nkadori E, Veyrieras JB, Stephens M, Gilad Y, Pritchard JK. Understanding mechanisms underlying human gene expression variation with RNA sequencing. Nature. 2010 Apr 1;464(7289):768-72 [PMID:20220758]
  2. Montgomery SB, Sammeth M, Gutierrez-Arcelus M, Lach RP, Ingle C, Nisbett J, Guigo R, Dermitzakis ET. Transcriptome genetics using second generation sequencing in a Caucasian population. Nature. 2010 Apr 1;464(7289):773-7 [PMID:20220756]
  3. Liao Y, Smyth GK and Shi W. The Subread aligner: fast, accurate and scalable read mapping by seed-and-vote. Nucleic Acids Research, 41(10):e108, 2013 [PMID:23558742]
  4. Liao Y, Smyth GK and Shi W. featureCounts: an efficient general-purpose program for assigning sequence reads to genomic features. Bioinformatics, 2013 Nov 30. doi:10.1093/bioinformatics/btt656. Advance Access [PMID:24227677]