Bioinformatics Seminars



15 October 2019


Read trimming is not required for read alignment in an RNA-seq gene expression analysis

Wei Shi
WEHI Bioinformatics

RNA sequencing (RNA-seq) is currently the standard method for genome-wide profiling of gene expression. RNA-seq reads must be mapped to a reference genome before gene expression can be quantified. Read trimming tools such as Trimmomatic and TrimGalore have been developed to remove adapter sequences and low-quality bases from reads. Read aligners are also known to perform soft-clipping during mapping, which removes read bases that cannot be mapped together with the majority of bases in a read. However, it is unknown how concordant soft-clipping and read trimming are at the base level. Furthermore, it is unclear how read trimming affects the accuracy of gene expression quantification in RNA-seq data.
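For readers unfamiliar with soft-clipping: an aligner records soft-clipped bases with the S operation in the CIGAR string of each SAM/BAM record, so the bases stay in the stored read sequence but are not aligned to the reference. The sketch below is a minimal illustration, not part of the speaker's pipeline; the function name soft_clipped_ends is hypothetical. It counts how many bases were soft-clipped at each end of one alignment.

```python
import re

# CIGAR operations as defined in the SAM specification: a length followed by
# one of M, I, D, N, S, H, P, =, X.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_ends(cigar: str) -> tuple[int, int]:
    """Return (leading, trailing) soft-clipped base counts for one alignment.

    Leading/trailing refer to the read as stored in SAM (reference orientation).
    For a reverse-strand read the trailing clip is at the read's 5' end.
    Soft clips may only appear at the ends of a CIGAR, optionally outside hard
    clips (H), so hard clips are skipped before checking the end operations.
    An unavailable CIGAR ("*") yields (0, 0).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if not ops:
        return (0, 0)
    leading = ops[0][0] if ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return (leading, trailing)
```

For example, soft_clipped_ends("3S90M7S") gives (3, 7), and a spliced read such as "50M1000N40M10S" gives (0, 10). Unlike a trimming tool, the aligner decides these clip lengths from how well the bases match the reference, not from adapter sequences or quality scores.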

We used the SEQC benchmark RNA-seq dataset, together with expression data for more than 900 genes validated by RT-PCR, to investigate the differences between soft-clipping and read trimming and the impact of read trimming on gene expression quantification. We found that the Subread aligner effectively removed adapter sequences from the reads and also rescued many low-quality bases that were discarded by read trimming tools. Quantification results from untrimmed reads had comparable or slightly better accuracy than those from trimmed reads. In addition, applying read trimming increased the total quantification time by up to an order of magnitude. Our results suggest that soft-clipping performed by a read aligner, which is based on sequence matching between reads and reference sequences, is more effective than trimming performed by standalone read trimming tools, which rely on user-provided adapter sequences and sequencing quality scores.
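To make "rescued low-quality bases" concrete, the sketch below compares, for a single read, the 3' bases removed by a quality trimmer with the 3' bases soft-clipped by an aligner. It is an illustration under stated assumptions, not the analysis presented in the talk. The trimming rule is a simplified sliding-window scheme in the spirit of Trimmomatic's SLIDINGWINDOW step, not its exact implementation. The helper names, the window of 4 and the Phred threshold of 20 are hypothetical choices. The read is assumed to map to the forward strand, so that its 3' soft-clip length is the trailing S count from the CIGAR string.

```python
def sliding_window_keep_length(quals: list[int], window: int = 4,
                               min_avg: float = 20.0) -> int:
    """Simplified sliding-window quality trimming (hypothetical helper).

    Scan windows from the 5' end; at the first window whose mean Phred
    quality falls below min_avg, cut the read at the start of that window.
    Returns the number of bases kept from the 5' end.
    """
    for start in range(max(len(quals) - window + 1, 0)):
        if sum(quals[start:start + window]) / window < min_avg:
            return start
    return len(quals)


def compare_3prime(read_len: int, kept: int, clipped_3: int) -> dict[str, int]:
    """Base-level comparison of 3' trimming versus 3' soft-clipping for one read."""
    trimmed_3 = read_len - kept
    return {
        # bases discarded by both the trimmer and the aligner
        "removed_by_both": min(trimmed_3, clipped_3),
        # low-quality bases the trimmer discards but the aligner still aligns
        "trimmed_only": max(0, trimmed_3 - clipped_3),
        # bases the aligner clips although the trimmer kept them
        "clipped_only": max(0, clipped_3 - trimmed_3),
    }


# Example: 90 high-quality bases followed by 10 low-quality bases.
quals = [30] * 90 + [10] * 10
kept = sliding_window_keep_length(quals)           # 89 bases kept
print(compare_3prime(len(quals), kept, clipped_3=3))
```

With these made-up inputs the trimmer removes 11 bases while the aligner clips only 3, so 8 bases count as "trimmed_only", i.e. retained by alignment. Summing such counts over many reads is one possible way to tabulate base-level concordance between the two approaches.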

