Bioinformatics Seminar
Time: 11AM
Venue: Davis Auditorium and Online
16 September 2025
From Technical Artefacts to 'Real Biology': A Long-Read Transcriptomic Data Analysis Story
Rotem Aharon, Peter MacCallum Cancer Centre
Long-read transcriptomic data enable estimation of gene- and isoform-level expression with higher accuracy and confidence than ever before. Furthermore, Oxford Nanopore Technologies (ONT) allows the direct sequencing of RNA without reverse transcription (RT), as well as the sequencing of cDNA without PCR amplification. Previously, RNA sequencing required both RT and PCR amplification. PCR amplification is known to induce biases associated with expression level, transcript length and GC content. Biases introduced by the RT step are not as well described or quantified. To gain further insight into how different aspects of RNA preparation bias the data, we compared ONT long-read sequencing of direct RNA (dRNA), direct cDNA and PCR-amplified cDNA using five cell-line data sets from SG-NEx. Across all cell lines we identified hundreds of differentially expressed (DE) genes and isoforms, indicating a significant change in the distribution of counts between preparation protocols. Furthermore, differential transcript usage (DTU) was identified between protocols in hundreds of genes, indicating significant changes in the measured isoform proportions. Next, we focused on genes significant for DTU between dRNA and cDNA samples, with a view to understanding biases that may occur in the structure of transcripts rather than only in measured expression. We devised an approach to identify and characterise the structural differences between pairs of isoforms (such as skipped exons, intron retention and extended exons). When we compared the distribution of structural differences identified between switching isoforms with that found between pairs of isoforms selected at random from non-significant DTU genes (background), we observed clear differences in isoform features. Some of these differences occur at the transcript ends and could be associated with RNA fragmentation.
However, within the transcript body, the isoform-switching pairs contain significantly more inserted introns and fewer skipped exons in cDNA compared with the background pairs. This pattern is consistent across all the cell lines tested and could indicate biases that are prevalent within the transcript body, possibly introduced by the reverse transcription step. In experimental settings, these approaches can improve the interpretation of potential functional differences between transcripts in DTU genes. Detected structural differences can be aggregated and compared between experimental conditions to help identify specific mechanisms involved in isoform switching.
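
For readers unfamiliar with pairwise isoform comparison, the sketch below illustrates the general idea of classifying structural differences within the transcript body. It is not the speaker's implementation. It assumes each isoform is given as a list of exon intervals (start, end) on the same chromosome and strand, with half-open coordinates. The function names and the two event labels are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Assumption: an exon is a half-open (start, end) interval; both isoforms
# being compared lie on the same chromosome and strand.
Exon = Tuple[int, int]


def introns(exons: List[Exon]) -> List[Exon]:
    """Return the gaps between consecutive exons of an isoform."""
    ordered = sorted(exons)
    return [(left[1], right[0]) for left, right in zip(ordered, ordered[1:])]


def contains(outer: Exon, inner: Exon) -> bool:
    """True if the interval `inner` lies entirely within `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def structural_differences(iso_a: List[Exon], iso_b: List[Exon]) -> Dict[str, int]:
    """Count transcript-body events observed in iso_b relative to iso_a.

    Hypothetical, simplified event definitions:
      skipped_exon    - an internal exon of iso_a falls inside an intron of iso_b
      retained_intron - an intron of iso_a is fully covered by one exon of iso_b
    """
    events = {"skipped_exon": 0, "retained_intron": 0}
    introns_b = introns(iso_b)

    # Internal exons only: first and last exons are transcript ends.
    for exon in sorted(iso_a)[1:-1]:
        if any(contains(intron, exon) for intron in introns_b):
            events["skipped_exon"] += 1

    for intron in introns(iso_a):
        if any(contains(exon, intron) for exon in iso_b):
            events["retained_intron"] += 1

    return events


# Toy isoforms (made-up coordinates, not real data).
reference = [(100, 200), (300, 400), (500, 600)]
exon_skipping = [(100, 200), (500, 600)]
intron_retaining = [(100, 400), (500, 600)]

skip_result = structural_differences(reference, exon_skipping)
retain_result = structural_differences(reference, intron_retaining)
```

In this toy case, `skip_result` records one skipped exon and `retain_result` records one retained intron. Per-pair counts of this kind could then be summed over switching pairs and over randomly selected background pairs so the two distributions can be compared, as the abstract describes.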