Bioinformatics Seminar

Time: 11AM
Venue: Davis Auditorium and Online

27 February 2024

Benchmarking long-read & spatial transcriptomic technologies and analysis tools

Matt Ritchie
WEHI Epigenetics and Development

This talk will cover our efforts to generate and analyse custom benchmarking datasets for long-read RNA-seq (LongBench) and spatial transcriptomics (SpatialBench). In both areas, a lack of benchmark datasets with inbuilt ground truth makes it challenging to compare the performance of existing workflows for long-read isoform detection, differential expression and spatial analysis. Our LongBench experiment profiled two human lung adenocarcinoma cell lines in triplicate, together with synthetic spliced spike-in RNAs (sequins). Samples were deeply sequenced on both Illumina short-read and Oxford Nanopore Technologies long-read platforms. Alongside the ground truth provided by the sequins, we created in silico mixture samples to allow performance assessment in the absence of known true positives or true negatives. Our results show that StringTie2 and bambu outperformed the remaining tools among the six isoform detection tools tested, and that DESeq2, edgeR and limma-voom performed best among the five differential transcript expression tools tested. For differential transcript usage analysis, however, there was no clear front-runner among the five tools compared, which suggests that further methods development is needed for this application.

Our SpatialBench experiment was created to compare the performance of the different Visium spatial transcriptomic platforms offered by 10x Genomics. This reference dataset profiled mouse spleen tissue responding to malaria infection across several tissue preparation protocols (fresh frozen and FFPE samples, each with and without CytAssist tissue placement), with replicates of each. We use these data to benchmark the different sample handling approaches after pre-processing, and then carry out downstream analysis with the best-performing method to explore spatial gene expression patterns and identify cell types. We explore the results of clustering and of cell deconvolution using matched single-cell RNA-seq data, and use the replicate samples to conduct multi-sample analyses that recover the expected immune cell types and biological changes between experimental groups.

These datasets and analyses are publicly available, and we hope they will be of broader benefit to researchers interested in benchmarking and methods development.

The Walter and Eliza Hall Institute of Medical Research