Bioinformatics Seminars

Time: 10:45am Tuesdays.
Venue:
Level 7 Seminar Room 2, WEHI1

24 April 2018

Multi-scale approaches for analyses of high-throughput sequencing data

Heejung Shim
The University of Melbourne


Identifying differences between multiple groups in molecular and cellular phenotypes measured by high-throughput sequencing assays is a frequent task in genomics applications. Common problems include identifying genetic variants associated with gene expression using RNA-seq data, and detecting differences in chromatin accessibility across tissues or conditions using DNase-seq or ATAC-seq data. These high-throughput sequencing data provide high-resolution measurements of how traits vary along the whole genome in each sample. However, typical analyses fail to exploit the full potential of these high-resolution measurements, instead aggregating the data at coarser resolutions, such as genes or windows of fixed length.

In this talk, I will present two multi-scale methods that more fully exploit the high-resolution data. First, I will introduce a wavelet-based multi-scale method, WaveQTL, and demonstrate that WaveQTL has more power than simpler window-based approaches for identifying genetic variants associated with chromatin accessibility. I will also illustrate how the estimated shape of the genotype effect can help in understanding the potential mechanisms underlying the identified associations. The second part will discuss potential limitations of WaveQTL in analyses of data sets with small sample sizes or low sequencing depths. To address these issues, I will present another multi-scale approach, multiseq, that models the count nature of the sequencing data directly using multi-scale models for inhomogeneous Poisson processes. Applying multiseq to ATAC-seq data measured on three copper-treated and three control samples to detect differences in chromatin accessibility, I will show that multiseq performs well at small sample sizes compared to WaveQTL. Finally, I will briefly discuss applications of the multi-scale approaches to analyses of other types of high-throughput sequencing data, such as CAGE-seq, RNA-seq, and Hi-C data.

In brief, the main advantage of the multi-scale methods (WaveQTL and multiseq) is that, because of their multi-scale nature, they easily capture signals that vary in scale (narrow or broad), whereas window-based methods are well powered to detect signals that occur on a single scale determined by the length of the window.
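
To make the idea of "scale" concrete, here is a minimal sketch of a multi-scale decomposition of read counts. It is not the WaveQTL or multiseq implementation: it assumes an unnormalised Haar-style transform for simplicity, and the function name haar_multiscale and the read counts are hypothetical, chosen only for illustration.

```python
def haar_multiscale(counts):
    """Unnormalised Haar-style decomposition (illustrative assumption only).

    counts: list of numbers whose length is a power of two.
    Returns (total, details), where details[0] is the coarsest scale
    and details[-1] is the finest scale.
    """
    n = len(counts)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    sums = list(counts)
    details = []
    while len(sums) > 1:
        left, right = sums[0::2], sums[1::2]
        # Difference between the two halves of each block at this scale.
        details.append([l - r for l, r in zip(left, right)])
        # Aggregate each pair of neighbours to move to the next coarser scale.
        sums = [l + r for l, r in zip(left, right)]
    details.reverse()  # coarsest scale first
    return sums[0], details


# Hypothetical per-base read counts over an 8-bp region.
flat = [2, 2, 2, 2, 2, 2, 2, 2]
peak = [2, 2, 2, 6, 2, 2, 2, 2]  # narrow signal at position 3 (0-based)

total_flat, details_flat = haar_multiscale(flat)
total_peak, details_peak = haar_multiscale(peak)

# details_flat is [[0], [0, 0], [0, 0, 0, 0]]
# details_peak is [[4], [-4, 0], [0, -4, 0, 0]]
```

A single 8-bp window would summarise the two profiles only by their totals (16 and 20), whereas the coefficients at the finer scales also show where the difference lies and how narrow it is.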
