Current Bioinformatics Seminar
Time: 11AM Tuesdays.
Venue: Davis Auditorium and Online
24 March 2026
This is a WEHI-only event.
SVEnsemble2: a Machine Learning-based integrative framework for detection of structural variation from high throughput sequencing data
Vladimir Shikov, WEHI
Structural variants (SVs) are large-scale chromosomal changes, including insertions, deletions, duplications, inversions and translocations, as well as large-scale catastrophic rearrangements of chromosomes. Despite their prominent biological role, SVs remain understudied because they are difficult to call. Many SV calling approaches have been developed for both short-read (NGS) and long-read data, drawing on patterns in read alignments or on the construction of novel contigs. Different methods integrate these signals in different ways, but no method is perfect: performance varies greatly between datasets, and most tools are inconsistent across SV types (e.g., deletions versus insertions). Combining different algorithms into ensembles can leverage their individual strengths, but current ensemble callers rely on ad hoc “n of m” rules or are fine-tuned on specific samples. Here we present SVEnsemble2, a novel tool for SV detection that uses a Positive-Unlabeled (PU) machine learning model to evaluate and merge results from multiple SV callers. By adapting the model to each individual sample, SVEnsemble2 can improve the consistency of SV calling across samples and achieve better overall SV detection performance. Using several high-confidence validation SV sets, we benchmark multiple short- and long-read tools and show that SVEnsemble2 can improve the standalone outputs of popular SV callers, achieving higher AUROC and AUPRC than their native quality metrics. We also demonstrate SVEnsemble2's potential for merging several SV call sets and its advantages over currently used consensus methods.
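For readers unfamiliar with the “n of m” consensus that SVEnsemble2 is contrasted with, the Python sketch below shows one generic version. Calls of the same type on the same chromosome are grouped when they share at least 50% reciprocal overlap, and a group is kept if at least n distinct callers support it. The `SVCall` class, the greedy clustering, the 0.5 threshold and the caller names are illustrative assumptions. This is not SVEnsemble2's code, and insertions (which have no reference span) would need a breakpoint-distance rule instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SVCall:
    """Hypothetical minimal SV record: half-open interval [start, end)."""
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "DUP", "INV"
    caller: str  # name of the tool that reported the call


def reciprocal_overlap(a: SVCall, b: SVCall) -> float:
    """Overlap length divided by the longer interval (0.0 if disjoint).

    A value >= t means each interval is covered by at least a fraction t
    of the other, i.e. the usual "reciprocal overlap" criterion.
    """
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return overlap / max(a.end - a.start, b.end - b.start)


def n_of_m_consensus(calls: list, n: int, min_overlap: float = 0.5) -> list:
    """Greedy clustering, then keep clusters supported by >= n distinct callers."""
    clusters = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.start, c.end)):
        for cluster in clusters:
            rep = cluster[0]  # first (leftmost) call represents the cluster
            if (rep.chrom == call.chrom and rep.svtype == call.svtype
                    and reciprocal_overlap(rep, call) >= min_overlap):
                cluster.append(call)
                break
        else:
            clusters.append([call])  # no match: start a new cluster

    consensus = []
    for cluster in clusters:
        supporting = sorted({c.caller for c in cluster})
        if len(supporting) >= n:
            consensus.append((cluster[0], supporting))
    return consensus


# Usage with made-up calls from three hypothetical callers (a "2 of 3" rule).
calls = [
    SVCall("chr1", 1000, 2000, "DEL", "callerA"),
    SVCall("chr1", 1050, 1980, "DEL", "callerB"),
    SVCall("chr1", 1020, 2100, "DEL", "callerC"),
    SVCall("chr2", 5000, 5300, "DEL", "callerA"),
    SVCall("chr2", 5010, 5290, "INV", "callerB"),  # type differs, not merged
]
result = n_of_m_consensus(calls, n=2)
for rep, callers in result:
    print(rep.chrom, rep.start, rep.end, rep.svtype, callers)
```

Only the chr1 deletion survives, since the chr2 calls are each reported by a single caller. A fixed n treats every caller as equally reliable for every SV type, so a call found only by the one tool best suited to that SV type is discarded; this is the kind of limitation a learned, sample-adapted scoring model is meant to address.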