Current Bioinformatics Seminar
Time: 11AM Tuesdays.
Venue: Davis Auditorium and Online
17 September 2024
Modelling spatial molecular data for normalisation and power analysis.
Dharmesh Bhuva, SAiGENCI
Spatial resolution of molecular measurements has revolutionised biological studies while posing a significant informatics challenge. Advances in commercial products have increased the spatial resolution and throughput of these measurements; however, these gains often come at significant cost. To better understand the properties of these expensive spatial molecular datasets, we need to understand the spatial nature of the measurements and refrain from imposing cellular abstractions where possible. I will begin by describing our investigation into the total density of measurements (library size) in spatial transcriptomics datasets and show how it is confounded with biology. Because of this confounding, library size normalisation using current methods results in poorer spatial domain identification. Next, I will present our newly developed model, SpaNorm, which uses thin plate splines and a regularised generalised linear model (GLM) to model transcript counts and then adjusts for library size effects by computing percentile adjusted counts. This spatially aware library size normalisation method adjusts for library size effects while retaining biological variation in the tasks of spatial domain identification and spatially variable gene calling. Finally, I will show how SpaNorm can be repurposed to model cell counts and estimate cellular rates for power analysis, where the spatial distribution patterns of cells play a crucial role. Collectively, these results highlight the strength of modelling spatial variation using splines and GLMs and demonstrate their broad utility.
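The idea behind percentile adjusted counts can be illustrated with a minimal sketch. It assumes a negative binomial count model with a fixed size (inverse dispersion) parameter and takes two fitted means per observation as given: one from the full model and one with the library size component removed. In SpaNorm those means would come from the spline-based GLM; here they are supplied by hand, and the function names (`nb_pmf`, `nb_cdf`, `nb_quantile`, `percentile_adjust`) are hypothetical illustrations, not SpaNorm's API. Each observed count is mapped to its percentile under the full model, then to the count at the same percentile under the library-size-adjusted model.

```python
import math


def nb_pmf(k, mu, size):
    """Negative binomial pmf parameterised by mean mu > 0 and size (1/dispersion) > 0."""
    p = size / (size + mu)
    log_pmf = (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
               + size * math.log(p) + k * math.log(1.0 - p))
    return math.exp(log_pmf)


def nb_cdf(k, mu, size):
    """P(Y <= k), summed term by term from 0."""
    return sum(nb_pmf(i, mu, size) for i in range(k + 1))


def nb_quantile(q, mu, size, max_k=100_000):
    """Smallest k with P(Y <= k) >= q (accumulated in the same order as nb_cdf)."""
    cum = 0.0
    for k in range(max_k):
        cum += nb_pmf(k, mu, size)
        if cum >= q:
            return k
    return max_k


def percentile_adjust(y, mu_full, mu_adj, size):
    """Hypothetical percentile adjustment of a single count.

    mu_full: fitted mean including the library size effect (assumed given).
    mu_adj:  fitted mean with the library size effect removed (assumed given).
    """
    q = nb_cdf(y, mu_full, size)          # percentile of y under the full model
    return nb_quantile(q, mu_adj, size)   # same percentile under the adjusted model


# Toy example with made-up numbers: a spot whose fitted mean is inflated by a
# high library size; removing that effect lowers the mean, so the count shrinks.
print(percentile_adjust(30, mu_full=24.0, mu_adj=12.0, size=5.0))
```

If the library size effect is zero (`mu_adj == mu_full`) the count is returned unchanged, and because the output is always an integer count, adjusted data stay on the count scale for downstream tools.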