Bioinformatics Seminars

Time: 11AM
Venue: Hybrid

27 June 2023

Designing cis-regulatory sequence codes

Emily Wong
Victor Chang

Potential distal regulatory elements (~30%) in the human genome greatly outnumber protein-coding sequences (~2%). These regulatory regions govern cell identity by modulating gene expression, yet, unlike protein-coding regions, we do not fully understand how they are defined; thus a sequence guide for characterizing cell types and cell states remains elusive. We show that a bag-of-motifs model based on transcription factor binding sites (TFBSs), the units of cis-regulatory function, can accurately classify mammalian cell-type cis-regulatory elements across cell types, at different life stages and in different species. We provide a computational approach, BOM (Bag-of-Motifs), that combines XGBoost with an explainable AI method, SHapley Additive exPlanations (SHAP), to train classifiers of context-specific cis-regulatory codes based on transcription factor (TF) binding motifs. Applying this method to cell context-specific candidate distal cis-regulatory elements from human, mouse, zebrafish and fly at different life stages shows that the model achieves high precision and recall. The model can outperform more complex deep learning models at the same task. BOM will allow for more systematic approaches to interpreting cell-type- or condition-specific cis-regulatory sequences. Related to these findings, I will also discuss our current efforts to understand cell-type-specific regulatory enhancers by design, to gain a deeper understanding of the genetics underlying cis-regulatory regions.
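The bag-of-motifs idea in the abstract can be illustrated with a minimal sketch. It is not the BOM implementation: it assumes exact string matching on the forward strand only, and the motif names and consensus strings are hypothetical toy values chosen for illustration. It shows only how a sequence might be reduced to an unordered vector of motif counts.

```python
# Hypothetical toy motifs (plain consensus strings), for illustration only.
# A real analysis would scan with position weight matrices from a motif database.
MOTIFS = {
    "EBOX": "CACGTG",
    "GATA": "GATA",
    "TATA": "TATAAA",
}


def count_occurrences(sequence, motif):
    """Count exact, possibly overlapping matches of motif in sequence."""
    count = 0
    start = sequence.find(motif)
    while start != -1:
        count += 1
        start = sequence.find(motif, start + 1)
    return count


def bag_of_motifs(sequence, motifs=MOTIFS):
    """Return motif counts ordered by sorted motif name.

    Positions of the matches are discarded, which is what makes this a
    "bag" representation. Only the forward strand is scanned (an assumption
    made to keep the sketch short).
    """
    seq = sequence.upper()
    return [count_occurrences(seq, motifs[name]) for name in sorted(motifs)]


if __name__ == "__main__":
    # Feature order is EBOX, GATA, TATA.
    print(bag_of_motifs("ttGATAcgCACGTGaaGATA"))
```

Count vectors of this kind, one row per candidate regulatory element, would be the sort of input a gradient-boosted tree classifier such as XGBoost could be trained on, with SHAP values then attributing each prediction to individual motifs.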

