Bioinformatics Seminar
Time: 11AM
Venue: Davis Auditorium and Online
20 August 2024
IdentifiHR: predicting homologous recombination deficiency in high-grade serous ovarian carcinoma through gene expression
Ashley Weir, WEHI Bioinformatics
Approximately half of all high-grade serous ovarian carcinomas (HGSC) have a therapeutically targetable defect in the homologous recombination (HR) DNA repair mechanism. HGSC is the most commonly HR deficient (HRD) cancer type, largely due to the frequency of germline and somatic mutations in the HR genes BRCA1/2. While genomic methods and some transcriptomic signatures developed for other cancer types can identify HRD patients, there are no gene expression-based tools to predict HR repair status in HGSC specifically. We have built the first HGSC-specific model to predict HR repair status using gene expression. We separated The Cancer Genome Atlas (TCGA) cohort of HGSCs (n = 361) into training (n = 288) and testing (n = 73) sets and labelled each case as HRD or HR proficient (HRP) using the clinical gold standard for classification, a score of HRD genomic damage. Using the training set, we performed differential gene expression analysis between HRD and HRP cases. The 2604 significantly differentially expressed genes were then used to tune and train a penalised logistic regression model. The resulting model, IdentifiHR, is an elastic net penalised logistic regression model that uses the expression of 209 genes to predict HR status in HGSC. These genes capture known regions of HR-specific copy number alteration, which impact gene expression levels, and preserve the genomic damage signal. IdentifiHR has an accuracy of 85% in the TCGA test set and of 91% in an independent cohort of 99 samples from the Australian Ovarian Cancer Study (AOCS), collected from primary tumours before (n = 74/99) and after autopsy (n = 6/99), in addition to ascites (n = 12/99) and normal fallopian tube samples (n = 7/99). Further, IdentifiHR is 84% accurate in pseudobulked single-cell HGSC sequencing from 37 patients and outperforms existing gene expression-based models for HR status, namely BRCAness, MultiscaleHRD and expHRD.
IdentifiHR is an accurate model to predict HR status in HGSC using gene expression alone.
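For attendees unfamiliar with the model class named in the abstract, the following is a minimal sketch of elastic net penalised logistic regression, fitted by proximal gradient descent on synthetic data. It is not the IdentifiHR implementation: the data, the function names (fit_elastic_net_logistic, soft_threshold, predict) and the hyperparameters (lam, alpha, step, n_iter) are all assumptions chosen for illustration. It shows only how the L1 part of the penalty drives most coefficients to exactly zero, which is the property that lets such a model retain a small subset of candidate genes.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(value, threshold):
    # Proximal operator of the L1 penalty: shrinks towards zero and
    # returns exactly zero when |value| <= threshold.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def linear_score(row, weights, intercept):
    return intercept + sum(w * x for w, x in zip(weights, row))


def fit_elastic_net_logistic(X, y, lam=0.1, alpha=0.9, step=0.5, n_iter=300):
    # Minimises mean logistic loss
    #   + lam * (alpha * ||w||_1 + (1 - alpha) / 2 * ||w||_2^2).
    # All hyperparameter values are illustrative assumptions.
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    intercept = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for row, label in zip(X, y):
            residual = sigmoid(linear_score(row, weights, intercept)) - label
            grad_b += residual / n
            for j in range(p):
                grad_w[j] += residual * row[j] / n
        intercept -= step * grad_b  # the intercept is not penalised
        for j in range(p):
            # Gradient step on the smooth part (loss + ridge term),
            # then soft-thresholding for the lasso term.
            smooth = weights[j] - step * (grad_w[j] + lam * (1 - alpha) * weights[j])
            weights[j] = soft_threshold(smooth, step * lam * alpha)
    return weights, intercept


def predict(X, weights, intercept):
    return [1 if linear_score(row, weights, intercept) >= 0.0 else 0 for row in X]


# Synthetic stand-in for expression data: 200 samples, 20 features,
# of which only the first three carry signal (an assumption for this sketch).
random.seed(0)
n_samples, n_features = 200, 20
X = [[random.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
y = [
    1 if 2.0 * row[0] - 2.0 * row[1] + 1.5 * row[2] + random.gauss(0, 0.5) > 0 else 0
    for row in X
]

# 80/20 split into training and testing sets.
n_train = 160
X_train, y_train = X[:n_train], y[:n_train]
X_test, y_test = X[n_train:], y[n_train:]

weights, intercept = fit_elastic_net_logistic(X_train, y_train)
predictions = predict(X_test, weights, intercept)
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
selected = [j for j, w in enumerate(weights) if w != 0.0]

print(f"test accuracy: {accuracy:.2f}")
print(f"features with non-zero coefficients: {selected}")
```

In this sketch alpha controls the balance between the lasso and ridge penalties and lam controls the overall strength. In practice both would be tuned, for example by cross-validation on the training set.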