Mouse and Human Versions of the MSigDB in R Format

Background

The Molecular Signatures Database (MSigDB) is an important resource created and maintained by the Broad Institute. The gene sets in the MSigDB come from a wide variety of sources and consist of human genes identified either by NCBI GeneID or by gene symbol. Our work at the WEHI predominantly uses mouse models of human disease. To facilitate use of the MSigDB in our work, we have created a pure mouse version of the MSigDB by mapping all sets to mouse orthologs. A pure human version is also provided.

Procedure

For the human version, GMT files downloaded from the MSigDB were converted to R lists and saved in RDS format.
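As an illustration of the conversion step, here is a minimal sketch of reading GMT text into a name-to-genes mapping. It is written in Python to show the logic only; the conversion described above was done in R, where the analogous result is a named list saved with saveRDS. GMT is a tab-delimited format in which each line holds a set name, a description field, and then the gene identifiers. The function name read_gmt and the two example sets are hypothetical.

```python
import io

def read_gmt(handle):
    """Parse GMT text into a dict mapping set name -> list of gene identifiers."""
    gene_sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and lines with no genes
        # Field 1 is the set name, field 2 a description, the rest are genes.
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets

# Hypothetical two-set example standing in for a downloaded GMT file.
example = io.StringIO(
    "SET_A\thttp://example.org/SET_A\t7157\t672\t675\n"
    "SET_B\tna\t1956\t7157\n"
)
gene_sets = read_gmt(example)
```

Identifiers are kept as strings here, so a collection keyed by GeneID and one keyed by gene symbol can be handled by the same function.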

The mouse C1 positional gene set collection was created from the NCBI gene information file Mus_musculus.gene_info.gz downloaded from the NCBI FTP site. Cytobands were identified from the map_location column. The positional collection provides the GeneIDs of the genes in each cytoband.
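The grouping of genes by cytoband can be sketched as follows, again in Python purely for illustration. The GeneID and map_location column names come from the NCBI gene_info format, which uses "-" for missing values. The rule used above to extract a cytoband from map_location is not stated, so cytoband_from_location is a hypothetical rule (take the first piece of the field that is not a centimorgan position), and the rows are toy data showing only a subset of the real columns.

```python
import csv
import io
import re
from collections import defaultdict

def cytoband_from_location(map_location):
    """Hypothetical rule: first piece of the field that is not a cM position."""
    for piece in re.split(r"[|;]", map_location):
        piece = piece.strip()
        if piece and piece != "-" and not piece.endswith("cM"):
            return piece
    return None  # no usable cytoband in this field

def positional_sets(handle):
    """Group GeneIDs by the cytoband derived from map_location."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    bands = defaultdict(list)
    for row in reader:
        band = cytoband_from_location(row["map_location"])
        if band is not None:
            bands[band].append(row["GeneID"])
    return dict(bands)

# Toy rows (not real genes) with only a subset of the gene_info columns.
example = io.StringIO(
    "#tax_id\tGeneID\tSymbol\tmap_location\n"
    "10090\t1001\tGeneA\t11 B3|11 42.83 cM\n"
    "10090\t1002\tGeneB\t11 B3\n"
    "10090\t1003\tGeneC\t-\n"
    "10090\t1004\tGeneD\t2 C1.1\n"
    "10090\t1005\tGeneE\t7 10.5 cM\n"
)
bands = positional_sets(example)
```

Genes with no cytoband information (GeneC and GeneE in the toy data) are left out of every set under this rule.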

The C5 gene ontology collections were created from the Bioconductor organism package org.Mm.eg.db using the GeneID to GO Term mappings provided by the egGO2ALLEGS Bimap. This ensures that the GO Term hierarchy is respected: any GeneID associated with a child (more specific) GO Term is also included in every parent (more general) GO Term that is an ancestor of the original Term.
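To make the hierarchy rule concrete, the following Python sketch propagates direct annotations up a small term graph. In the actual procedure this propagation is already built into the egGO2ALLEGS mappings, so nothing like this needs to be run; the code only illustrates the behaviour. The term names, gene names, and function names are all hypothetical toy values, not real GO identifiers.

```python
from collections import defaultdict

def ancestors(term, parents, cache):
    """Return every ancestor of a term in a DAG given child -> parents edges."""
    if term in cache:
        return cache[term]
    result = set()
    for parent in parents.get(term, ()):
        result.add(parent)
        result |= ancestors(parent, parents, cache)
    cache[term] = result
    return result

def propagate(direct, parents):
    """Build term -> genes sets where each gene also counts for all ancestors."""
    cache = {}
    term_sets = defaultdict(set)
    for gene, terms in direct.items():
        for term in terms:
            term_sets[term].add(gene)
            for anc in ancestors(term, parents, cache):
                term_sets[anc].add(gene)
    return {term: sorted(genes) for term, genes in term_sets.items()}

# Toy hierarchy: ROOT is the parent of MID and SIDE; MID is the parent of LEAF.
parents = {"LEAF": {"MID"}, "MID": {"ROOT"}, "SIDE": {"ROOT"}}
# Direct (most specific) annotations for three toy genes.
direct = {"g1": ["LEAF"], "g2": ["MID"], "g3": ["SIDE"]}
go_sets = propagate(direct, parents)
```

In the toy example, g1 is annotated only to LEAF but appears in the MID and ROOT sets as well, which is the behaviour described above.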

All other mouse collections were created by mapping the corresponding human collection to mouse orthologs using HGNC Comparison of Orthology Predictions (HCOP). The HCOP tool integrates orthology assertions predicted by eggNOG, Ensembl Compara, HGNC, HomoloGene, Inparanoid, NCBI Gene Orthology, OMA, OrthoDB, OrthoMCL, Panther, PhylomeDB, PomBase, TreeFam and ZFIN. It includes non-coding as well as protein-coding genes.
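The mapping step can be pictured with the Python sketch below, which translates human gene sets through a table of human-to-mouse ortholog pairs. It is an illustration under stated assumptions, not the procedure actually used: the text above does not say how one-to-many orthologs or genes without an ortholog were handled, so the sketch assumes that every mouse ortholog is kept, that unmapped human genes are dropped, and that duplicates are removed. All identifiers and function names are hypothetical placeholders.

```python
from collections import defaultdict

def build_ortholog_map(pairs):
    """Collect (human_id, mouse_id) pairs into human_id -> set of mouse_ids."""
    mapping = defaultdict(set)
    for human_id, mouse_id in pairs:
        mapping[human_id].add(mouse_id)
    return mapping

def map_gene_sets(human_sets, mapping):
    """Translate each human gene set to mouse IDs (assumed handling rules)."""
    mouse_sets = {}
    for name, genes in human_sets.items():
        mapped = set()
        for gene in genes:
            # Assumption: keep all orthologs; genes with none contribute nothing.
            mapped |= mapping.get(gene, set())
        mouse_sets[name] = sorted(mapped)
    return mouse_sets

# Placeholder ortholog pairs: H2 has two mouse orthologs, H1 and H3 share one.
pairs = [("H1", "M1"), ("H2", "M2"), ("H2", "M3"), ("H3", "M1")]
human_sets = {"SET_A": ["H1", "H2", "H4"], "SET_B": ["H1", "H3"]}
mouse_sets = map_gene_sets(human_sets, build_ortholog_map(pairs))
```

Because mapping can merge or split genes, a mouse set may contain a different number of genes from its human counterpart, as SET_B shows in the toy data.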

Current Version

Previous Versions

Comments/Questions? Email Alexandra Garnham

Last Modified: 15 June 2020