Working with the client, we assembled a set of terms that describe neurodegenerative disease in the literature. These terms were used to mine public data to produce gene lists, and the lists were then expanded into networks by adding interaction data. The combined set forms a large network of annotated genes against which omics data for specific diseases can be analysed.
Disease terminology
After a first pass through data sources such as MeSH, DO and SNOMED-CT, we refined the term list with disease domain experts. The refined terms were then used to retrieve genes with supporting evidence for those terms from PubMed, ClinVar, GWAS and OMIM (client license).
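The retrieval step can be pictured as merging per-source hits into one evidence table per gene. The following is a minimal sketch in Python, not the pipeline used in the project: the function name `merge_gene_evidence`, the record layout and the gene and term identifiers are hypothetical placeholders, and only the source names come from the text above.

```python
from collections import defaultdict


def merge_gene_evidence(records):
    """Group (gene, source, term) records into a gene -> set-of-sources map.

    The record layout is an assumption made for this illustration.
    """
    evidence = defaultdict(set)
    for gene, source, _term in records:
        evidence[gene].add(source)
    return dict(evidence)


# Placeholder records; GENE_A, GENE_B and term_1, term_2 are invented labels.
records = [
    ("GENE_A", "PubMed", "term_1"),
    ("GENE_A", "ClinVar", "term_1"),
    ("GENE_B", "OMIM", "term_2"),
]

evidence = merge_gene_evidence(records)

# Rank candidates by the number of independent sources, then by name.
candidates = sorted(evidence, key=lambda gene: (-len(evidence[gene]), gene))
```

Counting distinct sources per gene is only one possible way to prioritise candidates; the source does not state how evidence was weighted.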
Data expansion
The large list of candidate genes was expanded by retrieving interacting proteins/genes from MetaCore (client license). Each protein was then annotated using UniProt, GO and InterPro.
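Expanding a gene list into a network amounts to pulling in every interaction that touches a seed gene. The sketch below shows a one-hop expansion under stated assumptions: interactions are given as undirected gene pairs, the function name `expand_seed_genes` is hypothetical, and the identifiers are placeholders. It does not reproduce the MetaCore export format or API.

```python
def expand_seed_genes(seeds, interactions):
    """Return (nodes, edges) for the one-hop neighbourhood of the seed genes.

    `interactions` is an iterable of (gene, gene) pairs treated as undirected;
    this input shape is an assumption made for the illustration.
    """
    seeds = set(seeds)
    # Keep an interaction if at least one partner is a seed gene.
    # Sorting each pair makes (A, B) and (B, A) collapse into one edge.
    edges = {
        tuple(sorted((a, b)))
        for a, b in interactions
        if a in seeds or b in seeds
    }
    nodes = seeds | {gene for edge in edges for gene in edge}
    return nodes, edges


# Placeholder data; the gene labels are invented.
interactions = [
    ("GENE_A", "GENE_C"),
    ("GENE_C", "GENE_A"),  # duplicate in reverse order
    ("GENE_D", "GENE_E"),  # no seed involved, so it is dropped
]
nodes, edges = expand_seed_genes({"GENE_A"}, interactions)
```

Restricting the expansion to one hop keeps the network focused on direct partners of the candidate genes; whether the project went further than that is not stated in the source.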
Data interpretation
Network analysis, statistical analysis of omics data, enrichment analysis and visualisations were performed using R, Cytoscape and PathVisio.
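As an illustration of the enrichment step, one common approach to over-representation analysis is a one-sided hypergeometric test. The source does not say which test was used, and the project worked in R, so the Python sketch below is an assumption-laden example rather than the project's method. The function name `enrichment_p_value` and its parameter names are hypothetical.

```python
from math import comb


def enrichment_p_value(hits, study_size, annotated, background_size):
    """One-sided hypergeometric p-value, P(X >= hits).

    hits            -- study genes carrying the annotation
    study_size      -- number of genes in the study list
    annotated       -- background genes carrying the annotation
    background_size -- total number of background genes
    """
    upper = min(study_size, annotated)
    total = comb(background_size, study_size)
    tail = sum(
        comb(annotated, k) * comb(background_size - annotated, study_size - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: both study genes are annotated, out of 2 annotated genes
# in a background of 4.
p = enrichment_p_value(hits=2, study_size=2, annotated=2, background_size=4)
```

In practice the p-values from many annotation terms would need a multiple-testing correction, which this sketch leaves out.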
The data package was delivered to the client and is refreshed on a regular basis to facilitate their omics data mining.