Over years of working with multiple clients to solve their problems, we have developed a resource of integrated data and tools. We call this MARRS, and its foundation is semantically integrated data. Components of MARRS form the basis of many of our client projects and provide the GB sand-pit for technology development.
Public bioinformatics data is not well suited to data mining because it is not integrated. To support data mining, we take diverse public data sources and transform the data to harmonise identifiers and terms. We then integrate the data using Semantic Web technologies and serve the resulting RDF for querying with Virtuoso. This RDF data is the foundation of MARRS. To browse MARRS contents we use a custom interface, the MARRS Explorer.
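As a rough illustration of what harmonising identifiers before integration can look like, the sketch below maps identifiers from two imaginary sources onto one canonical URI and emits N-Triples lines. The `example.org` namespace, the predicate URIs, the mapping table and the function names are all assumptions made for this example; they are not the actual MARRS scheme.

```python
# Hypothetical namespace; not the URI scheme actually used in MARRS.
BASE = "http://example.org/protein/"

# Illustrative lookup from source-specific symbols to a shared accession.
SYMBOL_TO_ACCESSION = {"TP53": "P04637", "p53": "P04637"}


def harmonise(identifier: str) -> str:
    """Return one canonical URI, whichever source the identifier came from."""
    key = identifier.strip()
    accession = SYMBOL_TO_ACCESSION.get(key, key)
    return f"{BASE}{accession}"


def to_ntriple(subject_id: str, predicate_uri: str, value: str) -> str:
    """Format one statement as an N-Triples line with a plain literal object."""
    # Minimal escaping only: backslashes and double quotes.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'<{harmonise(subject_id)}> <{predicate_uri}> "{escaped}" .'


# Two records from different (imaginary) sources describing the same protein.
records = [
    ("TP53", "http://example.org/vocab/sourceA/label", "Cellular tumor antigen p53"),
    ("P04637", "http://example.org/vocab/sourceB/location", "nucleus"),
]

triples = [to_ntriple(*record) for record in records]
for triple in triples:
    print(triple)
```

Because both records resolve to the same subject URI, a triple store loading these lines would treat them as statements about a single protein, which is the point of harmonising before integration.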
Data mining in GB is mostly done via scripting, often using Python over SPARQL, and we see great opportunity for Jupyter in this domain. Deployment to standardised data mining is via Galaxy, where the data workbench and built-in tools are especially helpful. For quick views on what we know about a single protein we have Target Pages, which combine data from multiple sources at a summary level. We have also produced a Cytoscape plugin, General SPARQL, which allows users to extract data from MARRS in a scientifically intuitive manner.
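To give a flavour of scripting in Python over SPARQL, here is a minimal sketch that uses only the standard library. The endpoint URL, the example protein URI and the helper names are hypothetical; the sketch assumes a SPARQL endpoint that accepts the query as a `query` GET parameter and returns the standard SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint URL; substitute the address of your own SPARQL service.
ENDPOINT = "http://localhost:8890/sparql"


def build_query(protein_uri: str, limit: int = 10) -> str:
    """Build a SELECT query listing properties and values for one resource."""
    return (
        "SELECT ?property ?value WHERE { "
        f"<{protein_uri}> ?property ?value "
        f"}} LIMIT {int(limit)}"
    )


def build_request(query: str, endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Prepare a GET request that asks for SPARQL JSON results."""
    params = urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        f"{endpoint}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )


def parse_bindings(payload: str) -> list:
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    data = json.loads(payload)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]


def run_query(query: str, endpoint: str = ENDPOINT) -> list:
    """Send the query to the endpoint and return the parsed rows."""
    with urllib.request.urlopen(build_request(query, endpoint)) as response:
        return parse_bindings(response.read().decode("utf-8"))


# Example usage (needs a reachable endpoint, so it is left commented out):
# rows = run_query(build_query("http://example.org/protein/P04637"))
```

The same functions can be pasted into a Jupyter notebook cell, where the returned list of dicts is easy to inspect or load into a table for further analysis.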
MARRS is vital to GB. Some clients have direct access to the RDF integrated data and others barely know it is there, but in every case we use it to deliver data according to the needs of their science. For GB, MARRS is the embodiment of the skills, tools and data that we bring to bear on all client interactions.