Visualisation of target-disease relationships in drug discovery

Rationale

The Centre for Therapeutic Target Validation (CTTV) is a public-private partnership that aims to help researchers identify possible drug targets more efficiently by integrating genome-wide biological data from several public databases. It is a collaboration between the Wellcome Trust Sanger Institute (WTSI), the European Bioinformatics Institute (EBI), GlaxoSmithKline and Biogen, and it is committed to making public all the data generated and the software developed in the project.

The first public version of the web platform was released in December 2015 and is currently being used widely. As part of this initiative, the CTTV team has developed several re-usable visualisations to allow easier interpretation of the integrated data and these have been made public through the BioJS registry.

One of the areas we would like to improve in the next versions of the platform is support for creating new hypotheses (i.e., target-disease associations) based on known associations. These new hypotheses can be inferred from different kinds of biological data, such as pathways, known drugs or co-occurrences in the scientific literature.

In this project we propose the creation of a visualisation that can be used to easily explore these new hypotheses.

Approach

The CTTV project uses UX methodologies to define the user requirements and to ensure that the features included in the platform meet the expectations of prospective users. Interacting with the UX team and becoming familiar with UX techniques will therefore be beneficial to the GSoC student.

To develop this visualisation, the GSoC student will have to become comfortable with the data schema used at CTTV and work with our back-end team to obtain the data in the form that is most convenient to visualise. Depending on the skills and interests of the student, there is an opportunity to get involved in processing the data and exposing it through our REST API.

The visualisation will be developed as a re-usable web-based component, integrated into our AngularJS web platform and made public in BioJS.

A possible visualisation may look as follows:

This figure could represent diseases as red circles on one side and targets as white circles on the other, with lines connecting known associations between them. Between the two sides, yellow circles represent new hypotheses (genes or diseases) inferred from different types of biological evidence. The size of each node may represent the confidence in the new hypothesis based on the cumulative evidence.
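The layout described above can be sketched in plain JavaScript as three columns of nodes, with the hypothesis radius derived from a confidence score. This is only an illustration: the field names (`id`, `confidence`), the assumption that confidence lies in [0, 1], the column positions and the radius limits are all hypothetical and do not come from the CTTV data schema or REST API.

```javascript
// Hypothetical sample data; the real CTTV schema may look quite different.
const diseases = [{ id: 'D1' }, { id: 'D2' }];
const targets = [{ id: 'T1' }, { id: 'T2' }, { id: 'T3' }];
const hypotheses = [
  { id: 'H1', confidence: 0.9 },
  { id: 'H2', confidence: 0.25 },
];

// Spread the nodes of one column evenly along the vertical axis.
function columnLayout(nodes, x, height) {
  const step = height / (nodes.length + 1);
  return nodes.map((node, i) => ({ ...node, x, y: step * (i + 1) }));
}

// Map a confidence in [0, 1] to a radius between minR and maxR.
// Square-root scaling is a common choice for circle sizes, because
// the area of a circle grows with the square of its radius.
function radiusFor(confidence, minR = 3, maxR = 15) {
  const c = Math.min(1, Math.max(0, confidence));
  return minR + (maxR - minR) * Math.sqrt(c);
}

// Diseases on the left, hypotheses in the middle, targets on the right.
function layout(width, height) {
  return {
    diseases: columnLayout(diseases, 0.1 * width, height),
    hypotheses: columnLayout(hypotheses, 0.5 * width, height).map((h) => ({
      ...h,
      r: radiusFor(h.confidence),
    })),
    targets: columnLayout(targets, 0.9 * width, height),
  };
}

const positioned = layout(600, 300);
console.log(positioned.hypotheses);
```

The resulting `x`, `y` and `r` values could then be bound to SVG circles with D3 or any other rendering layer; keeping the geometry in plain functions makes the component easier to re-use and test.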

Challenges

The main challenges for this project are:

  • Creating these new hypotheses based on the data currently stored in the CTTV portal is challenging. Creating this type of inferred association is on our current roadmap. Depending on his/her skills and interests, the student will have the opportunity to work on this aspect of the project too.
  • The number of inferred associations to display can vary widely, which poses a challenge for the visualisation. Aggregation techniques, filtering options, etc. may be required.
  • If a network visualisation is chosen, computing a sensible layout is critical.
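As a rough illustration of the filtering and aggregation mentioned above, the sketch below drops low-confidence inferred associations and, if too many remain, collapses them into one summary entry per evidence type. The record fields (`target`, `disease`, `evidenceType`, `confidence`), the threshold and the node limit are assumptions made for this example, not part of the CTTV data model.

```javascript
// Hypothetical inferred associations; field names are illustrative only.
const inferred = [
  { target: 'T1', disease: 'D1', evidenceType: 'pathway', confidence: 0.8 },
  { target: 'T2', disease: 'D1', evidenceType: 'literature', confidence: 0.3 },
  { target: 'T3', disease: 'D1', evidenceType: 'pathway', confidence: 0.6 },
  { target: 'T4', disease: 'D2', evidenceType: 'drug', confidence: 0.1 },
];

// Keep only associations at or above a confidence threshold.
function filterByConfidence(associations, threshold) {
  return associations.filter((a) => a.confidence >= threshold);
}

// Collapse associations into one summary entry per evidence type.
function aggregateByEvidenceType(associations) {
  const groups = new Map();
  for (const a of associations) {
    const group = groups.get(a.evidenceType) || {
      evidenceType: a.evidenceType,
      count: 0,
      maxConfidence: 0,
    };
    group.count += 1;
    group.maxConfidence = Math.max(group.maxConfidence, a.confidence);
    groups.set(a.evidenceType, group);
  }
  return [...groups.values()];
}

// Show individual associations when few survive the filter,
// otherwise fall back to the aggregated view.
function summarise(associations, threshold, maxNodes) {
  const kept = filterByConfidence(associations, threshold);
  return kept.length <= maxNodes ? kept : aggregateByEvidenceType(kept);
}

console.log(summarise(inferred, 0.2, 2));
```

In an interactive component the threshold and node limit would probably be exposed as user controls, so that the same data can be explored at different levels of detail.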

Involved toolkits or projects

  • Depending on the type of visualisation selected, this project may need to use existing visualisation tools such as Cytoscape.
  • D3 may be required to develop the visualisation.
  • The CTTV web application uses AngularJS and Twitter Bootstrap. Current visualisations have been written in JavaScript with D3 as their only dependency.

Required skills

  • Solid understanding of modern JavaScript.
  • Working knowledge of, or interest in, data visualisation is required.
  • Familiarity with current BioJS would be beneficial.

CTTV is a highly collaborative project, so the ability to work in an Agile team, interfacing with back-end developers, UX designers and other web developers, is essential.

Mentors

Miguel Pignatelli (CTTV-EBI), Luca Fumis (CTTV-EBI)