Visualisation of standard types in life science with Bioschemas, schema.org and BioJS

Rationale

At the moment there is no standard way to describe life science events, people, training materials and courses. The websites holding this information encode it in different ways. As a result, disseminating, discovering and aggregating this information is challenging, and advertising it on third-party websites normally requires time-consuming manual curation. This hampers the flow of information around the life science community.

The project aims to propose standard ways to present life science information. This will be achieved using schema.org markup, extended where necessary with new properties, together with guidelines on how to use new and existing schema.org properties. The project will also create a prototype that parses HTML pages annotated with the terms agreed in these standards. The prototype will be built using BioJS, a JavaScript-based component library. If time allows, the next step will be a visualisation of the parsed data.
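
As a rough illustration of the parsing step, the sketch below pulls schema.org items out of a page. It assumes the markup is embedded as JSON-LD in script elements and uses a regular expression in place of a real HTML parser so that it runs without dependencies. The function name extractJsonLd and the sample page are hypothetical, and the sketch is not tied to any BioJS API.

```javascript
// Minimal sketch: collect schema.org items embedded as JSON-LD in an HTML string.
// Assumption: items live in <script type="application/ld+json"> blocks.
// A regular expression stands in for a proper HTML parser to keep this dependency-free.
function extractJsonLd(html) {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const items = [];
  let match;
  while ((match = pattern.exec(html)) !== null) {
    try {
      const parsed = JSON.parse(match[1]);
      // A block may hold a single object or an array of objects.
      items.push(...(Array.isArray(parsed) ? parsed : [parsed]));
    } catch (err) {
      // Skip malformed blocks rather than failing the whole page.
    }
  }
  return items;
}

// Hypothetical page content, for illustration only.
const samplePage = `
<html><body>
<script type="application/ld+json">
{"@context": "http://schema.org", "@type": "Event",
 "name": "Example training workshop", "startDate": "2016-05-01"}
</script>
</body></html>`;

const found = extractJsonLd(samplePage);
console.log(found.map((item) => item["@type"] + ": " + item.name));
```

In a browser, the same idea could use the DOM to select the script elements instead of a regular expression. Pages annotated with Microdata or RDFa would need a different extraction step.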

Approach

Work has already been done on creating coding standards for life science information. Several life science organisations across Europe have joined together to form Bioschemas, which is still in its early stages. This project will engage with the schema.org community and with Bioschemas to carry the standards forward.

The standards themselves will be based as much as possible on existing agreements that haven’t yet been put into practice. There will be a standard for coding each type of life science information (e.g. an event, person or organisation). Each standard will consist of:

  • a data model based on an existing schema.org type (e.g. Person, Event), or on a proposed extension of an existing type (e.g. LifeSciencePerson). Existing types will be preferred.
  • additional properties for each type, where the data model needs them. These properties will be submitted to the schema.org community for adoption into the standard schema.org specifications.
  • controlled vocabularies, using existing ontologies wherever possible.
  • cardinality (i.e. whether one or many values are expected for each property).
  • a minimum set of fields that each item is expected to provide.
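
The components listed above could be captured in a small machine-readable profile and checked automatically. The sketch below is only an illustration of that idea: the profile layout, the validate function, the topic property and its vocabulary terms are all hypothetical and are not part of schema.org or of any agreed Bioschemas specification.

```javascript
// Hypothetical profile for an Event: each property records whether it is a
// minimum field, its cardinality, and (optionally) a controlled vocabulary.
const eventProfile = {
  type: "Event",
  properties: {
    name: { required: true, many: false },
    startDate: { required: true, many: false },
    location: { required: false, many: false },
    // "topic" is an invented property with an invented vocabulary.
    topic: { required: false, many: true, vocabulary: ["genomics", "proteomics"] },
  },
};

// Return a list of human-readable problems; an empty list means the item conforms.
function validate(item, profile) {
  const problems = [];
  if (item["@type"] !== profile.type) {
    problems.push(`expected @type ${profile.type}`);
  }
  for (const [name, rule] of Object.entries(profile.properties)) {
    const raw = item[name];
    // Normalise to an array so single and multiple values are handled alike.
    const values = raw === undefined ? [] : Array.isArray(raw) ? raw : [raw];
    if (rule.required && values.length === 0) {
      problems.push(`missing minimum field: ${name}`);
    }
    if (!rule.many && values.length > 1) {
      problems.push(`${name} accepts a single value`);
    }
    if (rule.vocabulary) {
      for (const value of values) {
        if (!rule.vocabulary.includes(value)) {
          problems.push(`${name}: "${value}" is not in the controlled vocabulary`);
        }
      }
    }
  }
  return problems;
}

console.log(validate({ "@type": "Event", name: "Example workshop" }, eventProfile));
```

Keeping the rules as data rather than code would let the same checker serve every information type, with one profile per type.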

The specifications will be designed to be unintrusive to information providers, minimising changes to the methods organisations currently use to publish life science information. The dissemination of information will be facilitated by making use of standards like Microdata, JSON-LD and RDFa.

JavaScript visualisations will provide specific views for life science schema.org types and facilitate content integration in third-party websites. They will also help both in creating and in implementing the standards. Example use cases include:

  • Using bubble diagrams to compare the properties used by each information type (e.g. people, events) and to identify the properties common to them all.
  • A map of training course locations and people's expertise to help identify training needs.
  • A map of life science events that can be filtered by topic.
  • A diagram showing where life science information is generated and which websites it propagates to. This could identify ways in which Bioschemas can enrich this network, or show where Bioschemas already helps. If the websites agree, the diagram could be annotated with page views.
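
For the first use case, the data behind a bubble diagram could be derived from parsed items roughly as follows. This is a sketch under assumptions: the sample items are invented, and the helper names propertyUsageByType and sharedByAllTypes are hypothetical. The output is plain data that any charting component could consume.

```javascript
// Hypothetical parsed items, one per information type, for illustration only.
const sampleItems = [
  { "@type": "Event", name: "Example workshop", url: "http://example.org/e1", startDate: "2016-05-01" },
  { "@type": "Person", name: "Example Person", url: "http://example.org/p1", affiliation: "Example Institute" },
  { "@type": "Organization", name: "Example Institute", url: "http://example.org/o1" },
];

// Map each property name to the set of types that use it.
function propertyUsageByType(items) {
  const usage = new Map();
  for (const item of items) {
    for (const key of Object.keys(item)) {
      if (key.startsWith("@")) continue; // skip JSON-LD keywords such as @type
      if (!usage.has(key)) usage.set(key, new Set());
      usage.get(key).add(item["@type"]);
    }
  }
  return usage;
}

// List the properties that appear in every type present in the input.
function sharedByAllTypes(items) {
  const allTypes = new Set(items.map((item) => item["@type"]));
  return [...propertyUsageByType(items)]
    .filter(([, types]) => types.size === allTypes.size)
    .map(([property]) => property)
    .sort();
}

console.log(sharedByAllTypes(sampleItems));
```

The size of each set in propertyUsageByType could drive the bubble size, with one bubble per property.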

Challenges

  • Agreeing the appropriate use of schema.org properties and types with the schema.org community (the use of some properties and types is open to interpretation).
  • Reaching agreement with the life science community on which properties are needed to describe different types of life science information.
  • Encouraging the adoption of the new coding standards in some key life science websites.

Involved Tools/Libraries

HTML (comfortable working knowledge)
schema.org markup (familiarity)
JavaScript

Mentors

Martin Cook (ELIXIR), Rafael Jimenez (ELIXIR), Leyla García (Uniprot-EBI)