This work package focuses on the infrastructure needed for federated, secure, cross-border data discovery and access processes in Europe.
It aims to enable and facilitate interoperability across national genomic, clinical and phenotypic datasets. The data collected will be part of the 1+MG initiative Use Cases on rare diseases, cancer, and common/complex diseases. B1MG will be used to identify the infrastructure requirements. A consultation process will then ensure that all stakeholders are able to contribute (see WP1) to the WP deliverables.
WP4 will collect requirements from the partners and analyse gaps and inconsistencies in existing standards for interoperability (e.g. HL7/FHIR, Phenopackets, SNOMED-CT, ORDO, HPO, OIDC, DUO, ISO27k, FASTQ, BAM/CRAM, VCF). WP4 will coordinate efforts across ongoing and funded initiatives in which partners have leading roles (e.g. ELIXIR Federated Human Data, H2020 CINECA, EOSC-Life, EUCANCan, EJP RD or the work streams at GA4GH), and will re-use existing research infrastructure capacities from the European Open Science Cloud (EOSC) or the EuroHPC.
WP4 will benefit from the experience of leading genome and healthcare data management implementations, such as Genomics England, and of those international standards that have been agreed jointly with WP3 and are compliant with the ELSI (WP2) requirements.
One major activity is the organisation of workshops across Europe to engage with experts and stakeholders (partnering with WP1) who are already investigating similar requirements and implementations. Furthermore, this WP will facilitate a joint understanding of the security requirements for technical service components and build a catalogue of existing “synthetic” datasets that mimic a transnational cohort of at least 1M individuals. These synthetic data will be used for safe cross-border interoperability tests.
CSC will set up a secretariat for coordinating the partners, organise a kick-off event, provide a documentation platform for partners with a federated login, and operate a video conferencing service for WP meetings, in collaboration with WP6 (Task 6.4), where the consortium communication strategy is defined.
The task analysis will build on existing regional projects (such as Tryggve), H2020-funded projects (CINECA, EOSC-Life, EUCANCan, EJP RD and ELIXIR-CONVERGE), IMI projects and ELIXIR-coordinated commissioned services (e.g. Federated Human Data). These projects already fund the development of practical solutions for identifying researchers across the 1+MG signatories (and non-signatories), for science-driven discovery of the national data sets, and for access to the underlying data by researchers approved by the data custodian. Furthermore, these initiatives are based on open-source projects that apply standards developed internationally for this field (such as those in the GA4GH) and re-use existing research infrastructures (ELIXIR, BBMRI-ERIC) and infrastructures such as the European Open Science Cloud (EOSC) or the EuroHPC.
By the end of the project, WP4 documentation is expected to include proposals for end-to-end pilot solutions. Where possible, these solutions will have been piloted as part of the initiatives that funded the original work. Examples include converting healthcare data to the suggested standard formats (e.g. GA4GH Phenopackets), submitting data to a national facility (e.g. ELIXIR Federated EGA), discovering federated data (e.g. with the GA4GH Beacon API), granting access to data (e.g. with GA4GH Passports), accessing the data from a secure cloud infrastructure, and running federated analyses through interfaces provided by national facilities.
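To make the first step of such a pipeline concrete, the following is a minimal sketch of what converting a healthcare record into a Phenopacket-style document could look like. It is an illustration only, not a WP4 deliverable: the helper name build_phenopacket, the patient identifier and the choice of HPO term are assumptions made for the example, and the field names follow our reading of the GA4GH Phenopackets version 2 schema, which should be checked against the specification. A complete Phenopacket would also list the ontologies it uses under metaData.resources, which is omitted here for brevity.

```python
import json
from datetime import datetime, timezone


def build_phenopacket(patient_id, sex, hpo_terms):
    """Assemble a minimal Phenopacket-style record as a plain dict.

    hpo_terms is a list of (id, label) pairs from the Human Phenotype Ontology.
    This hypothetical helper does not validate against the official schema.
    """
    return {
        "id": f"phenopacket-{patient_id}",
        "subject": {"id": patient_id, "sex": sex},
        "phenotypicFeatures": [
            {"type": {"id": term_id, "label": label}}
            for term_id, label in hpo_terms
        ],
        "metaData": {
            "created": datetime.now(timezone.utc).isoformat(),
            "createdBy": "synthetic-example",  # placeholder author
            "phenopacketSchemaVersion": "2.0",
        },
    }


# A synthetic patient: no real data is involved.
packet = build_phenopacket(
    "synthetic-patient-001", "FEMALE", [("HP:0001250", "Seizure")]
)
print(json.dumps(packet, indent=2))
```

The resulting JSON document is the kind of object that could then be submitted to a national facility and indexed for discovery, which is why a shared format at this step matters for the later ones.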
Participants: CSC (Tommi Nyrönen, Ilkka Lappalainen), all WP participants.
Participants' roles: CSC will coordinate this task with all WP participants contributing.
This task maps the maturity level of the implementation of different international standards (such as those proposed by GA4GH) across partners and European countries. The task includes:
The optimal outcome would be a visual "map" of, for example, tool and workflow specifications, provision of computational and/or data resources, and data access and policies using international standards across the countries represented in the B1MG-OG. One possible layout is a matrix whose columns are countries (aiming for all 28 in the EU today) and whose rows are standards promoting genome data infrastructure interoperability (GA4GH standards and other relevant standards, e.g. ISO TC 215).
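As an illustration of the matrix described above, the sketch below stores one maturity level per standard and country and prints it as a plain-text table. Everything specific in it is an assumption made for the example: the four-level maturity scale, the function names, and in particular the country scores, which are invented placeholders and not actual assessments.

```python
# Hypothetical maturity scale; the task itself would define the real one.
MATURITY = {0: "not started", 1: "planned", 2: "piloted", 3: "in production"}


def build_matrix(standards, countries, assessments):
    """Return {standard: {country: level}}, defaulting to 0 where nothing is reported."""
    matrix = {s: {c: 0 for c in countries} for s in standards}
    for standard, country, level in assessments:
        if level not in MATURITY:
            raise ValueError(f"unknown maturity level: {level}")
        matrix[standard][country] = level
    return matrix


def render(matrix, countries):
    """Format the matrix with standards as rows and countries as columns."""
    lines = [f"{'standard':<14}" + "".join(f"{c:>4}" for c in countries)]
    for standard, row in matrix.items():
        lines.append(f"{standard:<14}" + "".join(f"{row[c]:>4}" for c in countries))
    return "\n".join(lines)


countries = ["FI", "SE", "ES"]
standards = ["Beacon API", "Passports", "Phenopackets"]
# Placeholder values for illustration only.
assessments = [
    ("Beacon API", "FI", 3),
    ("Beacon API", "ES", 2),
    ("Passports", "SE", 1),
]

matrix = build_matrix(standards, countries, assessments)
print(render(matrix, countries))
```

Keeping the data as a nested mapping makes it straightforward to feed the same assessments into a heat map or a spreadsheet later, if a more visual presentation is preferred.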
Participants: EMBL-EBI (Thomas Keane) with contributions from several internal and external stakeholders.
Participants' roles: EMBL-EBI: coordinate the mapping of competences.
Gap analysis of existing/emerging infrastructure components that enable the 1+MG initiative. This task analyses the workflow for federated data access from a regulatory viewpoint and from the technology angle. Which "arrows" in the flowchart can be implemented, and what is missing? The work will be carried out in consultation with existing European Research Infrastructures. The report will focus on what is missing from the current landscape of federated data management technologies.
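The question of which "arrows" can be implemented can be pictured as a small bookkeeping exercise over the steps of the workflow. The sketch below is one possible way to record it, under stated assumptions: the step names are paraphrased for the example, and the implemented/missing flags are placeholders, not findings of the gap analysis.

```python
# Each arrow is (from_step, to_step, implemented). The flags are
# placeholders for illustration, not results of the actual analysis.
ARROWS = [
    ("healthcare data", "standard format", True),
    ("standard format", "national facility", True),
    ("national facility", "federated discovery", False),
    ("federated discovery", "access grant", False),
    ("access grant", "secure cloud access", True),
]


def find_gaps(arrows):
    """Return the arrows that have no implementation yet."""
    return [(src, dst) for src, dst, implemented in arrows if not implemented]


for src, dst in find_gaps(ARROWS):
    print(f"missing: {src} -> {dst}")
```

In practice each arrow would probably carry two flags, one for the regulatory viewpoint and one for the technology angle, since a step can be technically feasible while still lacking a legal basis.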
Participants' roles: UU and SU: coordinate the gap analysis exercise involving relevant RIs.
Before infrastructure components are exposed to real genomic and clinical data collected from patients, the development versions of the various applications, APIs and access management processes will be tested and validated on synthetic, realistic datasets. These datasets will be created using other funding sources (e.g. national, regional or H2020) but used for WP4 goals.
Realistic synthetic datasets are required to test technical service interoperability, data protection and scalability without the risk of a data breach. Optimally, synthetic clinical and genomic data would be managed just like real data, using existing tools such as the federated EGA or RD-Connect.
The first synthetic datasets will be made available from the Nordics, Genomics England, the Netherlands, Spain, Estonia and Italy. These datasets will be altered during the project based on the Use Case requirements and to ensure that sex, gender, diversity and ethnicity aspects are taken into account.
Participants: UU (Bengt Persson), SU (Niclas Jareborg) with contributions from several internal and external stakeholders.
Participants' roles: UU and SU: coordinate the definition of the synthetic dataset catalogue.
This task will organise a number of thematic workshops across Europe (Sweden, Switzerland, France, Spain, Italy and Slovenia) that bring together thematic experts to produce materials, report progress to stakeholders, and provide written input for the technical roadmap (Task 4.1).
The events will cover the following topics: Synthetic human datasets for safe cross-border testing; Security on federated data access processes; 1+MG: Synchronicity between regulation and technology; Identify the barriers: existing cross-border data discovery and access solutions; Interfaces for cross-border interoperable services; Healthcare and genomes – capacity building for Europe.
Participants: CSC (Tommi Nyrönen, Ilkka Lappalainen), UU (Bengt Persson), EPFL (Jean-Pierre Hubaux), CNRS (David Salgado), CRG (Sergi Beltran), UMIL (Matteo Chiaro), UL (Brane Leskosek), UNILU (Venkata Satagopam), UT (Andres Metspalu), SU (Niclas Jareborg), CNR (Graziano Pesole)
Participants' roles: CSC: coordinate the workshops and the integration of the different outcomes; UU, EPFL, IFB, CRG, UMIL, UL, UNILU, UT, SU and CNR will organise the local workshops and produce the workshop reports.
CSC, ELIXIR/EMBL-EBI, CRG, University of Luxembourg, Uppsala University, SU, University of Tartu, University of Milan, CNR, CNRS, University of Ljubljana