SciTech Europa speaks to Sergi Beltran about the role GPAP plays in rare disease research.
According to EU classification, for a disease to be ‘rare’ it must affect no more than one in 2,000 people. The RD-Connect ‘Genome-Phenome Analysis Platform’ (GPAP), recognised by the International Rare Diseases Research Consortium (IRDiRC), is an online resource for diagnosis and gene discovery in rare disease research. SciTech Europa Quarterly spoke to project lead Sergi Beltran about the role GPAP plays in rare disease research and what the future holds for the discovery and treatment of rare diseases.
Beltran began by introducing RD-Connect and GPAP. He said: “RD-Connect started alongside two other European projects, NeurOmics and EURenOmics. Together we discovered over 120 new disease genes while, in parallel, we developed GPAP, an infrastructure to facilitate the sharing and analysis of this data, which now holds data from almost 5,000 individuals.
“In RD-Connect we are working to address the scarcity of data and samples, while also helping the community to break down barriers and bring together resources, as there are so many different rare diseases. In many instances, data sit in isolation in hospitals or databases; in addition, each country has its own patient registry and its own biobank for a specific disease, which exacerbates this isolation. As a result, knowledge is scattered across countries and across different specialities within the different disease groups.
“To break down this isolation, we at RD-Connect have created three main resources. The first is the Registry and Biobank Finder, a catalogue of patient registries and biobanks, mostly in Europe, that hold information and samples from patients with certain diseases. The second is the Sample Catalogue, which contains over 25,000 biosamples covering over 100 rare diseases. And the third is the Genome-Phenome Analysis Platform, GPAP,” Beltran concluded.
The standardised analysis and sharing of genome-phenome data
An estimated 7% of the population is affected by rare diseases, and diagnoses and therapies are still lacking worldwide. However, genome processing and analysis now make it possible to increase the diagnostic rate for rare disease patients and to cut the time to diagnosis, which currently averages 4.8 years. In addition, around 80% of rare diseases are estimated to be of genetic origin, and it is these that GPAP primarily focuses on. The platform now holds almost 5,000 integrated genome-phenome datasets that can be shared between researchers.
Reaching a diagnosis makes it possible to understand exactly what causes the disease, to predict how it is likely to progress, and to try out and apply measures that may slow it down.
Beltran told SciTech Europa: “Moreover, it is very useful to identify another patient somewhere else in the world who has a similar set of symptoms to the patient you are looking at and the same candidate disease-causing gene. At GPAP, we try to facilitate at least some of these processes by integrating genomic data, such as genomes and exomes. Patients know the benefits of sharing their data, anonymously of course, and they are generally willing to have as many people as possible look at it because they want to find a solution. In some cases, they are even willing to share their case completely publicly in order to find answers.
“In RD-Connect, we are mainly working with exome and genome technologies, which allow us to sequence either all of the genes in the genome (the exome) or the whole genome: all of the genes plus all the other parts of the genome, only some of which have a known function. We pass the exomes and genomes through a standardised analysis pipeline that puts them all in the same format, so that we can use them in an integrated way and analyse them together. We also collect phenotypic information in a standardised way, using the Human Phenotype Ontology (HPO). The importance of combining genetic and phenotypic information becomes clear when we compare success rates with and without HPO terms. For example, with no HPO terms we are able to diagnose only around 25% of cases, but when we have more than six HPO terms, we are able to solve almost 50%.”
This is important because standards and ontologies that computers can “understand” allow researchers to use the information programmatically. It also ensures that the information is interoperable with other systems around the world, as most teams use the same ontologies and languages. For example, if someone enters the information in Spanish, the underlying ontology code is the same as for any other language, which means that the systems can talk to each other. The RD-Connect GPAP then provides analysis and interpretation tools so that its users can go on to identify the genetic cause of a patient’s symptoms and illness.
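To illustrate why shared ontology codes make data interoperable, the following Python sketch maps phenotype labels written in English or Spanish to the same HPO identifiers and then compares patients by code rather than by free text. The lookup table is a hypothetical, hand-written stand-in for the official HPO release and its translations; the identifiers used (HP:0001250 for seizure, HP:0001251 for ataxia) and the Spanish label should be checked against the current HPO files before any reuse.

```python
# Hypothetical, hand-written lookup table for illustration only; a real system
# would load the official HPO release and its translations instead.
LABEL_TO_HPO = {
    "seizure": "HP:0001250",      # English label
    "convulsión": "HP:0001250",   # assumed Spanish label for the same concept
    "ataxia": "HP:0001251",       # same spelling in English and Spanish
}


def to_hpo_ids(labels):
    """Map free-text phenotype labels (in any supported language) to HPO IDs."""
    ids = set()
    for label in labels:
        key = label.strip().lower()
        if key not in LABEL_TO_HPO:
            raise KeyError(f"unknown phenotype label: {label!r}")
        ids.add(LABEL_TO_HPO[key])
    return ids


# Two clinicians describe similar patients in different languages.
patient_a = to_hpo_ids(["Seizure", "Ataxia"])
patient_b = to_hpo_ids(["Convulsión", "Ataxia"])

# The shared codes show the two records describe the same phenotype.
print(sorted(patient_a & patient_b))  # ['HP:0001250', 'HP:0001251']
```

Comparing sets of codes rather than strings is what lets records entered in different hospitals, or in different languages, be matched by a computer.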
Challenges in creating the RD-Connect GPAP online tool
According to Beltran, the main challenge in establishing the RD-Connect GPAP online resource stems from the fact that the team’s work has taken them into genome sequencing, which can be very difficult to interpret. A genome is very large: it contains over three billion bases which, to the naked eye, all look the same. “You cannot really say this one looks good or this one looks bad,” he told SciTech Europa.
“Each of us has more than three billion bases, and we are trying to find which of them is responsible for a disease. That is a significant challenge, and one which requires advanced tools, techniques, and technologies. In the future, once we know more about the diseases, this will be highly automated. For instance, we hope to be able to sequence a genome and automatically receive a report giving the diagnosis for that patient, alongside any suggestions or recommendations for patient follow-up.”
This, however, is some way off, and at the moment the process is much more complicated. Beltran explained: “Once we have a genome sequence, we have to spend a lot of time in front of the computer analysing and interpreting it in order to identify a variant or mutation that might be causing the disease. In many cases, because knowledge is lacking and the mutation has not been described before, you need to validate it functionally, including conducting experiments to confirm that this mutation is indeed causing the disease.”
A further challenge for the RD-Connect team was creating a system that could handle the large amounts of data being generated.
“Phenomic data is small, but genomic data is huge: we are talking about three billion positions for every genome, and we are dealing with thousands of genomes. This is therefore a big challenge, both in processing the data and in doing so in a timely manner. Storing the data is also a challenge, as is accessing it in real time. To overcome these challenges, we need to use big data technologies and infrastructures,” Beltran said.
In addition, one of the biggest societal challenges concerns the education of stakeholders, including ensuring that clinicians and researchers are aware of the benefits and importance of sharing data, and then being able to organise everything so that it complies with ethical and legal regulations.
The challenges involved in genome-phenome analysis
For Beltran, one of the main challenges facing the community is that analysing genome-phenome data requires specialised knowledge, and this is something the RD-Connect team has tried to address with the GPAP tool. “For instance,” he said, “you need to have someone who knows how to do this, and you also need researchers who are specifically trained in medical and clinical genomics, alongside a very secure IT infrastructure.
“By creating the GPAP research tool, we are trying to provide access to these kinds of technologies and analyses for many people who would otherwise not have been able to see this collection of data. Before GPAP was created, they would have had to ask a company to do the analysis for them and, unfortunately, if the report did not provide a diagnosis, they would have been unable to re-analyse the data.
“A system like the RD-Connect GPAP research tool sidesteps this issue. Because the system is very user-friendly, patients could even try interpreting the data themselves or, as is currently the case, have someone look at the data and interpret it for them so that, in the end, a diagnosis is achieved,” he concluded.
In the six years since RD-Connect began, the field has made a lot of progress in agreeing on which standards to use, at least for phenotypic and clinical information on rare diseases; before this, there was a bigger debate around which ontologies should be used.
Beltran explained: “At the beginning of the project we decided which ontology standard to use, and this has gone on to facilitate the international standardisation of phenotypic data, which is great progress.
“Regarding the standardisation of information, if we start from really raw data then this can be relatively easy, although it is nevertheless quite time-consuming and computationally intensive. To address this, we hope to move towards providing our analysis pipelines, so that people can run their analyses on exactly the same pipeline we are using, on their own computers or in the cloud, and give us back the data in the same format.”
Beltran also highlighted that this system includes security measures to ensure that the data is only accessed by those who should have access to it, and that it is only used for authorised purposes.
Bringing together genotype and phenotype data at the patient level
Bringing together genotype and phenotype data at the patient level creates a collaborative environment where data and queries can be shared between authorised users. “Users can search for variants in genes across all samples, which means they can identify other patients who might have the same symptoms and genes,” Beltran explained. “It therefore enables anonymised data discovery. The current system holds almost 5,000 individuals, but only 3,100 of them are index cases, that is, individuals who have been referred for diagnosis. The index case is the main subject in their family, but we also have genome-, exome- and phenome-related information from their relatives, regardless of whether or not they too have the disease, which is useful for interpreting the data.
“Moreover, at least 500 of these index cases have been analysed primarily with the RD-Connect system, and of these 500 cases we have diagnoses for 220, a diagnostic rate of approximately 44%. It is also important to mention that there are 124 index cases that have been labelled as ‘solved’ in the system, although we don’t know whether they used the RD-Connect system as their primary means of analysis.” Interestingly, many of the cases included in these figures started out as very difficult cases that could not be solved using routine diagnostic procedures and tests; the teams involved therefore collaborated through RD-Connect, and a diagnosis was obtained.
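The diagnostic rate quoted above is a simple ratio. The minimal Python sketch below reproduces it from the interview’s figures only; the function name `diagnostic_rate` is a hypothetical choice for illustration. Because at least 500 cases were analysed primarily with RD-Connect, dividing by 500 gives the highest value the rate could take.

```python
def diagnostic_rate(diagnosed, analysed):
    """Percentage of analysed index cases that received a diagnosis."""
    if analysed <= 0:
        raise ValueError("analysed must be positive")
    return 100.0 * diagnosed / analysed


# Figures quoted in the interview: 220 diagnoses among the (at least) 500
# index cases analysed primarily with RD-Connect.
rate = diagnostic_rate(220, 500)
print(f"{rate:.0f}%")  # 44%
```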
The RD-Connect GPAP research tool has enabled the identification of causative variants in treatable disorders. One project, known as ‘Consequitur’ and conducted by Dr Rita Horvath, included the exomes and phenomes of 50 index cases with different types of rare neurological disorder.
Discussing this, Beltran told SciTech Europa: “It has been possible to diagnose 31 of the 50 cases, with a preliminary diagnostic success rate of between 40% and 75%, depending on the type of disorder. For example, 75% of leukodystrophy cases have been solved, while the diagnostic rate was 40% in epilepsy. The nice thing about this is that these are treatable disorders, and with treatment the patients’ lives can be greatly improved, even if the disorders are not completely cured.
“In the same project, there was a cohort of 48 families with congenital myasthenic syndrome (CMS). Here, the exomes were first pre-screened for already known mutations, which immediately solved 19 cases. For the remaining 29, the data was put into RD-Connect. A diagnosis was subsequently achieved for a further 8 cases, and novel candidate genes have been proposed for 13 of the cases. This example represents a discovery that includes the identification of new genes that can potentially become knowledge available worldwide.”
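The Consequitur figures can be cross-checked with a few lines of arithmetic. This Python sketch uses only the numbers quoted in the interview; the variable names are illustrative. It deliberately does not add the 13 candidate-gene cases to the 8 new diagnoses, since the interview does not say whether those two groups overlap. The overall 62% for the neurological cohort sits, as expected, between the per-disorder rates of 40% and 75%.

```python
# Figures quoted in the interview for the Consequitur project.
neuro_solved, neuro_total = 31, 50

cms_families = 48    # CMS cohort size
cms_prescreen = 19   # solved by pre-screening for already known mutations
cms_rd_connect = 8   # further diagnoses after analysis in RD-Connect

# Overall rate across the 50 neurological index cases.
neuro_rate = 100 * neuro_solved / neuro_total

# Families whose data went into RD-Connect after pre-screening.
cms_remaining = cms_families - cms_prescreen

# CMS families with a diagnosis from either step.
cms_solved = cms_prescreen + cms_rd_connect

print(f"Neurological cohort: {neuro_rate:.0f}% solved")
print(f"CMS: {cms_remaining} families analysed in RD-Connect, "
      f"{cms_solved} of {cms_families} solved in total")
```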
There are several reasons why it is so important for this work to continue. As Beltran underlined: “GPAP is incredibly important for reaching a molecular diagnosis, as a diagnosis could open up a variety of treatment options, including curative options, of which there are currently very few for rare diseases. Moreover, alongside this and the prevention of disease progression, GPAP will hopefully be very successful in helping to reduce patient and family anxiety by facilitating diagnoses.
“In addition, it is my hope that GPAP will go on to consolidate its position in Europe as an infrastructure for rare disease research and that it will really help clinicians and researchers to reach a diagnosis for their patients and discover new genes. I envisage that, besides improving its handling of genome-phenome data, GPAP will also incorporate other types of information and datasets in order to enable more integrated analyses.
“Most importantly, what GPAP has proved (and will continue to prove) is that collaboration is the key for rare disease research. RD-Connect is a very collaborative project with many people involved and it really has brought together a community that has shown that, by working together, we can really improve both research and the way we are able to access and share data,” he concluded.
Sergi Beltran Agulló
Head of Bioinformatics Analysis, Centro Nacional de Análisis Genómico (CNAG-CRG)