While data sharing promises to accelerate the diagnosis of people with rare diseases and the development of therapies to treat them, the success of these efforts will depend in part on the quality of the data being shared.
Ensuring the quality of shared data resources was the topic of a session during a recent two-day conference on data sharing and its potential to accelerate therapeutic development for rare diseases. The event, held by the Duke-Margolis Center for Health Policy and the U.S. Food and Drug Administration, ran August 18 and 19. A replay of the event is available online.
Data quality is a critical, overarching issue for data sharing. For researchers to benefit from pooled datasets, factors such as the common use of terminology, the provenance of the data, and how the data is gathered and stored all affect how usable it will be and whether conclusions based on it will be valid. The new data collection program being built by RARE-X ensures patient organizations can collect high-quality structured data and are equipped to share ‘the right data at the right time’ to accelerate research for their disease.
“For us to use the data with any kind of fidelity and an expectation that it means something, the data has to be of high quality. There is an exact expectation from this standpoint of quantitative sciences that the quality around the data science has to be there. And then the third piece of this is obviously on the regulatory sciences,” said Jeff Barrett, senior advisor of quantitative medicine at the Critical Path Institute, a nonprofit that works to advance medical innovation and regulatory science. “If someone is going to make decisions, either based on the data or the tool, there is also an implicit understanding that the traceability there, the audit trail—all of the expectations on the fidelity around the tool and the data—have to be there in order to use it for this purpose.”
Barrett said that when C-PATH builds a data tool for a regulatory authority, the process begins with discussing such things as what the quality of the underlying data needs to be, the diversity of the users of the data, and questions about data privacy and security. Because expectations and quality can vary across data types (clinical trial data differs from registry data, which in turn differs from genomic data), data must be standardized to ensure access and usability.
How good the data has to be depends on how it will be used. Context of use agreements are essential because they set the bar according to the expectations for a particular data type, he said.
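Barrett's point that the quality bar follows the intended use can be made concrete with a toy sketch. Nothing below comes from C-PATH or any real context of use agreement; the contexts, thresholds, and field names are all invented for illustration:

```python
# Toy sketch: a "context of use" sets the quality bar a dataset must clear
# before it is used for that purpose. All thresholds and rule names here
# are invented for illustration, not drawn from any real agreement.

REQUIREMENTS = {
    # Exploratory research tolerates more missing data than a regulatory
    # submission, which also demands a full audit trail.
    "exploratory": {"max_missing_fraction": 0.20, "audit_trail_required": False},
    "regulatory":  {"max_missing_fraction": 0.02, "audit_trail_required": True},
}

def fit_for_purpose(dataset, context):
    """Return True if the dataset meets the bar for the given context of use."""
    rules = REQUIREMENTS[context]
    if dataset["missing_fraction"] > rules["max_missing_fraction"]:
        return False
    if rules["audit_trail_required"] and not dataset["has_audit_trail"]:
        return False
    return True

# The same registry dataset can be fit for one purpose but not another.
registry = {"missing_fraction": 0.10, "has_audit_trail": False}
assert fit_for_purpose(registry, "exploratory") is True
assert fit_for_purpose(registry, "regulatory") is False
```

The design choice mirrors the article's argument: quality is not a single property of the data but a function of the data and its intended use together.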
“Regardless of how it comes into the system, the standards that need to be applied need to be applied differently to the different data types, but also with the expectations of use. It requires this community to invest in quality and the governance of all aspects of this,” said Barrett. “One group can’t be responsible for quality. It has to be the collective that really participates in this in a bigger way. And the governance around setting principles and practices that ensure high quality has to be part of the ongoing dialogue.”
Laurie Conklin, medical director of clinical programs for ReveraGen BioPharma, said that disease-specific registries, natural history studies, or quality improvement studies could be robust data sources that academics and foundations would like to see used to facilitate drug development. Still, because they are rarely built to regulatory data standards, these types of studies are not useful for regulatory submissions. She said that it’s worth considering what can be done to make them regulatory-friendly.
“We’ve spoken a bit about the need to involve all stakeholders early in the planning of registries and the need to involve regulators,” she said. “I’m not sure that some academics or foundations are knowledgeable or aware that all of these people need to come to the table early on in order to optimize these studies.”
Sam Hume, leader of the data science team at the Clinical Data Interchange Standards Consortium (CDISC), talked about his organization’s efforts to create standards at each step of collecting and sharing data. One of CDISC’s initiatives has been creating standards for individual disease communities, minimizing the rework needed for different datasets to have value when shared. The organization has created therapeutic area user guides for Huntington’s disease and Duchenne muscular dystrophy that cover metadata, terminology, and required information to ensure standards are applied consistently.
“In cases where folks are implementing the standards outside of regulated research, one of the key drivers is this ability to create standardized, consistent datasets that are much easier to pull and aggregate,” he said. “Because the standards are published, they have a decent understanding of what they’re looking at and working with.”
It can be difficult to identify enough patients from any single data source to conduct a robust analysis of rare diseases. Shrujal Baxi, senior medical director at Flatiron Health, which helps cancer researchers apply real-world data to their work, said that there is an opportunity for partnerships among data vendors, regulators, and organizations to develop guidelines. Specifically, such guidelines could standardize quality metrics for observational data generation, establish a consistent ontology for common data variables, and address considerations around processing unstructured data.
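As a toy illustration of what a consistent ontology for common data variables looks like in practice, the sketch below pools records from two hypothetical registries by mapping their local field names and coded values onto one shared vocabulary. Every field name, code, and record here is invented for illustration; this is not CDISC or any real terminology:

```python
# Toy sketch: harmonizing two hypothetical registry exports onto one shared
# vocabulary so their records can be pooled. All names are invented.

# Per-source mapping from local column names to a common variable name.
FIELD_MAP = {
    "registry_a": {"pt_sex": "sex", "dob": "birth_date", "dx": "diagnosis"},
    "registry_b": {"gender": "sex", "birthDate": "birth_date", "condition": "diagnosis"},
}

# Controlled terminology: local codes normalized to one canonical value set.
VALUE_MAP = {
    "sex": {"M": "male", "F": "female", "Male": "male", "Female": "female"},
}

def harmonize(source, record):
    """Rename fields and normalize coded values for one record."""
    out = {}
    for local_name, value in record.items():
        common = FIELD_MAP[source].get(local_name)
        if common is None:
            continue  # drop fields with no agreed mapping
        out[common] = VALUE_MAP.get(common, {}).get(value, value)
    return out

# Differently coded source records become directly comparable after mapping.
a = harmonize("registry_a", {"pt_sex": "F", "dob": "2010-04-02", "dx": "DMD"})
b = harmonize("registry_b", {"gender": "Female", "birthDate": "2011-07-19", "condition": "DMD"})
assert a["sex"] == b["sex"] == "female"
```

The point of the sketch is the one Baxi makes: without an agreed mapping, each pooled analysis has to rediscover that `pt_sex` and `gender` mean the same thing, and quality suffers.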
“We believe it is likely that the more we have regulatory certainty around these types of issues and how data is used in rare cohorts,” she said, “the more investment we’re likely to see in the innovative solutions to these problems and the healthcare data infrastructure as technology continues to evolve.”
Fortunately, RARE-X has been working with consortia partners tackling these very challenges: data standardization, using a consistent ontology for common data variables, mapping to existing data standards like CDISC, and engaging across the ecosystem (patients, academic and clinical research, industry, and government).
“The benefits of this data standardization will also help support the scaling of the RARE-X data collection platform,” said Nicole Boice, Co-Founder of RARE-X. “What we are building will create efficiencies of time around data collection, decrease in cost to stand up fit-for-purpose data, ease of use for patients to collect structured high-quality data, and ultimately help accelerate future cures.”