One way that RARE-X can help improve the diagnosis and treatment of people with rare conditions is by gathering data that can provide new insights through machine learning, the ability of artificial intelligence systems to improve through experience.
A new study sheds light on how machine learning has already been applied to rare diseases and on opportunities for future work. It also suggests why efforts like RARE-X can play a critical role in improving the landscape for people with rare diseases by increasing the understanding of rare conditions through data.
In the study, published in the June 2020 Orphanet Journal of Rare Diseases, researchers at the Berlin Institute of Health examined studies in the PubMed database from 2010 through 2019 that applied machine learning to rare diseases.
Machine learning can be a valuable tool to assist in the diagnosis and treatment of rare diseases. The authors note that more than 80 percent of rare diseases affect less than one patient in 1 million, meaning that experienced physicians with a high patient contact volume will likely never see a single patient with most of these conditions.
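To put that figure in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes a disease prevalence of exactly one in 1 million and a hypothetical physician who sees 100,000 distinct patients over a career, each treated as an independent draw from the general population. Neither number comes from the study, and referral patterns are ignored.

```python
# Illustrative assumptions, not figures from the study.
PREVALENCE = 1 / 1_000_000      # one affected person per million
CAREER_PATIENTS = 100_000       # hypothetical number of distinct patients seen


def chance_of_seeing_at_least_one(prevalence, patients):
    """Probability that at least one of `patients` independent people is affected."""
    return 1 - (1 - prevalence) ** patients


p_one_disease = chance_of_seeing_at_least_one(PREVALENCE, CAREER_PATIENTS)
print(f"Chance of ever seeing a patient with this disease: {p_one_disease:.1%}")
```

Under these assumptions the chance comes out a little under 10 percent for any single such disease, which is consistent with the authors' point that a physician would likely never encounter most of them.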
“Despite its potential for improving the quality of care for patients, the use of machine learning in the field of rare diseases has not been comprehensively reviewed,” the authors write.
The number of publications increased over the study period with just three published in 2010 and 79 published in 2019, which the authors said mirrored an increase in publications about machine learning in general.
Rare diseases with a higher prevalence appeared more often in these studies than diseases with a lower prevalence. Of the 74 diseases included in the studies, 55 percent had a prevalence of less than five per 10,000. That compared with just 3 percent that had a prevalence of less than nine per 100,000.
The researchers note that machine learning algorithms’ performance largely depends on the amount of data available to train the algorithms. “The lack of sufficient training data could also explain why rare diseases with a higher prevalence were investigated more often than lower prevalence diseases,” they write. “It is therefore important to further promote cross-institutional and international collaboration to create data sets sufficiently large for machine learning research.”
The researchers noted that some diseases (rare neurologic diseases, rare systemic or rheumatologic diseases, rare respiratory diseases, rare cardiac diseases, and rare gastroenterologic diseases) were investigated more than expected. At the same time, other diseases (rare developmental defects during embryogenesis, rare inborn errors of metabolism, rare skin diseases, and rare endocrine diseases) were investigated less than expected. The researchers speculate this may reflect the data available for these conditions. For instance, many of the overrepresented disease groups work with imaging data, which is plentiful and standardized.
In fact, the most frequent sources of data used in the studies were images (32 percent), demographic data (27 percent), and omics data (26.5 percent).
“The barrier of applying machine learning to other types of data, such as unstructured text data in medical records, is higher because these data are often not standardized and therefore more difficult to process,” the authors write. “This highlights the importance of international health IT standards and medical terminologies that can improve interoperability and that can help to make medical data more accessible to machine learning.”
The authors said that a small proportion of the studies tested their algorithms on an external validation data set or validated their performance against that of human experts. To move machine learning into clinical use, the authors say, validation is critical. “Machine learning studies should therefore aim to evaluate their performance on external data so that their potential for real-world application can be more easily assessed.”
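As a minimal sketch of what external validation means in practice, the Python example below fits a deliberately simple classifier on one cohort and then scores it both on held-out patients from that same cohort and on a separate cohort. Everything in it is an assumption made for illustration: the data are synthetic, the single "biomarker" feature and the `shift` offset between sites are invented, and the threshold rule stands in for whatever model a real study would use.

```python
import random


def make_cohort(n, shift, seed):
    """Synthetic (made-up) cohort: one biomarker value and a 0/1 label per patient."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        label = rng.randint(0, 1)
        # Affected patients have a higher biomarker on average; `shift` mimics
        # a hypothetical site-specific measurement offset.
        value = rng.gauss(1.0 if label else 0.0, 0.5) + shift
        cohort.append((value, label))
    return cohort


def fit_threshold(train):
    """Pick the cutoff halfway between the mean values of the two classes."""
    pos = [v for v, y in train if y == 1]
    neg = [v for v, y in train if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def accuracy(threshold, data):
    """Fraction of patients whose label matches 'value above threshold'."""
    correct = sum((v > threshold) == bool(y) for v, y in data)
    return correct / len(data)


internal = make_cohort(400, shift=0.0, seed=0)   # the developing institution
external = make_cohort(400, shift=0.4, seed=1)   # a different, hypothetical site
train, holdout = internal[:300], internal[300:]

threshold = fit_threshold(train)
internal_acc = accuracy(threshold, holdout)
external_acc = accuracy(threshold, external)
print(f"Internal hold-out accuracy: {internal_acc:.2f}")
print(f"External cohort accuracy:   {external_acc:.2f}")
```

The two numbers can differ because the external site measures the biomarker slightly differently, which is the kind of gap that an internal hold-out alone would not reveal.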
Overall, most studies used machine learning for diagnosis (40.8 percent) or prognosis (38.4 percent), while only a small number (4.7 percent) sought to improve treatment.
While the authors said it was not surprising to find most studies focused on using machine learning for diagnosis and prognosis of rare diseases, they added that the technology can play a role in improving treatment for rare diseases, and that future studies could focus on the use of machine learning to accelerate drug development.
The ability to apply machine learning to problems such as rare disease is dependent on the availability of data. RARE-X can play a crucial role in helping patient organizations leverage their data to speed up new therapy and diagnosis development. By providing a collaborative data platform, RARE-X allows patient organizations to share patient registries, natural history studies, genomic information, and electronic health records with researchers, clinicians, and drug developers.