The main purpose of our study was to evaluate disease suggestions provided by Ada DX in rare inflammatory systemic disease cases when optimized for this domain. We conducted a retrospective study of rare disease cases with confirmed diagnoses. We optimized the system’s rare disease knowledge base and assessed the correctness and timing of suggestions. The rare disease suggestions were based on the ranked fit of the symptom constellation for the respective disease models.
Our findings suggest that, in many cases, Ada DX could provide accurate rare disease suggestions earlier than the time of clinical diagnosis. These suggestions were based on information from the medical record, which was therefore likely available to physicians who are not rare disease specialists. Our findings further show that, at the time of diagnosis, accurate disease suggestions were provided in most cases. Results pertaining to the system’s accuracy should be interpreted cautiously due to methodological limitations. A prerequisite for this study was the extension and optimization of the system’s medical knowledge base of selected rare diseases and related symptoms.
Our results suggest that Ada DX has the potential to highlight the possibility of rare disease to physicians early in the course of a case. Effects on the actual TD in the clinical setting cannot be directly concluded from this study; evaluating such effects requires a prospective study. However, we believe that early rare disease suggestions can facilitate earlier diagnosis. An early suggestion of diseases may increase awareness among physicians, particularly among those who are not rare disease specialists, thereby reducing diagnostic inaccuracy due to insufficient knowledge or premature closure [8, 32]. Suggesting possible rare diseases can increase the level of early suspicion that is necessary for diagnosis. By delivering early diagnostic support, Ada DX could alleviate the challenges of rare disease diagnosis. Ada DX could serve as a prototype of the tool that NAMSE, the UK Department of Health, and other stakeholders in the rare disease community have recommended developing [11].
Such an endeavor would require a structured and comprehensive extension of Ada’s rare disease knowledge base. Moreover, availability of the tool, preferably via a web-based application, is required to scale for widespread use and to support PCPs and specialists. This would give Ada DX the potential to empower PCPs to improve accurate rare disease referral, provide more accurate rare disease diagnosis, and shorten TD in rare disease cases on a larger scale.
Whether Ada DX disease suggestions effectively help physicians make better decisions in a real-world setting must be further investigated. For example, how will physicians know when to seriously consider a rare disease suggestion and when to ignore it? As a support and reminder system that presents a list of diseases ranked by their estimated probability and fit, Ada DX will necessarily suggest diseases that are not ultimately the correct diagnosis. Such false positive suggestions are not necessarily problematic in a reminder system. Our analysis of false positive suggestions based on a large set of common and rare internal medicine test cases revealed a low false positive rate. We do not know how either correct or false positive disease suggestions will affect the diagnostic process, costs, patient safety, and health outcomes. While DDSSs could modestly increase the risk of unnecessary diagnostic procedures [17, 33], they have the potential to improve overall diagnostic quality and reduce costs [20, 34]. These effects need to be evaluated in future studies.
Potential improvement of Ada DX
The analysis revealed several reasons for inaccurate disease suggestions, which indicate possible areas for future improvement.
Multiple diagnoses (multimorbidity)
The presence of multiple diagnoses in a single case appeared to be among the most challenging scenarios for Ada DX in the given case set. Multiple diagnoses led to lower accuracy and, consequently, to a lack of early correct disease suggestions. The ability to recognize multimorbidity is therefore an ideal target for improving diagnostic accuracy.
Strict diagnostic criteria
Ada DX does not provide the option for excluding specific disease suggestions by assessment of strict diagnostic criteria, such as those provided in diagnostic guidelines and disease classifications. For that reason, prominent disease suggestions that were based on reasonably high probability estimations, but did not match strict diagnostic criteria, could not be excluded. Although the application of strict criteria partly contradicts the concept of probabilistic reasoning, it is of great importance when making or excluding diagnoses. Integration of strict diagnostic criteria separately from or after the probabilistic inference of disease suggestions, or the possibility to manually exclude suggested conditions, should be considered as an additional feature.
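The idea of applying strict criteria after the probabilistic inference, together with manual exclusion, can be illustrated with a minimal sketch. It does not reflect the actual Ada DX implementation: the `Suggestion` type, the `apply_strict_criteria` function, and the placeholder disease and finding names are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Set


@dataclass(frozen=True)
class Suggestion:
    """One ranked disease suggestion (hypothetical structure)."""
    disease: str
    probability: float


# A strict criterion maps the set of documented findings to True (met) or False.
Criterion = Callable[[Set[str]], bool]


def apply_strict_criteria(
    ranked: List[Suggestion],
    findings: Set[str],
    criteria: Dict[str, Criterion],
    manually_excluded: FrozenSet[str] = frozenset(),
) -> List[Suggestion]:
    """Filter a ranked list after probabilistic inference, preserving its order.

    A suggestion is dropped if the user excluded it manually or if a strict
    criterion is registered for it and is not met by the documented findings.
    Diseases without a registered criterion pass through unchanged.
    """
    kept = []
    for suggestion in ranked:
        if suggestion.disease in manually_excluded:
            continue
        check = criteria.get(suggestion.disease)
        if check is not None and not check(findings):
            continue
        kept.append(suggestion)
    return kept


# Placeholder data: disease_A requires both finding_1 and finding_2.
ranked = [
    Suggestion("disease_A", 0.6),
    Suggestion("disease_B", 0.3),
    Suggestion("disease_C", 0.1),
]
criteria = {"disease_A": lambda f: {"finding_1", "finding_2"} <= f}
findings = {"finding_1"}

filtered = apply_strict_criteria(ranked, findings, criteria)
print([s.disease for s in filtered])  # ['disease_B', 'disease_C']
```

Note that this sketch treats an undocumented finding as an unmet criterion. In practice, a finding that has not been assessed differs from one that is absent, so a real implementation would probably need to distinguish "met", "not met", and "unknown".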
Consideration of therapy effects
Information concerning therapy cannot be included in cases. Consequently, therapy effects are not reflected in the probability estimation, although they can be of diagnostic relevance. Examples include therapy failure, symptom improvement with therapy, and medication side effects. In particular, the system should be able to recognize the diagnostic information conveyed by therapy failure.
Compatibility with existing databases
With regard to the Ada knowledge base and its extension to include specific rare disease knowledge, the importance of system interoperability should not be underestimated, and future optimization should prioritize compatibility. Knowledge base compatibility with existing rare disease databases such as Orphanet [22] should be emphasized and should at least include disease mapping and codification. The existing Orphanet nomenclature (Orpha numbers) should be represented in the Ada knowledge base. Integration of such external databases would increase disease coverage and appears necessary for future improvements of Ada DX for rare diseases. Ontology mapping facilitates scientific cooperation and follows recommendations from European initiatives such as the UK Strategy for Rare Diseases, the French National Plan on Rare Diseases, and the German National Action League for People with Rare Diseases [10, 11, 35]. Database compatibility might also allow for a more efficient and targeted knowledge base extension because it could enable integration of further related databases, such as existing databases of genetic variants. Connection to available genetic information could be achieved through Online Mendelian Inheritance in Man (OMIM) [23] genetic reference numbers. This would facilitate the integration of known disease genotypes and gene-phenotype relations, as well as the appropriate suggestion and handling of genetic tests in Ada. Since most such databases also rely on compatibility with medical phenotype ontologies, especially the HPO [21], coverage of HPO terms should be comprehensively extended. HPO terms should be represented in a way that enables HPO users to use the system seamlessly.
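As an illustration of what disease mapping and codification could involve, the following minimal sketch attaches external identifiers to disease models and translates HPO terms into internal symptom identifiers. It is not the Ada knowledge base schema: the `DiseaseModel` structure, the function names, and all identifiers (including the Orpha, OMIM, and HPO values) are placeholders assumed for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DiseaseModel:
    """Hypothetical disease model carrying external identifiers."""
    internal_id: str
    name: str
    orpha_number: Optional[str] = None                      # Orphanet nomenclature
    omim_numbers: List[str] = field(default_factory=list)   # OMIM reference numbers
    hpo_terms: List[str] = field(default_factory=list)      # associated phenotypes


def build_orpha_index(models: List[DiseaseModel]) -> Dict[str, DiseaseModel]:
    """Index the disease models by Orpha number for lookup from Orphanet data."""
    return {m.orpha_number: m for m in models if m.orpha_number}


def unmapped_models(models: List[DiseaseModel]) -> List[str]:
    """List internal IDs that still lack an Orpha number (mapping gaps)."""
    return [m.internal_id for m in models if not m.orpha_number]


def translate_hpo(
    terms: List[str], hpo_to_symptom: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Translate HPO terms to internal symptom IDs; report uncovered terms."""
    mapped = [hpo_to_symptom[t] for t in terms if t in hpo_to_symptom]
    uncovered = [t for t in terms if t not in hpo_to_symptom]
    return mapped, uncovered


# Placeholder identifiers only; none of these are real codes.
models = [
    DiseaseModel("d1", "disease_A", orpha_number="ORPHA:PLACEHOLDER_1",
                 omim_numbers=["OMIM:PLACEHOLDER_1"], hpo_terms=["HP:PLACEHOLDER_1"]),
    DiseaseModel("d2", "disease_B"),
]
hpo_to_symptom = {"HP:PLACEHOLDER_1": "symptom_1"}

orpha_index = build_orpha_index(models)
gaps = unmapped_models(models)
mapped, uncovered = translate_hpo(
    ["HP:PLACEHOLDER_1", "HP:PLACEHOLDER_2"], hpo_to_symptom
)
```

Here `gaps` lists the disease models that still need an Orpha number, and `uncovered` lists the HPO terms that the knowledge base cannot yet interpret. Both lists could be used to prioritize knowledge base extension.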
Future knowledge base extension
To increase the usefulness of DDSSs, complete coverage of known rare diseases is desirable. In the face of over 7000 known rare diseases and rapidly increasing medical knowledge, the process of disease model creation should be supported by technological means. A strategy for future disease model creation should aim for curated, automated modeling from structured disease databases. Furthermore, unstructured sources should be mined via the application of natural language processing (NLP). Similar technology could be applied to keep the knowledge base up to date: NLP could be used to screen medical publications to facilitate continuous updates. Although such a process should still be curated by medical editors and follow rigorous quality testing, it could accelerate knowledge base extension and maintenance.
User input dependency
Apart from the previously mentioned reasons for inaccurate disease suggestions, correct early disease suggestion poses additional challenges. It should be acknowledged that the capability of Ada DX to provide adequate disease suggestions is highly dependent on appropriate user input. Specifically, Ada DX depends on information gathered through the physician’s history assessment, examination, and further tests that are needed to confirm the correct diagnosis. Its output is therefore determined by the physician’s knowledge and skills in these areas. Nevertheless, Ada DX can facilitate correct data gathering, for example by suggesting not only diseases but also appropriate diagnostic tests and next steps specific to early diagnosis. While possible effects of diagnostic test suggestion and next step recommendation have not been examined in this study, it can be speculated that improvement of such features might further facilitate early diagnosis.
The manual work required to enter cases was significant. If Ada DX were to be used routinely in clinical practice, the user experience would need to be improved to reduce active effort. If possible, Ada DX should be integrated with the electronic health record to avoid double data input and manual work for clinicians, ideally operating in the background and adapting to clinicians’ workflows [36].
Study limitations
A retrospective analysis of confirmed rare disease cases is generally suitable to assess the potential accuracy of DDSSs in such cases. However, a drawback of a retrospective approach is that the results can only be interpreted exploratively. A retrospective approach is also generally susceptible to selection bias. Even though this study was partially controlled by fixed inclusion and exclusion criteria, there was inherent selection bias due to the focus on cases with a long course of disease and high final diagnostic certainty. The potential effect on TD might be lower in cases with a shorter course of disease or lower diagnostic certainty. A strength of the study is that a wide range of diagnoses from the group of systemic inflammatory diseases was represented (n=42), including cases of co-morbidity. However, results are limited to this group of diseases. Generalization of the results to the entire domain of rare diseases is not appropriate, as only a subgroup of rare diseases was studied. Because of the monocentric design, generalization of the study results to other institutions or medical domains is also limited.
The unblinded case input and subsequent risk of confirmation bias represent a methodological limitation. While case input was not blinded to diagnosis, it was based on written documented information from the medical records to reduce hindsight bias and retrospective misinterpretation. Given that the study was performed retrospectively on the files of confirmed rare disease cases, the cases’ documented evidence often revealed the diagnosis through confirmatory findings at the time of diagnosis. Blinding of case input to the diagnosis might have been feasible if the confirmatory evidence from the diagnosis visit had not been transcribed to case summaries. However, excluding evidence from the diagnosis visit would have compromised the evaluation of accuracy at the time of diagnosis. Future studies that aim to validate DDSS accuracy should follow a blinded and prospective design.
Data input was performed by a single user; thus, no statement can be made regarding user dependency of the input. While this was accepted in this study because it focused on the potential of the reasoning engine and a first prototype of the Ada DX DDSS, it will be highly relevant to test data entry with different users in practice. For this reason, subsequent studies should put an additional focus on user dependency.
The study was not intended as a validation of the system’s initial diagnostic accuracy (which we know was still limited by the relatively low number of rare diseases covered), but as an explorative estimation of the system’s potential to suggest rare diseases early. For this reason, the optimization of the knowledge base during the course of the study was accepted. This optimization enabled the evaluation of TD in an optimized scenario, but limits the evaluation of the system’s holistic accuracy. We recommend future studies use a fixed knowledge base and reasoning system to validate the accuracy of such systems. To properly evaluate false positive suggestions, a suitable control group should be considered.
Because the purpose of this study was not to validate the system’s initial accuracy, we did not compare two different versions of Ada DX to track changes in the system’s accuracy during optimization. With the chosen study design, it was not possible to quantify the knowledge base improvement achieved through extension and optimization, which would have required the comparison of two fixed knowledge base instances before and after the study. Nevertheless, such a comparison should be considered for future studies that aim to investigate the effect of a knowledge base extension.
Another limitation is that only confirmed conditions from the case set were added to the knowledge base and not a more extensive set of rare diseases. Arguably, a future extended disease knowledge base might lead to lower disease suggestion accuracy. In particular, the early ranking of the correct condition could be lower if more diseases were present in the knowledge base. However, specific evidence constellations can be expected to consistently lead to a high ranking of correct disease suggestions. The accuracy at the time of diagnosis should therefore be relatively unaffected by the number of diseases, since specific confirmatory evidence is most likely to be present in the case at that time.
Lastly, effects on TD cannot be measured directly with this study, but the results on early suggestion indicate the potential for earlier suggestion of rare diseases to the physician, which could result in earlier correct diagnosis. However, it should be considered that this is not always the case, as such a diagnosis might only become legitimate as clinical features evolve over the course of a case.