Extending inherited metabolic disorder diagnostics with biomarker interaction visualizations
Orphanet Journal of Rare Diseases volume 18, Article number: 95 (2023)
Inherited Metabolic Disorders (IMDs) are rare diseases where one impaired protein leads to a cascade of changes in the adjacent chemical conversions. IMDs often present with non-specific symptoms, a lack of a clear genotype–phenotype correlation, and de novo mutations, complicating diagnosis. Furthermore, products of one metabolic conversion can be the substrate of another pathway obscuring biomarker identification and causing overlapping biomarkers for different disorders. Visualization of the connections between metabolic biomarkers and the enzymes involved might aid in the diagnostic process. The goal of this study was to provide a proof-of-concept framework for integrating knowledge of metabolic interactions with real-life patient data before scaling up this approach. This framework was tested on two groups of well-studied and related metabolic pathways (the urea cycle and pyrimidine de-novo synthesis). The lessons learned from our approach will help to scale up the framework and support the diagnosis of other less-understood IMDs.
Our framework integrates literature and expert knowledge into machine-readable pathway models, including relevant urine biomarkers and their interactions. The clinical data of 16 previously diagnosed patients with various pyrimidine and urea cycle disorders were visualized on the top 3 relevant pathways. Two expert laboratory scientists evaluated the resulting visualizations to derive a diagnosis.
The proof-of-concept platform resulted in varying numbers of relevant biomarkers (five to 48), pathways, and pathway interactions for each patient. The two experts reached the same conclusions for all samples with our proposed framework as with the current metabolic diagnostic pipeline. For nine patient samples, the diagnosis was made without knowledge about clinical symptoms or sex. For the remaining seven cases, four interpretations pointed in the direction of a subset of disorders, while three cases were found to be undiagnosable with the available data. Diagnosing these patients would require additional testing besides biochemical analysis.
The presented framework shows how metabolic interaction knowledge can be integrated with clinical data in one visualization, which can be relevant for future analysis of difficult patient cases and untargeted metabolomics data. Several challenges were identified during the development of this framework, which should be resolved before this approach can be scaled up and implemented to support the diagnosis of other (less understood) IMDs. The framework could be extended with other OMICS data (e.g. genomics, transcriptomics), and phenotypic data, as well as linked to other knowledge captured as Linked Open Data.
Many enzymes are critically involved in the synthesis, degradation, and transport of molecules in metabolic processes. Malfunctioning of any of these enzymes often results in a lack of, or (potentially) toxic levels of, metabolites, and can affect other (downstream) pathways. Figure 1 presents a schematic of the disturbed biochemical reactions based on one impaired protein, leading to an altered phenotype. These disorders are classified as Inherited Metabolic Disorders (IMDs) or Inborn Errors of Metabolism. A timely and accurate diagnosis of IMDs, currently based on both symptoms and biomarkers measured in various bodily fluids, is required to initiate therapies, which are sparsely available. The current diagnostic process starts with a metabolic pediatrician, who, based on the phenotype of a patient, can request biochemical analyses on a patient sample (e.g. blood, urine). After the sample has been collected and processed, several types of analysis can be performed (e.g. targeted metabolite assays, Whole Exome Sequencing (WES)), which all require data processing and interpretation. The processed data is often linked to existing database knowledge to arrive at a diagnosis. Methods to detect genetic variants (WES) are useful for the diagnosis of specific classes of IMDs where few or no specific metabolic biomarkers exist (e.g. mitochondrial disorders). This technique has been found to be less sensitive and specific than metabolic measurements in newborn screening. Furthermore, genetic profiles of patients can also contain variants of uncertain significance (Fig. 1); these variants can only be classified as (likely) pathogenic when genomic, transcriptomic, proteomic, metabolomic, and/or fluxomic data are integrated through pathway or network analysis [6, 7, 8]. Targeted metabolite assays, on the other hand, are a valuable tool to pinpoint which metabolic processes are disturbed if the biomarkers for a disorder are known. These altered metabolites are used in newborn screening through dried blood spot analysis.
Unfortunately, diagnosing IMDs using metabolites can be challenging due to the commonly observed overlap between biomarkers, since the individual compounds are often involved in more than one metabolic pathway and can therefore be metabolized to various products. Furthermore, the diagnostic process can be quite time-consuming, requiring a manual inspection by an expert in the field, who needs to be familiar with all relevant metabolic conversions and their respective enzymes to point out the malfunctioning protein. Last, current clinical diagnoses lack a visualization of the connections between individual metabolic biomarkers and the enzymes involved in their synthesis and degradation.
Therefore, this study provides a proof-of-concept framework for the integration of metabolic interactions knowledge with clinical patient data and identifies current challenges for scaling up this approach. We hypothesize that the combination of this knowledge and patient data in one visualization can aid in the diagnosis of IMDs, by providing an overview of the processes relevant to the patient-specific deficient protein. With this approach, the attention progresses from individual markers to changes at the process level, which enables linking biological pathway knowledge to clinical cases. This direct link shows which metabolic reactions are disturbed, which proteins are related to these reactions, and potentially which specific protein is impaired, aiding diagnosis. Furthermore, metabolic disturbances can be recognized which cannot be attributed directly to the disorder, revealing potential blind spots in existing clinical knowledge.
Our framework was tested on two groups of IMDs with a well-understood molecular mechanism (pyrimidine metabolism and the urea cycle) known for biomarker overlap for several IMDs due to their common metabolite carbamoyl phosphate. Furthermore, pyrimidine disorders often present with nonspecific clinical symptoms and a lack of a clear genotype–phenotype correlation [10, 11, 12], while urea cycle disorders are often more specific (e.g. hyperammonemia, lethargy, vomiting, coma).
The presented framework highlights chances for the IMD field as a whole regarding data integration and reuse, by showcasing that improving data and identifier (ID) harmonization increases the integration of clinical data with pathway knowledge and biomarker information. Furthermore, the framework could aid in the diagnostic process of other (novel) IMDs and is adaptable to analyze different types of IMDs and functional assays in the future, as well as integrating other types of (omics) data analysis, e.g. transcriptomics, metabolomics, and fluxomics. By using visualization techniques from common network approaches, the framework could also be extended with information on drug targets or genetic variants, which could allow for personalized medicine. Last, since this study combines several research fields and demonstrates an interdisciplinary approach, this paper will address each field individually with the hope of closing the gap between data collection and interpretation, data curation and modeling, and data processing and interoperability.
Figure 2 shows the proposed workflow to connect clinical data to pathway models and theoretical biomarker data. Knowledge from various databases had to be integrated into the framework, which is summarized in Table 1. All data processing steps were captured in an RMarkdown script in the R programming language (version 4.1.3), tested through RStudio (version 2022.02.2), available at https://github.com/BiGCAT-UM/IMD-PUPY.
Clinical biomarker data
Biomarker data from 20 patients previously diagnosed with a pyrimidine or urea cycle IMD was collected through two targeted chemical assays in urine [19, 29]; metabolite concentrations were reported in μmol/mmol creatinine and patient age in months. Four patients were removed from this study due to missing data for the AA panel (patients labeled B, C, P, and Q). The same assays were used to collect reference data from other patients suspected of having an IMD, but with no apparent IMD as assessed by selective metabolic screening. Reference data for purines and pyrimidines (PUPY panel) included 4853 samples selected over ten years; reference data for amino acids (AA panel) included 1872 samples over five years. The reference data was categorized into five age categories; data from the overarching category 0 to 16 years was used if no reference value was available for a specific age category. Of the 88 chemical biomarkers present in the patient data, four were disregarded from further data analysis due to missing reference data: n-carbamyl-aspartate (CHEBI:32814), allantoin (CHEBI:15676), cytosine (CHEBI:16040), and cytidine (CHEBI:17562). The patient and reference data was annotated with corresponding ChEBI identifiers (IDs), or Wikidata IDs when no ChEBI ID was available. Patient data was anonymized and five biomarkers were disregarded: allopurinol (used as treatment) and its metabolite oxypurinol; argininosuccinic acid anhydride (ASA-anhydride) (obsolete after switching the separation method from anion exchange chromatography to UHPLC-MS/MS for AA analysis); and CysHCys and 2,8-dihydroxyadenine (metabolites without a ChEBI ID). Table 2 details the sample size, diseases, and corresponding age ranges used in this study. No patients or their caregivers have objected to the anonymous use of their leftover material from routine diagnostics for laboratory development and validation purposes.
Since the clinical patient data also included metabolites from the purine pathway, IMDs in this group were added to the analysis to serve as control data points. Machine-readable versions of the purine, pyrimidine, and urea cycle metabolic pathways were created using the pathway editor and curation tool PathVisio (version 3.3.0), as well as pathway models (PWMs) on biomarkers, visualizing several markers missing from the main pathway models. All proteins were annotated with UniProt IDs, and the metabolic conversions with directed Rhea IDs. Corresponding ChEBI IDs from Rhea were used to annotate the substrate and product metabolites. IMDs were annotated with OMIM disease IDs. Data on the created PWMs was deposited in WikiPathways and retrieved in RDF (Resource Description Framework) format through the WikiPathways SPARQL endpoint (data from September 2021).
Selection of relevant biomarkers (in PWMs)
Each biomarker was compared to its lower and upper reference values: a value below the lower limit indicated a decrease (negative change) and a value above the upper limit indicated an increase (positive change). Biomarker values in between or exactly equal to the reference values were designated as unchanged. Missing biomarker data (null values) were disregarded, as were patient or reference concentrations equal to zero. All resulting calculated values were log2-transformed to show proportional changes, yielding a log2 fold change (log2FC). The changed biomarkers were compared against existing PWMs through the WikiPathways SPARQL endpoint (data from September 2021) to find missing entries.
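The comparison rule described above can be sketched in a few lines. This is a minimal illustration in Python, not the authors' R script; the function name `classify_biomarker` is hypothetical, and taking the fold change against the reference limit that was exceeded is an assumption, since the text does not state which reference value enters the ratio.

```python
import math
from typing import Optional


def classify_biomarker(value: Optional[float],
                       lower: Optional[float],
                       upper: Optional[float]) -> Optional[float]:
    """Return a log2 fold change, 0.0 if unchanged, or None if disregarded.

    Assumption: the fold change is computed against the exceeded limit
    (upper limit for increases, lower limit for decreases).
    """
    # Missing values are disregarded.
    if value is None or lower is None or upper is None:
        return None
    # Patient or reference concentrations equal to zero are disregarded.
    if value == 0 or lower == 0 or upper == 0:
        return None
    if value > upper:
        return math.log2(value / upper)  # increase, positive log2FC
    if value < lower:
        return math.log2(value / lower)  # decrease, negative log2FC
    return 0.0  # between or exactly equal to the reference values
```

For example, a concentration of 8 against a reference range of 1 to 2 would give a log2FC of 2 under this assumption, while a value on the upper limit itself counts as unchanged.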
Theoretical biomarker data
Since the chemical assay for pyrimidine metabolites also measures purine compounds, we collected theoretical biomarker data for both pathways, in addition to the urea cycle. Potentially relevant biomarkers for these disorders were retrieved manually from IEMbase V 2.0.0 (accessed on 2021-08-05) through their HGNC gene name as HMDB IDs, including the sample matrix, and positive or negative concentration change. The latter was converted to a numeric scale for each of the five provided age categories. The biomarkers in IEMbase were represented through arrows (and some other characters) to show relative increases or decreases rather than numeric values. These visualizations were converted to a numeric scale (from −3 to +3) according to these rules:
↑↑↑, ↑↑, ↑: +3, +2, +1
+1.5, +2.5
n to 1, n to 2: +0.5, +1.5
↓↓↓, ↓↓, ↓: −3, −2, −1
n, +−
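A lookup table is one straightforward way to apply such a conversion. The sketch below is in Python rather than the authors' R, and the names `ARROW_SCALE` and `arrows_to_score` are hypothetical. It encodes only the arrow symbols whose numeric values are legible in the rules above; the intermediate values (such as +1.5 and +2.5) are left out because their symbols are not given here.

```python
from typing import Optional

# Only symbol/value pairs that are legible in the rules are included.
ARROW_SCALE = {
    "↑↑↑": 3.0, "↑↑": 2.0, "↑": 1.0,
    "↓↓↓": -3.0, "↓↓": -2.0, "↓": -1.0,
}


def arrows_to_score(symbol: str) -> Optional[float]:
    """Map an arrow annotation to the numeric scale; None if not covered."""
    return ARROW_SCALE.get(symbol.strip())
```

Returning `None` for uncovered symbols keeps unmapped annotations visible to the caller instead of silently treating them as unchanged.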
Correlations between individual metabolic biomarkers and diseases were visualized in a heatmap (Euclidean distance) with the gplots package (version 3.1.1, https://cran.r-project.org/package=gplots); positively changed biomarkers were colored red (using three shades to show mild, high, and very high changes) and negatively changed markers blue (again in three shades); markers which were not altered for a disease were colored white. Disorders without any biomarker data for a specific age category were removed from the visualizations.
Relevant biomarker overlap
All biomarkers were manually linked from ChEBI IDs (patient and pathway model data) to their corresponding HMDB IDs (theoretical biomarker data). The patient biomarker data was converted to the same scale as the theoretical biomarkers (values for log2FC above 3 or below −3 were set at 3 and −3, respectively). The patient data was visualized together with the theoretical biomarker visualization, removing small changes (log2FC between −0.05 and 0.05).
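The rescaling step amounts to clamping plus a small-change filter. The following Python sketch illustrates it; the function name `to_iembase_scale` is hypothetical, and treating the ±0.05 boundaries as inclusive is an assumption.

```python
from typing import Optional


def to_iembase_scale(log2fc: float,
                     limit: float = 3.0,
                     small: float = 0.05) -> Optional[float]:
    """Clamp a log2FC to [-limit, limit]; drop small changes (None)."""
    # Assumption: the boundaries -0.05 and 0.05 are themselves removed.
    if -small <= log2fc <= small:
        return None
    return max(-limit, min(limit, log2fc))
```

With these defaults a log2FC of 4.2 is set to 3, a value of −7 to −3, and a value of 0.01 is removed from the visualization.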
Relevant pathways were found through a query against the WikiPathways SPARQL endpoint matching the changed biomarkers. The pathways were sorted based on the highest number of matching biomarkers. A maximum of three pathways were selected, based on including the most unique biomarkers.
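One way to read "including the most unique biomarkers" is as a greedy coverage problem: pick the pathway covering the most changed biomarkers, then repeatedly pick the pathway that adds the most biomarkers not yet covered. The Python sketch below follows that reading, which is an assumption and not a description of the authors' SPARQL-based implementation; the function name and the pathway and marker labels in the example are hypothetical.

```python
from typing import Dict, List, Set


def select_pathways(pathway_markers: Dict[str, Set[str]],
                    changed: Set[str],
                    k: int = 3) -> List[str]:
    """Greedily pick up to k pathways that add the most uncovered markers."""
    remaining = set(changed)
    chosen: List[str] = []
    for _ in range(k):
        # Sorted iteration makes ties resolve to the first name alphabetically.
        candidates = [p for p in sorted(pathway_markers) if p not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda p: len(pathway_markers[p] & remaining))
        if not pathway_markers[best] & remaining:
            break  # no pathway adds a new biomarker
        chosen.append(best)
        remaining -= pathway_markers[best]
    return chosen


example = {
    "WP_A": {"m1", "m2", "m3"},
    "WP_B": {"m2", "m3"},
    "WP_C": {"m4"},
}
selected = select_pathways(example, {"m1", "m2", "m3", "m4", "m5"})
```

In this toy example `WP_B` is skipped because everything it contains is already shown on `WP_A`, and `m5` remains unvisualized because no pathway contains it.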
The data for each patient was visualized with the network analysis tool Cytoscape (version 3.9.1), by using the Cytoscape REST API (version v1) and WikiPathways App for Cytoscape (version 3.3.10) through R. The absolute highest value for the log2FC was used to determine the color scale, using a five-point scale to accommodate small changes (values between −1.5 and 1.5) and high (abnormal) biomarker values. If no value was available for a node within the network, the fill color was set to gray.
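A five-point scale of this kind can be derived from the data as sketched below. This is an illustration in Python under stated assumptions: the breakpoints are placed at the negative extreme, −1.5, 0, 1.5, and the positive extreme, and the color names for the four non-gray bands are placeholders, since the text specifies only the gray fill for missing values. The function names are hypothetical, and the sketch does not call Cytoscape.

```python
from typing import List, Optional, Sequence

SMALL_CHANGE = 1.5  # boundary of the small-change band named in the text


def five_point_scale(log2fcs: Sequence[Optional[float]]) -> List[float]:
    """Breakpoints: -extreme, -1.5, 0, 1.5, +extreme (symmetric around 0)."""
    observed = [abs(v) for v in log2fcs if v is not None]
    extreme = max(observed + [SMALL_CHANGE])
    return [-extreme, -SMALL_CHANGE, 0.0, SMALL_CHANGE, extreme]


def fill_colour(value: Optional[float]) -> str:
    """Assign a node to a color band; band names are placeholders."""
    if value is None:
        return "gray"  # no measurement available for this node
    if value <= -SMALL_CHANGE:
        return "dark blue"
    if value < 0:
        return "light blue"
    if value == 0:
        return "white"
    if value < SMALL_CHANGE:
        return "light red"
    return "dark red"
```

Using the absolute highest log2FC for both ends keeps the scale symmetric, so equally large increases and decreases receive equally intense colors.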
For each patient, the network data visualization (framework step 7) and relevant biomarker overlap heatmap (step 5) were provided to two Laboratory Specialists in Biochemical Genetics, after which narrative feedback on a potential diagnosis was collected.
A framework was designed to visualize clinical biomarker data for IMDs through their metabolic interactions. In order to explain the findings of this interdisciplinary study, this section is divided into three paragraphs, so that experts from different research fields can directly find the information most relevant to them, while also being able to switch outside of their expertise.
Clinical geneticists, metabolic pediatricians, biologists, and chemists
This group of experts is mainly responsible for the data collection and interpretation (e.g. Metabolic Pediatricians, Laboratory Specialist), and is involved at the direct start of the diagnostic pipeline and the final diagnostic step. Our framework was tested on data from 16 patients with a variety of pyrimidine and urea cycle IMDs and is summarized in Table 2. In total 88 clinical markers were measured in urine samples, 34 through the PUPY panel (purines and pyrimidines) and 54 by the AA panel (amino acids). Theoretical biomarkers for the investigated phenotypes were obtained from an online database (IEMbase), finding 27 unique metabolic biomarkers relevant to urine samples. Table 3 shows the number of (significantly altered) biomarkers linked to reference data for each patient, as well as the number of biomarkers found in a metabolic pathway. Two laboratory specialists in biochemical genetics used the data visualizations from our framework to arrive at an IMD diagnosis, by combining the heatmap showing theoretical biomarkers and related enzymes with the network biomarker data visualization.
Figure 3 shows the theoretical biomarkers for their respective IMD class (purine, pyrimidine, urea cycle) and data for one patient (age category 0–1 year, diagnosed originally with DPYD, in purple, labeled patient I) as a heatmap. Comparing theoretically changed biomarkers to patient data is the first step in selecting potentially relevant phenotypes and affected proteins, and can be used to imply which biochemical reactions or pathways are disturbed. Rows indicate individual phenotypes (right axis) and are clustered (left axis) based on their overlapping biomarker profiles (bottom axis). The top left of Fig. 3 shows that for example the first two rows representing SLC25A15 and OTC (both urea cycle disorders) are clustered together, due to their overlapping biomarkers orotic acid (HMDB0000226) and homocitrulline (HMDB0000679). However, for SLC25A15 an excessive amount of homocitrulline is produced and a small increase in orotic acid can be noted, while for OTC both metabolites are increased in a similar amount. Disorders clustered together can be difficult to diagnose, due to marginal changes in or low numbers of known biomarkers, and overlap between the markers. The sample obtained from patient I showed four additionally changed biomarkers compared to the theoretical values; however, no direct relation to the theoretical biomarker profile of DPYD was observed.
Regarding all patients, the biomarker profile of only four patients clustered with a potential gene of interest (Table 3 last column), with three patients (labeled A, E, K) closely resembling the theoretical biomarkers for their corresponding disorder DPYD. Interpreting the remaining patient data with knowledge of the biochemical interactions between the biomarkers is needed to arrive at a diagnosis.
Figure 4 shows the data visualization for patient I on the pathways selected for this patient: ‘Biomarkers for pyrimidine metabolism disorders’ (left) and ‘Purine metabolism’ (right) pathways. As expected for DPYD, the pyrimidine pathway which includes the DPD protein shows the most relevant metabolic changes for this patient. Two metabolites (thymine, uracil) directly converted by DPD show elevated levels; one direct downstream metabolite of thymine (5-OH-methyluracil) also shows a high concentration. Two downstream metabolites of DPD (dihydrouracil, beta-alanine) are found to be within the healthy reference values, whereas (S)-beta-aminoisobutyrate also downstream of DPD shows a decreased concentration. The second selected pathway on purine metabolism shows elevated levels of SAICA-riboside, which was not expected for this disorder and might suggest physiological immaturity.
In total, nine of the 16 patient samples were diagnosed with the correct IMD, whereas for four patients the visualization suggested further informative assays (see Table 6). These latter patients included one case of Dihydropyrimidine dehydrogenase deficiency (DPYD, patient E), and three cases of Ornithine Transcarbamylase deficiency (OTC, patients G, R, and S). Samples from patients under treatment, e.g. patient H (diagnosed with hyperornithinemia-hyperammonemia-homocitrullinuria (HHH) syndrome, also known as ornithine translocase (SLC25A15) deficiency) receiving citrulline, were difficult to diagnose since the framework cannot distinguish between abnormal biomarker values due to treatment and those caused by the IMD. Patient J (diagnosed with Beta-ureidopropionase deficiency, UPB1D) was not correctly diagnosed by either expert, which we attribute to the very mild disturbances in the biomarker patterns. Last, patient O (diagnosed with Ornithine Transcarbamylase deficiency, OTC) showed an unexpectedly higher value for arginine rather than ornithine.
Data(base) curators and modelers
This section describes the data curation and modeling aspects of this study, which are often an invisible layer in the diagnostic process but have an important influence on the results of a diagnosis. Out of the 88 biomarkers measured through the targeted metabolic assays, two could not be annotated with one unified database ID from ChEBI. Six new pathway models were created for this project, to provide interoperability between the metabolic interactions relevant to the studied IMDs and the clinical biomarker data. Table 3 details how many relevant clinical biomarkers were missing from any PWM from the WikiPathways database (framework step 3), the pathway’s coverage of biomarkers relevant for each patient (pathway), and the highest total of markers covered by one pathway. Table 4 describes the content of each pathway, regarding proteins, metabolites, interactions, and described disorders. Seven markers were not part of any pathway model (previously existing or newly created); their ChEBI IDs (28315, 40279, 17755, 43433, 89698, 49015, and 61511) can be used for future curation. The available theoretical reference data for urine samples with a database identifier (HMDB) left 23 unique biomarkers, linked to 25 individual IMD phenotypes. Nine disorders were missing theoretical urinary biomarker data, and for one disorder the molecular mechanism is still unclear, therefore missing a specific protein connection.
Programmers, data scientists, and bioinformaticians
The last group addressed in this section provides the glue that holds the analysis section of the diagnostic pipeline together and is responsible for data processing and interoperability. Theoretical biomarker data was collected for all IMDs in the pyrimidine and urea cycle pathway; biomarkers for the purine pathway were included to represent true negative values. In total, 17 purine, 10 pyrimidine, and 8 urea cycle IMDs were included (35 disorders in total) in the PWMs. The annotated data from framework step 1 (Fig. 2) was used to find relevant pathways for visualization, which led to 171 pathways in total, including one or more distinct biomarkers. Fifteen of these pathways contain 10 or more markers, displayed in Table 5, which could potentially be ideal candidates for data visualization.
By only querying the pathway data for relevant biomarkers (instead of all markers in a panel), a customized visualization was created for each patient. The selection of the top three pathways containing the highest number of unique markers was performed using SPARQL-queries. One pathway for three patients was selected manually, aiming to include relevant metabolic interaction containing biomarkers with the largest change. Table 6 shows which pathways ended up in the top three for each patient, as well as which biomarkers were not part of the visualization after selecting the top three.
Our proposed framework is based on the combination of clinical data, online available biomarker information, and metabolic reaction models. This framework creates the possibility to visualize clinical biochemical data on the pathway level, allowing for a more detailed interpretation of the connections between the different markers. The developed framework was designed for individual patient analysis and optimized for pyrimidine and urea cycle disorders with biomarkers measured through targeted assays. We believe this framework can be extended to other IMDs and additional biological matrices. Also, several challenges should be taken into account before scaling up the framework. This section is again divided into three paragraphs, to describe the challenges for data collection and interpretation, data curation and modeling, and data processing and interoperability.
Clinical geneticists, metabolic pediatricians, biologists, and chemists
Our developed framework enables the visualization of clinical biomarker profiles with biological pathway knowledge, by connecting individual markers to changes on the process level. This approach shows which metabolic reactions are disturbed, which proteins are related to these reactions, and potentially which specific protein is impaired, aiding diagnosis. Existing data interpretation approaches often require a manual inspection of pathways and interactions, which do not include clinical data. Furthermore, metabolic disturbances can be recognized which cannot be attributed directly to the disorder, revealing potential blind spots in existing clinical knowledge. However, the data integration needed for this approach requires database identifiers; therefore we advise the (rare) disease community to include these identifiers (from a publicly available database) for each compound measured through a metabolic assay. The developed framework is extendable with in-house biomarker data, knowledge from other databases or literature, and additional data from blood samples or other relevant matrices. Even though recent advances in clinical urinary biomarker measurements have aided in the diagnosis of some IMDs, most markers are currently not used for newborn screening due to the limited detectability of these biomarkers in dried blood. The inclusion of an additional matrix could provide a broader overview of the metabolic disturbances in a patient and lead to a more comprehensive isolation of the involved metabolic interactions. The framework also leaves room for manual selection of potentially relevant pathways by experts, which could be aided by reviewing the patient-specific heatmap which visualizes theoretical biomarkers. Our framework could be enhanced by selecting the top three pathways covering the most unique biomarkers while prioritizing the markers with the highest log2FC.
Currently, diagnostic laboratories for inborn errors of metabolism often report exact metabolite concentrations in diagnostic patient reports; Z-scores are (rarely) used as a measure to compare patient values to a control population. The pathway and network models used in our framework normally report (log)-fold changes (log2FC) to compare two groups (diseased versus control) or z-scores as a measure for over-representation of pathway entities such as metabolites. We believe that laboratory specialists would benefit from learning to interpret this type of data and visualization as a new diagnostic tool. In this study, the network model helped to easily diagnose 9 out of 16 patient samples and pointed in the correct direction or suggested follow-up analysis for 4 patients. These numbers are similar to the original diagnostic outcome for the metabolite pipeline; the remaining 7 out of 16 patients could previously only be diagnosed with additional tests (e.g. protein loading test, WES, clinical information). Patient H (previously diagnosed with ornithine translocase (SLC25A15) deficiency with the main biomarker homocitrulline, HMDB0000679) was found difficult to recognize, both through the original metabolic pipeline and our framework. This issue was most likely due to the administered treatment with citrulline, highlighting the importance of not only clinical but also medication information for a proper diagnostic workflow. As in any other computational framework, only data from untreated persons should be used. For the validation of a recently implemented diagnostic tool called targeted urine metabolomics (TUM), similar samples were used as discussed in our study. When comparing the data interpretation from TUM and the developed framework, we can deduce that comparable interpretations were reached.
This agreement was also found for the Ornithine Transcarbamylase deficiency (OTC) cases, which remain difficult to diagnose in women since the disorder is X-linked, causing an atypical biomarker pattern. Patient O is an example of such a case, where we hypothesize that the cyclic metabolic conversion of arginine, argininosuccinate, and ornithine into one another could be the cause of this unexpected pattern. To understand these atypical cases of OTC and corresponding biomarker patterns, data from more patients is required and we recommend other laboratories to share their data on IMDs if possible. Sharing (more) rare disease patient data would also help to understand the effects of ethnicity, age range, or sex on the molecular mechanism of IMDs. For future studies, the interest should shift to measuring metabolite fluxes over a longer timespan to better understand how, for example, protein intake triggers decompensation.
Data(base) curators and modelers
The presented framework highlights chances for the IMD field as a whole regarding data integration and reuse, one of the cornerstones of data modeling. Our framework leverages data and identifier (ID) harmonization to increase the machine readability of existing IMD data. This harmonization was required for the integration of clinical data with pathway knowledge and biomarker information. All metabolites in the assays were annotated manually based on their (Dutch) name; using persistent IDs to annotate data is a key aspect to enable open science, and ultimately leads to FAIR data. This annotation could not be completed for all biomarkers in the targeted assays based on their name (e.g. CysHCys, a disulfide from cysteine and homocysteine); drawing the chemical structure and converting the structure to a SMILES could be used to annotate this compound. Creating pathway models (PWMs) annotated with resolvable IDs for the entities within the pathway was crucial for data analysis and visualization. Several initiatives merge pathway information [43, 44] based on gene and protein content, rather than metabolites and chemical reactions, which makes them unsuitable for IMD metabolic data analysis. Furthermore, this reaction information is scattered over publications in images and text as well as various databases, which requires dedicated curation time to arrive at a pathway model covering all relevant interactions. The lack of naming standardization in databases and papers for metabolic conversions causes issues in data curation, despite the IUPAC-nomenclature rules and available software to translate IUPAC names to chemical structures and vice versa. Connecting all disease IDs to their counterpart protein ID and data from the IEMbase could only be performed manually. To facilitate data integration and comparison on an automated basis, we advise providing programmatic access to biomedical databases, for example through an API or SPARQL endpoint.
We found that some syndromes (e.g. Lesch Nyhan and Kelley Seegmiller; pyrimidine 5’-nucleotidase superactivity and pyrimidine 5’-nucleotidase I deficiency) were treated as individual disorders by one database, while the other combines the information on both disorders in one entry, which hampers data interoperability. In order to distinguish between these individual disorders based on only theoretical biochemical markers, more discriminating values are needed.
Programmers, data scientists, and bioinformaticians
By using visualization techniques from common network approaches, the developed framework is ready for extension with other data relevant to the diagnostic pipeline, e.g. genetic variants or drug-target knowledge. Furthermore, other types of (omics) data can be integrated into the workflow, e.g. transcriptomics, metabolomics, and fluxomics. Due to the use of semantic web technologies (RDF), other knowledge captured as Linked Open Data can also be used to extend our approach. However, this data integration requires automatable access to pathway content and open licenses; a lack herein prohibits data extraction and acquiring all relevant interactions for the studied biomarkers. Seven out of the 83 markers could not be found in any of the consulted pathway data. Several pathways overlap in terms of content, which could conceal potentially relevant pathways. Large pathways in terms of node size and captured reactions contain more biomarkers, however at a larger distance, diminishing a clear biological cause and effect path visualization. The data interpretation could be hindered when a biomarker was present only as a substrate or product, which could miss relevant up- or downstream reactions. In order to overcome a mismatch of biomarkers and pathway data, clinical biomarker IDs could be converted to the InChIKey ID of their corresponding neutral molecular structure, or substructure matching could be performed. We want to encourage harmonizing the information in phenotype databases, for example by using existing ontologies such as the Human Disease Ontology, Human Phenotype Ontology, or Nosology for Inherited Metabolic Disorders. The log2(change) data of patients was converted to the −3 to +3 scale from IEMbase, where the contribution of highly altered biomarkers to the correlation might get obscured.
Furthermore, since not all clinical data could be visualized directly in one PWM (due to the biomarkers being spread over multiple models), other approaches will be needed to overcome the boundaries imposed by individual models. Reactome pathways were excluded from the analysis since the model conversion from the native Reactome pathway models to the WikiPathways RDF leads to unconnected biomarkers, hampering visualization. Other possibilities to automatically visualize pathway data in the network tool Cytoscape are the Reactome Cytoscape Plugin and Cytoscape for KEGG; the former is not optimized for metabolic data and the latter relies on proprietary data access. Two other pathway apps, CyPath2 and cy3sabiork, could not be automated.
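One conceivable way to look across the boundaries of individual models is to merge their interactions into a single network before searching for connections between biomarkers. The sketch below is not the approach used in this study. It uses two small, hand-written edge lists (illustrative only, not extracted from the pathway models discussed here) and hypothetical helper names, and it ignores reaction direction for simplicity.

```python
from collections import defaultdict, deque

# Toy edge lists (substrate, product); illustrative only.
urea_cycle = [
    ("carbamoyl phosphate", "citrulline"),
    ("ornithine", "citrulline"),
    ("citrulline", "argininosuccinate"),
    ("argininosuccinate", "arginine"),
    ("arginine", "ornithine"),
]
pyrimidine_synthesis = [
    ("carbamoyl phosphate", "carbamoyl aspartate"),
    ("carbamoyl aspartate", "dihydroorotate"),
    ("dihydroorotate", "orotate"),
]


def merge_models(*edge_lists):
    """Build one undirected adjacency map from several edge lists."""
    graph = defaultdict(set)
    for edges in edge_lists:
        for source, target in edges:
            graph[source].add(target)
            graph[target].add(source)
    return graph


def shortest_path(graph, start, goal):
    """Breadth-first search; returns a list of node names or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(graph[path[-1]]):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None


# Nodes with the same name are treated as the same metabolite.
merged = merge_models(urea_cycle, pyrimidine_synthesis)
path = shortest_path(merged, "citrulline", "orotate")
print(" -> ".join(path))
```

Within the toy urea cycle model alone, no path from citrulline to orotate exists; the connection only appears after merging, through the shared node carbamoyl phosphate. In practice, nodes would need to be matched on persistent identifiers rather than names.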
With this study, we show the potential of a Systems Biology approach combining semantic web technologies for data linking and network analysis for data visualization, to directly connect biological pathway knowledge to clinical cases and biomarker data. The presented framework is adaptable to different types of IMDs, difficult patient cases, and functional assays in the future, which opens up the possibility for usage in the diagnostic pipeline. Information on treatment and clinical conditions remains important for accurate diagnosis, as well as expert interpretation of all information combined into this framework. Furthermore, several steps in the framework are now highly dependent on the manual curation of data and databases requiring expert knowledge of the information therein. The issues highlighted in the discussion section should be overcome in the future to allow our developed framework to be easily used for other IMDs, by adding persistent identifiers to (clinical) biomarker data, allowing automatable data downloads from relevant databases, and creating computer-readable pathway models from pathway figures.
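The role of persistent identifiers in linking clinical biomarkers to pathway content can be sketched with InChIKeys. A standard InChIKey consists of a 14-letter block, a 10-letter block, and a final protonation letter, where `N` denotes the neutral form. The sketch below assumes that the clinical and pathway entries differ only by (de)protonation, so that setting the last letter to `N` yields a comparable key. The keys are placeholders, not real compounds, and all function and marker names are hypothetical.

```python
import re

# Standard InChIKey layout: 14 letters, 10 letters, 1 protonation letter.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def neutral_inchikey(key):
    """Return the key with its protonation flag set to 'N' (neutral).

    Assumption: charged and neutral forms differ only by (de)protonation,
    so only the final character of the key differs.
    """
    if not INCHIKEY_PATTERN.match(key):
        raise ValueError(f"not a valid InChIKey: {key!r}")
    return key[:-1] + "N"


def match_biomarkers(clinical, pathway):
    """Map clinical biomarker names to pathway node names via neutral keys."""
    index = {neutral_inchikey(key): name for name, key in pathway.items()}
    return {
        name: index.get(neutral_inchikey(key))
        for name, key in clinical.items()
    }


def placeholder_key(first, second, flag):
    """Build a syntactically valid but fictional InChIKey."""
    return f"{first * 14}-{second * 8}SA-{flag}"


# Placeholder keys, chosen only to show the mechanics.
clinical_markers = {
    "marker X": placeholder_key("A", "B", "N"),
    "marker Y": placeholder_key("C", "D", "N"),
}
pathway_nodes = {
    "X anion": placeholder_key("A", "B", "M"),  # deprotonated form of X
    "Z": placeholder_key("E", "F", "N"),
}

matches = match_biomarkers(clinical_markers, pathway_nodes)
print(matches)
```

Here `marker X` is linked to the pathway node `X anion` despite the different charge state, while `marker Y` remains unmatched and would need another strategy, such as substructure matching.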
Availability of data and materials
The datasets supporting the conclusions of this article are available in the IMD-PUPY repository, https://bigcat-um.github.io/IMD-PUPY.
Amino acids (panel)
Inherited Metabolic Disorders
Transformation of biomarker values to a log(2) scale
Purine and pyrimidine (panel)
Resource Description Framework
Alanine–glyoxylate aminotransferase 2
Adenosine monophosphate deaminase 1
Argininosuccinate synthase 1
5-Aminoimidazole-4-carboxamide ribonucleotide formyltransferase/IMP cyclohydrolase
Carbamoyl-phosphate synthase 1
Dihydroorotate dehydrogenase (quinone)
Dihydropyrimidine dehydrogenase dihydropyrimidine
Hypoxanthine phosphoribosyltransferase 1
Inosine monophosphate dehydrogenase 1
5'-Nucleotidase, cytosolic IIIA
Purine nucleoside phosphorylase
- PRPPs (class of enzymes): Phosphoribosyl pyrophosphate synthetase 1
Ribonucleotide reductase regulatory TP53 inducible subunit M2B
Solute carrier family 25 member 13
Solute carrier family 25 member 15 (ornithine translocase)
Thymidine kinase 2
Upstream binding protein 1
Uridine monophosphate synthetase
- XAN2 (gene XDH):
- XO (gene MOCOS): Molybdenum cofactor sulfurase
Fukao T, Nakamura K. Advances in inborn errors of metabolism. J Hum Genet. 2019;64:65. https://doi.org/10.1038/s10038-018-0535-7.
Lanpher B, Brunetti-Pierri N, Lee B. Inborn errors of metabolism: the flux from Mendelian to complex diseases. Nat Rev Genet. 2006;7:449–60. https://doi.org/10.1038/nrg1880.
Ferreira CR, Rahman S, Keller M, Zschocke J, ICIMD Advisory Group. An international classification of inherited metabolic disorders (ICIMD). J Inherit Metab Dis. 2021;44:164–77.
Burton BK. Inborn errors of metabolism in infancy: a guide to diagnosis. Pediatrics. 1998;102:E69. https://doi.org/10.1542/peds.102.6.e69.
Adhikari AN, Gallagher RC, Wang Y, Currier RJ, Amatuni G, Bassaganyas L, et al. The role of exome sequencing in newborn screening for inborn errors of metabolism. Nat Med. 2020;26:1392–7. https://doi.org/10.1038/s41591-020-0966-5.
Coene KLM, Kluijtmans LAJ, van der Heeft E, Engelke UFH, de Boer S, Hoegen B, et al. Next-generation metabolic screening: targeted and untargeted metabolomics for the diagnosis of inborn errors of metabolism in individual patients. J Inherit Metab Dis. 2018;41:337–53. https://doi.org/10.1007/s10545-017-0131-6.
Arnold GL. Inborn errors of metabolism in the 21st century: past to present. Ann Transl Med. 2018;6:467. https://doi.org/10.21037/atm.2018.11.36.
Stenton SL, Kremer LS, Kopajtich R, Ludwig C, Prokisch H. The diagnosis of inborn errors of metabolism by an integrative “multi-omics” approach: a perspective encompassing genomics, transcriptomics, and proteomics. J Inherit Metab Dis. 2020;43:25–35. https://doi.org/10.1002/jimd.12130.
Burrage LC, Thistlethwaite L, Stroup BM, Sun Q, Miller MJ, Nagamani SCS, et al. Untargeted metabolomic profiling reveals multiple pathway perturbations and new clinical biomarkers in urea cycle disorders. Genet Med. 2019;21:1977–86. https://doi.org/10.1038/s41436-019-0442-0.
Jurecka A. Inborn errors of purine and pyrimidine metabolism. J Inherit Metab Dis. 2009;32:247–63. https://doi.org/10.1007/s10545-009-1094-z.
Balasubramaniam S, Duley JA, Christodoulou J. Inborn errors of purine metabolism: clinical update and therapies. J Inherit Metab Dis. 2014;37:669–86. https://doi.org/10.1007/s10545-014-9731-6.
Balasubramaniam S, Duley JA, Christodoulou J. Inborn errors of pyrimidine metabolism: clinical update and therapy. J Inherit Metab Dis. 2014;37:687–98. https://doi.org/10.1007/s10545-014-9742-3.
Smith W, Kishnani PS, Lee B, Singh RH, Rhead WJ, Sniderman King L, et al. Urea cycle disorders: clinical presentation outside the newborn period. Crit Care Clin. 2005;21:S9-17. https://doi.org/10.1016/j.ccc.2005.05.007.
Allaire JJ, Xie Y, McPherson J, Luraschi J, Ushey K, Atkins A, et al. rmarkdown: Dynamic Documents for R. 2021. https://github.com/rstudio/rmarkdown.
R Core team. R: A Language and Environment for Statistical Computing. 2020. https://www.r-project.org/.
RStudio Team. RStudio: Integrated Development Environment for R. 2019 [cited 2021 Sep 11]. https://www.rstudio.com/.
Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V, et al. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res. 2016;44:D1214–9. https://doi.org/10.1093/nar/gkv1031.
Waagmeester A, Stupp G, Burgstaller-Muehlbacher S, Good BM, Griffith M, Griffith OL, et al. Wikidata as a knowledge graph for the life sciences. Elife. 2020;9:e52614. https://doi.org/10.7554/eLife.52614.
Waterval WAH, Scheijen JLJM, Ortmans-Ploemen MMJC, Habets-van der Poel CD, Bierau J. Quantitative UPLC-MS/MS analysis of underivatised amino acids in body fluids is a reliable tool for the diagnosis and follow-up of patients with inborn errors of metabolism. Clin Chim Acta. 2009;407:36–42. https://doi.org/10.1016/j.cca.2009.06.023.
Kutmon M, van Iersel MP, Bohler A, Kelder T, Nunes N, Pico AR, et al. PathVisio 3: an extendable pathway analysis toolbox. PLoS Comput Biol. 2015;11:e1004085. https://doi.org/10.1371/journal.pcbi.1004085.
UniProt Consortium. UniProt: the universal protein knowledgebase in 2021. Nucleic Acids Res. 2021;49:D480–9. https://doi.org/10.1093/nar/gkaa1100.
Lombardot T, Morgat A, Axelsen KB, Aimo L, Hyka-Nouspikel N, Niknejad A, et al. Updates in Rhea: SPARQLing biochemical reaction data. Nucleic Acids Res. 2019;47:D596-600. https://doi.org/10.1093/nar/gky876.
Amberger JS, Bocchini CA, Scott AF, Hamosh A. OMIM.org: leveraging knowledge across phenotype-gene relationships. Nucleic Acids Res. 2019;47:D1038–43. https://doi.org/10.1093/nar/gky1151.
Martens M, Ammar A, Riutta A, Waagmeester A, Slenter DN, Hanspers K, et al. WikiPathways: connecting communities. Nucleic Acids Res. 2021;49:D613–21. https://doi.org/10.1093/nar/gkaa1024.
Waagmeester A, Kutmon M, Riutta A, Miller R, Willighagen EL, Evelo CT, et al. Using the semantic web for rapid integration of wikipathways with other biological online data resources. PLoS Comput Biol. 2016;12:e1004989. https://doi.org/10.1371/journal.pcbi.1004989.
Tweedie S, Braschi B, Gray K, Jones TEM, Seal RL, Yates B, et al. Genenames.org: the HGNC and VGNC resources in 2021. Nucleic Acids Res. 2021;49:D939-46. https://doi.org/10.1093/nar/gkaa980.
Wishart DS, Feunang YD, Marcu A, Guo AC, Liang K, Vázquez-Fresno R, et al. HMDB 4.0: the human metabolome database for 2018. Nucleic Acids Res. 2018;46:D608-17. https://doi.org/10.1093/nar/gkx1089.
Otasek D, Morris JH, Bouças J, Pico AR, Demchak B. Cytoscape automation: empowering workflow-based network analysis. Genome Biol. 2019;20:185. https://doi.org/10.1186/s13059-019-1758-4.
Monostori P, Klinke G, Hauke J, Richter S, Bierau J, Garbade SF, et al. Extended diagnosis of purine and pyrimidine disorders from urine: LC MS/MS assay development and clinical validation. PLoS One. 2019;14:e0212458. https://doi.org/10.1371/journal.pone.0212458.
Galgonek J, Hurt T, Michlíková V, Onderka P, Schwarz J, Vondrášek J. Advanced SPARQL querying in small molecule databases. J Cheminform. 2016;8:31. https://doi.org/10.1186/s13321-016-0144-4.
Slenter D. WikiPathways Sept 2021 Release - RDF data. Zenodo; 2021. https://zenodo.org/record/5632921.
Lee JJY, Wasserman WW, Hoffmann GF, van Karnebeek CDM, Blau N. Knowledge base and mini-expert platform for the diagnosis of inborn errors of metabolism. Genet Med. 2018;20:151–8. https://doi.org/10.1038/gim.2017.108.
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003;13:2498–504. https://doi.org/10.1101/gr.1239303.
Kutmon M, Lotia S, Evelo CT, Pico AR. WikiPathways App for Cytoscape: Making biological pathways amenable to network analysis and visualization. F1000Res. 2014;3:152. https://doi.org/10.12688/f1000research.4254.2.
Kruszka P, Regier D. Inborn errors of metabolism: from preconception to adulthood. Am Fam Physician. 2019;99:25–32.
Steinbusch LKM, Wang P, Waterval HWAH, Stassen FAPM, Coene KLM, Engelke UFH, et al. Targeted urine metabolomics with a graphical reporting tool for rapid diagnosis of inborn errors of metabolism. J Inherit Metab Dis. 2021;44:1113–23. https://doi.org/10.1002/jimd.12385.
Deodato F, Boenzi S, Santorelli FM, Dionisi-Vici C. Methylmalonic and propionic aciduria. Am J Med Genet C Semin Med Genet. 2006;142C:104–12. https://doi.org/10.1002/ajmg.c.30090.
Dappert A, Farquhar A, Kotarski R, Hewlett K. Connecting the persistent identifier ecosystem: building the technical and human infrastructure for open research. Data Sci J. 2017. https://doi.org/10.5334/dsj-2017-028.
Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, et al. The FAIR guiding principles for scientific data management and stewardship. Sci Data. 2016;3:160018. https://doi.org/10.1038/sdata.2016.18.
Weininger D. SMILES, a chemical language and information system. Introduction to methodology and encoding rules. J Chem Inf Model. 1988;28:31–6. https://doi.org/10.1021/ci00057a005.
Kyle JE, Aimo L, Bridge AJ, Clair G, Fedorova M, Helms JB, et al. Interpreting the lipidome: bioinformatic approaches to embrace the complexity. Metabolomics. 2021;17:55. https://doi.org/10.1007/s11306-021-01802-6.
Hanspers K, Kutmon M, Coort SL, Digles D, Dupuis LJ, Ehrhart F, et al. Ten simple rules for creating reusable pathway models for computational analysis and visualization. PLoS Comput Biol. 2021;17:e1009226. https://doi.org/10.1371/journal.pcbi.1009226.
Domingo-Fernández D, Hoyt CT, Bobis-Álvarez C, Marín-Llaó J, Hofmann-Apitius M. ComPath: an ecosystem for exploring, analyzing, and curating mappings across pathway databases. NPJ Syst Biol Appl. 2019;5:3. https://doi.org/10.1038/s41540-018-0078-8.
Rodchenkov I, Babur O, Luna A, Aksoy BA, Wong JV, Fong D, et al. Pathway Commons 2019 Update: integration, analysis and exploration of pathway data. Nucleic Acids Res. 2020;48:D489–97. https://doi.org/10.1093/nar/gkz946.
Hanspers K, Riutta A, Summer-Kutmon M, Pico AR. Pathway information extracted from 25 years of pathway figures. Genome Biol. 2020;21:273. https://doi.org/10.1186/s13059-020-02181-2.
Stobbe MD, Houten SM, Jansen GA, van Kampen AHC, Moerland PD. Critical assessment of human metabolic pathway databases: a stepping stone for future integration. BMC Syst Biol. 2011;5:165. https://doi.org/10.1186/1752-0509-5-165.
Hellwich K-H, Hartshorn RM, Yerin A, Damhus T, Hutton AT. Brief Guide to the Nomenclature of Organic Chemistry (IUPAC Technical Report). IUPAC Standards Online. https://doi.org/10.1515/iupac.92.0027
Lowe DM, Corbett PT, Murray-Rust P, Glen RC. Chemical name to structure: OPSIN, an open source solution. J Chem Inf Model. 2011;51:739–53. https://doi.org/10.1021/ci100384d.
Rajan K, Zielesny A, Steinbeck C. STOUT: SMILES to IUPAC names using neural machine translation. J Cheminform. 2021;13:34. https://doi.org/10.1186/s13321-021-00512-4.
Tarkowska A, Carvalho-Silva D, Cook CE, Turner E, Finn RD, Yates AD. Eleven quick tips to build a usable REST API for life sciences. PLoS Comput Biol. 2018;14:e1006542. https://doi.org/10.1371/journal.pcbi.1006542.
Marshall MS, Boyce R, Deus HF, Zhao J, Willighagen EL, Samwald M, et al. Emerging practices for mapping and linking life sciences data using RDF—a case series. Web Semant. 2012;14:2–13.
Heller SR, McNaught A, Pletnev I, Stein S, Tchekhovskoi D. InChI, the IUPAC International Chemical Identifier. J Cheminform. 2015;7:23. https://doi.org/10.1186/s13321-015-0068-4.
Kratochvíl M, Vondrášek J, Galgonek J. Sachem: a chemical cartridge for high-performance substructure search. J Cheminform. 2018;10:27. https://doi.org/10.1186/s13321-018-0282-y.
Schriml LM, Mitraka E, Munro J, Tauber B, Schor M, Nickle L, et al. Human Disease Ontology 2018 update: classification, content and workflow expansion. Nucleic Acids Res. 2019;47:D955–62. https://doi.org/10.1093/nar/gky1032.
Köhler S, Gargano M, Matentzoglu N, Carmody LC, Lewis-Smith D, Vasilevsky NA, et al. The human phenotype ontology in 2021. Nucleic Acids Res. 2021;49:D1207–17. https://doi.org/10.1093/nar/gkaa1043.
Ferreira CR, van Karnebeek CDM, Vockley J, Blau N. A proposed nosology of inborn errors of metabolism. Genet Med. 2019;21:102–6. https://doi.org/10.1038/s41436-018-0022-8.
Bohler A, Wu G, Kutmon M, Pradhana LA, Coort SL, Hanspers K, et al. Reactome from a WikiPathways perspective. PLoS Comput Biol. 2016;12:e1004941. https://doi.org/10.1371/journal.pcbi.1004941.
Wu G, Feng X, Stein L. A human functional protein interaction network and its application to cancer data analysis. Genome Biol. 2010;11:R53. https://doi.org/10.1186/gb-2010-11-5-r53.
Nishida K, Ono K, Kanaya S, Takahashi K. KEGGscape: a Cytoscape app for pathway data integration. F1000Res. 2014;3:144. https://doi.org/10.12688/f1000research.4524.1.
König M. cy3sabiork: a cytoscape app for visualizing kinetic data from SABIO-RK. F1000Res. 2016. https://doi.org/10.1101/062091.
We are grateful to all participating patients and their families. Four patient samples were kindly provided by Dr. P. Fitzsimons, Dr. A. Monavari, and Dr. E. Crushell from the Children's University Hospital and the National Centre for Inherited Metabolic Disorders, Temple Street, Ireland. We also want to acknowledge the work done by the laboratory technicians Huub Waterval, Marjon Ortmans-Ploemen, Sandra Busch, Jean-Pierre Bollen, and Karin Habets-van der Poel from the laboratory of Clinical Genetics at Maastricht University Medical Center (MUMC+).
The research reported in this manuscript was supported by the European Union’s Horizon 2020 research and innovation program under the EJP RD COFUND-EJP N° 825575.
Ethics approval and consent to participate
The Medical Research Involving Human Subjects Act (WMO) did not apply to this study; according to the Medical-Ethical Review Committee (METC) azM/UM, official approval of the study by the committee was not required.
Consent for publication
The authors declare that they have no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Cite this article
Slenter, D.N., Hemel, I.M.G.M., Evelo, C.T. et al. Extending inherited metabolic disorder diagnostics with biomarker interaction visualizations. Orphanet J Rare Dis 18, 95 (2023). https://doi.org/10.1186/s13023-023-02683-9
- Clinical metabolic biomarkers
- Purine and pyrimidine metabolism
- Urea cycle
- Semantic web technologies
- Network data analysis
- Systems biology