The diagnosis of inherited metabolic diseases by microarray gene expression profiling
© Hernandez et al; licensee BioMed Central Ltd. 2010
- Received: 21 July 2010
- Accepted: 1 December 2010
- Published: 1 December 2010
Inherited metabolic diseases (IMDs) comprise a diverse group of generally progressive genetic metabolic disorders of variable clinical presentations and severity. We have undertaken a study using microarray gene expression profiling of cultured fibroblasts to investigate 68 patients with a broad range of suspected metabolic disorders, including defects of lysosomal, mitochondrial, peroxisomal, fatty acid, carbohydrate, amino acid, molybdenum cofactor, and purine and pyrimidine metabolism. We aimed to define gene expression signatures characteristic of defective metabolic pathways.
Total mRNA extracted from cultured fibroblast cell lines was hybridized to Affymetrix U133 Plus 2.0 arrays. Expression data was analyzed for the presence of a gene expression signature characteristic of an inherited metabolic disorder and for genes expressing significantly decreased levels of mRNA.
No characteristic signatures were found. However, in 16% of cases, disease-associated nonsense and frameshift mutations generating premature termination codons resulted in significantly decreased mRNA expression of the defective gene. The microarray assay detected these changes with high sensitivity and specificity.
In patients with a suspected familial metabolic disorder where initial screening tests have proven uninformative, microarray gene expression profiling may contribute significantly to the identification of the genetic defect, shortcutting the diagnostic cascade.
- False Positive Rate
- Maximum Sensitivity
- Defective Gene
- Microarray Gene Expression Profile
- Nonsense-Mediated mRNA Decay
At least 300 different IMDs have been described and new disorders are being identified [2, 3] due to increasing awareness and advances in identification techniques. The birth prevalence of IMDs in the West Midlands is estimated to be 1 in 784 live births, extrapolating to approximately 800 new cases per year in the UK as a whole. The majority of patients (72%) are diagnosed by the age of 15 years, with only one-third diagnosed by the age of one year. Any hope of effective treatment rests on precise and early diagnosis [4, 5]. The diagnosis of IMDs may be a long and tedious process. The first step relies on matching clinical presentation to a potentially defective metabolic pathway. These investigations may take several months to complete, and even after this time, it may not be possible to make a diagnosis. Indeed, our experience in the Purine Research Laboratory at Guy's and St Thomas' Hospitals shows that a definitive diagnosis is only made in about 1% of children investigated for a suspected purine or pyrimidine disorder, with one reason being the overlap in clinical presentation between unrelated metabolic disorders. In the majority of cases, referrals are made for purposes of disease exclusion, or as part of a differential diagnosis.
Table 1. Inherited metabolic disorders included in this study and number of patients.
Number of patients
N = 68
Lysosomal storage disorders
Niemann Pick A, B, C
Purine and Pyrimidine disorders
Lesch-Nyhan disease/HPRT deficiency
Purine nucleoside phosphorylase (PNP) deficiency
Adenylosuccinate lyase (ADSL) deficiency
Adenosine deaminase (ADA) deficiency
Dihydropyrimidine dehydrogenase (DPD) deficiency
Rhizomelic chondrodysplasia punctata
Urea cycle defect
Fatty acid oxidation disorders
Carnitine transport defect
Short-chain acyl-CoA dehydrogenase (SCAD) deficiency
Medium-chain acyl-CoA dehydrogenase (MCAD) deficiency
Very long-chain acyl-CoA dehydrogenase (VLCAD) deficiency
Deoxy-guanosine kinase (DGUOK) deficiency
Surfeit-1 (SURF1) deficiency
Polymerase DNA-directed gamma (POLG) deficiency
Glycerol kinase (GK) deficiency
Molybdenum cofactor deficiency
Isolated sulphite oxidase deficiency
Patient samples and tissue culture
Human skin fibroblast cell lines from 68 patients with suspected or confirmed metabolic disorders (Table 1) were recovered from the cell bank held by the Enzyme Laboratory, Medical and Molecular Genetics, Guy's Hospital. Cells were cultured with Ham's F10 medium supplemented with 10% foetal bovine serum, 2% L-glutamine (200 mM), 2% penicillin (5.0 IU/ml) and streptomycin (5.0 μg/ml) at 37°C in a closed system. Passage numbers were recorded where known.
Cell lines were screened for Mycoplasma infection using Venor®GeM mycoplasma detection kit for conventional PCR (Minerva Biolabs GmbH, Germany).
RNA extraction and microarrays
Cells were grown in triplicate to sub-confluence. Total RNA from triplicate flasks was extracted using the RNeasy® Mini kit™ (QIAGEN, Crawley, UK). The pooled RNA was then concentrated using the RNeasy® MinElute™ Cleanup kit (QIAGEN) and quantified by spectrophotometric analysis measuring absorbance at 260 and 280 nm. Double-stranded cDNA was synthesised from 5 μg RNA using the Affymetrix One-cycle cDNA synthesis kit following the manufacturer's instructions (Affymetrix, High Wycombe, UK). Synthesis of biotin-labelled cRNA was performed using the Affymetrix GeneChip IVT Labelling kit, following the manufacturer's instructions. Labelled cRNA was then purified (sample cleanup module) and fragmented, and 15 μg was hybridized to Affymetrix GeneChip® Human Genome U133 Plus 2.0 arrays overnight.
Analysis of microarray data
Probe-level summarization of all arrays was performed twice using two different methods: robust multiarray averaging (RMA) and factor analysis for robust microarray summarization (FARMS). In addition, Informative/Non-Informative (I/NI) P-values were computed. Control probe sets and probe sets with a relatively large number of non-aligning or non-uniquely aligning probes were excluded. To be included, a probe set had to have 7 or more probes (out of a total of 11 for most probe sets) that perfectly matched the human transcriptome, and the median number of perfect matches per probe had to be less than 1.5. In the case of the RMA-summarized data, a probe set also had to exceed a median expression level of 100 (linear scale) across all arrays, resulting in 11,753 probe sets entering into the subsequent analyses. In the FARMS case, only informative probe sets were considered (I/NI P-value of less than 0.6), leaving a total of 9,787 probe sets for analysis. We refer to the measurements taken by the included probe sets for a patient sample as the sample's expression profile. Principal component analysis (PCA) was applied to identify and quantify independent sources of the variance observed in the data. MATLAB R2007a was used for correlation, hierarchical clustering and PCA.
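The probe set inclusion criteria for the RMA-summarized data can be sketched as follows. This is a minimal illustration and not the authors' MATLAB code: the function name `select_probe_sets`, its parameters and the input layout (a list of per-probe perfect-match counts plus a probe sets × arrays expression matrix) are assumptions made for the example, as is the reading that a probe "perfectly matches" when it has at least one perfect alignment.

```python
import numpy as np

def select_probe_sets(perfect_match_counts, expression,
                      min_matching_probes=7,
                      max_median_matches=1.5,
                      min_median_expression=100.0):
    """Return a boolean mask of probe sets that pass all three filters.

    perfect_match_counts: one sequence per probe set; each entry is the
        number of perfect transcriptome matches for one probe (hypothetical
        input format).
    expression: 2-D array (probe sets x arrays), linear scale.
    """
    expression = np.asarray(expression, dtype=float)
    keep = np.zeros(len(perfect_match_counts), dtype=bool)
    for i, counts in enumerate(perfect_match_counts):
        counts = np.asarray(counts)
        # At least 7 probes align perfectly somewhere in the transcriptome
        enough_matching = np.sum(counts >= 1) >= min_matching_probes
        # Median number of perfect matches per probe below 1.5
        mostly_unique = np.median(counts) < max_median_matches
        # Median expression across all arrays above 100 (linear scale)
        expressed = np.median(expression[i]) > min_median_expression
        keep[i] = enough_matching and mostly_unique and expressed
    return keep
```

For the FARMS-summarized data the expression filter would be replaced by the I/NI call (P-value below 0.6), which this sketch does not attempt to reproduce.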
We used two metrics to determine the degree to which a gene expression measurement x constitutes an outlier: Dixon's Q statistic, defined as (second-smallest value − x)/range, and a variant of Grubbs' outlier test statistic, termed MAD-Grubb and defined as (median − x)/MAD, where MAD is the median absolute deviation. MAD-Grubb was preferred to the standard Grubbs' statistic because it is outlier-resistant. This is beneficial for the detection of outliers at the extreme low end of the distribution, since irrelevant extreme values at the high end of the distribution have little or no influence on the median or the MAD.
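The two outlier metrics can be written down directly from their definitions. The sketch below is illustrative only: the function names are hypothetical, the MAD is used unscaled (the text does not state whether a consistency constant was applied), and returning 0 when the range or MAD is zero is a convention chosen here.

```python
import numpy as np

def dixon_q_low(values, x):
    """Dixon's Q for a low-end outlier: (second-smallest value - x) / range."""
    values = np.sort(np.asarray(values, dtype=float))
    value_range = values[-1] - values[0]
    if value_range == 0:
        return 0.0  # all measurements identical: no outlier (assumed convention)
    return float((values[1] - x) / value_range)

def mad_grubb_low(values, x):
    """MAD-Grubb for a low-end outlier: (median - x) / MAD."""
    values = np.asarray(values, dtype=float)
    med = np.median(values)
    mad = np.median(np.abs(values - med))  # unscaled median absolute deviation
    if mad == 0:
        return 0.0  # assumed convention for a degenerate distribution
    return float((med - x) / mad)

# Toy cohort for one probe set; the last sample has markedly reduced expression.
cohort = [10, 9, 11, 10, 12, 2]
q = dixon_q_low(cohort, 2)    # (9 - 2) / (12 - 2)
g = mad_grubb_low(cohort, 2)  # (10 - 2) / 1
```

Both statistics are positive only for measurements below the bulk of the cohort, which matches their use here for detecting reduced mRNA levels.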
PCR and Sequencing analysis
The coding region of genes of interest was sequenced from genomic DNA extracted from cultured fibroblast cell lines. Intron-located primers were designed using the Primer3 v.0.4.0 website for the following genes: AGA, ADA, ADSL, GAA, ACADM, HPRT1, SURF1, MOCS2, DGUOK, NPC1, NPC2, HEXA (Additional file 1). PCR products were purified using the QIAquick® PCR purification kit (QIAGEN). Dye-terminator cycle sequencing was performed using the BigDye® terminator v3.1 cycle sequencing kit (Applied Biosystems, Warrington, UK). Excess dye terminators were removed using Agencourt® CleanSeq® (Beckman Coulter, High Wycombe, UK). Samples were run on an ABI PRISM 3130xl Genetic Analyzer (Applied Biosystems). Sequences were analysed using Mutation Surveyor Local v3.20 (Biogene, Kimbolton, UK).
Search for a gene expression signature
Outlier (NMD) detection
Genes with mutations resulting in premature termination codons and nonsense mediated decay.
c.199T > C, [Y67H]
c.350G > A, [W117X], second mutation unknown
c.7G > C, [A3P]
c.578C > T, [R190X]
c.398C > T, [R105X]
c.2560C > T, [R854X]
c.1278-1282insTATC, second mutation unknown
c.1278-1282insTATC, second mutation unknown
g.IVS6+2T > A
3'splice junction (exon insertion)
g.IVS7+1G > T
Exon 7 skipping
c.564G > C, [W228C]
exon 5 skipping
c.1189C > T, [Q397X]
c.58G > T, [E20X]
c.326-327insAT 326-336 del TCTGCCAGCC
We then determined whether NMD of the disease-causing gene was systematically detectable from the microarray data using outlier statistics. We used two metrics to determine the degree to which a gene expression measurement x constitutes an outlier relative to the patient population: Dixon's Q statistic, defined as (second-smallest value − x)/range, and a variant of Grubbs' outlier test statistic, MAD-Grubb, defined as (median − x)/MAD, where MAD is the median absolute deviation. For each metric, we investigated sensitivity and specificity with respect to NMD detection. Since we suspected that the results also depended on the choice of microarray probe-level summarization method, we performed the analysis twice, using factor analysis for robust microarray summarization (FARMS) and robust multiarray averaging (RMA), respectively.
Using the FARMS-summarized data (Additional file 2 'FARMS_GoIs'), we found that a threshold of Dixon's Q > 0.25 achieved maximum sensitivity. For 11 out of the 14 positive NMD patients, the measurement of a probe set for the specific mutated gene exceeded the threshold. Three patients (33, 49, and 93) were considered false negatives because the probe set or sets for the affected gene (ACADM, GAA, MOCS2) were excluded a priori, having been called non-informative during data pre-processing. Therefore, lowering the Dixon's Q threshold did not increase sensitivity, and 11/14 was the maximally achievable sensitivity. Using the MAD-Grubb metric, maximum sensitivity was achieved with a threshold of > 4.5.
For the RMA-summarized data (Additional file 2 'RMA_GoIs'), a threshold of Dixon's Q > 0.25 gave a sensitivity of 12 out of 14 positive controls, with the false negatives being 49 and 93. Maximum sensitivity (13/14) was achieved for a threshold of 0.19, which was exceeded for a MOCS2 probe set in patient 93. Patient 49 remained a false negative due to the only GAA probe set having been excluded during pre-processing. Using the MAD-Grubb metric, maximum sensitivity was achieved with a threshold of > 5.4.
Next, we investigated the specificity of the Dixon's Q and MAD-Grubb outlier metrics. Specifically, we determined, separately for each sample, the fraction of probe sets for which Dixon's Q (or MAD-Grubb) was less than the threshold, while systematically varying the threshold. We estimated the false positive rate (FPR) for a sample as the fraction of probe sets exceeding the threshold. This is a conservative estimate, since for some of the false positive genes polymorphism affecting mRNA expression may be responsible for the decreased expression.
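The per-sample false positive rate described above amounts to counting, for each sample and each threshold, the fraction of probe sets whose outlier score exceeds the threshold. A minimal sketch, assuming the scores are available as a probe sets × samples matrix (the function name and data layout are illustrative, not taken from the study):

```python
import numpy as np

def false_positive_rates(scores, thresholds):
    """Fraction of probe sets exceeding each threshold, per sample.

    scores: 2-D array (probe sets x samples) of Dixon's Q or MAD-Grubb values.
    thresholds: 1-D sequence of thresholds to sweep.
    Returns an array of shape (thresholds, samples).
    """
    scores = np.asarray(scores, dtype=float)
    thresholds = np.asarray(thresholds, dtype=float)
    # Broadcast to (thresholds, probe sets, samples) and compare
    exceeds = scores[None, :, :] > thresholds[:, None, None]
    # Averaging over the probe set axis gives the per-sample fraction
    return exceeds.mean(axis=1)

# Toy example: 4 probe sets, 2 samples, two Dixon's Q thresholds
toy_scores = [[0.10, 0.30],
              [0.05, 0.00],
              [0.30, 0.20],
              [0.00, 0.26]]
toy_fpr = false_positive_rates(toy_scores, [0.19, 0.25])
```

As in the text, every probe set above the threshold is counted as a false positive, so the estimate is conservative: the true NMD gene and any genuinely down-regulated polymorphic genes are not excluded.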
For FARMS-summarized data and a threshold of Dixon's Q > 0.25 (maximum sensitivity 11/14), the false positive rate (FPR) was < 0.1% for all samples except sample 34 (FPR < 0.25%). In absolute terms, an FPR of < 0.1% corresponded to, on average, fewer than 10 probe sets per sample exceeding the threshold from a total of 9,787 probe sets. For the MAD-Grubb threshold of > 4.5 (maximum sensitivity 11/14), the FPR was < 0.9% for most samples. The exceptions were four samples (33, 36, 32, 34) with FPR > 1%, all from the same microarray batch. Thus, at maximum sensitivity, the FPR for the MAD-Grubb metric was an order of magnitude larger than for Dixon's Q, and MAD-Grubb was more susceptible to microarray batch effects.
For RMA-summarized data and a threshold of Dixon's Q > 0.25 (sensitivity 12/14), the FPR was < 0.25% for all but two samples (34 and 36; FPR > 1%). For MAD-Grubb > 5.4 (maximum sensitivity 13/14), the FPR was < 0.9% for all but four samples (33, 36, 32, 34; FPR > 1%). For Dixon's Q > 0.19 (maximum sensitivity 13/14), the FPR was < 0.5%, again except for samples 34 and 36. Given the total number of 11,753 probe sets in the analysis, an FPR of < 0.25% corresponds to < 30 probe sets.
No evidence of a gene expression signature characteristic of a specific metabolic disorder was found using PCA and hierarchical clustering. Few studies have attempted to characterise mRNA profiles in inherited metabolic disorders. Using microarray-generated expression data, Bozzato et al. compared three fibroblast cell lines from patients with mucolipidosis type IV, an autosomal recessive lysosomal storage disorder, to three control cell lines, and reported differential expression of a number of genes involved in endosome/lysosome trafficking, lysosome biogenesis, organelle acidification and lipid metabolism. The authors concluded that differential expression of these genes correlated with altered biological processes associated with the disease. Bifsha et al. noted downregulation of ubiquitin C-terminal hydrolase (UCH-L1) in eight different lysosomal storage disorder samples, suggesting that impairment of the ubiquitin-dependent protein degradation pathway may contribute to the increased cell death seen in some of these disorders.
We found no clustering of patients with lysosomal disorders that would indicate a gene expression signature. Considerable variation in levels of gene expression between different patient cell lines was found. However, we have not defined a 'normal range' for the expression of individual genes, as 65 of the 68 cell lines in the study were derived from patients with a suspected metabolic defect. Gene expression would also be expected to vary under culture conditions different from those used in this study. A proportion of the cohort variation, < 20%, can be ascribed to a batch effect or variation between different experiments. Although variation between experiments is low, this may have been sufficient to mask the identification of a metabolic signature. Non-genetic factors which may contribute to the variation in gene expression seen in the population include the passage number of the cell lines and differences between batches of cell culture medium.
We were able to detect significantly decreased mRNA expression levels of the defective gene relative to the expression range in the study cohort in 11/68 (16%) patients. The low levels of mRNA correlating with premature termination codon (PTC) mutations are consistent with nonsense-mediated mRNA decay (NMD), a process which enables the cell to eliminate faulty mRNA that would otherwise translate into aberrant truncated proteins with potential toxic effects for the organism [12–14].
Our results suggest that FARMS-summarization and Informative/Non-Informative (I/NI)-filtering of the array data combined with the Dixon's Q outlier metric provide the best trade-off between sensitivity (> 78%; 11/14 patients) and specificity (> 99.9%) for the purpose of NMD detection. The sensitivity can be improved (> 92%) by using RMA-summarization combined with relatively conservative low-expression threshold filtering and/or using the MAD-Grubb outlier metric. However, this reduces specificity by an order of magnitude which, given the total number of tests performed (~10,000 probe sets), can lead to dozens of genes being identified as potentially undergoing NMD (Additional file 2 'NMD_summary', 'FARMS_NMD', 'RMA_NMD').
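The trade-off above can be checked with simple arithmetic on the figures reported in the Results. The helper names below are hypothetical; the inputs (11/14, 13/14, 9,787 and 11,753 probe sets, and the FPR bounds) are taken from the text, and the FPR values are upper bounds rather than measured rates.

```python
def sensitivity(true_positives, positives):
    """Fraction of NMD-positive patients whose mutated gene was flagged."""
    return true_positives / positives

def expected_false_positives(fpr, n_probe_sets):
    """Approximate number of probe sets flagged per sample at a given FPR."""
    return fpr * n_probe_sets

# FARMS + I/NI filtering with Dixon's Q > 0.25
farms_sensitivity = sensitivity(11, 14)                # about 0.786
farms_false_hits = expected_false_positives(0.001, 9787)   # fewer than 10

# RMA at maximum sensitivity
rma_sensitivity = sensitivity(13, 14)                  # about 0.929
rma_false_hits = expected_false_positives(0.0025, 11753)   # fewer than 30
```

Because roughly 10,000 probe sets are tested per sample, even a sub-percent change in FPR shifts the number of candidate genes per patient from a handful to dozens, which is why specificity was weighted heavily in choosing the method.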
Using FARMS-summarization and I/NI-filtering of the array data, three false negatives were identified with NMD-associated mutations in ACADM, MOCS2 and GAA. These three genes were identified as outliers and true positives when Dixon's Q outlier metric was applied to the unfiltered data. This represents a limitation of the assay, as only genes with significant levels of expression in fibroblasts were included in the analysis in order to maximize specificity. As a result, disease-associated genes expressed at a low level or not expressed at all in fibroblasts will be excluded from the analysis.
Genes identified as false positives (FP) after FARMS-summarization and I/NI-filtering of the data combined with Dixon's Q outlier metric in true positive (TP) patients.
TP NMD gene symbol
Number of FP
LBH, SGCD, SLC1A4, PHF10, ID4, NRAS, S100A4, SHMT2, SETBP1, BACE1, LONRF1, CXXC5
SSR2, FAR1, NOL12, NAV1, TRIOBP, SCCPDH, HSP90B1
SF1, MARS, TCEA2, ANKRD13A, PHF13
LPP, SKIL, ZNF281, PDLIM7, COL1A2, AMIGO2, STUB1, CD44, RAD23A, ZNF598, PCGF1, EMP1, FXYD5
STEAP1, MAP4K4, TMEM22, ASCC2, PDLIM4, HGS, ACAP3, PNKP, EMP3, LMNA, FLII, C11orf68, FLI10357
IL1R1, APLP2, SLC30A1, ANKRD57, APLP2, SOCS2, RECK
There are more than 300 different inherited metabolic diseases [4, 15]. Nonsense and frameshift mutations generating PTCs account for approximately one third of mutations in human genetic diseases. In our study, the defective gene could be identified in 16% of patients with an IMD. Fibroblast cell cultures are often established in patients with suspected familial metabolic disorders where initial screening tests have proven uninformative. It is in this group of patients that gene expression profiling may contribute significantly to shortcutting the diagnostic cascade.
In this study, we investigated whether microarray gene expression profiling of cultured fibroblasts could identify the metabolic defect in 68 patients with proven or suspected inherited metabolic diseases. Using this approach, we were able to identify the defective gene in 16% of patients irrespective of the underlying metabolic defect. There are a number of emerging technologies which will find application in the routine diagnosis of genetic disorders. These include targeted re-sequencing chips aimed at specific groups of disorders and massively parallel next-generation sequencing, which is orders of magnitude more expensive than gene expression profiling. We suggest that, due to the relatively low cost of microarray gene expression profiling, this technology has a role to play in the diagnosis of genetic disorders where first-line screening tests are uninformative.
We acknowledge financial support from Guy's and St Thomas' Charity. The Purine Metabolic Patients' Association (PUMPA) provided funding for a pilot study. We would like to thank Tina Slade for her assistance with tissue culture.
- Martins AM: Inborn errors of metabolism: a clinical overview. Sao Paulo Med J. 1999, 117: 251-265.
- Haberle J, Gorg B, Toutain A, Rutsch F, Benoist JF, Gelot A, Suc AL, Koch HG, Schliess F, Haussinger D: Inborn error of amino acid synthesis: human glutamine synthetase deficiency. J Inherit Metab Dis. 2006, 29: 352-358. 10.1007/s10545-006-0256-5.
- Marie S, Heron B, Bitoun P, Timmerman T, Van Den Berghe G, Vincent MF: AICA-ribosiduria: a novel, neurologically devastating inborn error of purine biosynthesis caused by mutation of ATIC. Am J Hum Genet. 2004, 74: 1276-1281. 10.1086/421475.
- Sanderson S, Green A, Preece MA, Burton H: The incidence of inherited metabolic disorders in the West Midlands, UK. Arch Dis Child. 2006, 91: 896-899. 10.1136/adc.2005.091637.
- Raghuveer TS, Garg U, Graf WD: Inborn errors of metabolism in infancy and early childhood: an update. Am Fam Physician. 2006, 73: 1981-1990.
- Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP: Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res. 2003, 31: e15. 10.1093/nar/gng015.
- Hochreiter S, Clevert DA, Obermayer K: A new summarization method for Affymetrix probe level data. Bioinformatics. 2006, 22: 943-949. 10.1093/bioinformatics/btl033.
- Talloen W, Clevert DA, Hochreiter S, Amaratunga D, Bijnens L, Kass S, Gohlmann HW: I/NI-calls for the exclusion of non-informative genes: a highly effective filtering tool for microarray data. Bioinformatics. 2007, 23: 2897-2902. 10.1093/bioinformatics/btm478.
- Rozen S, Skaletsky H: Primer3 on the WWW for general users and for biologist programmers. Methods Mol Biol. 2000, 132: 365-386.
- Bozzato A, Barlati S, Borsani G: Gene expression profiling of mucolipidosis type IV fibroblasts reveals deregulation of genes with relevant functions in lysosome physiology. Biochim Biophys Acta. 2008, 1782: 250-258.
- Bifsha P, Landry K, Ashmarina L, Durand S, Seyrantepe V, Trudel S, Quiniou C, Chemtob S, Xu Y, Gravel RA: Altered gene expression in cells from patients with lysosomal storage disorders suggests impairment of the ubiquitin pathway. Cell Death Differ. 2007, 14: 511-523. 10.1038/sj.cdd.4402013.
- Chang YF, Imam JS, Wilkinson MF: The nonsense-mediated decay RNA surveillance pathway. Annu Rev Biochem. 2007, 76: 51-74. 10.1146/annurev.biochem.76.050106.093909.
- Le Hir H, Seraphin B: EJCs at the heart of translational control. Cell. 2008, 133: 213-216. 10.1016/j.cell.2008.04.002.
- Muhlemann O: Recognition of nonsense mRNA: towards a unified model. Biochem Soc Trans. 2008, 36: 497-501. 10.1042/BST0360497.
- Seymour CA, Thomason MJ, Chalmers RA, Addison GM, Bain MD, Cockburn F, Littlejohns P, Lord J, Wilcox AH: Newborn screening for inborn errors of metabolism: a systematic review. Health Technol Assess. 1997, 1: i-iv, 1-95.
- Linde L, Kerem B: Introducing sense into nonsense in treatments of human genetic diseases. Trends Genet. 2008, 24: 552-563. 10.1016/j.tig.2008.08.010.
- Bruce CK, Smith M, Rahman F, Liu ZF, McMullan DJ, Ball S, Hartley J, Kroos MA, Heptinstall L, Reuser AJ: Design and validation of a metabolic disorder resequencing microarray (BRUM1). Hum Mutat. 2010, 31: 858-865. 10.1002/humu.21261.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.