The LORIS MyeliNeuroGene rare disease database for natural history studies and clinical trial readiness
Orphanet Journal of Rare Diseases volume 16, Article number: 328 (2021)
Rare diseases are estimated to affect 150–350 million people worldwide. With advances in next generation sequencing, the number of known disease-causing genes has increased significantly, opening the door for therapy development. Rare disease research has therefore pivoted from gene discovery to the exploration of potential therapies. With impending clinical trials on the horizon, researchers are in urgent need of natural history studies to help them identify surrogate markers, validate outcome measures, define historical control patients, and design therapeutic trials.
We customized a browser-accessible multi-modal (e.g. genetics, imaging, behavioral, patient-determined outcomes) database to increase cohort sizes, identify surrogate markers, and foster international collaborations. Ninety data entry forms were developed, covering family, perinatal, and developmental history, clinical examinations, diagnostic investigations, neurological evaluations (e.g. spasticity, dystonia, ataxia), disability measures, parental stress, and quality of life. A customizable clinical letter generator was created to assist in continuity of patient care.
Small cohorts and underpowered studies are a major challenge for rare disease research. This online, rare disease database will be accessible from all over the world, making it easier to share and disseminate data. We have outlined the methodology to become Title 21 Code of Federal Regulations Part 11 Compliant, which is a requirement to use electronic records as historical controls in clinical trials in the United States. Food and Drug Administration compliant databases will be life-changing for patients and families when historical control data is used for emerging clinical trials. Future work will leverage these tools to delineate the natural history of several rare diseases and we are confident that this database will be used on a larger scale to improve care for patients affected with rare diseases.
The World Health Organization (WHO) defines a rare disease as one that affects 1 in 2000 people or fewer. The approximately 8000 rare genetic disorders are estimated to affect between 150 and 350 million people worldwide [1,2,3,4,5,6,7]. Historically, rare diseases have been notoriously difficult to diagnose because of their heterogeneous phenotypes and genotypes. Since only around 5% of all rare diseases have an FDA-approved treatment, many orphan diseases are treated with off-label use of medications approved for other purposes. Over the last decade, however, rapidly evolving genetic technologies, most recently next generation sequencing (NGS), have driven considerable progress in describing novel rare disease entities and identifying novel disease-causing genes. This progress has opened the door to studies of disease pathogenesis and potential therapeutic approaches, pivoting rare disease research from gene discovery towards investigating potential treatments.
With clinical trials on the horizon, rare disease researchers are recognizing a pressing need for natural history data [10, 11]. The goal of a natural history study is to recruit patients for longitudinal analysis of natural disease progression. The data gathered help identify surrogate markers and determine the best outcome measures for potential therapeutic trials, and can serve as the control arm and as benchmarks for efficacy in single-arm rare disease trials [13,14,15,16,17]. Natural history studies collect large amounts of information, including clinical, behavioral, sociodemographic, genetic, imaging, and patient- and family-reported outcomes.
This diversity and quantity of data can be difficult to manage, so rare disease researchers must begin to use information management systems, or databases, to facilitate natural history studies. Rare disease research relies heavily on international collaboration and data sharing to recruit patient populations large enough for adequate statistical power [6, 18]. An online database can therefore benefit rare disease research more than disease research fields in which large patient populations are readily available.
If rare disease databases are going to be successful in future clinical trials, they must adhere to local and international regulations for electronic records. Title 21 Code of Federal Regulations (CFR) Part 11, published in 1997 by the U.S. Food and Drug Administration (FDA), outlines what is considered trustworthy, reliable electronic record keeping. These regulations apply to any FDA-regulated industry, such as pharmaceutical companies, medical device manufacturers, biotechnology companies, and clinical research organizations. We chose to adhere to all of its general requirements, which are detailed in the Methods section.
A variety of databases are available to aid researchers, such as REDCap [20], DEDUCE [21], HID [22], DFBIdb [23], LONI [24], MIND [25], and NeuroLOG [26]. We elected to customize the Longitudinal Online Research and Imaging System (LORIS) [27,28,29,30] to help organize data and facilitate international collaborations when conducting multi-site natural history studies because of its strong track record and the fact that it is open source. Below, we detail how our group used LORIS and 21 CFR Part 11 guidelines to set up workflows and develop the LORIS MyeliNeuroGene Database for Rare Diseases to lead us to clinical trial preparedness in the coming years.
An instance of LORIS was installed and configured for the MyeliNeuroGene Research Group at the Research Institute of the McGill University Health Centre. This database is easily accessible via a web browser and multi-modal, with the ability to capture genetics data, medical history, medical imaging, detailed assessments of cognition and motor function, and patient-derived outcomes, among other things.
Within LORIS, data entry forms, or instruments, were created using the “Instrument Builder” module. Using the workflow found in the Methods section, 90 LORIS instruments were created, 62 of which had scoring algorithms developed to aid in data processing.
Detailed phenotyping instruments covering family history, perinatal history, developmental history, clinical evolution, time to event (e.g. time to reaching specific disease milestones such as loss of independent ambulation or dependency on tube feeding), neurological examination, and neuropsychological assessment, among others, were developed in conjunction with parent- and patient-reported outcomes such as quality of life, disability, and stress. The resulting instruments are summarized in Table 1.
One thousand patients and family members with rare diseases have been entered into LORIS and assigned unique identifiers. This process includes activation of enrollment, informed consent designation, external identifier logging, and family relationship mapping.
In addition, a dynamic letter generator is currently in development to assist in forwarding patient information to other physicians. The tool compiles the patient’s data, entered via the phenotyping instruments, into a Clinical Examination Letter. In place of the database field names, highlighted in yellow in Fig. 1, an instance of the letter renders the patient data for the corresponding field. The Clinical Examination Letter can be exported as an editable Word document that details patient information such as family history, clinical evolution, time to event, and future plans for investigations. This letter can then be sent to the referring physicians for continuity of care. It has the advantage of not duplicating work done by the data entry clinician: as the clinician sees the patient and enters the data in the LORIS MyeliNeuroGene Database, the clinical note is auto-populated.
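The field-substitution idea behind the letter can be illustrated with a minimal Python sketch. It is not the LORIS implementation: the template wording, the field names, and the sample values are all hypothetical, and the real letter is exported as a Word document rather than plain text.

```python
from string import Template

# Hypothetical letter template; each $placeholder stands in for a database field name.
LETTER_TEMPLATE = Template(
    "Dear Dr. $referring_physician,\n"
    "Family history: $family_history\n"
    "Clinical evolution: $clinical_evolution\n"
    "Plan: $future_plans\n"
)

FIELD_NAMES = ("referring_physician", "family_history",
               "clinical_evolution", "future_plans")

def render_letter(record: dict) -> str:
    """Fill the template from one visit's data; missing fields are flagged, not dropped."""
    fields = {name: "[not recorded]" for name in FIELD_NAMES}
    fields.update({k: v for k, v in record.items() if k in fields and v})
    return LETTER_TEMPLATE.substitute(fields)

# Made-up example values, for illustration only.
letter = render_letter({
    "referring_physician": "Smith",
    "family_history": "No known affected relatives.",
    "clinical_evolution": "Loss of independent ambulation at 8 years.",
})
print(letter)
```

Flagging unanswered fields explicitly, rather than leaving them blank, is one way to keep gaps visible to the clinician who edits the exported letter.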
Most patients affected with rare diseases, from mildly to severely affected, support data sharing to promote research, healthcare, and knowledge transfer. We have built and customized a LORIS database and detailed our workflow to help rare disease researchers create their own information management system, electronic health records, or database. There is a major need for, and benefit to, sharing data in rare disease research. De-identifying and sharing information allows rare disease researchers to study disorders efficiently by collaborating, minimizing redundant studies, and maximizing sample sizes.
An exportable dynamic letter generator has also been developed to save time when examining patients referred to the clinic. Patients with a rare disease who come to the Montreal Children’s Hospital undergo a battery of tests that can take up to two days to complete. These tests are performed in a standardized order at each visit (i.e. the order they appear in the database), to ensure consistency between research visits and research patients. All information is stored in the LORIS MyeliNeuroGene Database and can be exported in the form of a Clinical Examination Letter detailing all results, impressions, and plans to help treat the patients. This letter is then sent back to the referring physician for continuity of care. When this letter is written by hand it takes a few hours and introduces numerous chances for human error. Exporting the letter from quality-controlled instruments reduces this error and saves researchers’ and physicians’ time.
In addition to the clinical phenotyping instruments and dynamic letter generator, we have outlined the methodology for becoming compliant with Title 21 Code of Federal Regulations Part 11, which is a requirement to use electronic records as historical controls in clinical trials in the United States [32, 33]. To our knowledge, this manuscript is the first to outline these requirements. Future work will leverage the tools developed in this project to delineate the natural history of several rare diseases, and we hope they will be used by clinicians and researchers around the globe.
A major obstacle in rare disease research is overcoming small cohorts. An online database that international collaborators can access and contribute to from anywhere in the world is invaluable for increasing cohort sizes, discerning surrogate markers, and improving natural history data. Using these FDA-compliant natural history data to validate outcome measures will be life-changing for patients and families because it will yield historical control data that can be used in emerging clinical trials.
Title 21 Code of Federal Regulations Part 11 compliance (Part 11 Compliance)
To adhere to Part 11 Compliance regulations, the LORIS MyeliNeuroGene Database has been customized to include additional security measures such as time-stamped audit trails. We are currently implementing electronic signatures and 2-factor authentication. The scientific literature detailing such workflows and database development is sparse. We therefore summarize the general requirements of Part 11 Compliance below, along with how they were implemented in our database.
Users are required to have their credentials (e.g. education, training, experience) verified before performing tasks within the database. A written policy must be signed that holds users accountable and responsible for their electronic signatures (discussed further below). This written policy must be stored, and a hard copy sent to the Office of Regional Operations (HFC-100), 5600 Fishers Lane, Rockville, MD 20857.
Biometrics are a method of verifying an individual’s identity based on a measurement of the individual’s physical features (e.g. fingerprints) or repeatable actions that are unique to that person. In our case, we chose to use a unique PIN, separate from an authorized user’s password, for 2-factor authentication.
The MyeliNeuroGene database is a closed environment, meaning that access to the system is controlled by the same people who are responsible for the content of the electronic records. This includes the researchers and principal investigator. Operational audits of the system are done on a routine basis. Time-stamped audit trails are kept for each authorized user to trace the creation, modification, or deletion of any instrument, visit, or other electronic record. User access is hierarchical, meaning some users do not have full access to the database and may only have “read” or “write” access. The database must also ensure that no two users share the same PIN or password, and that PINs and passwords are periodically checked and changed to prevent unauthorized use. If unauthorized use occurs, there are immediate system security notifications. Per Canadian predicate rules, records must be stored for 25 years after study completion. United States record retention rules require storage for a minimum of 10 years.
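As a rough illustration of how hierarchical access and a time-stamped audit trail fit together, consider the following Python sketch. It is a simplified model under stated assumptions, not the LORIS code: the role names, the permission table, and the record identifiers are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical roles; the text only states that some users have "read" or "write" access.
PERMISSIONS = {"reader": {"read"}, "data_entry": {"read", "write"}}

@dataclass
class AuditTrail:
    entries: list = field(default_factory=list)

    def log(self, user: str, action: str, record_id: str) -> None:
        # Append-only: each entry gets a UTC time stamp and is never edited afterwards.
        self.entries.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "record_id": record_id,
        })

def modify_record(trail: AuditTrail, user: str, role: str, record_id: str) -> bool:
    """Allow the change only if the role has write access; log either outcome."""
    allowed = "write" in PERMISSIONS.get(role, set())
    trail.log(user, "modify" if allowed else "denied_modify", record_id)
    return allowed

trail = AuditTrail()
ok = modify_record(trail, "clinician_a", "data_entry", "visit_T006")
blocked = modify_record(trail, "student_b", "reader", "visit_T006")
```

Logging denied attempts alongside successful ones is a design choice in this sketch; it gives the routine operational audits something to review when unauthorized use is suspected.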
An electronic record includes any combination of text, graphics, data, audio, or other information that is represented in digital form by the database. Electronic signatures must include the printed names of the signers, dates and times, meanings (e.g. approval, creation, review), and an internal audit trail. These signatures are legally binding. Authority checks are completed every month to ensure that only authorized users may sign, input, output, or modify records.
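The required components of an electronic signature can be pictured as a small validated record. The Python sketch below is only an assumption about how such a record might be modeled; the class name, field names, and the exact set of allowed meanings are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

# Meanings taken from the examples in the text; a real system may define more.
ALLOWED_MEANINGS = {"approval", "creation", "review"}

@dataclass(frozen=True)  # frozen: a signature cannot be altered once created
class ElectronicSignature:
    printed_name: str
    signed_at: datetime
    meaning: str

    def __post_init__(self):
        if not self.printed_name.strip():
            raise ValueError("printed name is required")
        if self.signed_at.tzinfo is None:
            raise ValueError("signing time must be timezone-aware")
        if self.meaning not in ALLOWED_MEANINGS:
            raise ValueError(f"unknown meaning: {self.meaning}")

sig = ElectronicSignature(
    printed_name="A. Clinician",
    signed_at=datetime(2021, 1, 15, 9, 30, tzinfo=timezone.utc),
    meaning="approval",
)
```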
A digital signature combines the electronic signature and its corresponding cryptographic authentication, usually a PIN and/or password that is used to verify the identity of the signer. It cannot be copied or pasted to or from another document, making it inextricably linked to the signed document. To avoid becoming cumbersome, during a continuous signing period only the first signing requires two-factor authentication with a biometric identification and password.
It is highly recommended that, after database development, a third-party auditor inspect the system and the documentation put in place. Auditors alert parties to any gaps or shortcomings and can advise developers on what needs to be changed for full compliance with local and international regulations. This will be organized for the MyeliNeuroGene database.
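One way to see why a signature bound to a document cannot be pasted onto another is to compute it over the document's content. The Python sketch below uses a keyed hash (HMAC) from the standard library as a stand-in; this is a swapped-in technique chosen for brevity, not the mechanism used in our database, and a production system would more likely rely on asymmetric digital signatures. The secret and the record contents are hypothetical.

```python
import hashlib
import hmac

def sign_document(document: bytes, signer: str, secret: bytes) -> str:
    """Return a tag computed over the signer name and the document's SHA-256 digest."""
    digest = hashlib.sha256(document).hexdigest()
    message = f"{signer}:{digest}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()

def verify_signature(document: bytes, signer: str, secret: bytes, tag: str) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = sign_document(document, signer, secret)
    return hmac.compare_digest(expected, tag)

secret = b"hypothetical-per-user-secret"
tag = sign_document(b"visit T006 record", "A. Clinician", secret)

same_document = verify_signature(b"visit T006 record", "A. Clinician", secret, tag)
other_document = verify_signature(b"visit T012 record", "A. Clinician", secret, tag)
```

Because the tag depends on the document digest, moving it to a different record makes verification fail.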
LORIS database and workflow
LORIS is a web-based data and project management software that stores demographic, clinical, behavioral, genetic, imaging, and patient-related outcomes accessible from any computer browser connected to the internet. Multiple sites can enter, organize, and validate data under one management framework. Longitudinal data is organized around the “Subject Profile”. Clinical examination, imaging data, outcome measures, and metadata are organized by “Visits”. All stored information is de-identified and can be queried by an authorized user. Source documentation can be uploaded and affiliated with each visit. Quality control is ensured by automated scoring of clinical, behavioral, and patient-reported outcomes, by validating data types (string vs numerical), and by requiring double data entry where necessary.
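Double data entry can be understood as comparing two independent entries of the same form and flagging every field on which they disagree. The following Python sketch shows that comparison in its simplest form; the field names and values are hypothetical, and LORIS provides its own conflict-resolution workflow that this does not reproduce.

```python
def compare_entries(first: dict, second: dict) -> dict:
    """Return the fields whose values differ between two independent entries."""
    conflicts = {}
    for name in sorted(set(first) | set(second)):
        a, b = first.get(name), second.get(name)
        if a != b:
            conflicts[name] = (a, b)
    return conflicts

# Hypothetical entries of the same visit typed in by two different people.
entry_1 = {"gmfcs_level": 3, "ambulation_lost": "no"}
entry_2 = {"gmfcs_level": 2, "ambulation_lost": "no"}

conflicts = compare_entries(entry_1, entry_2)
print(conflicts)
```

Each flagged field would then be resolved against the source documentation before the record is considered complete.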
To properly set up our rare disease database, we first began by drafting a data dictionary in the form of an Excel sheet. This spreadsheet outlined all of the data entry forms, or instruments, that would be developed using the LORIS Instrument Builder module detailed below. After instrument creation, participant enrollment and data entry can begin, with query and dissemination details tackled later. An overview of the workflow can be found in Fig. 2.
Within LORIS are different modules to help researchers with no computer science or programming experience. The Instrument Builder module aids in the creation of demographic, clinical phenotyping, behavioral, genetic, imaging, and patient-related outcome measures. Each instrument can be customized with specific information such as a “Header”, “Label”, and “Scored Field” that give the instrument title, background information, and automatically calculated scoring respectively.
Data entry can be standardized using a “Textbox”, “Text area”, “Dropdown”, “Multiselect”, “Date”, and “Numeric” question entry. Each question is assigned a variable name “Question Name”, for calculations and data querying, and “Question Text” which asks the pertinent question at hand. For Dropdown questions, instrument specific options can be added for every question.
Instruments were first planned and drafted in Excel in the form of a Data Dictionary. Columns consisted of Question Names, type of question (e.g. Numeric, Dropdown), Question Text, Question Options (available choices), and Formulas (for later calculations). Each row represented one question. Using the Data Dictionary and the LORIS Instrument Builder module, each instrument was created: demographic forms, clinical phenotyping (e.g. spasticity and dystonia measures, gross and fine motor function, eating and drinking function, ataxia, intelligence, disability, swallowing evaluations), behavioral, genetic, imaging (e.g. MRI analyses), and patient-related outcomes (e.g. health-related quality of life, parental stress, pain characterization). Instrument files were then uploaded to the MyeliNeuroGene private repository on GitHub as Pull Requests for review.
After instrument completion, a PHP scoring script was developed for each instrument that required one. Automatic scoring reduces human error and dramatically decreases time spent on calculations. Scoring scripts were also uploaded to the GitHub repository for review.
After instruments and scoring scripts were developed, they were uploaded to the MyeliNeuroGene private repository on GitHub as Pull Requests for review. After revision and modification (if necessary), the Pull Requests were approved, and the instruments were made available on an isolated LORIS staging server where beta testing occurred. After testing was completed, instruments were pushed to the LORIS production server for instrument pipeline completion and data entry.
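A Data Dictionary of this shape lends itself to a simple automated check before instruments are built. The Python sketch below assumes the spreadsheet has been exported to CSV; the column headers, the two example questions, and the specific checks are hypothetical, while the question types are those offered by the Instrument Builder as described above.

```python
import csv
import io

QUESTION_TYPES = {"Textbox", "Text area", "Dropdown", "Multiselect", "Date", "Numeric"}

# Hypothetical two-question excerpt of a data dictionary, one row per question.
DICTIONARY_CSV = """question_name,question_type,question_text,question_options,formula
mas_elbow_left,Dropdown,Modified Ashworth Scale: left elbow,0|1|1+|2|3|4,
visit_weight_kg,Numeric,Weight (kg),,
"""

def check_dictionary(text: str) -> list:
    """Return a list of problems found in the data dictionary rows."""
    problems, seen = [], set()
    for row in csv.DictReader(io.StringIO(text)):
        name = row["question_name"]
        if name in seen:
            problems.append(f"{name}: duplicate Question Name")
        seen.add(name)
        if row["question_type"] not in QUESTION_TYPES:
            problems.append(f"{name}: unknown type {row['question_type']}")
        if row["question_type"] in {"Dropdown", "Multiselect"} and not row["question_options"]:
            problems.append(f"{name}: no options listed")
    return problems

problems = check_dictionary(DICTIONARY_CSV)
print(problems)
```

Catching duplicate Question Names early matters because those names become the variables used later for scoring and querying.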
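The logic of a scoring script can be illustrated with a short sketch. The production scripts described here are written in PHP; the version below is in Python purely for illustration, and the item names and the rule of withholding a total when any item is unanswered are assumptions rather than the rules of any particular instrument.

```python
def score_instrument(responses: dict, item_names: list):
    """Sum item scores; return None if any item is unanswered so no partial total is stored."""
    values = [responses.get(name) for name in item_names]
    if any(v is None for v in values):
        return None
    return sum(values)

ITEMS = ["item_1", "item_2", "item_3"]  # hypothetical item names

total = score_instrument({"item_1": 2, "item_2": 0, "item_3": 4}, ITEMS)
incomplete = score_instrument({"item_1": 2, "item_3": 4}, ITEMS)
```

Returning no score for an incomplete form, rather than a partial sum, avoids storing totals that look valid but underestimate the true value.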
Before data entry could be completed, Subject Profiles had to be entered. Our group has consented more than 1000 patients and family members with different rare diseases since 2011, and patient and family recruitment is ongoing. To create a new profile, “Date of Birth”, “Sex”, “Site” (in the case of a multi-site study), and “Project” must be entered. Projects can be separated into different studies such as natural history, imaging, genetic, or even clinical trials assessing therapeutics. A new Subject Profile, or candidate, generates two identifier codes, a DCCID and a PSCID which are unique LORIS identifiers.
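To make the two identifiers concrete, the Python sketch below generates a pair for a new profile. The formats are assumptions for illustration only: a random six-digit DCCID and a PSCID built from a site prefix and a sequence number. The actual PSCID format is set in each project's LORIS configuration, and the site prefix shown is hypothetical.

```python
import secrets

def new_dccid(existing: set) -> str:
    """Draw a random six-digit identifier that is not already in use."""
    while True:
        candidate = str(secrets.randbelow(900000) + 100000)  # 100000..999999
        if candidate not in existing:
            existing.add(candidate)
            return candidate

def new_pscid(site_prefix: str, sequence_number: int) -> str:
    """Build a site-specific identifier such as 'MTL0012' (assumed format)."""
    return f"{site_prefix}{sequence_number:04d}"

used = set()
dccid = new_dccid(used)
pscid = new_pscid("MTL", 12)
```

Neither identifier encodes the date of birth or any other personal detail, which is what allows them to be shared with collaborators.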
After the creation of the Subject Profile, each candidate was activated in the study, assigned the informed consent form that was signed, and mapped to any external identifier codes. Under “Participant Status”, we tracked the participant’s status in the study (e.g. Active, Death, Lost to Follow-up). Comments can be entered, with time, date, and author history tracked in the internal audit trail. “Consent Status” tracks the latest signed Research Ethics Board (REB) approved informed consent form. Finally, mapping the “External Identifier” is crucial for future correspondence with family doctors and other collaborators.
“Create time point” allows for data entry of the clinical, behavioral, and patient-determined outcomes that were created during the Instrument Creation process. It also enables uploading of any imaging data collected. We customized our time points to correspond to the age of the patient in months. For instance, a participant’s birth date would be time point T000, and a follow-up appointment 6 months later would be time point T006. A prenatal examination 1 month before a T000 examination would be designated as T-001. The steps to creating a time point can be seen in Figs. 3, 4, and 5.
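The age-based naming convention can be expressed as a small function. The Python sketch below reproduces the three examples given in the text; rounding partial months down and padding to three digits are assumptions inferred from those examples, and the dates are made up.

```python
from datetime import date

def months_between(birth: date, visit: date) -> int:
    """Whole months from birth to visit, rounded down; negative for prenatal visits."""
    months = (visit.year - birth.year) * 12 + (visit.month - birth.month)
    if visit.day < birth.day:
        months -= 1
    return months

def time_point_label(birth: date, visit: date) -> str:
    """Format the age in months as a label such as T000, T006, or T-001."""
    months = months_between(birth, visit)
    sign = "-" if months < 0 else ""
    return f"T{sign}{abs(months):03d}"

birth = date(2015, 3, 10)  # hypothetical date of birth
at_birth = time_point_label(birth, birth)
six_months = time_point_label(birth, date(2015, 9, 10))
prenatal = time_point_label(birth, date(2015, 2, 10))
print(at_birth, six_months, prenatal)
```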
Selecting time point T000 opens a page for all instruments developed to work on our database (Fig. 6). Time points can be customized so that only specific instruments are available to participants at specific ages. Entering multiple visits allows for prospective tracking.
We have further customized LORIS to include Family Relationship information. Linking de-identified individuals allows us to relate a given patient’s disease characteristics to measures reported by family members, such as parental stress or the quality of life of the patient, parents, and siblings. It also allows us to organize family genetic results when next generation sequencing (NGS) investigations are conducted, as well as any family- or parent-reported outcomes.
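Family relationship mapping amounts to storing links between de-identified identifiers and querying them. The Python sketch below is a simplified model under assumptions: the identifiers, the relationship labels, and the link layout are hypothetical and do not reflect the LORIS schema.

```python
from collections import defaultdict

# Hypothetical de-identified links: (candidate, relative, relationship of relative to candidate).
LINKS = [
    ("PSC0001", "PSC0002", "mother"),
    ("PSC0001", "PSC0003", "father"),
    ("PSC0004", "PSC0002", "mother"),
]

def relatives_of(links: list) -> dict:
    """Group each candidate's relatives by relationship."""
    family = defaultdict(dict)
    for candidate, relative, relationship in links:
        family[candidate][relationship] = relative
    return family

def siblings_of(links: list, candidate: str) -> list:
    """Candidates sharing at least one parent with the given candidate."""
    parents = {rel for cand, rel, kind in links
               if cand == candidate and kind in {"mother", "father"}}
    return sorted({cand for cand, rel, kind in links
                   if rel in parents and cand != candidate})

family = relatives_of(LINKS)
siblings = siblings_of(LINKS, "PSC0001")
```

With links like these, a parent's stress questionnaire or a family's sequencing results can be retrieved alongside the patient's own measures without exposing any identifying information.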
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Abbreviations
ADL: Activities of Daily Living
CFCS: Communication Function Classification System
CFR: Code of Federal Regulations
EDACS: Eating and Drinking Ability Classification System
FDA: Food and Drug Administration
FEES: Fiberoptic Endoscopic Evaluation of Swallowing
GDS: Global Dystonia Scale
GMFCS: Gross Motor Function Classification System
GNDS: Guy's Neurological Disability Scale
HID: Human Clinical Imaging Database
LORIS: Longitudinal Online Research and Imaging System
LONI: LONI Image Data Archive
MACS: Manual Ability Classification System
MAS: Modified Ashworth Scale
MRI: Magnetic Resonance Imaging
NGS: Next Generation Sequencing
PHP: PHP: Hypertext Preprocessor
REB: Research Ethics Board
SLT: Speech and Language Therapy
VFSS: Video Fluoroscopic Swallow Study
References
1. Aymé S, Urbero B, Oziel D, Lecouturier E, Biscarat AC. Information on rare diseases: the Orphanet project. Rev Med Interne. 1998;19:376S-S377.
2. Baird PA, Anderson TW, Newcombe HB, Lowry RB. Genetic disorders in children and young adults: a population study. Am J Hum Genet. 1988;42(5):677.
3. Humphreys G. Coming together to combat rare diseases. SciELO Public Health; 2012.
4. McKusick VA. Mendelian inheritance in man and its online version, OMIM. Am J Hum Genet. 2007;80(4):588–604.
5. Wakap SN, Lambert DM, Olry A, Rodwell C, Gueydan C, Lanneau V, et al. Estimating cumulative point prevalence of rare diseases: analysis of the Orphanet database. Eur J Hum Genet. 2019:1–9.
6. Boycott KM, Vanstone MR, Bulman DE, MacKenzie AE. Rare-disease genetics in the era of next-generation sequencing: discovery to translation. Nat Rev Genet. 2013;14(10):681.
7. Sawyer SL, Hartley T, Dyment DA, Beaulieu CL, Schwartzentruber J, Smith A, et al. Utility of whole-exome sequencing for those near the end of the diagnostic odyssey: time to address gaps in care. Clin Genet. 2016;89(3):275–84.
8. Samuels ME. Saturation of the human phenome. Curr Genom. 2010;11(7):482–99.
9. Sardana D, Zhu C, Zhang M, Gudivada RC, Yang L, Jegga AG. Drug repositioning for orphan diseases. Brief Bioinform. 2011;12(4):346–56.
10. Griggs RC, Batshaw M, Dunkle M, Gopal-Srivastava R, Kaye E, Krischer J, et al. Clinical research for rare disease: opportunities, challenges, and solutions. Mol Genet Metab. 2009;96(1):20–6.
11. Helman G, Van Haren K, Bonkowsky JL, Bernard G, Pizzino A, Braverman N, et al. Disease specific therapies in leukodystrophies and leukoencephalopathies. Mol Genet Metab. 2015;114(4):527–36.
12. Centers for Disease Control and Prevention. Principles of Epidemiology in Public Health Practice, Third Edition: An Introduction to Applied Epidemiology and Biostatistics. cdc.gov: U.S. Department of Health and Human Services; 2006.
13. Hobbs BP, Sargent DJ, Carlin BP. Commensurate priors for incorporating historical information in clinical trials using general and generalized linear models. Bayesian Anal (Online). 2012;7(3):639.
14. Neuenschwander B, Capkun-Niggli G, Branson M, Spiegelhalter DJ. Summarizing historical information on controls in clinical trials. Clin Trials. 2010;7(1):5–18.
15. Pocock SJ. The combination of randomized and historical controls in clinical trials. J Chronic Dis. 1976;29(3):175–88.
16. Viele K, Berry S, Neuenschwander B, Amzal B, Chen F, Enas N, et al. Use of historical control data for assessing treatment effects in clinical trials. Pharm Stat. 2014;13(1):41–54.
17. Fouarge E, Monseur A, Boulanger B, Annoussamy M, Seferian AM, De Lucia S, et al. Hierarchical Bayesian modelling of disease progression to inform clinical trial design in centronuclear myopathy. Orphanet J Rare Dis. 2021;16(1):3.
18. Courbier S, Dimond R, Bros-Facer V. Share and protect our health data: an evidence based approach to rare disease patients’ perspectives on data sharing and data protection—quantitative survey and recommendations. Orphanet J Rare Dis. 2019;14(1):175.
19. Scheible R, Rusch S, Guzman D, Mahlaoui N, Ehl S, Kindle G. The NEW ESID online database network. Bioinformatics. 2019;35(24):5367–9.
20. Harris PA, Taylor R, Thielke R, Payne J, Gonzalez N, Conde JG. Research electronic data capture (REDCap)—a metadata-driven methodology and workflow process for providing translational research informatics support. J Biomed Inform. 2009;42(2):377–81.
21. Horvath MM, Winfield S, Evans S, Slopek S, Shang H, Ferranti J. The DEDUCE Guided Query tool: providing simplified access to clinical data for research and quality improvement. J Biomed Inform. 2011;44(2):266–76.
22. Ozyurt IB, Keator DB, Wei D, Fennema-Notestine C, Pease KR, Bockholt J, et al. Federated web-accessible clinical data management within an extensible neuroimaging database. Neuroinformatics. 2010;8(4):231–49.
23. Adamson CL, Wood AG. DFBIdb: a software package for neuroimaging data management. Neuroinformatics. 2010;8(4):273–84.
24. Dinov I, Van Horn J, Lozev K, Magsipoc R, Petrosyan P, Liu Z, et al. Efficient, distributed and interactive neuroimaging data analysis using the LONI pipeline. Front Neuroinform. 2009;3:22.
25. Bockholt HJ, Scully M, Courtney W, Rachakonda S, Scott A, Caprihan A, et al. Mining the mind research network: a novel framework for exploring large scale, heterogeneous translational neuroscience research data sources. Front Neuroinform. 2010;3:36.
26. Gibaud B, Kassel G, Dojat M, Batrancourt B, Michel F, Gaignard A, et al., editors. NeuroLOG: sharing neuroimaging data using an ontology-based federated approach. AMIA Annual Symposium Proceedings; 2011: American Medical Informatics Association.
27. Das S, Zijdenbos AP, Harlap J, Vins D, Evans AC. LORIS: a web-based data management system for multi-center studies. Front Neuroinform. 2011;5:37.
28. Das S, Glatard T, MacIntyre LC, Madjar C, Rogers C, Rousseau ME, et al. The MNI data-sharing and processing ecosystem. Neuroimage. 2016;124(Pt B):1188–95.
29. Das S, Glatard T, Rogers C, Saigle J, Paiva S, MacIntyre L, et al. Cyberinfrastructure for open science at the Montreal Neurological Institute. Front Neuroinform. 2017;10(53):1–13.
30. Das S, Lecours Boucher X, Rogers C, Makowski C, Chouinard-Decorte F, Oros Klein K, et al. Integration of “omics” Data and phenotypic data within a unified extensible multimodal framework. Front Neuroinform. 2018;12(91):1–16.
31. El Emam K, Rodgers S, Malin B. Anonymising and sharing individual patient data. BMJ. 2015;350:h1139.
32. U.S. Food and Drug Administration. Guidance for Industry: Part 11, Electronic Records; Electronic Signatures - Scope and Application. In: U.S. Department of Health and Human Services, editor. fda.gov: U.S. Department of Health and Human Services; 2003. p. 12.
33. U.S. Food and Drug Administration. Rare diseases: natural history studies for drug development guidance for industry. fda.gov: U.S. Department of Health and Human Services; 2019. Available from: https://www.fda.gov/media/122425/download.
34. U.S. Food and Drug Administration. Electronic Code of Federal Regulations [Webpage]. 62 FR 13464: U.S. Department of Health & Human Services; 1997, updated January 14, 2020.
The authors wish to thank all the patients and families for their participation, time, and patience in completing questionnaires. The authors also wish to thank all collaborators and clinicians who referred patients; this research would not be possible without them.
GB is a pediatric neurologist and clinician-scientist leading the Leukodystrophies and Neurometabolic Disorders Clinic at the McGill University Health Centre (MUHC) and MUHC Research Institute.
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Montreal Children’s Foundation (Estate of Daphne Dale Townsend), Fondation Les Amis d’Elliot, Fondation le Tout pour Loo, and Leuco-Action. Dr. Bernard has received the New Investigator Salary Award from the Canadian Institutes of Health Research (2017–2022). Aaron Spahr has received funding from the Desjardins Studentship in Child Health Research through the Research Institute of McGill University Health Centre (2018–2019), the Healthy Brains for Healthy Lives Graduate Student Fellowship (2019–2020), as well as a Graduate Excellence Award from the Integrated Program in Neuroscience at McGill University (2018–2019). None of the funding sources was relevant for study design, collection of data, analysis and interpretation of data, or writing of this manuscript.
Ethics approval and consent to participate
Written and informed consent was obtained from all research participants. This study was approved by the Research Ethics Board of the McGill University Health Centre Research Institute (11-105-PED, 2019-4972).
Consent for publication
All participants have given consent for publication.
Our group, the MyeliNeuroGene Lab, is a collaborator of Dr. Alan C Evans’ research group, the McGill Centre for Integrative Neuroscience, who developed LORIS, a free and open-source web-accessible database solution for multi-modal data and multi-site studies.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Spahr, A., Rosli, Z., Legault, M. et al. The LORIS MyeliNeuroGene rare disease database for natural history studies and clinical trial readiness. Orphanet J Rare Dis 16, 328 (2021). https://doi.org/10.1186/s13023-021-01953-8