Does the registry speak your language? A case study of the Global Angelman Syndrome Registry

Global disease registries are critical to capturing common patient-related information on rare illnesses. They allow patients and their families to provide information about their condition in a safe, accessible, and engaging manner, which enables researchers to undertake critical research aimed at improving outcomes. Typically, English is the default language for these global digital health platforms. Unfortunately, language barriers can significantly inhibit participation by non-English speaking participants and can compromise data quality and completeness. In contrast, multinational commercial entities provide access to their websites in the local language of each country in which they operate, and often provide multiple options reflecting ethnic diversity. This paper presents a case study of how the Global Angelman Syndrome Registry (GASR) has used a novel approach to enable multiple language translations for its website. Using a “semi-automated language translation” approach, the GASR, which was originally launched in English in September 2016, is now available in several other languages. In 2020, the GASR adopted a novel approach using crowdsourcing and machine translation tools, leading to the availability of the GASR in Spanish, Traditional Chinese, Italian, and Hindi. As a result, enrolments increased by 124% for Spain, 67% for Latin America, 46% for Asia, 24% for Italy, and 43% for India. We describe our approach here, which we believe presents an opportunity for cost-effective and timely translations that are responsive to changes to the registry and that help build and maintain engagement with global disease communities.


Background
Angelman syndrome (AS) is a severe neurodevelopmental disorder caused by dysfunction of the maternally inherited UBE3A gene. It is estimated that 500,000 people live with AS worldwide [1]. Global rare disease registries are a valuable tool for enhancing therapeutics in rare diseases, enabling participant recruitment and the capture and monitoring of patient-reported outcomes, amongst other uses. Despite the need to be global, there is a lack of diversity in the languages available on global registry websites. This is, unfortunately, common in medicine and science, where the fact that the common scientific language is English has spilled over into an apparent insistence that research participation from non-English speaking countries must take place in English. In contrast, no multinational commercial entity would survive if it took this approach and, as a result, such entities make their websites available in myriad local languages. For example, the website for the global movement Rare Disease Day is available in 103 languages besides English [2], while the European Commission websites strive to be available in all 24 recognised European languages [3]. Registry development guidelines, including the fourth edition of the guide Registries for Evaluating Patient Outcomes released by the Agency for Healthcare Research and Quality [4] and the Rare Diseases Registry Program [5], stress the importance of careful translation of multinational registries. Other international registries, such as the Hyperinsulinism Global Registry and the Global Prader Willi Registry, intend to incorporate multiple languages [6,7]. Another strategy is to establish a federation of linked registries for a rare disease to achieve global coverage of patients [8]. Towards this end, the Global Angelman Syndrome Registry is open to data linkages with other data sets, including Natural History Studies and registries such as the Angelman Syndrome Online Registry [9]. Barriers to making services multilingual include lack of access to translators and the need for technical expertise or an understanding of the topic of the registry. Tools such as Google Translate have demonstrated that technology can be used to assist with this, although native speaker input is still required to ensure the accuracy and readability of translations.
Findings from a review of articles on methodological approaches to the cross-cultural adaptation of surveys and tools indicated that translators should be fluent in the source and target languages, understand both cultures, and be knowledgeable about the content of the instrument being adapted [10]. Addressing each of these requirements may be challenging, as professional translators may not be subject matter experts and may lack specialised content knowledge. Involving more than one translator in the process may be beneficial, offering a mix of perspectives with respect to language fluency, cultural understanding, and content knowledge. However, this may prove difficult because challenges around document sharing, version control, and division of workload can inhibit translator interactions. Additionally, reconciliation and review of translations by an expert panel, and cognitive interviews or pilot testing with focus groups, should be undertaken to determine the face and content validity of translated instruments [10]. There is limited evidence for the value of back translations [10].

Use of technology to facilitate translations
Machine translation (MT) involves using software tools to translate text or speech from the source language to the target language [11]. The process is automated and may involve different approaches, including rules created by linguists and computer scientists, examples from a database of source and target language sentences, and statistical modelling of the probability that a target sentence is the correct translation of a source sentence [12].
Translation memories (TM) are a related technology in which previously completed human translations, including the source text and translated text, are stored in a database, and segments of new source text, such as a sentence, are matched against the TM database to create translations [13]. Matches may be exact (identical, including formatting); full (differing only in elements such as numbers or dates); or fuzzy (similar, but requiring editing) [13].
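The three match types can be illustrated with a minimal Python sketch. This is not how Crowdin or any particular TM product implements matching; the in-memory dictionary, the Spanish example strings, the `classify_match` helper and the 0.75 similarity threshold are all assumptions made for illustration, and the standard-library `difflib.SequenceMatcher` stands in for a real fuzzy-matching algorithm.

```python
import re
from difflib import SequenceMatcher

# Hypothetical translation memory: source segment -> stored human translation.
# These strings are illustrative and are not taken from the registry.
TRANSLATION_MEMORY = {
    "What is the date of birth?": "¿Cuál es la fecha de nacimiento?",
    "Seizures started at 3 years of age.": "Las convulsiones comenzaron a los 3 años de edad.",
}

FUZZY_THRESHOLD = 0.75  # assumed cut-off for a usable fuzzy match


def mask_numbers(text):
    """Replace digit runs with a placeholder so numbers do not affect matching."""
    return re.sub(r"\d+", "<num>", text)


def classify_match(new_segment, memory=TRANSLATION_MEMORY):
    """Return (match_type, stored_source, stored_translation) for the best TM hit."""
    # Exact match: the new segment is identical to a stored source segment.
    if new_segment in memory:
        return "exact", new_segment, memory[new_segment]

    masked_new = mask_numbers(new_segment)
    best_ratio, best_source = 0.0, None
    for source in memory:
        # Full match: identical once numbers are masked; the stored translation
        # still needs its numbers updated by a person or a later step.
        if mask_numbers(source) == masked_new:
            return "full", source, memory[source]
        ratio = SequenceMatcher(None, new_segment.lower(), source.lower()).ratio()
        if ratio > best_ratio:
            best_ratio, best_source = ratio, source

    # Fuzzy match: similar enough to be a useful starting point for editing.
    if best_source is not None and best_ratio >= FUZZY_THRESHOLD:
        return "fuzzy", best_source, memory[best_source]
    return "none", None, None
```

Calling `classify_match("Seizures started at 5 years of age.")` returns a full match against the stored segment about seizures, since only the number differs, while a reworded question such as "What is their date of birth?" is returned as a fuzzy match that a proofreader would need to edit.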
Crowdsourcing refers to an organisation (such as a research institution or not-for-profit) outsourcing a task previously undertaken internally to an external community to complete a task or solve a problem for mutual benefit [14]. In research, crowdsourcing has been used for a range of tasks including identification and classification, transcription or translation, and data collection and analysis [15]. Organisations including Cochrane and Technology, Entertainment and Design (TED) talks involve volunteer translators to translate resources in recognition of the fact that most people globally do not speak English as a first language [16].
Rare disease registries may present a unique opportunity for crowdsourcing translations, as rare disease communities often drive the development of registries and have strong involvement in registry governance and ownership. While crowdsourcing may seem advantageous in this context, projects must be managed effectively to prevent negative outcomes such as translator or researcher burnout or malicious translations. Blohm et al. [17] reviewed the management and governance of a variety of crowdsourcing projects and determined a four-step process for running a crowdsourcing project: (1) Define Goal and System Type; (2) Start Small and Experiment; (3) Build up Scalable Structures; and (4) Adapt and Monitor Governance (p. 143).

Current study
The Global Angelman Syndrome Registry was launched in English in September 2016 [18]. The registry was sponsored by the Foundation for Angelman Syndrome Therapeutics (FAST) Australia. The aims of the registry are to: (1) facilitate participant recruitment for clinical trials; (2) collect the natural history of a large cohort of individuals with AS; (3) identify demographic, phenotypic and genotypic variation in clinical features and outcomes; and (4) aid in service provision planning for individuals with AS and their families.
Since its launch, AS organisations have expressed an interest in translation into multiple languages. Initially, the translation process was to include three steps: (1) forward translation; (2) back translation; and (3) pilot testing. The forward translation process was completed for Italian, Spanish, French and Hebrew, and partially completed for Portuguese and Chinese.
In 2020, the registry was moved to the Trial Ready Registry Framework (TRRF) [30, 32, 34–39]. The new platform incorporated significant revisions based on feedback from families, clinicians and researchers. Changes included revisions to both content and functionality to simplify the user experience in completing forms, enable longitudinal data collection and user-managed linkages with clinicians and researchers, and integrate translations and analytics.
Because of the changes to the registry content and function, and the likely need to update existing translations and add more languages, we explored alternative translation methodologies that were less burdensome on the community and research team. The current study reports on the establishment of the GASR translation project.

Method
To facilitate timely translation of the GASR, Crowdin [40] was selected as a tool to integrate existing translations (converted to TM) and MT provided by the Crowdin software with crowdsourced translations from the Angelman community, and to manage the translation process.

Registry description
The GASR features a series of forms, or modules, that collect information on the condition of an individual with AS. The forms cover demographic, clinical, behavioural and developmental information. The content of the GASR modules and other patient-facing information, including the registration form, standard emails, and website messages, constituted the information to be translated. There were approximately 13 000 words on the GASR across the 20 sections to be translated.
Prior to the availability of the translations, 1625 families had joined the registry, 1614 of whom had provided geographic data. As shown in Table 1, most families were from English speaking countries.

Governance and management of crowdsourcing
A Crowdin Enterprises project (https://eresearchqut.crowdin.com/), hereafter referred to as Crowdin, was established to manage the project. Crowdin allows for different levels of access to ensure that participants only have access to tasks to which they are assigned by the project administrator. As the registry was pre-translated, the volunteers were assigned one of two roles: community proofreader or final proofreader. Community proofreaders review and modify existing machine translations, while final proofreaders are trusted professionals from the AS community who review and correct community proofread translations. Proofreaders access the registry content via a testing site located at https://trrf.qa.angelmanregistry.info/. The website enables users to view the translations in the context of the website and access the Crowdin editor tool for each string. The editor tool shows the source text and several translations obtained from translation memory files or machine translations. The proofreader can revise, add or approve translations, and leave comments for other project team members.
Community proofreaders participated in a small group training session with one of the authors (M.T.), who explained the workflow of the project and how to use the Crowdin tool. For the purposes of scalability, one language was selected for pilot testing and refining the translation process. To date, training sessions have been held with community proofreaders for the Italian, Spanish, Chinese, Portuguese and French languages. Hindi was subsequently translated by an external company capable of interfacing with Crowdin.
The author running the training sessions checked in with a nominated translator from each group weekly to receive updates about progress and obtain feedback.

Research ethics
The registry team submitted ethics amendments to incorporate translations into the protocol. Approval for the translation of registry materials using the methods described above was granted by the Mater Health Services Human Research Ethics Committee (HREC/13MHS/76/ Project 20,865).

Results
The Spanish, Traditional Chinese, Italian and Hindi versions of the registry were launched in 2022 on 6 January, 22 March, 27 April and 13 October, respectively. Growth in registry enrolments after translation for each country or region where the language is spoken is shown in Fig. 1, along with current totals.

Observations and feedback from the translation process
The Crowdin tool was user friendly. Proofreaders used a mix of the Crowdin editor tool and the in-context editing tool. The editor tool displayed the source text and available translations side by side, enabling users to correct existing translations or add new translations. The Crowdin editor tool was the primary method used for proofreading, as all strings were displayed in the interface, ensuring comprehensive proofreading. The in-context editing tool was implemented on a test website that replicated the GASR site, which was valuable for viewing how the translations would appear to users in the context of the registry. A list of examples of source text, machine translations, and community and professional proofreading corrections is shown in Table 2.

A group of at least three proofreaders was advantageous
Larger groups reduced the workload for individual proofreaders and enabled more comprehensive review of translations prior to the final proofreading step. For instance, the Spanish proofreading group identified that the word for "boy" and "child" was the same and could thus lead to Spanish speaking families reading sections of the registry as "boy/adult" rather than "child/adult". The author in communication with the translation team (MT) was able to put them in contact with another author (RB) whose first language was Spanish.

Preliminary validation findings are encouraging
Although validation is a separate step beyond translation, the authors compared 107 English and 55 Spanish responses to the Newborn and Infancy module completed since the translations were implemented. This module was selected because it is the first module users encounter in the registry, it had the highest completion rate, and responses were thought to be less affected by the age and genotype of the person with Angelman syndrome. Responses to Likert scale items from the Newborn and Infancy module are shown in Table 3. A series of 25 chi-square tests was conducted, with a Bonferroni adjustment giving an adjusted alpha level of 0.002 (0.05/25). Of the 25 questions, only two demonstrated significant differences between the English and Spanish samples, reflecting that:
• Spanish speaking parents perceived their infant with Angelman syndrome to be placid more frequently than English speaking parents.
• English speaking parents perceived their infant with Angelman syndrome to experience more frequent reflux/gastro/oesophageal problems than Spanish speaking parents.
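The comparison described above can be sketched in plain Python. This is a minimal illustration rather than the analysis code used in the study: the response counts are hypothetical (chosen only so that the row totals equal the 107 English and 55 Spanish responses), the function names are ours, and the p-value uses a closed form that holds only for even degrees of freedom, which covers the assumed case of a five-point Likert item compared across two groups (df = 4).

```python
import math


def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after a Bonferroni adjustment."""
    return alpha / n_tests


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows of observed counts; every expected count
    must be greater than zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


def chi_square_p_value(stat, df):
    """Upper-tail p-value of the chi-square distribution (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form implemented for positive even df only")
    half = stat / 2
    series = sum(half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * series


# Hypothetical counts for one five-point Likert item (not data from the study).
english = [30, 25, 20, 17, 15]  # 107 responses
spanish = [12, 14, 11, 10, 8]   # 55 responses

alpha = bonferroni_alpha(0.05, 25)  # 25 items tested
stat, df = chi_square_statistic([english, spanish])
p_value = chi_square_p_value(stat, df)
print(f"chi-square({df}) = {stat:.3f}, p = {p_value:.3f}, significant: {p_value < alpha}")
```

With 25 tests, an item is flagged as different between language groups only when its p-value falls below 0.05/25 = 0.002, which is why only the largest discrepancies are reported as significant.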

Discussion
The GASR was translated into Spanish, Traditional Chinese, Italian and Hindi. After completion of the community and final proofreading steps, acceptable translations were obtained and made available to the Angelman community. During our experience of managing and governing the translation and proofreading project, we refined the process to reduce the burden on the research team and our proofreaders for future translations utilising crowdsourcing [17]. These refinements relate to the process of proofreading and the management of translation projects. With respect to establishing translation projects for future languages, our first step is to source machine translations to create the initial language translation on Crowdin. The second step is to break the registry content into individual tasks based on word count and create a document with (1) links to the Crowdin editor for each task, (2) task name and description, and (3) task word count. The third step is to train proofreaders in completing tasks in the editing tool and in accessing the in-context tool for reviewing content within the website. The translation projects would continue to be managed by the author (MT). Further to this, greater efforts would be made to validate the translations generated. An initial validation of the Newborn and Infancy module of the registry was promising, with few differences between English and Spanish responses.
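The second step, splitting registry content into proofreading tasks by word count, could be sketched as follows. This is an assumed illustration only: the string identifiers, the `max_words` limit and the `split_into_tasks` helper are hypothetical, the word count is a simple whitespace split, and the link to the Crowdin editor for each task is left to be filled in by the project administrator.

```python
def split_into_tasks(strings, max_words):
    """Group (string_id, source_text) pairs, in order, into tasks of at most
    `max_words` words. A single string longer than the limit becomes its own task.
    """
    tasks, current_ids, current_words = [], [], 0

    def close_task():
        tasks.append({
            "name": f"Task {len(tasks) + 1}",
            "string_ids": list(current_ids),
            "word_count": current_words,
            "editor_link": None,  # to be filled in by the project administrator
        })

    for string_id, text in strings:
        n_words = len(text.split())
        # Start a new task when adding this string would exceed the limit.
        if current_ids and current_words + n_words > max_words:
            close_task()
            current_ids, current_words = [], 0
        current_ids.append(string_id)
        current_words += n_words

    if current_ids:
        close_task()
    return tasks


# Hypothetical registry strings, for illustration only.
example_strings = [
    ("newborn.q1", "Was your child born at full term?"),
    ("newborn.q2", "Did your child have feeding difficulties?"),
    ("newborn.q3", "On a typical day, how many hours do they sleep?"),
]

for task in split_into_tasks(example_strings, max_words=15):
    print(task["name"], task["string_ids"], task["word_count"])
```

Keeping strings in their original order means each task corresponds to a contiguous part of a registry form, which may make it easier for a proofreader to review the same content in the in-context tool.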

Table 2 Example of source text with machine translations, and community and professional proofreading corrections
Note: A professional medical translation company was hired to complete the Hindi translation; therefore, the community translation column is blank for this language.
Italian, source text: On a typical day, how many hours do they spend using the device for AAC purposes?

Potential limitations to the crowdsourcing approach
There were two possible limitations identified in the current study. These limitations relate to the translations, but may also be relevant to validation testing.

Participants were time poor
In some cases, participants were unavailable to complete translation tasks due to competing priorities and responsibilities. Future strategies to assist families may include recruiting a larger number of proofreaders, facilitating support and connection between proofreaders, ensuring that larger proofreading tasks are broken down into smaller chunks, and providing incentives such as a donation to their local Angelman organisation.

Community proofreaders were difficult to source for some languages
As participation in the registry was very low for some regions, such as Asia, Africa and the Middle East, it was difficult to source proofreaders for languages spoken in these regions, such as Arabic or Hindi. As a result, for the Hindi language the team opted for professional translation via vendors who can integrate with Crowdin, with crowdsourcing reserved for registry revisions once families have become more engaged.

Conclusion
Crowdsourcing was an effective tool for upgrading translations, facilitating proofreading and integrating translated versions of the Global Angelman Syndrome Registry on an online platform. The availability of translations has led to greater participation and engagement of Angelman populations from regions where Spanish, Italian, Traditional Chinese and Hindi are spoken. The use of crowdsourcing via online translation software such as Crowdin helps to manage ongoing translation and proofreading needs for research projects and maintain community participation and buy-in. However, further efforts are needed beyond translation to validate the translation of the registry for different communities.

Fig. 1 Growth in registry participation pre and post translations

Table 1 Country/region of residence of families in the Angelman Registry prior to translation availability

Table 3 Comparison of English and Spanish language responses to Newborn and Infancy module items