%0 Journal Article
%@ 2817-1705
%I JMIR Publications
%V 4
%N 
%P e70222
%T Using AI to Translate and Simplify Spanish Orthopedic Medical Text: Instrument Validation Study
%A Andalib,Saman
%A Spina,Aidin
%A Picton,Bryce
%A Solomon,Sean S
%A Scolaro,John A
%A Nelson,Ariana M
%K large language models
%K LLM
%K patient education
%K translation
%K bilingual evaluation understudy
%K GPT-4
%K Google Translate
%D 2025
%7 21.3.2025
%9 
%J JMIR AI
%G English
%X Background: Language barriers contribute significantly to health care disparities in the United States, where a sizable proportion of patients are exclusively Spanish speakers. In orthopedic surgery, such barriers impact both patients’ comprehension of and patients’ engagement with available resources. Studies have explored the utility of large language models (LLMs) for medical translation but have yet to robustly evaluate artificial intelligence (AI)–driven translation and simplification of orthopedic materials for Spanish speakers. Objective: This study used the bilingual evaluation understudy (BLEU) method to assess translation quality and investigated the ability of AI to simplify patient education materials (PEMs) in Spanish. Methods: PEMs (n=78) from the American Academy of Orthopaedic Surgery were translated from English to Spanish, using 2 LLMs (GPT-4 and Google Translate). The BLEU methodology was applied to compare AI translations with professionally human-translated PEMs. The Friedman test and Dunn multiple comparisons test were used to statistically quantify differences in translation quality. A readability analysis and feature analysis were subsequently performed to evaluate text simplification success and the impact of English text features on BLEU scores. The capability of an LLM to simplify medical language written in Spanish was also assessed. Results: As measured by BLEU scores, GPT-4 showed moderate success in translating PEMs into Spanish but was less successful than Google Translate. Simplified PEMs demonstrated improved readability when compared to original versions (P<.001) but were unable to reach the targeted grade level for simplification. The feature analysis revealed that the total number of syllables and average number of syllables per sentence had the highest impact on BLEU scores. GPT-4 was able to significantly reduce the complexity of medical text written in Spanish (P<.001). Conclusions: Although Google Translate outperformed GPT-4 in translation accuracy, LLMs, such as GPT-4, may provide significant utility in translating medical texts into Spanish and simplifying such texts. We recommend considering a dual approach—using Google Translate for translation and GPT-4 for simplification—to improve medical information accessibility and orthopedic surgery education among Spanish-speaking patients. 
%R 10.2196/70222
%U https://ai.jmir.org/2025/1/e70222
%U https://doi.org/10.2196/70222