Search Results (1 to 10 of 41)

Automated Radiology Report Labeling in Chest X-Ray Pathologies: Development and Evaluation of a Large Language Model Framework

Our generative pretrained transformer (GPT)-based LLM outperforms the BERT-based CheXbert model on many pathologies and, with a far larger context length, can handle longer reports than CheXbert. Our model also outperforms previous labelers [8] on many pathologies on an external dataset, MIMIC-CXR [9]. Our method of training medical report labelers accommodates additional labels and longer textual input, making it broadly useful for natural language processing tasks within the medical domain.

Abdullah Abdullah, Seong Tae Kim

JMIR Med Inform 2025;13:e68618

Peer Review of “Large Language Models for Pediatric Differential Diagnoses in Rural Health Care: Multicenter Retrospective Cohort Study Comparing GPT-3 With Pediatrician Performance”

This is the peer-review report for “Large Language Models for Pediatric Differential Diagnoses in Rural Health Care: Multicenter Retrospective Cohort Study Comparing GPT-3 With Pediatrician Performance.” This review is the result of a virtual collaborative live review organized and hosted by PREreview and JMIR Publications on October 25, 2024.

Daniela Saderi, Goktug Bender, Toba Olatoye, Arya Rahgozar, Uday Kumar Chalwadi, Eudora Nwanaforo, Paul Hassan Ilegbusi, Sylvester Sakilay, Mitchell Collier

JMIRx Med 2025;6:e73264

Authors’ Response to Peer Reviews of “Large Language Models for Pediatric Differential Diagnoses in Rural Health Care: Multicenter Retrospective Cohort Study Comparing GPT-3 With Pediatrician Performance”

This is the authors’ response to peer-review reports for “Large Language Models for Pediatric Differential Diagnoses in Rural Health Care: Multicenter Retrospective Cohort Study Comparing GPT-3 With Pediatrician Performance.” We thank the reviewers [1] for the thoughtful and constructive feedback on our manuscript, “Large Language Models for Pediatric Differential Diagnoses in Rural Health Care: Multicenter Retrospective Cohort Study Comparing GPT-3 With Pediatrician Performance” [2].

Masab Mansoor, Andrew F Ibrahim, David Grindem, Asad Baig

JMIRx Med 2025;6:e73258

Large Language Models for Pediatric Differential Diagnoses in Rural Health Care: Multicenter Retrospective Cohort Study Comparing GPT-3 With Pediatrician Performance

Medical terms were standardized using a medical dictionary, and data were formatted for compatibility with the GPT-3 model. The GPT-3 model (Da Vinci version) was fine-tuned using the OpenAI application programming interface. The dataset was randomly split into a training set (n=350, 70%) and a testing set (n=150, 30%). The model was trained to generate up to five differential diagnoses for each input case.

Masab Mansoor, Andrew F Ibrahim, David Grindem, Asad Baig

JMIRx Med 2025;6:e65263

Creation of Scientific Response Documents for Addressing Product Medical Information Inquiries: Mixed Method Approach Using Artificial Intelligence

More recently, LLMs, such as OpenAI’s Generative Pre-trained Transformer (GPT), are complex machine learning models trained to predict subsequent words in natural language text based on the text so far. This allows the machine to generate statistically plausible output given a “prompt.” Beyond simple prompt completion, such models can be trained to follow instructions in the prompt, such as “Summarize the following paragraph.”

Jerry Lau, Shivani Bisht, Robert Horton, Annamaria Crisan, John Jones, Sandeep Gantotti, Evelyn Hermes-DeSantis

JMIR AI 2025;4:e55277

Large Language Model–Based Critical Care Big Data Deployment and Extraction: Descriptive Analysis

ICU-GPT does not introduce new LLMs but utilizes and is compatible with all OpenAI application programming interface (API) models. We chose OpenAI API-compatible models for ICU-GPT development for several reasons. First, our research team is more familiar with OpenAI. In addition, a large number of models are compatible with the OpenAI API, including GPT-3.5, GPT-4o, and GPT-4o mini, offering wide applicability and cost-effectiveness.

Zhongbao Yang, Shan-Shan Xu, Xiaozhu Liu, Ningyuan Xu, Yuqing Chen, Shuya Wang, Ming-Yue Miao, Mengxue Hou, Shuai Liu, Yi-Min Zhou, Jian-Xin Zhou, Linlin Zhang

JMIR Med Inform 2025;13:e63216

GPT-3.5 Turbo and GPT-4 Turbo in Title and Abstract Screening for Systematic Reviews

This study aimed to compare accuracy and efficiency between GPT-3.5 Turbo and GPT-4 Turbo (OpenAI)—widely used LLMs in the medical field—in title and abstract screening. We conducted a post hoc analysis of our previous study to evaluate the performance of GPT-3.5 Turbo and GPT-4 Turbo in LLM-assisted title and abstract screening, using data from 5 clinical questions (CQs) developed for the Japanese Clinical Practice Guidelines for Management of Sepsis and Septic Shock 2024 [6,10].

Takehiko Oami, Yohei Okada, Taka-aki Nakada

JMIR Med Inform 2025;13:e64682

ChatGPT’s Performance on Portuguese Medical Examination Questions: Comparative Analysis of ChatGPT-3.5 Turbo and ChatGPT-4o Mini

In July 2024, OpenAI launched GPT-4o mini, a smaller version of its latest GPT-4o (“o” for “omni”) AI language model. This new model replaced GPT-3.5 Turbo in ChatGPT, making this an ideal time to study the performance of both free models in resolving written medical examinations.

Filipe Prazeres

JMIR Med Educ 2025;11:e65108

Large Language Models’ Accuracy in Emulating Human Experts’ Evaluation of Public Sentiments about Heated Tobacco Products on Social Media: Evaluation Study

These aspects contribute to the attractiveness of GPT as an analytic tool for tobacco researchers, especially those with limited budgets and resources. This study examines the accuracy of GPT-3.5 and GPT-4 Turbo in emulating human sentiment evaluations of social media messages related to HTPs.

Kwanho Kim, Soojong Kim

J Med Internet Res 2025;27:e63631

Large Language Models–Supported Thrombectomy Decision-Making in Acute Ischemic Stroke Based on Radiology Reports: Feasibility Qualitative Study

One of these autoregressive LLMs is GPT-3, used within the pretrained chatbot application ChatGPT, developed by OpenAI [25]. GPT-3 is a third-generation deep learning model with 175 billion parameters that has been trained on a massive amount of text data, allowing it to generate humanlike, logically and semantically coherent responses to text-based questions and input information [26,27].

Jonathan Kottlors, Robert Hahnfeldt, Lukas Görtz, Andra-Iza Iuga, Philipp Fervers, Johannes Bremm, David Zopfs, Kai R Laukamp, Oezguer A Onur, Simon Lennartz, Michael Schönfeld, David Maintz, Christoph Kabbasch, Thorsten Persigehl, Marc Schlamann

J Med Internet Res 2025;27:e48328