TY  - JOUR
AU  - Odabashian, Roupen
AU  - Bastin, Donald
AU  - Jones, Georden
AU  - Manzoor, Maria
AU  - Tangestaniapour, Sina
AU  - Assad, Malke
AU  - Lakhani, Sunita
AU  - Odabashian, Maritsa
AU  - McGee, Sharon
PY  - 2024
DA  - 2024/1/12
TI  - Assessment of ChatGPT-3.5's Knowledge in Oncology: Comparative Study with ASCO-SEP Benchmarks
JO  - JMIR AI
SP  - e50442
VL  - 3
KW  - artificial intelligence
KW  - ChatGPT-3.5
KW  - language model
KW  - medical oncology
AB  - Background: ChatGPT (OpenAI) is a state-of-the-art large language model that uses artificial intelligence (AI) to address questions across diverse topics. The American Society of Clinical Oncology Self-Evaluation Program (ASCO-SEP) created a comprehensive educational program to help physicians keep up to date with the many rapid advances in the field. The question bank consists of multiple-choice questions addressing the many facets of cancer care, including diagnosis, treatment, and supportive care. As ChatGPT applications rapidly expand, it becomes vital to ascertain whether the knowledge of ChatGPT-3.5 matches the established standards that oncologists are recommended to follow. Objective: This study aims to evaluate whether ChatGPT-3.5's knowledge aligns with the established benchmarks that oncologists are expected to adhere to. This will furnish us with a deeper understanding of the potential applications of this tool as a support for clinical decision-making. Methods: We conducted a systematic assessment of the performance of ChatGPT-3.5 on the ASCO-SEP, the leading educational and assessment tool for medical oncologists in training and practice. Over 1000 multiple-choice questions covering the spectrum of cancer care were extracted. Questions were categorized by cancer type or discipline, with subcategorization as treatment, diagnosis, or other. Answers were scored as correct if ChatGPT-3.5 selected the answer as defined by ASCO-SEP. Results: Overall, ChatGPT-3.5 achieved a score of 56.1% (583/1040) for the correct answers provided. The program demonstrated varying levels of accuracy across cancer types or disciplines. The highest accuracy was observed in questions related to developmental therapeutics (8/10; 80% correct), while the lowest accuracy was observed in questions related to gastrointestinal cancer (102/209; 48.8% correct). There was no significant difference in the program's performance across the predefined subcategories of diagnosis, treatment, and other (P=.16). Conclusions: This study evaluated ChatGPT-3.5's oncology knowledge using the ASCO-SEP, aiming to address uncertainties regarding AI tools like ChatGPT in clinical decision-making. Our findings suggest that while ChatGPT-3.5 offers a promising outlook for AI in oncology, its present performance on the ASCO-SEP requires further refinement to reach the requisite competency levels. Future assessments could explore ChatGPT's clinical decision support capabilities with real-world clinical scenarios, its ease of integration into medical workflows, and its potential to foster interdisciplinary collaboration and patient engagement in health care settings.
SN  - 2817-1705
UR  - https://ai.jmir.org/2024/1/e50442
UR  - https://doi.org/10.2196/50442
DO  - 10.2196/50442
ID  - info:doi/10.2196/50442
ER  - 