Published in Vol 4 (2025)

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/72153.
CLEVER: New evaluation methodology for Large Language Models in Healthcare

Clinical Large Language Model Evaluation by Expert Review (CLEVER): Framework Development and Validation

Journals

  1. Tan C, Gunasekeran D, Low C, Sim G, Foo D, Morris R, Wong T. Regulation of clinical Artificial Intelligence (AI) in the Age of Agents: Unconfined Non-Deterministic Clinical Software (UNDCS) systems for healthcare. npj Digital Medicine 2026;9(1)
  2. Yeh Y, Shih M, De Backer D, Celi L, See K, Fujii T, Ling L, Mongkolpun W, Hu H, Chen H, Chen W, Cholley B, Fong K, Ryu H, Na S, Egi M, Chan W, Chen K, Kamaleswaran R, Chuang Y, Yang C, Hsiao W, Lai S, Ku D, Jahan A, Martin G. The IMPACT framework for evaluating generative AI in critical care: development and multinational consensus validation. Annals of Intensive Care 2026;16:100078
  3. Cheng A, Elkhadrawy A, Setzen S, Li A, Biskaduros A, Kostas J, Rameau A. Evaluating Injection Laryngoplasty Skills Using a Foundation Model: A Feasibility Study. The Laryngoscope 2026