JMIR AI
A new peer-reviewed journal focused on research and applications for the health artificial intelligence (AI) community.
Editors-in-Chief:
Khaled El Emam, PhD, Canada Research Chair in Medical AI, University of Ottawa; Senior Scientist, Children's Hospital of Eastern Ontario Research Institute; Professor, School of Epidemiology and Public Health, University of Ottawa, Canada
Bradley Malin, PhD, Accenture Professor of Biomedical Informatics, Biostatistics, and Computer Science; Vice Chair for Research Affairs, Department of Biomedical Informatics; Affiliated Faculty, Center for Biomedical Ethics & Society, Vanderbilt University Medical Center, Nashville, Tennessee, USA
Impact Factor: 2.0 | CiteScore: 2.5
Recent Articles

Objective structured clinical examinations (OSCEs) are widely used for assessing medical student competency, but their evaluation is resource-intensive, requiring trained evaluators to review 15-minute videos. The physical examination (PE) component typically constitutes only a small portion of these recordings, yet current automated approaches struggle to process long medical videos due to computational constraints and difficulty maintaining temporal context.

Although Large Language Models (LLMs) show great promise in processing medical text, they are prone to generating incorrect information, commonly referred to as hallucinations. These inaccuracies present a significant risk for clinical applications where precision is critical. Additionally, relying on human experts to review LLM-generated content to ensure accuracy is costly and time-consuming, which poses a barrier to large-scale deployment of LLMs in healthcare settings.

Mental disorders are frequently evaluated using questionnaires, which have been developed over the past decades for the assessment of different conditions. Despite the rigorous validation of these tools, high levels of content divergence have been reported for questionnaires measuring the same construct of psychopathology. Previous studies that examined content overlap required manual symptom labeling, which is observer-dependent and time-consuming.

Systematic Literature Reviews (SLRs) build the foundation for evidence synthesis, but they are exceptionally demanding in terms of time and resources. While recent advances in Artificial Intelligence (AI), particularly Large Language Models (LLMs), offer the potential to accelerate this process, their use introduces challenges to transparency and reproducibility. Emerging reporting guidelines such as PRISMA-AI primarily focus on AI as a subject of research, not as a tool in the review process itself.

Neglected tropical diseases (NTDs) are among the most prevalent diseases and comprise 21 different conditions. One-half of these conditions have skin manifestations, known as skin NTDs. The diagnosis of skin NTDs incorporates visual examination of patients, and deep learning (DL)–based diagnostic tools can be used to assist the diagnostic process. Advanced DL-based methods, including multimodal data fusion (MMDF), could enhance the diagnosis of these diseases. However, little has been done toward the application of such tools, as evidenced by the very few available studies that have implemented MMDF for skin NTDs.

Artificial intelligence (AI) is revolutionizing digital health, driving innovation in care delivery and operational efficiency. Despite its potential, many AI systems fail to meet real-world expectations due to limited evaluation practices that focus narrowly on short-term metrics such as efficiency and technical accuracy. Ignoring factors such as usability, trust, transparency, and adaptability hinders AI adoption, scalability, and long-term impact in health care. This paper emphasizes the importance of embedding scientific evaluation as a core operational layer throughout the AI lifecycle. We outline practical guidelines for digital health companies to improve AI integration and evaluation, informed by over 35 years of experience in science, the digital health industry, and AI development. We describe a multi-step approach, including stakeholder analysis, real-time monitoring, and iterative improvement, that digital health companies can adopt to ensure robust AI integration. Key recommendations include assessing stakeholder needs, designing AI systems that can check their own work, conducting testing to address usability and biases, and ensuring continuous improvement to keep systems user-centered and adaptable. By integrating these guidelines, digital health companies can improve AI reliability, scalability, and trustworthiness, driving better health care delivery and stakeholder alignment.

The proliferation of both general-purpose and healthcare-specific Large Language Models (LLMs) has intensified the challenge of effectively evaluating and comparing them. Data contamination undermines the validity of public benchmarks; self-preference distorts LLM-as-a-judge approaches; and there is a gap between the tasks used to test models and those encountered in clinical practice.