%0 Journal Article
%@ 2817-1705
%I JMIR Publications
%V 4
%N
%P e69820
%T Prompt Engineering an Informational Chatbot for Education on Mental Health Using a Multiagent Approach for Enhanced Compliance With Prompt Instructions: Algorithm Development and Validation
%A Waaler,Per Niklas
%A Hussain,Musarrat
%A Molchanov,Igor
%A Bongo,Lars Ailo
%A Elvevåg,Brita
%+ Department of Computer Science, UiT The Arctic University of Norway, Backgatan 35, Södra Sandby, Lund, 24731, Sweden, 46 94444096, pwa011@uit.no
%K schizophrenia
%K mental health
%K prompt engineering
%K AI in health care
%K AI safety
%K self-reflection
%K limiting scope of AI
%K large language model
%K LLM
%K GPT-4
%K AI transparency
%K adaptive learning
%D 2025
%7 26.3.2025
%9 Original Paper
%J JMIR AI
%G English
%X Background: People with schizophrenia often present with cognitive impairments that may hinder their ability to learn about their condition. Education platforms powered by large language models (LLMs) have the potential to improve the accessibility of mental health information. However, the black-box nature of LLMs raises ethical and safety concerns regarding the controllability of chatbots. In particular, prompt-engineered chatbots may drift from their intended role as the conversation progresses and become more prone to hallucinations. Objective: This study aimed to develop and evaluate a critical analysis filter (CAF) system that ensures that an LLM-powered prompt-engineered chatbot reliably complies with its predefined instructions and scope while delivering validated mental health information. Methods: For a proof of concept, we prompt engineered an educational chatbot for schizophrenia powered by GPT-4 that could dynamically access information from a schizophrenia manual written for people with schizophrenia and their caregivers. In the CAF, a team of prompt-engineered LLM agents was used to critically analyze and refine the chatbot’s responses and deliver real-time feedback to the chatbot. To assess the ability of the CAF to re-establish the chatbot’s adherence to its instructions, we generated 3 conversations (by conversing with the chatbot with the CAF disabled) wherein the chatbot started to drift from its instructions toward various unintended roles. We used these checkpoint conversations to initialize automated conversations between the chatbot and adversarial chatbots designed to entice it toward unintended roles. Conversations were repeatedly sampled with the CAF enabled and disabled. In total, 3 human raters independently rated each chatbot response according to criteria developed to measure the chatbot’s integrity, specifically, its transparency (such as admitting when a statement lacked explicit support from its scripted sources) and its tendency to faithfully convey the scripted information in the schizophrenia manual. Results: In total, 36 responses (3 different checkpoint conversations, 3 conversations per checkpoint, and 4 adversarial queries per conversation) were rated for compliance with the CAF enabled and disabled. Activating the CAF resulted in a compliance score that was considered acceptable (≥2) in 81% (29/36) of the responses, compared to only 8.3% (3/36) when the CAF was deactivated. Conclusions: Although more rigorous testing in realistic scenarios is needed, our results suggest that self-reflection mechanisms could enable LLMs to be used effectively and safely in educational mental health platforms. This approach harnesses the flexibility of LLMs while reliably constraining their scope to appropriate and accurate interactions.
%M 39992720
%R 10.2196/69820
%U https://ai.jmir.org/2025/1/e69820
%U https://doi.org/10.2196/69820
%U http://www.ncbi.nlm.nih.gov/pubmed/39992720