TY  - JOUR
AU  - Ejaz, Hamza
AU  - Tsui, Keith Hon Lung
AU  - Patel, Mehul
AU  - Ulloa Paredes, Rafael Luis
AU  - Knights, Ellen
AU  - Aftab, Bakht Shah
AU  - Subbe, Peter Christian
PY  - 2025/2/25
TI  - Comparison of a Novel Machine Learning–Based Clinical Query Platform With Traditional Guideline Searches for Hospital Emergencies: Prospective Pilot Study of User Experience and Time Efficiency
JO  - JMIR Hum Factors
SP  - e52358
VL  - 12
KW  - artificial intelligence
KW  - machine learning
KW  - information search
KW  - emergency care
KW  - developing
KW  - testing
KW  - information retrieval
KW  - hospital care
KW  - training
KW  - clinical practice
KW  - clinical experience
KW  - user satisfaction
KW  - clinical impact
KW  - user group
KW  - users
KW  - study design
KW  - mobile phone
N2  - Background: Emergency and acute medicine doctors require easily accessible evidence-based information to safely manage a wide range of clinical presentations. The inability to find evidence-based local guidelines on the trust's intranet leads to information retrieval from the World Wide Web. Artificial intelligence (AI) has the potential to make evidence-based information retrieval faster and easier. Objective: The aim of the study is to conduct a time-motion analysis, comparing cohorts of junior doctors using (1) an AI-supported search engine versus (2) the traditional hospital intranet. The study also aims to examine the impact of the AI-supported search engine on the duration of searches and workflow when seeking answers to clinical queries at the point of care. Methods: This pre- and postobservational study was conducted in 2 phases. In the first phase, clinical information searches by 10 doctors caring for acutely unwell patients in acute medicine were observed during 10 working days. Based on these findings and input from a focus group of 14 clinicians, an AI-supported, context-sensitive search engine was implemented. In the second phase, clinical practice was observed for 10 doctors for an additional 10 working days using the new search engine. Results: The hospital intranet group (n=10) had a median of 23 months of clinical experience, while the AI-supported search engine group (n=10) had a median of 54 months. Participants using the AI-supported engine conducted fewer searches. User satisfaction and query resolution rates were similar between the 2 phases. Searches with the AI-supported engine took 43 seconds longer on average. Clinicians rated the new app with a favorable Net Promoter Score of 20. Conclusions: We report a successful feasibility pilot of an AI-driven search engine for clinical guidelines. Further development of the engine including the incorporation of large language models might improve accuracy and speed. More research is required to establish clinical impact in different user groups. Focusing on new staff at beginning of their post might be the most suitable study design.
UR  - https://humanfactors.jmir.org/2025/1/e52358
UR  - http://dx.doi.org/10.2196/52358
ID  - info:doi/10.2196/52358
ER  - 

TY  - JOUR
AU  - Paek, Hunki
AU  - Fortinsky, H. Richard
AU  - Lee, Kyeryoung
AU  - Huang, Liang-Chin
AU  - Maghaydah, S. Yazeed
AU  - Kuchel, A. George
AU  - Wang, Xiaoyan
PY  - 2025/2/25
TI  - Real-World Insights Into Dementia Diagnosis Trajectory and Clinical Practice Patterns Unveiled by Natural Language Processing: Development and Usability Study
JO  - JMIR Aging
SP  - e65221
VL  - 8
KW  - dementia
KW  - memory loss
KW  - memory
KW  - cognitive
KW  - Alzheimer disease
KW  - natural language processing
KW  - NLP
KW  - deep learning
KW  - machine learning
KW  - real-world insights
KW  - electronic health records
KW  - EHR
KW  - cohort
KW  - diagnosis
KW  - diagnostic
KW  - trajectory
KW  - pattern
KW  - prognosis
KW  - geriatric
KW  - older adults
KW  - aging
N2  - Background: Understanding the dementia disease trajectory and clinical practice patterns in outpatient settings is vital for effective management. Knowledge about the path from initial memory loss complaints to dementia diagnosis remains limited. Objective: This study aims to (1) determine the time intervals between initial memory loss complaints and dementia diagnosis in outpatient care, (2) assess the proportion of patients receiving cognition-enhancing medication prior to dementia diagnosis, and (3) identify patient and provider characteristics that influence the time between memory complaints and diagnosis and the prescription of cognition-enhancing medication. Methods: This retrospective cohort study used a large outpatient electronic health record (EHR) database from the University of Connecticut Health Center, covering 2010–2018, with a cohort of 581 outpatients. We used a customized deep learning–based natural language processing (NLP) pipeline to extract clinical information from EHR data, focusing on cognition-related symptoms, primary caregiver relation, and medication usage. We applied descriptive statistics, linear, and logistic regression for analysis. Results: The NLP pipeline showed precision, recall, and F1-scores of 0.97, 0.93, and 0.95, respectively. The median time from the first memory loss complaint to dementia diagnosis was 342 (IQR 200-675) days. Factors such as the location of initial complaints and diagnosis and primary caregiver relationships significantly affected this interval. Around 25.1% (146/581) of patients were prescribed cognition-enhancing medication before diagnosis, with the number of complaints influencing medication usage. Conclusions: Our NLP-guided analysis provided insights into the clinical pathways from memory complaints to dementia diagnosis and medication practices, which can enhance patient care and decision-making in outpatient settings.
UR  - https://aging.jmir.org/2025/1/e65221
UR  - http://dx.doi.org/10.2196/65221
ID  - info:doi/10.2196/65221
ER  - 

TY  - JOUR
AU  - Gong, Ke
AU  - Chen, Yifan
AU  - Song, Xinyue
AU  - Fu, Zhizhong
AU  - Ding, Xiaorong
PY  - 2025/1/23
TI  - Causal Inference for Hypertension Prediction With Wearable Electrocardiogram and Photoplethysmogram Signals: Feasibility Study
JO  - JMIR Cardio
SP  - e60238
VL  - 9
KW  - hypertension
KW  - causal inference
KW  - wearable physiological signals
KW  - electrocardiogram
KW  - photoplethysmogram
N2  - Background: Hypertension is a leading cause of cardiovascular disease and premature death worldwide, and it puts a heavy burden on the health care system. Therefore, it is very important to detect and evaluate hypertension and related cardiovascular events to enable early prevention, detection, and management. Hypertension can be detected in a timely manner with cardiac signals, such as through an electrocardiogram (ECG) and photoplethysmogram (PPG), which can be observed via wearable sensors. Most previous studies predicted hypertension from ECG and PPG signals with extracted features that are correlated with hypertension. However, correlation is sometimes unreliable and may be affected by confounding factors. Objective: The aim of this study was to investigate the feasibility of predicting the risk of hypertension by exploring features that are causally related to hypertension via causal inference methods. Additionally, we paid special attention to and verified the reliability and effectiveness of causality compared to correlation. Methods: We used a large public dataset from the Aurora Project, which was conducted by Microsoft Research. The dataset included diverse individuals who were balanced in terms of gender, age, and the condition of hypertension, with their ECG and PPG signals simultaneously acquired with wrist-worn wearable devices. We first extracted 205 features from the ECG and PPG signals, calculated 6 statistical metrics for these 205 features, and selected some valuable features out of the 205 features under each statistical metric. Then, 6 causal graphs of the selected features for each kind of statistical metric and hypertension were constructed with the equivalent greedy search algorithm. We further fused the 6 causal graphs into 1 causal graph and identified features that were causally related to hypertension from the causal graph. Finally, we used these features to detect hypertension via machine learning algorithms. Results: We validated the proposed method on 405 subjects. We identified 24 causal features that were associated with hypertension. The causal features could detect hypertension with an accuracy of 89%, precision of 92%, and recall of 82%, which outperformed detection with correlation features (accuracy of 85%, precision of 88%, and recall of 77%). Conclusions: The results indicated that the causal inference–based approach can potentially clarify the mechanism of hypertension detection with noninvasive signals and effectively detect hypertension. It also revealed that causality can be more reliable and effective than correlation for hypertension detection and other application scenarios.
UR  - https://cardio.jmir.org/2025/1/e60238
UR  - http://dx.doi.org/10.2196/60238
ID  - info:doi/10.2196/60238
ER  - 

TY  - JOUR
AU  - Merkel, Sebastian
AU  - Schorr, Sabrina
PY  - 2025/1/13
TI  - Identification of Use Cases, Target Groups, and Motivations Around Adopting Smart Speakers for Health Care and Social Care Settings: Scoping Review
JO  - JMIR AI
SP  - e55673
VL  - 4
KW  - conversational agents
KW  - smart speaker
KW  - health care
KW  - social care
KW  - digitalization
KW  - scoping review
KW  - mobile phone
N2  - Background: Conversational agents (CAs) are finding increasing application in health and social care, not least due to their growing use in the home. Recent developments in artificial intelligence, machine learning, and natural language processing have enabled a variety of new uses for CAs. One type of CA that has received increasing attention recently is smart speakers. Objective: The aim of our study was to identify the use cases, user groups, and settings of smart speakers in health and social care. We also wanted to identify the key motivations for developers and designers to use this particular type of technology. Methods: We conducted a scoping review to provide an overview of the literature on smart speakers in health and social care. The literature search was conducted between February 2023 and March 2023 and included 3 databases (PubMed, Scopus, and Sociological Abstracts), supplemented by Google Scholar. Several keywords were used, including technology (eg, voice assistant), product name (eg, Amazon Alexa), and setting (health care or social care). Publications were included if they met the predefined inclusion criteria: (1) published after 2015 and (2) used a smart speaker in a health care or social care setting. Publications were excluded if they met one of the following criteria: (1) did not report on the specific devices used, (2) did not focus specifically on smart speakers, (3) were systematic reviews and other forms of literature-based publications, and (4) were not published in English. Two reviewers collected, reviewed, abstracted, and analyzed the data using qualitative content analysis. Results: A total of 27 articles were included in the final review. These articles covered a wide range of use cases in different settings, such as private homes, hospitals, long-term care facilities, and outpatient services. The main target group was patients, especially older users, followed by doctors and other medical staff members. Conclusions: The results show that smart speakers have diverse applications in health and social care, addressing different contexts and audiences. Their affordability and easy-to-use interfaces make them attractive to various stakeholders. It seems likely that, due to technical advances in artificial intelligence and the market power of the companies behind the devices, there will be more use cases for smart speakers in the near future.
UR  - https://ai.jmir.org/2025/1/e55673
UR  - http://dx.doi.org/10.2196/55673
UR  - http://www.ncbi.nlm.nih.gov/pubmed/39804689
ID  - info:doi/10.2196/55673
ER  - 

TY  - JOUR
AU  - Lolak, Sermkiat
AU  - Attia, John
AU  - McKay, J. Gareth
AU  - Thakkinstian, Ammarin
PY  - 2025/1/8
TI  - Application of Dragonnet and Conformal Inference for Estimating Individualized Treatment Effects for Personalized Stroke Prevention: Retrospective Cohort Study
JO  - JMIR Cardio
SP  - e50627
VL  - 9
KW  - stroke
KW  - causal effect
KW  - ITE
KW  - individual treatment effect
KW  - Dragonnet
KW  - conformal inference
KW  - mortality
KW  - hospital records
KW  - hypertension
KW  - risk factor
KW  - diabetes
KW  - dyslipidemia
KW  - atrial fibrillation
KW  - machine learning
KW  - treatment
N2  - Background: Stroke is a major cause of death and disability worldwide. Identifying individuals who would benefit most from preventative interventions, such as antiplatelet therapy, is critical for personalized stroke prevention. However, traditional methods for estimating treatment effects often focus on the average effect across a population and do not account for individual variations in risk and treatment response. Objective: This study aimed to estimate the individualized treatment effects (ITEs) for stroke prevention using a novel combination of Dragonnet, a causal neural network, and conformal inference. The study also aimed to determine and validate the causal effects of known stroke risk factors (hypertension [HT], diabetes mellitus [DM], dyslipidemia [DLP], and atrial fibrillation [AF]) using both a conventional causal model and machine learning models. Methods: A retrospective cohort study was conducted using data from 275,247 high-risk patients treated at Ramathibodi Hospital, Thailand, between 2010 and 2020. Patients aged >18 years with HT, DM, DLP, or AF were eligible. The main outcome was ischemic or hemorrhagic stroke, identified using International Classification of Diseases, 10th Revision (ICD-10) codes. Causal effects of the risk factors were estimated using a range of methods, including: (1) propensity score–based methods, such as stratified propensity scores, inverse probability weighting, and doubly robust estimation; (2) structural causal models; (3) double machine learning; and (4) Dragonnet, a causal neural network, which was used together with weighted split-conformal quantile regression to estimate ITEs. Results: AF, HT, and DM were identified as significant stroke risk factors. Average causal risk effect estimates for these risk factors ranged from 0.075 to 0.097 for AF, 0.017 to 0.025 for HT, and 0.006 to 0.010 for DM, depending on the method used. Dragonnet yielded causal risk ratios of 4.56 for AF, 2.44 for HT, and 1.41 for DM, which is comparable to other causal models and the standard epidemiological case-control study. Mean ITE analysis indicated that several patients with DM or DM with HT, who were not receiving antiplatelet treatment at the time of data collection, showed reductions in total risk of −0.015 and −0.016, respectively. Conclusions: This study provides a comprehensive evaluation of stroke risk factors and demonstrates the feasibility of using Dragonnet and conformal inference to estimate ITEs of antiplatelet therapy for stroke prevention. The mean ITE analysis suggested that those with DM or DM with HT, who were not receiving antiplatelet treatment at the time of data collection, could potentially benefit from this therapy. The findings highlight the potential of these advanced techniques to inform personalized treatment strategies for stroke, enabling clinicians to identify individuals who are most likely to benefit from specific interventions.
UR  - https://cardio.jmir.org/2025/1/e50627
UR  - http://dx.doi.org/10.2196/50627
ID  - info:doi/10.2196/50627
ER  - 

TY  - JOUR
AU  - Yang, Xiaomeng
AU  - Li, Zeyan
AU  - Lei, Lei
AU  - Shi, Xiaoyu
AU  - Zhang, Dingming
AU  - Zhou, Fei
AU  - Li, Wenjing
AU  - Xu, Tianyou
AU  - Liu, Xinyu
AU  - Wang, Songyun
AU  - Yuan, Quan
AU  - Yang, Jian
AU  - Wang, Xinyu
AU  - Zhong, Yanfei
AU  - Yu, Lilei
PY  - 2025/1/7
TI  - Noninvasive Oral Hyperspectral Imaging–Driven Digital Diagnosis of Heart Failure With Preserved Ejection Fraction: Model Development and Validation Study
JO  - J Med Internet Res
SP  - e67256
VL  - 27
KW  - heart failure with preserved ejection fraction
KW  - HFpEF
KW  - hyperspectral imaging
KW  - HSI
KW  - diagnostic model
KW  - digital health
KW  - Shapley Additive Explanations
KW  - SHAP
KW  - machine learning
KW  - artificial intelligence
KW  - AI
KW  - cardiovascular disease
KW  - predictive modeling
KW  - oral health
N2  - Background: Oral microenvironmental disorders are associated with an increased risk of heart failure with preserved ejection fraction (HFpEF). Hyperspectral imaging (HSI) technology enables the detection of substances that are visually indistinguishable to the human eye, providing a noninvasive approach with extensive applications in medical diagnostics. Objective: The objective of this study is to develop and validate a digital, noninvasive oral diagnostic model for patients with HFpEF using HSI combined with various machine learning algorithms. Methods: Between April 2023 and August 2023, a total of 140 patients were recruited from Renmin Hospital of Wuhan University to serve as the training and internal testing groups for this study. Subsequently, from August 2024 to September 2024, an additional 35 patients were enrolled from Three Gorges University and Yichang Central People's Hospital to constitute the external testing group. After preprocessing to ensure image quality, spectral and textural features were extracted from the images. We extracted 25 spectral bands from each patient image and obtained 8 corresponding texture features to evaluate the performance of 28 machine learning algorithms for their ability to distinguish control participants from participants with HFpEF. The model demonstrating the optimal performance in both internal and external testing groups was selected to construct the HFpEF diagnostic model. Hyperspectral bands significant for identifying participants with HFpEF were identified for further interpretative analysis. The Shapley Additive Explanations (SHAP) model was used to provide analytical insights into feature importance. Results: Participants were divided into a training group (n=105), internal testing group (n=35), and external testing group (n=35), with consistent baseline characteristics across groups. Among the 28 algorithms tested, the random forest algorithm demonstrated superior performance with an area under the receiver operating characteristic curve (AUC) of 0.884 and an accuracy of 82.9% in the internal testing group, as well as an AUC of 0.812 and an accuracy of 85.7% in the external testing group. For model interpretation, we used the top 25 features identified by the random forest algorithm. The SHAP analysis revealed discernible distinctions between control participants and participants with HFpEF, thereby validating the diagnostic model's capacity to accurately identify participants with HFpEF. Conclusions: This noninvasive and efficient model facilitates the identification of individuals with HFpEF, thereby promoting early detection, diagnosis, and treatment. Our research presents a clinically advanced diagnostic framework for HFpEF, validated using independent data sets and demonstrating significant potential to enhance patient care. Trial Registration: China Clinical Trial Registry ChiCTR2300078855; https://www.chictr.org.cn/showproj.html?proj=207133
UR  - https://www.jmir.org/2025/1/e67256
UR  - http://dx.doi.org/10.2196/67256
UR  - http://www.ncbi.nlm.nih.gov/pubmed/
ID  - info:doi/10.2196/67256
ER  - 

TY  - JOUR
AU  - Roos, Jonas
AU  - Martin, Ron
AU  - Kaczmarczyk, Robert
PY  - 2024/12/17
TI  - Evaluating Bard Gemini Pro and GPT-4 Vision Against Student Performance in Medical Visual Question Answering: Comparative Case Study
JO  - JMIR Form Res
SP  - e57592
VL  - 8
KW  - medical education
KW  - visual question answering
KW  - image analysis
KW  - large language model
KW  - LLM
KW  - student
KW  - performance
KW  - comparative
KW  - case study
KW  - artificial intelligence
KW  - AI
KW  - ChatGPT
KW  - effectiveness
KW  - diagnostic
KW  - training
KW  - accuracy
KW  - utility
KW  - image-based
KW  - question
KW  - image
KW  - AMBOSS
KW  - English
KW  - German
KW  - question and answer
KW  - Python
KW  - AI in health care
KW  - health care
N2  - Background: The rapid development of large language models (LLMs) such as OpenAI's ChatGPT has significantly impacted medical research and education. These models have shown potential in fields ranging from radiological imaging interpretation to medical licensing examination assistance. Recently, LLMs have been enhanced with image recognition capabilities. Objective: This study aims to critically examine the effectiveness of these LLMs in medical diagnostics and training by assessing their accuracy and utility in answering image-based questions from medical licensing examinations. Methods: This study analyzed 1070 image-based multiple-choice questions from the AMBOSS learning platform, divided into 605 in English and 465 in German. Customized prompts in both languages directed the models to interpret medical images and provide the most likely diagnosis. Student performance data were obtained from AMBOSS, including metrics such as the "student passed mean" and "majority vote." Statistical analysis was conducted using Python (Python Software Foundation), with key libraries for data manipulation and visualization. Results: GPT-4 1106 Vision Preview (OpenAI) outperformed Bard Gemini Pro (Google), correctly answering 56.9% (609/1070) of questions compared to Bard's 44.6% (477/1070), a statistically significant difference (χ²₁=32.1, P<.001). However, GPT-4 1106 left 16.1% (172/1070) of questions unanswered, significantly higher than Bard's 4.1% (44/1070; χ²₁=83.1, P<.001). When considering only answered questions, GPT-4 1106's accuracy increased to 67.8% (609/898), surpassing both Bard (477/1026, 46.5%; χ²₁=87.7, P<.001) and the student passed mean of 63% (674/1070, SE 1.48%; χ²₁=4.8, P=.03). Language-specific analysis revealed both models performed better in German than English, with GPT-4 1106 showing greater accuracy in German (282/465, 60.65% vs 327/605, 54.1%; χ²₁=4.4, P=.04) and Bard Gemini Pro exhibiting a similar trend (255/465, 54.8% vs 222/605, 36.7%; χ²₁=34.3, P<.001). The student majority vote achieved an overall accuracy of 94.5% (1011/1070), significantly outperforming both artificial intelligence models (GPT-4 1106: χ²₁=408.5, P<.001; Bard Gemini Pro: χ²₁=626.6, P<.001). Conclusions: Our study shows that GPT-4 1106 Vision Preview and Bard Gemini Pro have potential in medical visual question-answering tasks and to serve as a support for students. However, their performance varies depending on the language used, with a preference for German. They also have limitations in responding to non-English content. The accuracy rates, particularly when compared to student responses, highlight the potential of these models in medical education, yet the need for further optimization and understanding of their limitations in diverse linguistic contexts remains critical.
UR  - https://formative.jmir.org/2024/1/e57592
UR  - http://dx.doi.org/10.2196/57592
ID  - info:doi/10.2196/57592
ER  - 

TY  - JOUR
AU  - Dahu, M. Butros
AU  - Khan, Solaiman
AU  - Toubal, Eddine Imad
AU  - Alshehri, Mariam
AU  - Martinez-Villar, I. Carlos
AU  - Ogundele, B. Olabode
AU  - Sheets, R. Lincoln
AU  - Scott, J. Grant
PY  - 2024/12/17
TI  - Geospatial Modeling of Deep Neural Visual Features for Predicting Obesity Prevalence in Missouri: Quantitative Study
JO  - JMIR AI
SP  - e64362
VL  - 3
KW  - geospatial modeling
KW  - deep convolutional neural network
KW  - DCNN
KW  - Residual Network-50
KW  - ResNet-50
KW  - satellite imagery
KW  - Moran I
KW  - local indicators of spatial association
KW  - LISA
KW  - spatial lag model
KW  - obesity rate
KW  - artificial intelligence
KW  - AI
N2  - Background: The global obesity epidemic demands innovative approaches to understand its complex environmental and social determinants. Spatial technologies, such as geographic information systems, remote sensing, and spatial machine learning, offer new insights into this health issue. This study uses deep learning and spatial modeling to predict obesity rates for census tracts in Missouri. Objective: This study aims to develop a scalable method for predicting obesity prevalence using deep convolutional neural networks applied to satellite imagery and geospatial analysis, focusing on 1052 census tracts in Missouri. Methods: Our analysis followed 3 steps. First, Sentinel-2 satellite images were processed using the Residual Network-50 model to extract environmental features from 63,592 image chips (224×224 pixels). Second, these features were merged with obesity rate data from the Centers for Disease Control and Prevention for Missouri census tracts. Third, a spatial lag model was used to predict obesity rates and analyze the association between deep neural visual features and obesity prevalence. Spatial autocorrelation was used to identify clusters of obesity rates. Results: Substantial spatial clustering of obesity rates was found across Missouri, with a Moran I value of 0.68, indicating similar obesity rates among neighboring census tracts. The spatial lag model demonstrated strong predictive performance, with an R² of 0.93 and a spatial pseudo R² of 0.92, explaining 93% of the variation in obesity rates. Local indicators from a spatial association analysis revealed regions with distinct high and low clusters of obesity, which were visualized through choropleth maps. Conclusions: This study highlights the effectiveness of integrating deep convolutional neural networks and spatial modeling to predict obesity prevalence based on environmental features from satellite imagery. The model's high accuracy and ability to capture spatial patterns offer valuable insights for public health interventions. Future work should expand the geographical scope and include socioeconomic data to further refine the model for broader applications in obesity research.
UR  - https://ai.jmir.org/2024/1/e64362
UR  - http://dx.doi.org/10.2196/64362
UR  - http://www.ncbi.nlm.nih.gov/pubmed/39688897
ID  - info:doi/10.2196/64362
ER  - 

TY  - JOUR
AU  - Liu, Zhongling
AU  - Li, Jinkai
AU  - Zhang, Yuanyuan
AU  - Wu, Dan
AU  - Huo, Yanyan
AU  - Yang, Jianxin
AU  - Zhang, Musen
AU  - Dong, Chuanfei
AU  - Jiang, Luhui
AU  - Sun, Ruohan
AU  - Zhou, Ruoyin
AU  - Li, Fei
AU  - Yu, Xiaodan
AU  - Zhu, Daqian
AU  - Guo, Yao
AU  - Chen, Jinjin
PY  - 2024/11/29
TI  - Auxiliary Diagnosis of Children With Attention-Deficit/Hyperactivity Disorder Using Eye-Tracking and Digital Biomarkers: Case-Control Study
JO  - JMIR Mhealth Uhealth
SP  - e58927
VL  - 12
KW  - attention deficit disorder with hyperactivity
KW  - eye-tracking
KW  - auxiliary diagnosis
KW  - digital biomarker
KW  - antisaccade
KW  - machine learning
N2  - Background: Attention-deficit/hyperactivity disorder (ADHD) is a common neurodevelopmental disorder in school-aged children. The lack of objective biomarkers for ADHD often results in missed diagnoses or misdiagnoses, which lead to inappropriate or delayed interventions. Eye-tracking technology provides an objective method to assess children's neuropsychological behavior. Objective: The aim of this study was to develop an objective and reliable auxiliary diagnostic system for ADHD using eye-tracking technology. This system would be valuable for screening for ADHD in schools and communities and may help identify objective biomarkers for the clinical diagnosis of ADHD. Methods: We conducted a case-control study of children with ADHD and typically developing (TD) children. We designed an eye-tracking assessment paradigm based on the core cognitive deficits of ADHD and extracted various digital biomarkers that represented participant behaviors. These biomarkers and developmental patterns were compared between the ADHD and TD groups. Machine learning (ML) was implemented to validate the ability of the extracted eye-tracking biomarkers to predict ADHD. The performance of the ML models was evaluated using 5-fold cross-validation. Results: We recruited 216 participants, of whom 94 (43.5%) were children with ADHD and 122 (56.5%) were TD children. The ADHD group showed significantly poorer performance (for accuracy and completion time) than the TD group in the prosaccade, antisaccade, and delayed saccade tasks. In addition, there were substantial group differences in digital biomarkers, such as pupil diameter fluctuation, regularity of gaze trajectory, and fixations on unrelated areas. Although the accuracy and task completion speed of the ADHD group increased over time, their eye-movement patterns remained irregular. The TD group with children aged 5 to 6 years outperformed the ADHD group with children aged 9 to 10 years, and this difference remained relatively stable over time, which indicated that the ADHD group followed a unique developmental pattern. The ML model was effective in discriminating the groups, achieving an area under the curve of 0.965 and an accuracy of 0.908. Conclusions: The eye-tracking biomarkers proposed in this study effectively identified differences in various aspects of eye-movement patterns between the ADHD and TD groups. In addition, the ML model constructed using these digital biomarkers achieved high accuracy and reliability in identifying ADHD. Our system can facilitate early screening for ADHD in schools and communities and provide clinicians with objective biomarkers as a reference.
UR  - https://mhealth.jmir.org/2024/1/e58927
UR  - http://dx.doi.org/10.2196/58927
UR  - http://www.ncbi.nlm.nih.gov/pubmed/39477792
ID  - info:doi/10.2196/58927
ER  - 

TY  - JOUR
AU  - Campbell, Marie Amy
AU  - Hauton, Chris
AU  - van Aerle, Ronny
AU  - Martinez-Urtaza, Jaime
PY  - 2024/11/28
TI  - Eco-Evolutionary Drivers of Vibrio parahaemolyticus Sequence Type 3 Expansion: Retrospective Machine Learning Approach
JO  - JMIR Bioinform Biotech
SP  - e62747
VL  - 5
KW  - pathogen expansion
KW  - climate change
KW  - machine learning
KW  - ecology
KW  - evolution
KW  - vibrio parahaemolyticus
KW  - sequencing
KW  - sequence type 3
KW  - VpST3
KW  - genomics
N2  - Background: Environmentally sensitive pathogens exhibit ecological and evolutionary responses to climate change that result in the emergence and global expansion of well-adapted variants. It is imperative to understand the mechanisms that facilitate pathogen emergence and expansion, as well as the drivers behind the mechanisms, to understand and prepare for future pandemic expansions. Objective: The unique, rapid, global expansion of a clonal complex of Vibrio parahaemolyticus (a marine bacterium causing gastroenteritis infections) named Vibrio parahaemolyticus sequence type 3 (VpST3) provides an opportunity to explore the eco-evolutionary drivers of pathogen expansion. Methods: The global expansion of VpST3 was reconstructed using VpST3 genomes, which were then classified into metrics characterizing the stages of this expansion process, indicative of the stages of emergence and establishment. We used machine learning, specifically a random forest classifier, to test a range of ecological and evolutionary drivers for their potential in predicting VpST3 expansion dynamics. Results: We identified a range of evolutionary features, including mutations in the core genome and accessory gene presence, associated with expansion dynamics. A range of random forest classifier approaches were tested to predict expansion classification metrics for each genome. The highest predictive accuracies (ranging from 0.722 to 0.967) were achieved for models using a combined eco-evolutionary approach. While population structure and the difference between introduced and established isolates could be predicted to a high accuracy, our model reported multiple false positives when predicting the success of an introduced isolate, suggesting potential limiting factors not represented in our eco-evolutionary features. Regional models produced for 2 countries reporting the most VpST3 genomes had varying success, reflecting the impacts of class imbalance. Conclusions: These novel insights into evolutionary features and ecological conditions related to the stages of VpST3 expansion showcase the potential of machine learning models using genomic data and will contribute to the future understanding of the eco-evolutionary pathways of climate-sensitive pathogens.
UR - https://bioinform.jmir.org/2024/1/e62747 UR - http://dx.doi.org/10.2196/62747 UR - http://www.ncbi.nlm.nih.gov/pubmed/39607996 ID - info:doi/10.2196/62747 ER - TY - JOUR AU - Xie, Junan AU - Li, Shilin AU - Song, Zhen AU - Shu, Lin AU - Zeng, Qing AU - Huang, Guozhi AU - Lin, Yihuan PY - 2024/11/25 TI - Functional Monitoring of Patients With Knee Osteoarthritis Based on Multidimensional Wearable Plantar Pressure Features: Cross-Sectional Study JO - JMIR Aging SP - e58261 VL - 7 KW - knee osteoarthritis KW - KOA KW - 40-m fast-paced walk test KW - 40mFPWT KW - timed up-and-go test KW - TUGT KW - timed up and go KW - TUG KW - functional assessment KW - monitoring KW - wearable KW - gait KW - walk test KW - plantar KW - knee KW - joint KW - arthritis KW - gait analysis KW - regression model KW - machine learning N2 - Background: Patients with knee osteoarthritis (KOA) often present with lower extremity motor dysfunction. However, traditional radiography is a static assessment and cannot achieve long-term dynamic functional monitoring. Plantar pressure signals have demonstrated potential applications in the diagnosis and rehabilitation monitoring of KOA. Objective: Through wearable gait analysis technology, we aim to obtain abundant gait information based on machine learning techniques to develop a simple, rapid, effective, and patient-friendly functional assessment model for the KOA rehabilitation process to provide long-term remote monitoring, which is conducive to reducing the burden on the social health care system. Methods: This cross-sectional study enrolled patients diagnosed with KOA who were able to walk independently for 2 minutes. Participants were given clinically recommended functional tests, including the 40-m fast-paced walk test (40mFPWT) and timed up-and-go test (TUGT). We used a smart shoe system to gather gait pressure data from patients with KOA. 
The multidimensional gait features extracted from the data and physical characteristics were used to establish the KOA functional feature database for the plantar pressure measurement system. 40mFPWT and TUGT regression prediction models were trained using a series of mature machine learning algorithms. Furthermore, model stacking and average ensemble learning methods were adopted to further improve the generalization performance of the model. Mean absolute error (MAE), mean absolute percentage error (MAPE), and root mean squared error (RMSE) were used as regression performance metrics to evaluate the results of different models. Results: A total of 92 patients with KOA were included, exhibiting varying degrees of severity as evaluated by the Kellgren and Lawrence classification. A total of 380 gait features and 4 physical characteristics were extracted to form the feature database. Effective stepwise feature selection determined optimal feature subsets of 11 variables for the 40mFPWT and 10 variables for the TUGT. Among all models, the weighted average ensemble model using 4 tree-based models had the best generalization performance in the test set, with an MAE of 2.686 seconds, MAPE of 9.602%, and RMSE of 3.316 seconds for the prediction of the 40mFPWT and an MAE of 1.280 seconds, MAPE of 12.389%, and RMSE of 1.905 seconds for the prediction of the TUGT. Conclusions: This wearable plantar pressure feature technique can objectively quantify indicators that reflect functional status and is promising as a new tool for long-term remote functional monitoring of patients with KOA. Future work is needed to further explore and investigate the relationship between gait characteristics and functional status with more functional tests and in larger sample cohorts. 
UR - https://aging.jmir.org/2024/1/e58261 UR - http://dx.doi.org/10.2196/58261 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58261 ER - TY - JOUR AU - Gopukumar, Deepika AU - Menon, Nirup AU - Schoen, W. Martin PY - 2024/11/19 TI - Medication Prescription Policy for US Veterans With Metastatic Castration-Resistant Prostate Cancer: Causal Machine Learning Approach JO - JMIR Med Inform SP - e59480 VL - 12 KW - prostate cancer KW - metastatic castration resistant prostate cancer KW - causal survival forest KW - machine learning KW - heterogeneity KW - prescription policy tree KW - oncology KW - pharmacology N2 - Background: Prostate cancer is the second leading cause of cancer death among American men. If detected and treated at an early stage, prostate cancer is often curable. However, an advanced stage such as metastatic castration-resistant prostate cancer (mCRPC) has a high risk of mortality. Multiple treatment options exist; the most common include docetaxel, abiraterone, and enzalutamide. Docetaxel is a cytotoxic chemotherapy, whereas abiraterone and enzalutamide are androgen receptor pathway inhibitors (ARPI). ARPIs are preferred over docetaxel due to lower toxicity. No study has used machine learning with patients' demographics, test results, and comorbidities to identify heterogeneous treatment rules that might improve the survival duration of patients with mCRPC. Objective: This study aimed to measure patient-level heterogeneity in the association of medication prescribed with overall survival duration (in the form of follow-up days) and arrive at a set of medication prescription rules using patient demographics, test results, and comorbidities. Methods: We excluded patients with mCRPC who were on docetaxel, cabazitaxel, mitoxantrone, and sipuleucel-T either before or after the prescription of an ARPI. We included only the African American and white populations. 
In total, 2886 identified veterans treated for mCRPC who were prescribed either abiraterone or enzalutamide as the first line of treatment from 2014 to 2017, with follow-up until 2020, were analyzed. We used causal survival forests for analysis. The unit of analysis was the patient. The primary outcome of this study was follow-up days indicating survival duration while on the first-line medication. After estimating the treatment effect, a prescription policy tree was constructed. Results: For 2886 veterans, enzalutamide was associated with an average of 59.94 (95% CI 35.60-84.28) more days of survival than abiraterone. The increase in overall survival duration for the 2 drugs varied across patient demographics, test results, and comorbidities. Two data-driven subgroups of patients were identified by ranking them on their augmented inverse-propensity weighted (AIPW) scores. The average AIPW scores for the 2 subgroups were 19.36 (95% CI -16.93 to 55.65) and 100.68 (95% CI 62.46-138.89). Based on visualization and a t test, the difference in AIPW scores between the low and high subgroups was significant (P=.003), thereby supporting heterogeneity. The analysis resulted in a set of prescription rules for the 2 ARPIs based on a few covariates available to the physicians at the time of prescription. Conclusions: This study of 2886 veterans showed evidence of heterogeneity and that survival days may be improved for certain patients with mCRPC based on the medication prescribed. Findings suggest that prescription rules based on the patient characteristics, laboratory test results, and comorbidities available to the physician at the time of prescription could improve survival by providing personalized treatment decisions. 
UR - https://medinform.jmir.org/2024/1/e59480 UR - http://dx.doi.org/10.2196/59480 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/59480 ER - TY - JOUR AU - Wang, Renwu AU - Xu, Huimin AU - Zhang, Xupin PY - 2024/11/15 TI - Impact of Image Content on Medical Crowdfunding Success: A Machine Learning Approach JO - J Med Internet Res SP - e58617 VL - 26 KW - medical crowdfunding KW - visual analytics KW - machine learning KW - image content KW - crowdfunding success N2 - Background: As crowdfunding sites proliferate, visual content often serves as the initial bridge connecting a project to its potential backers, underscoring the importance of image selection in effectively engaging an audience. Objective: This paper aims to explore the relationship between images and crowdfunding success in cancer-related crowdfunding projects. Methods: We used the Alibaba Cloud platform to detect individual features in images. In addition, we used the Recognize Anything Model to label images and obtain content tags. Furthermore, the discourse atomic topic model was used to generate image topics. After obtaining the image features and image content topics, we built regression models to investigate the factors that influence crowdfunding success. Results: Images with a higher proportion of young people (β=0.0753; P<.001), a larger number of people (β=0.00822; P<.001), and a larger proportion of smiling faces (β=0.0446; P<.001) had a higher success rate. Image content related to good things and patient health also contributed to crowdfunding success (β=0.082, P<.001; and β=0.036, P<.001, respectively). In addition, the interaction between image topics and image characteristics had a significant effect on the final fundraising outcome. For example, when smiling faces are considered in conjunction with the image topics, using more smiling faces in the rest and play theme increased the amount of money raised (β=0.0152; P<.001). 
We also examined causality through a counterfactual analysis, which confirmed the influence of the variables on crowdfunding success, consistent with the results of our regression models. Conclusions: In the realm of web-based medical crowdfunding, the importance of uploaded images cannot be overstated. Image characteristics, including the number of people depicted and the presence of youth, significantly improve fundraising results. In addition, the thematic choice of images in cancer crowdfunding efforts has a profound impact. Images that evoke beauty and resonate with health issues are more likely to result in increased donations. However, it is critical to recognize that reinforcing character traits in images of different themes has different effects on the success of crowdfunding campaigns. UR - https://www.jmir.org/2024/1/e58617 UR - http://dx.doi.org/10.2196/58617 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58617 ER - TY - JOUR AU - Wenger, Franziska AU - Allenhof, Caroline AU - Schreynemackers, Simon AU - Hegerl, Ulrich AU - Reich, Hanna PY - 2024/11/15 TI - Use of Random Forest to Predict Adherence in an Online Intervention for Depression Using Baseline and Early Usage Data: Model Development and Validation on Retrospective Routine Care Log Data JO - JMIR Form Res SP - e53768 VL - 8 KW - depression KW - adherence KW - machine learning KW - digital interventions KW - random forest KW - iFightDepression KW - iFD KW - online intervention N2 - Background: Online interventions, such as the iFightDepression (iFD) tool, are increasingly recognized as effective alternatives to traditional face-to-face psychotherapy or pharmacotherapy for treating depression. However, particularly when used outside of study settings, low adherence rates and the resulting diminished benefits of the intervention can limit their effectiveness. 
Understanding the factors that predict adherence would allow for early, tailored interventions for individuals at risk of nonadherence, thereby enhancing user engagement and optimizing therapeutic outcomes. Objective: This study aims to develop and evaluate a random forest model that predicts adherence to the iFD tool to identify users at risk of noncompletion. The model was based on characteristics collected during baseline and the first week of the intervention in patients with depression. Methods: Log data from 4187 adult patients who registered for the iFD tool between October 1, 2016, and May 5, 2022, and provided informed consent were statistically analyzed. The resulting data set was divided into training (2932/4187, 70%) and test (1255/4187, 30%) sets using a stratified random split. The training data set was used to train a random forest model aimed at predicting each user's adherence at baseline, based on the hypothesized predictors: age, self-reported gender, expectations of the intervention, current or previous depression treatments, confirmed diagnosis of depression, baseline 9-item Patient Health Questionnaire (PHQ-9) score, accompanying guide profession, and usage behavior within the first week. After training, the random forest model was evaluated on the test data set to assess its predictive performance. The importance of each variable in predicting adherence was analyzed using mean decrease accuracy, mean decrease Gini, and Shapley Additive Explanations values. Results: Of the 4187 patients evaluated, 1019 (24.34%) were classified as adherent based on our predefined definition. An initial random forest model that relied solely on sociodemographic and clinical predictors collected at baseline did not yield a statistically significant adherence prediction. However, after incorporating each patient's usage behavior during the first week, we achieved a significant prediction of adherence (P<.001). 
Within this prediction, the model achieved an accuracy of 0.82 (95% CI 0.79-0.84), an F1-score of 0.53, an area under the curve of 0.83, and a specificity of 0.94 for predicting nonadherent users. The key predictors of adherence included logs, word count on the first workshop's worksheet, and time spent on the tool, all measured during the first week. Conclusions: Our results highlight that early engagement, particularly usage behavior during the first week of the online intervention, is a far stronger predictor of adherence than any sociodemographic or clinical factors. Therefore, analyzing usage behavior within the first week and identifying nonadherers through the algorithm could be beneficial for tailoring interventions aimed at improving user adherence. This could include follow-up calls or face-to-face discussions, optimizing resource utilization in the process. UR - https://formative.jmir.org/2024/1/e53768 UR - http://dx.doi.org/10.2196/53768 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/53768 ER - TY - JOUR AU - Brons, Annette AU - Wang, Shihan AU - Visser, Bart AU - Kröse, Ben AU - Bakkes, Sander AU - Veltkamp, Remco PY - 2024/11/15 TI - Machine Learning Methods to Personalize Persuasive Strategies in mHealth Interventions That Promote Physical Activity: Scoping Review and Categorization Overview JO - J Med Internet Res SP - e47774 VL - 26 KW - artificial intelligence KW - exercise KW - mobile app KW - adaptive KW - tailoring KW - supervised learning KW - reinforcement learning KW - recommender system N2 - Background: Although physical activity (PA) has positive effects on health and well-being, physical inactivity is a worldwide problem. Mobile health interventions have been shown to be effective in promoting PA. Personalizing persuasive strategies improves intervention success and can be conducted using machine learning (ML). 
For PA, several studies have addressed personalized persuasive strategies without ML, whereas others have included personalization using ML without focusing on persuasive strategies. An overview of studies discussing ML to personalize persuasive strategies in PA-promoting interventions, together with corresponding categorizations, would help in designing such interventions in the future but is still missing. Objective: First, we aimed to provide an overview of implemented ML techniques to personalize persuasive strategies in mobile health interventions promoting PA. Moreover, we aimed to present a categorization overview as a starting point for applying ML techniques in this field. Methods: A scoping review was conducted based on the framework by Arksey and O'Malley and the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) criteria. Scopus, Web of Science, and PubMed were searched for studies that included ML to personalize persuasive strategies in interventions promoting PA. Papers were screened using the ASReview software. From the included papers, categorized by the research project they belonged to, we extracted data regarding general study information, target group, PA intervention, implemented technology, and study details. On the basis of the analysis of these data, a categorization overview was given. Results: In total, 40 papers belonging to 27 different projects were included. These papers could be categorized into 4 groups based on their dimension of personalization. Then, for each dimension, 1 or 2 persuasive strategy categories were found together with a type of ML. The overview resulted in a categorization consisting of 3 levels: dimension of personalization, persuasive strategy, and type of ML. 
When personalizing the timing of the messages, most projects implemented reinforcement learning to personalize the timing of reminders and supervised learning (SL) to personalize the timing of feedback, monitoring, and goal-setting messages. Regarding the content of the messages, most projects implemented SL to personalize PA suggestions and feedback or educational messages. For personalizing PA suggestions, SL can be implemented either alone or combined with a recommender system. Finally, reinforcement learning was mostly used to personalize the type of feedback messages. Conclusions: The overview of all implemented persuasive strategies and their corresponding ML methods is insightful for this interdisciplinary field. Moreover, it led to a categorization overview that provides insights into the design and development of personalized persuasive strategies to promote PA. In future papers, the categorization overview might be expanded with additional layers to specify ML methods or additional dimensions of personalization and persuasive strategies. 
UR - https://www.jmir.org/2024/1/e47774 UR - http://dx.doi.org/10.2196/47774 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/47774 ER - TY - JOUR AU - Hong, Minseok AU - Kang, Ri-Ra AU - Yang, Hun Jeong AU - Rhee, Jin Sang AU - Lee, Hyunju AU - Kim, Yong-gyom AU - Lee, KangYoon AU - Kim, HongGi AU - Lee, Sang Yu AU - Youn, Tak AU - Kim, Hyun Se AU - Ahn, Min Yong PY - 2024/11/13 TI - Comprehensive Symptom Prediction in Inpatients With Acute Psychiatric Disorders Using Wearable-Based Deep Learning Models: Development and Validation Study JO - J Med Internet Res SP - e65994 VL - 26 KW - digital phenotype KW - mental health monitoring KW - smart hospital KW - clinical decision support system KW - multitask learning KW - wearable sensor KW - local validation KW - mental health facility KW - deep learning N2 - Background: Assessing the complex and multifaceted symptoms of patients with acute psychiatric disorders proves to be significantly challenging for clinicians. Moreover, the staff in acute psychiatric wards face high work intensity and risk of burnout, yet research on the introduction of digital technologies in this field remains limited. The combination of continuous and objective wearable sensor data acquired from patients with deep learning techniques holds the potential to overcome the limitations of traditional psychiatric assessments and support clinical decision-making. Objective: This study aimed to develop and validate wearable-based deep learning models to comprehensively predict patient symptoms across various acute psychiatric wards in South Korea. Methods: Participants diagnosed with schizophrenia and mood disorders were recruited from 4 wards across 3 hospitals and prospectively observed using wrist-worn wearable devices during their admission period. 
Trained raters conducted periodic clinical assessments using the Brief Psychiatric Rating Scale, Hamilton Anxiety Rating Scale, Montgomery-Asberg Depression Rating Scale, and Young Mania Rating Scale. Wearable devices collected patients' heart rate, accelerometer, and location data. Deep learning models were developed to predict psychiatric symptoms using 2 distinct approaches: single symptoms individually (Single) and multiple symptoms simultaneously via multitask learning (Multi). These models further addressed 2 problems: within-subject relative changes (Deterioration) and between-subject absolute severity (Score). Four configurations were consequently developed for each scale: Single-Deterioration, Single-Score, Multi-Deterioration, and Multi-Score. Data of participants recruited before May 1, 2024, underwent cross-validation, and the resulting fine-tuned models were then externally validated using data from the remaining participants. Results: Of the 244 enrolled participants, 191 (78.3%; 3954 person-days) were included in the final analysis after applying the exclusion criteria. The demographic and clinical characteristics of participants, as well as the distribution of sensor data, showed considerable variations across wards and hospitals. Data of 139 participants were used for cross-validation, while data of 52 participants were used for external validation. The Single-Deterioration and Multi-Deterioration models achieved similar overall accuracy values of 0.75 in cross-validation and 0.73 in external validation. The Single-Score and Multi-Score models attained overall R² values of 0.78 and 0.83 in cross-validation and 0.66 and 0.74 in external validation, respectively, with the Multi-Score model demonstrating superior performance. Conclusions: Deep learning models based on wearable sensor data effectively classified symptom deterioration and predicted symptom severity in participants in acute psychiatric wards. 
Despite lower computational costs, Multi models demonstrated equivalent or superior performance to Single models, suggesting that multitask learning is a promising approach for comprehensive symptom prediction. However, significant variations were observed across wards, which present a key challenge for developing clinical decision support systems in acute psychiatric wards. Future studies may benefit from recurring local validation or federated learning to address generalizability issues. UR - https://www.jmir.org/2024/1/e65994 UR - http://dx.doi.org/10.2196/65994 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/65994 ER - TY - JOUR AU - Gao, Hongxin AU - Schneider, Stefan AU - Hernandez, Raymond AU - Harris, Jenny AU - Maupin, Danny AU - Junghaenel, U. Doerte AU - Kapteyn, Arie AU - Stone, Arthur AU - Zelinski, Elizabeth AU - Meijer, Erik AU - Lee, Pey-Jiuan AU - Orriens, Bart AU - Jin, Haomiao PY - 2024/11/13 TI - Early Identification of Cognitive Impairment in Community Environments Through Modeling Subtle Inconsistencies in Questionnaire Responses: Machine Learning Model Development and Validation JO - JMIR Form Res SP - e54335 VL - 8 KW - machine learning KW - artificial intelligence KW - cognitive impairments KW - surveys and questionnaires KW - community health services KW - public health KW - early identification KW - elder care KW - dementia N2 - Background: The underdiagnosis of cognitive impairment hinders timely intervention of dementia. Health professionals working in the community play a critical role in the early detection of cognitive impairment, yet still face several challenges such as a lack of suitable tools, necessary training, and potential stigmatization. Objective: This study explored a novel application integrating psychometric methods with data science techniques to model subtle inconsistencies in questionnaire response data for early identification of cognitive impairment in community environments. 
Methods: This study analyzed questionnaire response data from participants aged 50 years and older in the Health and Retirement Study (waves 8-9, n=12,942). Predictors included low-quality response indices generated using the graded response model from four brief questionnaires (optimism, hopelessness, purpose in life, and life satisfaction) assessing aspects of overall well-being, a focus of health professionals in communities. The primary and supplemental predicted outcomes were current cognitive impairment derived from a validated criterion and dementia or mortality in the next ten years. Seven predictive models were trained, and the performance of these models was evaluated and compared. Results: The multilayer perceptron exhibited the best performance in predicting current cognitive impairment. In the selected four questionnaires, the area under the curve values for identifying current cognitive impairment ranged from 0.63 to 0.66 and improved to 0.71 to 0.74 when the low-quality response indices were combined with age and gender for prediction. We set the threshold for assessing cognitive impairment risk in the tool based on the ratio of underdiagnosis costs to overdiagnosis costs, and a ratio of 4 was used as the default choice. Furthermore, the tool was more efficient than age- or health-based screening strategies at identifying individuals at high risk for cognitive impairment, particularly in the 50- to 59-year and 60- to 69-year age groups. The tool is available on a portal website for the public to access freely. Conclusions: We developed a novel prediction tool that integrates psychometric methods with data science to facilitate "passive or backend" cognitive impairment assessments in community settings, aiming to promote early cognitive impairment detection. This tool simplifies the cognitive impairment assessment process, making it more adaptable and reducing burdens. 
Our approach also presents a new perspective for using questionnaire data: leveraging, rather than dismissing, low-quality data. UR - https://formative.jmir.org/2024/1/e54335 UR - http://dx.doi.org/10.2196/54335 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/54335 ER - TY - JOUR AU - Chung, Jane AU - Pretzer-Aboff, Ingrid AU - Parsons, Pamela AU - Falls, Katherine AU - Bulut, Eyuphan PY - 2024/11/12 TI - Using a Device-Free Wi-Fi Sensing System to Assess Daily Activities and Mobility in Low-Income Older Adults: Protocol for a Feasibility Study JO - JMIR Res Protoc SP - e53447 VL - 13 KW - Wi-Fi sensing KW - dementia KW - mild cognitive impairment KW - older adults KW - health disparities KW - in-home activities KW - mobility KW - machine learning N2 - Background: Older adults belonging to racial or ethnic minorities with low socioeconomic status are at an elevated risk of developing dementia, but resources for assessing functional decline and detecting cognitive impairment are limited. Cognitive impairment affects the ability to perform daily activities and mobility behaviors. Traditional assessment methods have drawbacks, so smart home technologies (SmHT) have emerged to offer objective, high-frequency, and remote monitoring. However, these technologies usually rely on motion sensors that cannot identify specific activity types. Moreover, minority, low-income older adults often lack access to these technologies due to limited resources and technology experience. There is a need to develop new sensing technology that is discreet, affordable, and requires minimal user engagement to characterize and quantify various in-home activities. Furthermore, it is essential to explore the feasibility of developing machine learning (ML) algorithms for SmHT through collaborations between clinical researchers and engineers and involving minority, low-income older adults for novel sensor development. 
Objective: This study aims to examine the feasibility of developing a novel channel state information-based, device-free, low-cost Wi-Fi sensing system and associated ML algorithms for localizing and recognizing different patterns of in-home activities and mobility in residents of low-income senior housing with and without mild cognitive impairment. Methods: This feasibility study was conducted in collaboration with a wellness care group, which serves the healthy aging needs of low-income housing residents. Prior to this feasibility study, we conducted a pilot study to collect channel state information data from several activity scenarios (eg, sitting, walking, and preparing meals) using the proposed Wi-Fi sensing system continuously over a week in apartments of low-income housing residents. These activities were videotaped to generate ground truth annotations to test the accuracy of the ML algorithms derived from the proposed system. Using qualitative individual interviews, we explored the acceptability of the Wi-Fi sensing system and implementation barriers in the low-income housing setting. We will use the same study protocol for the proposed feasibility study. Results: The Wi-Fi sensing system deployment began in November 2022, with participant recruitment starting in July 2023. Preliminary results will be available in the summer of 2025. Preliminary results will focus on the feasibility of developing ML models for Wi-Fi sensing-based activity and mobility assessment, community-based recruitment and data collection, ground truth, and older adults' Wi-Fi sensing technology acceptance. Conclusions: This feasibility study can make a contribution to SmHT science and ML capabilities for early detection of cognitive decline among socially vulnerable older adults. Currently, sensing devices are not readily available to this population due to cost and information barriers. 
Our sensing device has the potential to identify individuals at risk for cognitive decline by assessing their level of physical function through tracking their in-home activities and mobility behaviors, at a low cost. International Registered Report Identifier (IRRID): DERR1-10.2196/53447 UR - https://www.researchprotocols.org/2024/1/e53447 UR - http://dx.doi.org/10.2196/53447 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/53447 ER - TY - JOUR AU - Wang, Leyao AU - Wan, Zhiyu AU - Ni, Congning AU - Song, Qingyuan AU - Li, Yang AU - Clayton, Ellen AU - Malin, Bradley AU - Yin, Zhijun PY - 2024/11/7 TI - Applications and Concerns of ChatGPT and Other Conversational Large Language Models in Health Care: Systematic Review JO - J Med Internet Res SP - e22769 VL - 26 KW - large language model KW - ChatGPT KW - artificial intelligence KW - natural language processing KW - health care KW - summarization KW - medical knowledge inquiry KW - reliability KW - bias KW - privacy N2 - Background: The launch of ChatGPT (OpenAI) in November 2022 attracted public attention and academic interest to large language models (LLMs), facilitating the emergence of many other innovative LLMs. These LLMs have been applied in various fields, including health care. Numerous studies have since been conducted regarding how to use state-of-the-art LLMs in health-related scenarios. Objective: This review aims to summarize applications of and concerns regarding conversational LLMs in health care and provide an agenda for future research in this field. Methods: We used PubMed, ACM, and the IEEE digital libraries as primary sources for this review. We followed the guidance of PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) to screen and select peer-reviewed research articles that (1) were related to health care applications and conversational LLMs and (2) were published before September 1, 2023, the date when we started paper collection. 
We investigated these papers and classified them according to their applications and concerns. Results: Our search initially identified 820 papers according to targeted keywords, out of which 65 (7.9%) papers met our criteria and were included in the review. The most popular conversational LLM was ChatGPT (60/65, 92% of papers), followed by Bard (Google LLC; 1/65, 2% of papers), LLaMA (Meta; 1/65, 2% of papers), and other LLMs (6/65, 9% of papers). These papers were classified into four categories of applications: (1) summarization, (2) medical knowledge inquiry, (3) prediction (eg, diagnosis, treatment recommendation, and drug synergy), and (4) administration (eg, documentation and information collection), and four categories of concerns: (1) reliability (eg, training data quality, accuracy, interpretability, and consistency in responses), (2) bias, (3) privacy, and (4) public acceptability. There were 49 (75%) papers using LLMs for either summarization or medical knowledge inquiry, or both, and there were 58 (89%) papers expressing concerns about either reliability or bias, or both. We found that conversational LLMs exhibited promising results in summarization and providing general medical knowledge to patients with a relatively high accuracy. However, conversational LLMs such as ChatGPT are not always able to provide reliable answers to complex health-related tasks (eg, diagnosis) that require specialized domain expertise. While bias or privacy issues are often noted as concerns, no experiments in our reviewed papers thoughtfully examined how conversational LLMs lead to these issues in health care research. Conclusions: Future studies should focus on improving the reliability of LLM applications in complex health-related tasks, as well as investigating the mechanisms by which LLM applications introduce bias and privacy issues. 
Considering the vast accessibility of LLMs, legal, social, and technical efforts are all needed to address concerns about LLMs to promote, improve, and regularize the application of LLMs in health care. UR - https://www.jmir.org/2024/1/e22769 UR - http://dx.doi.org/10.2196/22769 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/22769 ER - TY - JOUR AU - Chung, young Wou AU - Yoon, Jinsik AU - Yoon, Dukyong AU - Kim, Songsoo AU - Kim, Yujeong AU - Park, Eun Ji AU - Kang, Ae Young PY - 2024/11/7 TI - Development and Validation of Deep Learning–Based Infectivity Prediction in Pulmonary Tuberculosis Through Chest Radiography: Retrospective Study JO - J Med Internet Res SP - e58413 VL - 26 KW - pulmonary tuberculosis KW - chest radiography KW - artificial intelligence KW - tuberculosis KW - TB KW - smear KW - smear test KW - culture test KW - diagnosis KW - treatment KW - deep learning KW - CXR KW - PTB KW - management KW - cost effective KW - asymptomatic infection KW - diagnostic tools KW - infectivity KW - AI tool KW - cohort N2 - Background: Pulmonary tuberculosis (PTB) poses a global health challenge owing to the time-intensive nature of traditional diagnostic tests such as smear and culture tests, which can require hours to weeks to yield results. Objective: This study aimed to use artificial intelligence (AI)–based chest radiography (CXR) to evaluate the infectivity of patients with PTB more quickly and accurately compared with traditional methods such as smear and culture tests. Methods: We used DenseNet121 and visualization techniques such as gradient-weighted class activation mapping and local interpretable model-agnostic explanations to demonstrate the decision-making process of the model. We analyzed 36,142 CXR images of 4492 patients with PTB obtained from Severance Hospital, focusing specifically on the lung region through segmentation and cropping with TransUNet.
We used data from 2004 to 2020 to train the model, data from 2021 for testing, and data from 2022 to 2023 for internal validation. In addition, we used 1978 CXR images of 299 patients with PTB obtained from Yongin Severance Hospital for external validation. Results: In the internal validation, the model achieved an accuracy of 73.27%, an area under the receiver operating characteristic curve of 0.79, and an area under the precision-recall curve of 0.77. In the external validation, it exhibited an accuracy of 70.29%, an area under the receiver operating characteristic curve of 0.77, and an area under the precision-recall curve of 0.8. In addition, gradient-weighted class activation mapping and local interpretable model-agnostic explanations provided insights into the decision-making process of the AI model. Conclusions: This proposed AI tool offers a rapid and accurate alternative for evaluating PTB infectivity through CXR, with significant implications for enhancing screening efficiency in clinical settings, because infectivity can be evaluated before the results of traditional sputum smear and culture tests are available. UR - https://www.jmir.org/2024/1/e58413 UR - http://dx.doi.org/10.2196/58413 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58413 ER - TY - JOUR AU - Bicknell, T. Brenton AU - Butler, Danner AU - Whalen, Sydney AU - Ricks, James AU - Dixon, J. Cory AU - Clark, B. Abigail
AU - Spaedy, Olivia AU - Skelton, Adam AU - Edupuganti, Neel AU - Dzubinski, Lance AU - Tate, Hudson AU - Dyess, Garrett AU - Lindeman, Brenessa AU - Lehmann, Soleymani Lisa PY - 2024/11/6 TI - ChatGPT-4 Omni Performance in USMLE Disciplines and Clinical Skills: Comparative Analysis JO - JMIR Med Educ SP - e63430 VL - 10 KW - large language model KW - ChatGPT KW - medical education KW - USMLE KW - AI in medical education KW - medical student resources KW - educational technology KW - artificial intelligence in medicine KW - clinical skills KW - LLM KW - medical licensing examination KW - medical students KW - United States Medical Licensing Examination KW - ChatGPT 4 Omni KW - ChatGPT 4 KW - ChatGPT 3.5 N2 - Background: Recent studies, including those by the National Board of Medical Examiners, have highlighted the remarkable capabilities of recent large language models (LLMs) such as ChatGPT in passing the United States Medical Licensing Examination (USMLE). However, there is a gap in detailed analysis of LLM performance in specific medical content areas, thus limiting an assessment of their potential utility in medical education. Objective: This study aimed to assess and compare the accuracy of successive ChatGPT versions (GPT-3.5, GPT-4, and GPT-4 Omni) in USMLE disciplines, clinical clerkships, and the clinical skills of diagnostics and management. Methods: This study used 750 clinical vignette-based multiple-choice questions to characterize the performance of successive ChatGPT versions (ChatGPT 3.5 [GPT-3.5], ChatGPT 4 [GPT-4], and ChatGPT 4 Omni [GPT-4o]) across USMLE disciplines, clinical clerkships, and in clinical skills (diagnostics and management). Accuracy was assessed using a standardized protocol, with statistical analyses conducted to compare the models' performances. Results: GPT-4o achieved the highest accuracy across 750 multiple-choice questions at 90.4%, outperforming GPT-4 and GPT-3.5, which scored 81.1% and 60.0%, respectively.
GPT-4o's highest performances were in social sciences (95.5%), behavioral and neuroscience (94.2%), and pharmacology (93.2%). In clinical skills, GPT-4o's diagnostic accuracy was 92.7% and management accuracy was 88.8%, significantly higher than those of its predecessors. Notably, both GPT-4o and GPT-4 significantly outperformed the medical student average accuracy of 59.3% (95% CI 58.3-60.3). Conclusions: GPT-4o's performance in USMLE disciplines, clinical clerkships, and clinical skills indicates substantial improvements over its predecessors, suggesting significant potential for the use of this technology as an educational aid for medical students. These findings underscore the need for careful consideration when integrating LLMs into medical education, emphasizing the importance of structured curricula to guide their appropriate use and the need for ongoing critical analyses to ensure their reliability and effectiveness. UR - https://mededu.jmir.org/2024/1/e63430 UR - http://dx.doi.org/10.2196/63430 ID - info:doi/10.2196/63430 ER - TY - JOUR AU - Dong, Xing-Xuan AU - Huang, Yueqing AU - Miao, Yi-Fan AU - Hu, Hui-Hui AU - Pan, Chen-Wei AU - Zhang, Tianyang AU - Wu, Yibo PY - 2024/9/12 TI - Personality and Health-Related Quality of Life of Older Chinese Adults: Cross-Sectional Study and Moderated Mediation Model Analysis JO - JMIR Public Health Surveill SP - e57437 VL - 10 KW - personality KW - health-related quality of life KW - older adults KW - sleep quality KW - quality of life KW - old KW - older KW - Chinese KW - China KW - mechanisms KW - psychology KW - behavior KW - analysis KW - hypothesis KW - neuroticism KW - mediation analysis KW - health care providers KW - aging N2 - Background: Personality has an impact on the health-related quality of life (HRQoL) of older adults. However, the relationship between the 2 variables and its mechanisms are controversial, and few studies have been conducted on older adults.
Objective: The aim of this study was to explore the relationship between personality and HRQoL and the mediating and moderating roles of sleep quality and place of residence in this relationship. Methods: A total of 4123 adults 60 years and older were drawn from the Psychology and Behavior Investigation of Chinese Residents survey. Participants were asked to complete the Big Five Inventory, the Brief version of the Pittsburgh Sleep Quality Index, and the EQ-5D-5L. A backpropagation neural network was used to rank the factors contributing to HRQoL. Path analysis was performed to evaluate the mediation hypothesis. Results: As of August 31, 2022, we enrolled 4123 older adults 60 years and older. Neuroticism and extraversion were strong influencing factors of HRQoL (normalized importance >50%). The results of the mediation analysis suggested that neuroticism and extraversion may enhance and diminish, respectively, HRQoL (index: β=−.262, P<.001; visual analog scale: β=−.193, P<.001) by increasing and decreasing the brief version of the Pittsburgh Sleep Quality Index scores (neuroticism: β=.17, P<.001; extraversion: β=−.069, P<.001). The multigroup analysis suggested a significant moderating effect of the place of residence (EQ-5D-5L index: P<.001; EQ-5D-5L visual analog scale: P<.001). No significant direct effect was observed between extraversion and EQ-5D-5L index in urban older residents (β=.037, P=.73). Conclusions: This study sheds light on the potential mechanisms of personality and HRQoL among older Chinese adults and can help health care providers and relevant departments take reasonable measures to promote healthy aging.
UR - https://publichealth.jmir.org/2024/1/e57437 UR - http://dx.doi.org/10.2196/57437 ID - info:doi/10.2196/57437 ER - TY - JOUR AU - Paradise Vit, Abigail AU - Magid, Avi PY - 2024/8/9 TI - Differences in Fear and Negativity Levels Between Formal and Informal Health-Related Websites: Analysis of Sentiments and Emotions JO - J Med Internet Res SP - e55151 VL - 26 KW - emotions KW - sentiment KW - health websites KW - fear N2 - Background: Searching for web-based health-related information is frequently performed by the public and may affect public behavior regarding health decision-making. In particular, it may result in anxiety and in erroneous and harmful self-diagnosis. The most searched health-related topics are cancer, cardiovascular diseases, and infectious diseases. A health-related web-based search may lead to either formal or informal medical websites, both of which may evoke feelings of fear and negativity. Objective: Our study aimed to assess whether there is a difference in fear and negativity levels between information appearing on formal and informal health-related websites. Methods: A web search was performed to retrieve the contents of websites containing symptoms of selected diseases, using selected common symptoms. Retrieved websites were classified into formal and informal websites. The fear and negativity levels of each content item were evaluated using 3 transformer models. A fourth transformer model was fine-tuned using an existing emotion data set obtained from a web-based health community. For formal and informal websites, fear and negativity levels were aggregated. t tests were conducted to evaluate the differences in fear and negativity levels between formal and informal websites. Results: In this study, 1448 unique websites were collected, of which 534 were considered formal and 914 were considered informal. There were 1820 result pages from formal websites and 1494 result pages from informal websites.
According to our findings, fear levels were significantly higher (t2753=3.331; P<.001) on formal websites (mean 0.388, SD 0.177) than on informal websites (mean 0.366, SD 0.168). The results also show that the level of negativity was significantly higher (t2753=2.726; P=.006) on formal websites (mean 0.657, SD 0.211) than on informal websites (mean 0.636, SD 0.201). Conclusions: Positive texts may increase the credibility of formal health websites and increase their usage by the general public and the public's compliance with the recommendations. We recommend making greater use of natural language processing tools before publishing health-related information, so that the text disseminated to the public is more positive and less stressful. UR - https://www.jmir.org/2024/1/e55151 UR - http://dx.doi.org/10.2196/55151 UR - http://www.ncbi.nlm.nih.gov/pubmed/39120928 ID - info:doi/10.2196/55151 ER - TY - JOUR AU - Zhang, Jinxi AU - Li, Zhen AU - Liu, Yu AU - Li, Jian AU - Qiu, Hualong AU - Li, Mohan AU - Hou, Guohui AU - Zhou, Zhixiong PY - 2024/8/5 TI - An Effective Deep Learning Framework for Fall Detection: Model Development and Study Design JO - J Med Internet Res SP - e56750 VL - 26 KW - fall detection KW - deep learning KW - self-attention KW - accelerometer KW - gyroscope KW - human health KW - wearable sensors KW - Sisfall KW - MobiFall N2 - Background: Fall detection is of great significance in safeguarding human health. By monitoring the motion data, a fall detection system (FDS) can detect a fall accident. Recently, wearable sensor–based FDSs have become the mainstream of research; they can be categorized into threshold-based FDSs using experience, machine learning–based FDSs using manual feature extraction, and deep learning (DL)–based FDSs using automatic feature extraction. However, most FDSs focus on the global information of sensor data, neglecting the fact that different segments of the data contribute variably to fall detection.
This shortcoming makes it challenging for FDSs to accurately distinguish between similar human motion patterns of actual falls and fall-like actions, leading to a decrease in detection accuracy. Objective: This study aims to develop and validate a DL framework to accurately detect falls using acceleration and gyroscope data from wearable sensors. We aim to explore the essential contributing features extracted from sensor data to distinguish falls from activities of daily life. The significance of this study lies in reforming the FDS by designing a weighted feature representation using DL methods to effectively differentiate between fall events and fall-like activities. Methods: Based on the 3-axis acceleration and gyroscope data, we proposed a new DL architecture, the dual-stream convolutional neural network self-attention (DSCS) model. Unlike previous studies, the proposed architecture can extract global feature information from acceleration and gyroscope data. Additionally, we incorporated a self-attention module to assign different weights to the original feature vector, enabling the model to learn the contribution effect of the sensor data and enhance classification accuracy. The proposed model was trained and tested on 2 public data sets: SisFall and MobiFall. In addition, 10 participants were recruited to carry out practical validation of the DSCS model. A total of 1700 trials were performed to test the generalization ability of the model. Results: The fall detection accuracy of the DSCS model was 99.32% (recall=99.15%; precision=98.58%) and 99.65% (recall=100%; precision=98.39%) on the test sets of SisFall and MobiFall, respectively. In the ablation experiment, we compared the DSCS model with state-of-the-art machine learning and DL models. On the SisFall data set, the DSCS model achieved the second-best accuracy; on the MobiFall data set, the DSCS model achieved the best accuracy, recall, and precision.
In practical validation, the accuracy of the DSCS model was 96.41% (recall=95.12%; specificity=97.55%). Conclusions: This study demonstrates that the DSCS model can significantly improve the accuracy of fall detection on 2 publicly available data sets and performs robustly in practical validation. UR - https://www.jmir.org/2024/1/e56750 UR - http://dx.doi.org/10.2196/56750 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/56750 ER - TY - JOUR AU - Hassan, Ayman AU - Benlamri, Rachid AU - Diner, Trina AU - Cristofaro, Keli AU - Dillistone, Lucas AU - Khallouki, Hajar AU - Ahghari, Mahvareh AU - Littlefield, Shalyn AU - Siddiqui, Rabail AU - MacDonald, Russell AU - Savage, W. David PY - 2024/8/1 TI - An App for Navigating Patient Transportation and Acute Stroke Care in Northwestern Ontario Using Machine Learning: Retrospective Study JO - JMIR Form Res SP - e54009 VL - 8 KW - stroke care KW - acute stroke KW - northwestern KW - Ontario KW - prediction KW - models KW - machine learning KW - stroke KW - cardiovascular KW - brain KW - neuroscience KW - TIA KW - transient ischemic attack KW - coordinated care KW - navigation KW - navigating KW - mHealth KW - mobile health KW - app KW - apps KW - applications KW - geomapping KW - geography KW - geographical KW - location KW - spatial KW - predict KW - predictions KW - predictive N2 - Background: A coordinated care system helps provide timely access to treatment for suspected acute stroke. In Northwestern Ontario (NWO), Canada, communities are widely dispersed, and hospitals offer differing diagnostic equipment and services. Thus, resources are limited, and health care providers must often transfer patients with stroke to different hospital locations to ensure the most appropriate care access within recommended time frames.
However, health care providers frequently situated temporarily (locum) in NWO or providing care remotely from other areas of Ontario may lack sufficient information and experience in the region to access care for a patient with a time-sensitive condition. Suboptimal decision-making may lead to multiple transfers before definitive stroke care is obtained, resulting in poor outcomes and additional health care system costs. Objective: We aimed to develop a tool to inform and assist NWO health care providers in determining the best transfer options for patients with stroke to provide the most efficient care access. We aimed to develop an app using a comprehensive geomapping navigation and estimation system based on machine learning algorithms. This app uses key stroke-related timelines including the last time the patient was known to be well, patient location, treatment options, and imaging availability at different health care facilities. Methods: Using historical data (2008-2020), an accurate prediction model using machine learning methods was developed and incorporated into a mobile app. These data contained parameters regarding air (Ornge) and land medical transport (3 services), which were preprocessed and cleaned. For cases in which Ornge air services and land ambulance medical transport were both involved in a patient transport process, data were merged and time intervals of the transport journey were determined. The data were distributed for training (35%), testing (35%), and validation (30%) of the prediction model. Results: In total, 70,623 records were collected in the data set from Ornge and land medical transport services to develop a prediction model. Various learning models were analyzed; all learning models performed better than the simple average of all points in predicting output variables. The decision tree model provided more accurate results than the other models.
The decision tree model performed remarkably well, with the values from testing, validation, and the model within a close range. This model was used to develop the "NWO Navigate Stroke" system. The system provides accurate results and demonstrates that a mobile app can be a significant tool for health care providers navigating stroke care in NWO, potentially impacting patient care and outcomes. Conclusions: The NWO Navigate Stroke system uses a data-driven, reliable, accurate prediction model while considering all variations and is simultaneously linked to all required acute stroke management pathways and tools. It was tested using historical data, and the next step will be usability testing with end users. UR - https://formative.jmir.org/2024/1/e54009 UR - http://dx.doi.org/10.2196/54009 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/54009 ER - TY - JOUR AU - Pellemans, Mathijs AU - Salmi, Salim AU - Mérelle, Saskia AU - Janssen, Wilco AU - van der Mei, Rob PY - 2024/8/1 TI - Automated Behavioral Coding to Enhance the Effectiveness of Motivational Interviewing in a Chat-Based Suicide Prevention Helpline: Secondary Analysis of a Clinical Trial JO - J Med Internet Res SP - e53562 VL - 26 KW - motivational interviewing KW - behavioral coding KW - suicide prevention KW - artificial intelligence KW - effectiveness KW - counseling KW - support tool KW - online help KW - mental health N2 - Background: With the rise of computer science and artificial intelligence, analyzing large data sets promises enormous potential in gaining insights for developing and improving evidence-based health interventions. One such intervention is the counseling strategy motivational interviewing (MI), which has been found effective in improving a wide range of health-related behaviors. Despite the simplicity of its principles, MI can be a challenging skill to learn and requires expertise to apply effectively.
Objective: This study aims to investigate the performance of artificial intelligence models in classifying MI behavior and explore the feasibility of using these models in online helplines for mental health as an automated support tool for counselors in clinical practice. Methods: We used a coded data set of 253 MI counseling chat sessions from the 113 Suicide Prevention helpline. With 23,982 messages coded with the MI Sequential Code for Observing Process Exchanges codebook, we trained and evaluated 4 machine learning models and 1 deep learning model to classify client and counselor MI behavior based on language use. Results: The deep learning model BERTje outperformed all machine learning models, accurately predicting counselor behavior (accuracy=0.72, area under the curve [AUC]=0.95, Cohen κ=0.69). It differentiated MI congruent and incongruent counselor behavior (AUC=0.92, κ=0.65) and evocative and nonevocative language (AUC=0.92, κ=0.66). For client behavior, the model achieved an accuracy of 0.70 (AUC=0.89, κ=0.55). The model's interpretable predictions discerned client change talk and sustain talk, counselor affirmations, and reflection types, facilitating valuable counselor feedback. Conclusions: The results of this study demonstrate that artificial intelligence techniques can accurately classify MI behavior, indicating their potential as a valuable tool for enhancing MI proficiency in online helplines for mental health. Provided that the data set size is sufficiently large with enough training samples for each behavioral code, these methods can be trained and applied to other domains and languages, offering a scalable and cost-effective way to evaluate MI adherence, accelerate behavioral coding, and provide therapists with personalized, quick, and objective feedback.
UR - https://www.jmir.org/2024/1/e53562 UR - http://dx.doi.org/10.2196/53562 UR - http://www.ncbi.nlm.nih.gov/pubmed/39088244 ID - info:doi/10.2196/53562 ER - TY - JOUR AU - Tsai, Chung-You AU - Tian, Jing-Hui AU - Lee, Chien-Cheng AU - Kuo, Hann-Chorng PY - 2024/7/23 TI - Building Dual AI Models and Nomograms Using Noninvasive Parameters for Aiding Male Bladder Outlet Obstruction Diagnosis and Minimizing the Need for Invasive Video-Urodynamic Studies: Development and Validation Study JO - J Med Internet Res SP - e58599 VL - 26 KW - bladder outlet obstruction KW - lower urinary tract symptoms KW - machine learning KW - nomogram KW - artificial intelligence KW - video urodynamic study N2 - Background: Diagnosing underlying causes of nonneurogenic male lower urinary tract symptoms associated with bladder outlet obstruction (BOO) is challenging. Video-urodynamic studies (VUDS) and pressure-flow studies (PFS) are both invasive diagnostic methods for BOO. VUDS can more precisely differentiate etiologies of male BOO, such as benign prostatic obstruction, primary bladder neck obstruction, and dysfunctional voiding, potentially outperforming PFS. Objective: These examinations' invasive nature highlights the need for developing noninvasive predictive models to facilitate BOO diagnosis and reduce the necessity for invasive procedures. Methods: We conducted a retrospective study with a cohort of men with medication-refractory, nonneurogenic lower urinary tract symptoms suspected of BOO who underwent VUDS from 2001 to 2022. In total, 2 BOO predictive models were developed: 1 based on the International Continence Society's definition (International Continence Society–defined bladder outlet obstruction; ICS-BOO) and the other on video-urodynamic studies–diagnosed bladder outlet obstruction (VBOO). The patient cohort was randomly split into training and test sets for analysis. A total of 6 machine learning algorithms, including logistic regression, were used for model development.
During model development, we first performed development validation using repeated 5-fold cross-validation on the training set and then test validation to assess the model's performance on an independent test set. Both models were implemented as paper-based nomograms and integrated into a web-based artificial intelligence prediction tool to aid clinical decision-making. Results: Among 307 patients, 26.7% (n=82) met the ICS-BOO criteria, while 82.1% (n=252) were diagnosed with VBOO. The ICS-BOO prediction model had a mean area under the receiver operating characteristic curve (AUC) of 0.74 (SD 0.09) and mean accuracy of 0.76 (SD 0.04) in development validation and AUC and accuracy of 0.86 and 0.77, respectively, in test validation. The VBOO prediction model yielded a mean AUC of 0.71 (SD 0.06) and mean accuracy of 0.77 (SD 0.06) in development validation, with AUC and accuracy of 0.72 and 0.76, respectively, in test validation. When both models' predictions are applied to the same patient, their combined insights can significantly enhance clinical decision-making and simplify the diagnostic pathway. With the dual-model prediction approach, if both models predict BOO, suggesting that all such cases actually resulted from medication-refractory primary bladder neck obstruction or benign prostatic obstruction, surgical intervention may be considered. Thus, VUDS might be unnecessary for 100 (32.6%) patients. Conversely, when ICS-BOO predictions are negative but VBOO predictions are positive, indicating varied etiology, VUDS rather than PFS is advised for precise diagnosis and guiding subsequent therapy, accurately identifying 51.1% (47/92) of patients for VUDS. Conclusions: The 2 machine learning models predicting ICS-BOO and VBOO, based on 6 noninvasive clinical parameters, demonstrate commendable discrimination performance.
Using the dual-model prediction approach, when both models predict positively, VUDS may be avoided, assisting in male BOO diagnosis and reducing the need for such invasive procedures. UR - https://www.jmir.org/2024/1/e58599 UR - http://dx.doi.org/10.2196/58599 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58599 ER - TY - JOUR AU - Chen, Xi AU - Wang, Li AU - You, MingKe AU - Liu, WeiZhi AU - Fu, Yu AU - Xu, Jie AU - Zhang, Shaoting AU - Chen, Gang AU - Li, Kang AU - Li, Jian PY - 2024/7/22 TI - Evaluating and Enhancing Large Language Models' Performance in Domain-Specific Medicine: Development and Usability Study With DocOA JO - J Med Internet Res SP - e58158 VL - 26 KW - large language model KW - retrieval-augmented generation KW - domain-specific benchmark framework KW - osteoarthritis management N2 - Background: The efficacy of large language models (LLMs) in domain-specific medicine, particularly for managing complex diseases such as osteoarthritis (OA), remains largely unexplored. Objective: This study focused on evaluating and enhancing the clinical capabilities and explainability of LLMs in specific domains, using OA management as a case study. Methods: A domain-specific benchmark framework was developed to evaluate LLMs across a spectrum from domain-specific knowledge to clinical applications in real-world clinical scenarios. DocOA, a specialized LLM designed for OA management integrating retrieval-augmented generation and instructional prompts, was developed. It can identify the clinical evidence upon which its answers are based through retrieval-augmented generation, thereby demonstrating the explainability of those answers. The study compared the performance of GPT-3.5, GPT-4, and a specialized assistant, DocOA, using objective and human evaluations.
Results: Results showed that general LLMs such as GPT-3.5 and GPT-4 were less effective in the specialized domain of OA management, particularly in providing personalized treatment recommendations. However, DocOA showed significant improvements. Conclusions: This study introduces a novel benchmark framework that assesses the domain-specific abilities of LLMs in multiple aspects, highlights the limitations of generalized LLMs in clinical contexts, and demonstrates the potential of tailored approaches for developing domain-specific medical LLMs. UR - https://www.jmir.org/2024/1/e58158 UR - http://dx.doi.org/10.2196/58158 UR - http://www.ncbi.nlm.nih.gov/pubmed/38833165 ID - info:doi/10.2196/58158 ER - TY - JOUR AU - Kale, U. Aditya AU - Dattani, Riya AU - Tabansi, Ashley AU - Hogg, Jeffry Henry David AU - Pearson, Russell AU - Glocker, Ben AU - Golder, Su AU - Waring, Justin AU - Liu, Xiaoxuan AU - Moore, J. David AU - Denniston, K. Alastair PY - 2024/7/11 TI - AI as a Medical Device Adverse Event Reporting in Regulatory Databases: Protocol for a Systematic Review JO - JMIR Res Protoc SP - e48156 VL - 13 KW - adverse event KW - artificial intelligence KW - regulatory science KW - regulatory database KW - safety issue KW - feedback KW - health care product KW - artificial intelligence health technology KW - reporting system KW - safety KW - medical devices KW - safety monitoring KW - risks KW - descriptive analysis N2 - Background: The reporting of adverse events (AEs) relating to medical devices is a long-standing area of concern, with suboptimal reporting due to a range of factors including a failure to recognize the association of AEs with medical devices, lack of knowledge of how to report AEs, and a general culture of nonreporting. 
The introduction of artificial intelligence as a medical device (AIaMD) requires a robust safety monitoring environment that recognizes both generic risks of a medical device and some of the increasingly recognized risks of AIaMD (such as algorithmic bias). There is an urgent need to understand the limitations of current AE reporting systems and explore potential mechanisms for how AEs could be detected, attributed, and reported with a view to improving the early detection of safety signals. Objective: The systematic review outlined in this protocol aims to yield insights into the frequency and severity of AEs while characterizing the events using existing regulatory guidance. Methods: Publicly accessible AE databases will be searched to identify AE reports for AIaMD. Scoping searches have identified 3 regulatory territories for which public access to AE reports is provided: the United States, the United Kingdom, and Australia. AEs will be included for analysis if an artificial intelligence (AI) medical device is involved. Software as a medical device without AI is not within the scope of this review. Data extraction will be conducted using a data extraction tool designed for this review and will be done independently by AUK and a second reviewer. Descriptive analysis will be conducted to identify the types of AEs being reported, and their frequency, for different types of AIaMD. AEs will be analyzed and characterized according to existing regulatory guidance. Results: Scoping searches are being conducted with screening to begin in April 2024. Data extraction and synthesis will commence in May 2024, with planned completion by August 2024. The review will highlight the types of AEs being reported for different types of AI medical devices and where the gaps are. It is anticipated that there will be particularly low rates of reporting for indirect harms associated with AIaMD. 
Conclusions: To our knowledge, this will be the first systematic review of 3 different regulatory sources reporting AEs associated with AIaMD. The review will focus on real-world evidence, which brings certain limitations, compounded by the opacity of regulatory databases generally. The review will outline the characteristics and frequency of AEs reported for AIaMD and help regulators and policy makers to continue developing robust safety monitoring processes. International Registered Report Identifier (IRRID): PRR1-10.2196/48156 UR - https://www.researchprotocols.org/2024/1/e48156 UR - http://dx.doi.org/10.2196/48156 UR - http://www.ncbi.nlm.nih.gov/pubmed/38990628 ID - info:doi/10.2196/48156 ER - TY - JOUR AU - Xu, Jie AU - Lu, Lu AU - Peng, Xinwei AU - Pang, Jiali AU - Ding, Jinru AU - Yang, Lingrui AU - Song, Huan AU - Li, Kang AU - Sun, Xin AU - Zhang, Shaoting PY - 2024/6/28 TI - Data Set and Benchmark (MedGPTEval) to Evaluate Responses From Large Language Models in Medicine: Evaluation Development and Validation JO - JMIR Med Inform SP - e57674 VL - 12 KW - ChatGPT KW - LLM KW - assessment KW - data set KW - benchmark KW - medicine N2 - Background: Large language models (LLMs) have achieved great progress in natural language processing tasks and demonstrated the potential for use in clinical applications. Despite their capabilities, LLMs in the medical domain are prone to generating hallucinations (not fully reliable responses). Hallucinations in LLMs' responses create substantial risks, potentially threatening patients' physical safety. Thus, to identify and prevent this safety risk, it is essential to build a systematic evaluation of LLMs in the medical domain. Objective: We developed a comprehensive evaluation system, MedGPTEval, composed of criteria, medical data sets in Chinese, and publicly available benchmarks. Methods: First, a set of evaluation criteria was designed based on a comprehensive literature review.
Second, existing candidate criteria were optimized by using a Delphi method with 5 experts in medicine and engineering. Third, 3 clinical experts designed medical data sets to interact with LLMs. Finally, benchmarking experiments were conducted on the data sets. The responses generated by chatbots based on LLMs were recorded for blind evaluations by 5 licensed medical experts. The evaluation criteria that were obtained covered medical professional capabilities, social comprehensive capabilities, contextual capabilities, and computational robustness, with 16 detailed indicators. The medical data sets include 27 medical dialogues and 7 case reports in Chinese. Three chatbots were evaluated: ChatGPT by OpenAI; ERNIE Bot by Baidu, Inc; and Doctor PuJiang (Dr PJ) by Shanghai Artificial Intelligence Laboratory. Results: Dr PJ outperformed ChatGPT and ERNIE Bot in the multiple-turn medical dialogues and case report scenarios. Dr PJ also outperformed ChatGPT in the semantic consistency rate and complete error rate category, indicating better robustness. However, Dr PJ had slightly lower scores in medical professional capabilities compared with ChatGPT in the multiple-turn dialogue scenario. Conclusions: MedGPTEval provides comprehensive criteria to evaluate LLM-based chatbots in the medical domain, open-source data sets, and benchmarks assessing 3 LLMs. Experimental results demonstrate that Dr PJ outperforms ChatGPT and ERNIE Bot in social and professional contexts. Therefore, such an assessment system can be easily adopted by researchers in this community to augment an open-source data set. UR - https://medinform.jmir.org/2024/1/e57674 UR - http://dx.doi.org/10.2196/57674 ID - info:doi/10.2196/57674 ER - TY - JOUR AU - Razjouyan, Javad AU - Orkaby, R. Ariela AU - Horstman, J. Molly AU - Goyal, Parag AU - Intrator, Orna AU - Naik, D.
Aanand PY - 2024/6/27 TI - The Frailty Trajectory's Additional Edge Over the Frailty Index: Retrospective Cohort Study of Veterans With Heart Failure JO - JMIR Aging SP - e56345 VL - 7 KW - gerontology KW - geriatric KW - geriatrics KW - older adult KW - older adults KW - elder KW - elderly KW - older person KW - older people KW - ageing KW - aging KW - frailty KW - frailty index KW - frailty trajectory KW - frail KW - weak KW - weakness KW - heart failure KW - HF KW - cardiovascular disease KW - CVD KW - congestive heart failure KW - CHF KW - myocardial infarction KW - MI KW - unstable angina KW - angina KW - cardiac arrest KW - atherosclerosis KW - cardiology KW - cardiac KW - cardiologist KW - cardiologists UR - https://aging.jmir.org/2024/1/e56345 UR - http://dx.doi.org/10.2196/56345 ID - info:doi/10.2196/56345 ER - TY - JOUR AU - Reshetnikov, Aleksey AU - Shaikhattarova, Natalia AU - Mazurok, Margarita AU - Kasatkina, Nadezhda PY - 2024/6/20 TI - Dental Tissue Density in Healthy Children Based on Radiological Data: Retrospective Analysis JO - JMIRx Med SP - e56759 VL - 5 KW - density KW - teeth KW - tooth KW - dental KW - dentist KW - dentists KW - dentistry KW - oral KW - tissue KW - enamel KW - dentin KW - Hounsfield KW - pathology KW - pathological KW - radiology KW - radiological KW - image KW - images KW - imaging KW - teeth density KW - Hounsfield unit KW - diagnostic imaging N2 - Background: Information about the range of Hounsfield values for healthy teeth tissues could become an additional tool in assessing dental health and could be used, among other data, for subsequent machine learning. Objective: The purpose of our study was to determine dental tissue densities in Hounsfield units (HU). Methods: The total sample included 36 healthy children (n=21, 58% girls and n=15, 42% boys) aged 10-11 years at the time of the study. The densities of 320 teeth tissues were analyzed. Data were expressed as means and SDs.
The significance was determined using the Student (1-tailed) t test. The statistical significance was set at P<.05. Results: The densities of 320 teeth tissues were analyzed: 72 (22.5%) first permanent molars, 72 (22.5%) permanent central incisors, 27 (8.4%) second primary molars, 40 (12.5%) tooth germs of second premolars, 37 (11.6%) second premolars, 9 (2.8%) second permanent molars, and 63 (19.7%) tooth germs of second permanent molars. The analysis of the data showed that tissues of healthy teeth in children have different density ranges: enamel, from mean 2954.69 (SD 223.77) HU to mean 2071.00 (SD 222.86) HU; dentin, from mean 1899.23 (SD 145.94) HU to mean 1323.10 (SD 201.67) HU; and pulp, from mean 420.29 (SD 196.47) HU to mean 183.63 (SD 97.59) HU. The tissues (enamel and dentin) of permanent central incisors in the mandible and maxilla had the highest mean densities. No gender differences concerning the density of dental tissues were reliably identified. Conclusions: The evaluation of Hounsfield values for dental tissues can be used as an objective method for assessing their densities. If the determined densities of the enamel, dentin, and pulp of the tooth do not correspond to the range of values for healthy tooth tissues, then it may indicate a pathology. UR - https://xmed.jmir.org/2024/1/e56759 UR - http://dx.doi.org/10.2196/56759 ID - info:doi/10.2196/56759 ER - TY - JOUR AU - Dong, Tim AU - Sinha, Shubhra AU - Zhai, Ben AU - Fudulu, Daniel AU - Chan, Jeremy AU - Narayan, Pradeep AU - Judge, Andy AU - Caputo, Massimo AU - Dimagli, Arnaldo AU - Benedetto, Umberto AU - Angelini, D. 
Gianni PY - 2024/6/12 TI - Performance Drift in Machine Learning Models for Cardiac Surgery Risk Prediction: Retrospective Analysis JO - JMIRx Med SP - e45973 VL - 5 KW - cardiac surgery KW - artificial intelligence KW - risk prediction KW - machine learning KW - operative mortality KW - data set drift KW - performance drift KW - national data set KW - adult KW - data KW - cardiac KW - surgery KW - cardiology KW - heart KW - risk KW - prediction KW - United Kingdom KW - mortality KW - performance KW - model N2 - Background: The Society of Thoracic Surgeons and European System for Cardiac Operative Risk Evaluation (EuroSCORE) II risk scores are the most commonly used risk prediction models for in-hospital mortality after adult cardiac surgery. However, they are prone to miscalibration over time and poor generalization across data sets; thus, their use remains controversial. Despite increased interest, a gap in understanding the effect of data set drift on the performance of machine learning (ML) over time remains a barrier to its wider use in clinical practice. Data set drift occurs when an ML system underperforms because of a mismatch between the data it was developed from and the data on which it is deployed. Objective: In this study, we analyzed the extent of performance drift using models built on a large UK cardiac surgery database. The objectives were to (1) rank and assess the extent of performance drift in cardiac surgery risk ML models over time and (2) investigate any potential influence of data set drift and variable importance drift on performance drift. Methods: We conducted a retrospective analysis of prospectively, routinely gathered data on adult patients undergoing cardiac surgery in the United Kingdom between 2012 and 2019. We temporally split the data 70:30 into a training and validation set and a holdout set. 
Five novel ML mortality prediction models were developed and assessed, along with EuroSCORE II, for relationships between and within variable importance drift, performance drift, and actual data set drift. Performance was assessed using a consensus metric. Results: A total of 227,087 adults underwent cardiac surgery during the study period, with a mortality rate of 2.76% (n=6258). There was strong evidence of a decrease in overall performance across all models (P<.0001). Extreme gradient boosting (clinical effectiveness metric [CEM] 0.728, 95% CI 0.728-0.729) and random forest (CEM 0.727, 95% CI 0.727-0.728) were the overall best-performing models, both temporally and nontemporally. EuroSCORE II performed the worst across all comparisons. Sharp changes in variable importance and data set drift from October to December 2017, from June to July 2018, and from December 2018 to February 2019 mirrored the effects of performance decrease across models. Conclusions: All models show a decrease in at least 3 of the 5 individual metrics. CEM and variable importance drift detection demonstrate the limitation of logistic regression methods used for cardiac surgery risk prediction and the effects of data set drift. Future work will be required to determine the interplay between ML models and whether ensemble models could improve on their respective performance advantages. UR - https://xmed.jmir.org/2024/1/e45973 UR - http://dx.doi.org/10.2196/45973 ID - info:doi/10.2196/45973 ER - TY - JOUR AU - Eguale, Tewodros AU - Bastardot, François AU - Song, Wenyu AU - Motta-Calderon, Daniel AU - Elsobky, Yasmin AU - Rui, Angela AU - Marceau, Marlika AU - Davis, Clark AU - Ganesan, Sandya AU - Alsubai, Ava AU - Matthews, Michele AU - Volk, A. Lynn AU - Bates, W. 
David AU - Rozenblum, Ronen PY - 2024/6/4 TI - A Machine Learning Application to Classify Patients at Differing Levels of Risk of Opioid Use Disorder: Clinician-Based Validation Study JO - JMIR Med Inform SP - e53625 VL - 12 KW - opioid-related disorders KW - opioid use disorder KW - machine learning KW - artificial intelligence KW - electronic health record KW - clinical decision support KW - model validation KW - patient medication safety KW - medication safety KW - clinical decision KW - decision making KW - decision support KW - patient safety KW - opioid use KW - drug use KW - opioid safety KW - medication KW - OUD KW - EHR KW - AI N2 - Background: Despite restrictive opioid management guidelines, opioid use disorder (OUD) remains a major public health concern. Machine learning (ML) offers a promising avenue for identifying and alerting clinicians about OUD, thus supporting better clinical decision-making regarding treatment. Objective: This study aimed to assess the clinical validity of an ML application designed to identify and alert clinicians of different levels of OUD risk by comparing it to a structured review of medical records by clinicians. Methods: The ML application generated OUD risk alerts on outpatient data for 649,504 patients from 2 medical centers between 2010 and 2013. A random sample of 60 patients was selected from each of 3 OUD risk level categories (n=180). An OUD risk classification scheme and standardized data extraction tool were developed to evaluate the validity of the alerts. Clinicians independently conducted a systematic and structured review of medical records and reached a consensus on a patient's OUD risk level, which was then compared to the ML application's risk assignments. Results: A total of 78,587 patients without cancer with at least 1 opioid prescription were identified as follows: not high risk (n=50,405, 64.1%), high risk (n=16,636, 21.2%), and suspected OUD or OUD (n=11,546, 14.7%).
The sample of 180 patients was representative of the total population in terms of age, sex, and race. The interrater reliability between the ML application and clinicians had a weighted kappa coefficient of 0.62 (95% CI 0.53-0.71), indicating good agreement. Combining the high risk and suspected OUD or OUD categories and using the review of medical records as a gold standard, the ML application had a corrected sensitivity of 56.6% (95% CI 48.7%-64.5%) and a corrected specificity of 94.2% (95% CI 90.3%-98.1%). The positive and negative predictive values were 93.3% (95% CI 88.2%-96.3%) and 60.0% (95% CI 50.4%-68.9%), respectively. Key themes for disagreements between the ML application and clinician reviews were identified. Conclusions: A systematic comparison was conducted between an ML application and clinicians for identifying OUD risk. The ML application generated clinically valid and useful alerts about patients' different OUD risk levels. ML applications hold promise for identifying patients at differing levels of OUD risk and will likely complement traditional rule-based approaches to generating alerts about opioid safety issues. UR - https://medinform.jmir.org/2024/1/e53625 UR - http://dx.doi.org/10.2196/53625 ID - info:doi/10.2196/53625 ER - TY - JOUR AU - Liu, Pei AU - Liu, Yijun AU - Liu, Hao AU - Xiong, Linping AU - Mei, Changlin AU - Yuan, Lei PY - 2024/6/3 TI - A Random Forest Algorithm for Assessing Risk Factors Associated With Chronic Kidney Disease: Observational Study JO - Asian Pac Isl Nurs J SP - e48378 VL - 8 KW - chronic kidney disease KW - random forest model KW - risk factors KW - assessment N2 - Background: The prevalence and mortality rate of chronic kidney disease (CKD) are increasing year by year, and it has become a global public health issue. The economic burden caused by CKD is increasing at a rate of 1% per year. CKD is highly prevalent and its treatment cost is high but unfortunately remains unknown.
Therefore, early detection and intervention are vital means to mitigate the treatment burden on patients and decrease disease progression. Objective: In this study, we investigated the advantages of using the random forest (RF) algorithm for assessing risk factors associated with CKD. Methods: We included 40,686 people with complete screening records who underwent screening between January 1, 2015, and December 22, 2020, in Jing'an District, Shanghai, China. We grouped the participants into those with and those without CKD by staging based on the glomerular filtration rate and grouping based on albuminuria. Using a logistic regression model, we determined the relationship between CKD and risk factors. The RF machine learning algorithm was used to score the predictive variables and rank them based on their importance to construct a prediction model. Results: The logistic regression model revealed that gender, older age, obesity, abnormal index estimated glomerular filtration rate, retirement status, and participation in urban employee medical insurance were significantly associated with the risk of CKD. On RF algorithm–based screening, the top 4 factors influencing CKD were age, albuminuria, working status, and urinary albumin-creatinine ratio. The RF model achieved an area under the receiver operating characteristic curve of 93.15%. Conclusions: Our findings reveal that the RF algorithm has significant predictive value for assessing risk factors associated with CKD and allows the screening of individuals with risk factors. This has crucial implications for early intervention and prevention of CKD.
UR - https://apinj.jmir.org/2024/1/e48378 UR - http://dx.doi.org/10.2196/48378 UR - http://www.ncbi.nlm.nih.gov/pubmed/38830204 ID - info:doi/10.2196/48378 ER - TY - JOUR AU - Arango-Ibanez, Pablo Juan AU - Posso-Nuñez, Alejandro Jose AU - Díaz-Solórzano, Pablo Juan AU - Cruz-Suárez, Gustavo PY - 2024/5/24 TI - Evidence-Based Learning Strategies in Medicine Using AI JO - JMIR Med Educ SP - e54507 VL - 10 KW - artificial intelligence KW - large language models KW - ChatGPT KW - active recall KW - memory cues KW - LLMs KW - evidence-based KW - learning strategy KW - medicine KW - AI KW - medical education KW - knowledge KW - relevance UR - https://mededu.jmir.org/2024/1/e54507 UR - http://dx.doi.org/10.2196/54507 ID - info:doi/10.2196/54507 ER - TY - JOUR AU - Tabashum, Thasina AU - Snyder, Cooper Robert AU - O'Brien, K. Megan AU - Albert, V. Mark PY - 2024/5/17 TI - Machine Learning Models for Parkinson Disease: Systematic Review JO - JMIR Med Inform SP - e50117 VL - 12 KW - Parkinson disease KW - machine learning KW - systematic review KW - deep learning KW - clinical adoption KW - validation techniques KW - PRISMA KW - Preferred Reporting Items for Systematic Reviews and Meta-Analyses N2 - Background: With the increasing availability of data, computing resources, and easier-to-use software libraries, machine learning (ML) is increasingly used in disease detection and prediction, including for Parkinson disease (PD). Despite the large number of studies published every year, very few ML systems have been adopted for real-world use. In particular, a lack of external validity may result in poor performance of these systems in clinical practice. Additional methodological issues in ML design and reporting can also hinder clinical adoption, even for applications that would benefit from such data-driven systems. 
Objective: To sample the current ML practices in PD applications, we conducted a systematic review of studies published in 2020 and 2021 that used ML models to diagnose PD or track PD progression. Methods: We conducted a systematic literature review in accordance with PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines in PubMed between January 2020 and April 2021, using the following exact string: "Parkinson's" AND ("ML" OR "prediction" OR "classification" OR "detection" or "artificial intelligence" OR "AI"). The search resulted in 1085 publications. After a search query and review, we found 113 publications that used ML for the classification or regression-based prediction of PD or PD-related symptoms. Results: Only 65.5% (74/113) of studies used a holdout test set to avoid potentially inflated accuracies, and approximately half (25/46, 54%) of the studies without a holdout test set did not state this as a potential concern. Surprisingly, 38.9% (44/113) of studies did not report on how or if models were tuned, and an additional 27.4% (31/113) used ad hoc model tuning, which is generally frowned upon in ML model optimization. Only 15% (17/113) of studies performed direct comparisons of results with other models, severely limiting the interpretation of results. Conclusions: This review highlights the notable limitations of current ML systems and techniques that may contribute to a gap between reported performance in research and the real-life applicability of ML models aiming to detect and predict diseases such as PD. UR - https://medinform.jmir.org/2024/1/e50117 UR - http://dx.doi.org/10.2196/50117 ID - info:doi/10.2196/50117 ER - TY - JOUR AU - Mohebbi, Fahimeh AU - Forati, Masoud Amir AU - Torres, Lucas AU - deRoon-Cassini, A. Terri AU - Harris, Jennifer AU - Tomas, W. Carissa AU - Mantsch, R.
John AU - Ghose, Rina PY - 2024/5/3 TI - Exploring the Association Between Structural Racism and Mental Health: Geospatial and Machine Learning Analysis JO - JMIR Public Health Surveill SP - e52691 VL - 10 KW - machine learning KW - geospatial KW - racial disparities KW - social determinant of health KW - structural racism KW - mental health KW - health disparities KW - deep learning N2 - Background: Structural racism produces mental health disparities. While studies have examined the impact of individual factors such as poverty and education, the collective contribution of these elements, as manifestations of structural racism, has been less explored. Milwaukee County, Wisconsin, with its racial and socioeconomic diversity, provides a unique context for this multifactorial investigation. Objective: This research aimed to delineate the association between structural racism and mental health disparities in Milwaukee County, using a combination of geospatial and deep learning techniques. We used secondary data sets where all data were aggregated and anonymized before being released by federal agencies. Methods: We compiled 217 georeferenced explanatory variables across domains, initially deliberately excluding race-based factors to focus on nonracial determinants. This approach was designed to reveal the underlying patterns of risk factors contributing to poor mental health, subsequently reintegrating race to assess the effects of racism quantitatively. The variable selection combined tree-based methods (random forest) and conventional techniques, supported by variance inflation factor and Pearson correlation analysis for multicollinearity mitigation. The geographically weighted random forest model was used to investigate spatial heterogeneity and dependence. Self-organizing maps, combined with K-means clustering, were used to analyze data from Milwaukee communities, focusing on quantifying the impact of structural racism on the prevalence of poor mental health. 
Results: While 12 influential factors collectively accounted for 95.11% of the variability in mental health across communities, the top 6 factors (smoking, poverty, insufficient sleep, lack of health insurance, employment, and age) were particularly impactful. Predominantly African American neighborhoods were disproportionately affected, being 2.23 times more likely to fall within high-risk clusters for poor mental health. Conclusions: The findings demonstrate that structural racism shapes mental health disparities, with Black community members disproportionately impacted. The multifaceted methodological approach underscores the value of integrating geospatial analysis and deep learning to understand complex social determinants of mental health. These insights highlight the need for targeted interventions, addressing both individual and systemic factors to mitigate mental health disparities rooted in structural racism. UR - https://publichealth.jmir.org/2024/1/e52691 UR - http://dx.doi.org/10.2196/52691 UR - http://www.ncbi.nlm.nih.gov/pubmed/38701436 ID - info:doi/10.2196/52691 ER - TY - JOUR AU - Das, Sudeshna AU - Walker, Drew AU - Rajwal, Swati AU - Lakamana, Sahithi AU - Sumner, A. Steven AU - Mack, A.
Karin AU - Kaczkowski, Wojciech AU - Sarker, Abeed PY - 2024/5/2 TI - Emerging Trends of Self-Harm Using Sodium Nitrite in an Online Suicide Community: Observational Study Using Natural Language Processing Analysis JO - JMIR Ment Health SP - e53730 VL - 11 KW - online suicide community KW - suicide KW - sodium nitrite KW - sodium nitrite sources KW - mental health KW - adolescent KW - juvenile KW - self harm KW - Sanctioned Suicide KW - online forum KW - US KW - public health KW - surveillance KW - data mining KW - natural language processing KW - machine learning KW - usage KW - suicidal KW - accuracy KW - consumption KW - information KW - United States N2 - Background: There is growing concern around the use of sodium nitrite (SN) as an emerging means of suicide, particularly among younger people. Given the limited information on the topic from traditional public health surveillance sources, we studied posts made to an online suicide discussion forum, "Sanctioned Suicide," which is a primary source of information on the use and procurement of SN. Objective: This study aims to determine the trends in SN purchase and use, as obtained via data mining from subscriber posts on the forum. We also aim to determine the substances and topics commonly co-occurring with SN, as well as the geographical distribution of users and sources of SN. Methods: We collected all publicly available posts from the site's inception in March 2018 to October 2022. Using data-driven methods, including natural language processing and machine learning, we analyzed the trends in SN mentions over time, including the locations of SN consumers and the sources from which SN is procured. We developed a transformer-based source and location classifier to determine the geographical distribution of the sources of SN.
Results: Posts pertaining to SN show a rise in popularity, and there were statistically significant correlations between real-life use of SN and suicidal intent when compared to data from the Centers for Disease Control and Prevention (CDC) Wide-Ranging Online Data for Epidemiologic Research (ρ=0.727; P<.001) and the National Poison Data System (ρ=0.866; P=.001). We observed frequent co-mentions of antiemetics, benzodiazepines, and acid regulators with SN. Our proposed machine learning–based source and location classifier can detect potential sources of SN with an accuracy of 72.92% and showed consumption in the United States and elsewhere. Conclusions: Vital information about SN and other emerging mechanisms of suicide can be obtained from online forums. UR - https://mental.jmir.org/2024/1/e53730 UR - http://dx.doi.org/10.2196/53730 ID - info:doi/10.2196/53730 ER - TY - JOUR AU - Van den Eynde, Jef PY - 2024/4/19 TI - CHDmap: One Step Further Toward Integrating Medicine-Based Evidence Into Practice JO - JMIR Med Inform SP - e52343 VL - 12 KW - artificial intelligence KW - clinical practice KW - congenital heart disease KW - decision-making KW - evidence-based medicine KW - machine learning KW - medicine-based evidence KW - patient similarity networks KW - precision medicine KW - randomized controlled trials UR - https://medinform.jmir.org/2024/1/e52343 UR - http://dx.doi.org/10.2196/52343 ID - info:doi/10.2196/52343 ER - TY - JOUR AU - Shulha, Michael AU - Hovdebo, Jordan AU - D'Souza, Vinita AU - Thibault, Francis AU - Harmouche, Rola PY - 2024/4/16 TI - Integrating Explainable Machine Learning in Clinical Decision Support Systems: Study Involving a Modified Design Thinking Approach JO - JMIR Form Res SP - e50475 VL - 8 KW - explainable machine learning KW - XML KW - design thinking approach KW - NASSS framework KW - clinical decision support KW - clinician engagement KW - clinician-facing interface KW - clinician trust in machine learning KW - COVID-19 KW -
chest x-ray KW - severity prediction N2 - Background: Though there has been considerable effort to implement machine learning (ML) methods for health care, clinical implementation has lagged. Incorporating explainable machine learning (XML) methods through the development of a decision support tool using a design thinking approach is expected to lead to greater uptake of such tools. Objective: This work aimed to explore how constant engagement of clinician end users can address the lack of adoption of ML tools in clinical contexts due to their lack of transparency and address challenges related to presenting explainability in a decision support interface. Methods: We used a design thinking approach augmented with additional theoretical frameworks to provide more robust approaches to different phases of design. In particular, in the problem definition phase, we incorporated the nonadoption, abandonment, scale-up, spread, and sustainability of technology in health care (NASSS) framework to assess these aspects in a health care network. This process helped focus on the development of a prognostic tool that predicted the likelihood of admission to an intensive care ward based on disease severity in chest x-ray images. In the ideate, prototype, and test phases, we incorporated a metric framework to assess physician trust in artificial intelligence (AI) tools. This allowed us to compare physicians' assessments of the domain representation, actionability, and consistency of the tool. Results: Physicians found the design of the prototype elegant, and domain-appropriate representation of data was displayed in the tool. They appreciated the simplified explainability overlay, which only displayed the most predictive patches that cumulatively explained 90% of the final admission risk score. Finally, in terms of consistency, physicians unanimously appreciated the capacity to compare multiple x-ray images in the same view.
They also appreciated the ability to toggle the explainability overlay; having both options made it easier for them to assess how consistently the tool was identifying elements of the x-ray image they felt would contribute to overall disease severity. Conclusions: The adopted approach is situated in an evolving space concerned with incorporating XML or AI technologies into health care software. We addressed the alignment of AI as it relates to clinician trust, describing an approach to wireframing and prototyping, which incorporates the use of a theoretical framework for trust in the design process itself. Moreover, we proposed that alignment of AI is dependent upon integration of end users throughout the larger design process. Our work shows the importance and value of engaging end users prior to tool development. We believe that the described approach is a unique and valuable contribution that outlines a direction for ML experts, user experience designers, and clinician end users on how to collaborate in the creation of trustworthy and usable XML-based clinical decision support tools.
UR - https://formative.jmir.org/2024/1/e50475 UR - http://dx.doi.org/10.2196/50475 UR - http://www.ncbi.nlm.nih.gov/pubmed/38625728 ID - info:doi/10.2196/50475 ER - TY - JOUR AU - Hadar-Shoval, Dorit AU - Asraf, Kfir AU - Mizrachi, Yonathan AU - Haber, Yuval AU - Elyoseph, Zohar PY - 2024/4/9 TI - Assessing the Alignment of Large Language Models With Human Values for Mental Health Integration: Cross-Sectional Study Using Schwartz's Theory of Basic Values JO - JMIR Ment Health SP - e55988 VL - 11 KW - large language models KW - LLMs KW - large language model KW - LLM KW - machine learning KW - ML KW - natural language processing KW - NLP KW - deep learning KW - ChatGPT KW - Chat-GPT KW - chatbot KW - chatbots KW - chat-bot KW - chat-bots KW - Claude KW - values KW - Bard KW - artificial intelligence KW - AI KW - algorithm KW - algorithms KW - predictive model KW - predictive models KW - predictive analytics KW - predictive system KW - practical model KW - practical models KW - mental health KW - mental illness KW - mental illnesses KW - mental disease KW - mental diseases KW - mental disorder KW - mental disorders KW - mobile health KW - mHealth KW - eHealth KW - mood disorder KW - mood disorders N2 - Background: Large language models (LLMs) hold potential for mental health applications. However, their opaque alignment processes may embed biases that shape problematic perspectives. Evaluating the values embedded within LLMs that guide their decision-making has ethical importance. Schwartz's theory of basic values (STBV) provides a framework for quantifying cultural value orientations and has shown utility for examining values in mental health contexts, including cultural, diagnostic, and therapist-client dynamics. Objective: This study aimed to (1) evaluate whether the STBV can measure value-like constructs within leading LLMs and (2) determine whether LLMs exhibit distinct value-like patterns from humans and each other.
Methods: In total, 4 LLMs (Bard, Claude 2, Generative Pretrained Transformer [GPT]-3.5, GPT-4) were anthropomorphized and instructed to complete the Portrait Values Questionnaire–Revised (PVQ-RR) to assess value-like constructs. Their responses over 10 trials were analyzed for reliability and validity. To benchmark the LLMs' value profiles, their results were compared to published data from a diverse sample of 53,472 individuals across 49 nations who had completed the PVQ-RR. This allowed us to assess whether the LLMs diverged from established human value patterns across cultural groups. Value profiles were also compared between models via statistical tests. Results: The PVQ-RR showed good reliability and validity for quantifying value-like infrastructure within the LLMs. However, substantial divergence emerged between the LLMs' value profiles and population data. The models lacked consensus and exhibited distinct motivational biases, reflecting opaque alignment processes. For example, all models prioritized universalism and self-direction, while de-emphasizing achievement, power, and security relative to humans. Successful discriminant analysis differentiated the 4 LLMs' distinct value profiles. Further examination found the biased value profiles strongly predicted the LLMs' responses when presented with mental health dilemmas requiring choosing between opposing values. This provided further validation for the models embedding distinct motivational value-like constructs that shape their decision-making. Conclusions: This study leveraged the STBV to map the motivational value-like infrastructure underpinning leading LLMs. Although the study demonstrated the STBV can effectively characterize value-like infrastructure within LLMs, substantial divergence from human values raises ethical concerns about aligning these models with mental health applications. The biases toward certain cultural value sets pose risks if integrated without proper safeguards.
For example, prioritizing universalism could promote unconditional acceptance even when clinically unwise. Furthermore, the differences between the LLMs underscore the need to standardize alignment processes to capture true cultural diversity. Thus, any responsible integration of LLMs into mental health care must account for their embedded biases and motivation mismatches to ensure equitable delivery across diverse populations. Achieving this will require transparency and refinement of alignment techniques to instill comprehensive human values. UR - https://mental.jmir.org/2024/1/e55988 UR - http://dx.doi.org/10.2196/55988 UR - http://www.ncbi.nlm.nih.gov/pubmed/38593424 ID - info:doi/10.2196/55988 ER - TY - JOUR AU - Vike, L. Nicole AU - Bari, Sumra AU - Stefanopoulos, Leandros AU - Lalvani, Shamal AU - Kim, Woo Byoung AU - Maglaveras, Nicos AU - Block, Martin AU - Breiter, C. Hans AU - Katsaggelos, K. Aggelos PY - 2024/3/18 TI - Predicting COVID-19 Vaccination Uptake Using a Small and Interpretable Set of Judgment and Demographic Variables: Cross-Sectional Cognitive Science Study JO - JMIR Public Health Surveill SP - e47979 VL - 10 KW - reward KW - aversion KW - judgment KW - relative preference theory KW - cognitive science KW - behavioral economics KW - machine learning KW - balanced random forest KW - mediation KW - moderation KW - mobile phone KW - smartphone N2 - Background: Despite COVID-19 vaccine mandates, many chose to forgo vaccination, raising questions about the psychology underlying how judgment affects these choices. Research shows that reward and aversion judgments are important for vaccination choice; however, no studies have integrated such cognitive science with machine learning to predict COVID-19 vaccine uptake. 
Objective: This study aims to determine the predictive power of a small but interpretable set of judgment variables using 3 machine learning algorithms to predict COVID-19 vaccine uptake and interpret what profile of judgment variables was important for prediction. Methods: We surveyed 3476 adults across the United States in December 2021. Participants answered demographic, COVID-19 vaccine uptake (ie, whether participants were fully vaccinated), and COVID-19 precaution questions. Participants also completed a picture-rating task using images from the International Affective Picture System. Images were rated on a Likert-type scale to calibrate the degree of liking and disliking. Ratings were computationally modeled using relative preference theory to produce a set of graphs for each participant (minimum R²>0.8). In total, 15 judgment features were extracted from these graphs, 2 being analogous to risk and loss aversion from behavioral economics. These judgment variables, along with demographics, were compared between those who were fully vaccinated and those who were not. In total, 3 machine learning approaches (random forest, balanced random forest [BRF], and logistic regression) were used to test how well judgment, demographic, and COVID-19 precaution variables predicted vaccine uptake. Mediation and moderation were implemented to assess statistical mechanisms underlying successful prediction. Results: Age, income, marital status, employment status, ethnicity, educational level, and sex differed by vaccine uptake (Wilcoxon rank sum and chi-square P<.001). Most judgment variables also differed by vaccine uptake (Wilcoxon rank sum P<.05). A similar area under the receiver operating characteristic curve (AUROC) was achieved by the 3 machine learning frameworks, although random forest and logistic regression produced specificities between 30% and 38% (vs 74.2% for BRF), indicating a lower performance in predicting unvaccinated participants.
BRF achieved high precision (87.8%) and AUROC (79%) with moderate to high accuracy (70.8%) and balanced recall (69.6%) and specificity (74.2%). It should be noted that, for BRF, the negative predictive value was <50% despite good specificity. For BRF and random forest, 63% to 75% of the feature importance came from the 15 judgment variables. Furthermore, age, income, and educational level mediated relationships between judgment variables and vaccine uptake. Conclusions: The findings demonstrate the underlying importance of judgment variables for vaccine choice and uptake, suggesting that vaccine education and messaging might target varying judgment profiles to improve uptake. These methods could also be used to aid vaccine rollouts and health care preparedness by providing location-specific details (eg, identifying areas that may experience low vaccination and high hospitalization). UR - https://publichealth.jmir.org/2024/1/e47979 UR - http://dx.doi.org/10.2196/47979 UR - http://www.ncbi.nlm.nih.gov/pubmed/38315620 ID - info:doi/10.2196/47979 ER - TY - JOUR AU - Yang, C. Phillip AU - Jha, Alokkumar AU - Xu, William AU - Song, Zitao AU - Jamp, Patrick AU - Teuteberg, J. Jeffrey PY - 2024/3/1 TI - Cloud-Based Machine Learning Platform to Predict Clinical Outcomes at Home for Patients With Cardiovascular Conditions Discharged From Hospital: Clinical Trial JO - JMIR Cardio SP - e45130 VL - 8 KW - smart sensor KW - wearable technology KW - moving average KW - physical activity KW - artificial intelligence KW - AI N2 - Background: Hospitalizations account for almost one-third of the US $4.1 trillion health care cost in the United States. A substantial portion of these hospitalizations are attributed to readmissions, which led to the establishment of the Hospital Readmissions Reduction Program (HRRP) in 2012. The HRRP reduces payments to hospitals with excess readmissions. In 2018, >US $700 million was withheld; this is expected to exceed US $1 billion by 2022. 
More importantly, there is nothing more physically and emotionally taxing for readmitted patients and demoralizing for hospital physicians, nurses, and administrators. Given this high uncertainty of proper home recovery, intelligent monitoring is needed to predict the outcome of discharged patients to reduce readmissions. Physical activity (PA) is one of the major determinants for overall clinical outcomes in diabetes, hypertension, hyperlipidemia, heart failure, cancer, and mental health issues. These are the exact comorbidities that increase readmission rates, underlining the importance of PA in assessing the recovery of patients by quantitative measurement beyond the questionnaire and survey methods. Objective: This study aims to develop a remote, low-cost, and cloud-based machine learning (ML) platform to enable the precision health monitoring of PA, which may fundamentally alter the delivery of home health care. To validate this technology, we conducted a clinical trial to test the ability of our platform to predict clinical outcomes in discharged patients. Methods: Our platform consists of a wearable device, which includes an accelerometer and a Bluetooth sensor, and an iPhone connected to our cloud-based ML interface to analyze PA remotely and predict clinical outcomes. This system was deployed at a skilled nursing facility where we collected >17,000 person-day data points over 2 years, generating a solid training database. We used these data to train our extreme gradient boosting (XGBoost)-based ML environment to conduct a clinical trial, Activity Assessment of Patients Discharged from Hospital-I, to test the hypothesis that a comprehensive profile of PA would predict clinical outcome. We developed an advanced data-driven analytic platform that predicts the clinical outcome based on accurate measurements of PA. Artificial intelligence or an ML algorithm was used to analyze the data to predict short-term health outcome.
Results: We enrolled 52 patients discharged from Stanford Hospital. Our data demonstrated a robust predictive system to forecast health outcome in the enrolled patients based on their PA data. We achieved precise prediction of the patients' clinical outcomes with a sensitivity of 87%, a specificity of 79%, and an accuracy of 85%. Conclusions: To date, there are no reliable clinical data, using a wearable device, regarding monitoring discharged patients to predict their recovery. We conducted a clinical trial to assess outcome data rigorously to be used reliably for remote home care by patients, health care professionals, and caretakers. UR - https://cardio.jmir.org/2024/1/e45130 UR - http://dx.doi.org/10.2196/45130 UR - http://www.ncbi.nlm.nih.gov/pubmed/38427393 ID - info:doi/10.2196/45130 ER - TY - JOUR AU - Sun, Yinan AU - Kargarandehkordi, Ali AU - Slade, Christopher AU - Jaiswal, Aditi AU - Busch, Gerald AU - Guerrero, Anthony AU - Phillips, T. Kristina AU - Washington, Peter PY - 2024/2/7 TI - Personalized Deep Learning for Substance Use in Hawaii: Protocol for a Passive Sensing and Ecological Momentary Assessment Study JO - JMIR Res Protoc SP - e46493 VL - 13 KW - machine learning KW - precision health KW - Indigenous data sovereignty KW - substance use KW - personalized artificial intelligence KW - wearables KW - ecological momentary assessments KW - passive sensing KW - mobile phone N2 - Background: Artificial intelligence (AI)-powered digital therapies that detect methamphetamine cravings via consumer devices have the potential to reduce health care disparities by providing remote and accessible care solutions to communities with limited care solutions, such as Native Hawaiian, Filipino, and Pacific Islander communities. However, Native Hawaiian, Filipino, and Pacific Islander communities are understudied with respect to digital therapeutics and AI health sensing despite using technology at the same rates as other racial groups.
Objective: In this study, we aimed to understand the feasibility of continuous remote digital monitoring and ecological momentary assessments in Native Hawaiian, Filipino, and Pacific Islander communities in Hawaii by curating a novel data set of longitudinal Fitbit (Fitbit Inc) biosignals with the corresponding craving and substance use labels. We also aimed to develop personalized AI models that predict methamphetamine craving events in real time using wearable sensor data. Methods: We will develop personalized AI and machine learning models for methamphetamine use and craving prediction in 40 individuals from Native Hawaiian, Filipino, and Pacific Islander communities by curating a novel data set of real-time Fitbit biosensor readings and the corresponding participant annotations (ie, raw self-reported substance use data) of their methamphetamine use and cravings. In the process of collecting this data set, we will gain insights into cultural and other human factors that can challenge the proper acquisition of precise annotations. With the resulting data set, we will use self-supervised learning AI approaches, which are a new family of machine learning methods that allows a neural network to be trained without labels by being optimized to make predictions about the data. The inputs to the proposed AI models are Fitbit biosensor readings, and the outputs are predictions of methamphetamine use or craving. This paradigm is gaining increased attention in AI for health care. Results: To date, more than 40 individuals have expressed interest in participating in the study, and we have successfully recruited our first 5 participants with minimal logistical challenges and proper compliance. Several logistical challenges that the research team has encountered so far and the related implications are discussed. Conclusions: We expect to develop models that significantly outperform traditional supervised methods by finetuning according to the data of a participant. 
Such methods will enable AI solutions that work with the limited data available from Native Hawaiian, Filipino, and Pacific Islander populations and that are inherently unbiased owing to their personalized nature. Such models can support future AI-powered digital therapeutics for substance abuse. International Registered Report Identifier (IRRID): DERR1-10.2196/46493 UR - https://www.researchprotocols.org/2024/1/e46493 UR - http://dx.doi.org/10.2196/46493 UR - http://www.ncbi.nlm.nih.gov/pubmed/38324375 ID - info:doi/10.2196/46493 ER - TY - JOUR AU - Elyoseph, Zohar AU - Refoua, Elad AU - Asraf, Kfir AU - Lvovsky, Maya AU - Shimoni, Yoav AU - Hadar-Shoval, Dorit PY - 2024/2/6 TI - Capacity of Generative AI to Interpret Human Emotions From Visual and Textual Data: Pilot Evaluation Study JO - JMIR Ment Health SP - e54369 VL - 11 KW - Reading the Mind in the Eyes Test KW - RMET KW - emotional awareness KW - emotional comprehension KW - emotional cue KW - emotional cues KW - ChatGPT KW - large language model KW - LLM KW - large language models KW - LLMs KW - empathy KW - mentalizing KW - mentalization KW - machine learning KW - artificial intelligence KW - AI KW - algorithm KW - algorithms KW - predictive model KW - predictive models KW - predictive analytics KW - predictive system KW - practical model KW - practical models KW - early warning KW - early detection KW - mental health KW - mental disease KW - mental illness KW - mental illnesses KW - mental diseases N2 - Background: Mentalization, which is integral to human cognitive processes, pertains to the interpretation of one's own and others' mental states, including emotions, beliefs, and intentions. With the advent of artificial intelligence (AI) and the prominence of large language models in mental health applications, questions persist about their aptitude in emotional comprehension.
The prior iteration of the large language model from OpenAI, ChatGPT-3.5, demonstrated an advanced capacity to interpret emotions from textual data, surpassing human benchmarks. Given the introduction of ChatGPT-4, with its enhanced visual processing capabilities, and considering Google Bard's existing visual functionalities, a rigorous assessment of their proficiency in visual mentalizing is warranted. Objective: The aim of the research was to critically evaluate the capabilities of ChatGPT-4 and Google Bard with regard to their competence in discerning visual mentalizing indicators as contrasted with their textual-based mentalizing abilities. Methods: The Reading the Mind in the Eyes Test developed by Baron-Cohen and colleagues was used to assess the models' proficiency in interpreting visual emotional indicators. Simultaneously, the Levels of Emotional Awareness Scale was used to evaluate the large language models' aptitude in textual mentalizing. Collating data from both tests provided a holistic view of the mentalizing capabilities of ChatGPT-4 and Bard. Results: ChatGPT-4, displaying a pronounced ability in emotion recognition, secured scores of 26 and 27 in 2 distinct evaluations, significantly deviating from a random response paradigm (P<.001). These scores align with established benchmarks from the broader human demographic. Notably, ChatGPT-4 exhibited consistent responses, with no discernible biases pertaining to the sex of the model or the nature of the emotion. In contrast, Google Bard's performance aligned with random response patterns, securing scores of 10 and 12 and rendering further detailed analysis redundant. In the domain of textual analysis, both ChatGPT and Bard surpassed established benchmarks from the general population, with their performances being remarkably congruent. Conclusions: ChatGPT-4 proved its efficacy in the domain of visual mentalizing, aligning closely with human performance standards.
Although both models displayed commendable acumen in textual emotion interpretation, Bard's capabilities in visual emotion interpretation necessitate further scrutiny and potential refinement. This study stresses the criticality of ethical AI development for emotional recognition, highlighting the need for inclusive data, collaboration with patients and mental health experts, and stringent governmental oversight to ensure transparency and protect patient privacy. UR - https://mental.jmir.org/2024/1/e54369 UR - http://dx.doi.org/10.2196/54369 UR - http://www.ncbi.nlm.nih.gov/pubmed/38319707 ID - info:doi/10.2196/54369 ER - TY - JOUR AU - Touzet, Yanez Alvaro AU - Rujeedawa, Tanzil AU - Munro, Colin AU - Margetis, Konstantinos AU - Davies, M. Benjamin PY - 2024/1/25 TI - Machine Learning and Symptom Patterns in Degenerative Cervical Myelopathy: Web-Based Survey Study JO - JMIR Form Res SP - e54747 VL - 8 KW - cervical KW - myelopathy KW - machine learning KW - cluster KW - clusters KW - clustering KW - spine KW - spinal KW - compression KW - neck KW - degenerative KW - k-means KW - patient reported KW - degenerative cervical myelopathy N2 - Background: Degenerative cervical myelopathy (DCM), a progressive spinal cord injury caused by spinal cord compression from degenerative pathology, often presents with neck pain, sensorimotor dysfunction in the upper or lower limbs, gait disturbance, and bladder or bowel dysfunction. Its symptomatology is very heterogeneous, making early detection as well as the measurement or understanding of the underlying factors and their consequences challenging. Increasingly, evidence suggests that DCM may consist of subgroups of the disease, which are yet to be defined. Objective: This study aimed to explore whether machine learning can identify clinically meaningful groups of patients based solely on clinical features.
Methods: A survey was conducted wherein participants were asked to specify the clinical features they had experienced, their principal presenting complaint, and time to diagnosis as well as demographic information, including disease severity, age, and sex. K-means clustering was used to divide respondents into clusters according to their clinical features using the Euclidean distance measure and the Hartigan-Wong algorithm. The clinical significance of groups was subsequently explored by comparing their time to presentation, time with disease severity, and other demographics. Results: After a review of both ancillary and cluster data, it was determined by consensus that the optimal number of DCM response groups was 3. In Cluster 1, there were 40 respondents, and the ratio of male to female participants was 13:21. In Cluster 2, there were 92 respondents, with a male to female participant ratio of 27:65. Cluster 3 had 57 respondents, with a male to female participant ratio of 9:48. A total of 6 people did not report biological sex in Cluster 1. The mean age in Cluster 1 was 56.2 (SD 10.5) years; in Cluster 2, it was 54.7 (SD 9.63) years; and in Cluster 3, it was 51.8 (SD 8.4) years. Patients across clusters significantly differed in the total number of clinical features reported, with the most clinical features in Cluster 3 and the fewest in Cluster 1 (Kruskal-Wallis rank sum test: χ²₂=159.46; P<.001). There was no relationship between the pattern of clinical features and severity. There were also no differences between clusters regarding time since diagnosis and time with DCM. Conclusions: Using machine learning and patient-reported experience, 3 groups of patients with DCM were defined, which were different in the number of clinical features but not in the severity of DCM or time with DCM.
Although a clearer biological basis for the clusters may have been missed, the findings are consistent with the emerging observation that DCM is a heterogeneous disease, difficult to diagnose or stratify. There is a place for machine learning methods to efficiently assist with pattern recognition. However, the challenge lies in creating quality data sets necessary to derive benefit from such approaches. UR - https://formative.jmir.org/2024/1/e54747 UR - http://dx.doi.org/10.2196/54747 UR - http://www.ncbi.nlm.nih.gov/pubmed/38271070 ID - info:doi/10.2196/54747 ER - TY - JOUR AU - Tabja Bortesi, Pablo Juan AU - Ranisau, Jonathan AU - Di, Shuang AU - McGillion, Michael AU - Rosella, Laura AU - Johnson, Alistair AU - Devereaux, PJ AU - Petch, Jeremy PY - 2024/1/18 TI - Machine Learning Approaches for the Image-Based Identification of Surgical Wound Infections: Scoping Review JO - J Med Internet Res SP - e52880 VL - 26 KW - surgical site infection KW - machine learning KW - postoperative surveillance KW - wound imaging KW - mobile phone N2 - Background: Surgical site infections (SSIs) occur frequently and impact patients and health care systems. Remote surveillance of surgical wounds is currently limited by the need for manual assessment by clinicians. Machine learning (ML)-based methods have recently been used to address various aspects of the postoperative wound healing process and may be used to improve the scalability and cost-effectiveness of remote surgical wound assessment. Objective: The objective of this review was to provide an overview of the ML methods that have been used to identify surgical wound infections from images. Methods: We conducted a scoping review of ML approaches for visual detection of SSIs following the JBI (Joanna Briggs Institute) methodology. Reports of participants in any postoperative context focusing on identification of surgical wound infections were included.
Studies that did not address SSI identification or surgical wounds, or that did not use image or video data, were excluded. We searched MEDLINE, Embase, CINAHL, CENTRAL, Web of Science Core Collection, IEEE Xplore, Compendex, and arXiv for relevant studies in November 2022. The records retrieved were double screened for eligibility. A data extraction tool was used to chart the relevant data, which were described narratively and presented using tables. Adherence to TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) guidelines was evaluated and PROBAST (Prediction Model Risk of Bias Assessment Tool) was used to assess risk of bias (RoB). Results: In total, 10 of the 715 unique records screened met the eligibility criteria. In these studies, the clinical contexts and surgical procedures were diverse. All papers developed diagnostic models, though none performed external validation. Both traditional ML and deep learning methods were used to identify SSIs from mostly color images, and the volume of images used ranged from under 50 to thousands. Further, 10 TRIPOD items were reported in at least 4 studies, though 15 items were reported in fewer than 4 studies. PROBAST assessment led to 9 studies being identified as having an overall high RoB, with 1 study having overall unclear RoB. Conclusions: Research on the image-based identification of surgical wound infections using ML remains novel, and there is a need for standardized reporting. Limitations related to variability in image capture, model building, and data sources should be addressed in the future.
UR - https://www.jmir.org/2024/1/e52880 UR - http://dx.doi.org/10.2196/52880 UR - http://www.ncbi.nlm.nih.gov/pubmed/38236623 ID - info:doi/10.2196/52880 ER - TY - JOUR AU - Tolks, Daniel AU - Schmidt, Jeremy Johannes AU - Kuhn, Sebastian PY - 2024/1/15 TI - The Role of AI in Serious Games and Gamification for Health: Scoping Review JO - JMIR Serious Games SP - e48258 VL - 12 KW - artificial intelligence KW - AI KW - games KW - serious games KW - gamification KW - health care KW - review N2 - Background: Artificial intelligence (AI) and game-based methods such as serious games or gamification are both emerging technologies and methodologies in health care. The merging of the two could provide greater advantages, particularly in the field of therapeutic interventions in medicine. Objective: This scoping review sought to generate an overview of the currently existing literature on the connection of AI and game-based approaches in health care. The primary objectives were to cluster studies by disease and health topic addressed, level of care, and AI or games technology. Methods: For this scoping review, the databases PubMed, Scopus, IEEE Xplore, Cochrane Library, and PubPsych were comprehensively searched on February 2, 2022. Two independent authors conducted the screening process using Rayyan software (Rayyan Systems Inc). Only original studies published in English since 1992 were eligible for inclusion. The studies had to involve aspects of therapy or education in medicine and the use of AI in combination with game-based approaches. Each publication was coded for basic characteristics, including the population, intervention, comparison, and outcomes (PICO) criteria; the level of evidence; the disease and health issue; the level of care; the game variant; the AI technology; and the function type. Inductive coding was used to identify the patterns, themes, and categories in the data. Individual codings were analyzed and summarized narratively. 
Results: A total of 16 papers met all inclusion criteria. Most of the studies (10/16, 63%) were conducted in disease rehabilitation, tackling motion impairment (eg, after stroke or trauma). Another cluster of studies (3/16, 19%) was found in the detection and rehabilitation of cognitive impairment. Machine learning was the main AI technology applied and serious games the main game-based approach used. However, direct interaction between the technologies occurred only in 3 (19%) of the 16 studies. All included studies provided evidence of very limited quality. From the patients' and healthy individuals' perspective, generally high usability, motivation, and satisfaction were found. Conclusions: The review shows limited quality of evidence for the combination of AI and games in health care. Most of the included studies were nonrandomized pilot studies with few participants (14/16, 88%). This leads to a high risk for a range of biases and limits overall conclusions. However, the first results present a broad scope of possible applications, especially in motion and cognitive impairment, as well as positive perceptions by patients. In the future, the development of adaptive game designs with direct interaction between AI and games seems promising and should be a topic for future reviews.
UR - https://games.jmir.org/2024/1/e48258 UR - http://dx.doi.org/10.2196/48258 UR - http://www.ncbi.nlm.nih.gov/pubmed/38224472 ID - info:doi/10.2196/48258 ER - TY - JOUR AU - Hernández Guillamet, Guillem AU - Morancho Pallaruelo, Ning Ariadna AU - Miró Mezquita, Laura AU - Miralles, Ramón AU - Mas, Àngel Miquel AU - Ulldemolins Papaseit, José María AU - Estrada Cuxart, Oriol AU - López Seguí, Francesc PY - 2023/12/28 TI - Machine Learning Model for Predicting Mortality Risk in Patients With Complex Chronic Conditions: Retrospective Analysis JO - Online J Public Health Inform SP - e52782 VL - 15 KW - machine learning KW - mortality prediction KW - chronicity KW - chronic KW - complex KW - artificial intelligence KW - complexity KW - health data KW - predict KW - prediction KW - predictive KW - mortality KW - death KW - classification KW - algorithm KW - algorithms KW - mortality risk KW - risk prediction N2 - Background: The health care system is undergoing a shift toward a more patient-centered approach for individuals with chronic and complex conditions, which presents a series of challenges, such as predicting hospital needs and optimizing resources. At the same time, the exponential increase in health data availability has made it possible to apply advanced statistics and artificial intelligence techniques to develop decision-support systems and improve resource planning, diagnosis, and patient screening. These methods are key to automating the analysis of large volumes of medical data and reducing professional workloads. Objective: This article aims to present a machine learning model and a case study in a cohort of patients with highly complex conditions. The objective was to predict mortality within the following 4 years and early mortality within 6 months following diagnosis. The method used easily accessible variables and health care resource utilization information.
Methods: A classification algorithm was selected from among 6 models implemented and evaluated using a stratified cross-validation strategy with k=10 and a 70/30 train-test split. The evaluation metrics used included accuracy, recall, precision, F1-score, and area under the receiver operating characteristic curve (AUROC). Results: The model predicted patient death with an 87% accuracy, recall of 87%, precision of 82%, F1-score of 84%, and area under the curve (AUC) of 0.88 using the best model, the Extreme Gradient Boosting (XGBoost) classifier. The results were worse when predicting premature deaths (within 6 months) with an 83% accuracy (recall=55%, precision=64%, F1-score=57%, and AUC=0.88) using the Gradient Boosting (GRBoost) classifier. Conclusions: This study showcases encouraging outcomes in forecasting mortality among patients with intricate and persistent health conditions. The employed variables are conveniently accessible, and the incorporation of health care resource utilization information of the patient, which has not been employed by current state-of-the-art approaches, displays promising predictive power. The proposed prediction model is designed to efficiently identify cases that need customized care and proactively anticipate the demand for critical resources by health care providers.
UR - https://ojphi.jmir.org/2023/1/e52782 UR - http://dx.doi.org/10.2196/52782 UR - http://www.ncbi.nlm.nih.gov/pubmed/38223690 ID - info:doi/10.2196/52782 ER - TY - JOUR AU - Bougeard, Stéphanie AU - Huneau-Salaun, Adeline AU - Attia, Mikael AU - Richard, Jean-Baptiste AU - Demeret, Caroline AU - Platon, Johnny AU - Allain, Virginie AU - Le Vu, Stéphane AU - Goyard, Sophie AU - Gillon, Véronique AU - Bernard-Stoecklin, Sibylle AU - Crescenzo-Chaigne, Bernadette AU - Jones, Gabrielle AU - Rose, Nicolas AU - van der Werf, Sylvie AU - Lantz, Olivier AU - Rose, Thierry AU - Noël, Harold PY - 2023/11/28 TI - Application of Machine Learning Prediction of Individual SARS-CoV-2 Vaccination and Infection Status to the French Serosurveillance Survey From March 2020 to 2022: Cross-Sectional Study JO - JMIR Public Health Surveill SP - e46898 VL - 9 KW - SARS-CoV-2 KW - serological surveillance KW - infection KW - vaccination KW - machine learning KW - seroprevalence KW - blood testing KW - immunity KW - survey KW - vaccine response KW - French population KW - prediction N2 - Background: The seroprevalence of SARS-CoV-2 infection in the French population was estimated with a representative, repeated cross-sectional survey based on residual sera from routine blood testing. These data contained no information on infection or vaccination status, thus limiting the ability to detail changes observed in the immunity level of the population over time. Objective: Our aim is to predict the infected or vaccinated status of individuals in the French serosurveillance survey based only on the results of serological assays. Reference data on longitudinal serological profiles of seronegative, infected, and vaccinated individuals from another French cohort were used to build the predictive model. Methods: A model of individual vaccination or infection status with respect to SARS-CoV-2 obtained from a machine learning procedure was proposed based on 3 complementary serological assays. 
This model was applied to the French nationwide serosurveillance survey from March 2020 to March 2022 to estimate the proportions of the population that were negative, infected, vaccinated, or infected and vaccinated. Results: From February 2021 to March 2022, the estimated percentage of infected and unvaccinated individuals in France increased from 7.5% to 16.8%. During this period, the estimated percentage increased from 3.6% to 45.2% for vaccinated and uninfected individuals and from 2.1% to 29.1% for vaccinated and infected individuals. The decrease in the seronegative population can be largely attributed to vaccination. Conclusions: Combining results from the serosurveillance survey with more complete data from another longitudinal cohort completes the information retrieved from serosurveillance while keeping its protocol simple and easy to implement. UR - https://publichealth.jmir.org/2023/1/e46898 UR - http://dx.doi.org/10.2196/46898 UR - http://www.ncbi.nlm.nih.gov/pubmed/38015594 ID - info:doi/10.2196/46898 ER - TY - JOUR AU - Dou, Xuelin AU - Liu, Yang AU - Liao, Aijun AU - Zhong, Yuping AU - Fu, Rong AU - Liu, Lihong AU - Cui, Canchan AU - Wang, Xiaohong AU - Lu, Jin PY - 2023/11/2 TI - Patient Journey Toward a Diagnosis of Light Chain Amyloidosis in a National Sample: Cross-Sectional Web-Based Study JO - JMIR Form Res SP - e44420 VL - 7 KW - systemic light chain amyloidosis KW - AL amyloidosis KW - rare disease KW - big data KW - network analysis KW - machine model KW - natural language processing KW - web-based N2 - Background: Systemic light chain (AL) amyloidosis is a rare and multisystem disease associated with increased morbidity and a poor prognosis. Delayed diagnoses are common due to the heterogeneity of the symptoms. However, real-world insights from Chinese patients with AL amyloidosis have not been investigated. 
Objective: This study aimed to describe the journey to an AL amyloidosis diagnosis and to build an in-depth understanding of the diagnostic process from the perspective of both clinicians and patients to obtain a correct and timely diagnosis. Methods: Publicly available disease-related content from social media platforms between January 2008 and April 2021 was searched. After performing data collection steps with a machine model, a series of disease-related posts were extracted. Natural language processing was used to identify the relevance of variables, followed by further manual evaluation and analysis. Results: A total of 2204 valid posts related to AL amyloidosis were included in this study, of which 1968 were posted on haodf.com. Of these posts, 1284 were posted by men (median age 57, IQR 46-67 years); 1459 posts mentioned renal-related symptoms, followed by heart (n=833), liver (n=491), and stomach (n=368) symptoms. Furthermore, 1502 posts mentioned symptoms related to 2 or more organs. Symptoms for AL amyloidosis most frequently mentioned by suspected patients were nonspecific weakness (n=252), edema (n=196), hypertrophy (n=168), and swelling (n=140). Multiple physician visits were common, and nephrologists (n=265) and hematologists (n=214) were the most frequently visited specialists by suspected patients for initial consultation. Interhospital referrals were also common and were concentrated in tertiary hospitals. Conclusions: Chinese patients with AL amyloidosis experienced referrals during their journey toward accurate diagnosis. Increasing awareness of the disease and early referral to a specialized center with expertise may reduce delayed diagnosis and improve patient management. 
UR - https://formative.jmir.org/2023/1/e44420 UR - http://dx.doi.org/10.2196/44420 UR - http://www.ncbi.nlm.nih.gov/pubmed/37917132 ID - info:doi/10.2196/44420 ER - TY - JOUR AU - Yang, Liuyang AU - Zhang, Ting AU - Han, Xuan AU - Yang, Jiao AU - Sun, Yanxia AU - Ma, Libing AU - Chen, Jialong AU - Li, Yanming AU - Lai, Shengjie AU - Li, Wei AU - Feng, Luzhao AU - Yang, Weizhong PY - 2023/10/17 TI - Influenza Epidemic Trend Surveillance and Prediction Based on Search Engine Data: Deep Learning Model Study JO - J Med Internet Res SP - e45085 VL - 25 KW - early warning KW - epidemic intelligence KW - infectious disease KW - influenza-like illness KW - surveillance N2 - Background: Influenza outbreaks pose a significant threat to global public health. Traditional surveillance systems and simple algorithms often struggle to predict influenza outbreaks in an accurate and timely manner. Big data and modern technology have offered new modalities for disease surveillance and prediction. Influenza-like illness can serve as a valuable surveillance tool for emerging respiratory infectious diseases like influenza and COVID-19, especially when reported case data may not fully reflect the actual epidemic curve. Objective: This study aimed to develop a predictive model for influenza outbreaks by combining Baidu search query data with traditional virological surveillance data. The goal was to improve early detection and preparedness for influenza outbreaks in both northern and southern China, providing evidence for supplementing modern epidemic intelligence surveillance methods. Methods: We collected virological data from the National Influenza Surveillance Network and Baidu search query data from January 2011 to July 2018, totaling 3,691,865 and 1,563,361 respective samples. Relevant search terms related to influenza were identified and analyzed for their correlation with influenza-positive rates using Pearson correlation analysis. 
A distributed lag nonlinear model was used to assess the lag correlation of the search terms with influenza activity. Subsequently, a predictive model based on the gated recurrent unit and multiple attention mechanisms was developed to forecast the influenza-positive trend. Results: This study revealed a high correlation between specific Baidu search terms and influenza-positive rates in both northern and southern China, except for 1 term. The search terms were categorized into 4 groups: essential facts on influenza, influenza symptoms, influenza treatment and medicine, and influenza prevention, all of which showed correlation with the influenza-positive rate. The influenza prevention and influenza symptom groups had a lag correlation of 1.4-3.2 and 5.0-8.0 days, respectively. The Baidu search terms could help predict the influenza-positive rate 14-22 days in advance in southern China but interfered with influenza surveillance in northern China. Conclusions: Complementing traditional disease surveillance systems with information from web-based data sources can aid in detecting warning signs of influenza outbreaks earlier. However, supplementation of modern surveillance with search engine information should be approached cautiously. This approach provides valuable insights for digital epidemiology and has the potential for broader application in respiratory infectious disease surveillance. Further research should explore the optimization and customization of search terms for different regions and languages to improve the accuracy of influenza prediction models. UR - https://www.jmir.org/2023/1/e45085 UR - http://dx.doi.org/10.2196/45085 UR - http://www.ncbi.nlm.nih.gov/pubmed/37847532 ID - info:doi/10.2196/45085 ER - TY - JOUR AU - Zhou, Weipeng AU - Prater, C. Laura AU - Goldstein, V. Evan AU - Mooney, J. 
Stephen PY - 2023/10/17 TI - Identifying Rare Circumstances Preceding Female Firearm Suicides: Validating A Large Language Model Approach JO - JMIR Ment Health SP - e49359 VL - 10 KW - female firearm suicide KW - large language model KW - document classification KW - suicide prevention KW - suicide KW - firearm suicide KW - machine learning KW - mental health for women KW - violent death KW - mental health KW - language models KW - women KW - female KW - depression KW - suicidal N2 - Background: Firearm suicide has been more prevalent among males, but age-adjusted female firearm suicide rates increased by 20% from 2010 to 2020, outpacing the rate increase among males by about 8 percentage points, and female firearm suicide may have different contributing circumstances. In the United States, the National Violent Death Reporting System (NVDRS) is a comprehensive source of data on violent deaths and includes unstructured incident narrative reports from coroners or medical examiners and law enforcement. Conventional natural language processing approaches have been used to identify common circumstances preceding female firearm suicide deaths but failed to identify rarer circumstances due to insufficient training data. Objective: This study aimed to leverage a large language model approach to identify infrequent circumstances preceding female firearm suicide in the unstructured coroners or medical examiners and law enforcement narrative reports available in the NVDRS. Methods: We used the narrative reports of 1462 female firearm suicide decedents in the NVDRS from 2014 to 2018. The reports were written in English. We coded 9 infrequent circumstances preceding female firearm suicides. We experimented with predicting those circumstances by leveraging a large language model approach in a yes/no question-answer format. We measured the prediction accuracy with F1-score (ranging from 0 to 1). 
F1-score is the harmonic mean of precision (positive predictive value) and recall (true positive rate or sensitivity). Results: Our large language model outperformed a conventional support vector machine-supervised machine learning approach by a wide margin. Compared to the support vector machine model, which had F1-scores less than 0.2 for most infrequent circumstances, our large language model approach achieved an F1-score of over 0.6 for 4 circumstances and 0.8 for 2 circumstances. Conclusions: The use of a large language model approach shows promise. Researchers interested in using natural language processing to identify infrequent circumstances in narrative report data may benefit from large language models. UR - https://mental.jmir.org/2023/1/e49359 UR - http://dx.doi.org/10.2196/49359 UR - http://www.ncbi.nlm.nih.gov/pubmed/37847549 ID - info:doi/10.2196/49359 ER - TY - JOUR AU - Li, Ziyu AU - Wu, Xiaoqian AU - Xu, Lin AU - Liu, Ming AU - Huang, Cheng PY - 2023/9/21 TI - Hot Topic Recognition of Health Rumors Based on Anti-Rumor Articles on the WeChat Official Account Platform: Topic Modeling JO - J Med Internet Res SP - e45019 VL - 25 KW - topic model KW - health rumors KW - social media KW - WeChat official account KW - content analysis KW - public health KW - machine learning KW - Twitter KW - social network KW - misinformation KW - users KW - disease KW - diet N2 - Background: Social networks have become one of the main channels for obtaining health information. However, they have also become a source of health-related misinformation, which seriously threatens the public's physical and mental health. Governance of health-related misinformation can be implemented through topic identification of rumors on social networks. However, little attention has been paid to studying the types and routes of dissemination of health rumors on the internet, especially rumors regarding health-related information in Chinese social media. 
Objective: This study aims to explore the types of health-related misinformation favored by WeChat public platform users and their prevalence trends and to analyze the modeling results of the text by using the Latent Dirichlet Allocation model. Methods: We used a web crawler tool to capture health rumor-dispelling articles on WeChat rumor-dispelling public accounts. We collected information from health-debunking articles posted between January 1, 2016, and August 31, 2022. Following word segmentation of the collected text, a document topic generation model called Latent Dirichlet Allocation was used to identify and generalize the most common topics. The proportion distribution of the themes was calculated, and the negative impact of various health rumors in different periods was analyzed. Additionally, the prevalence of health rumors was analyzed by the number of health rumors generated at each time point. Results: We collected 9366 rumor-refuting articles from January 1, 2016, to August 31, 2022, from WeChat official accounts. Through topic modeling, we divided the health rumors into 8 topics, that is, rumors on prevention and treatment of infectious diseases (1284/9366, 13.71%), disease therapy and its effects (1037/9366, 11.07%), food safety (1243/9366, 13.27%), cancer and its causes (946/9366, 10.10%), regimen and disease (1540/9366, 16.44%), transmission (914/9366, 9.76%), healthy diet (1068/9366, 11.40%), and nutrition and health (1334/9366, 14.24%). Furthermore, we summarized the 8 topics under 4 themes, that is, public health, disease, diet and health, and spread of rumors. Conclusions: Our study shows that topic modeling can provide analysis and insights into health rumor governance. The rumor development trends showed that most rumors were on public health, disease, and diet and health problems. 
Governments still need to implement relevant and comprehensive rumor management strategies based on the rumors prevalent in their countries and formulate appropriate policies. Apart from regulating the content disseminated on social media platforms, the national quality of health education should also be improved. Governance of social networks should be clearly implemented, as these rapidly developed platforms come with privacy issues. Both disseminators and receivers of information should maintain a realistic attitude and disseminate health information correctly. In addition, we recommend that sentiment analysis-related studies be conducted to verify the impact of health rumor-related topics. UR - https://www.jmir.org/2023/1/e45019 UR - http://dx.doi.org/10.2196/45019 UR - http://www.ncbi.nlm.nih.gov/pubmed/37733396 ID - info:doi/10.2196/45019 ER - TY - JOUR AU - Lee, Ji-Soo AU - Lee, Soo-Kyoung PY - 2023/9/12 TI - Identification of Risk Groups for and Factors Affecting Metabolic Syndrome in South Korean Single-Person Households Using Latent Class Analysis and Machine Learning Techniques: Secondary Analysis Study JO - JMIR Form Res SP - e42756 VL - 7 KW - latent class analysis KW - machine learning KW - metabolic syndrome KW - risk factor KW - single-person households N2 - Background: The rapid increase of single-person households in South Korea is leading to an increase in the incidence of metabolic syndrome, which causes cardiovascular and cerebrovascular diseases, due to lifestyle changes. It is necessary to analyze the complex effects of metabolic syndrome risk factors in South Korean single-person households, which differ from one household to another, considering the diversity of single-person households. Objective: This study aimed to identify the factors affecting metabolic syndrome in single-person households using machine learning techniques and categorically characterize the risk factors through latent class analysis (LCA). 
Methods: This cross-sectional study included 10-year secondary data obtained from the National Health and Nutrition Examination Survey (2009-2018). We selected 1371 participants belonging to single-person households. Data were analyzed using SPSS (version 25.0; IBM Corp), Mplus (version 8.0; Muthén & Muthén), and Python (version 3.0; Python Software Foundation). We applied 4 machine learning algorithms (logistic regression, decision tree, random forest, and extreme gradient boosting) to identify important factors and then applied LCA to categorize the risk groups for metabolic syndrome in single-person households. Results: Through LCA, participants were classified into 4 groups (group 1: intense physical activity in early adulthood, group 2: hypertension among middle-aged female respondents, group 3: smoking and drinking among middle-aged male respondents, and group 4: obesity and abdominal obesity among middle-aged respondents). In addition, age, BMI, obesity, subjective body shape recognition, alcohol consumption, smoking, binge drinking frequency, and job type were investigated as common factors that affect metabolic syndrome in single-person households through machine learning techniques. Group 4 was the most susceptible and at-risk group for metabolic syndrome (odds ratio 17.67, 95% CI 14.5-25.3; P<.001), and obesity and abdominal obesity were the most influential risk factors for metabolic syndrome. Conclusions: This study identified risk groups and factors affecting metabolic syndrome in single-person households through machine learning techniques and LCA. Through these findings, customized interventions for each generational risk factor for metabolic syndrome can be implemented, leading to the prevention of metabolic syndrome, which causes cardiovascular and cerebrovascular diseases. 
In conclusion, this study contributes to the prevention of metabolic syndrome in single-person households by providing new insights and priority groups for the development of customized interventions using classification. UR - https://formative.jmir.org/2023/1/e42756 UR - http://dx.doi.org/10.2196/42756 UR - http://www.ncbi.nlm.nih.gov/pubmed/37698907 ID - info:doi/10.2196/42756 ER -