TY - JOUR AU - Gyrard, Amelie AU - Abedian, Somayeh AU - Gribbon, Philip AU - Manias, George AU - van Nuland, Rick AU - Zatloukal, Kurt AU - Nicolae, Emilia Irina AU - Danciu, Gabriel AU - Nechifor, Septimiu AU - Marti-Bonmati, Luis AU - Mallol, Pedro AU - Dalmiani, Stefano AU - Autexier, Serge AU - Jendrossek, Mario AU - Avramidis, Ioannis AU - Garcia Alvarez, Eva AU - Holub, Petr AU - Blanquer, Ignacio AU - Boden, Anna AU - Hussein, Rada PY - 2025/3/24 TI - Lessons Learned From European Health Data Projects With Cancer Use Cases: Implementation of Health Standards and Internet of Things Semantic Interoperability JO - J Med Internet Res SP - e66273 VL - 27 KW - artificial intelligence KW - cancer KW - European Health Data Space KW - health care standards KW - interoperability KW - AI KW - health data KW - cancer use cases KW - IoT KW - Internet of Things KW - primary data KW - diagnosis KW - prognosis KW - decision-making UR - https://www.jmir.org/2025/1/e66273 UR - http://dx.doi.org/10.2196/66273 UR - http://www.ncbi.nlm.nih.gov/pubmed/40126534 ID - info:doi/10.2196/66273 ER - TY - JOUR AU - Miletic, Marko AU - Sariyar, Murat PY - 2025/3/20 TI - Utility-based Analysis of Statistical Approaches and Deep Learning Models for Synthetic Data Generation With Focus on Correlation Structures: Algorithm Development and Validation JO - JMIR AI SP - e65729 VL - 4 KW - synthetic data generation KW - medical data synthesis KW - random forests KW - simulation study KW - deep learning KW - propensity score mean-squared error N2 - Background: Recent advancements in Generative Adversarial Networks and large language models (LLMs) have significantly advanced the synthesis and augmentation of medical data. These and other deep learning–based methods offer promising potential for generating high-quality, realistic datasets crucial for improving machine learning applications in health care, particularly in contexts where data privacy and availability are limiting factors. 
However, challenges remain in accurately capturing the complex associations inherent in medical datasets. Objective: This study evaluates the effectiveness of various Synthetic Data Generation (SDG) methods in replicating the correlation structures inherent in real medical datasets. In addition, it examines their performance in downstream tasks using Random Forests (RFs) as the benchmark model. To provide a comprehensive analysis, alternative models such as eXtreme Gradient Boosting and Gated Additive Tree Ensembles are also considered. We compare the following SDG approaches: Synthetic Populations in R (synthpop), copula, copulagan, Conditional Tabular Generative Adversarial Network (ctgan), tabular variational autoencoder (tvae), and tabula for LLMs. Methods: We evaluated synthetic data generation methods using both real-world and simulated datasets. Simulated data consist of 10 Gaussian variables and one binary target variable with varying correlation structures, generated via Cholesky decomposition. Real-world datasets include the body performance dataset with 13,393 samples for fitness classification, the Wisconsin Breast Cancer dataset with 569 samples for tumor diagnosis, and the diabetes dataset with 768 samples for diabetes prediction. Data quality is evaluated by comparing correlation matrices, the propensity score mean-squared error (pMSE) for general utility, and F1-scores for downstream tasks as a specific utility metric, using training on synthetic data and testing on real data. Results: Our simulation study, supplemented with real-world data analyses, shows that the statistical methods copula and synthpop consistently outperform deep learning approaches across various sample sizes and correlation complexities, with synthpop being the most effective. Deep learning methods, including LLMs, show mixed performance, particularly with smaller datasets or limited training epochs. LLMs often struggle to replicate numerical dependencies effectively. 
In contrast, methods like tvae with 10,000 epochs perform comparably well. On the body performance dataset, copulagan achieves the best performance in terms of pMSE. The results also highlight that model utility depends more on the relative correlations between features and the target variable than on the absolute magnitude of correlation matrix differences. Conclusions: Statistical methods, particularly synthpop, demonstrate superior robustness and utility preservation for synthetic tabular data compared with deep learning approaches. Copula methods show potential but face limitations with integer variables. Deep learning methods underperform in this context. Overall, these findings underscore the dominance of statistical methods for synthetic data generation for tabular data, while highlighting the niche potential of deep learning approaches for highly complex datasets, provided adequate resources and tuning. UR - https://ai.jmir.org/2025/1/e65729 UR - http://dx.doi.org/10.2196/65729 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/65729 ER - TY - JOUR AU - Kamis, Arnold AU - Gadia, Nidhi AU - Luo, Zilin AU - Ng, Xin Shu AU - Thumbar, Mansi PY - 2024/8/29 TI - Obtaining the Most Accurate, Explainable Model for Predicting Chronic Obstructive Pulmonary Disease: Triangulation of Multiple Linear Regression and Machine Learning Methods JO - JMIR AI SP - e58455 VL - 3 KW - chronic obstructive pulmonary disease KW - COPD KW - cigarette smoking KW - ethnic and racial differences KW - machine learning KW - multiple linear regression KW - household income KW - practical model N2 - Background: Lung disease is a severe problem in the United States. Despite the decreasing rates of cigarette smoking, chronic obstructive pulmonary disease (COPD) continues to be a health burden in the United States. In this paper, we focus on COPD in the United States from 2016 to 2019. 
Objective: We gathered a diverse set of non–personally identifiable information from public data sources to better understand and predict COPD rates at the core-based statistical area (CBSA) level in the United States. Our objective was to compare linear models with machine learning models to obtain the most accurate and interpretable model of COPD. Methods: We integrated non–personally identifiable information from multiple Centers for Disease Control and Prevention sources and used them to analyze COPD with different types of methods. We included cigarette smoking, a well-known contributing factor, and race/ethnicity because health disparities among different races and ethnicities in the United States are also well known. The models also included the air quality index, education, employment, and economic variables. We fitted models with both multiple linear regression and machine learning methods. Results: The most accurate multiple linear regression model has variance explained of 81.1%, mean absolute error of 0.591, and symmetric mean absolute percentage error of 9.666. The most accurate machine learning model has variance explained of 85.7%, mean absolute error of 0.456, and symmetric mean absolute percentage error of 6.956. Overall, cigarette smoking and household income are the strongest predictor variables. Moderately strong predictors include education level and unemployment level, as well as American Indian or Alaska Native, Black, and Hispanic population percentages, all measured at the CBSA level. Conclusions: This research highlights the importance of using diverse data sources as well as multiple methods to understand and predict COPD. The most accurate model was a gradient boosted tree, which captured nonlinearities in a model whose accuracy is superior to the best multiple linear regression. 
Our interpretable models suggest ways that individual predictor variables can be used in tailored interventions aimed at decreasing COPD rates in specific demographic and ethnographic communities. Gaps in understanding the health impacts of poor air quality, particularly in relation to climate change, suggest a need for further research to design interventions and improve public health. UR - https://ai.jmir.org/2024/1/e58455 UR - http://dx.doi.org/10.2196/58455 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/58455 ER - TY - JOUR AU - Lee, Kyeryoung AU - Liu, Zongzhi AU - Mai, Yun AU - Jun, Tomi AU - Ma, Meng AU - Wang, Tongyu AU - Ai, Lei AU - Calay, Ediz AU - Oh, William AU - Stolovitzky, Gustavo AU - Schadt, Eric AU - Wang, Xiaoyan PY - 2024/7/29 TI - Optimizing Clinical Trial Eligibility Design Using Natural Language Processing Models and Real-World Data: Algorithm Development and Validation JO - JMIR AI SP - e50800 VL - 3 KW - natural language processing KW - real-world data KW - clinical trial eligibility criteria KW - eligibility criteria–specific ontology KW - clinical trial protocol optimization KW - data-driven approach N2 - Background: Clinical trials are vital for developing new therapies but can also delay drug development. Efficient trial data management, optimized trial protocol, and accurate patient identification are critical for reducing trial timelines. Natural language processing (NLP) has the potential to achieve these objectives. Objective: This study aims to assess the feasibility of using data-driven approaches to optimize clinical trial protocol design and identify eligible patients. This involves creating a comprehensive eligibility criteria knowledge base integrated within electronic health records using deep learning–based NLP techniques. 
Methods: We obtained data on 3281 industry-sponsored phase 2 or 3 interventional clinical trials recruiting patients with non–small cell lung cancer, prostate cancer, breast cancer, multiple myeloma, ulcerative colitis, and Crohn disease from ClinicalTrials.gov, spanning the period between 2013 and 2020. A customized bidirectional long short-term memory– and conditional random field–based NLP pipeline was used to extract all eligibility criteria attributes and convert hypernym concepts into computable hyponyms along with their corresponding values. To illustrate the simulation of clinical trial design for optimization purposes, we selected a subset of patients with non–small cell lung cancer (n=2775), curated from the Mount Sinai Health System, as a pilot study. Results: We manually annotated the clinical trial eligibility corpus (485/3281, 14.78% trials) and constructed an eligibility criteria–specific ontology. Our customized NLP pipeline, developed based on the eligibility criteria–specific ontology that we created through manual annotation, achieved high precision (0.91, range 0.67-1.00) and recall (0.79, range 0.50-1.00) scores, as well as a high F1-score (0.83, range 0.67-1.00), enabling the efficient extraction of granular criteria entities and relevant attributes from 3281 clinical trials. A standardized eligibility criteria knowledge base, compatible with electronic health records, was developed by transforming hypernym concepts into machine-interpretable hyponyms along with their corresponding values. In addition, an interface prototype demonstrated the practicality of leveraging real-world data for optimizing clinical trial protocols and identifying eligible patients. Conclusions: Our customized NLP pipeline successfully generated a standardized eligibility criteria knowledge base by transforming hypernym criteria into machine-readable hyponyms along with their corresponding values. 
A prototype interface integrating real-world patient information allows us to assess the impact of each eligibility criterion on the number of patients eligible for the trial. Leveraging NLP and real-world data in a data-driven approach holds promise for streamlining the overall clinical trial process and improving efficiency in patient identification. UR - https://ai.jmir.org/2024/1/e50800 UR - http://dx.doi.org/10.2196/50800 UR - http://www.ncbi.nlm.nih.gov/pubmed/39073872 ID - info:doi/10.2196/50800 ER - TY - JOUR AU - Baronetto, Annalisa AU - Graf, Luisa AU - Fischer, Sarah AU - Neurath, F. Markus AU - Amft, Oliver PY - 2024/7/10 TI - Multiscale Bowel Sound Event Spotting in Highly Imbalanced Wearable Monitoring Data: Algorithm Development and Validation Study JO - JMIR AI SP - e51118 VL - 3 KW - bowel sound KW - deep learning KW - event spotting KW - wearable sensors N2 - Background: Abdominal auscultation (i.e., listening to bowel sounds [BSs]) can be used to analyze digestion. An automated retrieval of BS would be beneficial to assess gastrointestinal disorders noninvasively. Objective: This study aims to develop a multiscale spotting model to detect BSs in continuous audio data from a wearable monitoring system. Methods: We designed a spotting model based on the Efficient-U-Net (EffUNet) architecture to analyze 10-second audio segments at a time and spot BSs with a temporal resolution of 25 ms. Evaluation data were collected across different digestive phases from 18 healthy participants and 9 patients with inflammatory bowel disease (IBD). Audio data were recorded in a daytime setting with a smart T-shirt that embeds digital microphones. The data set was annotated by independent raters with substantial agreement (Cohen κ between 0.70 and 0.75), resulting in 136 hours of labeled data. In total, 11,482 BSs were analyzed, with a BS duration ranging between 18 ms and 6.3 seconds. The share of BSs in the data set (BS ratio) was 0.0089. 
We analyzed the performance depending on noise level, BS duration, and BS event rate. We also report spotting timing errors. Results: Leave-one-participant-out cross-validation of BS event spotting yielded a median F1-score of 0.73 for both healthy volunteers and patients with IBD. EffUNet detected BSs under different noise conditions with 0.73 recall and 0.72 precision. In particular, for a signal-to-noise ratio over 4 dB, more than 83% of BSs were recognized, with precision of 0.77 or more. EffUNet recall dropped below 0.60 for BS duration of 1.5 seconds or less. At a BS ratio greater than 0.05, the precision of our model was over 0.83. For both healthy participants and patients with IBD, insertion and deletion timing errors were the largest, with a total of 15.54 minutes of insertion errors and 13.08 minutes of deletion errors over the total audio data set. On our data set, EffUNet outperformed existing BS spotting models that provide similar temporal resolution. Conclusions: The EffUNet spotter is robust against background noise and can retrieve BSs with varying duration. EffUNet outperforms previous BS detection approaches in unmodified audio data, containing highly sparse BS events. 
UR - https://ai.jmir.org/2024/1/e51118 UR - http://dx.doi.org/10.2196/51118 UR - http://www.ncbi.nlm.nih.gov/pubmed/38985504 ID - info:doi/10.2196/51118 ER - TY - JOUR AU - Mullick, Tahsin AU - Shaaban, Sam AU - Radovic, Ana AU - Doryab, Afsaneh PY - 2024/5/20 TI - Framework for Ranking Machine Learning Predictions of Limited, Multimodal, and Longitudinal Behavioral Passive Sensing Data: Combining User-Agnostic and Personalized Modeling JO - JMIR AI SP - e47805 VL - 3 KW - machine learning KW - AI KW - artificial intelligence KW - passive sensing KW - ranking framework KW - small health data set KW - ranking KW - algorithm KW - algorithms KW - sensor KW - multimodal KW - predict KW - prediction KW - agnostic KW - framework KW - validation KW - data set N2 - Background: Passive mobile sensing provides opportunities for measuring and monitoring health status in the wild and outside of clinics. However, longitudinal, multimodal mobile sensor data can be small, noisy, and incomplete. This makes processing, modeling, and prediction of these data challenging. The small size of the data set restricts it from being modeled using complex deep learning networks. The current state of the art (SOTA) tackles small sensor data sets following a singular modeling paradigm based on traditional machine learning (ML) algorithms. These opt for either a user-agnostic modeling approach, which makes the model susceptible to a larger degree of noise, or a personalized approach, in which training on individual data yields an even more limited data set and gives rise to overfitting; ultimately, a trade-off must be made by choosing 1 of the 2 modeling approaches to reach predictions. 
Objective: The objective of this study was to filter, rank, and output the best predictions for small, multimodal, longitudinal sensor data using a framework that is designed to tackle data sets that are limited in size (particularly targeting health studies that use passive multimodal sensors) and that combines both user-agnostic and personalized approaches, along with a combination of ranking strategies to filter predictions. Methods: In this paper, we introduced a novel ranking framework for longitudinal multimodal sensors (FLMS) to address challenges encountered in health studies involving passive multimodal sensors. Using the FLMS, we (1) built a tensor-based aggregation and ranking strategy for final interpretation, (2) processed various combinations of sensor fusions, and (3) balanced user-agnostic and personalized modeling approaches with appropriate cross-validation strategies. The performance of the FLMS was validated with the help of a real data set of adolescents diagnosed with major depressive disorder for the prediction of change in depression in the adolescent participants. Results: Predictions output by the proposed FLMS achieved a 7% increase in accuracy and a 13% increase in recall for the real data set. Experiments with existing SOTA ML algorithms showed an 11% increase in accuracy for the depression data set and how overfitting and sparsity were handled. Conclusions: The FLMS aims to fill the gap that currently exists when modeling passive sensor data with a small number of data points. It achieves this through leveraging both user-agnostic and personalized modeling techniques in tandem with an effective ranking strategy to filter predictions. 
UR - https://ai.jmir.org/2024/1/e47805 UR - http://dx.doi.org/10.2196/47805 UR - http://www.ncbi.nlm.nih.gov/pubmed/38875667 ID - info:doi/10.2196/47805 ER - TY - JOUR AU - Hammoud, Mohammad AU - Douglas, Shahd AU - Darmach, Mohamad AU - Alawneh, Sara AU - Sanyal, Swapnendu AU - Kanbour, Youssef PY - 2024/4/29 TI - Evaluating the Diagnostic Performance of Symptom Checkers: Clinical Vignette Study JO - JMIR AI SP - e46875 VL - 3 KW - digital health KW - symptom checker KW - artificial intelligence KW - AI KW - patient-centered care KW - eHealth apps KW - eHealth N2 - Background: Medical self-diagnostic tools (or symptom checkers) are becoming an integral part of digital health and our daily lives, whereby patients are increasingly using them to identify the underlying causes of their symptoms. As such, it is essential to rigorously investigate and comprehensively report the diagnostic performance of symptom checkers using standard clinical and scientific approaches. Objective: This study aims to evaluate and report the accuracies of a few known and new symptom checkers using a standard and transparent methodology, which allows the scientific community to cross-validate and reproduce the reported results, a step much needed in health informatics. Methods: We propose a 4-stage experimentation methodology that capitalizes on the standard clinical vignette approach to evaluate 6 symptom checkers. To this end, we developed and peer-reviewed 400 vignettes, each approved by at least 5 out of 7 independent and experienced primary care physicians. To establish a frame of reference and interpret the results of symptom checkers accordingly, we further compared the best-performing symptom checker against 3 primary care physicians with an average experience of 16.6 (SD 9.42) years. 
To measure accuracy, we used 7 standard metrics, including M1 as a measure of a symptom checker's or a physician's ability to return a vignette's main diagnosis at the top of their differential list, F1-score as a trade-off measure between recall and precision, and Normalized Discounted Cumulative Gain (NDCG) as a measure of a differential list's ranking quality, among others. Results: The diagnostic accuracies of the 6 tested symptom checkers vary significantly. For instance, the differences in the M1, F1-score, and NDCG results between the best-performing and worst-performing symptom checkers or ranges were 65.3%, 39.2%, and 74.2%, respectively. The same was observed among the participating human physicians, whereby the M1, F1-score, and NDCG ranges were 22.8%, 15.3%, and 21.3%, respectively. When compared against each other, physicians outperformed the best-performing symptom checker by an average of 1.2% using F1-score, whereas the best-performing symptom checker outperformed physicians by averages of 10.2% and 25.1% using M1 and NDCG, respectively. Conclusions: The performance variation between symptom checkers is substantial, suggesting that symptom checkers cannot be treated as a single entity. On a different note, the best-performing symptom checker was an artificial intelligence (AI)–based one, shedding light on the promise of AI in improving the diagnostic capabilities of symptom checkers, especially as AI keeps advancing exponentially. 
UR - https://ai.jmir.org/2024/1/e46875 UR - http://dx.doi.org/10.2196/46875 UR - http://www.ncbi.nlm.nih.gov/pubmed/38875676 ID - info:doi/10.2196/46875 ER - TY - JOUR AU - Yan, Chao AU - Zhang, Ziqi AU - Nyemba, Steve AU - Li, Zhuohang PY - 2024/4/22 TI - Generating Synthetic Electronic Health Record Data Using Generative Adversarial Networks: Tutorial JO - JMIR AI SP - e52615 VL - 3 KW - synthetic data generation KW - electronic health record KW - generative neural networks KW - tutorial UR - https://ai.jmir.org/2024/1/e52615 UR - http://dx.doi.org/10.2196/52615 UR - http://www.ncbi.nlm.nih.gov/pubmed/38875595 ID - info:doi/10.2196/52615 ER - TY - JOUR AU - Späth, Julian AU - Sewald, Zeno AU - Probul, Niklas AU - Berland, Magali AU - Almeida, Mathieu AU - Pons, Nicolas AU - Le Chatelier, Emmanuelle AU - Ginès, Pere AU - Solé, Cristina AU - Juanola, Adrià AU - Pauling, Josch AU - Baumbach, Jan PY - 2024/3/29 TI - Privacy-Preserving Federated Survival Support Vector Machines for Cross-Institutional Time-To-Event Analysis: Algorithm Development and Validation JO - JMIR AI SP - e47652 VL - 3 KW - federated learning KW - survival analysis KW - support vector machine KW - machine learning KW - federated KW - algorithm KW - survival KW - FeatureCloud KW - predict KW - predictive KW - prediction KW - predictions KW - Implementation science KW - Implementation KW - centralized model KW - privacy regulation N2 - Background: Central collection of distributed medical patient data is problematic due to strict privacy regulations. Especially in clinical environments, such as clinical time-to-event studies, large sample sizes are critical but usually not available at a single institution. It has been shown recently that federated learning, combined with privacy-enhancing technologies, is an excellent and privacy-preserving alternative to data sharing. 
Objective: This study aims to develop and validate a privacy-preserving, federated survival support vector machine (SVM) and make it accessible for researchers to perform cross-institutional time-to-event analyses. Methods: We extended the survival SVM algorithm to be applicable in federated environments. We further implemented it as a FeatureCloud app, enabling it to run in the federated infrastructure provided by the FeatureCloud platform. Finally, we evaluated our algorithm on 3 benchmark data sets, a large sample size synthetic data set, and a real-world microbiome data set and compared the results to the corresponding central method. Results: Our federated survival SVM produces highly similar results to the centralized model on all data sets. The maximal difference between the model weights of the central model and the federated model was only 0.001, and the mean difference over all data sets was 0.0002. We further show that by including more data in the analysis through federated learning, predictions are more accurate even in the presence of site-dependent batch effects. Conclusions: The federated survival SVM extends the palette of federated time-to-event analysis methods by a robust machine learning approach. To our knowledge, the implemented FeatureCloud app is the first publicly available implementation of a federated survival SVM, is freely accessible for all kinds of researchers, and can be directly used within the FeatureCloud platform. UR - https://ai.jmir.org/2024/1/e47652 UR - http://dx.doi.org/10.2196/47652 UR - http://www.ncbi.nlm.nih.gov/pubmed/38875678 ID - info:doi/10.2196/47652 ER - TY - JOUR AU - Ewals, S. Lotte J. AU - Heesterbeek, J. Lynn J. AU - Yu, Bin AU - van der Wulp, Kasper AU - Mavroeidis, Dimitrios AU - Funk, Mathias AU - Snijders, P. Chris C. AU - Jacobs, Igor AU - Nederend, Joost AU - Pluyter, R. Jon AU - PY - 2024/3/13 TI - The Impact of Expectation Management and Model Transparency on Radiologists' 
Trust and Utilization of AI Recommendations for Lung Nodule Assessment on Computed Tomography: Simulated Use Study JO - JMIR AI SP - e52211 VL - 3 KW - application KW - artificial intelligence KW - AI KW - computer-aided detection or diagnosis KW - CAD KW - design KW - human centered KW - human computer interaction KW - HCI KW - interaction KW - mental model KW - radiologists KW - trust N2 - Background: Many promising artificial intelligence (AI) and computer-aided detection and diagnosis systems have been developed, but few have been successfully integrated into clinical practice. This is partially owing to a lack of user-centered design of AI-based computer-aided detection or diagnosis (AI-CAD) systems. Objective: We aimed to assess the impact of different onboarding tutorials and levels of AI model explainability on radiologists' trust in AI and the use of AI recommendations in lung nodule assessment on computed tomography (CT) scans. Methods: In total, 20 radiologists from 7 Dutch medical centers performed lung nodule assessment on CT scans under different conditions in a simulated use study as part of a 2×2 repeated-measures quasi-experimental design. Two types of AI onboarding tutorials (reflective vs informative) and 2 levels of AI output (black box vs explainable) were designed. The radiologists first received an onboarding tutorial that was either informative or reflective. Subsequently, each radiologist assessed 7 CT scans, first without AI recommendations. AI recommendations were shown to the radiologist, and they could adjust their initial assessment. Half of the participants received the recommendations via black box AI output and half received explainable AI output. Mental model and psychological trust were measured before onboarding, after onboarding, and after assessing the 7 CT scans. We recorded whether radiologists changed their assessment on found nodules, malignancy prediction, and follow-up advice for each CT assessment. 
In addition, we analyzed whether radiologists' trust in their assessments had changed based on the AI recommendations. Results: Both variations of onboarding tutorials resulted in a significantly improved mental model of the AI-CAD system (informative P=.01 and reflective P=.01). After using AI-CAD, psychological trust significantly decreased for the group with explainable AI output (P=.02). On the basis of the AI recommendations, radiologists changed the number of reported nodules in 27 of 140 assessments, malignancy prediction in 32 of 140 assessments, and follow-up advice in 12 of 140 assessments. The changes were mostly an increased number of reported nodules, a higher estimated probability of malignancy, and earlier follow-up. The radiologists' confidence in their found nodules changed in 82 of 140 assessments, in their estimated probability of malignancy in 50 of 140 assessments, and in their follow-up advice in 28 of 140 assessments. These changes were predominantly increases in confidence. The number of changed assessments and radiologists' confidence did not significantly differ between the groups that received different onboarding tutorials and AI outputs. Conclusions: Onboarding tutorials help radiologists gain a better understanding of AI-CAD and facilitate the formation of a correct mental model. If AI explanations do not consistently substantiate the probability of malignancy across patient cases, radiologists' trust in the AI-CAD system can be impaired. Radiologists' confidence in their assessments was improved by using the AI recommendations. UR - https://ai.jmir.org/2024/1/e52211 UR - http://dx.doi.org/10.2196/52211 UR - http://www.ncbi.nlm.nih.gov/pubmed/38875574 ID - info:doi/10.2196/52211 ER - TY - JOUR AU - Brann, Felix AU - Sterling, William Nicholas AU - Frisch, O. Stephanie AU - Schrager, D. 
Justin PY - 2024/1/25 TI - Sepsis Prediction at Emergency Department Triage Using Natural Language Processing: Retrospective Cohort Study JO - JMIR AI SP - e49784 VL - 3 KW - natural language processing KW - machine learning KW - sepsis KW - emergency department KW - triage N2 - Background: Despite its high lethality, sepsis can be difficult to detect on initial presentation to the emergency department (ED). Machine learning–based tools may provide avenues for earlier detection and lifesaving intervention. Objective: The study aimed to predict sepsis at the time of ED triage using natural language processing of nursing triage notes and available clinical data. Methods: We constructed a retrospective cohort of all 1,234,434 consecutive ED encounters in 2015-2021 from 4 separate clinically heterogeneous academically affiliated EDs. After exclusion criteria were applied, the final cohort included 1,059,386 adult ED encounters. The primary outcome criteria for sepsis were presumed severe infection and acute organ dysfunction. After vectorization and dimensional reduction of triage notes and clinical data available at triage, a decision tree–based ensemble (time-of-triage) model was trained to predict sepsis using the training subset (n=950,921). A separate (comprehensive) model was trained using these data and laboratory data, as it became available at 1-hour intervals, after triage. Model performances were evaluated using the test (n=108,465) subset. Results: Sepsis occurred in 35,318 encounters (incidence 3.45%). For sepsis prediction at the time of patient triage, using the primary definition, the area under the receiver operating characteristic curve (AUC) and macro F1-score for sepsis were 0.94 and 0.61, respectively. Sensitivity, specificity, and false positive rate were 0.87, 0.85, and 0.15, respectively. 
The time-of-triage model accurately predicted sepsis in 76% (1635/2150) of sepsis cases where sepsis screening was not initiated at triage and 97.5% (1630/1671) of cases where sepsis screening was initiated at triage. Positive and negative predictive values were 0.18 and 0.99, respectively. For sepsis prediction using laboratory data available each hour after ED arrival, the AUC peaked at 0.97 at 12 hours. Similar results were obtained when stratifying by hospital and when Centers for Disease Control and Prevention hospital toolkit for adult sepsis surveillance criteria were used to define sepsis. Among septic cases, sepsis was predicted in 36.1% (1375/3814), 49.9% (1902/3814), and 68.3% (2604/3814) of encounters, respectively, at 3, 2, and 1 hours prior to the first intravenous antibiotic order or where antibiotics were not ordered within the first 12 hours. Conclusions: Sepsis can accurately be predicted at ED presentation using nursing triage notes and clinical information available at the time of triage. This indicates that machine learning can facilitate timely and reliable alerting for intervention. Free-text data can improve the performance of predictive modeling at the time of triage and throughout the ED course. UR - https://ai.jmir.org/2024/1/e49784 UR - http://dx.doi.org/10.2196/49784 UR - http://www.ncbi.nlm.nih.gov/pubmed/38875594 ID - info:doi/10.2196/49784 ER - TY - JOUR AU - Karpathakis, Kassandra AU - Pencheon, Emma AU - Cushnan, Dominic PY - 2024/1/4 TI - Learning From International Comparators of National Medical Imaging Initiatives for AI Development: Multiphase Qualitative Study JO - JMIR AI SP - e51168 VL - 3 KW - digital health KW - mobile health KW - mHealth KW - medical imaging KW - artificial intelligence KW - health policy N2 - Background: The COVID-19 pandemic drove investment and research into medical imaging platforms to provide data to create artificial intelligence (AI) algorithms for the management of patients with COVID-19. 
Building on the success of England's National COVID-19 Chest Imaging Database, the national digital policy body (NHSX) sought to create a generalized national medical imaging platform for the development, validation, and deployment of algorithms. Objective: This study aims to understand international use cases of medical imaging platforms for the development and implementation of algorithms to inform the creation of England's national imaging platform. Methods: The National Health Service (NHS) AI Lab Policy and Strategy Team adopted a multiphased approach: (1) identification and prioritization of national AI imaging platforms; (2) Political, Economic, Social, Technological, Legal, and Environmental (PESTLE) factor analysis deep dive into national AI imaging platforms; (3) semistructured interviews with key stakeholders; (4) workshop on emerging themes and insights with the internal NHSX team; and (5) formulation of policy recommendations. Results: International use cases of national AI imaging platforms (n=7) were prioritized for PESTLE factor analysis. Stakeholders (n=13) from the international use cases were interviewed. Themes (n=8) from the semistructured interviews, including interview quotes, were analyzed with workshop participants (n=5). The outputs of the deep dives, interviews, and workshop were synthesized thematically into 8 categories with 17 subcategories. On the basis of the insights from the international use cases, policy recommendations (n=12) were developed to support the NHS AI Lab in the design and development of the English national medical imaging platform. Conclusions: The creation of AI algorithms supporting technology and infrastructure such as platforms often occurs in isolation within countries, let alone between countries. This novel policy research project sought to bridge the gap by learning from the challenges, successes, and experience of England's international counterparts. 
Policy recommendations based on international learnings focused on the demonstrable benefits of the platform to secure sustainable funding, validation of algorithms and infrastructure to support in situ deployment, and creating wraparound tools for nontechnical participants such as clinicians to engage with algorithm creation. As health care organizations increasingly adopt technological solutions, policy makers have a responsibility to ensure that initiatives are informed by learnings from both national and international initiatives as well as disseminating the outcomes of their work. UR - https://ai.jmir.org/2024/1/e51168 UR - http://dx.doi.org/10.2196/51168 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/51168 ER - TY - JOUR AU - Kendale, Samir AU - Bishara, Andrew AU - Burns, Michael AU - Solomon, Stuart AU - Corriere, Matthew AU - Mathis, Michael PY - 2023/9/8 TI - Machine Learning for the Prediction of Procedural Case Durations Developed Using a Large Multicenter Database: Algorithm Development and Validation Study JO - JMIR AI SP - e44909 VL - 2 KW - medical informatics KW - artificial intelligence KW - AI KW - machine learning KW - operating room KW - OR management KW - perioperative KW - algorithm development KW - validation KW - patient communication KW - surgical procedure KW - prediction model N2 - Background: Accurate projections of procedural case durations are complex but critical to the planning of perioperative staffing, operating room resources, and patient communication. Nonlinear prediction models using machine learning methods may provide opportunities for hospitals to improve upon current estimates of procedure duration. Objective: The aim of this study was to determine whether a machine learning algorithm scalable across multiple centers could make estimations of case duration within a tolerance limit because there are substantial resources required for operating room functioning that relate to case duration. 
Methods: Deep learning, gradient boosting, and ensemble machine learning models were generated using perioperative data available at 3 distinct time points: the time of scheduling, the time of patient arrival to the operating or procedure room (primary model), and the time of surgical incision or procedure start. The primary outcome was procedure duration, defined by the time between the arrival and the departure of the patient from the procedure room. Model performance was assessed by mean absolute error (MAE), the proportion of predictions falling within 20% of the actual duration, and other standard metrics. Performance was compared with a baseline method of historical means within a linear regression model. Model features driving predictions were assessed using Shapley additive explanations values and permutation feature importance. Results: A total of 1,177,893 procedures from 13 academic and private hospitals between 2016 and 2019 were used. Across all procedures, the median procedure duration was 94 (IQR 50-167) minutes. In estimating the procedure duration, the gradient boosting machine was the best-performing model, demonstrating an MAE of 34 (SD 47) minutes, with 46% of the predictions falling within 20% of the actual duration in the test data set. This represented a statistically and clinically significant improvement in predictions compared with a baseline linear regression model (MAE 43 min; P<.001; 39% of the predictions falling within 20% of the actual duration). The most important features in model training were historical procedure duration by surgeon, the word "free" within the procedure text, and the time of day. Conclusions: Nonlinear models using machine learning techniques may be used to generate high-performing, automatable, explainable, and scalable prediction models for procedure duration. 
UR - https://ai.jmir.org/2023/1/e44909 UR - http://dx.doi.org/10.2196/44909 UR - http://www.ncbi.nlm.nih.gov/pubmed/ ID - info:doi/10.2196/44909 ER - TY - JOUR AU - Sorbello, Alfred AU - Haque, Arefinul Syed AU - Hasan, Rashedul AU - Jermyn, Richard AU - Hussein, Ahmad AU - Vega, Alex AU - Zembrzuski, Krzysztof AU - Ripple, Anna AU - Ahadpour, Mitra PY - 2023/7/18 TI - Artificial Intelligence–Enabled Software Prototype to Inform Opioid Pharmacovigilance From Electronic Health Records: Development and Usability Study JO - JMIR AI SP - e45000 VL - 2 KW - electronic health records KW - pharmacovigilance KW - artificial intelligence KW - real world data KW - EHR KW - natural language KW - software application KW - drug KW - Food and Drug Administration KW - deep learning N2 - Background: The use of patient health and treatment information captured in structured and unstructured formats in computerized electronic health record (EHR) repositories could potentially augment the detection of safety signals for drug products regulated by the US Food and Drug Administration (FDA). Natural language processing and other artificial intelligence (AI) techniques provide novel methodologies that could be leveraged to extract clinically useful information from EHR resources. Objective: Our aim is to develop a novel AI-enabled software prototype to identify adverse drug event (ADE) safety signals from free-text discharge summaries in EHRs to enhance opioid drug safety and research activities at the FDA. Methods: We developed a prototype for web-based software that leverages keyword and trigger-phrase searching with rule-based algorithms and deep learning to extract candidate ADEs for specific opioid drugs from discharge summaries in the Medical Information Mart for Intensive Care III (MIMIC III) database. 
The prototype uses MedSpacy components to identify relevant sections of discharge summaries and a pretrained natural language processing (NLP) model, Spark NLP for Healthcare, for named entity recognition. Fifteen FDA staff members provided feedback on the prototype's features and functionalities. Results: Using the prototype, we were able to identify known, labeled, opioid-related adverse drug reactions from text in EHRs. The AI-enabled model achieved accuracy, recall, precision, and F1-scores of 0.66, 0.69, 0.64, and 0.67, respectively. FDA participants assessed the prototype as highly desirable in user satisfaction, visualizations, and in the potential to support drug safety signal detection for opioid drugs from EHR data while saving time and manual effort. Actionable design recommendations included (1) enlarging the tabs and visualizations; (2) enabling more flexibility and customizations to fit end users' individual needs; (3) providing additional instructional resources; (4) adding multiple graph export functionality; and (5) adding project summaries. Conclusions: The novel prototype uses innovative AI-based techniques to automate searching for, extracting, and analyzing clinically useful information captured in unstructured text in EHRs. It increases efficiency in harnessing real-world data for opioid drug safety and increases the usability of the data to support regulatory review while decreasing the manual research burden. 
UR - https://ai.jmir.org/2023/1/e45000 UR - http://dx.doi.org/10.2196/45000 UR - http://www.ncbi.nlm.nih.gov/pubmed/37771410 ID - info:doi/10.2196/45000 ER - TY - JOUR AU - Naseri, Hossein AU - Skamene, Sonia AU - Tolba, Marwan AU - Faye, Daro Mame AU - Ramia, Paul AU - Khriguian, Julia AU - David, Marc AU - Kildea, John PY - 2023/5/22 TI - A Scalable Radiomics- and Natural Language Processing–Based Machine Learning Pipeline to Distinguish Between Painful and Painless Thoracic Spinal Bone Metastases: Retrospective Algorithm Development and Validation Study JO - JMIR AI SP - e44779 VL - 2 KW - cancer KW - pain KW - palliative care KW - radiotherapy KW - bone metastases KW - radiomics KW - natural language processing KW - machine learning KW - artificial intelligence KW - radiation therapy N2 - Background: The identification of objective pain biomarkers can contribute to an improved understanding of pain, as well as its prognosis and better management. Hence, it has the potential to improve the quality of life of patients with cancer. Artificial intelligence can aid in the extraction of objective pain biomarkers for patients with cancer with bone metastases (BMs). Objective: This study aimed to develop and evaluate a scalable natural language processing (NLP)- and radiomics-based machine learning pipeline to differentiate between painless and painful BM lesions in simulation computed tomography (CT) images using imaging features (biomarkers) extracted from lesion center point–based regions of interest (ROIs). Methods: Patients treated at our comprehensive cancer center who received palliative radiotherapy for thoracic spine BM between January 2016 and September 2019 were included in this retrospective study. Physician-reported pain scores were extracted automatically from radiation oncology consultation notes using an NLP pipeline. BM center points were manually pinpointed on CT images by radiation oncologists. 
Nested ROIs with various diameters were automatically delineated around these expert-identified BM center points, and radiomics features were extracted from each ROI. Synthetic Minority Oversampling Technique resampling, the Least Absolute Shrinkage And Selection Operator feature selection method, and various machine learning classifiers were evaluated using precision, recall, F1-score, and area under the receiver operating characteristic curve. Results: Radiation therapy consultation notes and simulation CT images of 176 patients (mean age 66, SD 14 years; 95 males) with thoracic spine BM were included in this study. After BM center point identification, 107 radiomics features were extracted from each spherical ROI using pyradiomics. Data were divided into 70% and 30% training and hold-out test sets, respectively. In the test set, the accuracy, sensitivity, specificity, and area under the receiver operating characteristic curve of our best performing model (neural network classifier on an ensemble ROI) were 0.82 (132/163), 0.59 (16/27), 0.85 (116/136), and 0.83, respectively. Conclusions: Our NLP- and radiomics-based machine learning pipeline was successful in differentiating between painful and painless BM lesions. It is intrinsically scalable by using NLP to extract pain scores from clinical notes and by requiring only center points to identify BM lesions in CT images. UR - https://ai.jmir.org/2023/1/e44779 UR - http://dx.doi.org/10.2196/44779 UR - http://www.ncbi.nlm.nih.gov/pubmed/38875572 ID - info:doi/10.2196/44779 ER -