TY  - JOUR
AU  - Kamruzzaman, Methun
AU  - Heavey, Jack
AU  - Song, Alexander
AU  - Bielskas, Matthew
AU  - Bhattacharya, Parantapa
AU  - Madden, Gregory
AU  - Klein, Eili
AU  - Deng, Xinwei
AU  - Vullikanti, Anil
PY  - 2024
DA  - 2024/5/16
TI  - Improving Risk Prediction of Methicillin-Resistant Staphylococcus aureus Using Machine Learning Methods With Network Features: Retrospective Development Study
JO  - JMIR AI
SP  - e48067
VL  - 3
KW  - methicillin-resistant Staphylococcus aureus
KW  - network
KW  - machine learning
KW  - penalized logistic regression
KW  - ensemble learning
KW  - gradient-boosted classifier
KW  - random forest classifier
KW  - extreme boosted gradient boosted classifier
KW  - Shapley Additive Explanations
KW  - SHAP
KW  - health care–associated infection
KW  - HAI
AB  - Background: Health care–associated infections due to multidrug-resistant organisms (MDROs), such as methicillin-resistant Staphylococcus aureus (MRSA) and Clostridioides difficile (CDI), place a significant burden on our health care infrastructure. Objective: Screening for MDROs is an important mechanism for preventing spread but is resource intensive. The objective of this study was to develop automated tools that can predict colonization or infection risk using electronic health record (EHR) data, provide useful information to aid infection control, and guide empiric antibiotic coverage. Methods: We retrospectively developed a machine learning model to detect MRSA colonization and infection in undifferentiated patients at the time of sample collection from hospitalized patients at the University of Virginia Hospital. We used clinical and nonclinical features derived from on-admission and throughout-stay information from the patient’s EHR data to build the model. In addition, we used a class of features derived from contact networks in EHR data; these network features can capture patients’ contacts with providers and other patients, improving model interpretability and accuracy for predicting the outcome of surveillance tests for MRSA. Finally, we explored heterogeneous models for different patient subpopulations, for example, those admitted to an intensive care unit or emergency department or those with specific testing histories, which perform better. Results: We found that the penalized logistic regression performs better than other methods, and this model’s performance measured in terms of its receiver operating characteristics-area under the curve score improves by nearly 11% when we use polynomial (second-degree) transformation of the features. Some significant features in predicting MDRO risk include antibiotic use, surgery, use of devices, dialysis, patient’s comorbidity conditions, and network features. Among these, network features add the most value and improve the model’s performance by at least 15%. The penalized logistic regression model with the same transformation of features also performs better than other models for specific patient subpopulations. Conclusions: Our study shows that MRSA risk prediction can be conducted quite effectively by machine learning methods using clinical and nonclinical features derived from EHR data. Network features are the most predictive and provide significant improvement over prior methods. Furthermore, heterogeneous prediction models for different patient subpopulations enhance the model’s performance.
SN  - 2817-1705
UR  - https://ai.jmir.org/2024/1/e48067
UR  - https://doi.org/10.2196/48067
DO  - 10.2196/48067
ID  - info:doi/10.2196/48067
ER  - 