Published in Vol 2 (2023)

This is a member publication of The University of Edinburgh, Usher Institute, Edinburgh, United Kingdom

Preprints (earlier versions) of this paper are available at https://preprints.jmir.org/preprint/46717.
Machine Learning–Based Asthma Attack Prediction Models From Routinely Collected Electronic Health Records: Systematic Scoping Review

Review

1Asthma UK Centre for Applied Research, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom

2Bioinformatics and Data Science Research Center, Bina Nusantara University, Jakarta, Indonesia

3Norwich Medical School, University of East Anglia, Norwich, United Kingdom

4Norfolk and Norwich University Hospital NHS Foundation Trust, Norwich, United Kingdom

Corresponding Author:

Arif Budiarto, MSc

Asthma UK Centre for Applied Research

Usher Institute

University of Edinburgh

NINE, 9 Little France Road

Edinburgh BioQuarter

Edinburgh, EH16 4UX

United Kingdom

Phone: 44 7447900766

Email: arif.budiarto@ed.ac.uk


Abstract

Background: An early warning tool to predict attacks could enhance asthma management and reduce the likelihood of serious consequences. Electronic health records (EHRs), which provide access to historical data about patients with asthma, coupled with machine learning (ML), offer an opportunity to develop such a tool. Several studies have developed ML-based tools to predict asthma attacks.

Objective: This study aims to critically evaluate ML-based models derived using EHRs for the prediction of asthma attacks.

Methods: We systematically searched PubMed and Scopus (the search period was between January 1, 2012, and January 31, 2023) for papers meeting the following inclusion criteria: (1) used EHR data as the main data source, (2) used asthma attack as the outcome, and (3) compared ML-based prediction models’ performance. We excluded non-English papers and nonresearch papers, such as commentary and systematic review papers. In addition, we excluded papers that did not provide any details about the respective ML approach and its results, including protocol papers. The selected studies were then summarized across multiple dimensions including data preprocessing methods, ML algorithms, model validation, model explainability, and model implementation.

Results: Overall, 17 papers were included at the end of the selection process. There was considerable heterogeneity in how asthma attacks were defined. Of the 17 studies, 8 (47%) used routinely collected data from both primary and secondary care practices. Extreme class imbalance was a notable issue in most studies (13/17, 76%), but only 38% (5/13) of these explicitly dealt with it in their data preprocessing pipeline. A gradient boosting–based method was the best-performing ML method in 59% (10/17) of the studies. Of the 17 studies, 14 (82%) used a model explanation method to identify the most important predictors. None of the studies followed standard reporting guidelines, and none were prospectively validated.

Conclusions: Our review indicates that this research field is still underdeveloped, given the limited body of evidence, heterogeneity of methods, lack of external validation, and suboptimally reported models. We highlighted several technical challenges (class imbalance, external validation, model explanation, and adherence to reporting guidelines to aid reproducibility) that need to be addressed to make progress toward clinical adoption.

JMIR AI 2023;2:e46717

doi:10.2196/46717


Introduction

Background

Asthma is a chronic lung disease characterized by reversible airway obstruction caused by inflammation and narrowing of the small airways in the lungs, which can lead to cough, wheezing, chest tightness, and breathing difficulties [1]. It is a common noncommunicable disease that affects children and adults alike. In 2019, asthma affected an estimated 262 million individuals and caused 461,000 deaths [1,2]. An asthma attack is a sudden or gradual deterioration of asthma symptoms that can have a major influence on a patient’s quality of life [4] and occurs particularly in those with poorly controlled disease [3]. Such attacks can be life-threatening, necessitating rapid medical attention such as an accident and emergency department visit or hospitalization, and can even lead to death [5]. Asthma attacks are prevalent, with >90,000 annual hospital admissions in the United Kingdom alone [6]. Early warning tools to predict asthma attacks offer the opportunity to provide timely treatment and, thereby, minimize the risk of serious outcomes [4].

Machine learning (ML) offers the potential to develop an early warning tool that takes different risk factors as input and then outputs the probability of an adverse outcome. So far, logistic regression (LR) has been the most common approach in building an asthma attack risk prediction tool [7-9]. However, the predictive performance of this method may be inferior to more advanced ML methods, especially for relatively high-dimensional data with complex and nonlinear relationships between the variables [10,11]. The use of ML has been investigated in a wide range of medical domains by using various data such as electronic health records (EHRs), medical images, genomics data, and wearables data [12-14]. However, to the best of our knowledge, there is still no widely used ML-based asthma attack risk prediction tool in clinical practice.
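To make this concrete, the following is a minimal Python sketch of the kind of LR risk model described above, assuming scikit-learn and entirely synthetic data; the feature stand-ins and sizes are illustrative assumptions, not those of any published model.

    # Illustrative only: a logistic regression risk model that maps
    # routinely collected features to an attack probability.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 5))  # stand-ins for, e.g., age, prior attacks, prescriptions
    y = (X[:, 1] + rng.normal(size=1000) > 1).astype(int)  # synthetic outcome labels

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    model = LogisticRegression().fit(X_train, y_train)
    risk = model.predict_proba(X_test)[:, 1]  # predicted probability of an attack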

Objective

Recent systematic reviews have discussed the choice of models used for asthma prognosis [15,16]. An ML pipeline, however, has several components besides modeling choice (eg, feature engineering [17]), which can profoundly influence the performance of the algorithms. Owing to the lack of consensus about what constitutes best practice for applying ML to predict asthma attacks, there is considerable heterogeneity in previous studies [15,16], making direct comparisons challenging. In this scoping review, we aimed to critically examine existing studies that used ML algorithms for the prediction of asthma attacks with routinely collected EHR data. Besides data type and choice of models, we have reviewed additional ML pipeline challenges. These include customizing off-the-shelf algorithms to account for domain-specific subtleties and the need for the model to be explainable, extensively validated (externally and prospectively), and transparently reported.


Methods

Overview

The scoping review was conducted based on the 5-stage framework by Arksey and O’Malley [18]. This framework includes identifying the research question; searching for and collecting relevant studies; filtering studies; charting data; and finally collating, summarizing, and reporting the results. The research questions in this scoping review were the following:

  1. What methods are commonly used in developing an asthma attack prediction model?
  2. How did the authors process the features and outcome variables?
  3. Have any of these prediction models been implemented in a real-world clinical setting?

We then translated these 3 questions into the patient or population, intervention, comparison, and outcomes (PICO) model [19,20], as shown in Table 1.

Table 1. The patient or population, intervention, comparison, and outcomes (PICO) structure.

Item | Expansion | Keywords
P | Patient, population | People with asthma
I | Intervention, prognostic factor, or exposure | Machine learning
C | Comparison of intervention | N/Aa
O | Outcome | Asthma attack

aN/A: not applicable.

Search Strategy

We used the PICO model in Table 1 as our framework for defining relevant keywords. This approach led us to include clinical terms associated with asthma attacks, encompassing concepts such as asthma exacerbation, asthma control, asthma management, and hospitalization. In addition, we integrated technical terminology related to ML, incorporating terms such as artificial intelligence, supervised methods, and deep learning (DL). All the keywords that we used in the search strategy can be found in Multimedia Appendix 1 [4,11,21-35]. Overall, 2 databases, PubMed and Scopus, were chosen as the sources of papers. The search period was between January 1, 2012, and January 31, 2023, and the search was limited to the title, abstract, and keywords of each paper but without any language restriction. The complete query syntax for both databases is listed in Textbox 1; an illustrative script for rerunning the PubMed arm of the search is shown after the textbox.

Textbox 1. Query syntax.

Scopus

  • ((TITLE-ABS-KEY(“asthma”) AND (TITLE-ABS-KEY(“management”) OR TITLE-ABS-KEY(“control”) OR TITLE-ABS-KEY(“attack”) OR TITLE-ABS-KEY(“exacerbation”) OR TITLE-ABS-KEY(“risk stratification”) OR TITLE-ABS-KEY(“risk prediction”) OR TITLE-ABS-KEY(“risk classification”) OR TITLE-ABS-KEY(“hospitalization”) OR TITLE-ABS-KEY(“hospitalisation”) OR TITLE-ABS-KEY(“prognosis”))) AND (TITLE-ABS-KEY(“machine learning”) OR TITLE-ABS-KEY(“artificial intelligence”) OR TITLE-ABS-KEY(“supervised method”) OR TITLE-ABS-KEY(“unsupervised method”) OR TITLE-ABS-KEY(“deep learning”) OR TITLE-ABS-KEY(“supervised learning”) OR TITLE-ABS-KEY(“unsupervised learning”))) AND PUBYEAR > 2011

PubMed

  • ((asthma[Text Word]) AND ((Management[Text Word]) OR (Control[Text Word]) OR (Attack[Text Word]) OR (Exacerbation[Text Word]) OR (Risk Stratification[Text Word]) OR (Risk Prediction[Text Word]) OR (Risk Classification[Text Word]) OR (hospitalization[Text Word]) OR (hospitalisation[Text Word]) OR (prognosis[Text Word])) AND ((machine learning[Text Word]) OR (Artificial Intelligence[Text Word]) OR (supervised method[Text Word]) OR (unsupervised method[Text Word]) OR (deep learning[Text Word]) OR (supervised learning[Text Word]) OR (unsupervised learning[Text Word]))) AND (“2012/01/01”[Date - Publication] : “2023/01/31”[Date - Publication])
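For reproducibility, the PubMed arm of the search can also be rerun programmatically. The following is an illustrative Python sketch using Biopython’s Entrez wrapper for the NCBI E-utilities; the email address is a placeholder required by NCBI, and the query shown is abbreviated relative to the full syntax in Textbox 1.

    # Illustrative sketch: rerunning a shortened version of the PubMed query.
    from Bio import Entrez

    Entrez.email = "you@example.org"  # placeholder; NCBI requires a contact address

    query = (
        '(asthma[Text Word]) AND ((Attack[Text Word]) OR (Exacerbation[Text Word])) '
        'AND ((machine learning[Text Word]) OR (deep learning[Text Word]))'
    )
    handle = Entrez.esearch(db="pubmed", term=query, retmax=1000,
                            datetype="pdat", mindate="2012/01/01", maxdate="2023/01/31")
    record = Entrez.read(handle)
    print(record["Count"], record["IdList"][:5])  # number of hits and first PubMed IDs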

Eligibility Criteria and Study Selection

Overall, 2 authors (AB and KCHT) performed the 2-step study selection process. In the first step, we screened the abstracts; in the second step, we read the full text of each manuscript. We only included papers that met our inclusion criteria: (1) used asthma attack as the outcome, (2) included an ML-based prediction model, and (3) used EHR data as the main data source. We defined EHR-derived data as structured, text-based, individual-level, routinely collected data gathered within the health care system. Where the information extracted from an abstract was unclear, the reviewers retained the study for the next iteration (full-text review). We excluded nonresearch papers, such as commentaries and systematic reviews, because they provide insufficient technical information. We also filtered out papers that did not provide sufficient details about the ML approach and its results, including protocol papers.

Data Extraction

From each of the eligible papers, we extracted data from the full text and web-based supplements. We then summarized these data under different categories such as data set (whether publicly available or not), population characteristics (source, size, age range, and region), year of data, outcome definition and how it was represented in the model, number of features, feature selection method, imbalance handling strategy, ML prediction methods, performance evaluation metric, evaluation result, external validation, explainability method, and real-world clinical setting implementation. The data extraction and summarization for each paper were conducted independently by 2 authors (AB and KCHT). In case of any discrepancies, the 2 authors discussed them in detail during face-to-face meetings to reach an agreement. If the 2 reviewers could not resolve the disagreement, we had a further discussion with the whole team. For each study, we have reported both the performance evaluation result of the prediction models and the most important predictors where available.


Results

Overview

In total, 458 nonduplicated, potentially eligible papers were identified. At the end of the selection process, 17 (3.7%) papers were included based on the inclusion criteria (refer to the PRISMA [Preferred Reporting Items for Systematic Reviews and Meta-Analyses] diagram in Figure 1). The earliest study included in the full review was published in 2018. In the abstract filtering stage, most of the studies (353/458, 77.1%) were excluded because the prediction outcome was not an asthma attack. We took 48 (10.5%) studies forward to the full-text filtering stage. Of these 48, 14 were excluded because they did not meet our inclusion criteria, 12 were nonresearch papers, 4 were follow-up analyses of 2 main papers already included in the extraction stage (all the summary points in these follow-up studies were identical to those in the main studies), and 1 provided insufficient information.

Figure 1. PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) diagram. EHR: electronic health record; ML: machine learning.

Asthma Data Sets

Table 2 summarizes the basic information about each included study. Only 1 (6%) of the 17 studies used routinely collected data from primary care alone [21]. Of the remaining studies, 8 (47%) used data from secondary care alone, and 8 (47%) used routinely collected data from both primary and secondary care. All studies used data sets hosted either at the authors’ institution or their collaborators’ institutions, except 1 study [22] that used publicly available data (the Medical Information Mart for Intensive Care III data set [36]) as one of its data sets. Overall, 13 (76%) of the studies used only EHR data to build the prediction model. The other 4 (24%) studies integrated EHR data with additional modalities, including radiology images (chest computed tomography scans) [23] and environmental data [11,24,25], aiming to enhance predictive accuracy. The study populations varied, with most studies involving adults (8/17, 47%), followed by the general population of both children and adults (5/17, 29%) and children only (4/17, 24%). Of the 17 studies, 14 (82%) had study populations from the United States; the other countries studied were Japan, Sweden, and the United Kingdom. All studies incorporated >1000 samples, except 1 study [23] that trained the prediction model on only 200 samples. The largest data set comprised 397,858 patients [26].

Table 2. Summary of studies’ basic information.
Study, year | Health care setting | Publicly available data set | Data source | Region | Data year | Sample size
Inselman et al [27], 2022 | Secondary care | No | Single modality | United States | 2003-2020 | 3057
Hurst et al [25], 2022 | Both | No | Multimodality | United States | 2014-2019 | 5982
Hogan et al [28], 2022 | Secondary care | No | Single modality | United States | 2013 | 18,489
Zein et al [29], 2021 | Both | No | Single modality | United States | 2010-2018 | 60,302
Sills et al [30], 2021 | Secondary care | No | Single modality | United States | 2009-2013 | 9069
Hozawa et al [31], 2021 | Secondary care | No | Single modality | Japan | 2016-2017 | 42,685
Lisspers et al [32], 2021 | Both | No | Single modality | Sweden | 2000-2013 | 29,396
Ananth et al [23], 2021 | Secondary care | No | Multimodality | United Kingdom | 2018-2020 | 200
Tong et al [33], 2021 | Both | No | Single modality | United States | 2011-2018 | 82,888
Mehrish et al [24], 2021 | Secondary care | No | Multimodality | United States | 2013-2017 | 10,000
Xiang et al [4], 2020 | Both | No | Single modality | United States | 1992-2015 | 31,433
Cobian et al [34], 2020 | Both | No | Single modality | United States | 2007-2011 | 28,101
Luo et al [35], 2020 | Both | No | Single modality | United States | 2005-2018 | 315,308
Roe et al [22], 2020 | Secondary care | Yes | Single modality | United States | 2001-2012 | 38,597
Luo et al [26], 2020 | Both | No | Single modality | United States | 2012-2018 | 397,858
Wu et al [21], 2018 | Primary care | No | Single modality | United States | 1997-2002 | 4013
Patel et al [11], 2018 | Secondary care | No | Multimodality | United States | 2012-2015 | 29,392

Data Preprocessing

There was considerable heterogeneity in the definition of the prediction outcome used in the models, including asthma exacerbation [4,25,27,29,31,32,34], asthma-related hospitalization [11,24,26,30,33,35], asthma readmission [28], asthma prevalence [24], asthma-related mortality [22], and asthma relapse [21].

The time horizon used to define the prediction outcome also varied across studies. Of the 17 studies, 6 (35%) defined the model task as a 1-year prediction [4,23,26,31,33,35]. Other time horizons were 180 days [28], 90 days [34], 28 days [29], and 15 days [32]. One study compared the prediction model performance across 3 time horizons: 30, 90, and 180 days [25]. Another 2 (12%) studies undertook a different approach, aiming to predict asthma attack–related hospitalization within 2 hours after an accident and emergency department visit [11,30]. Finally, 3 (18%) studies did not report the prediction time horizon [21,22,24].

There was an obvious class imbalance in 76% (13/17) of the studies (Table 3). Class imbalance is a problem where the distribution of samples across the classes is skewed [37]; ignoring it during model development will produce a biased model. Among the selected studies, the smallest minority class ratio was as low as 0.04% [32]. Of the 17 studies, only 5 (29%) [4,21,30,32,33] explicitly mentioned their strategies to appropriately handle imbalanced data: the synthetic minority oversampling technique (SMOTE) [38], oversampling [39,40], and undersampling [39,40]. All 3 methods aim to balance the proportion of samples in each class, either by generating synthetic data for the minority class or by omitting a certain number of samples from the majority class. Only 2 (12%) studies used a balanced data set [22,23], whereas 2 (12%) other studies did not report the class ratio of their data set [24,34].

Various feature selection methods were explicitly mentioned as part of the data preprocessing step, including backward stepwise variable selection [28], light gradient boosting method feature importance [32], and Pearson correlation [32]. Of the 17 studies, 5 (29%) [4,26,30,33,35] implemented feature selection as a built-in method in the model development phase, whereas the remaining studies did not mention a feature selection method in their report. The smallest feature set used was 7 variables [24], and the largest was >500 variables [32].

The handling of missing values varied across the studies. In most cases (9/17, 53%), missing values were treated either as a distinct category or assigned a specific value [21,23,25-27,29,32,33,35]. However, some studies opted to exclude data containing missing values [4,11,28,30], whereas others did not specify their approach to this issue [22,24,31,34]. Notably, more than half of the studies (11/17, 65%) did not describe their methods for data normalization. This step is particularly critical for certain ML algorithms, such as LR and support vector machines, to prevent uneven weighting of features in the model. In contrast, 35% (6/17) of the studies [11,22,23,26,33,35] used a standard mean normalization technique to standardize the continuous features, ensuring uniform scaling across the data set.
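To make the resampling strategies named above concrete, the following is a hedged Python sketch using the imbalanced-learn and scikit-learn libraries on synthetic data; it is not any study’s actual pipeline, and the roughly 2% minority ratio is chosen only to mirror the scale of the ratios in Table 3.

    from collections import Counter
    from sklearn.datasets import make_classification
    from imblearn.over_sampling import SMOTE, RandomOverSampler
    from imblearn.under_sampling import RandomUnderSampler

    # Simulate a heavily skewed binary outcome (~2% minority class).
    X, y = make_classification(n_samples=10_000, weights=[0.98], random_state=0)
    print(Counter(y))  # e.g., Counter({0: 9796, 1: 204})

    X_sm, y_sm = SMOTE(random_state=0).fit_resample(X, y)               # synthesize minority samples
    X_ov, y_ov = RandomOverSampler(random_state=0).fit_resample(X, y)   # duplicate minority samples
    X_un, y_un = RandomUnderSampler(random_state=0).fit_resample(X, y)  # drop majority samples
    print(Counter(y_sm), Counter(y_ov), Counter(y_un))  # all balanced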

Table 3. Summary of the data preprocessing step.
Study, year | Outcomes | Prediction time horizon | Class imbalance ratio (%) | Data imbalance handling methods | Feature selection methods | Number of features
Inselman et al [27], 2022 | Asthma exacerbation | 180 d | 22.60 | Unknown | Unknown | 21
Hurst et al [25], 2022 | Asthma exacerbation | 30, 90, and 180 d | 37 | Unknown | Unknown | 41
Hogan et al [28], 2022 | Asthma readmission | 180 d | 5.70 | Unknown | Backward stepwise variable selection | 21
Zein et al [29], 2021 | Asthma exacerbation | 28 d | Nonsevere=32.80; severe=2.90 | Unknown | Unknown | 82
Sills et al [30], 2021 | Asthma-related hospitalization | Admission after A&Ea department visit | 22.50 | Oversampling | Automated feature selection | 13
Hozawa et al [31], 2021 | Asthma exacerbation | 365 d | 13.70 | Unknown | Unknown | 25
Lisspers et al [32], 2021 | Asthma exacerbation | 15 d | 0.04 | Undersampling and weighting method | Correlation and LGBMb model | >500
Ananth et al [23], 2021 | Asthma exacerbation | 365 d | 50 | Unknown | Unknown | 17
Tong et al [33], 2021 | Asthma-related hospitalization or A&E department visit | 365 d | 1.66 | WEKAc | Automated feature selection | 234
Mehrish et al [24], 2021 | Asthma prevalence, asthma-related hospitalization, or hospital readmission | Unknown | Unknown | Unknown | Unknown | 7
Xiang et al [4], 2020 | Asthma exacerbation | 365 d | 7.20 | SMOTEd | Automated feature selection | Unknown
Cobian et al [34], 2020 | Asthma exacerbation | 90 d | Unknown | Unknown | Unknown | >25
Luo et al [35], 2020 | Asthma-related hospitalization | 365 d | 3.59 | Unknown | Automated feature selection | 235
Roe et al [22], 2020 | Asthma-related mortality | Unknown | 49 | Unknown | Unknown | 42
Luo et al [26], 2020 | Asthma-related hospitalization | 365 d | 2.30 | Unknown | Automated feature selection | 337
Wu et al [21], 2018 | Asthma relapse | Unknown | 32.89 | Random undersampling | Unknown | 60
Patel et al [11], 2018 | Asthma-related hospitalization | Admission after EDe visit | 17 | Unknown | Unknown | 100

aA&E: accident and emergency.

bLGBM: light gradient boosting method.

cWEKA: Waikato Environment for Knowledge Analysis.

dSMOTE: synthetic minority oversampling technique.

eED: emergency department.

ML Methods and Performance Evaluation

Table 4 describes the ML and performance evaluation methods used in the selected studies, which covered a wide range of ML methods. Most studies (14/17, 82%) used conventional ML methods such as support vector machine [41], random forest [42], naïve Bayes [43], decision tree (DT) [44], K-nearest neighbor [45], and artificial neural network [46]. LR and its variations (ie, Ridge, Lasso, and Elastic Net) [47] were the most common baseline models (12/17, 71%) [4,11,22-25,27-30,32,34]. Some studies developed prediction models with more advanced ML algorithms such as gradient boosting DT (GBDT)–based methods [11,22,25-27,29,31-33,35] and DL-based methods [4,21,34]. A few studies [26,30,35] also used automated model selection tools, such as the Waikato Environment for Knowledge Analysis (WEKA) [48] and autoML [49]. GBDT-based methods, including extreme gradient boosting (XGBoost) [50], were the most common best-performing models (area under the curve scores ranging from 0.6 to 0.9). Model performance was evaluated in all studies using the area under the receiver operating characteristic curve (AUROC), except in 1 study [21] that used the F1-score as its only performance metric. More than half of the studies (9/17, 53%) included additional evaluation metrics such as accuracy, precision, recall, sensitivity, specificity, positive predictive value, negative predictive value, F1-score, area under the precision-recall curve, and microaveraged accuracy [21,23,25-27,30,32,33,35]. Owing to the different data sets and the heterogeneity in outcome definitions, prediction time horizons, and preprocessing across the studies, we considered a direct comparison of studies based on the reported evaluation metrics to be inappropriate. Only 18% (3/17) of the studies included external validation on retrospective data in their analysis pipeline [21,26,33].
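As an illustration of this common setup, the following is a hedged Python sketch that trains an XGBoost classifier on an imbalanced synthetic data set and evaluates it with the AUROC; the data and hyperparameters are placeholders, not those of any reviewed study.

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=5000, weights=[0.95], random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

    model = XGBClassifier(
        n_estimators=200, max_depth=4, learning_rate=0.1,
        # one simple way to counteract class imbalance during training
        scale_pos_weight=(y_train == 0).sum() / (y_train == 1).sum(),
    )
    model.fit(X_train, y_train)
    print("AUROC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))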

Table 4. Summary of machine learning (ML) methods.
Study, year | ML methods | Best models | Best performance metrics | External validation
Inselman et al [27], 2022 | GLMNeta, RFb, and GBMc | GBM | AUCd=0.74 | No
Hurst et al [25], 2022 | Lasso, RF, and XGBooste | XGBoost | 30-d AUC=0.761; 90-d AUC=0.752; 180-d AUC=0.739 | No
Hogan et al [28], 2022 | Cox proportional hazard, LRf, and ANNg | ANN | AUC=0.636 | No
Zein et al [29], 2021 | LR, RF, and GBDTh | GBDT | Nonsevere AUC=0.71; hospitalization AUC=0.85; EDi AUC=0.88 | No
Sills et al [30], 2021 | AutoML, RF, and LR | AutoML | AUC=0.914 | No
Hozawa et al [31], 2021 | XGBoost | XGBoost | AUC=0.656 | No
Lisspers et al [32], 2021 | XGBoost, LGBMj, RNNk, and LR (Lasso, Ridge, and Elastic Net) | XGBoost | AUC=0.90 | No
Ananth et al [23], 2021 | LR, DTl, and ANN | LR | AUC=0.802 | No
Tong et al [33], 2021 | WEKAm and XGBoost | XGBoost | AUC=0.902 | Yes
Mehrish et al [24], 2021 | GLMn, correlation models, and LR | LR | AUC=0.78 | No
Xiang et al [4], 2020 | LR, MLPo, and LSTMp with an attention mechanism | LSTM with an attention mechanism | AUC=0.7003 | No
Cobian et al [34], 2020 | LR, RF, and LSTM | LR with L1 regularization | AUC=0.7697 | No
Luo et al [35], 2020 | WEKA and XGBoost | XGBoost | AUC=0.859 | No
Roe et al [22], 2020 | XGBoost, NNq, LR, and KNNr | XGBoost | AUC=0.75 | No
Luo et al [26], 2020 | WEKA and XGBoost | XGBoost | AUC=0.820 | Yes
Wu et al [21], 2018 | LSTM | LSTM | Binary classification F1-score=0.8508; multiclass classification F1-score=0.4976 | Yes
Patel et al [11], 2018 | DT, Lasso, RF, and GBDT | GBDT | AUC=0.84 | No

aGLMNet: Lasso and Elastic-Net Regularized Generalized Linear Models.

bRF: Random Forest.

cGBM: gradient boosting method.

dAUC: area under the curve.

eXGBoost: extreme gradient boosting.

fLR: logistic regression.

gANN: artificial neural network.

hGBDT: gradient boosting decision tree.

iED: emergency department.

jLGBM: light gradient boosting method.

kRNN: recurrent neural network.

lDT: decision tree.

mWEKA: Waikato Environment for Knowledge Analysis.

nGLM: Generalized Linear Model.

oMLP: multilayer perceptron.

pLSTM: long short-term memory.

qNN: neural network.

rKNN: K-nearest neighbor.

Model Explainability and Implementation

We then compared how model explainability was handled across studies. Model explainability refers to the degree of transparency and the level of detail a model can provide about its output, facilitating a better understanding of how the model operates [51]. We grouped the studies into 2 categories based on their best model’s transparency. In the first group, we included 18% (3/17) of the studies [23,24,34] in which the best-performing model, LR, can be considered a transparent model [51]. However, only 2 of these 3 studies reported the model explanation, in the form of LR coefficient values for each variable [23,34]. We grouped the remaining studies into an opaque model category, where post hoc analysis is needed to explain the model’s prediction mechanism [51]. In this group, most studies [4,11,22,26,28,30-33,35] used a model-specific explanation method, 2 (14%) studies [27,29] used a model-agnostic method called Shapley additive explanations (SHAP), and 2 (14%) studies [21,25] did not include any model explanation approach. Although model-specific explanation methods, such as those used in DT-based models, gauge the impact of each feature on a model’s decision through metrics computed during training, SHAP takes a more comprehensive approach: it explores the possible combinations of features to determine how each one influences the final prediction.
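As an illustration, the following is a minimal Python sketch of post hoc SHAP analysis for an opaque gradient boosting model, assuming the shap and xgboost libraries and synthetic data rather than any reviewed study’s model.

    import shap
    from sklearn.datasets import make_classification
    from xgboost import XGBClassifier

    X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
    model = XGBClassifier(n_estimators=100).fit(X, y)  # the "opaque" model

    explainer = shap.TreeExplainer(model)
    shap_values = explainer.shap_values(X)  # per-sample, per-feature contributions
    shap.summary_plot(shap_values, X)       # global view of feature influence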

None of the studies followed any specific reporting guidelines. Furthermore, despite promising performance in some studies, none were implemented in a real-world clinical setting for prospective evaluation.

In each of the studies reviewed, various limitations were identified, encompassing both clinical and nonclinical factors. A common limitation was the difficulty of generalizing findings to different health care settings and patient groups [22,25,26,29,33,35]. This difficulty often arose because the studies lacked important information such as medication histories [35], environmental factors [25,30], and social determinants of health [28], which are known to play significant roles in health outcomes. Data-related limitations were also prevalent, with some studies dealing with the drawbacks of structured EHR data [4,26,33,35], the potential for data misreporting [32], and missing data that could affect the reliability of their models [29,35]. In addition, from a clinical perspective, certain studies were limited by the lack of standardized definitions for specific outcomes [11,22,23,27,28], emphasizing the importance of consistent criteria in health care research such as asthma management. The model explanation and implementation information are summarized in Table 5. All data extraction results can be found in Multimedia Appendix 1. We have also depicted some of the important principal findings in Multimedia Appendix 2.

Table 5. Summary of model explainability and implementation.
Study, year | Best model transparency | Model explanation methods | Follow reporting guidelines | Clinical implementation | Study limitations
Inselman et al [27], 2022 | Opaque model | SHAPa | No | No | Missing relevant variables; limited data about different biologics; diverse primary uses of biologics; heterogeneity in patient characteristics
Hurst et al [25], 2022 | Opaque model | No model explanation | No | No | Missing relevant variables; single-center study; location-dependent model performance; limited environmental data
Hogan et al [28], 2022 | Opaque model | Estimated weights | No | No | Missing relevant variables; lack of longitudinal outcomes; use of ICD-9b (older clinical coding); hospital differentiation; absence of demographic data and social determinants
Zein et al [29], 2021 | Opaque model | SHAP | No | No | Limited generalizability; reliance on diagnostic codes; limited clinical information; exclusion of anti-IL5c therapy; cross-sectional nature; quality of clinical information; limited PFTd and FeNOe data; handling of missing data
Sills et al [30], 2021 | Opaque model | AutoML method | No | No | Retrospective nature; patient selection criteria; limited clinical information; exclusion of home and environmental factors; timing of posttriage variables
Hozawa et al [31], 2021 | Opaque model | Extracted risk factors | No | No | Age distribution discrepancy; limitations of claim data; prevalent user design; causality estimation
Lisspers et al [32], 2021 | Opaque model | LGBMf gain score | No | No | Data misreporting; applicability to other settings; high false-positive rate; performance of shortlist model
Ananth et al [23], 2021 | Transparent model | LRg coefficients | No | No | Lack of formal asthma control assessment; limited longitudinal outcomes; lack of comorbidity information
Tong et al [33], 2021 | Opaque model | XGBoosth feature importance | No | No | Lack of relevant variables; nonuse of deep learning and unstructured data; expansion of data sources; generalizability across health care systems and diseases
Mehrish et al [24], 2021 | Transparent model | No model explanation | No | No | Lack of relevant variables; limited method explanation
Xiang et al [4], 2020 | Opaque model | Attention mechanism | No | No | Absence of complex interactions among clinical variables; limitations of structured EHRi data; challenges in distinguishing symptoms and risk factors; opportunities for model enhancement
Cobian et al [34], 2020 | Transparent model | LR coefficients | No | No | Limited samples
Luo et al [35], 2020 | Opaque model | XGBoost feature importance | No | No | Lack of medication claim data; limitations of structured EHR data; opportunities for additional features; data completeness and generalizability
Roe et al [22], 2020 | Opaque model | XGBoost feature importance | No | No | Intensive care setting exclusivity; exclusion of routine intensive care features; generalizability to outpatient settings
Luo et al [26], 2020 | Opaque model | XGBoost feature importance | No | No | Potential unexplored features; nonuse of deep learning and unstructured data; limited generalizability assessment
Wu et al [21], 2018 | Opaque model | No model explanation | No | No | Suboptimal neural network configuration; limited scope; clinical relevance and feature weighting
Patel et al [11], 2018 | Opaque model | GBDTj feature importance | No | No | Single institution data; pragmatic definition of the asthma population; lack of model validation; data limitations; lack of weather and CDCk influenza data

aSHAP: Shapley additive explanations.

bICD-9: International Classification of Diseases, Ninth Revision.

cIL-5: interleukin 5.

dPFT: pulmonary function test.

eFeNO: fractional exhaled nitric oxide.

fLGBM: light gradient boosting method.

gLR: logistic regression.

hXGBoost: extreme gradient boosting.

iEHR: electronic health record.

jGBDT: gradient boosting decision tree.

kCDC: Centers for Disease Control and Prevention.


Discussion

Principal Findings

Our review indicates that this research field is still underdeveloped, given the limited body of evidence, heterogeneity of methods, lack of external validation, and suboptimally reported models. There was considerable heterogeneity in the specific definition of asthma outcome and the associated time horizon used by studies that sought to develop asthma attack risk prediction models. Class imbalance was also common across studies, and there was also considerable heterogeneity in how it was handled. Consequently, it was challenging to directly compare the studies.

The GBDT-based methods were the most frequently reported best-performing methods. DL methods such as long short-term memory (LSTM), a relatively more complex and advanced method, were also found in a few studies [4,21,34]. However, none of the studies compared the performance of the DL-based models with that of the GBDT-based models. Moreover, none of the studies were prospectively evaluated or followed any reporting guidelines, and most (14/17, 82%) were not externally validated.

Strengths and Limitations

The key strengths of our study include undertaking a systematic and transparent approach to ensure reproducibility. Overall, 2 independent reviewers followed a clear framework during the study selection and data extraction stages. Furthermore, the interpretation of the results was supported by a multidisciplinary team consisting of both technical and clinical experts.

A further strength is our focus: most systematic reviews on the use of ML methods in asthma research have examined either diagnosis or the classification of asthma subtypes [52-56]. Although there have been 2 previous reviews on the use of ML in predicting asthma attacks [15,16], our review is the first to focus on several key considerations across the ML pipeline, from data preprocessing to model implementation, for asthma attack prediction.

However, this review also has 3 key limitations. First, although this scoping review provided broad coverage of various technical challenges, it cannot ascertain how feasible and effective an ML-based intervention would be in supporting asthma management. Second, we were not able to directly compare studies owing to the heterogeneity across them, which prevented us from identifying the best algorithm or approach for solving the technical challenges highlighted in this review. Finally, this review focused only on technical challenges without taking into account additional, crucial sociocultural and organizational barriers to the adoption of ML-based tools in health care [57-59].

Interpretation in the Light of the Published Literature

The heterogeneity of outcome definitions found in this review was also uncovered in previous non-ML asthma attack prognosis studies [16,60]. This heterogeneity encompasses both the indicators used to define asthma attacks and the prediction time horizon. Recent systematic reviews also highlighted a wide range of outcome variations in ML-based prognostic models for ischemic stroke [61] and brain injury [62].

GBDT methods, especially XGBoost, have become state-of-the-art methods for large, structured data sets in several domains [63-65]. Among the DL methods, LSTM has also shown potential in several previous studies [66,67]. LSTM is one of the most popular methods for analyzing complex time series data; its ability to learn sequential patterns makes it well suited to prediction models that represent the data as a sequence of events. EHR data consist of a sequence of historical clinical events, which represents the trajectory of each patient’s condition over time. Incorporating these temporal features into the model, rather than simply summarizing the events, can potentially boost the model’s performance.
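To illustrate this idea, the following is a hedged PyTorch sketch of an LSTM that consumes a patient’s sequence of coded clinical events and outputs an attack risk; the vocabulary size, dimensions, and architecture are illustrative assumptions, not those of any reviewed study.

    import torch
    import torch.nn as nn

    class EventSequenceClassifier(nn.Module):
        def __init__(self, n_codes=5000, emb_dim=64, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(n_codes, emb_dim, padding_idx=0)
            self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, 1)

        def forward(self, codes):  # codes: (batch, seq_len) of clinical event IDs
            x = self.embed(codes)
            _, (h_n, _) = self.lstm(x)  # final hidden state summarizes the trajectory
            return torch.sigmoid(self.head(h_n[-1])).squeeze(-1)  # attack probability

    model = EventSequenceClassifier()
    batch = torch.randint(1, 5000, (8, 50))  # 8 synthetic patients, 50 events each
    print(model(batch).shape)  # torch.Size([8])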

Most of the studies (14/17, 82%) in this review incorporated some form of model explainability, aiming to provide an accessible explanation of how the model derives its predictions and thereby instill trust in users [68]. Previous studies in various domains have shown that an ML model can output biased predictions caused by latent characteristics within the data [69]. Model explainability is therefore crucial to provide model transparency and enhance fairness [70], especially in high-stakes tasks such as those in health care [71].

Model validation and standard reporting are among the important challenges that can influence adoption into routine practice [72]. An ML model should be internally, externally, and prospectively validated to assess its robustness in predicting new data [73]. In addition, ML model development should be reported following a standard guideline [74], such as the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) [75] or the Developmental and Exploratory Clinical Investigations of Decision Support Systems Driven by Artificial Intelligence (DECIDE-AI) [76]. Such reporting facilitates an improved and objective understanding and interpretation of model performance. However, our review found a lack of external validation and of adherence to reporting guidelines among the selected studies. These findings resonate with other reviews in different domains [77,78].

Implications for Research, Policy, and Practice

This review highlighted several technical challenges that need to be addressed when developing asthma attack risk prediction algorithms. Further studies are required to develop a robust strategy for dealing with class imbalance in asthma research. Class imbalance has been a common problem when working with EHR data [79,80]. However, there remains a notable gap in the literature regarding a systematic comparison of the effectiveness of existing methods, particularly in the context of asthma attack prediction. Several simple ML algorithms, such as linear regression, LR, and simple DTs, are easily interpretable [81]. In general, however, there is a trade-off between model interpretability and complexity, and most advanced methods are difficult to interpret, which in turn influences users’ perception and understanding of the model [82]. We believe that the black box nature of the more complex methods, such as XGBoost and LSTM, is likely a technical barrier to implementing such models in a real-world clinical setting. Consequently, there is a need to continue exploring model explainability methods, such as the attention mechanism recently applied to LSTMs [83-85], that can augment complex “black box” algorithms; a sketch of this idea follows.
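The following is a minimal PyTorch sketch of that attention-pooling idea, purely illustrative and not the mechanism of any specific study cited above: the learned weights over time steps can be inspected to see which clinical events contributed most to a prediction.

    import torch
    import torch.nn as nn

    class AttentionPooling(nn.Module):
        def __init__(self, hidden=128):
            super().__init__()
            self.score = nn.Linear(hidden, 1)

        def forward(self, lstm_out):  # lstm_out: (batch, seq_len, hidden)
            weights = torch.softmax(self.score(lstm_out), dim=1)  # one weight per time step
            pooled = (weights * lstm_out).sum(dim=1)  # attention-weighted summary
            return pooled, weights.squeeze(-1)  # weights are inspectable per event

    pool = AttentionPooling()
    pooled, attn = pool(torch.randn(8, 50, 128))
    print(pooled.shape, attn.shape)  # torch.Size([8, 128]) torch.Size([8, 50])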

There is a need to develop a global, or at least a nationwide, benchmark data set to facilitate external validation and to test model generalizability [86]. Such validation is needed to ensure that the model not only performs well on the data used in model development but can also be reproduced to predict new data from different settings [87]. In addition, to maintain the transparency and reproducibility of ML-based prediction models, adherence to a standard reporting guideline such as TRIPOD [75] should be encouraged. Both good reproducibility and clear reporting are key to facilitating critical assessment of a model before its implementation into routine practice. This effort is pivotal in addressing the ethical concerns associated with data-driven prediction tools and in guaranteeing the safety and impartiality of the predictions [88]. Ensuring the ethical integration of a data-driven model into routine clinical practice is a substantial challenge that demands considerable resources and relies on a collaborative effort involving experts from various disciplines [89].

Finally, to ensure that an ML-based model meets the requirements of practice, a clear use case must be articulated. We found that almost all studies followed a clear clinical guideline to define asthma attacks, but there was a wide range of prediction time horizons across the studies. These variations result from the distinct needs and goals of different practices; a one-size-fits-all model is therefore unrealistic. A clear and specific clinical use case should instead be defined as the basis for developing an ML-based model.

Conclusions

ML model development for asthma attack prediction has been studied in recent years and includes the use of both traditional and DL methods. There is considerable heterogeneity in ML pipelines across existing studies that prohibits meaningful comparison. Our review indicates several key technical challenges that need to be tackled to make progress toward clinical implementation: the class imbalance problem, external validation, model explanation, and adherence to reporting guidelines to aid model reproducibility.

Acknowledgments

This paper presents independent research under the Asthma UK Centre for Applied Research (AUKCAR) funded by Asthma+Lung UK and Chief Scientist Office (CSO), Scotland (grant number: AUK-AC-2018-01). The views expressed are those of the authors and not necessarily those of Asthma+Lung UK or CSO, Scotland.

Conflicts of Interest

None declared.

Multimedia Appendix 1

List of the search keywords and full data extraction result.

XLSX File (Microsoft Excel File), 19 KB

Multimedia Appendix 2

Key findings.

PDF File (Adobe PDF File), 84 KB

Multimedia Appendix 3

Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews (PRISMA-ScR) checklist.

PDF File (Adobe PDF File), 101 KB

References

  1. Asthma. World Health Organization. May 04, 2023. URL: https://www.who.int/news-room/fact-sheets/detail/asthma [accessed 2023-11-28]
  2. GBD 2019 Diseases and Injuries Collaborators. Global burden of 369 diseases and injuries in 204 countries and territories, 1990-2019: a systematic analysis for the Global Burden of Disease Study 2019. Lancet. Oct 17, 2020;396(10258):1204-1222. [FREE Full text] [CrossRef] [Medline]
  3. Pocket guide for asthma management and prevention. Global Initiative for Asthma. URL: https://ginasthma.org/pocket-guide-for-asthma-management-and-prevention/ [accessed 2023-11-20]
  4. Xiang Y, Ji H, Zhou Y, Li F, Du J, Rasmy L, et al. Asthma exacerbation prediction and risk factor analysis based on a time-sensitive, attentive neural network: retrospective cohort study. J Med Internet Res. Jul 31, 2020;22(7):e16981. [FREE Full text] [CrossRef] [Medline]
  5. Wark PA, Gibson PG. Asthma exacerbations. 3: pathogenesis. Thorax. Oct 2006;61(10):909-915. [FREE Full text] [CrossRef] [Medline]
  6. Martin MJ, Beasley R, Harrison TW. Towards a personalised treatment approach for asthma attacks. Thorax. Dec 2020;75(12):1119-1129. [CrossRef] [Medline]
  7. Noble M, Burden A, Stirling S, Clark AB, Musgrave S, Alsallakh MA, et al. Predicting asthma-related crisis events using routine electronic healthcare data: a quantitative database analysis study. Br J Gen Pract. Nov 25, 2021;71(713):e948-e957. [FREE Full text] [CrossRef] [Medline]
  8. Tibble H, Tsanas A, Horne E, Horne R, Mizani M, Simpson CR, et al. Predicting asthma attacks in primary care: protocol for developing a machine learning-based prediction model. BMJ Open. Jul 09, 2019;9(7):e028375. [FREE Full text] [CrossRef] [Medline]
  9. Hussain Z, Shah SA, Mukherjee M, Sheikh A. Predicting the risk of asthma attacks in children, adolescents and adults: protocol for a machine learning algorithm derived from a primary care-based retrospective cohort. BMJ Open. Jul 23, 2020;10(7):e036099. [FREE Full text] [CrossRef] [Medline]
  10. Bose S, Kenyon CC, Masino AJ. Personalized prediction of early childhood asthma persistence: a machine learning approach. PLoS One. Mar 1, 2021;16(3):e0247784. [FREE Full text] [CrossRef] [Medline]
  11. Patel SJ, Chamberlain DB, Chamberlain JM. A machine learning approach to predicting need for hospitalization for pediatric asthma exacerbation at the time of emergency department triage. Acad Emerg Med. Dec 2018;25(12):1463-1470. [FREE Full text] [CrossRef] [Medline]
  12. Shickel B, Tighe PJ, Bihorac A, Rashidi P. Deep EHR: a survey of recent advances in deep learning techniques for electronic health record (EHR) analysis. IEEE J Biomed Health Inform. Sep 2018;22(5):1589-1604. [FREE Full text] [CrossRef] [Medline]
  13. Giger ML. Machine learning in medical imaging. J Am Coll Radiol. Mar 2018;15(3 Pt B):512-520. [CrossRef] [Medline]
  14. Yu KH, Beam AL, Kohane IS. Artificial intelligence in healthcare. Nat Biomed Eng. Oct 2018;2(10):719-731. [CrossRef] [Medline]
  15. Alharbi ET, Nadeem F, Cherif A. Predictive models for personalized asthma attacks based on patient's biosignals and environmental factors: a systematic review. BMC Med Inform Decis Mak. Dec 09, 2021;21(1):345. [FREE Full text] [CrossRef] [Medline]
  16. Bridge J, Blakey JD, Bonnett LJ. A systematic review of methodology used in the development of prediction models for future asthma exacerbation. BMC Med Res Methodol. Feb 05, 2020;20(1):22. [FREE Full text] [CrossRef] [Medline]
  17. Duboue P. The Art of Feature Engineering: Essentials for Machine Learning. Cambridge, United Kingdom. Cambridge University Press; 2020.
  18. Arksey H, O'Malley L. Scoping studies: towards a methodological framework. Int J Soc Res Methodol. 2005;8(1):19-32. [CrossRef]
  19. Eriksen MB, Frandsen TF. The impact of patient, intervention, comparison, outcome (PICO) as a search strategy tool on literature search quality: a systematic review. J Med Libr Assoc. Oct 2018;106(4):420-431. [FREE Full text] [CrossRef] [Medline]
  20. Leonardo R. PICO: model for clinical questions. Evid Based Med. 2018;4(1):1-2. [CrossRef]
  21. Wu S, Liu S, Sohn S, Moon S, Wi CI, Juhn Y, et al. Modeling asynchronous event sequences with RNNs. J Biomed Inform. Jul 2018;83:167-177. [FREE Full text] [CrossRef] [Medline]
  22. Roe KD, Jawa V, Zhang X, Chute CG, Epstein JA, Matelsky J, et al. Feature engineering with clinical expert knowledge: a case study assessment of machine learning model complexity and performance. PLoS One. Apr 23, 2020;15(4):e0231300. [FREE Full text] [CrossRef] [Medline]
  23. Ananth S, Navarra A, Vancheeswaran R. S1 obese, non-eosinophilic asthma: frequent exacerbators in a real-world setting. Thorax. 2021;76:A5-A6. [CrossRef]
  24. Mehrish D, Sairamesh J, Hasson L, Sharma M. Combining weather and pollution indicators with insurance claims for identifying and predicting asthma prevalence and hospitalizations. Presented at: 4th International Conference on Human Interaction and Emerging Technologies: Future Applications (IHIET – AI 2021); April 28-30, 2021, 2021; Strasbourg, France. [CrossRef]
  25. Hurst JH, Zhao C, Hostetler HP, Ghiasi Gorveh M, Lang JE, Goldstein BA. Environmental and clinical data utility in pediatric asthma exacerbation risk prediction models. BMC Med Inform Decis Mak. Apr 22, 2022;22(1):108. [FREE Full text] [CrossRef] [Medline]
  26. Luo G, Nau CL, Crawford WW, Schatz M, Zeiger RS, Rozema E, et al. Developing a predictive model for asthma-related hospital encounters in patients with asthma in a large, integrated health care system: secondary analysis. JMIR Med Inform. Nov 09, 2020;8(11):e22689. [FREE Full text] [CrossRef] [Medline]
  27. Inselman JW, Jeffery MM, Maddux JT, Lam RW, Shah ND, Rank MA, et al. A prediction model for asthma exacerbations after stopping asthma biologics. Ann Allergy Asthma Immunol. Mar 2023;130(3):305-311. [CrossRef] [Medline]
  28. Hogan AH, Brimacombe M, Mosha M, Flores G. Comparing artificial intelligence and traditional methods to identify factors associated with pediatric asthma readmission. Acad Pediatr. 2022;22(1):55-61. [CrossRef] [Medline]
  29. Zein JG, Wu CP, Attaway AH, Zhang P, Nazha A. Novel machine learning can predict acute asthma exacerbation. Chest. May 2021;159(5):1747-1757. [FREE Full text] [CrossRef] [Medline]
  30. Sills MR, Ozkaynak M, Jang H. Predicting hospitalization of pediatric asthma patients in emergency departments using machine learning. Int J Med Inform. Jul 2021;151:104468. [CrossRef] [Medline]
  31. Hozawa S, Maeda S, Kikuchi A, Koinuma M. Exploratory research on asthma exacerbation risk factors using the Japanese claims database and machine learning: a retrospective cohort study. J Asthma. Jul 2022;59(7):1328-1337. [CrossRef] [Medline]
  32. Lisspers K, Ställberg B, Larsson K, Janson C, Müller M, Łuczko M, et al. Developing a short-term prediction model for asthma exacerbations from Swedish primary care patients' data using machine learning - based on the ARCTIC study. Respir Med. 2021;185:106483. [FREE Full text] [CrossRef] [Medline]
  33. Tong Y, Messinger AI, Wilcox AB, Mooney SD, Davidson GH, Suri P, et al. Forecasting future asthma hospital encounters of patients with asthma in an academic health care system: predictive model development and secondary analysis study. J Med Internet Res. Apr 16, 2021;23(4):e22796. [FREE Full text] [CrossRef] [Medline]
  34. Cobian A, Abbott M, Sood A, Sverchkov Y, Hanrahan L, Guilbert T, et al. Modeling asthma exacerbations from electronic health records. AMIA Jt Summits Transl Sci Proc. May 30, 2020;2020:98-107. [FREE Full text] [Medline]
  35. Luo G, He S, Stone BL, Nkoy FL, Johnson MD. Developing a model to predict hospital encounters for asthma in asthmatic patients: secondary analysis. JMIR Med Inform. Jan 21, 2020;8(1):e16080. [FREE Full text] [CrossRef] [Medline]
  36. Johnson AE, Pollard TJ, Shen L, Lehman LW, Feng M, Ghassemi M, et al. MIMIC-III, a freely accessible critical care database. Sci Data. May 24, 2016;3:160035. [FREE Full text] [CrossRef] [Medline]
  37. Sun Y, Wong AK, Kamel MS. Classification of imbalanced data: a review. Int J Pattern Recognit Artif Intell. 2009;23(04):687-719. [CrossRef]
  38. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res. 2002;16:321-357. [CrossRef]
  39. Spelmen VS, Porkodi R. A review on handling imbalanced data. Presented at: 2018 International Conference on Current Trends towards Converging Technologies (ICCTCT); March 1-3, 2018, 2018; Coimbatore, India. [CrossRef]
  40. Mohammed R, Rawashdeh J, Abdullah M. Machine learning with oversampling and undersampling techniques: overview study and experimental results. Presented at: 11th International Conference on Information and Communication Systems (ICICS); April 7-9, 2020, 2020; Irbid, Jordan. [CrossRef]
  41. Brereton RG, Lloyd GR. Support vector machines for classification and regression. Analyst. Feb 2010;135(2):230-267. [CrossRef] [Medline]
  42. Xu M, Tantisira KG, Wu A, Litonjua AA, Chu JH, Himes BE, et al. Genome wide association study to predict severe asthma exacerbations in children using random forests classifiers. BMC Med Genet. Jun 30, 2011;12:90. [FREE Full text] [CrossRef] [Medline]
  43. Ren J, Lee SD, Chen X, Kao B, Cheng R, Cheung D. Naive Bayes classification of uncertain data. Presented at: 2009 Ninth IEEE International Conference on Data Mining; December 6-9, 2009, 2009; Miami Beach, FL. [CrossRef]
  44. Song YY, Lu Y. Decision tree methods: applications for classification and prediction. Shanghai Arch Psychiatry. Apr 25, 2015;27(2):130-135. [FREE Full text] [CrossRef] [Medline]
  45. Guo G, Wang H, Bell D, Bi Y, Greer K. KNN model-based approach in classification. Presented at: OTM Confederated International Conferences CoopIS, DOA, and ODBASE 2003; November 3-7, 2003, 2003; Sicily, Italy. [CrossRef]
  46. Wu YC, Feng JW. Development and application of artificial neural network. Wireless Pers Commun. Dec 30, 2017;102:1645-1656. [CrossRef]
  47. Liang X, Jacobucci R. Regularized structural equation modeling to detect measurement bias: evaluation of lasso, adaptive lasso, and elastic net. Struct Equ Modeling Multidiscip J. Dec 12, 2019;27(5):722-734. [FREE Full text] [CrossRef]
  48. Holmes G, Donkin A, Witten IH. WEKA: a machine learning workbench. Presented at: ANZIIS '94 - Australian New Zealand Intelligent Information Systems Conference; November 29-December 2, 1994, 1994; Brisbane, Australia. [CrossRef]
  49. He X, Zhao K, Chu X. AutoML: a survey of the state-of-the-art. Knowl Based Syst. Jan 05, 2021;212:106622. [CrossRef]
  50. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. Presented at: 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; August 13-17, 2016, 2016; San Francisco, CA. [CrossRef]
  51. Barredo Arrieta A, Díaz-Rodríguez N, Del Ser J, Bennetot A, Tabik S, Barbado A, et al. Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion. Jun 2020;58:82-115. [CrossRef]
  52. Daines L, McLean S, Buelo A, Lewis S, Sheikh A, Pinnock H. Systematic review of clinical prediction models to support the diagnosis of asthma in primary care. NPJ Prim Care Respir Med. May 09, 2019;29(1):19. [FREE Full text] [CrossRef] [Medline]
  53. Loymans RJ, Debray TP, Honkoop PJ, Termeer EH, Snoeck-Stroband JB, Schermer TR, et al. Exacerbations in adults with asthma: a systematic review and external validation of prediction models. J Allergy Clin Immunol Pract. 2018;6(6):1942-52.e15. [FREE Full text] [CrossRef] [Medline]
  54. Luo G, Nkoy FL, Stone BL, Schmick D, Johnson MD. A systematic review of predictive models for asthma development in children. BMC Med Inform Decis Mak. Nov 28, 2015;15:99. [FREE Full text] [CrossRef] [Medline]
  55. Smit HA, Pinart M, Antó JM, Keil T, Bousquet J, Carlsen KH, et al. Childhood asthma prediction models: a systematic review. Lancet Respir Med. Dec 2015;3(12):973-984. [CrossRef] [Medline]
  56. Exarchos KP, Beltsiou M, Votti CA, Kostikas K. Artificial intelligence techniques in asthma: a systematic review and critical appraisal of the existing literature. Eur Respir J. Sep 3, 2020;56(3):2000521. [FREE Full text] [CrossRef] [Medline]
  57. Watson J, Hutyra CA, Clancy SM, Chandiramani A, Bedoya A, Ilangovan K, et al. Overcoming barriers to the adoption and implementation of predictive modeling and machine learning in clinical care: what can we learn from US academic medical centers? JAMIA Open. Jul 2020;3(2):167-172. [CrossRef]
  58. Morrison K. Artificial intelligence and the NHS: a qualitative exploration of the factors influencing adoption. Future Healthc J. Sep 02, 2021;8(3):e648-e654. [CrossRef]
  59. Pumplun L, Fecho M, Wahl N, Peters F, Buxmann P. Adoption of machine learning systems for medical diagnostics in clinics: qualitative interview study. J Med Internet Res. Oct 15, 2021;23(10):e29301. [CrossRef]
  60. Alharbi F, Atkins A, Stanier C. Understanding the determinants of cloud computing adoption in Saudi healthcare organisations. Complex Intell Syst. Jul 13, 2016;2(3):155-171. [CrossRef]
  61. Zeng M, Oakden-Rayner L, Bird A, Smith L, Wu Z, Scroop R, et al. Pre-thrombectomy prognostic prediction of large-vessel ischemic stroke using machine learning: a systematic review and meta-analysis. Front Neurol. Sep 8, 2022;13:945813. [CrossRef]
  62. Cerasa A, Tartarisco G, Bruschetta R, Ciancarelli I, Morone G, Calabrò RS, et al. Predicting outcome in patients with brain injury: differences between machine learning versus conventional statistics. Biomedicines. Sep 13, 2022;10(9):2267. [CrossRef]
  63. Stemerman R, Arguello J, Brice J, Krishnamurthy A, Houston M, Kitzmiller R. Identification of social determinants of health using multi-label classification of electronic health record clinical notes. JAMIA Open. Feb 9, 2021;4(3):ooaa069. [CrossRef]
  64. Pan P, Li Y, Xiao Y, Han B, Su L, Su M, et al. Prognostic assessment of COVID-19 in the intensive care unit by machine learning methods: model development and validation. J Med Internet Res. Nov 11, 2020;22(11):e23128. [CrossRef]
  65. Muro S, Ishida M, Horie Y, Takeuchi W, Nakagawa S, Ban H, et al. Machine learning methods for the diagnosis of chronic obstructive pulmonary disease in healthy subjects: retrospective observational cohort study. JMIR Med Inform. Jul 6, 2021;9(7):e24796. [CrossRef] [Medline]
  66. Dong X, Deng J, Rashidian S, Abell-Hart K, Hou W, Rosenthal RN, et al. Identifying risk of opioid use disorder for patients taking opioid medications with deep learning. J Am Med Inform Assoc. Jul 30, 2021;28(8):1683-1693. [FREE Full text] [CrossRef] [Medline]
  67. Zhao J, Feng Q, Wu P, Lupu RA, Wilke RA, Wells QS, et al. Learning from longitudinal data in electronic health record and genetic data to improve cardiovascular event prediction. Sci Rep. Jan 24, 2019;9:717. [CrossRef]
  68. Petch J, Di S, Nelson W. Opening the black box: the promise and limitations of explainable machine learning in cardiology. Can J Cardiol. Feb 2022;38(2):204-213. [CrossRef]
  69. Rudin C, Radin J. Why are we using black box models in AI when we don’t need to? A lesson from an explainable AI competition. Harvard Data Sci Rev. Nov 01, 2019;1(2) [CrossRef]
  70. Rasheed K, Qayyum A, Ghaly M, Al-Fuqaha A, Razi A, Qadir J. Explainable, trustworthy, and ethical machine learning for healthcare: a survey. Comput Biol Med. Oct 2022;149:106043. [CrossRef]
  71. Rudin C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat Mach Intell. May 13, 2019;1(5):206-215. [CrossRef]
  72. Verma AA, Murray J, Greiner R, Cohen JP, Shojania KG, Ghassemi M, et al. Implementing machine learning in medicine. Can Med Assoc J. Aug 29, 2021;193(34):E1351-E1357. [CrossRef]
  73. Cabitza F, Campagner A, Soares F, García de Guadiana-Romualdo L, Challa F, Sulejmani A, et al. The importance of being external. methodological insights for the external validation of machine learning models in medicine. Comput Methods Programs Biomed. Sep 2021;208:106288. [CrossRef]
  74. Stevens LM, Mortazavi BJ, Deo RC, Curtis L, Kao DP. Recommendations for reporting machine learning analyses in clinical research. Circ Cardiovasc Qual Outcomes. Oct 14, 2020;13(10) [CrossRef]
  75. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD Statement. BMC Med. 2015;13(1):1. [CrossRef]
  76. Vasey B, Nagendran M, Campbell B, Clifton DA, Collins GS, Denaxas S, et al. Reporting guideline for the early stage clinical evaluation of decision support systems driven by artificial intelligence: DECIDE-AI. BMJ. May 18, 2022;377:e070904. [FREE Full text] [CrossRef] [Medline]
  77. Wynants L, Van Calster B, Collins GS, Riley RD, Heinze G, Schuit E, et al. Prediction models for diagnosis and prognosis of covid-19: systematic review and critical appraisal. BMJ. Apr 07, 2020;369:m1328. [CrossRef] [Medline]
  78. Tsang KC, Pinnock H, Wilson AM, Shah SA. Application of machine learning algorithms for asthma management with mHealth: a clinical review. J Asthma Allergy. Jun 2022;15:855-873. [CrossRef]
  79. Santiso S, Casillas A, Pérez A. The class imbalance problem detecting adverse drug reactions in electronic health records. Health Informatics J. Sep 19, 2018;25(4):1768-1778. [CrossRef]
  80. Tasci E, Zhuge Y, Camphausen K, Krauze AV. Bias and class imbalance in oncologic data—towards inclusive and transferrable AI in large scale oncology data sets. Cancers. Jun 12, 2022;14(12):2897. [CrossRef]
  81. Molnar C, Casalicchio G, Bischl B. Interpretable machine learning – a brief history, state-of-the-art and challenges. Presented at: ECML PKDD 2020 Workshops; September 14-18, 2020, 2020; Ghent, Belgium. [CrossRef]
  82. Lauritsen SM, Kristensen M, Olsen MV, Larsen MS, Lauritsen KM, Jørgensen MJ, et al. Explainable artificial intelligence model to predict acute critical illness from electronic health records. Nat Commun. Jul 31, 2020;11(1):3852. [CrossRef]
  83. Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, et al. Attention is all you need. arXiv. Preprint posted online June 12, 2017. [FREE Full text] [CrossRef]
  84. Shen L, Zheng J, Lee EH, Shpanskaya K, McKenna ES, Atluri MG, et al. Attention-guided deep learning for gestational age prediction using fetal brain MRI. Sci Rep. Jan 26, 2022;12(1):1408. [CrossRef]
  85. Nguyen-Duc T, Mulligan N, Mannu GS, Bettencourt-Silva JH. Deep EHR spotlight: a framework and mechanism to highlight events in electronic health records for explainable predictions. AMIA Jt Summits Transl Sci Proc. May 17, 2021;2021:475-484. [Medline]
  86. Futoma J, Simons M, Panch T, Doshi-Velez F, Celi LA. The myth of generalisability in clinical research and machine learning in health care. Lancet Digit Health. Sep 2020;2(9):e489-e492. [FREE Full text] [CrossRef] [Medline]
  87. Bates DW, Auerbach A, Schulam P, Wright A, Saria S. Reporting and implementing interventions involving machine learning and artificial intelligence. Ann Intern Med. Jun 02, 2020;172(11_Supplement):S137-S144. [CrossRef]
  88. Angerschmid A, Zhou J, Theuermann K, Chen F, Holzinger A. Fairness and explanation in AI-informed decision making. Mach Learn Knowl Extr. Jun 16, 2022;4(2):556-579. [CrossRef]
  89. Obafemi-Ajayi T, Perkins A, Nanduri B, Wunsch DCII, Foster JA, Peckham J. No-boundary thinking: a viable solution to ethical data-driven AI in precision medicine. AI Ethics. Nov 29, 2021;2(4):635-643. [CrossRef]


Abbreviations

AUROC: area under the receiver operating characteristic curve
DECIDE-AI: Developmental and Exploratory Clinical Investigations of Decision Support Systems Driven by Artificial Intelligence
DL: deep learning
DT: decision tree
EHR: electronic health record
GBDT: gradient boosting decision tree
LR: logistic regression
LSTM: long short-term memory
ML: machine learning
PICO: patient or population, intervention, comparison, and outcomes
PRISMA: Preferred Reporting Items for Systematic Reviews and Meta-Analyses
SHAP: Shapley additive explanations
TRIPOD: Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis
XGBoost: extreme gradient boosting


Edited by K El Emam, B Malin; submitted 23.02.23; peer-reviewed by N Mungoli, H Musawir; comments to author 03.08.23; revised version received 28.09.23; accepted 09.10.23; published 07.12.23.

Copyright

©Arif Budiarto, Kevin C H Tsang, Andrew M Wilson, Aziz Sheikh, Syed Ahmar Shah. Originally published in JMIR AI (https://ai.jmir.org), 07.12.2023.

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR AI, is properly cited. The complete bibliographic information, a link to the original publication on https://www.ai.jmir.org/, as well as this copyright and license information must be included.