Background

JMIR AI

2817-1705

JMIR Publications

Toronto, Canada

v4i1e68260

10.2196/68260

Original Paper

Use of Automated Machine Learning to Detect Undiagnosed Diabetes in US Adults: Development and Validation Study

Liu

Jianxiu

PhD12Ssewamala

Fred

PhD34An

Ruopeng

MPP, PhD35Ji

Mengmeng

MBBS, PhD6

Division of Sports Science and Physical Education, Tsinghua University

Beijing

ChinaIDG/McGovern Institute for Brain Research, Tsinghua University

Beijing

ChinaSilver School of Social Work, New York University

New York

United StatesMcSilver Institute for Poverty Policy and Research, New York University

New York

United StatesConstance and Martin Silver Center on Data Science and Social Equity, New York University

New York

United StatesDivision of Public Health Sciences, Department of Surgery, Washington University in St. Louis

600 S Taylor Ave

St. Louis

United States

Emam

Khaled El

Maheshwari

Harsh

Thayil

Jerry John

Correspondence to Mengmeng Ji, MBBS, PhD, Division of Public Health Sciences, Department of Surgery, Washington University in St. Louis, 600 S Taylor Ave, St. Louis, MO, 63110, United States, 1 2179799336; j.mengmeng@wustl.edu

2025

8102025

e68260

311020242305202522072025

2025

This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR AI, is properly cited. The complete bibliographic information, a link to the original publication on https://www.ai.jmir.org/, as well as this copyright and license information must be included.

Background

Early diagnosis of diabetes is essential for early interventions to slow the progression of dysglycemia and its comorbidities. However, among individuals with diabetes, about 23% were unaware of their condition.

Objective

This study aims to investigate the potential use of automated machine learning (AutoML) models and self-reported data in detecting undiagnosed diabetes among US adults.

Methods

Individual-level data, including biochemical tests for diabetes, demographic characteristics, family history of diabetes, anthropometric measures, dietary intakes, health behaviors, and chronic conditions, were retrieved from the National Health and Nutrition Examination Survey, 1999‐2020. Undiagnosed diabetes was defined as having no prior self-reported diagnosis but meeting diagnostic criteria for elevated hemoglobin A_1c, fasting plasma glucose, or 2-hour plasma glucose. The H2O AutoML framework, which allows for automated hyperparameter tuning, model selection, and ensemble learning, was used to automate the machine learning workflow. For comparative analysis, 4 traditional machine learning models—logistic regression, support vector machines, random forest, and extreme gradient boosting—were implemented. Model performance was evaluated using the area under the receiver operating characteristic curve.

Results

The study included 11,815 participants aged 20 years and older, comprising 2256 patients with undiagnosed diabetes and 9559 without diabetes. The average age was 59.76 (SD 15.0) years for participants with undiagnosed diabetes and 46.78 (SD 17.2) years for those without diabetes. The AutoML model demonstrated superior performance compared with the 4 traditional machine learning models. The trained AutoML model achieved an area under the receiver operating characteristic curve of 0.909 (95% CI 0.897-0.921) in the test set. The model demonstrated a sensitivity of 70.26%, specificity of 90.46%, positive predictive value of 64.10%, and negative predictive value of 92.61% for identifying undiagnosed diabetes from nondiabetes.

Conclusions

To our knowledge, this study is the first to utilize the AutoML model for detecting undiagnosed diabetes in US adults. The model’s strong performance and applicability to the broader US population make it a promising tool for large-scale diabetes screening efforts.

machine learningAutoMLself-reportscreeningundiagnosed diabetes

Introduction

Diabetes mellitus is the eighth leading cause of death in the United States and contributes to substantial health care costs [1]. In 2021, an estimated 38.4 million Americans of all ages had diabetes, representing 11.6% of the US population [2,3]. Of those with diabetes, 22.8% were unaware of or did not report having diabetes [2,3]. When diabetes is undiagnosed, and consequently hyperglycemia remains unmanaged, severe and irreversible microvascular and macrovascular complications can develop, including diabetic neuropathy, nephropathy, retinopathy, and cardiovascular disease [4-7].

Screening asymptomatic individuals for undiagnosed diabetes enables earlier diagnosis and treatment, ultimately reducing the risk of complications and premature death [8-10]. The latest American Diabetes Association (ADA) and US Preventive Services Task Force guidelines recommend beginning diabetes screenings at the age of 35 years [11,12]. However, diabetes screening guidelines that rely on blood testing are not widely followed. Only 50%‐60% of US adults who met the criteria for screening reported receiving glucose testing within the past 3 years [13,14]. The testing rate was alarmingly low among high-risk groups, including those with low education, low household income, and limited health care access [13,14].

Risk assessment tools for diabetes detection using easily accessible and self-reported data have been proposed, but they have shown low overall accuracy and validity in the general population [15-21]. In recent years, various machine learning algorithms have been used to predict diabetes and have yielded better performance than traditional statistics-based models [22-28]. Few studies have developed machine learning models to detect undiagnosed diabetes in the US population [29,30]. Although 2 studies reported a good overall accuracy of 80%, the quality of a positive prediction by models (ie, precision) was notably low, which could lead to a high number of false positives and unnecessary follow-up testing [29,30].

More recently, there has been growing interest within the health care community in automated machine learning (AutoML), which automates machine learning models’ selection, composition, and parameterization to optimize performance [31-33]. AutoML uses voting and stacking ensemble techniques to combine multiple learning models, often improving classification accuracy more effectively than a single machine learning algorithm. Its automation also reduces human error and bias by impartially exploring a wide range of machine learning models [33]. However, despite its potential, no prior studies have investigated the feasibility and performance of AutoML in screening for undiagnosed diabetes.

This study aimed to investigate the potential use of AutoML and self-reported data in detecting undiagnosed diabetes among US adults in a nationally representative survey. The trained model could aid in detecting undiagnosed diabetes in the general US population, particularly in underserved populations with limited access to blood glucose tests. This study could also promote the adoption of AutoML in diabetes research.

MethodsData Source

Individual-level data were retrieved from the National Health and Nutrition Examination Survey (NHANES), 1999‐2020. NHANES is a nationally representative, repeated cross-sectional study conducted by the National Center for Health Statistics. NHANES adopts a complex, multistage probability sampling design to ensure that the collected data are representative of the noninstitutionalized civilian population in the United States. NHANES includes clinical examinations, selected medical and laboratory tests, and self-reported data. NHANES interviews people in their homes and conducts health examinations in a mobile examination center, including laboratory analysis of blood, urine, and other tissue samples. The detailed study design and methodology of NHANES have been described elsewhere [34,35]. This study followed the CREMLS (Consolidated Reporting Guidelines for Prognostic and Diagnostic Machine Learning Models) [36,37].

Biochemical Tests for Undiagnosed Diabetes

Following the ADA guidelines [38], diabetes was diagnosed based on elevated levels of hemoglobin A_1c (≥6.5%), fasting plasma glucose (≥126 mg/dL), or 2-hour plasma glucose (≥200 mg/dL) during a 75-g oral glucose tolerance test. In this analysis, undiagnosed diabetes was defined as having no prior self-reported diagnosis but meeting any of the diagnosis criteria for elevated hemoglobin A_1c, fasting plasma glucose level, or oral glucose tolerance test level. The details of the diagnostic method used to define diabetes in this study are provided in Table 1.

Table 1.

Diagnostic method used to define diabetes in this study.

	Diagnostic method
Diabetic
Diagnosed diabetes	Answer “Yes” to “Other than during pregnancy, have you ever been told by a doctor or health professional that you have diabetes or sugar diabetes?”
Undiagnosed diabetes	Answer “No” to “Other than during pregnancy, have you ever been told by a doctor or health professional that you have diabetes or sugar diabetes?”ANDAny of the following tests meet criteria: HbA_1c^a≥6.5% (≥48 mmol/mol). FPG^b≥126 mg/dL (≥7.0 mmol/L). 2-h PG^c≥200 mg/dL (≥11.1 mmol/L) during OGTT^d.
Nondiabetic
Prediabetes	Does not meet criteria for diabetes diagnosisANDAny of the following tests meet criteria: HbA_1c 5.7%‐6.4% (39‐47 mmol/mol) FPG 100‐125 mg/dL (5.6‐6.9 mmol/L) 2-h PG 140‐199 mg/dL (7.8‐11.0 mmol/L)
Normoglycemia	All the following tests meet criteria: HbA_1c<5.7% (<39 mmol) FPG<100 mg/dL (<5.6 mmol/) 2-h PG<140 mg/dL (<7.8 mmol/L)

^aHbA_1c: hemoglobin A_1c.

^bFPG: fasting plasma glucose.

^cPG: plasma glucose.

^dOGTT: oral glucose tolerance test.

Participant Selection

The study utilized data from NHANES 1999‐2020, comprising an initial cohort of 112,502 participants. Participants with self-reported diabetes (n=8657) and those with missing data on self-reported diabetes status (n=9672) were excluded. Individuals aged <20 years (n=43,879) and pregnant females (n=1540) were removed from the cohort. Participants with missing laboratory results for diabetes were further excluded (n=36,939). For inclusion in the nondiabetic group, participants had to meet all 3 test criteria to confirm the absence of diabetes. For the diabetes group, participants were included if at least one test met the diagnostic criteria, even if the other two test results were missing. In total, the study cohort included 11,815 participants and was categorized into 2 groups: 9559 without diabetes and 2256 with undiagnosed diabetes.

FeaturesDemographic Characteristics

Demographic features included age at the survey, gender (male/female), race/ethnicity (non-Hispanic White, non-Hispanic Black, Mexican American, and other races), educational attainment (lower than 9th-grade education, 9th- to 11th-grade education, high school, some college or associate degree, college or higher), marital status (married, widowed, divorced, separated, never married, living with a partner), and income-to-poverty ratio (ratio of monthly family income to the poverty guidelines).

Family history of diabetes was ascertained from the Medical Conditions Questionnaire: “Including living and deceased, were any of your biological, that is, blood relatives including grandparents, parents, brothers, sisters ever told by a health professional that they had diabetes?”

Anthropometric Measures

Participants were weighed in mobile examination centers, wearing only underclothing and an examination gown. Weight was recorded on a digital scale in kilograms. Standing height was measured using a stadiometer with a fixed vertical backboard and an adjustable headpiece. BMI was calculated as measured weight in kilograms divided by height in meters squared. Waist circumference was measured just above the iliac crest using a steel measuring tape.

Diet Intake and Behaviors

In NHANES, 24-hour dietary recalls were administered to obtain detailed nutritional intake information from participants. Daily dietary intake (the average of 2 d) of energy (kcal), total fat (g), cholesterol (mg), and total sugars (g) was calculated. The frequency of eating out per week was obtained from the question: “On average, how many times per week do you eat meals prepared in a restaurant?”

Health Behaviors

The NHANES physical activity questionnaire included questions about daily and leisure-time activities. The average hours spent in each activity were multiplied by the suggested metabolic equivalent (MET) scores to estimate MET hours per week [39]. Indicators (yes/no) for smoking and drinking were obtained from answers to questions: “In any one year, have you had at least 12 drinks of any type of alcoholic beverage?” and “Have you smoked at least 100 cigarettes in your entire life?”

Chronic Conditions

Comorbid conditions were obtained from self-reports to the Medical Conditions Questionnaire, “Have you ever been told by a doctor that you had (medical problem)?” which includes hypertension, rheumatoid arthritis, myocardial infarction, congestive heart failure, coronary heart disease, stroke, liver disease, weak/failing kidneys, and cancer/malignancy of any kind.

AutoML and Custom Machine Learning Models

The H2O AutoML framework was used in this study to automate the machine learning workflow. The H2O AutoML trains several models, cross-validated by default, by using the following available algorithms: extreme gradient boosting (XGBoost), gradient boosting machine, generalized linear model, distributed random forest, extremely randomized trees, and fully connected deep neural network. H2O AutoML introduces two essential advancements to optimize model performance. First, it fine-tunes base models using a fast random search approach, where hyperparameters are selected from a range of values identified as most impactful. Second, H2O AutoML leverages a sophisticated stacking technique to create two powerful ensemble models: “All models ensemble,” which combines all the base models trained, and “Best of the Family ensemble,” which contains the best-performing models. The stacked ensemble models are designed to leverage the diverse strengths of various algorithms, resulting in a final model that is accurate and generalizable across different datasets. H2O AutoML has built-in functionality for class balancing and handling of missing values. Detailed documentation, as well as directions for algorithms and the implementation of H2O.ai, are available online [40].

We randomly split the dataset into training (70%) and test (30%) sets. AutoML trained diverse base models on the training set, ranking them by cross-validated area under the receiver operating characteristic curve (AUC). Cross-validation was performed using a 5-fold approach, with models iteratively trained on 4 subsets of the training set and validated on the remaining subset. A stacked ensemble (“leader”) was constructed by blending the top-performing base models via a meta learner. This stacked ensemble model was exported and applied to our independent holdout test set to generate class-probability predictions. The 95% CIs for each AUC were derived from 1000 bootstrap resamples of the test set. Confusion matrices were constructed to calculate sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) using the classification threshold that maximized the F₁-score that yields the highest harmonic mean of precision and recall on the test set. We extracted feature importance from each base learner and computed a weighted aggregate in the H2O leader model. For tree-based models, feature importance was determined by the frequency of each feature used for splitting and the overall reduction in squared error. For non–tree-based models, importance was based on coefficient magnitudes.

In addition to the AutoML model, we conducted a comparative analysis using 4 traditional machine learning models—logistic regression, support vector machines, random forest, and XGBoost, with Synthetic Minority Over-sampling Technique applied to address data imbalance during training [41,42].

The clinical guidelines generally recommend confirming an elevated test with a secondary measurement for the diagnosis of diabetes [38,43]. In the main analyses, diabetes was diagnosed based on a single elevated test result. In additional analyses, the diagnosis of diabetes was confirmed by at least 2 elevated tests recommended by the ADA guidelines. We also evaluated the performance of a 3-class prediction model for diabetes within the AutoML framework, using classification schemes that distinguished between normoglycemia, prediabetes, and undiagnosed diabetes.

Summary statistics for participant characteristics, stratified by diabetes status, were calculated. Categorical variables were compared using χ² tests, and continuous variables were evaluated using independent samples t tests. Missing data were imputed using mean values for continuous variables and mode for categorical variables. We used STATA 18 (StataCorp LLC) for data preparation and Python version 3.10.12 (Python Software Foundation) to implement the H2O AutoML (H2O.ai, Inc) and custom machine learning models (version 3.46.0.6).

Ethical Considerations

This study used publicly available, deidentified NHANES data. In accordance with the US Department of Health and Human Services (Title 45 of the Code of Federal Regulations; §46.104 (d), section 4) [44], analyses of publicly available, deidentified data are not considered human subjects research and therefore do not require review by the Washington University Institutional Review Board.

Results

In total, the study cohort included 11,815 participants, with 9559 participants without diabetes and 2256 with undiagnosed diabetes. The characteristics of the study cohort are summarized in Table 2. The average ages were 59.76 (SD 15.0) years for those with undiagnosed diabetes and 46.78 (SD 17.2) years for those without. The diabetes group had a higher proportion of males, lower levels of education, and a greater likelihood of having a relative with diabetes. Additionally, patients with undiagnosed diabetes had higher BMI, larger waist circumference, and a higher prevalence of chronic conditions. The flow diagram of participant selection is presented in Figure 1.

Table 2.

Cohort characteristics by diabetes status.

	No diabetes (n=9559)	Undiagnosed diabetes (n=2256)	P value
Laboratory tests, mean (SD)
Glycohemoglobin (%)	5.41 (0.38)	6.88 (1.60)	<.001
Fasting glucose plasma (mg/dL)	98.18 (9.46)	141.38 (46.79)	<.001
Oral glucose tolerance test (mg/dL)	109.53 (31.95)	229.24 (75.68)	<.001
Demographic characteristics
Age (years), mean (SD)	46.78 (17.19)	59.76 (14.98)	<.001
Gender, n (%)			.005
Male	4711 (49.28)	1186 (52.57)
Female	4848 (50.72)	1070 (47.43)
Race/ethnicity, n (%)			<.001
Non-Hispanic White	4381 (45.83)	835 (37.01)
Non-Hispanic Black	1750 (18.31)	504 (22.34)
Hispanic	2459 (25.72)	698 (30.94)
Other	969 (10.14)	219 (9.71)
Education, n (%)			<.001
≤9 grade	881 (9.22)	434 (19.28)
9th-11th grade	1301 (13.62)	391 (17.37)
High school	2127 (22.27)	554 (24.61)
Some college	2801 (29.32)	543 (24.12)
College or above	2443 (25.57)	329 (14.62)
Marital status, n (%)			<.001
Married	5016 (52.47)	1248 (55.54)
Widowed	568 (5.94)	390 (17.36)
Divorced	950 (9.94)	251 (11.17)
Separated	330 (3.45)	72 (3.20)
Never married	1843 (19.28)	192 (8.54)
Living with partner	852 (8.91)	94 (4.18)
Income-to-poverty ratio, mean (SD)	2.60 (1.64)	2.29 (1.53)	<.001
Family history of diabetes, n (%)	3394 (36.16)	1072 (48.79)	<.001
Anthropometric measures, mean (SD)
Height (cm)	167.89 (10.00)	165.84 (10.24)	<.001
Weight (kg)	80.12 (20.69)	88.19 (23.32)	<.001
BMI (kg/m²)	28.33 (6.53)	31.94 (7.43)	<.001
Waist circumference (cm)	96.92 (15.32)	107.5 (15.69)	<.001
Diet intake and eating behavior, mean (SD)
Daily total energy (kcal)	2090.13 (837.69)	1912.17 (851.19)	<.001
Daily total fat (g)	78.27 (38.36)	72.29 (39.76)	<.001
Daily total sugars (g)	113.59 (65.37)	104.57 (68.14)	<.001
Daily total cholesterol (mg)	287.05 (193.07)	285.82 (201.17)	.80
Times of dining out per week	3.43 (3.84)	2.60 (3.43)	<.001
Health behaviors
Physical activity (MET^a-h/wk), mean (SD)	2.36 (3.69)	1.62 (2.81)	<.001
Smoking, n (%)	4200 (43.97)	1031 (48.91)	<.001
Drinking, n (%)	6758 (74.26)	1262 (67.52)	<.001
Self-reported chronic conditions, n (%)
Hypertension	2778 (29.10)	1188 (52.85)	<.001
Rheumatoid arthritis	2206 (23.12)	778 (34.55)	<.001
Myocardial infarction	283 (2.96)	152 (6.76)	<.001
Congestive heart failure	182 (1.91)	112 (4.99)	<.001
Coronary heart disease	261 (2.74)	145 (6.47)	<.001
Stroke	231 (2.42)	127 (5.64)	<.001
Liver disease	295 (3.09)	117 (5.19)	<.001
Weak/failing kidneys	199 (2.08)	64 (2.84)	.03
Cancer	760 (7.95)	257 (11.40)	<.001

^aMET: metabolic equivalent.

Figure 1.

Flow diagram of participant selection. 2-h PG: 2-hour plasma glucose; A_1c: hemoglobin A_1c; FPG: fasting plasma glucose; NHANES: National Health and Nutrition Examination Survey.

The performance of the AutoML model and traditional machine learning models is summarized in Figure 2. The AutoML model demonstrated superior performance compared to the 4 traditional machine learning models—logistic regression, support vector machines, random forest, and XGBoost. The trained AutoML model achieved an AUC of 0.909 (95% CI 0.897-0.921) and an accuracy of 86.5% in the test set. The model demonstrated a sensitivity of 70.26%, specificity of 90.46%, PPV of 64.10%, and NPV of 92.61% for identifying undiagnosed diabetes from nondiabetes (Table 3). The model summary and details were provided in Multimedia Appendix 1.

Figure 2.

Performance of AutoML model and custom machine learning models in detecting undiagnosed diabetes on the test set. AutoML: automated machine learning; XGBoost: extreme gradient boosting.

Table 3.

Confusion matrix for the classification of undiagnosed diabetes using the AutoML model.^a

	Predicted label, n (%)
	No diabetes	Undiagnosed diabetes
True label
No diabetes	2558 (73)	270 (8)
Undiagnosed diabetes	204 (6)	482 (14)

^aNote: The cutoff threshold was 0.248718, optimized for F₁-score that maximized the harmonic mean of precision and recall. The matrix shows the distribution of true labels against predicted labels. The cell values indicate the number of instances (absolute counts) and their corresponding percentages of the total in the test data. Sensitivity (482/686, 70.26%), specificity (2558/2828, 90.45%), positive predictive value (482/752, 64.10%), and negative predictive value (2558/2762, 92.61%) are derived from the matrix to assess model performance.

The top 5 features are age, waist circumference, daily total sugar intake, income, and BMI, together accounting for 50% of total model importance. Comorbidities, except hypertension, contributed minimally relative to demographic and behavioral factors. Excluding comorbidities (except hypertension) resulted in comparable model performance, with an AUC of 0.830 and an accuracy of 85.1%.

Additional analysis results are summarized in Table 4. The model using diabetes diagnosis criteria with a confirmatory test achieved a testing accuracy of 98.0% and 89.7% with an AUC of 0.823. However, precision (ie, PPV) and recall (ie, sensitivity) were suboptimal due to the small number of patients meeting the diabetes criteria with ≥2 tests. The model demonstrated a sensitivity of 44.10%, specificity of 92.22%, PPV of 24.23%, and NPV of 96.69% for identifying undiagnosed diabetes from nondiabetes. The performance of the multiclass prediction model was poor, with an overall accuracy of 58.9% for using diagnosis criteria with ≥1 test and 67.1% for using diagnosis criteria with ≥2 tests.

Table 4.

Additional models for alternative diabetes diagnosis criteria with the second confirmative test and multiclass prediction including prediabetes.

	Undiagnosed diabetes (≥2 test) versus no diabetes^a		Undiagnosed diabetes (≥1 test) versus prediabetes versus normoglycemia		Undiagnosed diabetes (≥2 test) versus prediabetes versus normoglycemia
	Train	Test	Train	Test	Train	Test
Accuracy (%)	98.0	89.7	66.8	59.0	68.6	67.1
Overall AUC^b	0.993	0.823	0.627	0.557	0.588	0.557
AUC (normal versus rest)	—^c	—	0.502	0.445	0.316	0.307
AUC (prediabetes versus rest)	—	—	0.814	0.652	0.792	0.715
AUC (diabetes versus rest)	—	—	0.415	0.545	0.555	0.673

^a“No diabetes” group includes normoglycemia and prediabetes.

^bAUC: area under the receiver operating characteristic curve.

^cNot applicable.

Discussion

To our knowledge, this study is the first to utilize the AutoML model for detecting undiagnosed diabetes in US adults. The best-performing model achieved an AUC of 0.91 and an accuracy of 86.5% in the test set. National surveillance shows that nearly half of those with undiagnosed diabetes have hypertension, lipid abnormalities, or cardiovascular and chronic kidney diseases [45-48]. Delayed diagnosis of diabetes hinders the opportunities for early intervention to slow the progression of dysglycemia and its comorbidities. Our model was trained and tested using a substantial and diverse dataset comprising nationally representative survey data. The model’s high accuracy and applicability to the broader US population make it a promising tool for large-scale diabetes screening efforts.

The feature importance ranking of the best-performing model highlights waist circumference, BMI, and dietary variables as key predictors, underscoring their strong links to metabolic health, insulin resistance, and dietary habits that influence diabetes risk. Lifestyle factors such as drinking frequency and physical activity also emerged as significant contributors, whereas self-reported comorbidities played a smaller role once anthropometric and behavioral measures were included. These findings align with epidemiological studies and improve the model’s interpretability, providing actionable insights to prioritize targeted interventions for modifiable risk factors [49].

Machine learning has advanced clinical research but faces adoption barriers like data access, imbalances, and reliance on data science expertise for deployment [50,51]. AutoML reduces the need for machine learning expertise, enabling clinicians to use advanced technologies without programming skills and integrating them into research and clinical practice [52-55]. Despite being promising, few studies have explored the application of AutoML for diabetes diagnosis [56].

Previous studies have compared traditional machine learning models with conventional statistical models for identifying undiagnosed diabetes, demonstrating that machine learning models outperform statistical models [29,30]. These studies reported AUC values between 0.73 and 0.81, consistent with the performance of traditional models in this study. However, the reported low PPVs highlighted the limited ability of these models to accurately identify undiagnosed diabetes cases [30]. This study showed that AutoML models are superior and outperformed traditional machine learning models in detecting undiagnosed diabetes. Similarly, one study has reported that the AutoML model outperformed both individual and ensemble models in identifying patients with diabetes using electronic medical records data [57]. These findings suggest that AutoML provides a more accessible and efficient approach, eliminating the need for manual optimization while delivering superior performance.

Nonetheless, several issues with the AutoML model in diabetes screening should be noted. When applying more stringent diabetes diagnostic criteria, the accuracy reached 90%; however, precision and recall were low, likely due to the limited number of samples that met the ≥2 test criteria. H2O AutoML provides significant advantages, including its built-in class balancing functionality, which automates the handling of moderate class imbalance without requiring external implementations. Using random sampling to upsample minority classes or downsample majority classes, it effectively manages datasets with moderate imbalance, such as the 4:1 ratio under the 1+ test criterion for diabetes diagnosis in this study. However, with the stricter 2+ test criterion, the imbalance ratio rose to 24:1, likely exceeding H2O AutoML’s ability to mitigate the imbalance. This severe imbalance impacted the model’s ability to accurately distinguish undiagnosed diabetes from nondiabetes in unseen data, reducing precision and recall. For such highly imbalanced datasets, combining AutoML with other data resampling methods, such as Synthetic Minority Over-sampling Technique, could better improve model performance [58].

In addition, the model’s performance in multiclass classification of no diabetes, prediabetes, and undiagnosed diabetes was notably poor. Prediabetes is an intermediate stage between normal glycemia and diabetes and is highly prevalent [59]. Clinically, prediabetes and diabetes, as the continuum of dysglycemia, share many overlapping risk factors, such as insulin resistance and elevated glucose levels [60], making their differentiation challenging. The subtle metabolic differences between these conditions may not have been adequately captured by the included self-reported data. Future efforts should focus on incorporating additional features and refining the model architecture to enhance accuracy and improve its ability to identify prediabetes.

A major strength of this model is its use of NHANES data, which is nationally representative of the US population, enhancing the generalizability of the findings. The model incorporates comprehensive self-reported data, including nutritional information, for predicting undiagnosed diabetes. The application of AutoML in this study represents the first use of this approach in diabetes research, providing a foundation for further development and validation of similar models. However, the widespread adoption of our AutoML in health care requires further development and validation. False positives can lead to unnecessary tests, higher costs, and patient anxiety, while false negatives may delay treatment and worsen outcomes. Using AutoML models in real-world clinical settings requires meticulous threshold optimization and validation to achieve an appropriate balance between precision and recall. These models should be rigorously evaluated for fairness and equity, ensuring that performance does not vary significantly across demographic groups [61,62]. Practical barriers to implementing AutoML in clinical practice also include integration with electronic health record systems and the lack of trust in “black-box” models due to their opacity [31,61-63].

Several limitations should be noted. First, the definition of undiagnosed diabetes in the base model was based on a single elevated measurement, which may not fully capture the condition. However, the model still showed utility as a screening tool. Second, the model demonstrated poor performance in multiclass prediction, including prediabetes, indicating that additional feature refinement and model adjustments may be necessary to improve accuracy in 3-class predictions. Third, known diabetes diagnoses rely on self-reports, which may introduce potential recall bias and misclassification. This could lead to mislabeling cases and diminishing model performance. Although most published public health studies still rely on self-reports—hence making our study not an unusual one in using self-reports—future studies should aim to validate self-reported data against medical records where feasible to minimize errors and increase reliability of the observations. Finally, the generalizability of this model may be limited to US populations and may not extend to non-US populations.

This study demonstrates the potential of AutoML in detecting undiagnosed diabetes using self-reported and easily accessible data. Although challenges remain in accurately classifying multiple categories, including prediabetes, the model shows promise as a tool for large-scale diabetes screening. Further refinement and validation are required to improve its applicability across diverse populations.

JL was supported by Beijing Educational Sciences planning project (CCHA241) and Sports Special Project of China Disabled Persons' Federation (2024 KFJS & 001).

MJ and RA conceived and designed the study. JL and MJ contributed to data curation, formal analysis, funding acquisition, and investigation. MJ was responsible for resources and validation. Visualization was performed by JL and MJ. The original draft of the manuscript was written by JL and MJ, with critical review and editing provided by JL, FS, RA, and MJ.

None declared.

Abbreviations

ADA

American Diabetes Association

AUC

area under the receiver operating characteristic curve

AutoML

automated machine learning

CREMLS

Consolidated Reporting Guidelines for Prognostic and Diagnostic Machine Learning Models

MET

metabolic equivalent

NHANES

National Health and Nutrition Examination Survey

NPV

negative predictive value

PPV

positive predictive value

XGBoost

extreme gradient boosting

References1

Ahmad

Anderson

The leading causes of death in the US for 2020

JAMA202105113251818291830

10.1001/jama.2021.5469

33787821

Statistics about diabetes

American Diabetes Association2025-01-27

https://diabetes.org/about-diabetes/statistics/about-diabetes

Diabetes statistics

National Institute of Diabetes and Digestive and Kidney Diseases2025-01-27

https://www.niddk.nih.gov/health-information/health-statistics/diabetes-statistics

Beagley

Guariguata

Weil

Motala

Global estimates of undiagnosed diabetes in adults

Diabetes Res Clin Pract2014021032150160

10.1016/j.diabres.2013.11.001

24300018

Fowler

Microvascular and macrovascular complications of diabetes

Clin Diabetes2011071293116122

10.2337/diaclin.29.3.116

Deshpande

Harris-Hayes

Schootman

Epidemiology of diabetes and diabetes-related complications

Phys Ther200811881112541264

10.2522/ptj.20080020

18801858

Ahmad

Joshi

Self-care practices and their role in the control of diabetes: a narrative review

Cureus202307157e41409

10.7759/cureus.41409

37546053

Ali

McKeever Bullard

Imperatore

Reach and use of diabetes prevention services in the United States, 2016-2017

JAMA Netw Open201905325e193160

10.1001/jamanetworkopen.2019.3160

31074808

Ali

Bullard

Gregg

Del Rio

A cascade of care for diabetes in the United States: visualizing the gaps

Ann Intern Med2014111816110681689

10.7326/M14-0019

25402511

Diabetes Prevention Program Research GroupKnowler

Fowler

10-year follow-up of diabetes incidence and weight loss in the Diabetes Prevention Program Outcomes Study

Lancet20091114374970216771686

10.1016/S0140-6736(09)61457-4

19878986

US Preventive Services Task ForceDavidson

Barry

Screening for prediabetes and type 2 diabetes: US Preventive Services Task Force recommendation statement

JAMA202108243268736743

10.1001/jama.2021.12531

34427594

American Diabetes Association Professional Practice Committee

Classification and diagnosis of diabetes: standards of medical care in diabetes—2022

Diabetes Care202201145Supplement_1S17S38

10.2337/dc22-S002

Bullard

Ali

Imperatore

Receipt of glucose testing and performance of two US diabetes screening guidelines, 2007-2012

PLoS ONE2015104e0125249

10.1371/journal.pone.0125249

25928306

Ali

Imperatore

Benoit

Impact of changes in diabetes screening guidelines on testing eligibility and potential yield among adults without diagnosed diabetes in the United States

Diabetes Res Clin Pract202303197110572

10.1016/j.diabres.2023.110572

36775024

Franciosi

De Berardis

Rossi

MCE

Use of the diabetes risk score for opportunistic screening of undiagnosed diabetes and impaired glucose tolerance: the IGLOO (Impaired Glucose Tolerance and Long-Term Outcomes Observational) study

Diabetes Care20050528511871194

10.2337/diacare.28.5.1187

15855587

Lee

Bang

Kim

Park

Kim

A simple screening score for diabetes for the Korean population: development, validation, and comparison with other scores

Diabetes Care20120835817231730

10.2337/dc11-2347

22688547

Bang

Edwards

Bomback

Development and validation of a patient self-assessment score for diabetes risk

Ann Intern Med200912115111775783

10.7326/0003-4819-151-11-200912010-00005

19949143

Herman

Smith

Thompson

Engelgau

Aubert

A new and simple questionnaire to identify people at increased risk for undiagnosed diabetes

Diabetes Care199503183382387

10.2337/diacare.18.3.382

7555482

Heikes

Eddy

Arondekar

Schlessinger

Diabetes risk calculator: a simple tool for detecting undiagnosed diabetes and pre-diabetes

Diabetes Care20080531510401045

10.2337/dc07-1150

18070993

Rolka

Narayan

Thompson

Performance of recommended screening tests for undiagnosed diabetes and dysglycemia

Diabetes Care200111241118991903

10.2337/diacare.24.11.1899

11679454

Dhippayom

Chaiyakunapruk

Krass

How diabetes risk assessment tools are implemented in practice: a systematic review

Diabetes Res Clin Pract2014061043329342

10.1016/j.diabres.2014.01.008

24485859

Maniruzzaman

Rahman

Al-MehediHasan

Accurate diabetes risk stratification using machine learning: role of missing value and outliers

J Med Syst2018041042592

10.1007/s10916-018-0940-7

29637403

Hasan

Alam

Das

Hossain

Hasan

Diabetes prediction using ensembling of different machine learning classifiers

IEEE Access202087651676531

10.1109/ACCESS.2020.2989857

34812373

Sisodia

Prediction of diabetes using classification algorithms

Procedia Comput Sci201813215781585

10.1016/j.procs.2018.05.122

Olivera

Roesler

Iochpe

Comparison of machine-learning algorithms to build a predictive model for detecting undiagnosed diabetes - ELSA-Brasil: accuracy study

Sao Paulo Med J20171353234246

10.1590/1516-3180.2016.0309010217

28746659

Ryu

Lee

Batbaatar

Lee

Choi

Cha

A deep learning model for estimation of patients with undiagnosed diabetes

Appl Sci (Basel)2020101421

10.3390/app10010421

Choi

Park

Comparisons of the prediction models for undiagnosed diabetes between machine learning versus traditional statistical methods

Sci Rep2023081113113101

10.1038/s41598-023-40170-0

37567907

Kavakiotis

Tsave

Salifoglou

Maglaveras

Vlahavas

Chouvarda

Machine learning and data mining methods in diabetes research

Comput Struct Biotechnol J201715104116

10.1016/j.csbj.2016.12.005

28138367

Liu

Valdez

Gwinn

Khoury

Application of support vector machine modeling for prediction of common diseases: the case of diabetes and pre-diabetes

BMC Med Inform Decis Mak2010032210116

10.1186/1472-6947-10-16

20307319

Cichosz

Bender

Hejlesen

A comparative analysis of machine learning models for the detection of undiagnosed diabetes patients

Diabetology202451111

10.3390/diabetology5010001

Waring

Lindvall

Umeton

Automated machine learning: review of the state-of-the-art and opportunities for healthcare

Artif Intell Med202004104101822

10.1016/j.artmed.2020.101822

32499001

Scott

De Guzman

Falconer

Evaluating automated machine learning platforms for use in healthcare

JAMIA Open20240772ooae031

10.1093/jamiaopen/ooae031

38863963

Luo

A review of automatic selection methods for machine learning algorithms and hyper-parameter values

Netw Model Anal Health Inform Bioinforma2016125118

10.1007/s13721-016-0125-6

Chen

Clark

Riddles

Mohadjer

Fakhouri

THI

National Health and Nutrition Examination Survey, 2015-2018: sample design and estimation procedures

Vital Health Stat 2202004184135

33663649

Mirel

Mohadjer

Dohrmann

National Health and Nutrition Examination Survey: estimation procedures, 2007-2010

Vital Health Stat 22013082159117

25093338

El Emam

Leung

Malin

Klement

Eysenbach

Consolidated Reporting Guidelines for Prognostic and Diagnostic Machine Learning Models (CREMLS)

J Med Internet Res202405226e52508

10.2196/52508

38696776

Klement

El Emam

Consolidated reporting guidelines for prognostic and diagnostic machine learning modeling studies: development and validation

J Med Internet Res2023083125e48763

10.2196/48763

37651179

ElSayed

Aleppo

Bannuru

Diagnosis and classification of diabetes: standards of care in diabetes—2024

Diabetes Care202401147Supplement_1S20S42

10.2337/dc24-S002

NHANES 2005-2006 data documentation, codebook, and frequencies

Centers for Disease Control and Prevention2025-01-27

https://wwwn.cdc.gov/Nchs/Nhanes/2005-2006/PAQIAF_D.htm

LeDell

Poirier

H2O AutoML: scalable automatic machine learning

20200718

2025-01-27

7th ICML Workshop on Automated Machine Learning

https://www.automl.org/wp-content/uploads/2020/07/AutoML_2020_paper_61.pdf

Chawla

Bowyer

Hall

Kegelmeyer

SMOTE: Synthetic Minority Over-Sampling Technique

J Artif Intell Res200216321357

10.1613/jair.953

Blagus

Lusa

SMOTE for high-dimensional class-imbalanced data

BMC Bioinformatics2013032214106

10.1186/1471-2105-14-106

23522326

Samson

Vellanki

Blonde

American Association of Clinical Endocrinology consensus statement: comprehensive type 2 diabetes management algorithm 2023 update

Endocr Pract202305295305340

10.1016/j.eprac.2023.02.001

37150579

Code of federal regulations

US Department of Health and Human Services2024-9-30

https://www.ecfr.gov/current/title-45/subtitle-A/subchapter-A/part-46/subpart-A/section-46.104

Plantinga

Crews

Coresh

Prevalence of chronic kidney disease in US adults with undiagnosed diabetes or prediabetes

Clin J Am Soc Nephrol20100454673682

10.2215/CJN.07891109

20338960

Casagrande

Cowie

Fradkin

Utility of the U.S. Preventive Services Task Force criteria for diabetes screening

Am J Prev Med201308452167174

10.1016/j.amepre.2013.02.026

23867023

Young

Mustard

Undiagnosed diabetes: does it matter?

CMAJ200101916412428

11202663

Tenenbaum

Motro

Fisman

Clinical impact of borderline and undiagnosed diabetes mellitus in patients with coronary artery disease

Am J Cardiol20001215861213631366

10.1016/s0002-9149(00)01244-3

11113414

Wang

Trends in prevalence of diabetes and control of risk factors in diabetes among US adults, 1999-2018

JAMA202106253268113

10.1001/jama.2021.9883

34170288

Khan

Fatima

Qureshi

Drawbacks of artificial intelligence and their potential solutions in the healthcare sector

Biomed Mater Devices20230288 202318

10.1007/s44174-023-00063-2

36785697

Habehh

Gohel

Machine learning in healthcare

Curr Genomics20211216224291300

10.2174/1389202922666210705124359

35273459

A Romero

Y Deypalan

Mehrotra

Benchmarking AutoML frameworks for disease prediction using medical claims

BioData Min2022072615115

10.1186/s13040-022-00300-2

35883154

Rashidi

Tran

Albahra

Dang

Machine learning in health care and laboratory medicine: general overview of supervised learning and Auto-ML

Int J Lab Hematol20210743 Suppl 11522

10.1111/ijlh.13537

34288435

Raj

Kannath

Mathew

Sylaja

AutoML accurately predicts endovascular mechanical thrombectomy in acute large vessel ischemic stroke

Front Neurol2023141259958

10.3389/fneur.2023.1259958

37840939

Yang

Shi

Wei

MedMNIST v2—a large-scale lightweight benchmark for 2D and 3D biomedical image classification

Sci Data2023011910141

10.1038/s41597-022-01721-8

36658144

Zhuhadar

Lytras

The application of AutoML techniques in diabetes diagnosis: current approaches, performance, and future directions

Sustainability2023151813484

10.3390/su151813484

Mohsen

Biswas

Ali

Alam

Househ

Shah

Customized and automated machine learning-based models for diabetes type 2 classification

Stud Health Technol Inform20220629295517520

10.3233/SHTI220779

35773925

Khushi

Shaukat

Alam

A comparative performance analysis of data resampling methods on imbalance medical data

IEEE Access20219109960109975

10.1109/ACCESS.2021.3102399

Echouffo-Tcheugui

Selvin

Prediabetes and what it means: the epidemiological evidence

Annu Rev Public Health2021041425977

10.1146/annurev-publhealth-090419-102644

33355476

Skyler

Bakris

Bonifacio

Differentiation of diabetes by pathophysiology, natural history, and prognosis

Diabetes201702662241255

10.2337/db16-0806

27980006

Huguet

Chen

Parikh

Applying machine learning techniques to implementation science

Online J Public Health Inform2024042216e50201

10.2196/50201

38648094

Sarker

Machine learning: algorithms, real-world applications and research directions

SN Comput Sci202123160

10.1007/s42979-021-00592-x

33778771

Shick

Webber

Kiarashi

Transparency of artificial intelligence/machine learning-enabled medical devices

NPJ Digit Med202401267121

10.1038/s41746-023-00992-8

38273098

Multimedia Appendix 1

Model summary for stacked ensemble.