This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR AI, is properly cited. The complete bibliographic information, a link to the original publication on https://www.ai.jmir.org/, as well as this copyright and license information must be included.
Machine learning (ML) can offer greater precision and sensitivity than provider recommendations alone in predicting Medicare patients' end of life and potential need for palliative services. However, earlier ML research on older community-dwelling Medicare beneficiaries has insufficiently explored the impacts of key model features and the role of the social determinants of health (SDOH).
This study describes the development of a binary classification ML model predicting 1-year mortality among Medicare Advantage plan members aged ≥65 years (N=318,774) and further examines the top features of the predictive model.
A light gradient-boosted trees model configuration was selected based on 5-fold cross-validation. The model was trained with 80% of cases (n=255,020) using randomized feature generation periods, with 20% (n=63,754) reserved as a holdout for validation. The final algorithm used 907 feature inputs extracted primarily from claims and administrative data capturing patient diagnoses, service utilization, demographics, and census tract–based social determinants index measures.
The total sample had an actual mortality prevalence of 3.9% in the 2018 outcome period. The final model correctly predicted 44.2% of patient expirations among the top 1% of highest risk members (AUC=0.84; 95% CI 0.83-0.85) versus 24.0% predicted by the model iteration using only age, gender, and select high-risk utilization features (AUC=0.74; 95% CI 0.73-0.74). The most important algorithm features included patient demographics, diagnoses, pharmacy utilization, mean costs, and certain social determinants of health.
The final ML model better predicts Medicare Advantage member end of life using a variety of routinely collected data and supports earlier patient identification for palliative care.
Approximately 43% of all Medicare beneficiaries are enrolled in Medicare Advantage plans, totaling 24.4 million Americans as of July 2020 [
Although the need for and engagement with palliative care among older adults and Medicare beneficiaries is growing, these valuable services are often underutilized [
ML is being adopted across hospital and community-based health care settings as a mechanism to guide early identification of older adults in need of palliative services. ML algorithms attain superior predictive performance by training on one or more sources of big data, such as routinely collected medical service claims, electronic medical records, and clinical assessment outcomes [
Previous research on ML mortality models for earlier palliative care identification in the Medicare population has mainly focused on optimizing and comparing the performance of different model configurations [
The important individual features of ML mortality models used to identify palliative care need among nonhospitalized older Medicare patients remain underreported in the current research [
To what extent is the performance of a baseline ML model (demographics-based with high-risk indicators) predicting 1-year mortality of Medicare Advantage plan members (aged ≥65 years) improved by adding features capturing patient service utilization, diagnoses, and SDOH?
What individual features are of top importance in the final ML model iteration?
An ML algorithm predicting 1-year mortality among Medicare Advantage plan members was developed by the team at Cigna, a large US commercial health benefits company. The aim was to create a prognostic ML model of mortality risk that could enhance the process of identifying patients for palliative care, with the long-term goal of increasing engagement with community-based, nonhospice palliative services among adults (aged ≥65 years) in Medicare Advantage plans for whom it would be appropriate. Increasing utilization of palliative services can reduce unnecessary high-cost hospital care and improve patient quality of life. An overview of the health plan’s process for identifying and connecting with potential palliative care patients is outlined in
The retrospective data used in the analysis were internally sourced from Cigna’s proprietary administrative records and claims database. These standard data elements are routinely collected to fulfill the operational purposes of the health benefits company; claims and administrative data were only extracted for the purposes of developing the ML algorithm post facto. Security measures for personal health information require all data be completely de-identified by a separate internal team prior to any secondary data analysis to protect member confidentiality. Due to the sensitivity and proprietary nature of the information, data cannot be shared externally.
Our study methods were in accordance with the ethical guidelines of the 1975 Declaration of Helsinki, and our reporting conforms to the Guidelines for Developing and Reporting Machine Learning Predictive Models in Biomedical Research [
Medicare Advantage plan members were eligible for inclusion if they had continuous health benefits coverage from July 1, 2016, through the end of the feature generation period (December 31, 2017) and at least one inpatient or outpatient service encounter in their randomly assigned feature generation time frame. Additionally, to be included in the analyzed sample, patients must have either (1) been continuously enrolled for the 2018 calendar year or (2) died during the outcomes period (January 1, 2018, through December 31, 2018). This requirement ensured that beneficiaries who disenrolled from their Medicare Advantage plan in 2018 but were not deceased were not counted as patient expirations.
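The inclusion rules above can be sketched as a simple cohort filter. Note this is an illustrative sketch, not the plan's actual pipeline; the column names (`enrolled_from`, `died_2018`, etc.) are hypothetical.

```python
import pandas as pd

def filter_eligible(members: pd.DataFrame) -> pd.DataFrame:
    """Apply the study's inclusion rules; column names are hypothetical."""
    # Continuous coverage from July 1, 2016 through December 31, 2017
    enrolled = (members["enrolled_from"] <= "2016-07-01") & (
        members["enrolled_through"] >= "2017-12-31"
    )
    # At least one inpatient or outpatient encounter in the feature window
    has_encounter = members["encounters_in_window"] > 0
    # Outcome period: continuously enrolled through 2018 or deceased in 2018
    outcome_ok = members["enrolled_2018_full_year"] | members["died_2018"]
    return members[enrolled & has_encounter & outcome_ok]
```

The ISO-format date strings compare correctly lexicographically, so no datetime parsing is needed for this simple filter.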
Various binary classification ML models were considered, and their performance was compared using 5-fold cross-validation. A light gradient-boosted tree model (LightGBM) performed best and was selected based on cross-validation log loss (cross-entropy loss). The protocol analyzed data from a total sample of 318,774 Medicare Advantage plan members. Features were generated using a training cohort (255,020/318,774, 80% of the sample) with a randomized outcomes time period. Models were further applied to a holdout data set (63,754/318,774, 20% of the sample) to validate performance and assess generalization to new cases. Analyses were run on an instance of DataRobot v6.1.2 (Python 3, custom lightgbm model) hosted on an on-premise Red Hat Enterprise Linux 7.9 (Maipo) server, with variable resources dedicated via Docker containers (4-8 CPUs, each with 32-64 GB RAM).
The model’s predicted outcome was defined as any member who expired between January 1, 2018, and December 31, 2018 (1 year). Patients were determined to be deceased based on corresponding plan enrollment data and validation through reporting to the Centers for Medicare and Medicaid Services [
A SQL script aggregated data to generate predictive features. To determine the date range for model input generation, a randomized cutoff date was assigned to each negative and positive case. Because the actual feature generation dates were randomized per customer, the distribution of start dates was the same for deceased and alive customers; this guarded the ML process against seasonality and selection bias. Features were built from the 1-year look-back period (ending December 31, 2017) and included 907 unique inputs based on routinely collected data. Data used in model development were sourced from claims, electronic health records (EHRs), and administrative member records.
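The randomized cutoff assignment could look like the following sketch. The 2017 date range and 1-year look-back mirror the study's description, but the function and column names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)

def assign_cutoffs(n_members: int) -> pd.DataFrame:
    """Draw a random feature-generation cutoff date in 2017 for each member."""
    days = pd.date_range("2017-01-01", "2017-12-31", freq="D")
    cutoffs = pd.to_datetime(rng.choice(days, size=n_members))
    frame = pd.DataFrame({"cutoff": cutoffs})
    # Features are built from the 1-year look-back ending at the cutoff, so
    # the distribution of windows is identical for deceased and alive members
    frame["lookback_start"] = frame["cutoff"] - pd.DateOffset(years=1)
    return frame
```

Because both outcome classes draw cutoffs from the same uniform distribution over the year, no season is over-represented in either class.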
Data from claims were primarily used to generate features representing patient service utilization. Diagnosis information was also extracted from claims. Types of claims data included medical service claims, pharmacy claims, and laboratory encounters. Laboratory encounters were based on medical claims for lab-related Current Procedural Terminology (CPT) codes. The actual clinical outcomes (results) of laboratory tests are not part of claims data and were thus not incorporated into the model.
Medical data were extracted from EHRs to supplement claims in generating 5 features of high-risk service utilization used in the first iteration of the model (ie, occurrence counts of electrocardiograms, kidney disease, sepsis, ventilator usage, and surgeries). Data from EHRs are aggregated through a third-party vendor partner and are used by the health plan for internal care management and care coordination activities. Not all patients had EHR data on record.
Demographic data, as well as information used to calculate measures of SDOH, were extracted from internal administrative member records. Demographic features were patient age (continuous, in years) and gender (male/female). Social determinants index (SDI) scores are a suite of measures in the administrative member record that were developed for internal use. SDI scores are composite measures representing 6 domains of the SDOH: economy, education, language, health, infrastructure, and food access. SDI scores are determined by the member’s census tract, which corresponds to the member’s residential address and zip code [
Sample members must have had at least one countable service utilization claim in the randomized feature generation period. No feature observations were removed due to missing data. The data had some categorical fields, such as gender or a categorical indicator of utilization status, but most features were continuous and numeric. Numeric data were not transformed (apart from missing value imputation). Most instances of missing numeric data indicated an individual did not experience a particular type of claim, diagnosis, or event (rather than reflecting a data quality issue); such instances were manually coded as 0 to avoid missing values and to represent that the patient did not experience the event. Beyond this, DataRobot handles missing value imputation automatically based on the specified type of imputation algorithm. For the selected model configuration (LightGBM), both continuous/numeric and categorical data had imputed values to represent “missing” data. The final model used ordinal encoding for categorical variables that included a separate category for “missing.” The most common type of missing data was SDI scores, missing for 4.9% (15,655/318,774) of the sample population. Age (541/318,774) and gender (647/318,774) data were each missing for 0.2% of the sample.
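A simplified version of this missing-data handling, zero-filling event counts and ordinal-encoding categoricals with an explicit "missing" level, might look like the sketch below. In the study itself, DataRobot applied the imputation automatically; the function and column names here are illustrative.

```python
import pandas as pd

def prepare_features(df: pd.DataFrame, count_cols: list, cat_cols: list) -> pd.DataFrame:
    """Illustrative version of the missing-data handling described above."""
    out = df.copy()
    # A missing utilization count means the event never occurred, so code as 0
    out[count_cols] = out[count_cols].fillna(0)
    # Categorical fields get an explicit "missing" level, then ordinal codes
    for col in cat_cols:
        out[col] = out[col].fillna("missing").astype("category").cat.codes
    return out
```

Encoding "missing" as its own category (rather than dropping or mean-imputing) lets a tree-based model split on missingness itself when it is informative.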
Data were split 80/20 into training and holdout partitions, respectively. Within the training partition, additional subdivisions were made to tune parameters and apply early stopping. In a LightGBM tree-based algorithm, early stopping halts training if model performance does not improve over some number of consecutive iterations. First, the training data were split (training split 1) into 90% for training and 10% for testing; this set was used for early stopping. Next, using only the training portion of training split 1, a second split (training split 2) assigned 70% for training and 30% for testing. Training split 2 was used to tune model parameters (eg, num_leaves). After these parameters were tuned, we returned to training split 1 to tune the number of estimators (n_estimators) using early stopping (early_stopping). Key parameters included learning_rate (0.05), n_estimators (550), num_leaves (16), max_depth (no limit), min_child_samples (10), and early_stopping_rounds (200). Both the training and holdout partitions had similar mortality rates of 4% in 2018, indicating the mortality outcome was neither biased nor skewed in either the training or validation step.
Model performance was assessed using AUC, positive predictive value, negative predictive value, true positive rate, true negative rate, average precision, and lift charts focusing on true positives in the top 10% of predictions for the holdout cohort. Based on the data, the DataRobot software selected a threshold of 0.16 for comparing the performance metrics of the different model iterations. We performed 1-tailed and 2-tailed z tests to compare the AUCs of successive model iterations.
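The threshold-based metrics follow directly from a confusion matrix at the chosen cutoff, as in this sketch (the helper function is illustrative, not the study's code):

```python
import numpy as np
from sklearn.metrics import average_precision_score, confusion_matrix, roc_auc_score

def summarize(y_true, y_prob, threshold=0.16):
    """Report the study's metrics at a fixed decision threshold (default 0.16)."""
    y_true = np.asarray(y_true)
    y_pred = (np.asarray(y_prob) >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "AUC": roc_auc_score(y_true, y_prob),
        "AP": average_precision_score(y_true, y_prob),
        "TPR": tp / (tp + fn),  # sensitivity / recall
        "FPR": fp / (fp + tn),
        "TNR": tn / (tn + fp),  # specificity
        "FNR": fn / (fn + tp),
        "PPV": tp / (tp + fp),  # precision
        "NPV": tn / (tn + fn),
    }
```

AUC and average precision are threshold-free, while the remaining rates depend on the 0.16 cutoff, which is why the tables report them together.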
Of the 318,774 patients included in the total sample, 96.1% (306,227/318,774) were determined to be alive, and 3.9% (12,547/318,774) were determined to be deceased during the 2018 outcomes period (see
Sample member characteristics.
Characteristic | Total sample (n=318,774) | Alive (n=306,227, 96.1%) | Deceased (n=12,547, 3.9%)
Gender, n (%)
  Female | 181,158 (56.8) | 174,640 (57.0) | 6518 (51.9)
  Male | 136,970 (43.0) | 130,941 (42.8) | 6029 (48.1)
  Missing/not available | 646 (0.2) | 646 (0.2) | 0 (0)
Age (years), mean (SD) | 70.7 (11.5) | 70.4 (11.5) | 77.2 (9.7)
Diagnoses, n (%)
  Chronic respiratory disease | 56,734 (10.4) | 52,183 (10.2) | 4551 (14.0)
  Heart failure | 54,702 (10.1) | 50,254 (9.8) | 4448 (13.7)
  Cancer | 44,145 (8.1) | 40,985 (8.0) | 3160 (9.7)
  Stroke | 21,338 (3.9) | 19,327 (3.8) | 2011 (6.2)
  Dementia or Alzheimer disease | 15,626 (2.9) | 13,018 (2.5) | 2608 (8.0)
  Hypertension | 204,405 (37.6) | 195,035 (38.2) | 9370 (28.8)
  Diabetes | 146,394 (26.9) | 139,999 (27.4) | 6395 (19.7)
Medical service utilization, mean (SD)
  Total care visits per year^a | 20.8 (39.5) | 20.2 (38.2) | 36.7 (60.9)
  Emergency room visits per year | 0.4 (1.1) | 0.4 (1.1) | 0.9 (1.7)
Pharmacy utilization, mean (SD)
  Total unique medications prescribed | 9.04 (7.4) | 8.9 (7.3) | 11.7 (8.3)
  Number of prescribed medications per day | 8.11 (12.0) | 8.0 (12.1) | 9.8 (9.9)
Laboratory utilization, mean (SD)
  Total unique lab-related CPT^b codes | 8.7 (8.4) | 8.6 (8.2) | 11.7 (11.0)
Social determinants index scores^c, mean (SD)
  Weighted SDI score^d | 58.41 (8.65) | 58.43 (8.67) | 58.09 (8.08)
  Unweighted SDI score^d | 56.94 (10.12) | 56.98 (10.13) | 55.91 (9.63)
^a Includes all inpatient and outpatient visits.
^b CPT: Current Procedural Terminology.
^c Higher is better.
^d 100 points maximum.
Model summary and performance comparison (holdout cohort).
Measure | Model 1 (M1; baseline) | Model 2 (M2) | Model 3 (M3; final)
Total model features, n | 7 | 899 | 907
Model input summary | Demographics^a; high-risk utilization indicators^b,c | Demographics^a; high-risk utilization indicators^b,c; medical, lab, and pharmacy utilization^c | Demographics^a; high-risk utilization indicators^b,c; medical, lab, and pharmacy utilization^c; SDI^d scores^a
Performance
  AUC^e (95% CI) | 0.736 (0.728-0.744) | 0.834 (0.828-0.840) | 0.839 (0.833-0.845)
  True positive rate^f | 0.105 | 0.320 | 0.2993
  PPV^f,g | 0.212 | 0.264 | 0.2991
  False positive rate^f | 0.016 | 0.037 | 0.029
  True negative rate^f | 0.984 | 0.963 | 0.97126
  NPV^f,h | 0.964 | 0.972 | 0.97129
  False negative rate^f | 0.890 | 0.679 | 0.701
  AP^i | 0.122 | 0.233 | 0.243
AUC comparison
  Null hypothesis | AUC_M1 = 0.5 | AUC_M2 - AUC_M1 = 0.0 | AUC_M3 - AUC_M2 = 0.0
  z statistic | 56.4 | 19.1 | 1.2
  P value | <.001 | <.001 | .19
^a Source: internal administrative member records.
^b Source: electronic health record (EHR) data.
^c Source: claims data.
^d SDI: social determinants index.
^e AUC: area under the curve.
^f Values based on a defined threshold of 0.16.
^g PPV: positive predictive value.
^h NPV: negative predictive value.
^i AP: average precision.
Comparison of Model 1 (M1), Model 2 (M2), and Model 3 (M3) using (A) receiver operating characteristic curves and (B) precision recall curves. AP: average precision; AUC: area under the receiver operating characteristic curve.
Model mortality outcomes for patients in the top decile of the highest predicted risk. M1: Model 1; M2: Model 2; M3: Model 3.
Ranked importance of top features in the final model (M3; 907 total inputs).
Feature category and M3 features | M3 ranked importance^a
Demographics
  Age^b | 1
  Gender^b | 2
Diagnoses
  Chronic respiratory disease | 3
  Kidney disease^b | 8
  Total patient diagnoses | 17
  Dementia | 18
Medical utilization
  Total patient claims | 4
  Average cost of claim | 11
  Total CT^c scans | 13
  Time since last outpatient visit | 15
Pharmacy utilization
  Antihyperlipidemics | 5
  Furosemide | 7
  Anti-inflammatory analgesics | 9
  Beta blockers | 14
  Antidepressants | 16
  Diuretics | 19
  Systemic and topical nasal agents | 20
Laboratory utilization
  Lipid panel lab test | 6
Social determinants index
  Food access | 10
  Economy | 12
^a Ranked importance based on positive Shapley Additive Explanations (SHAP) value of features.
^b M1 feature.
^c CT: computed tomography.
Absolute feature importance in Model 3 (M3). CT: computed tomography; DEM: demographics; DNX: diagnoses; LAB: laboratory utilization; MED: medical utilization; PHA: pharmacy utilization; SDI: social determinants index.
In the past, provider groups and physicians have relied on manual checking of patient records to prescribe palliative care for patients. Today, palliative care teams are increasingly using enhanced decision tools, such as ML approaches, for expedient care delivery. Our palliative care ML model aims to provide a more objective, automated way to identify patients in Medicare Advantage who could most benefit from palliative services, ensuring appropriate clinical resource allocation to the patients with the highest need. The health plan’s goal is to optimize the patient’s quality of life outcomes and incorporate all aspects of palliative care, including care coordination, polypharmacy management, symptom management, advance care plans, as well as spiritual and psychosocial assessments. In this sense, identifying patients who can benefit from a palliative care intervention takes a whole-person health approach to chronic health management and end of life care; the focus is not solely on a transition to hospice. In practice, the model could be deployed within case management, home health, or direct-to-provider programs.
Earlier ML studies of community-dwelling older Medicare beneficiaries have attempted to refine the predictive capabilities of various ML model configurations. However, few have reported outcomes of their specific model feature inputs [
The performance of our baseline gradient-boosted machine model predicting 1-year mortality in Medicare Advantage plan members (aged ≥65 years) improved with the incorporation of patient service utilization, diagnoses, and SDOH features. Having access to and adding the full medical, laboratory, and pharmacy claims data and SDI measures enhanced our ML approach. The performance of our model is comparable to that of previous ML studies of older community-dwelling Medicare beneficiaries using claims data (see
Age and gender were the most influential features in our final model. Although these demographic features had substantial impact on the mortality risk outcome, it is unsurprising that age is the most important model feature, as the probability of death increases with age in older individuals. There is also evidence that, for various reasons, men may be likelier to die earlier than women [
Our model was developed using only data from a nationwide population sample of community-dwelling Medicare Advantage plan members aged 65 years or older, which could constrain the generalizability of study results to other kinds of patient groups and health settings. Although our model was trained only on the Medicare Advantage population, bidirectional data sharing between US commercial and other government products would allow other types of health care consumers to benefit from ML tools for early identification of patients for palliative care. Additionally, our ML model was built to be generic and disease-agnostic. The mortality outcome for the year 2018 encompassed all causes of death, and the feature generation period was also randomized across a 1-year span. Although the model’s applicability to different patient populations and care settings is still unknown, the generic model can be applied to the plan’s Medicare Advantage members across different years.
ML offers greater precision and sensitivity in predicting patient end of life and potential need for palliative services among community-dwelling older Medicare beneficiaries. In response to a lack of feature reporting in relevant previous research, this study explored the development of a binary classification ML algorithm predicting 1-year mortality among a sample of Medicare Advantage plan members and investigated the mortality model’s features of top importance. We found the most important features included demographics, diagnoses, pharmacy utilization, mean costs, and certain SDOH. The final ML model predicts mortality among Medicare Advantage plan members with a high degree of accuracy and precision using a variety of routinely collected data and can support earlier patient identification for palliative care.
Health plan process for identifying palliative care patients using machine learning.
Social determinants index (SDI) select measures summary.
Model summary and performance comparison (training cohort).
Machine learning (ML) models predicting patient mortality for earlier identification for palliative care.
AUC: area under the receiver operating characteristic curve
CPT: Current Procedural Terminology
EHR: electronic health record
LightGBM: light gradient-boosted tree model
M1: Model 1
M2: Model 2
M3: Model 3
ML: machine learning
SDI: social determinants index
SDOH: social determinants of health
SHAP: Shapley Additive Explanations
The authors would like to acknowledge and thank Joshua Barrett and Dr Mayank Shah for their important contributions to the development of this manuscript.
AB, CD, AEM, RM, and AT are employees of the organization that requested and funded the study (Cigna/Evernorth). BM is a contracted employee of the same organization. The authors have no further interests to declare.