Objective To develop and validate a prediction model of mortality in patients with COVID-19 attending hospital emergency rooms.
Design Multivariable prognostic prediction model.
Setting 127 Spanish hospitals.
Participants Derivation (DC) and external validation (VC) cohorts were obtained from multicentre and single-centre databases, including 4035 and 2126 patients with confirmed COVID-19, respectively.
Interventions Prognostic variables were identified using multivariable logistic regression.
Main outcome measures 30-day mortality.
Results Patients’ characteristics in the DC and VC were median age 70 and 61 years, male sex 61.0% and 47.9%, median time from onset of symptoms to admission 5 and 8 days, and 30-day mortality 26.6% and 15.5%, respectively. Age, low age-adjusted saturation of oxygen, neutrophil-to-lymphocyte ratio, estimated glomerular filtration rate by the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation, dyspnoea and sex were the strongest predictors of mortality. Calibration and discrimination were satisfactory with an area under the receiver operating characteristic curve with a 95% CI for prediction of 30-day mortality of 0.822 (0.806–0.837) in the DC and 0.845 (0.819–0.870) in the VC. A simplified score system ranging from 0 to 30 to predict 30-day mortality was also developed. The risk was considered to be low with 0–2 points (0%–2.1%), moderate with 3–5 (4.7%–6.3%), high with 6–8 (10.6%–19.5%) and very high with 9–30 (27.7%–100%).
Conclusions A simple prediction score, based on readily available clinical and laboratory data, provides a useful tool to predict 30-day mortality probability with a high degree of accuracy among hospitalised patients with COVID-19.
The clinical spectrum of the novel SARS-CoV-2 associated COVID-19 varies broadly, from asymptomatic disease to pneumonia and life-threatening complications, including acute respiratory distress syndrome, multisystem organ failure and death.
The main poor prognostic factor identified in different series of COVID-19 is advanced age.
Other factors that have been associated with poor outcomes include male gender, several comorbidities, lymphocyte counts, high concentrations of different inflammatory or coagulation markers, serum levels of different cytokines and features derived from imaging studies.
Prognostic prediction models are developed to aid healthcare providers in estimating the probability or risk that a specific event will occur, to inform their decision-making.11 Prediction models can be based on regression or machine learning.12 In a recent systematic review and critical appraisal of prediction models for diagnosis and prognosis of COVID-19, 50 prognostic models were identified; 23 estimated mortality risk, 8 aimed to predict severe disease or critical illness and the remaining 19 assessed other outcomes.13 The majority of the models included in the review used clinical and laboratory data from Chinese patients. All models were considered to have a high risk of bias due to a combination of poor reporting and poor methodological conduct for participant selection, predictor description and statistical methods, and none were recommended for clinical use.13 14 Eight additional studies of prognostic prediction models for COVID-19, including predominantly participants from China, have been published. Outcomes included mortality in five studies and severe disease or critical illness in three. The model performance was good across all studies, although the same methodological limitations found in the systematic review also applied.
The development of a high-quality clinical predictive model of death to stratify patients into risk groups is essential for improving the management of patients with severe COVID-19 and evaluating therapeutic interventions’ efficacy. Our study’s objective was to develop and validate a prediction score to estimate the probability of 30-day mortality in patients with severe COVID-19.
The predictive model’s development followed the recommendations stated in the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) Initiative11 23 (see online supplemental appendix table 1).
Source of data
The data sources were the databases of two large retrospective cohorts of hospitalised patients with COVID-19 in Spain in 2020. The derivation cohort (DC) was COVID-19@Spain, a multicentre cohort of patients hospitalised from 2 February to 17 March, with 17 April as the follow-up censoring date, sponsored by the Spanish Society of Infectious Diseases and Clinical Microbiology (SEIMC), and registered in ClinicalTrials.gov (NCT04355871).24 The external validation cohort was COVID-19@HULP, a large single-centre cohort of patients admitted to La Paz University Hospital in Madrid (Spain) from 25 February (the first case admitted) to 19 April, and registered in the European Union Electronic Register of Post-Authorisation Studies (EUPAS34331).
The DC included the first consecutive 4035 patients with COVID-19 admitted to 127 hospitals distributed across all regions in Spain. The external validation cohort (VC) included 2126 of the 2226 patients from COVID-19@HULP after the exclusion of the 100 patients contributing to COVID-19@Spain. The eligibility criteria in the DC and external VC were hospital admission due to COVID-19 confirmed with real-time PCR for SARS-CoV-2. No age limit was required in the DC, whereas an age of 18 years or older was an eligibility criterion in the external VC. The DC and VC were identical in terms of setting and definitions for outcomes and predictors. In addition, data in both cohorts were collected using the same modified version of the case report form (CRF) of the WHO–International Severe Acute Respiratory and Emerging Infections Consortium (ISARIC) Core CRF.
The outcome was 30-day all-cause mortality, measured from the day of hospital admission. Patients who were discharged alive before 30 days after admission were assumed to have survived for at least 30 days.
Predictors were preselected among the 17 baseline variables, recorded at hospital admission, independently associated with death in the COVID-19@Spain cohort by multivariable Cox regression analyses.24 These variables were distributed in the following five clusters: (1) demographics, age in years and sex at birth; (2) comorbidities defined as diagnoses included in the medical record such as hypertension, obesity (body mass index >30), liver cirrhosis, chronic neurological disorder, active neoplasia (solid or haematologic) and dementia; (3) signs or symptoms, including dyspnoea and confusion; (4) low age-adjusted capillary oxygen saturation (SaO2) on room air, defined as ≤90% for patients aged >50 years and ≤93% for patients aged ≤50 years27; (5) test results, including white cell count, neutrophil-to-lymphocyte ratio, platelet count, international normalised ratio (INR), estimated glomerular filtration rate (eGFR) measured by the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) equation28 and serum concentrations of C reactive protein.
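The age-dependent definition of low oxygen saturation given above can be written as a small function. This is a minimal illustrative sketch only; the function name `is_low_age_adjusted_sao2` is hypothetical and is not taken from the study's analysis code.

```python
def is_low_age_adjusted_sao2(sao2_percent: float, age_years: float) -> bool:
    """Return True when room-air SaO2 is low for the patient's age.

    Thresholds as stated in the text:
      - <=90% for patients aged >50 years
      - <=93% for patients aged <=50 years
    """
    threshold = 90.0 if age_years > 50 else 93.0
    return sao2_percent <= threshold


# Example: a 70-year-old with SaO2 91% is not flagged, a 45-year-old is.
flag_older = is_low_age_adjusted_sao2(91, 70)
flag_younger = is_low_age_adjusted_sao2(91, 45)
```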
Statistical analysis methods
We followed recent recommendations to calculate the minimum sample size required for prediction model development.29 We carried out a complete-case analysis (primary analysis) and two sensitivity analyses. In the first sensitivity analysis, we included all patients and missing values for predictors were considered as a separate category (missing indicator method). In the second sensitivity analysis, we also included all patients and missing values for predictors were left blank (equivalent to the lowest risk situation). No missing values for outcomes occurred in the DC or the external VC.
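The three approaches to missing predictor values can be sketched as simple data-preparation steps. The code below is an illustration under assumptions: the function names, the `"missing"` label and the dictionary-based row format are hypothetical, and treating a blank value as the reference (lowest-risk) level is one reading of the second sensitivity analysis.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def complete_cases(rows: List[Row], predictors: Sequence[str]) -> List[Row]:
    """Primary analysis: keep only rows with no missing predictor."""
    return [r for r in rows if all(r.get(p) is not None for p in predictors)]


def missing_as_category(rows: List[Row], predictors: Sequence[str],
                        label: str = "missing") -> List[Row]:
    """Sensitivity analysis 1: recode missing values as their own category."""
    return [
        {**r, **{p: label for p in predictors if r.get(p) is None}}
        for r in rows
    ]


def missing_as_reference(rows: List[Row],
                         reference_levels: Dict[str, Any]) -> List[Row]:
    """Sensitivity analysis 2 (assumed reading): a blank predictor adds
    nothing to the linear predictor, i.e. it behaves like the reference level."""
    return [
        {**r, **{p: ref for p, ref in reference_levels.items()
                 if r.get(p) is None}}
        for r in rows
    ]


rows = [
    {"dyspnoea": True, "sex": "male"},
    {"dyspnoea": None, "sex": "female"},
]
primary = complete_cases(rows, ["dyspnoea", "sex"])
sens1 = missing_as_category(rows, ["dyspnoea", "sex"])
sens2 = missing_as_reference(rows, {"dyspnoea": False})
```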
Continuous variables were categorised for the analysis. As mortality from COVID-19 among hospitalised patients is highly correlated with age, this variable was divided into 11 levels: <40 years that was the reference category and after that into 11 5-year to 10-year intervals up to ≥90 years that was the last category. The neutrophil-to-lymphocyte ratio was categorised into tertiles: <3.22, which was the reference category, 3.22 to 6.33, and >6.33. The eGFR in mL/min/1.73 m² was grouped before the analysis into three categories: >60 (normal to mildly decreased eGFR), 30–59 (moderately to severely decreased eGFR) and <30 (severely decreased eGFR).
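The cut-offs for the neutrophil-to-lymphocyte ratio and eGFR can be expressed as two small helper functions. This is a sketch only: the function names and category labels are hypothetical, and the handling of eGFR values between 59 and 60 (which the stated categories do not cover) is an assumption made here, not a rule taken from the study.

```python
def nlr_category(nlr: float) -> str:
    """Tertile of the neutrophil-to-lymphocyte ratio, as defined in the text."""
    if nlr < 3.22:
        return "<3.22"       # reference category
    if nlr <= 6.33:
        return "3.22-6.33"
    return ">6.33"


def egfr_category(egfr: float) -> str:
    """eGFR group in mL/min/1.73 m2.

    Assumption: values from 59 up to and including 60 are not explicitly
    assigned in the text; here >=60 goes to the top group and 30 to <60
    to the middle group.
    """
    if egfr >= 60:
        return ">60"         # normal to mildly decreased
    if egfr >= 30:
        return "30-59"       # moderately to severely decreased
    return "<30"             # severely decreased


example = (nlr_category(4.5), egfr_category(42))
```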
We used univariable and multivariable logistic regression in the derivation dataset to estimate the coefficients of each potential predictor of 30-day overall mortality. We fitted the final model by choosing predictors based on the strength of their unadjusted association with death. The model started with the predictor with the highest area under the receiver operating characteristic curve (AUROC) for predicting 30-day mortality. Subsequently, the rest of the variables were introduced one by one, creating all the possible models of two independent variables, and the combination with the highest AUROC was chosen. This process was repeated to form models of 3, 4 and more variables, always choosing the combination with the highest AUROC. The process stopped when the inclusion of a new variable in the model increased the AUROC by less than 0.005 units.
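The stepwise procedure described above amounts to a greedy forward selection with an AUROC-gain stopping rule. The following sketch shows that control flow only. The callable `auroc_of` is a hypothetical placeholder for "fit a logistic regression with these predictors and return its AUROC", and the toy gains are invented purely to make the example runnable; they are not study results.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def forward_select(
    candidates: Sequence[str],
    auroc_of: Callable[[List[str]], float],
    min_gain: float = 0.005,
) -> Tuple[List[str], Optional[float]]:
    """Greedy forward selection: add the variable giving the highest AUROC,
    stop when the best available gain is below `min_gain`."""
    selected: List[str] = []
    remaining = list(candidates)
    best_auroc: Optional[float] = None
    while remaining:
        # Try every remaining variable added to the current model.
        trial_auroc, trial_var = max(
            ((auroc_of(selected + [v]), v) for v in remaining),
            key=lambda pair: pair[0],
        )
        # The first variable is always accepted; later ones must add enough.
        if best_auroc is not None and trial_auroc - best_auroc < min_gain:
            break
        selected.append(trial_var)
        remaining.remove(trial_var)
        best_auroc = trial_auroc
    return selected, best_auroc


# Toy stand-in for model fitting (illustrative numbers, not study data).
_TOY_GAIN = {"age": 0.27, "low_sao2": 0.03, "nlr": 0.01, "crp": 0.002}


def toy_auroc(variables: List[str]) -> float:
    return 0.5 + sum(_TOY_GAIN[v] for v in variables)


chosen, final_auroc = forward_select(list(_TOY_GAIN), toy_auroc)
```

In this toy run the last candidate adds only 0.002 to the AUROC, so the loop stops before including it.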
We assessed the predictive performance of the model by examining measures of calibration and discrimination. We developed a calibration plot with estimates of the calibration slope and intercept. Calibration was also assessed using the Hosmer-Lemeshow test. Discrimination was examined by calculating its AUROC with the 95% CI. We carried out internal validation through a bootstrap with 1000 random samples with replacement to estimate the model optimism and shrinkage factor.
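For readers unfamiliar with the discrimination measure, the AUROC equals the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor, with ties counted as one half. The sketch below computes it by direct pairwise comparison. It is an illustration of the definition, not the routine used in the study, and it does not compute the confidence interval or the bootstrap optimism correction.

```python
from typing import Sequence


def auroc(outcomes: Sequence[int], predictions: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    outcomes: 1 = died within 30 days, 0 = survived.
    predictions: predicted risk (or score) for each patient.
    """
    died = [p for y, p in zip(outcomes, predictions) if y == 1]
    survived = [p for y, p in zip(outcomes, predictions) if y == 0]
    if not died or not survived:
        raise ValueError("need at least one patient in each outcome group")
    concordant = 0.0
    for p_died in died:
        for p_surv in survived:
            if p_died > p_surv:
                concordant += 1.0   # correctly ordered pair
            elif p_died == p_surv:
                concordant += 0.5   # tie counts as half
    return concordant / (len(died) * len(survived))


# Four patients: three of the four died/survived pairs are correctly ordered.
example_auroc = auroc([0, 0, 1, 1], [0.10, 0.40, 0.35, 0.80])
```

The pairwise loop is quadratic in the number of patients; a rank-based formulation would be preferable for cohorts of several thousand.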
The logistic regression model’s coefficients were converted to a simplified score to facilitate its application in clinical practice. The score was developed by dividing each coefficient by the coefficient with the lowest value and rounding to an integer. Risk groups were created using the 30-day probability of death according to the simplified score. The sensitivity, specificity, positive and negative predictive values, and likelihood ratios were calculated for different scores.
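The conversion of regression coefficients to integer points can be sketched as follows. The coefficient values and predictor names in the example are placeholders chosen for illustration, not the study's coefficients, and the rounding convention for exact halves is an assumption (Python's built-in `round` rounds halves to the nearest even integer).

```python
from typing import Dict


def to_points(coefficients: Dict[str, float], divisor: float) -> Dict[str, int]:
    """Divide each coefficient by the chosen reference coefficient
    (the smallest one) and round to the nearest integer."""
    if divisor <= 0:
        raise ValueError("divisor must be a positive coefficient")
    return {name: int(round(value / divisor))
            for name, value in coefficients.items()}


# Placeholder coefficients (hypothetical, for illustration only).
illustrative = {"predictor_a": 0.25, "predictor_b": 0.60, "predictor_c": 1.10}
points = to_points(illustrative, divisor=min(illustrative.values()))
```

A patient's total score is then the sum of the points for the categories that apply.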
The statistical analyses were performed using Stata software (V.15.0; Stata Corporation, College Station, Texas, USA).
The derivation cohort included 4035 patients, of whom 1074 (26.6%) died and 2961 were alive within 30 days of hospital admission. The cohort size was more than twice that required for developing a clinical prognostic model (online supplemental appendix figure 1). The external VC included 2202 patients, of whom 341 (15.5%) died and 1861 were alive within 30 days of hospital admission. The median time to death since hospital admission was 10 (IQR 6–16) days in the DC and 5 (IQR 3–10) days in the VC.
The characteristics of the participants, including demographics, presenting signs and symptoms, presence of lung infiltrates on chest radiograph, oxygenation and laboratory parameters, are shown in table 1. Patients in the DC were, on average, 9 years older and more frequently male than patients in the external VC. Statistically significant differences between the cohorts were found in all the analysed variables.
In the DC, targeted viral agents were administered to 82.0% of patients, including lopinavir/ritonavir (LPV/r) (70.4%), hydroxychloroquine (65.5%) and subcutaneous interferon-beta (29.2%), usually in combination with LPV/r. In the external VC, targeted viral agents were administered to 65.3% of patients. The most frequent combination was hydroxychloroquine plus azithromycin (31.7%), followed by hydroxychloroquine alone. Host-targeted agents in the DC included systemic corticosteroids in 28.0% of patients and tocilizumab in 9.4% of patients. In the VC, corticosteroids and tocilizumab were administered to 13.3% and 2.3% of patients, respectively.
Model development and performance
The number of participants in the DC without missing values for each predictor, the number of outcomes per predictor and the unadjusted associations between predictors and outcomes are shown in table 2.
The final prediction model generated without recoding missing values (3358 participants) is shown in table 3. The variables used in the model to generate the score were those in table 2. The model started with the variable age since it was the one with the highest predictive capacity for death at 30 days (AUROC (95% CI) 0.768 (0.753 to 0.784)). The final input sequence of the variables to the model, following the procedure described in the Methods section, was age, low age-adjusted SaO2, neutrophil-to-lymphocyte ratio, eGFR by the CKD-EPI equation, dyspnoea and sex.
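The predicted probability of 30-day mortality was determined by the following equation:

$$P(\text{death at day 30}) = \frac{1}{1 + e^{-b}}$$

where

$$b = \beta_{\text{age}} + 0.875\,[\text{low age-adjusted SaO}_2] + 0.173\,[\text{neutrophil-to-lymphocyte ratio 3.22--6.33}] + 0.657\,[\text{neutrophil-to-lymphocyte ratio} > 6.33] + 0.498\,[\text{eGFR 30--59}] + 1.093\,[\text{eGFR} < 30] + 0.414\,[\text{dyspnoea}] + 0.466\,[\text{male sex}] - 4.266$$

Each bracketed term equals 1 if the condition holds and 0 otherwise, and $\beta_{\text{age}}$ is the coefficient of the patient's age category, which is 0 for the reference category (<40 years); the final model is shown in table 3.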
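The equation can be evaluated directly once the linear predictor b is assembled. The sketch below uses the non-age coefficients and intercept quoted in the text. The age-category coefficient is passed in by the caller because only the reference value (0 for age <40 years) is recoverable here. The function name and category labels are hypothetical.

```python
import math

# Coefficients quoted in the text (reference categories contribute 0).
NLR_COEF = {"<3.22": 0.0, "3.22-6.33": 0.173, ">6.33": 0.657}
EGFR_COEF = {">60": 0.0, "30-59": 0.498, "<30": 1.093}
INTERCEPT = -4.266


def predicted_mortality(
    age_coef: float,      # coefficient of the age category (0 for <40 years)
    low_sao2: bool,
    nlr_cat: str,
    egfr_cat: str,
    dyspnoea: bool,
    male: bool,
) -> float:
    """Predicted probability of death at day 30: 1 / (1 + exp(-b))."""
    b = (
        age_coef
        + 0.875 * low_sao2
        + NLR_COEF[nlr_cat]
        + EGFR_COEF[egfr_cat]
        + 0.414 * dyspnoea
        + 0.466 * male
        + INTERCEPT
    )
    return 1.0 / (1.0 + math.exp(-b))


# Patient aged <40 years with every predictor at its reference level.
p_reference = predicted_mortality(0.0, False, "<3.22", ">60", False, False)
# Same age category, but every other predictor at its highest-risk level.
p_all_risk = predicted_mortality(0.0, True, ">6.33", "<30", True, True)
```

With all predictors at their reference levels the linear predictor is just the intercept, which corresponds to a predicted probability of about 1.4%.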
The final model showed good calibration across the range of risk (figure 1), and the Hosmer-Lemeshow goodness-of-fit statistic was 11.21 (p=0.1902), which was not significant at the 0.05 level and therefore gave no evidence of poor calibration. Using bootstrapping techniques, an optimism of 0.006 and a shrinkage factor of 0.968 were estimated. In 600 of the samples (60%), the Hosmer-Lemeshow test was significant. The AUROC (95% CI) of the model for prediction of 30-day mortality was 0.822 (0.806 to 0.837) in the DC and 0.845 (0.819 to 0.870) in the external VC (online supplemental appendix table 2).
Simplified score development and performance
The simplified point score (from 0 to 30) resulting from the division of the regression coefficients of predictors in the final model by the coefficient of age 40–49, which was the lowest value among all coefficients, is shown in figure 2A. The prediction of 30-day mortality on presentation in hospitalised patients with COVID-19 according to the point score in the DC and in the external VC is shown in table 4.
The AUROC (95% CI) of the simplified score for prediction of 30-day mortality was 0.806 (0.790 to 0.821) in the DC and 0.831 (0.806 to 0.856) in the external VC (online supplemental appendix table 2). The sensitivity, specificity, positive and negative predictive values, and likelihood ratios for the different scores in the DC and external VC are shown in table 5 and online supplemental appendix table 3, respectively.
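The accuracy measures reported for each score cut-off all derive from a 2×2 table of predicted versus observed outcome. The sketch below shows the standard formulas. The function name and the counts in the example are hypothetical and are not taken from table 5.

```python
import math
from typing import Dict


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Standard accuracy measures for one score cut-off.

    tp/fn: deaths at or above / below the cut-off;
    fp/tn: survivors at or above / below the cut-off.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        # Likelihood ratios are undefined when the denominator is zero;
        # infinity is returned in that case.
        "lr_positive": (sensitivity / (1 - specificity)
                        if specificity < 1 else math.inf),
        "lr_negative": ((1 - sensitivity) / specificity
                        if specificity > 0 else math.inf),
    }


# Illustrative counts only (not study data).
metrics = diagnostic_metrics(tp=80, fp=30, fn=20, tn=170)
```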
We considered the risk of 30-day mortality as low with 0–2 points (0%–2.1%), moderate with 3–5 (4.7%–6.3%), high with 6–8 (10.6%–19.5%) and very high with 9–30 (27.7%–100.0%) (figure 2B). Kaplan-Meier survival plots for the different 30-day mortality risk categories according to the simplified score in the DC and VC are shown in online supplemental appendix figure 2.
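The mapping from total points to risk category follows directly from the cut-offs above. A minimal sketch, with a hypothetical function name:

```python
def risk_category(score: int) -> str:
    """Map a simplified score (0-30) to the risk group defined in the text."""
    if not 0 <= score <= 30:
        raise ValueError("score must be between 0 and 30")
    if score <= 2:
        return "low"        # 0-2 points
    if score <= 5:
        return "moderate"   # 3-5 points
    if score <= 8:
        return "high"       # 6-8 points
    return "very high"      # 9-30 points


categories = [risk_category(s) for s in (0, 3, 8, 9, 30)]
```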
Sensitivity analysis 1
When we generated the final prediction model recoding missing values for predictors as a separate category, the AUROC (95% CI) was 0.822 (0.809 to 0.836) in the DC and 0.850 (0.831 to 0.867) in the external VC. Likewise, when we applied the same approach to the simplified point score, the AUROC (95% CI) was 0.805 (0.791 to 0.820) in the DC and 0.848 (0.830 to 0.866) in the external VC (online supplemental appendix table 2).
Sensitivity analysis 2
When we applied the final prediction model to all patients, and missing values for predictors were left blank (equivalent to the lowest risk situation), the AUROC (95% CI) was 0.818 (0.805 to 0.832) in the DC and 0.859 (0.842 to 0.876) in the external VC. Likewise, when we applied the same approach to the simplified point score, the AUROC (95% CI) was 0.806 (0.791 to 0.820) in the DC and 0.849 (0.831 to 0.866) in the external VC (online supplemental appendix table 2).
The COVID-19 SEIMC score for predicting 30-day mortality of patients attending hospital emergency rooms was developed and externally validated with two large datasets from patients hospitalised with laboratory-confirmed COVID-19 in Spain. The predictors were age, low age-adjusted SaO2, neutrophil-to-lymphocyte ratio, eGFR by the CKD-EPI equation, dyspnoea and sex. The model showed good performance in both the DC and the external VC and permitted an easy stratification of patients into four risk categories.
Our prediction model uses widely accessible clinical and laboratory data, and its simplicity would allow clinicians to perform rapid risk stratification of patients with COVID-19. Of note, our model does not take into account comorbidities, which have been associated with worse COVID-19 prognosis in descriptive studies and included in most prognostic prediction models reported to date.13 15–22 In our study, underlying diseases such as hypertension, obesity, liver cirrhosis, chronic neurological disorder, active neoplasia and dementia were independently associated with an increased risk of 30-day mortality. However, none of these conditions improved the model’s discrimination capacity and, following the principle of parsimony, they were discarded.
Once again, our study highlights the extraordinary impact of age on COVID-19 mortality, which is, to the best of our knowledge, unparalleled in infectious diseases. For example, our score would classify a 65-year-old male patient attending the emergency room, regardless of the results of the other variables, as high risk, with a 30-day mortality probability that could reach up to 19.5%. For younger patients, our score also shows the importance of basic laboratory parameters. A 55-year-old man without dyspnoea and with normal SaO2 and normal renal function but with a neutrophil-to-lymphocyte ratio higher than 6.33 would also be classified as high risk.
At the time of writing, an eight-variable mortality score developed and validated in a UK prospective cohort of 57824 patients admitted to hospital with COVID-19, the 4C Mortality Score, has been published.30 Some of the variables included in this score, such as respiratory rate, Glasgow Coma Scale score and urea, are not available in the COVID-19@Database, precluding cross-validation of the 4C Mortality Score in our population.
Our study is limited, as is the case with other reported studies, by the retrospective capture of data. Another potential limitation is that it was based exclusively on predictors from patients attending hospital emergency rooms. However, we believe that our score could be applied in primary care settings if capillary SaO2 and routine laboratory tests such as blood counts and serum creatinine could be determined. Finally, our score was derived from hospitalised patients in a single country, which raises the question of its transportability to other countries, a limitation common to all currently described prognostic models of COVID-19. We believe that it would be of interest to carry out cross-validation between the SEIMC COVID-19 score and other scores in a large multinational dataset.
Our study has several strengths. In contrast with the majority of previously published prognostic models, ours adheres to the TRIPOD statement’s recommendations. In addition, the large sample size and the high number of events in the DC minimise the risk of model overfitting, a general limitation of previous studies. Our model’s strengths also include the calibration, the internal validation by bootstrapping rather than by random split of the DC and the validation in a large external cohort. Finally, the sensitivity analyses exploring different approaches for missing values for predictors did not modify the model’s performance, suggesting that missing values in both cohorts occurred at random.
The SEIMC COVID-19 score could be a useful triage tool enabling quick decision-making for patients with COVID-19. For example, patients in the low-risk category are likely suitable for outpatient care, whereas hospital admission or intensive or high dependency care should be considered for patients in high and very high-risk categories. Besides, management in emergency department observation units or makeshift medicalised facilities could be considered for patients in the moderate risk category. Another potential application of the SEIMC COVID-19 score is the risk stratification of patients with COVID-19 in observational studies or clinical trials.
Our study showed that the COVID-19 SEIMC score, a simple prediction tool using readily available clinical and laboratory data, could estimate the probability of 30-day mortality with a high degree of accuracy among patients with COVID-19.