Hong Kong Med J 2025;31:Epub 23 Jun 2025
© Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
 
ORIGINAL ARTICLE
Machine learning model for prediction of coronavirus disease 2019 within 6 months after three doses of BNT162b2 in Hong Kong
Jing Tong Tan, BSc1; Ruiqi Zhang, PhD1; KH Chan, PhD2; Jian Qin, PhD3; Ivan FN Hung, MD1; KS Cheung, MD, MPH1,4
1 Department of Medicine, School of Clinical Medicine, The University of Hong Kong, Queen Mary Hospital, Hong Kong SAR, China
2 Department of Microbiology, School of Clinical Medicine, The University of Hong Kong, Queen Mary Hospital, Hong Kong SAR, China
3 Department of Medicine, Yulin Traditional Chinese Medicine Hospital, Guangxi, China
4 Department of Medicine, The University of Hong Kong–Shenzhen Hospital, Shenzhen, China
 
Corresponding authors: Prof Ivan FN Hung (ivanhung@hku.hk); Prof KS Cheung (cks634@hku.hk)
 
 
Abstract
Introduction: We aimed to develop a machine learning (ML) model to predict the risk of coronavirus disease 2019 (COVID-19) among three-dose BNT162b2 vaccine recipients in Hong Kong.
 
Methods: A total of 304 individuals who had received three doses of BNT162b2 were recruited from three vaccination centres in Hong Kong between May and August 2021. The dataset was randomly divided into training (n=184) and testing (n=120) sets in a 6:4 ratio. Demographics, co-morbidities and medications, blood tests (complete blood count, liver and renal function tests, glycated haemoglobin level, lipid profile, and presence of hepatitis B surface antigen), and controlled attenuation parameter (CAP) were used to develop six ML models (logistic regression, linear discriminant analysis, random forest, naïve Bayes, neural network [NN], and extreme gradient boosting models) to predict COVID-19 risk. Model performance was assessed using area under the receiver operating characteristic curve (AUC), sensitivity, specificity, and positive predictive value (PPV) and negative predictive value (NPV).
 
Results: Among the study population (median age: 50.9 years [interquartile range=43.6-57.8]; men: 30.9% [n=94]), 27 participants (8.9%) developed COVID-19 within 6 months. Fifteen clinical variables were used to train the models. The NN model achieved the best performance, with an AUC of 0.74 (95% confidence interval [95% CI]=0.60-0.88). Using the optimal cut-off value based on the maximised Youden index, sensitivity, specificity, PPV, and NPV were 90% (95% CI=55%-100%), 58% (95% CI=48%-68%), 16% (95% CI=8%-29%), and 98% (95% CI=92%-100%), respectively. The top predictors in the NN model included age, prediabetes/diabetes, CAP, alanine aminotransferase level, and aspartate aminotransferase level.
 
Conclusion: An NN model integrating 15 clinical variables effectively identified individuals at low risk of COVID-19 following three doses of BNT162b2.
 
 
New knowledge added by this study
  • A neural network model is a useful tool that effectively predicts coronavirus disease 2019 (COVID-19) risk in individuals who have received three doses of the BNT162b2 vaccine.
  • Metabolic risk factors, including prediabetes/diabetes, non-alcoholic fatty liver disease, and steatohepatitis, play key roles in vaccine immunogenicity.
Implications for clinical practice or policy
  • Clinicians can use the model to identify high-risk patients for booster doses and preventive strategies.
  • Our findings can guide targeted educational campaigns and resource allocation by identifying demographic and clinical factors associated with higher COVID-19 risk despite vaccination.
  • The identification of key variables such as age, prediabetes/diabetes, and liver enzyme levels can prompt further studies to understand the underlying mechanisms and to develop more effective interventions.
 
 
Introduction
The severe acute respiratory syndrome coronavirus 2 pandemic has been a global health crisis, resulting in substantial morbidity and mortality worldwide; in response, over 13 billion vaccine doses have been administered.1 To mitigate the risk of breakthrough infections by dominant Omicron variants, a third-dose booster following two doses of BNT162b2 vaccine (BioNTech-Pfizer, Mainz, Germany) has been rolled out. Compared with a two-dose schedule, a third dose significantly reduces the risk of infection, hospitalisation, and severe disease.2 3 However, waning anti-Omicron neutralising antibody and T cell responses have been reported even after the booster dose,4 and sustained long-term immunogenicity remains uncertain.
 
Advanced machine learning (ML) algorithms, such as random forest, artificial neural network (NN), and gradient boosting, have been increasingly utilised to develop prognostic models that can identify individuals at high risk of coronavirus disease 2019 (COVID-19). These models offer potential to improve risk stratification and inform targeted prevention and intervention strategies. Numerous studies have demonstrated the development of such models, which integrate various clinical, demographic, and routine laboratory variables to predict risks of COVID-19, hospitalisation, and mortality.5 6 7 8 9 However, these previous studies did not stratify patients by vaccination status, leading to heterogeneous cohorts of both vaccinated and unvaccinated individuals. This may introduce limitations and biases in model performance, given that vaccination status can substantially affect COVID-19 risk and disease severity.10 11
 
This study focused on individuals who had received three doses of BNT162b2, aiming to identify the ML algorithm with optimal performance for predicting COVID-19 risk using clinically available data. We also sought to identify key predictors used by the model to stratify individuals who may be more susceptible to COVID-19 despite vaccination.
 
Methods
Study design and study population
This multi-centre, prospective cohort study recruited individuals aged 18 years or above who had received three doses of BNT162b2 vaccine from three vaccination centres in Hong Kong, namely, Sun Yat Sen Memorial Park Sports Centre, Queen Mary Hospital, and Sai Ying Pun Jockey Club Polyclinic, between May and August 2021. Participants volunteered for the study after being informed through flyers and announcements at the vaccination sites. All participants were screened by a trained research assistant using a checklist form (online Appendix) to confirm the absence of active or prior COVID-19. Exclusion criteria included prior COVID-19 infection identified through serological testing for antibodies to the nucleocapsid protein of severe acute respiratory syndrome coronavirus 2; gastrointestinal surgery; inflammatory bowel disease; immunocompromised status (including post-transplantation, use of immunosuppressants, or receipt of chemotherapy); other medical conditions (malignancy, haematological, rheumatological, or autoimmune diseases); and fewer than 14 days between the booster dose and either the study endpoint or the date of COVID-19 diagnosis.
 
Demographic and clinical information—including age, sex, body mass index (BMI), waist-to-hip ratio, smoking status, alcohol use, co-morbidities (hypertension, diabetes mellitus, and prediabetes), and recent medication use within 6 months of vaccination (proton pump inhibitors, statins, metformin, antibiotics,12 antidepressants, steroids, probiotics or prebiotics)—was collected. Additional data included blood pressure; blood test results (complete blood count, liver and renal function tests,13 glycated haemoglobin [HbA1c] level, lipid profile, and presence of hepatitis B surface antigen); controlled attenuation parameter (CAP) to measure liver fat14; and liver stiffness measured by transient elastography15 using FibroScan (Echosens, Paris, France). We also cross-checked the Hospital Authority’s database (eg, Clinical Management System) to verify participants’ co-morbidity conditions.
 
The primary outcome was COVID-19. All participants were prospectively followed from the date of their third vaccine dose until either a COVID-19 diagnosis or the end of the study (18 May 2022), whichever occurred first. Monthly follow-ups were conducted via phone calls or messages to inquire about participants’ COVID-19 status, especially during the fifth COVID-19 outbreak in Hong Kong in early 2022,16 when face-to-face meetings were not recommended. Participants were also instructed to notify the study team if they tested positive. COVID-19 diagnosis was based on self-reported symptoms followed by a positive result on either a rapid antigen test or a deep throat saliva reverse transcription polymerase chain reaction test.
 
Model development
This was a binary classification task using supervised learning algorithms, aiming to predict COVID-19 status after three vaccine doses. Predicted outcomes were labelled as ‘0’ (negative) or ‘1’ (positive). The dataset was randomly divided into training and testing sets in a 6:4 ratio.
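For illustration only, the sketch below shows one common way to perform such a 6:4 split with the caret package used in this study; the object names ('dat' for the full dataset, 'covid' for the binary outcome) and the random seed are assumptions, not the study's actual code.

```r
## Minimal sketch of a 6:4 random split with caret (assumed object names)
library(caret)

set.seed(123)                                        # arbitrary seed for reproducibility
# createDataPartition() samples within outcome classes, keeping a similar case ratio in both sets
train_idx <- createDataPartition(as.factor(dat$covid), p = 0.6, list = FALSE)
train_set <- dat[train_idx, ]                        # ~60% for training
test_set  <- dat[-train_idx, ]                       # ~40% for testing
```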
 
Data preprocessing included three steps: missing data imputation, feature engineering, and data transformation. First, variables with more than 20% missing data were dropped because high levels of missingness can hinder the accuracy and reliability of imputation methods.17 18 Remaining missing values were imputed using the MICE (Multivariate Imputation by Chained Equations) package in R software (version 4.2.1, R Foundation for Statistical Computing, Vienna, Austria). Second, new features were extracted from existing variables (ie, transforming numerical variables into categorical groups and combining similar variables). Third, continuous variables were standardised through centring and scaling, whereas categorical variables were processed using one-hot encoding to ensure data compatibility for different ML algorithms.
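A condensed, illustrative sketch of these preprocessing steps with the MICE and caret packages named above is shown below; the column and object names are assumed, and the feature-engineering step (regrouping and combining variables) is omitted for brevity.

```r
## Sketch of the preprocessing pipeline (assumed object names)
library(mice)
library(caret)

# Step 1: impute remaining missing values by chained equations
# (variables with >20% missing data are assumed to have been dropped already)
imp       <- mice(train_set, m = 5, seed = 123, printFlag = FALSE)
train_imp <- complete(imp)                           # one completed dataset

# Step 3a: centre and scale continuous predictors (outcome excluded)
pred_cols <- setdiff(names(train_imp), "covid")
pp        <- preProcess(train_imp[, pred_cols], method = c("center", "scale"))
train_std <- predict(pp, train_imp[, pred_cols])
train_std$covid <- train_imp$covid                   # re-attach the outcome

# Step 3b: one-hot encode categorical predictors (outcome excluded by the formula)
dmy     <- dummyVars(covid ~ ., data = train_std)
x_train <- predict(dmy, newdata = train_std)
```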
 
Feature selection involved correlation analysis between variables and the dependent variable, the Boruta package in R,19 literature review, and expert consultation. A total of 37 variables were selected and ranked based on their overall importance using the aforementioned methods. Male sex, age ≥60 years, hepatitis B virus surface antigen positivity, diabetes/prediabetes, and recent medication use (antibiotics, proton pump inhibitors, probiotics/prebiotics, metformin, statins) were regarded as categorical variables (online supplementary Table 1).
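As a hedged illustration of the Boruta step, the sketch below (with assumed object names) runs the wrapper around a random-forest importance measure and extracts the decisions and importance statistics that can feed into the ranking described above.

```r
## Sketch of Boruta feature selection (assumed data frame 'train_std' with outcome 'covid')
library(Boruta)

set.seed(123)
boruta_res <- Boruta(as.factor(covid) ~ ., data = train_std, doTrace = 0)

print(boruta_res)                                    # confirmed / tentative / rejected variables
attStats(boruta_res)                                 # per-variable importance statistics
getSelectedAttributes(boruta_res, withTentative = TRUE)
```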
 
Six frequently used supervised ML models were selected: logistic regression, linear discriminant analysis, random forest, naïve Bayes, NN, and extreme gradient boosting (XGBoost) [online supplementary Table 2]. Because the dataset was imbalanced, with relatively few COVID-19 cases, multiple models were explored to assess different strategies for handling class imbalance. Hyperparameter tuning was performed using the caret package in R with grid search (three candidate values per hyperparameter, ie, up to 3^p combinations, where p represents the number of hyperparameters) and three-fold cross-validation. The dataset was divided into three equal subsets: the model was trained on two subsets and validated on the third, and the process was repeated three times, with the validation subset rotated each time. Hyperparameters yielding the highest area under the receiver operating characteristic curve (AUC) on the validation set were selected. A loop function was implemented to iteratively train the model while removing a single variable from the end of the ranked list of variables. By evaluating model performance with different variable combinations, we identified the most predictive variables.
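The sketch below illustrates one way to implement this tuning and backward variable-removal loop with caret, shown only for the NN (nnet) model; the ranked variable list, object names, and seed are assumptions for illustration and do not reproduce the study's exact code.

```r
## Sketch of hyperparameter tuning with 3-fold cross-validation and iterative variable removal
library(caret)

ctrl <- trainControl(method = "cv", number = 3,
                     classProbs = TRUE, summaryFunction = twoClassSummary)

outcome     <- factor(train_std$covid, levels = c(0, 1), labels = c("neg", "pos"))
ranked_vars <- c("cap", "alt", "age60", "dm_prediabetes", "ast")   # illustrative ranking only

set.seed(123)
aucs <- sapply(seq_along(ranked_vars), function(k) {
  vars <- ranked_vars[1:(length(ranked_vars) - k + 1)]             # drop one variable from the tail each iteration
  fit  <- train(x = train_std[, vars, drop = FALSE],               # predictors assumed already numeric (standardised/encoded)
                y = outcome,
                method = "nnet", metric = "ROC",
                trControl = ctrl, tuneLength = 3, trace = FALSE)   # tuneLength = 3 candidate values per hyperparameter
  max(fit$results$ROC)                                             # best cross-validated AUC for this variable set
})
```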
 
A sensitivity analysis was conducted by excluding variables not routinely available in clinical practice (eg, CAP and liver stiffness).
 
Evaluation and comparison of model performance
To compare the performance of the ML models, we calculated AUCs and used DeLong’s test to assess the statistical significance of differences between AUCs. We estimated the best cut-off point for each model using the Youden index, selecting the threshold that maximised the sum of sensitivity and specificity. Using these cut-off points, we calculated performance metrics including sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), positive likelihood ratio (PLR), and negative likelihood ratio (NLR) to identify the best model. We also compared the miss rate (false negative rate) across models. Given the imbalanced nature of the dataset, precision-recall curves and F1 scores were also used. Higher F1 scores indicate a better balance between precision and recall.
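A minimal sketch of this evaluation with the pROC package is shown below; 'fit_nn', 'fit_xgb', and 'test_set' are assumed object names carried over from the training step, not the study's actual code.

```r
## Sketch of model evaluation on the testing set (assumed object names)
library(pROC)

# test_set is assumed to have gone through the same preprocessing as the training data
prob_nn  <- predict(fit_nn,  newdata = test_set, type = "prob")[, "pos"]
prob_xgb <- predict(fit_xgb, newdata = test_set, type = "prob")[, "pos"]

roc_nn  <- roc(test_set$covid, prob_nn)
roc_xgb <- roc(test_set$covid, prob_xgb)

auc(roc_nn); ci.auc(roc_nn)                          # AUC with 95% CI
roc.test(roc_nn, roc_xgb, method = "delong")         # DeLong's test for paired ROC curves

# Youden-optimal threshold with sensitivity, specificity, PPV, and NPV
coords(roc_nn, x = "best", best.method = "youden",
       ret = c("threshold", "sensitivity", "specificity", "ppv", "npv"))
```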
 
All statistical analyses were conducted using R, with packages such as caret, randomForest, naivebayes, nnet, xgboost, pROC, and SHAPforxgboost for model building, evaluation, and interpretation. The DTComPair package was used to compare performance metrics.20 Continuous variables were summarised as medians and interquartile ranges (IQRs), with comparisons performed using the Wilcoxon rank-sum test. Categorical variables were presented as counts and percentages, and compared using Pearson’s Chi squared test or Fisher’s exact test, applying Bonferroni correction for multiple comparisons. SHapley Additive exPlanations (SHAP) analysis was utilised to interpret complex models by generating SHAP values to determine feature impact.
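For the SHAP step, a hedged sketch using the SHAPforxgboost package is given below; the fitted XGBoost booster (eg, the finalModel of a caret fit) and the one-hot-encoded predictor matrix are assumed object names.

```r
## Sketch of SHAP interpretation for the XGBoost model (assumed object names)
library(SHAPforxgboost)

xgb_booster <- fit_xgb$finalModel                    # underlying xgb.Booster from the caret fit
x_mat       <- as.matrix(x_train)                    # one-hot-encoded predictor matrix

shap_vals <- shap.values(xgb_model = xgb_booster, X_train = x_mat)
shap_vals$mean_shap_score                            # mean absolute SHAP value per feature (feature ranking)

shap_long <- shap.prep(xgb_model = xgb_booster, X_train = x_mat)
shap.plot.summary(shap_long)                         # beeswarm summary plot of per-feature SHAP values
```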
 
Results
Patient characteristics
A total of 304 three-dose BNT162b2 recipients were identified between May and August 2021 (Fig 1a). The median age was 50.9 years (IQR=43.6-57.8), and 94 participants (30.9%) were men. Over a median follow-up of 2.6 months (IQR=1.8-3.1; up to 5.1 months), 27 participants (8.9%) tested positive for COVID-19. The dataset was randomly split into training and testing sets, comprising 184 (60.5%) and 120 (39.5%) participants, respectively. Table 1 summarises baseline characteristics, stratified by the outcome of interest (COVID-19 status) and by training and testing sets. Baseline characteristics prior to imputation are shown in online supplementary Table 3.
 

Figure 1. Machine learning model development. (a) Participant selection process. (b) Model development and validation on the testing set
 

Table 1. Baseline participant characteristics based on the status of coronavirus disease 2019 and train-test dataset (n=304)
 
The COVID-19-positive participants had worse medical conditions than those who tested negative. Specifically, they were older (with a higher proportion aged ≥60 years: 22.2% vs 16.2%), more often male (33.3% vs 30.7%), had greater liver fat content (median CAP: 249.0 dB/m vs 227.0 dB/m), and were more frequently diagnosed with prediabetes/diabetes (55.6% vs 38.3%). Both the training and testing sets had a comparable proportion of COVID-19–positive cases (8.3-9.2%). Most independent variables were similarly distributed between sets (P>0.05), although CAP differed significantly (Table 1).
 
Performance of different machine learning models
We trained six different ML algorithms on the training set to predict COVID-19. Model performance was evaluated using three-fold cross-validation (Fig 1b). For the testing set, performance metrics for each model are reported in Table 2 and AUCs are summarised in Figure 2. A comparison of AUCs between the training and testing sets for the six ML models is presented in online supplementary Figure 1. All models showed a slight decrease in AUC from the training to the testing set, indicating some degree of overfitting. Notably, the NN model did not exhibit a significant AUC reduction, suggesting it was less susceptible to overfitting than the other models.
 

Table 2. Performance metrics of different machine learning models
 
Of the six ML models evaluated, the NN algorithm performed best (AUC: 0.74, 95% CI=0.60-0.88), followed by XGBoost (AUC: 0.62, 95% CI=0.42-0.82) [Fig 2]. Using the optimal cut-off value estimated by the maximum Youden index, performance metrics are summarised in Table 2. The 2×2 confusion matrix tables, which summarise the numbers of true positives, true negatives, false positives, and false negatives for each model’s predictions, are shown in online supplementary Table 4. Multiple comparisons between the NN and other models in terms of performance metrics are presented in online supplementary Table 5.
 

Figure 2. Receiver operating characteristic curves of different machine learning models using the testing set
 
The NN and linear discriminant analysis models achieved the highest sensitivity, with values of 90% (95% CI=55%-100%) and 80% (95% CI=44%-97%), respectively. The random forest model had the best specificity (72%, 95% CI=62%-80%). The NN model also had the highest NPV (98%, 95% CI=92%-100%) and the best likelihood ratios (PLR: 2.15, 95% CI=1.59-2.91; NLR: 0.17, 95% CI=0.03-1.11) [Table 2]. It classified 45.8% of participants as high risk for COVID-19, with a miss rate (false negative rate) of 10% (Table 3). Precision-recall curves and F1 scores for all models are shown in online supplementary Figure 2, offering a more precise evaluation of model performance in the context of an imbalanced dataset. With a precision baseline of 0.092, the naïve Bayes and random forest models recorded areas under the precision-recall curve of around 0.10, reflecting modest discrimination ability under class imbalance. The NN model achieved an F1 score of 0.277, reflecting a better balance between precision and recall.
 

Table 3. Number and proportion of predicted positive cases of coronavirus disease 2019 and miss rates or false negative rates by different machine learning models (n=120)
 
Crucial risk factors associated with coronavirus disease 2019 in the neural network model
According to the best-performing model (the NN model), the five most important predictors of COVID-19 risk were CAP, alanine aminotransferase level, age (≥60 years), presence of prediabetes/diabetes, and aspartate aminotransferase (AST) level, with relative importance values of 14.9%, 10.1%, 9.4%, 8.4%, and 7.9%, respectively (Fig 3). These findings were further supported by SHAP analysis, a model-interpretation method compatible with ensemble algorithms (eg, XGBoost) that quantifies the contribution of each input variable to the model’s prediction. When SHAP analysis was applied to the second best-performing model (XGBoost), the leading variables remained similar to those in the NN model, except for BMI, which ranked highest in importance (with a mean absolute SHAP value of 0.992) in the XGBoost model (online supplementary Fig 3). The SHAP analysis in online supplementary Figure 3b also provided deeper insights into the contribution of each variable to the model’s prediction. Among the leading variables in the XGBoost model, higher CAP (red dots), lower BMI (blue dots), and age ≥60 years (red dots) had a positive impact (right side of the plot) on COVID-19 prediction. For high-density lipoprotein (HDL) and AST levels, the SHAP plot showed a wide distribution with mixed colours, suggesting that these variables had diverse impacts on COVID-19 prediction.
 

Figure 3. Relative importance of risk factors in predicting the risk of coronavirus disease 2019 by the neural network model
 
Sensitivity analysis excluding non-routine clinical variables
Excluding CAP and liver stiffness, XGBoost achieved the best performance (AUC: 0.66, 95% CI=0.50-0.82), followed by naïve Bayes, logistic regression, linear discriminant analysis, random forest, and NN models (AUCs: 0.49-0.63) [online supplementary Fig 4]. The top five predictors in the XGBoost model were BMI, alanine aminotransferase level, HDL level, HbA1c level, and age ≥60 years (online supplementary Fig 5). In the NN model, the top predictors were AST level, HbA1c level, HDL level, hepatitis B surface antigen positivity, and alanine aminotransferase level (online supplementary Fig 6).
 
Discussion
In this study involving three-dose BNT162b2 recipients, the NN model achieved satisfactory performance in predicting COVID-19 using baseline clinical data. The leading predictors identified were age ≥60 years, presence of prediabetes/diabetes, CAP, alanine aminotransferase level, and AST level, highlighting the need for vigilance among fully vaccinated individuals, especially those with concomitant co-morbidities.
 
Advanced age, prediabetes/diabetes, and abnormal liver condition (ie, high fatty liver content and abnormal liver function test results) were significant predictors of high infection risk, consistent with previous studies.21 22 23 24 25 A meta-analysis of 18 studies revealed a higher prevalence of diabetes (11.5%) among hospitalised COVID-19 patients21 compared to the general population (9.3%).26 Studies have found that the presence of preexisting diabetes or hyperglycaemia is associated with higher risks of severe illness, mortality, and complications in COVID-19 patients.22 23 This elevated risk is likely due to impaired immune function, chronic inflammation, and common cardiovascular and metabolic co-morbidities in diabetic patients.27 28 Individuals with liver diseases or abnormal liver function test results also exhibit higher risks of severe COVID-19 and complications.24 25
 
This study is among the few that have developed ML models to predict COVID-19 in recipients of three doses of BNT162b2. No prior studies have developed COVID-19 prognostic models with clear information on vaccination status, type, and number of doses. A study from Hong Kong11 showed that a timely third vaccine dose strongly protected against infection with the Omicron BA.2 variant, the dominant strain in Hong Kong during our study period. The effectiveness of vaccination against infection declined over time after two doses but was restored to a high level after a third dose, resulting in significantly lower risks of infection, hospitalisation, and severe illness than among those who received only two doses.2 3 By including only three-dose vaccinated patients in the development of ML models, the resulting models may be more accurate in predicting COVID-19 risk and severity among vaccinated individuals. This can be particularly important in settings where vaccination rates are high and breakthrough infections are a concern; it may help identify individuals with higher infection risk who could benefit from additional precautions or interventions.
 
Strengths and limitations
Our study offers practical value by enabling risk stratification, allowing healthcare providers to focus resources on higher-risk populations. It informs public health strategies by identifying factors associated with increased COVID-19 risk despite vaccination, guiding targeted campaigns and resource allocation. Additionally, an understanding of risk predictors in vaccinated individuals supports tailored booster strategies. The identification of key variables such as age, prediabetes/diabetes, and liver enzyme levels also encourages further research into underlying mechanisms and potential interventions.
 
However, this study had some limitations. First, the small sample size (~300 participants) may affect model performance and generalisability. The dataset size was constrained by specific inclusion criteria, but this represented the maximum size available for model training. We believe that selection of high-quality data maximises training efficacy. Second, we did not include gut microbiota data, which may be associated with COVID-19 vaccine immunogenicity.29 A focus on readily available clinical data facilitates practical and clinically relevant predictive models. Third, our dataset exhibited significant class imbalance, with only 8.9% of participants developing COVID-19 within 6 months. Because receiver operating characteristic curve analysis can give an overly optimistic assessment under class imbalance, we also used precision-recall curves and F1 scores for a more realistic evaluation. Fourth, although missing values for certain variables might introduce error into the prediction models, the small percentage of missing data and the use of multiple imputation likely had minimal impact on model accuracy. Fifth, COVID-19 cases were self-reported and confirmed by either rapid antigen or polymerase chain reaction tests. In Hong Kong, rapid antigen tests have a false negative rate of approximately 15% (sensitivity: 85%)30 but a high specificity of 99.93%,30 indicating very few false positives. Although some cases may have gone unreported or untested, we believe that the majority adhered to testing requirements as mandated by law. Additionally, we did not grade infection severity, and there were no hospitalised cases in our cohort, limiting our ability to predict hospitalisation outcomes. Sixth, the NN model, our best-performing model, is complex and has low interpretability. We used a variable importance plot to visualise and identify the most influential features, enhancing its practical application. It should be noted that the other models demonstrated suboptimal performance, with AUCs below 0.7. The NN model’s superior performance is likely due to its ability to capture complex patterns and interactions; simpler models struggled with the dataset’s complexity, class imbalance, non-linear relationships, and outliers. Finally, although this study offers insights into the use of advanced ML models to predict COVID-19 outcomes, its generalisability is limited. Overfitting remains a concern despite mitigation techniques (eg, regularisation, pruning, and ensemble methods), and the complexity of both our models and the dataset further limits generalisability. Variability in vaccines, booster intervals, doses, demographics, and study design may also affect how well the model transfers to other settings. Future studies should include diverse populations and vaccine types to enhance applicability, and external validation of our results in other centres is warranted.
 
Conclusion
The NN model is a useful tool for identifying individuals at low risk of COVID-19 within 6 months after receiving three doses of BNT162b2. Key features selected by the model highlight the central role of metabolic risk factors (prediabetes/diabetes, non-alcoholic fatty liver disease, and steatohepatitis) in vaccine immunogenicity.
 
Author contributions
Concept or design: JT Tan, KS Cheung.
Acquisition of data: JT Tan, R Zhang, KH Chan.
Analysis or interpretation of data: JT Tan, KS Cheung.
Drafting of the manuscript: JT Tan.
Critical revision of the manuscript for important intellectual content: KS Cheung, IFN Hung.
 
All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
 
Conflicts of interest
All authors have disclosed no conflicts of interest.
 
Funding/support
This research was funded by the Health and Medical Research Fund of the former Food and Health Bureau, Hong Kong SAR Government (Ref No.: COVID1903010, Project 16). The funder had no role in the study design, data collection/analysis/interpretation, or manuscript preparation.
 
Ethics approval
The research was approved by the Institutional Review Board of The University of Hong Kong/Hospital Authority Hong Kong West Cluster, Hong Kong (Ref No.: UW 21-216). Participants provided written informed consent to participate in this study.
 
Supplementary material
The supplementary material was provided by the authors and some information may not have been peer reviewed. Accepted supplementary material will be published as submitted by the authors, without any editing or formatting. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by the Hong Kong Academy of Medicine and the Hong Kong Medical Association. The Hong Kong Academy of Medicine and the Hong Kong Medical Association disclaim all liability and responsibility arising from any reliance placed on the content.
 
References
1. World Health Organization. WHO Coronavirus (COVID-19) Dashboard. 2023. Available from: https://data.who.int/dashboards/covid19/cases. Accessed 4 May 2023.
2. Andrews N, Stowe J, Kirsebom F, et al. Effectiveness of COVID-19 booster vaccines against COVID-19–related symptoms, hospitalization and death in England. Nat Med 2022;28:831-37. Crossref
3. Andrews N, Stowe J, Kirsebom F, et al. COVID-19 vaccine effectiveness against the Omicron (B.1.1.529) variant. N Engl J Med 2022;386:1532-46. Crossref
4. Peng Q, Zhou R, Wang Y, et al. Waning immune responses against SARS-CoV-2 variants of concern among vaccinees in Hong Kong. EBioMedicine 2022;77:103904. Crossref
5. Willette AA, Willette SA, Wang Q, et al. Using machine learning to predict COVID-19 infection and severity risk among 4510 aged adults: a UK Biobank cohort study. Sci Rep 2022;12:7736. Crossref
6. Wynants L, Van Calster B, Collins GS, et al. Prediction models for diagnosis and prognosis of COVID-19: systematic review and critical appraisal. BMJ 2020;369:m1328. Crossref
7. Subudhi S, Verma A, Patel AB, et al. Comparing machine learning algorithms for predicting ICU admission and mortality in COVID-19. NPJ Digit Med 2021;4:87. Crossref
8. Brinati D, Campagner A, Ferrari D, Locatelli M, Banfi G, Cabitza F. Detection of COVID-19 infection from routine blood exams with machine learning: a feasibility study. J Med Syst 2020;44:135. Crossref
9. Yao H, Zhang N, Zhang R, et al. Severity detection for the coronavirus disease 2019 (COVID-19) patients using a machine learning model based on the blood and urine tests. Front Cell Dev Biol 2020;8:683. Crossref
10. Baden LR, El Sahly HM, Essink B, et al. Efficacy and safety of the mRNA-1273 SARS-CoV-2 vaccine. N Engl J Med 2021;384:403-16. Crossref
11. Zhou R, Liu N, Li X, et al. Three-dose vaccination–induced immune responses protect against SARS-CoV-2 Omicron BA.2: a population-based study in Hong Kong. Lancet Reg Health West Pac 2023;32:100660. Crossref
12. Cheung KS, Lam LK, Zhang R, et al. Association between recent usage of antibiotics and immunogenicity within six months after COVID-19 vaccination. Vaccines (Basel) 2022;10:1122. Crossref
13. Cheung KS, Mok CH, Mao X, et al. COVID-19 vaccine immunogenicity among chronic liver disease patients and liver transplant recipients: a meta-analysis. Clin Mol Hepatol 2022;28:890-911. Crossref
14. Cheung KS, Lam LK, Hui RW, et al. Effect of moderate-to-severe hepatic steatosis on neutralising antibody response among BNT162b2 and CoronaVac recipients. Clin Mol Hepatol 2022;28:553-64. Crossref
15. Cheung KS, Lam LK, Mao X, et al. Effect of moderate to severe hepatic steatosis on vaccine immunogenicity against wild-type and mutant virus and COVID-19 infection among BNT162b2 recipients. Vaccines (Basel) 2023;11:497. Crossref
16. Cheung PH, Chan CP, Jin DY. Lessons learned from the fifth wave of COVID-19 in Hong Kong in early 2022. Emerg Microbes Infect 2022;11:1072-8. Crossref
17. Little RJ, Rubin DB. Statistical Analysis with Missing Data, 3rd edition. New York [NY]: John Wiley & Sons; 2019. Crossref
18. Dong Y, Peng CY. Principled missing data methods for researchers. Springerplus 2013;2:222. Crossref
19. Kursa MB, Rudnicki WR. Feature selection with the Boruta package. J Stat Softw 2010;36:1-13. Crossref
20. Kitcharanant N, Chotiyarnwong P, Tanphiriyakun T, et al. Development and internal validation of a machine-learning–developed model for predicting 1-year mortality after fragility hip fracture. BMC Geriatr 2022;22:451. Crossref
21. Singh AK, Gillies CL, Singh R, et al. Prevalence of co-morbidities and their association with mortality in patients with COVID-19: a systematic review and meta-analysis. Diabetes Obes Metab 2020;22:1915-24. Crossref
22. Zhu L, She ZG, Cheng X, et al. Association of blood glucose control and outcomes in patients with COVID-19 and pre-existing type 2 diabetes. Cell Metab 2020;31:1068-77.e3. Crossref
23. Yang JK, Feng Y, Yuan MY, et al. Plasma glucose levels and diabetes are independent predictors for mortality and morbidity in patients with SARS. Diabet Med 2006;23:623-8. Crossref
24. Singh S, Khan A. Clinical characteristics and outcomes of coronavirus disease 2019 among patients with preexisting liver disease in the United States: a multicenter research network study. Gastroenterology 2020;159:768-771.e3. Crossref
25. Simon TG, Hagström H, Sharma R, et al. Risk of severe COVID-19 and mortality in patients with established chronic liver disease: a nationwide matched cohort study. BMC Gastroenterol 2021;21:439. Crossref
26. Saeedi P, Petersohn I, Salpea P, et al. Global and regional diabetes prevalence estimates for 2019 and projections for 2030 and 2045: results from the International Diabetes Federation Diabetes Atlas, 9th edition. Diabetes Res Clin Pract 2019;157:107843. Crossref
27. Pal R, Bhadada SK. COVID-19 and diabetes mellitus: an unholy interaction of two pandemics. Diabetes Metab Syndr 2020;14:513-7. Crossref
28. Azar WS, Njeim R, Fares AH, et al. COVID-19 and diabetes mellitus: how one pandemic worsens the other. Rev Endocr Metab Disord 2020;21:451-63. Crossref
29. Ng HY, Leung WK, Cheung KS. Association between gut microbiota and SARS-CoV-2 infection and vaccine immunogenicity. Microorganisms 2023;11:452. Crossref
30. Zee JS, Chan CT, Leung AC, et al. Rapid antigen test during a COVID-19 outbreak in a private hospital in Hong Kong. Hong Kong Med J 2022;28:300-5. Crossref