Efficacy, toxicities, and prognostic factors of stereotactic body radiotherapy for unresectable liver metastases

Hong Kong Med J 2023 Apr;29(2):105–11 | Epub 30 Mar 2023
© Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
 
ORIGINAL ARTICLE  CME
Calvin KK Choi, FHKCR, FHKAM (Radiology); Connie HM Ho, FHKCR, FHKAM (Radiology); Matthew YP Wong, MSc; Ronnie WK Leung, MSc; Frank CS Wong, FHKCR, FHKAM (Radiology); Stewart Y Tung, FHKCR, FHKAM (Radiology); Francis AS Lee, FHKCR, FHKAM (Radiology)
Department of Clinical Oncology, Tuen Mun Hospital, Hong Kong SAR, China
 
Corresponding author: Dr Calvin KK Choi (calvinkkchoi@hkbh.org.hk)
 
 
Abstract
Introduction: This study aims to determine the outcomes of stereotactic body radiotherapy (SBRT) for liver metastases in patients not eligible for surgery.
 
Methods: This study included 31 consecutive patients with unresectable liver metastases who received SBRT between January 2012 and December 2017; 22 patients had primary colorectal cancer and nine patients had primary non-colorectal cancer. Treatments ranged from 24 Gy to 48 Gy in 3 to 6 fractions over 1 to 2 weeks. Survival, response rates, toxicities, clinical characteristics, and dosimetric parameters were evaluated. Multivariate analysis was performed to identify significant prognostic factors for survival.
 
Results: Among these 31 patients, 65% had received at least one prior regimen of systemic therapy for metastatic disease, whereas 29% had received chemotherapy for disease progression or immediately after SBRT. The median follow-up interval was 18.9 months; actuarial in-field local control rates at 1, 2, and 3 years after SBRT were 94%, 55%, and 42%, respectively. The median survival duration was 32.9 months; 1-year, 2-year, and 3-year actuarial survival rates were 89.6%, 57.1%, and 46.2%, respectively. The median time to progression was 10.9 months. Stereotactic body radiotherapy was well-tolerated, with grade 1 toxicities of fatigue (19%) and nausea (10%). Patients who received post-SBRT chemotherapy had significantly longer overall survival (P=0.039 for all patients and P=0.001 for patients with primary colorectal cancer).
 
Conclusion: Stereotactic body radiotherapy can be safely administered to patients with unresectable liver metastases, and it may delay the need for chemotherapy. This treatment should be considered for selected patients with unresectable liver metastases.
 
 
New knowledge added by this study
  • Stereotactic body radiotherapy (SBRT) for unresectable liver metastases was effective and well-tolerated. It may delay the need for chemotherapy while prolonging progression-free survival.
  • The receipt of post-SBRT chemotherapy is a significant prognostic factor for survival.
Implications for clinical practice or policy
  • Stereotactic body radiotherapy can be regarded as an alternative to surgery for patients with liver metastases, particularly patients with unresectable tumours.
  • We recommend offering SBRT to patients with unresectable liver metastases if they have good performance status (ie, Eastern Cooperative Oncology Group 0-1), liver tumours ≤6 cm in diameter, three or fewer liver tumours, normal liver volume >700 cm3, adequate organ function, and adequate liver function (Child-Pugh class A).
 
 
Introduction
The liver is a common site of metastases, which most frequently originate from primary colorectal cancer via the portal circulation. Surgical resection is the standard treatment for medically and technically operable liver metastases, particularly from primary colorectal cancer. However, most patients are not eligible for surgery because of co-morbidities or unfavourable tumour factors. Most patients receive systemic therapy as initial treatment for liver metastases, but such treatment rarely leads to permanent elimination of the metastases; some form of local ablative intervention is required. For patients with unresectable limited liver metastases, numerous local therapeutic approaches are available, such as radiofrequency ablation, transcatheter arterial chemoembolisation, cryotherapy, and high-intensity focused ultrasound. However, all of these approaches exhibit a degree of invasiveness and are currently limited by tumour size (usually <3 cm), distance from critical structures, and distance from critical vasculature.1
 
In the past, radiotherapy has had a limited role in the management of liver metastases because of concerns regarding radiation-induced liver disease.2 3 Because the liver is subject to the parallel architecture principles of radiobiology, the risk of radiation-induced liver disease is generally proportional to the mean dose of radiation delivered to normal liver tissue. Therefore, small hepatic lesions can be safely treated with high doses of radiation via stereotactic body radiotherapy (SBRT). Advances in tumour imaging, radiotherapy planning and delivery, and motion management have facilitated the delivery of highly precise and four-dimensional SBRT. This non-invasive method can be used to deliver ablative treatments on an outpatient basis, thereby decreasing morbidity and cost.4
 
Ablative techniques offer a minimally invasive treatment option for selected patients with oligometastatic liver disease.5 There is increasing evidence to support the use of SBRT.6 To our knowledge, there is limited published information regarding the role of SBRT in the treatment of unresectable liver metastases in Hong Kong. In this study, we investigated the efficacy, toxicities, and prognostic factors of SBRT in patients with unresectable liver metastases.
 
Methods
Patient eligibility
Data regarding consecutive patients with unresectable liver metastases who received SBRT between January 2012 and December 2017 were retrospectively retrieved from the treatment database of the Department of Clinical Oncology at Tuen Mun Hospital. All patients with liver metastases were evaluated in multidisciplinary team meetings involving radiation oncologists and hepatobiliary surgeons. Eligibility was determined using the following criteria: (1) histologically confirmed malignancy (hepatic lesion biopsy not required); (2) biphasic computed tomography (CT) scan or positron emission tomography–CT of the liver within 4 weeks of radiation planning demonstrating liver tumours ≤6 cm in diameter, presence of three or fewer liver tumours, and normal liver volume >700 cm3; (3) discussion of the case in a multidisciplinary team meeting that included an opinion regarding the lack of qualification for radiofrequency ablation, along with a determination of non-resectability by a qualified hepatic surgeon; (4) patient refusal of surgical treatment; (5) Eastern Cooperative Oncology Group performance status 0 or 1; (6) adequate organ function (absolute neutrophil count ≥1.5×109/L; platelet count ≥75×109/L; creatinine level ≤1.5×upper limit of normal), liver function test results (aspartate aminotransferase and alanine aminotransferase levels ≤1.5×normal level), and Child-Pugh score of ≤6 (class A); (7) controlled extrahepatic disease and life expectancy >6 months; (8) no chemotherapy concurrent with radiotherapy (previous chemotherapy was not an exclusion criterion); and (9) previous treatment with radiofrequency ablation was not an exclusion criterion if recurrence had been confirmed.
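To make the screening logic concrete, the quantitative eligibility criteria above can be expressed as a simple check. This is an illustrative sketch only: the field names are hypothetical, judgement-based items (eg, the multidisciplinary team's determination of non-resectability) are reduced to booleans, and only a subset of the laboratory criteria is shown for brevity.

```python
def eligible(p: dict) -> bool:
    """Screen a patient record against the study's quantitative criteria.

    Field names are hypothetical; clinical-judgement criteria are booleans.
    """
    return (
        p["histology_confirmed"]              # criterion 1
        and p["max_tumour_diameter_cm"] <= 6  # criterion 2
        and p["num_liver_tumours"] <= 3
        and p["normal_liver_volume_cc"] > 700
        and p["mdt_deemed_unresectable"]      # criterion 3
        and p["ecog"] in (0, 1)               # criterion 5
        and p["anc_1e9_per_l"] >= 1.5         # criterion 6 (subset)
        and p["platelets_1e9_per_l"] >= 75
        and p["child_pugh_score"] <= 6        # class A
        and p["life_expectancy_months"] > 6   # criterion 7
        and not p["concurrent_chemotherapy"]  # criterion 8
    )

# Hypothetical patient record for illustration
patient = {
    "histology_confirmed": True, "max_tumour_diameter_cm": 2.8,
    "num_liver_tumours": 2, "normal_liver_volume_cc": 1100,
    "mdt_deemed_unresectable": True, "ecog": 1,
    "anc_1e9_per_l": 3.1, "platelets_1e9_per_l": 180,
    "child_pugh_score": 5, "life_expectancy_months": 12,
    "concurrent_chemotherapy": False,
}
print(eligible(patient))  # True
```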
 
Radiotherapy treatment
During four-dimensional CT scans, patients were positioned supine on an evacuated foam bag (Klarity Medical, China) with both arms abducted. The extent of tumour motion during respiration was used to determine whether treatment would be administered with free breathing plus abdominal compression or with active breathing control. The gross tumour volume (GTV) was determined using contrast CT and co-registered with positron emission tomography–CT. For patients who required optimal abdominal compression to mitigate organ motion, planning was conducted using the mid-ventilation–based planning target volume (PTV) approach, and the GTV was determined using intravenous contrast CT. The clinical target volume was 0 mm outside of the GTV within the liver (ie, equal to the GTV); it included the position of the tumour in all phases of respiration. The PTV was defined by adding either an isotropic margin of 3 to 5 mm to the clinical target volume, or margins of 7 to 10 mm in the cranial-caudal axis and 4 to 6 mm in the anterior-posterior and lateral axes. Four-dimensional cone-beam CT was performed before each treatment for all patients to correct for setup uncertainties. Tumour localisation was conducted using the diaphragm or whole liver as a surrogate for the tumour. A two-step four-dimensional registration approach was used to align the diaphragm/liver surrogate to its time-weighted mean position. The SBRT dose, ranging from 8 to 16 Gy × 3 fractions to 5 to 7.5 Gy × 6 fractions, was individualised according to the following normal tissue constraints: (1) maximum spinal cord dose <15 Gy; (2) ≥700 cm3 of liver should receive <15 Gy, and D5% <30 Gy; (3) maximum stomach point dose of 25 Gy; and (4) maximum duodenum point dose of 25 Gy.
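The four normal tissue constraints listed above lend themselves to an automated plan check. The sketch below is illustrative only: the metric names are hypothetical (in practice they would come from the treatment planning system's dose-volume histogram), and every limit is treated uniformly as an upper bound that must not be reached.

```python
# Normal tissue constraints from the protocol above, simplified to
# "delivered dose must be below the stated limit".
CONSTRAINTS = {
    "spinal_cord_max_gy": 15.0,     # maximum spinal cord dose <15 Gy
    "liver_spared_700cc_gy": 15.0,  # best-spared 700 cm3 of liver: <15 Gy
    "liver_d5pct_gy": 30.0,         # liver D5% <30 Gy
    "stomach_max_gy": 25.0,         # maximum stomach point dose 25 Gy
    "duodenum_max_gy": 25.0,        # maximum duodenum point dose 25 Gy
}

def plan_violations(plan: dict) -> list:
    """Return the names of violated constraints (empty list if plan passes).

    Missing metrics are treated as violations (dose assumed unbounded).
    """
    return [name for name, limit in CONSTRAINTS.items()
            if plan.get(name, float("inf")) >= limit]

# Hypothetical DVH-derived plan metrics for illustration
plan = {"spinal_cord_max_gy": 9.2, "liver_spared_700cc_gy": 13.1,
        "liver_d5pct_gy": 27.4, "stomach_max_gy": 18.0,
        "duodenum_max_gy": 21.5}
print(plan_violations(plan))  # []
```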
 
Evaluation
Patients were evaluated weekly during SBRT, immediately after completion of treatment, at 6 weeks after treatment, every 3 months for the first 2 years, and every 4 months thereafter. Physical examinations and blood tests were performed at each follow-up visit. Triphasic CT of the liver was conducted at 3 months after SBRT and then every 6 months until disease progression. Tumour response was assessed using modified response evaluation criteria for solid tumours.
 
The primary endpoint of the study was local control; secondary endpoints were overall survival and toxicity. Local control was defined as the absence of progressive disease within the PTV. The appearance of new lesions outside of the PTV was regarded as intrahepatic out-field failure. Overall survival was calculated from the start of SBRT until the end of follow-up or death.
 
Toxicity was graded using the National Cancer Institute Common Terminology Criteria for Adverse Events version 4.0. Toxicities were defined as adverse events that occurred <3 months after SBRT. Newly developed toxicities, or toxicities that progressed to at least one grade above baseline, were regarded as adverse events. Grade 5 liver failure related to SBRT was defined as death from liver failure within 6 months after SBRT, in the presence of acute grade 3 liver toxicities and without intrahepatic progression.
 
Statistical analysis
Data were analysed using SPSS software (Windows version 23.0; IBM Corp, Armonk [NY], United States). Fisher’s exact test and independent t tests were used for univariate analysis of patient, disease, and treatment factors associated with liver toxicity. Binary logistic regression analysis was used for univariate analysis of dose-volumetric parameters associated with liver toxicity. The Kaplan–Meier method was used for univariate analysis of overall survival; variables with P<0.25 in univariate analysis were entered into multivariate analysis, in which P<0.05 was considered significant. Cox regression was used for further evaluation of variables that were significant in univariate analysis of overall survival.7 8
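The survival estimates described above rely on the Kaplan–Meier product-limit method. A minimal sketch of the estimator is shown below with hypothetical follow-up data; the study's actual analysis was performed in SPSS.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up durations (eg, months); events: 1 = death, 0 = censored.
    Returns a list of (time, survival probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve = len(data), 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(e for tt, e in data if tt == t)   # events at time t
        n_at_t = sum(1 for tt, _ in data if tt == t)   # leaving risk set
        if deaths:
            surv *= (at_risk - deaths) / at_risk       # product-limit step
            curve.append((t, surv))
        at_risk -= n_at_t
        i += n_at_t
    return curve

# Hypothetical follow-up data (months) for illustration only
times  = [6, 10, 13, 13, 20, 25, 31, 34]
events = [1,  0,  1,  1,  0,  1,  0,  0]
print(kaplan_meier(times, events))
```

Each factor (at_risk − deaths)/at_risk is the conditional probability of surviving past an event time; censored observations only shrink the risk set.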
 
Results
Patients and treatment
During the study period, 31 consecutive patients with unresectable liver metastases underwent SBRT at our institution. Their characteristics are shown in Table 1. Colorectal cancer was the most common primary cancer. A total of 64.5% of patients received systemic treatment before SBRT; 71% of liver lesions were ≤30 mm. All patients received a fixed course of 3 or 6 fractions with a total prescribed dose range of 24-48 Gy. The mean GTV was 26.9 cm3 (range, 1.5-137) and the mean PTV was 91.8 cm3 (range, 21.7-269). The mean biological equivalent dose (BED10) to the GTV was 79.8 Gy (range, 43.2-124.8); the median BED10 to the GTV was 76.8 Gy. Surgical resection or radiofrequency ablation was performed in 32% of patients before SBRT. Targeted or non-targeted systemic chemotherapy was administered to 65% and 29% of patients before and after SBRT, respectively.
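The BED10 values above follow from the standard linear-quadratic model, BED = nd(1 + d/[α/β]), with α/β = 10 Gy for tumour. A short sketch reproduces the reported extremes of 43.2 Gy (24 Gy in 3 fractions) and 124.8 Gy (48 Gy in 3 fractions):

```python
def bed(n_fractions: int, dose_per_fraction: float,
        alpha_beta: float = 10.0) -> float:
    """Biologically effective dose under the linear-quadratic model:
    BED = n*d * (1 + d / (alpha/beta)); alpha/beta = 10 Gy gives BED10."""
    total_dose = n_fractions * dose_per_fraction
    return total_dose * (1 + dose_per_fraction / alpha_beta)

print(bed(3, 8.0))   # 24 Gy in 3 fractions -> 43.2 Gy (study minimum)
print(bed(3, 16.0))  # 48 Gy in 3 fractions -> 124.8 Gy (study maximum)
print(bed(6, 7.5))   # 45 Gy in 6 fractions -> 78.75 Gy
```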
 

Table 1. Patient characteristics
 
Toxicities
Stereotactic body radiotherapy was well-tolerated. There were no grade 2-4 toxicities. Most patients were asymptomatic (grade 0) during radiotherapy; 19% of patients had grade 1 fatigue, 10% of patients had grade 1 nausea, and 3% of patients had skin reaction. No patients exhibited a change in Child-Pugh class after SBRT, and no significant prognostic factors for liver toxicities were identified.
 
Local control, survival, and prognostic factors
The median follow-up interval was 18.9 months. The 1-year, 2-year, and 3-year local control rates were 94% (29/31), 55% (17/31), and 42% (13/31), respectively. Only two patients, both with primary colorectal cancer (9% of that subgroup), had in-field recurrence at 1 year after SBRT. Sixteen patients across all treatment groups had out-field recurrence at 1 year after SBRT. The median time to progression was 10.9 months.
 
The median survival duration in all treatment groups was 32.9 months. The 1-year, 2-year, and 3-year survival rates were 89.6%, 57.1%, and 46.2%, respectively. The only significant prognostic factor for overall survival was receipt of post-SBRT chemotherapy for disease progression (P=0.039). Figures 1 and 2 show the survival curves and prognostic factors for all treatment groups. Previous local treatment, RAS (rat sarcoma virus) status of colorectal cancer, number of liver metastases, extrahepatic metastasis status, BED to the liver, number of chemotherapy lines before or after SBRT, and carcinoembryonic antigen level after SBRT were not significant prognostic factors for overall survival. Table 2 summarises the factors that affected overall survival.
 

Figure 1. Overall survival of the whole group after stereotactic body radiotherapy (SBRT)
 

Figure 2. Overall survival of patients who received chemotherapy after stereotactic body radiotherapy (SBRT) for disease progression (PD) versus those who did not
 

Table 2. Prognostic factors affecting overall survival
 
The median survival duration in the colorectal cancer subgroup was 32.9 months. The only significant prognostic factor for overall survival was receipt of post-SBRT chemotherapy for disease progression (P=0.001). No other significant prognostic factors for overall survival were identified. Figures 3 and 4 show the survival curves and prognostic factors for the colorectal cancer subgroup.
 

Figure 3. Overall survival of colorectal cancer patients after stereotactic body radiotherapy (SBRT)
 

Figure 4. Overall survival of colorectal cancer patients who received chemotherapy after stereotactic body radiotherapy (SBRT) for disease progression (PD) versus those who did not
 
Discussion
Although surgical resection is the standard treatment for liver metastases, many patients are not eligible for such treatment. Multiple retrospective and prospective studies have demonstrated that SBRT is a promising, safe, and non-invasive alternative to surgery for unresectable liver metastases.9 10 To our knowledge, there is limited published information regarding the use of SBRT to treat liver metastases in Hong Kong. In the present study, we retrospectively collected data regarding consecutive patients who received SBRT for unresectable liver metastases after multidisciplinary team evaluation; we assessed outcomes in terms of safety, local control, and survival. Among the 31 patients treated with SBRT, the 1-year and 2-year local control rates were 93% and 55%, respectively. The median survival duration was 32.9 months; the 1-year and 2-year survival rates were 89.6% and 57.1%, respectively. In the colorectal cancer subgroup, the 1-year and 2-year survival rates were 84.7% and 62.1%, respectively.
 
Multiple retrospective and prospective studies have been performed regarding SBRT for liver metastases from colorectal cancers (Table 3).11 12 13 14 In the present study, local control rates and survival rates were comparable with findings in previous reports. Notably, McPartlin et al11 conducted a prospective study using SBRT doses of 22-62 Gy in 6 fractions. The present study, with SBRT doses of 24-48 Gy in 3-6 fractions, demonstrated better 1-year local control (93% vs 50%) and 2-year survival (62.1% vs 26%) than the study by McPartlin et al.11
 

Table 3. Summary of literature regarding stereotactic body radiotherapy for liver metastases from colorectal cancers
 
Three other SBRT trials12 13 14 (45-75 Gy in 3 fractions) all demonstrated better local control rates than the findings in the present study (Table 3). These results indicate that a higher local control rate is associated with a higher radiation dose. Compared with the present study, Scorsetti et al12 and Joo et al14 showed higher 2-year survival rates (65% and 75%, respectively, vs 62.1% in the present study), whereas Hoyer et al13 revealed a considerably lower 2-year survival rate (38%). These discrepant findings may be related to radiation dose: Scorsetti et al12 and Joo et al14 reported higher BED than that achieved by Hoyer et al13 and the present study. Among patients with primary colorectal tumours, the survival rate in the present study was comparable with rates in the previous studies.11 12 13 14 However, overall survival is dependent on many factors other than local control of irradiated liver metastases. Compared with earlier studies, overall survival is expected to be better in more recent studies because of stage migration, improvements in diagnostic techniques, and enhanced systemic treatment. Importantly, although the present study showed that post-SBRT chemotherapy was a prognostic factor for longer survival, selection bias may have been involved in the decision to administer chemotherapy to patients with better performance status.
 
In the present study, the incidence of toxicities was low, and there were no grade 2-4 toxicities. Among patients who received SBRT, only grade 1 toxicities were reported (fatigue, nausea, and skin reaction); these findings indicate that SBRT was well-tolerated.
 
Based on our results, we recommend that patients with unresectable liver metastases be evaluated in multidisciplinary team meetings; patients should be offered SBRT if they have good performance status (ie, Eastern Cooperative Oncology Group 0-1), liver tumours ≤6 cm in diameter, three or fewer liver tumours, normal liver volume >700 cm3, adequate organ function, and adequate liver function (Child-Pugh class A). Considering its minimal invasiveness and toxicity, as well as its potential for improving progression-free survival, SBRT should be regarded as an alternative to surgical resection of liver metastases for patients who refuse surgical treatment.
 
There were some limitations in the present study. First, the BED to the tumour was relatively low (a BED10 >100 Gy was administered to only 35.5% of patients), and the mean GTV was large (26.9 cm3). The local control rate may have been influenced by the lower total radiation dose administered and the larger tumour volume. Second, this was a retrospective study, and the sample size was small. Thus, a randomised controlled trial with a large number of patients is needed to determine whether SBRT can prolong overall survival in patients with liver metastases.
 
Conclusion
Stereotactic body radiotherapy can be safely administered to patients with unresectable liver metastases, and it may delay the need for chemotherapy. Considering its minimal invasiveness and toxicity, this treatment should be offered to selected patients with unresectable liver metastases; such an approach may improve progression-free survival. A phase III randomised study is needed to confirm these results.
 
Author contributions
All authors contributed to the concept or design of the study, acquisition of data, analysis or interpretation of data, drafting of the manuscript, and critical revision of the manuscript for important intellectual content. All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
 
Conflicts of interest
The authors declare no conflict of interest.
 
Acknowledgement
The authors thank Mr Jia-jie Huang from the Quality and Safety Division of the New Territories West Cluster, Hospital Authority, Hong Kong, for his support with the statistical analysis.
 
Funding/support
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
 
Ethics approval
This study was approved by the New Territories West Cluster Research Ethics Committee of the Hospital Authority, Hong Kong (Ref No.: NTWC/REC/20035). Informed consent was obtained from patients for stereotactic body radiotherapy.
 
References
1. Aitken KL, Hawkins MA. Stereotactic body radiotherapy for liver metastases. Clin Oncol (R Coll Radiol) 2015;27:307-15. Crossref
2. Schefter TE, Kavanagh BD, Timmerman RD, Cardenes HR, Baron A, Gaspar LE. A phase I trial of stereotactic body radiation therapy (SBRT) for liver metastases. Int J Radiat Oncol Biol Phys 2005;62:1371-8. Crossref
3. Rusthoven KE, Kavanagh BD, Cardenes H, et al. Multi-institutional phase I/II trial of stereotactic body radiation therapy for liver metastases. J Clin Oncol 2009;27:1572-8. Crossref
4. Pan CC, Kavanagh BD, Dawson LA, et al. Radiation-associated liver injury. Int J Radiat Oncol Biol Phys 2010;76(3 Suppl):S94-100. Crossref
5. Aloia TA, Vauthey JN, Loyer EM, et al. Solitary colorectal liver metastasis: resection determines outcome. Arch Surg 2006;141:460-6. Crossref
6. Høyer M, Swaminath A, Bydder S, et al. Radiotherapy for liver metastases: a review of evidence. Int J Radiat Oncol Biol Phys 2012;82:1047-57. Crossref
7. Prentice RL, Zhao S. Regression models and multivariate life tables. J Am Stat Assoc 2021;116:1330-45. Crossref
8. Hashemi R, Commenges D. Correction of the p-value after multiple tests in a Cox proportional hazard model. Lifetime Data Anal 2002;8:335-48. Crossref
9. Rusthoven CG, Lauro CF, Kavanagh BD, Schefter TE. Stereotactic body radiation therapy (SBRT) for liver metastases: a clinical review. Semin Colon Rectal Surg 2014;25:48-52. Crossref
10. Kobiela J, Spychalski P, Marvaso G, et al. Ablative stereotactic radiotherapy for oligometastatic colorectal cancer: systematic review. Crit Rev Oncol Hematol 2018;129:91-101. Crossref
11. McPartlin A, Swaminath A, Wang R, et al. Long-term outcomes of phase 1 and 2 studies of SBRT for hepatic colorectal metastases. Int J Radiat Oncol Biol Phys 2017;99:388-95. Crossref
12. Scorsetti M, Comito T, Tozzi A, et al. Final results of a phase II trial for stereotactic body radiation therapy for patients with inoperable liver metastases from colorectal cancer. J Cancer Res Clin Oncol 2015;141:543-53. Crossref
13. Hoyer M, Roed H, Traberg Hansen A, et al. Phase II study on stereotactic body radiotherapy of colorectal metastases. Acta Oncol 2006;45:823-30. Crossref
14. Joo JH, Park JH, Kim JC, et al. Local control outcomes using stereotactic body radiation therapy for liver metastases from colorectal cancer. Int J Radiat Oncol Biol Phys 2017;99:876-83. Crossref

Asthenopia prevalence and vision impairment severity among students attending online classes in low-income areas of western China during the COVID-19 pandemic

 
ORIGINAL ARTICLE (HEALTHCARE IN MAINLAND CHINA)
Y Ding, PhD1; H Guan, PhD1; K Du, PhD2; Y Zhang, PhD1; Z Wang, MD1; Y Shi, PhD1
1 Center for Experimental Economics for Education, Shaanxi Normal University, Xi’an, China
2 College of Economics, Xi’an University of Finance and Economics, Xi’an, China
 
Corresponding author: Dr H Guan (hongyuguan0621@gmail.com)
 
 
Abstract
Introduction: This study explored the impact of online learning during the coronavirus disease 2019 (COVID-19) pandemic on asthenopia and vision impairment in students, with the aim of establishing a theoretical basis for preventive approaches to vision health.
 
Methods: This balanced panel study enrolled students from western rural China. Participant information was collected before and during the COVID-19 pandemic via questionnaires administered at local vision care centres, along with clinical assessments of visual acuity. Paired t tests and fixed-effects models were used to analyse pandemic-related differences in visual status.
 
Results: In total, 128 students were included (mean age before pandemic, 11.82 ± 1.46 years). The mean total screen time was 3.22 ± 2.90 hours per day during the pandemic, whereas it was 1.97 ± 1.90 hours per day in the pre-pandemic period (P<0.001). Asthenopia prevalence was 55% (71/128) during the pandemic, and the mean visual acuity was 0.81 ± 0.30 logarithm of the minimum angle of resolution; these findings indicated increasing vision impairment, compared with the pre-pandemic period (both P<0.001). Notably, asthenopia prevalence increased by two- to three-fold, compared with the pre-pandemic period. An increase in screen time while learning was associated with an increase in asthenopia prevalence (P=0.034).
 
Conclusion: During the COVID-19 pandemic, students spent more time on online classes and reported a significant increase in screen time, which was associated with greater asthenopia prevalence and worse vision impairment. Further research is needed regarding the link between online classes and vision problems.
 
 
New knowledge added by this study
  • Online learning has become increasingly popular during the coronavirus disease 2019 pandemic. Students reported a nearly twofold increase in screen time during the pandemic, compared with the pre-pandemic period.
  • Students reported greater asthenopia prevalence and demonstrated worse vision impairment during the pandemic, compared with the pre-pandemic period.
  • Screen time was associated with asthenopia prevalence but not with the progression of vision impairment.
Implications for clinical practice or policy
  • Policymakers should carefully consider the prevalence of asthenopia and progression of vision impairment among students who are increasingly using digital devices and enrolling in online classes.
  • Policies regarding vision care should be implemented in response to the increasing use of online learning approaches.
 
 
Introduction
The World Health Organization announced that the coronavirus disease 2019 (COVID-19) outbreak had become an international public health emergency on 30 January 2020; on 11 March 2020, it declared that the outbreak had become a pandemic.1 Governments and public health authorities worldwide implemented public health policies to reduce the risk of viral transmission, including strict physical distancing, severe travel restrictions, and the closure of many businesses and schools. On 25 January 2020, China’s Central Government announced a nationwide travel ban and quarantine policy2; it initiated nationwide school closures as an emergency measure to prevent the spread of COVID-19.3 Thus, >220 million school-aged children and adolescents were confined to their homes; online classes were offered and delivered via the internet.4
 
Vision problems are public health challenges; among school-aged children, these problems often involve asthenopia and vision impairment. Asthenopia is defined as a subjective sensation of visual fatigue, eye weakness, or eyestrain; it can manifest through various symptoms, including epiphora, ocular pruritus, diplopia, eye pain, and dry eye.5 Vision impairment is defined as visual acuity (VA) of 6/12 or worse in either eye6; it is often caused by uncorrected refractive errors, and its estimated prevalence is 43%.7 Although both asthenopia and vision impairment have negative effects on students, the effects of vision impairment are greater. A previous global analysis revealed that vision impairment was present in 12.8 million children aged 5 to 15 years, half of whom lived in China.8 Moreover, students with vision impairment have lower scores on various motor and cognitive tests.9 10
 
Excessive use of digital devices contributes to increases in asthenopia prevalence and vision impairment among school-aged children.4 11 12 13 14 15 The COVID-19 pandemic has led to increased use of digital device–supported online classes,16 17 18 which require extended exposure to those devices.19 20 Importantly, long durations of exposure to digital devices can contribute to many vision problems in children.14
 
Asthenopia and vision impairment related to the excessive use of digital devices during the COVID-19 pandemic have been investigated in developed countries and urban China.4 11 12 To our knowledge, no similar studies have been conducted in western rural China. Additionally, online classes are increasingly implemented in rural areas, and the use of digital devices is becoming more prevalent11; thus, there is a need for research that focuses on vision health in students.
 
The primary purpose of this study was to assess screen time, asthenopia prevalence, and vision impairment progression during the COVID-19 pandemic among students in western rural China. To achieve this goal, we first conducted a general descriptive analysis of student characteristics and screen time trends before and during the pandemic. We then investigated the prevalence of asthenopia and progression of vision impairment. Finally, we explored factors influencing the prevalence of asthenopia and progression of vision impairment before and during the pandemic.
 
Methods
Setting
Because of limited resources, this study focused on areas broadly representative of rural western China; it was conducted in the Shaanxi and Ningxia regions. In 2019, the per capita gross domestic product in Shaanxi Province was US$10 167, similar to that in the Ningxia Autonomous Region (US$8236).21
 
Sample selection
Vision data were acquired from local vision care centres (VCs), which had been established by the Center for Experimental Economics in Education at Shaanxi Normal University, in cooperation with county-level organisations such as the local education ministries and hospitals.
 
Before the pandemic, VC screenings were performed in each county, except during summer and winter vacations. Staff conducted one to two screenings per week (covering two to four schools); they completed one round of screening in one town each month. In practice, approximately 1 year was needed to complete one round of vision screening for all eligible children in a particular county. The second and subsequent rounds of vision screening were performed using a similar workflow. After the completion of vision screening, students who required further assessment were referred to the VC for full eye and refractive examinations. This study included students who had visited the VC within the 3 months before the beginning of the COVID-19 pandemic.
 
During the pandemic, VC staff could not attend schools to perform vision screenings. To maintain vision screening services for students, we telephoned all students who had visited the VC before the pandemic. Participants in this panel study were students who participated in data collection before and during the COVID-19 pandemic.
 
Data collection
We conducted two cycles of surveys in the VC. The first survey cycle was conducted from October to December 2019 (before the pandemic); the second survey cycle was conducted among a group of students who visited the VC for follow-up from July to December 2020 (during the pandemic), based on their enrolment in the study before the pandemic. The same information was collected during the two survey cycles. During the vision screening process, VC staff administered questionnaires to students for collection of the following information: sex (male=1), age, ethnicity (Han=1), residence (non-rural=1), only-child status (yes=1), parental education (parents with ≥12 years of education=1), and parental migration status (one or both out-migrated=1; defined as one or both parents worked away from home during the semester). Household assets were calculated by summing the values of 13 items owned by the family, in accordance with the China Rural Household Survey Yearbook.22
 
The survey also included the collection of information regarding screen time and asthenopia. Students completed a previously described, self-administered questionnaire concerning mean time spent throughout the day on near activities (including computer and smartphone use, television viewing, and studying/homework after school). Reports of time spent on near activities during different parts of the day were categorised as screen time while learning and screen time while playing. Information regarding asthenopia was collected via three questions focused on ocular discomfort: whether the student had experienced dry eyes (yes=1), eye pain and swelling (yes=1), and eye fatigue and watery eyes (yes=1). Asthenopia was defined as the presence of at least one of these three types of vision health problems (yes=1).23 Furthermore, information regarding VA was collected when students visited the VC. The optometrist in the VC conducted a VA test to measure the clarity of each student’s vision. All students completed VA tests without refractive correction; students with spectacles completed VA tests with their routine method of vision correction.
 
The questionnaire regarding asthenopia was developed and reviewed by a group of health experts from Shaanxi Normal University and Zhongshan Ophthalmic Center, a well-known ophthalmology institution in China. The included questions were constructed to ensure that they could be clearly understood by students aged 9 to 17 years with the aid of trained VC staff. These three questions can serve as good indicators of symptoms representing different degrees of asthenopia in students, and they have been used in previous research.23
 
Visual acuity assessment
Visual acuity was assessed using Early Treatment Diabetic Retinopathy Study tumbling-E charts (Precision Vision, La Salle [IL], United States). In an indoor area with sufficient light, VA was separately assessed for each eye without refraction at a distance of 4 m. Students were first examined using a 6/60 line; if they correctly identified the orientation of at least four of five optotypes, they were examined using a 6/30 line, followed by a 6/15 line and a 6/3 line. In this manner, the VA for an eye was defined as the lowest line on which four of five optotypes were correctly identified. If the participant could not read the top line at a distance of 4 m, they were tested at a distance of 1 m, and the VA result was divided by 4.
 
In this study, VA levels were calculated and compared using the logarithm of the minimum angle of resolution (logMAR) scale, which is a linear scale with regular increments that offers a reasonably intuitive interpretation of VA measurement.24 In this study, vision impairment was defined as logMAR ≥0.3 (ie, VA of 6/12 or worse) in either eye.
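For illustration (this is standard optics, not code from the study), the Snellen-to-logMAR conversion and the study's impairment cut-off can be expressed as a short function: for a Snellen fraction 6/x, logMAR = log10(x/6).

```python
import math

def snellen_to_logmar(numerator: float, denominator: float) -> float:
    """Convert a Snellen fraction (e.g. 6/12) to logMAR.

    MAR (minimum angle of resolution) = denominator / numerator;
    logMAR is its base-10 logarithm.
    """
    return math.log10(denominator / numerator)

def has_vision_impairment(logmar: float, threshold: float = 0.3) -> bool:
    """Study definition: vision impairment if logMAR >= 0.3 (VA of 6/12 or worse)."""
    return logmar >= threshold

# 6/12 gives logMAR log10(2) ~= 0.301, i.e. just at the impairment threshold
print(round(snellen_to_logmar(6, 12), 2))               # -> 0.3
print(has_vision_impairment(snellen_to_logmar(6, 12)))  # -> True
print(has_vision_impairment(snellen_to_logmar(6, 6)))   # 6/6 -> logMAR 0 -> False
```

This makes the linearity of the scale concrete: each 0.1 logMAR step is one line on an ETDRS chart.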
 
Statistical methods
This balanced panel study compared student data between two periods (before and during the COVID-19 pandemic). Mean screen time, asthenopia prevalence, and vision impairment progression were compared among students using t tests, after stratification according to various demographic and behavioural factors. Fixed-effects logistic and regression models were used to explore factors influencing the prevalence of asthenopia and progression of vision impairment before and during the pandemic. Fixed-effects models were adjusted for sex, age, ethnicity, rural or non-rural residence, only-child status, parental migration status, parental education level, household assets, screen time while learning, and screen time while playing. All analyses were performed using Stata Statistical Software, version 14.1 (StataCorp, College Station [TX], United States). All tests were two-sided, and P values <0.05 were considered statistically significant.
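The before/during comparisons rest on paired (within-student) t tests; the test statistic can be sketched with the standard library alone, using illustrative numbers rather than the study's data.

```python
import math
from statistics import mean, stdev

def paired_t(before, during):
    """Paired t test: t statistic and degrees of freedom for within-person differences."""
    diffs = [d - b for b, d in zip(before, during)]
    n = len(diffs)
    se = stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return mean(diffs) / se, n - 1

# Illustrative screen-time values (hours/day) for 6 students, not the study's data
before = [1.5, 2.0, 1.0, 2.5, 2.0, 1.8]
during = [3.0, 3.5, 2.5, 3.8, 3.0, 3.5]
t, df = paired_t(before, during)  # -> t ~= 14.45, df = 5
```

With t well above the two-sided 5% critical value for df=5 (about 2.57), these illustrative data would be declared significant under the same rule the study applies.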
 
Results
This study included 128 students from western rural China (mean age before pandemic, 11.82 ± 1.46 years; mean age during pandemic, 12.32 ± 1.54 years; 80 girls [62.5%] and 48 boys [37.5%]). All participants had vision impairment and were attending online classes (Table 1).
 

Table 1. Screen time before and during the coronavirus disease 2019 pandemic, stratified according to student characteristics (n=128)
 
During the pandemic, screen time significantly increased because of enrolment in online classes. The mean total screen time during the pandemic was 3.22 hours per day, compared with 1.97 hours during the pre-pandemic period (P<0.001). The mean screen time while learning during the pandemic was 1.70 hours per day, compared with 0.90 hours during the pre-pandemic period (P<0.001); the mean screen time while playing during the pandemic was 1.52 hours per day, compared with 1.33 hours during the pre-pandemic period (P=0.019). Additionally, rural students had significantly greater screen time while learning during the pandemic, compared with the pre-pandemic period (P<0.001); there was no such difference among non-rural students (Table 1).
 
The prevalence of asthenopia and progression of vision impairment significantly differed between the pandemic and pre-pandemic periods. The prevalence of asthenopia during the pandemic was 55% (71/128), whereas it was 27% (35/128) during the pre-pandemic period (P<0.001). The mean logMAR VA was worse during the pandemic than during the pre-pandemic period (0.81 vs 0.65; P<0.001). The prevalence of asthenopia was higher during the pandemic than during the pre-pandemic period, regardless of the characteristics used to stratify participants. The mean logMAR VA was also worse during the pandemic across strata, although the difference was not statistically significant among participants with non-Han ethnicity and participants in the top quartile of household assets (Table 2).
 

Table 2. Asthenopia prevalence and visual acuity (in logarithm of the minimum angle of resolution [logMAR]) before and during the coronavirus disease 2019 pandemic, stratified according to student characteristics (n=128)
 
Fixed-effects logistic models revealed that screen time while learning was associated with asthenopia: the odds of asthenopia increased by 24.6% for each 1-hour increase in screen time while learning (95% confidence interval [CI]=1.02-1.53; P=0.034). Additionally, older age (odds ratio [OR]=2.073, 95% CI=1.13-3.81; P=0.019), Han ethnicity (OR=2.405, 95% CI=1.22-4.74; P=0.011), and only-child status (OR=0.488, 95% CI=0.21-1.13; P=0.095) were associated with asthenopia; screen time while playing was not (Table 3).
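For readers less familiar with odds ratios, the reported 24.6% figure corresponds to an implied per-hour OR of about 1.246, which lies within the reported 95% CI; a quick arithmetic check (values taken or derived from the text):

```python
# A 24.6% increase in odds per additional hour implies an odds ratio of 1.246,
# which falls inside the reported 95% CI of 1.02-1.53.
or_learning = 1.246
pct_increase = (or_learning - 1) * 100      # -> 24.6
within_ci = 1.02 <= or_learning <= 1.53     # -> True

# Under the model's linearity-in-log-odds assumption, odds scale multiplicatively:
odds_ratio_3h = or_learning ** 3            # ~= 1.93 for 3 additional hours
```

Note that an OR describes odds, not probability; the two diverge as the outcome becomes common (asthenopia prevalence reached 55% here).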
 

Table 3. Fixed-effects logistic analysis of factors associated with asthenopia before and during the coronavirus disease 2019 pandemic (n=128)
 
Fixed-effects regression models showed that residence in a non-rural area (coefficient=-0.200, 95% CI=-0.355 to -0.046; P=0.011) and only-child status (coefficient=-0.099, 95% CI=-0.197 to 0.000; P=0.049) were factors associated with logMAR VA; non-rural residence was associated with a logMAR VA that differed by 0.200 relative to rural residence. However, screen time while learning and screen time while playing were not associated with vision impairment (Table 4).
 

Table 4. Fixed-effects regression analysis of factors associated with visual acuity (in logarithm of the minimum angle of resolution [logMAR]) before and during the coronavirus disease 2019 pandemic (n=128)
 
Discussion
The global spread of the COVID-19 pandemic has affected the education of >1.5 billion children and adolescents worldwide.25 The participants in our study were representative of this important population. They demonstrated declines in VA and vision health during the pandemic, in relation to the excessive use of digital devices; these findings were consistent with the results of previous studies.19 26
 
All students in our study were attending online classes during the pandemic. We observed an increase in the mean daily time spent on digital devices between the pre-pandemic and pandemic periods; these results are consistent with international findings that screen time was greater during the pandemic than before the pandemic.19 Notably, we found that total screen time and screen time while learning significantly changed among rural students but not among non-rural students; these results are also consistent with previous findings.19 This difference presumably occurred because, compared with rural students, non-rural students were more likely to have used digital devices and attended online classes before the pandemic.
 
We observed a significant difference in asthenopia prevalence among students in low-income areas of western China before and during the pandemic; this finding supports the results of previous studies.26 27 Although the risk of asthenopia reportedly increases with screen time,28 there is no published literature concerning changes in asthenopia among students in relation to the COVID-19 pandemic. Similar to previous studies,14 we found that the prevalence of asthenopia was approximately twofold greater among students aged 13 to 17 years than among those aged 9 to 12 years. Furthermore, Moon et al26 reported that symptoms of dry eye diseases were more common among older children than among younger children. Older children spend more time using digital devices, leading to a higher prevalence of asthenopia.29
 
This study showed significant progression of vision impairment in relation to the pandemic; similarly, a study in eastern China revealed that students had worse vision during the pandemic, compared with their vision at pre-pandemic examinations.4 However, in our analysis, screen time was not associated with vision impairment among students. Furthermore, published evidence regarding the impact of digital device use on vision impairment has been inconsistent,30 31 with one study reporting that computer screen time worsened students' vision whereas television viewing had no effect. We speculate that the association will become clearer as school-aged children spend increasing amounts of time using these devices.
 
This study had three important limitations. First, the screen time data were retrospectively collected through a self-reporting mechanism, which may have led to recall bias. However, considering the resource and measurement limitations that researchers encountered during the pandemic, self-reported recall was regarded as the optimal method for collection of screen time data in the present study. Second, the selection of students with poor vision may have led to underestimation of screen time effects relative to the general population; thus, the results should be generalised with caution. Third, the study was not designed to accurately distinguish between vision impairment caused by intrinsic factors and vision impairment caused by pandemic-related eye strain.
 
Our findings provide new evidence regarding the effects of increased screen time on asthenopia and vision impairment among students in western rural China during the pandemic; they can also serve as a basis for future research. Although pandemic-related school closures are temporary, the increasing popularity of online classes may accelerate the overall acceptance of digital devices. The use of online learning approaches is associated with multiple vision problems, which merit attention in future studies.
 
Conclusion
The present study demonstrated that asthenopia and vision impairment among students in western rural China were also affected by the pandemic; these findings provide critical insights regarding the effects of the pandemic on vision health in rural students. Moreover, the findings highlight important issues related to childhood vision health during the pandemic; parents, teachers, and eye care providers should consider evidence-based measures to avoid asthenopia and vision impairment in children. The current pace of economic and technological development is leading to increased use of digital devices and online learning approaches, but vision problems in rural China have not received sufficient consideration. Thus, there is a critical need for greater efforts to monitor VA and vision health among students in this region.
 
Author contributions
Concept or design: All authors.
Acquisition of data: Y Ding, H Guan, K Du.
Analysis or interpretation of data: Y Ding, H Guan, K Du, Y Shi.
Drafting of the manuscript: Y Ding, Y Zhang, Z Wang.
Critical revision of the manuscript for important intellectual content: H Guan, Y Shi.
 
All authors contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
 
Conflicts of interest
As an International Editorial Advisory Board member of the journal, Y Shi was not involved in the peer review process. Other authors have disclosed no conflicts of interest.
 
Acknowledgement
We thank Dr Wenting Liu, Dr Jiaqi Zhu, and staff from the Center for Experimental Economics in Education of Shaanxi Normal University, China for their valuable contributions.
 
Funding/support
H Guan received funding for this study from the National Natural Science Foundation of China (Grant No.: 7180310) and Soft Science Project of Shaanxi Province (Grant No.: 2023-CX-RKX-127). Y Ding received funding for this study from the Fundamental Research Funds for the Central Universities (Grant No.: 2020CSWY018). This study was supported by the 111 Project (Grant No.: B16031). The funders had no role in designing the study, collecting, analysing or interpreting the data, or in drafting this manuscript.
 
Ethics approval
This study protocol was approved by Sun Yat-sen University, China (Registration No.: 2013MEKY018) and all procedures followed the principles of the Declaration of Helsinki. Permission was obtained from the local boards of education in the study area, as well as the principals of all participating schools. All participating children provided oral assent before baseline data collection, and legal guardians provided written informed consent for their children to be enrolled in the study.
 
References
1. World Health Organization. WHO timeline—COVID-19. 2020. Available from: https://www.who.int/news-room/detail/27-04-2020-who-timeline---covid-19. Accessed 13 Sep 2021.
2. Li D, Liu Z, Liu Q, et al. Estimating the efficacy of quarantine and traffic blockage for the epidemic caused by 2019-nCoV (COVID-19): a simulation analysis. medRxiv [Preprint]. 25 Feb 2020. Available from: https://doi.org/10.1101/2020.02.14.20022913. Accessed 13 Sep 2021. Crossref
3. Wang G, Zhang Y, Zhao J, Zhang J, Jiang F. Mitigate the effects of home confinement on children during the COVID-19 outbreak. Lancet 2020;395:945-7. Crossref
4. Wang J, Li Y, Musch DC, et al. Progression of myopia in school-aged children after COVID-19 home confinement. JAMA Ophthalmol 2021;139:293-300. Crossref
5. Kowalska M, Zejda JE, Bugajska J, Braczkowska B, Brozek G, Malińska M. Eye symptoms in office employees working at computer stations [in Polish]. Med Pr 2011;62:1-8.
6. Cumberland PM, Peckham CS, Rahi JS. Inferring myopia over the lifecourse from uncorrected distance visual acuity in childhood. Br J Ophthalmol 2007;91:151-3. Crossref
7. Pascolini D, Mariotti SP. Global estimates of visual impairment: 2010. Br J Ophthalmol 2012;96:614-8. Crossref
8. Resnikoff S, Pascolini D, Mariotti SP, Pokharel GP. Global magnitude of visual impairment caused by uncorrected refractive errors in 2004. Bull World Health Organ 2008;86:63-70. Crossref
9. Jan C, Li SM, Kang MT, et al. Association of visual acuity with educational outcomes: a prospective cohort study. Br J Ophthalmol 2019;103:1666-71. Crossref
10. Roch-Levecq AC, Brody BL, Thomas RG, Brown SI. Ametropia, preschoolers’ cognitive abilities, and effects of spectacle correction. Arch Ophthalmol 2008;126:252-8. Crossref
11. Zhang Z, Xu G, Gao J, et al. Effects of e-learning environment use on visual function of elementary and middle school students: a two-year assessment—experience from China. Int J Environ Res Public Health 2020;17:1560. Crossref
12. Wong CW, Tsai A, Jonas JB, et al. Digital screen time during COVID-19 pandemic: risk for a further myopia boom? Am J Ophthalmol 2021;223:333-7. Crossref
13. Kim J, Hwang Y, Kang S, et al. Association between exposure to smartphones and ocular health in adolescents. Ophthalmic Epidemiol 2016;23:269-76. Crossref
14. Mohan A, Sen P, Shah C, Jain E, Jain S. Prevalence and risk factor assessment of digital eye strain among children using online e-learning during the COVID-19 pandemic: digital eye strain among kids (DESK study-1). Indian J Ophthalmol 2021;69:140-4. Crossref
15. Guan H, Yu NN, Wang H, et al. Impact of various types of near work and time spent outdoors at different times of day on visual acuity and refractive error among Chinese school-going children. PLoS One 2019;14:e0215827. Crossref
16. Sultana A, Tasnim S, Hossain MM, Bhattacharya S, Purohit N. Digital screen time during the COVID-19 pandemic: a public health concern. Available from: https://f1000research.com/articles/10-81. Accessed 13 Sep 2021. Crossref
17. Nigg CR, Wunsch K, Nigg C, et al. Are physical activity, screen time, and mental health related during childhood, preadolescence, and adolescence? 11-year results from the German Motorik-Modul Longitudinal Study. Am J Epidemiol 2021;190:220-9. Crossref
18. Schmidt SC, Anedda B, Burchartz A, et al. Physical activity and screen time of children and adolescents before and during the COVID-19 lockdown in Germany: a natural experiment. Sci Rep 2020;10:21780. Crossref
19. Aguilar-Farias N, Toledo-Vargas M, Miranda-Marquez S, et al. Sociodemographic predictors of changes in physical activity, screen time, and sleep among toddlers and preschoolers in Chile during the COVID-19 pandemic. Int J Environ Res Public Health 2020;18:176. Crossref
20. Bates LC, Zieff G, Stanford K, et al. COVID-19 impact on behaviors across the 24-hour day in children and adolescents: physical activity, sedentary behavior, and sleep. Children (Basel) 2020;7:138. Crossref
21. National Bureau of Statistics of China, PRC Government. China Statistical Yearbook 2020. Available from: http://www.stats.gov.cn/tjsj/ndsj/2020/indexch.htm. Accessed 14 Sep 2021.
22. National Bureau of Statistics of China, PRC Government. China Statistical Yearbook 2013. Beijing, China: China State Statistical Press; 2013.
23. Seguí Mdel M, Cabrero García J, Crespo A, Verdú J, Ronda E. A reliable and valid questionnaire was developed to measure computer vision syndrome at the workplace. J Clin Epidemiol 2015;68:662-73. Crossref
24. Yi H, Zhang L, Ma X, et al. Poor vision among China’s rural primary school students: prevalence, correlates and consequences. China Econ Rev 2015;33:247-62. Crossref
25. United Nations International Children’s Emergency Fund. Don’t let children be the hidden victims of COVID-19 pandemic. Available from: https://www.unicef.org/press-releases/dont-let-children-be-hidden-victims-covid-19-pandemic. Accessed 6 Oct 2020.
26. Moon JH, Kim KW, Moon NJ. Smartphone use is a risk factor for pediatric dry eye disease according to region and age: a case control study. BMC Ophthalmol 2016;16:188. Crossref
27. Moon JH, Lee MY, Moon NJ. Association between video display terminal use and dry eye disease in school children. J Pediatr Ophthalmol Strabismus 2014;51:87-92. Crossref
28. Rechichi C, De Mojà G, Aragona P. Video game vision syndrome: a new clinical picture in children? J Pediatr Ophthalmol Strabismus 2017;54:346-55. Crossref
29. Mowatt L, Gordon C, Santosh AB, Jones T. Computer vision syndrome and ergonomic practices among undergraduate university students. Int J Clin Pract 2018;72:e13035. Crossref
30. Terasaki H, Yamashita T, Yoshihara N, Kii Y, Sakamoto T. Association of lifestyle and body structure to ocular axial length in Japanese elementary school children. BMC Ophthalmol 2017;17:123. Crossref
31. Fernández-Montero A, Olmo-Jimenez JM, Olmo N, et al. The impact of computer use in myopia progression: a cohort study in Spain. Prev Med 2015;71:67-71. Crossref

Artificial intelligence for detection of intracranial haemorrhage on head computed tomography scans: diagnostic accuracy in Hong Kong

© Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
 
ORIGINAL ARTICLE  CME
Artificial intelligence for detection of intracranial haemorrhage on head computed tomography scans: diagnostic accuracy in Hong Kong
Jill M Abrigo, MD, FRCR; Ka-long Ko, MPhil; Qianyun Chen, MSc; Billy MH Lai, MB, BS, FHKAM (Radiology); Tom CY Cheung, MB ChB, FHKAM (Radiology); Winnie CW Chu, MB ChB, FHKAM (Radiology); Simon CH Yu, MB, BS, FHKAM (Radiology)
Department of Imaging and Interventional Radiology, Prince of Wales Hospital, The Chinese University of Hong Kong, Hong Kong SAR, China
 
Corresponding author: Dr Jill M Abrigo (jillabrigo@cuhk.edu.hk)
 
 Full paper in PDF
 
Abstract
Introduction: The use of artificial intelligence (AI) to identify acute intracranial haemorrhage (ICH) on computed tomography (CT) scans may facilitate initial imaging interpretation in the accident and emergency department. However, AI model construction requires a large amount of annotated data for training, and validation with real-world data has been limited. We developed an algorithm using an open-access dataset of CT slices, then assessed its utility in clinical practice by validating its performance on CT scans from our institution.
 
Methods: Using a publicly available international dataset of >750 000 expert-labelled CT slices, we developed an AI model which determines ICH probability for each CT scan and nominates five potential ICH-positive CT slices for review. We validated the model using retrospective data from 1372 non-contrast head CT scans (84 [6.1%] with ICH) collected at our institution.
 
Results: The model achieved an area under the curve of 0.842 (95% confidence interval=0.791-0.894; P<0.001) for scan-based detection of ICH. A pre-specified probability threshold of ≥50% for the presence of ICH yielded 78.6% accuracy, 73% sensitivity, 79% specificity, 18.6% positive predictive value, and 97.8% negative predictive value. There were 62 true-positive scans and 22 false-negative scans, which could be reduced to six false-negative scans by manual review of model-nominated CT slices.
 
Conclusions: Our model exhibited good accuracy in the CT scan–based detection of ICH, considering the low prevalence of ICH in Hong Kong. Model refinement to allow direct localisation of ICH will facilitate the use of AI solutions in clinical practice.
 
 
New knowledge added by this study
  • A deep learning–based artificial intelligence model trained on an international dataset of computed tomography (CT) slices exhibited good accuracy in the detection of intracranial haemorrhage (ICH) on CT scans in Hong Kong.
  • Considering the 6% prevalence of ICH in our institution, and using a pre-specified probability threshold of ≥50%, the model detected 74% of ICH-positive scans; this outcome improved to 93% via manual review of model-nominated images.
Implications for clinical practice or policy
  • Considering the expected clinical applications, model refinement is needed to improve diagnostic performance prior to additional tests in a clinical setting.
  • Our model may facilitate assessment of CT scans by physicians with different degrees of experience in ICH detection, an important aspect of real-world clinical practice.
 
 
Introduction
Head computed tomography (CT) scans constitute the main imaging investigation during the evaluation of trauma and stroke; they are also important in the initial work-up of headache and other non-specific neurological complaints. In Prince of Wales Hospital of Hong Kong alone, >25 000 head CT scans were performed in 2019 during the clinical management of patients who presented to the Accident and Emergency Department. Computed tomography scans are composed of multiple cross-sectional images (ie, slices), which may be challenging to interpret. Typically, these scans are initially reviewed by frontline physicians prior to assessment by radiologists, and delays during the review process can be substantial. Thus, the timely recognition of an acute finding, such as intracranial haemorrhage (ICH), is limited by the competence and availability of frontline physicians.
 
The presence and location or type of ICH impacts the next clinical step, which can be further imaging investigations, medical management, or surgical intervention.1 Furthermore, a confirmation of ICH absence can also be useful in clinical management. For example, it can facilitate safe discharge from the hospital when appropriate; in patients with acute stroke, the absence of ICH is an important exclusion criterion that influences treatment selection.2
 
The use of artificial intelligence (AI) for ICH detection is a topic with global relevance considering its diagnostic impact and ability to optimise workflow, both of which have high practical value.3 4 In the accident and emergency department, AI can facilitate ICH detection in head CT scans during times when a radiologist is unavailable. Although there have been multiple reports of deep learning methods with high accuracy in the detection of ICH, the models in those reports were developed using in-house labelled training datasets and validated using a limited number of cases.3 5 6 7 8 Recently, the Radiological Society of North America (RSNA) publicly released >25 000 multi-centre head CT scans with slices that have been labelled with or without ICH by experts.9 Here, we developed a model using this RSNA dataset, then validated its performance on CT scans from our institution to determine its potential for clinical application in Hong Kong.
 
Methods
Ethical considerations
This study was approved by the Joint Chinese University of Hong Kong—New Territories East Cluster Clinical Research Ethics Committee (Ref No.: 2020.061). The model was developed from a publicly available dataset and validated on retrospectively acquired data from our institution. The requirement for patient consent was waived by the Committee given the retrospective design of the study and anonymisation of all CT scans prior to use.
 
The results of this diagnostic accuracy study are reported in accordance with the Standards for Reporting of Diagnostic Accuracy Studies guidelines.10
 
Public dataset: model development and internal validation
We acquired 25 312 head CT scans from four institutions in North and South America, available in the RSNA open dataset.11 The scans were split into slices (each slice ≥5 mm thick), which were then randomly shuffled and annotated by 60 volunteer experts from the American Society of Neuroradiology. Each CT slice was labelled to indicate the presence and type of ICH. When present, ICH was classified according to its location, namely, intraparenchymal haemorrhage (IPH), subarachnoid haemorrhage (SAH), subdural haemorrhage (SDH), epidural haemorrhage (EDH), and intraventricular haemorrhage (IVH). The RSNA dataset comprised 752 807 CT slices, which were divided into a training subset (85%) and a test subset (15%) for internal validation. Each subset consisted of approximately 86% ICH-negative slices and 14% ICH-positive slices, along with the following proportions of ICH subtypes: 4.8% IPH, 4.7%-4.8% SAH, 6.3% SDH, 0.4% EDH, and 3.4%-3.5% IVH.
 
The convolutional neural network (CNN) VGG (named after the Visual Geometry Group from the University of Oxford, United Kingdom) is an effective end-to-end algorithm for image detection.12 In this study, we adopted the VGG architecture with a customised output layer and loss function optimised for multi-label classification. To adjust for the low prevalence of ICH in the training set, each subtype’s logit outputs zi were concatenated as independent channels after a sigmoid output layer:
 
 
The performance of the CNN model was evaluated by binary cross-entropy loss and Sørensen–Dice loss13:
 
 
The loss functions were linearly combined with weighted values to produce the multi-label classification loss:
 
 
where w_i denotes the class prevalence weight, and α and β denote the respective loss mix ratios. For simplicity, w_i=1/(n-1) for all subtype classes and w_i=1 for 'ANY', which was treated as an independent ICH class.
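The three displayed equations do not survive in this text version. Based on the surrounding description, they plausibly take the following standard forms (a reconstruction under stated assumptions, not the authors' exact notation):

```latex
% Sigmoid output for each subtype logit z_i
p_i = \sigma(z_i) = \frac{1}{1 + e^{-z_i}}

% Binary cross-entropy and Sorensen-Dice losses over N slices
% (y denotes the expert label, p the predicted probability, \epsilon a smoothing term)
\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{N}\sum_{j=1}^{N}\left[ y_j \log p_j + (1 - y_j)\log(1 - p_j) \right]

\mathcal{L}_{\mathrm{Dice}} = 1 - \frac{2\sum_j y_j p_j + \epsilon}{\sum_j y_j + \sum_j p_j + \epsilon}

% Weighted linear combination over classes i (subtypes plus 'ANY'),
% with class prevalence weights w_i and mix ratios \alpha, \beta
\mathcal{L} = \sum_i w_i \left( \alpha\,\mathcal{L}_{\mathrm{BCE},i} + \beta\,\mathcal{L}_{\mathrm{Dice},i} \right)
```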
 
The model was trained with software written in our laboratory using the end-to-end open-source machine learning platform TensorFlow on an Nvidia Titan Xp graphics processing unit.
 
During internal validation (ie, slice-level performance for the detection of any type of ICH), the model achieved an area under the curve (AUC) of 0.912 (95% confidence interval [CI]=0.909-0.915) with sensitivity and specificity of 85% and 80%, respectively. Additionally, for the detection of specific types of ICH, the following AUC (95% CI) and sensitivity/specificity values were obtained: 0.860 (0.853-0.867) and 77%/88% for IPH, 0.835 (0.829-0.842) and 75%/82% for SAH, 0.850 (0.845-0.855) and 74%/83% for SDH, 0.813 (0.790-0.836) and 72%/80% for EDH, and 0.870 (0.861-0.879) and 79%/89% for IVH.
 
Prince of Wales Hospital dataset: external validation
The consecutive head CT scans of patients aged ≥18 years who underwent initial brain CT scans in the Accident and Emergency Department of Prince of Wales Hospital from 1 to 31 July 2019 were included, thereby simulating the point prevalence of ICH.
 
Head CT scans were acquired on a 64-slice CT scanner. Data analyses were conducted using reformatted 5-mm-thick slices, which can be accessed and viewed by physicians at all hospital workstations. DICOM (Digital Imaging and Communications in Medicine) images were de-identified prior to data analyses. Relevant CT scans were identified within this large data volume by an automated program that selected scans bearing specific DICOM tags. Computed tomography scans performed for follow-up purposes or after recent intracranial surgery, as well as scans without radiologist reports, were excluded from the analysis.
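Tag-based selection of this kind can be illustrated with a small sketch; here plain dicts stand in for parsed DICOM headers (eg, as read with pydicom), and the attribute values shown are illustrative assumptions rather than the study's actual criteria. Modality, BodyPartExamined, and ContrastBolusAgent are standard DICOM attributes.

```python
def is_candidate_scan(header: dict) -> bool:
    """Keep non-contrast head CT series; illustrative criteria only."""
    return (
        header.get("Modality") == "CT"
        and "HEAD" in header.get("BodyPartExamined", "").upper()
        and header.get("ContrastBolusAgent") in (None, "")  # non-contrast only
    )

# Stand-ins for de-identified DICOM headers
headers = [
    {"Modality": "CT", "BodyPartExamined": "HEAD", "ContrastBolusAgent": ""},
    {"Modality": "CT", "BodyPartExamined": "CHEST", "ContrastBolusAgent": ""},
    {"Modality": "MR", "BodyPartExamined": "HEAD"},
]
selected = [h for h in headers if is_candidate_scan(h)]  # keeps only the first header
```

In a real pipeline the same predicate would run over headers parsed from files, with the follow-up and post-surgical exclusions applied against the radiology information system rather than DICOM tags.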
 
We reviewed the corresponding radiology reports to determine the presence and type of ICH (IPH, SAH, SDH, EDH, or IVH). The CT scans were assessed by radiologists or senior radiology trainees; the corresponding reports were regarded as scan-level ground truth labels for analysis, consistent with their use as clinical reference standards in Hong Kong. Considering its rarity, EDH was grouped with and labelled as SDH, which has a similar appearance on CT. For scans with false-negative results, we performed post-hoc labelling of model-nominated CT slices. All scans were assessed prior to model construction; thus, the scan reports were established without knowledge of the AI results. Furthermore, all scans comprised the external validation dataset and constituted ‘unseen data’ for the model.
 
Statistical analysis
The diagnostic accuracies of the model for the detection of any type of ICH and each type of ICH were determined by calculation of the AUC with 95% CI, using DeLong et al’s method.14 To construct the confusion matrix during external validation, CT scans were classified as ICH-positive using a pre-specified probability threshold of ≥50%8; the corresponding sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy were calculated. Additional probability thresholds were established to achieve 90% sensitivity and 90% specificity for the presence of any type of ICH. Statistical analysis was performed using R software (version 4.0.2; R Foundation for Statistical Computing, Vienna, Austria), and the threshold for statistical significance was set at P<0.05.
 
Results
Model output
Figure 1 shows an example of the model output. The model report includes an overall probability for the presence of ICH (labelled ‘A’ in Fig 1). Additionally, the model selects five representative CT image slices which are likely to contain ICH (one such slice is labelled ‘B’ in Fig 1), along with the probability of each ICH type in each representative slice (labelled ‘C’ in Fig 1). All scans were successfully analysed by the model.
 

Figure 1. Sample model output, highlighting three types of information provided by the model. A: intracranial haemorrhage (ICH) probability; B: model-nominated image with possible ICH and corresponding slice number in the computed tomography (CT) scan; C: probability of each ICH type for the corresponding CT slice
 
Prince of Wales Hospital data and model validation
The Standards for Reporting of Diagnostic Accuracy Studies diagram and corresponding confusion matrix are shown in Figure 2. In total, 1372 head CT scans (84 [6.1%] with ICH) were included in the analysis. The distribution of ICH types is summarised in the Table.
 

Figure 2. Standards for the Reporting of Diagnostic Accuracy flowchart for external validation of the model using computed tomography (CT) scans from Prince of Wales Hospital. The confusion matrix is shown below the flowchart
 

Table. Distribution of computed tomography (CT) scans without and with intracranial haemorrhage in the Prince of Wales Hospital dataset (n=1372)
 
Diagnostic performance of scan-based detection for any type of intracranial haemorrhage
The model achieved an AUC of 0.842 (95% CI=0.791-0.894; P<0.001) for the identification of any type of ICH. Using a probability threshold of ≥50% for the presence of ICH, the accuracy, sensitivity, specificity, PPV, and NPV were 78.6%, 73%, 79%, 18.6%, and 97.8%, respectively. In total, 62 scans were true positive, 22 were false negative, 1017 were true negative, and 271 were false positive (Fig 2).
 
Among the 62 true-positive scans, the model output in two cases did not contain ICH-positive CT slices: 6-mm IPH in the pons (n=1) and trace SAH in a patient with multiple metastatic tumours (n=1). Figure 3 shows selected cases of model-nominated CT slices with subtle ICH.
 

Figure 3. Representative computed tomography slices from model outputs for selected true-positive scans showing small or subtle intracranial haemorrhage. Arrowheads have been added to indicate intracranial haemorrhage. (a) Haemorrhage within a cystic tumour; (b, d, and e) subarachnoid haemorrhage; (c) intraparenchymal haemorrhage; (f) subdural haemorrhage
 
Among the 22 false-negative scans, 19 had one type of ICH (6 IPH, 7 SAH, 5 SDH, and 1 IVH), two had two types of ICH (1 IPH+SAH and 1 SAH+SDH), and one had three types of ICH (IPH+SAH+IVH). In 16 scans, the model selected at least one ICH-positive CT slice which allowed correct reclassification (Fig 4). The remaining six scans with undetected ICH (Fig 5) comprised small midbrain IPH (n=1), trace SAH (n=3), and thin SDH/EDH (n=2). One of the three cases of undetected trace SAH was visualised on thin CT slices but not on thick CT slices.
 

Figure 4. Representative computed tomography slices from model outputs for false-negative scans showing intracranial haemorrhage. Arrowheads have been added to indicate intracranial haemorrhage. (a) Intraventricular haemorrhage; (b, c, f, g, h, and i) subarachnoid haemorrhage; (d, e, l, m, n, o, and p) intraparenchymal haemorrhage; (j and k) subdural haemorrhage
 

Figure 5. False-negative computed tomography scans with undetected intracranial haemorrhage. Arrowheads have been added to indicate intracranial haemorrhage. (a-e) Representative images of intracranial haemorrhage in thick computed tomography slices [(a) intraparenchymal haemorrhage; (b and d) subdural haemorrhage; (c and e) subarachnoid haemorrhage]. (f) Trace subarachnoid haemorrhage that was visible in reformatted coronal thin computed tomography slices but not thick computed tomography slices
 
A probability threshold of 20.4% yielded a sensitivity of 90% (40% specificity, 9% PPV, and 98.3% NPV), whereas a threshold of 65.7% yielded a specificity of 90% (64% sensitivity, 30% PPV, and 97.4% NPV), for the detection of ICH.
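Operating points such as these can be derived by sweeping the threshold over validation-set probabilities until a target sensitivity (or, analogously, specificity) is reached. A hypothetical sketch with toy data; the function and variable names are ours, not from the study:

```python
def threshold_for_sensitivity(scores, labels, target=0.90):
    """Highest probability threshold whose sensitivity meets the target.

    scores: per-scan ICH probabilities; labels: 1 if ICH is present.
    Lowering the threshold admits positive scans one by one, so we walk
    down the sorted positive scores until the target sensitivity is met.
    """
    positives = sorted((s for s, y in zip(scores, labels) if y == 1), reverse=True)
    for k, score in enumerate(positives, start=1):
        if k / len(positives) >= target:
            return score  # classify scans with probability >= score as positive
    return min(positives)

# Toy example: five ICH-positive and three ICH-negative scans
scores = [0.95, 0.90, 0.80, 0.70, 0.20, 0.60, 0.40, 0.10]
labels = [1, 1, 1, 1, 1, 0, 0, 0]
cutoff = threshold_for_sensitivity(scores, labels, target=0.90)
```

As with the 20.4% and 65.7% cut-offs, any such threshold trades sensitivity against specificity; the choice depends on the intended clinical role of the model.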
 
Diagnostic performance of scan-based detection for each type of intracranial haemorrhage
At a probability threshold of ≥50%, the following AUC (95% CI) and corresponding sensitivity/specificity were obtained for each type of ICH: 0.930 (0.892-0.968) and 4%/100% for IPH, 0.766 (0.684-0.849) and 12%/96% for SAH, 0.865 (0.783-0.947) and 75%/90% for SDH/EDH, and 0.935 (0.852-1.000) and 85%/93% for IVH.
 
Discussion
In this study, we used a large international training dataset to construct a model for ICH detection, then conducted external validation using data from Hong Kong. To overcome the discrepancy between the training dataset (composed of CT slices) and the validation dataset (composed of CT scans), and considering our goal of clinical application, we designed a model that iteratively conducts assessments at the slice level to generate an overall probability at the scan level, then nominates the slices with the highest ICH probability for clinician evaluation. Furthermore, we performed validation using a point-prevalence approach to determine the diagnostic performance of the model in a real-world setting. Considering the 6% prevalence of ICH in our institution, and using a pre-specified probability threshold of ≥50%, the model detected 74% of ICH-positive scans; this outcome improved to 93% via manual review of model-nominated images.
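The influence of disease prevalence on the classification outcomes can be made concrete with Bayes' rule. The sketch below uses approximate values at the 50% threshold from our validation and is illustrative only:

```python
def predictive_values(prevalence, sensitivity, specificity):
    """PPV and NPV implied by prevalence, sensitivity, and specificity."""
    tp = prevalence * sensitivity              # true-positive fraction
    fp = (1 - prevalence) * (1 - specificity)  # false-positive fraction
    fn = prevalence * (1 - sensitivity)        # false-negative fraction
    tn = (1 - prevalence) * specificity        # true-negative fraction
    return tp / (tp + fp), tn / (tn + fn)

# At ~6% prevalence, moderate specificity still leaves the PPV low,
# while the NPV remains high.
ppv, npv = predictive_values(prevalence=0.061, sensitivity=0.74, specificity=0.79)
```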
 
Artificial intelligence for intracranial haemorrhage detection: research and reality
Multiple studies have successfully used AI for ICH detection via deep learning methods, typically involving variants of CNNs. For example, Arbabshirani et al5 (deep CNN, >37 000 training CT scans) reported an AUC of 0.846 on 342 CT scans; Chang et al4 (two-dimensional/three-dimensional CNN, 10 159 training CT scans) reported an AUC of 0.983 on 862 prospectively collected CT scans. Furthermore, Chilamkurthy et al3 (CNN, >290 000 training CT scans) reported an AUC of 0.94 on 491 CT scans; Lee et al7 (four deep CNNs, 904 training CT scans) reported an AUC of 0.96 on 214 CT scans. Finally, Ye et al8 (three-dimensional joint CNN-recurrent neural network, 2537 training CT scans) reported an AUC of 1.0 on 299 CT scans; Kuo et al6 (patch-based fully CNN, 4396 training CT scans) reported an AUC of 0.991 on 200 CT scans. Although these results demonstrate the high diagnostic performance that can be achieved using deep learning methods for ICH detection, the studies were conducted using in-house training datasets, which are laborious to produce and limit subsequent clinical applications. Moreover, the results may not be directly applicable to clinical practice, considering the limited number (generally <500) of CT scans used during validation, as well as the effect of prevalence on sensitivity and specificity. Yune et al15 demonstrated this problem with a deep learning model that had an AUC of 0.993 on selected cases, which decreased to 0.834 when validated on CT scans collected over a 3-month period; notably, the latter is comparable with the AUC of our model. Thus, evaluation of model performance in a real-world setting can reduce the risk of bias and serve as a better indicator of clinical relevance.16
 
Artificial intelligence for intracranial haemorrhage detection: our approach
The development of an AI model is the first step in a long process of clinical translation. In this study, we aimed to construct an algorithm that was reasonably comparable with radiologist performance, prior to further tests in a clinical setting. We recognise that our model is not an end-product; it constitutes an initial exploration of the potential for an international dataset–derived algorithm to be implemented in our institution. To avoid problems associated with the lack of an annotated dataset from Hong Kong, we utilised a dataset labelled by international experts, which is the most extensive open-access dataset currently available. However, the model achieved limited diagnostic accuracy, mainly because of type 1 error (ie, identification of false positives). The training dataset was composed of CT slices, whereas the model functioned at the CT scan level, iteratively assessing all slices to identify those with the highest ICH probability. If any slice in a scan is classified as positive, the model reports the entire CT scan as ‘ICH-positive’. Thus, false positives at the slice level are amplified into false positives at the scan level. This strategy resulted in a low PPV (~19%) and a high NPV (~98%). To mitigate the impact of false positives, we included a CT slice nomination feature in the model, which highlights the CT slices with the highest probability of ICH; this facilitates manual review and reduces the black-box nature of the model.
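The slice-to-scan aggregation and nomination mechanism can be sketched as follows; this is a simplified illustration of the logic, not the deployed implementation:

```python
def scan_level_output(slice_probs, n_nominate=5, threshold=0.5):
    """Aggregate per-slice ICH probabilities into a scan-level report.

    The scan is called ICH-positive if any slice reaches the threshold
    (the 'any positive slice' rule), and the slices with the highest
    probabilities are nominated for manual review.
    """
    ranked = sorted(range(len(slice_probs)),
                    key=lambda i: slice_probs[i], reverse=True)
    nominated = ranked[:n_nominate]   # slice indices for clinician review
    scan_positive = slice_probs[ranked[0]] >= threshold
    return scan_positive, nominated

positive, slices = scan_level_output([0.10, 0.20, 0.90, 0.40, 0.05, 0.60])
```

Under the any-positive-slice rule, a per-slice false-positive rate f over n independent negative slices pushes the scan-level false-positive rate towards 1 − (1 − f)^n, which is why slice-level errors amplify and the scan-level PPV falls.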
 
Potential implications of artificial intelligence–detected intracranial haemorrhage in clinical practice
During validation, the model was tested using an ICH point–prevalence approach to elucidate the potential clinical implications of the classification outcomes. With respect to true positives, most ICH-positive scans were detected; most of these scans had large areas of ICH, which presumably could be easily identified by non-radiologists. However, in six cases, the model correctly nominated CT slices with small areas of ICH. In two cases, the nominated images did not have ICH, which could potentially have led to incorrect reclassification of the scan as a false positive.
 
Furthermore, there were many false positives. Such results may reduce physician confidence despite the correct interpretation of an ICH-negative scan; they may lead to overdiagnosis (with prolonged hospitalisation) or further investigations, such as a follow-up CT scan that involves additional radiation exposure.
 
With respect to false negatives, the model output includes a secondary mechanism of image review that allowed correct reclassification of 16 scans, increasing the rate of ICH detection from 74% to 93%. In five cases, ICH was conspicuous on the nominated images; in 11 cases, the nominated images displayed subtle ICH. In cases of subtle ICH, it is possible to overlook the trace amount of ICH on the nominated CT slice. The same problem may affect true-positive scans, which may be misclassified as false positives unless subtle ICH is recognised in the nominated image. Unfortunately, the model-generated probability of each type of ICH in each selected image did not facilitate the localisation of ICH.
 
Based on our primary clinical motivation to develop this model, we focused on CT scans with reformatted thick CT slices that can be viewed in all hospital workstations by non-radiologists. In practice, radiologists use dedicated imaging workstations to view sub-millimetre thin CT slices, which offer greater sensitivity and can display smaller or subtler pathologies. Thus, there is limited capacity for ICH detection in thick CT slices; this was highlighted in a case of trauma-related trace SAH, which was visible on thin CT slices but not thick CT slices. Subarachnoid haemorrhage is reportedly the most difficult type of ICH to interpret.17 In practice, a patient with a very small amount of isolated traumatic SAH would likely receive conservative treatment, and the pathology could reasonably await detection via radiologist assessment.
 
Limitations
This study had some limitations. First, diagnostic accuracy would have been more comprehensively assessed using a larger number of CT scans or a longer point prevalence; however, we limited the assessment to CT scans collected over a 1-month period, considering the preliminary stage of model development. Second, the CT scans were assessed by radiologists and senior radiology trainees who may have different degrees of experience in ICH detection17; importantly, this limitation reflects the real-world setting where model deployment is intended. Finally, the model was specifically trained for the detection of ICH; it was not trained for the detection of other clinically significant non-ICH findings (eg, non-haemorrhagic tumours, hydrocephalus, or mass effect). The detection of these other pathologies will require dedicated models with customised training datasets.
 
Conclusion
In this study, we used a CT slice–based dataset to develop an algorithm for CT scan–based ICH detection; we validated the model using our institutional data with a point-prevalence approach, yielding insights regarding its utility in real-world clinical practice. Although the model demonstrated good accuracy, its diagnostic performance currently remains limited for the intended clinical application. However, our results support further development of the model to improve its accuracy and incorporate a mechanism that can facilitate visual confirmation of ICH location. These modifications would enhance the interpretability of the deep learning model and would be useful for further evaluation of clinical applications.
 
Author contributions
Concept or design: JM Abrigo, KL Ko, WCW Chu, SCH Yu.
Acquisition of data: JM Abrigo, Q Chen, WCW Chu, BMH Lai, TCY Cheung.
Analysis or interpretation of data: All authors.
Drafting of the manuscript: JM Abrigo, KL Ko, Q Chen.
Critical revision of the manuscript for important intellectual content: All authors.
 
All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
 
Conflicts of interest
All authors have disclosed no conflicts of interest.
 
Acknowledgement
We thank our department colleagues Mr Kevin Lo, for anonymising and downloading Digital Imaging and Communications in Medicine data, and Mr Kevin Leung, for preparing the figures for this manuscript.
 
Funding/support
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
 
Ethics approval
This research was approved by the Joint Chinese University of Hong Kong—New Territories East Cluster Clinical Research Ethics Committee (Ref No.: 2020.061). The requirement for patient consent was waived by the Committee given the retrospective design of the study and anonymisation of all computed tomography scans prior to use.
 
References
1. Caceres JA, Goldstein JN. Intracranial hemorrhage. Emerg Med Clin North Am 2012;30:771-94.
2. Powers WJ, Rabinstein AA, Ackerson T, et al. Guidelines for the early management of patients with acute ischemic stroke: 2019 update to the 2018 guidelines for the early management of acute ischemic stroke: a guideline for healthcare professionals from the American Heart Association/American Stroke Association. Stroke 2019;50:e344-418.
3. Chilamkurthy S, Ghosh R, Tanamala S, et al. Deep learning algorithms for detection of critical findings in head CT scans: a retrospective study. Lancet 2018;392:2388-96.
4. Chang PD, Kuoy E, Grinband J, et al. Hybrid 3D/2D convolutional neural network for hemorrhage evaluation on head CT. AJNR Am J Neuroradiol 2018;39:1609-16.
5. Arbabshirani MR, Fornwalt BK, Mongelluzzo GJ, et al. Advanced machine learning in action: identification of intracranial hemorrhage on computed tomography scans of the head with clinical workflow integration. NPJ Digit Med 2018;1:9.
6. Kuo W, Häne C, Mukherjee P, Malik J, Yuh EL. Expert-level detection of acute intracranial hemorrhage on head computed tomography using deep learning. Proc Natl Acad Sci U S A 2019;116:22737-45.
7. Lee H, Yune S, Mansouri M, et al. An explainable deep-learning algorithm for the detection of acute intracranial haemorrhage from small datasets. Nat Biomed Eng 2019;3:173-82.
8. Ye H, Gao F, Yin Y, et al. Precise diagnosis of intracranial hemorrhage and subtypes using a three-dimensional joint convolutional and recurrent neural network. Eur Radiol 2019;29:6191-201.
9. Flanders AE, Prevedello LM, Shih G, et al. Construction of a machine learning dataset through collaboration: The RSNA 2019 Brain CT Hemorrhage Challenge. Radiol Artif Intell 2020;2:e190211.
10. Bossuyt PM, Reitsma JB, Bruns DE, et al. STARD 2015: an updated list of essential items for reporting diagnostic accuracy studies. Radiology 2015;277:826-32.
11. Radiological Society of North America. RSNA intracranial hemorrhage detection: identify acute intracranial hemorrhage and its subtypes. Available from: https://www.kaggle.com/c/rsna-intracranial-hemorrhage-detection/data. Accessed 28 Mar 2023.
12. Zhang X, Zou J, He K, Sun J. Accelerating very deep convolutional networks for classification and detection. IEEE Trans Pattern Anal Mach Intell 2016;38:1943-55.
13. Milletari F, Navab N, Ahmadi SA. V-Net: fully convolutional neural networks for volumetric medical image segmentation. In: 2016 Fourth International Conference on 3D Vision (3DV). Stanford (CA), USA; 2016: 565-71.
14. DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837-45.
15. Yune S, Lee H, Pomerantz S, et al. Real-world performance of deep-learning–based automated detection system for intracranial hemorrhage. Radiological Society of North America (RSNA) 104th Scientific Assembly and Annual Meeting. McCormick Place, Chicago (IL); 2018.
16. Nagendran M, Chen Y, Lovejoy CA, et al. Artificial intelligence versus clinicians: systematic review of design, reporting standards, and claims of deep learning studies. BMJ 2020;368:m689.
17. Strub WM, Leach JL, Tomsick T, Vagal A. Overnight preliminary head CT interpretations provided by residents: locations of misidentified intracranial hemorrhage. AJNR Am J Neuroradiol 2007;28:1679-82.

Evaluation of contemporary olanzapine- and netupitant/palonosetron-containing antiemetic regimens for chemotherapy-induced nausea and vomiting

© Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
 
ORIGINAL ARTICLE
Evaluation of contemporary olanzapine- and netupitant/palonosetron-containing antiemetic regimens for chemotherapy-induced nausea and vomiting
Christopher CH Yip1; L Li, MB, BS, FHKAM (Medicine)1; Thomas KH Lau, MB, BS, FHKAM (Medicine)1; Vicky TC Chan, MB, BS, FHKAM (Medicine)1; Carol CH Kwok, MB, BS, FHKAM (Radiology)2; Joyce JS Suen, MB, BS, FHKAM (Radiology)2; Frankie KF Mo, PhD1; Winnie Yeo, MB, BS, FHKAM (Medicine)1
1 Department of Clinical Oncology, Prince of Wales Hospital, Hong Kong
2 Department of Clinical Oncology, Princess Margaret Hospital, Hong Kong
 
Corresponding author: Prof Winnie Yeo (winnieyeo@cuhk.edu.hk)
 
 
Abstract
Introduction: This post-hoc analysis retrospectively assessed data from two recent studies of antiemetic regimens for chemotherapy-induced nausea and vomiting (CINV). The primary objective was to compare olanzapine-based versus netupitant/palonosetron (NEPA)-based regimens in terms of controlling CINV during cycle 1 of doxorubicin/cyclophosphamide (AC) chemotherapy; secondary objectives were to assess quality of life (QOL) and emesis outcomes over four cycles of AC.
 
Methods: This study included 120 Chinese patients with early-stage breast cancer who were receiving AC; 60 patients received the olanzapine-based antiemetic regimen, whereas 60 patients received the NEPA-based antiemetic regimen. The olanzapine-based regimen comprised aprepitant, ondansetron, dexamethasone, and olanzapine; the NEPA-based regimen comprised NEPA and dexamethasone. Patient outcomes were compared in terms of emesis control and QOL.
 
Results: During cycle 1 of AC, the olanzapine group exhibited a higher rate of ‘no use of rescue therapy’ in the acute phase (olanzapine vs NEPA: 96.7% vs 85.0%, P=0.0225). No parameters differed between groups in the delayed phase. The olanzapine group had significantly higher rates of ‘no use of rescue therapy’ (91.7% vs 76.7%, P=0.0244) and ‘no significant nausea’ (91.7% vs 78.3%, P=0.0408) in the overall phase. There were no differences in QOL between groups. Multiple cycle assessment revealed that the NEPA group had higher rates of total control in the acute phase (cycles 2 and 4) and the overall phase (cycles 3 and 4).
 
Conclusion: These results do not conclusively support the superiority of either regimen for patients with breast cancer who are receiving AC.
 
 
New knowledge added by this study
  • The olanzapine-based regimen (aprepitant, ondansetron, dexamethasone, and olanzapine) and the NEPA-based regimen (netupitant, palonosetron, and dexamethasone) demonstrated similar efficacies in terms of controlling chemotherapy-induced nausea and vomiting among patients with early-stage breast cancer.
  • Quality of life did not significantly differ between patients receiving the olanzapine-based regimen and patients receiving the NEPA-based regimen.
Implications for clinical practice or policy
  • The available data suggest that olanzapine-containing antiemetic regimens can be used without aprepitant, particularly when seeking to reduce medical expenses.
  • Antiemetic efficacy may potentially be enhanced if NEPA is administered in combination with dexamethasone and olanzapine as a four-drug antiemetic regimen.
 
 
Introduction
Patients with breast cancer receiving (neo)adjuvant treatment exhibit improved prognoses.1 However, chemotherapy regimens for breast cancer are associated with various degrees of chemotherapy-induced nausea and vomiting (CINV). The doxorubicin/cyclophosphamide (AC) regimen is one of the most frequently prescribed regimens for patients with breast cancer who are receiving (neo)adjuvant chemotherapy; AC is among the highly emetogenic chemotherapies, with a ≥90% risk of nausea and vomiting.
 
In situations where a neurokinin-1 receptor antagonist (NK1RA) is accessible, most current guidelines for AC(-like) chemotherapy recommend the use of a prophylactic triplet antiemetic regimen that consists of an NK1RA, a 5-hydroxytryptamine type-3 receptor antagonist (5HT3RA), and a corticosteroid, with or without olanzapine.2 3 4 In addition to earlier NK1RAs (eg, aprepitant, fosaprepitant, and rolapitant), netupitant/palonosetron (NEPA) [Akynzeo], which is a combination of an NK1RA (netupitant 300 mg) and a second-generation 5HT3RA (palonosetron 0.5 mg), has become available over the past decade. Palonosetron not only constitutes a more potent 5HT3RA5; it also has synergistic interactions with netupitant that include interference with 5HT3 receptor cross-talk and enhancement of the netupitant-mediated effect on NK1 receptor internalisation.6 7
 
In a recent systematic review and meta-analysis, Yokoe et al8 compared different antiemetic regimens to assess their control of CINV in patients receiving highly emetogenic chemotherapy regimens. The authors arbitrarily defined the ‘conventional’ regimen as a three-drug regimen that contained dexamethasone, a first- or second-generation 5HT3RA, and an earlier NK1RA compound (aprepitant, fosaprepitant, or rolapitant); they defined ‘new’ regimens as regimens that contained NEPA or olanzapine. The results indicated that, compared with conventional regimens, new regimens containing NEPA were more effective in terms of producing a complete response (ie, absence of vomiting and no use of rescue therapy). Additionally, Yokoe et al8 showed that olanzapine-containing regimens were most effective in terms of producing a complete response, particularly when olanzapine was added to a triplet regimen of an NK1RA, a 5HT3RA, and dexamethasone. These findings were supported by the results of a prospective randomised study published in 2020, which directly compared an olanzapine-containing four-drug regimen with a standard triplet antiemetic regimen (consisting of aprepitant, ondansetron, and dexamethasone) for the prevention of CINV in patients receiving AC chemotherapy.9
 
Here, we conducted a post-hoc analysis through retrospective assessment of individual patient data from two previously reported prospective antiemetic studies that involved Chinese patients with breast cancer.9 10 We hypothesised that a four-drug antiemetic regimen (consisting of an NK1RA, a 5HT3RA, dexamethasone, and olanzapine) would remain superior to a three-drug regimen (consisting of an NK1RA, a 5HT3RA, and dexamethasone) that included NEPA as a combination NK1RA and 5HT3RA agent. The primary objective was to compare the efficacies of olanzapine- and NEPA-containing antiemetic regimens in terms of controlling CINV during the first cycle of AC. The secondary objectives were: (1) to assess quality of life (QOL) outcomes in patients receiving these treatments during the first cycle of AC, and (2) to assess emesis control outcomes in patients receiving these treatments over multiple cycles of AC.
 
Methods
Patients
This study constituted a post-hoc analysis of data from two recently reported prospective studies. The first prospective study investigated emesis outcomes in patients with breast cancer who received a standard triplet antiemetic regimen (ie, aprepitant, ondansetron, and dexamethasone) with or without olanzapine9; after the first study, a second prospective study was conducted to assess the antiemetic efficacy of NEPA and dexamethasone.10 These studies were conducted with institutional ethics approval and were registered at ClinicalTrials.gov (NCT03386617 and NCT03079219, respectively). For the post-hoc analysis, data were extracted from the first study regarding patients who received an olanzapine plus aprepitant-containing four-drug antiemetic regimen; data were extracted from the second study regarding patients who received NEPA and dexamethasone. These patients were categorised into the ‘olanzapine’ and ‘NEPA’ groups, respectively.
 
Inclusion criteria were similar for the two studies. Specifically, patients were eligible if they were women of Chinese ethnicity, were aged >18 years, had early-stage breast cancer, and planned to receive a regimen of (neo)adjuvant AC. All study participants were required to read, understand, and complete study questionnaires and diaries in Chinese. Exclusion criteria included abnormal bone marrow, renal, or hepatic functions; receipt or planned receipt of radiation therapy to the abdomen or pelvis within 7 days prior to initial administration of study treatment; presence of grade 2 to 3 nausea, as defined by the National Cancer Institute Common Terminology Criteria for Adverse Events version 4.0,11 or vomiting within 24 hours prior to initial administration of the study treatment; presence of an active infection or any uncontrolled disease; a history of illicit drug use (including marijuana) or alcohol abuse; mental incapacitation; and/or presence of a clinically significant emotional or psychiatric disorder. Written consent was provided by eligible patients prior to enrolment in the studies.
 
Study treatment
Patients in the olanzapine group received olanzapine 10 mg, aprepitant 125 mg, dexamethasone 12 mg, and ondansetron 8 mg before chemotherapy on day 1; they also received ondansetron 8 mg 8 hours after chemotherapy. Subsequently, they received aprepitant 80 mg daily on days 2-3 and olanzapine 10 mg daily on days 2-5.
 
Patients in the NEPA group received one capsule of NEPA (netupitant 300 mg/palonosetron 0.50 mg) with dexamethasone 12 mg before chemotherapy on day 1. Subsequently, they received dexamethasone 4 mg twice per day on days 2-3.
 
Study assessments
At the initiation of chemotherapy on day 1, individual patients were provided a diary to record the date and time of their symptoms of vomiting and nausea for 120 hours after the AC infusion; the use of any rescue medication was also recorded. On days 2-6, patients rated their symptoms of nausea for the previous 24 hours using a visual analogue scale (in which 0 mm implied no nausea, whereas 100 mm implied nausea that was ‘as bad as it could be’). Additionally, on day 1 (before infusion of AC) and day 6 (after completion of the diary), patients completed the Functional Living Index-Emesis (FLIE) questionnaire. A research nurse/assistant called individual patients on days 2-6 to remind them to take the study medications, complete the patient diary, and complete the FLIE questionnaire.
 
Assessment of efficacy and safety
Antiemetic efficacy was measured across three overlapping time periods. The ‘acute’ phase comprised 0 to 24 hours from the infusion of AC; the ‘delayed’ phase comprised 24 to 120 hours from the infusion of AC; the ‘overall’ phase comprised 0 to 120 hours from the infusion of AC.
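A trivial helper illustrates how diary event times map onto these windows (illustrative only; the assignment of the 24-hour boundary to the acute phase is our convention, since the phases overlap at that point):

```python
def emesis_phase(hours_after_infusion):
    """Classify an event time (hours after AC infusion) into a phase.

    Acute: 0-24 h; delayed: 24-120 h; both lie within the overall
    0-120 h assessment window. The 24-h boundary is assigned to the
    acute phase here by convention.
    """
    if not 0 <= hours_after_infusion <= 120:
        raise ValueError("outside the 120-hour assessment window")
    return "acute" if hours_after_infusion <= 24 else "delayed"
```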
 
Variables used to assess antiemetic efficacy were ‘complete response’, ‘no vomiting’, ‘no significant nausea’, ‘no nausea’, ‘no use of rescue therapy’, ‘complete protection’, and ‘total control’; definitions of these variables are provided in Table 1. The proportions of patients who exhibited these variables were recorded separately. Additionally, the ‘time to first vomiting’ in cycle 1 was determined using information recorded in patients’ diaries.
 

Table 1. Definitions of variables used to assess antiemetic efficacy
 
Quality of life was evaluated using the Chinese version of self-reported FLIE questionnaires from individual patients.12 The FLIE questionnaire consists of a nausea domain (9 items) and a vomiting domain (9 items). All scores were transformed to ensure that higher scores indicated worse impact on QOL.
 
Statistical analyses
A modified intention-to-treat approach was used for all efficacy analyses; specifically, analyses included patients who had received chemotherapy, had completed the study procedures from 0 to 120 hours in cycle 1 of AC, and had no major protocol violations.
 
To achieve the primary objective of this study, the efficacies of the two antiemetic regimens were compared based on the proportions (including 95% confidence intervals) of patients who achieved complete response during the acute, delayed, and overall phases after AC infusion in cycle 1. Other parameters compared in cycle 1 of AC were ‘time to first vomiting’, ‘no vomiting’, ‘no significant nausea’, ‘no nausea’, ‘no use of rescue therapy’, ‘complete protection’, and ‘total control’.
 
To achieve the secondary objectives, QOL was compared between the two antiemetic regimens based on assessments of the nausea domain, vomiting domain, and total score (sum of nausea and vomiting domains) of the FLIE questionnaire during cycle 1 of AC. Emesis control over multiple cycles was compared between the two antiemetic regimens by assessing the proportions (including 95% confidence intervals) of patients who achieved ‘complete response’, ‘complete protection’, and ‘total control’ in the acute, delayed, and overall phases.
 
Comparisons between the two antiemetic regimens were made using the Wilcoxon rank-sum test for continuous data and Pearson’s chi-squared test for dichotomous data. Two-sided P values <0.05 were considered statistically significant. SAS software version 9.4 (SAS Institute, Cary [NC], United States) was used for analyses.
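For a 2×2 table of dichotomous outcomes, Pearson's chi-squared statistic has a closed form. The sketch below is an illustrative re-implementation, not the study's SAS code, and the resulting P values may differ slightly from those reported depending on continuity-correction or exact-test choices; the example counts are inferred from the percentages reported for ‘no use of rescue therapy’ in the acute phase of cycle 1:

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson's chi-squared statistic (no continuity correction) and
    two-sided P value for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For df = 1, the two-sided tail probability is erfc(sqrt(chi2 / 2))
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

# 58/60 (olanzapine) vs 51/60 (NEPA) with no use of rescue therapy
chi2, p = chi2_2x2(58, 2, 51, 9)
```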
 
Results
Patient characteristics
Data from 120 patients were included in this study; 60 patients each were enrolled in the NEPA and olanzapine groups. Fifty-six patients (93.3%) in the olanzapine group completed all four cycles of AC, whereas 60 patients (100%) in the NEPA group completed all four cycles of AC.
 
Patient characteristics, including characteristics that could potentially affect CINV, are shown in Table 2. The olanzapine and NEPA groups had very similar patient characteristics, with median ages of 54.5 and 56 years, respectively. Nearly two-thirds of patients in each group had Stage II breast cancer (63.3% and 66.7%, respectively). The percentage of patients with a history of motion sickness was higher in the NEPA group (35%) than in the olanzapine group (16.7%). Furthermore, 30% of patients in the NEPA group and 20% of patients in the olanzapine group received AC as neoadjuvant treatment.
 

Table 2. Baseline characteristics of Chinese patients with early-stage breast cancer included in this analysis
 
Efficacy assessment
Antiemetic efficacies during cycle 1 of AC in the olanzapine and NEPA groups are shown in Table 3. Complete response rates in acute, delayed, and overall phases in cycle 1 did not differ between groups. In the acute phase, the olanzapine group exhibited a higher rate of ‘no use of rescue therapy’ (olanzapine vs NEPA: 96.7% vs 85.0%, P=0.0225). No parameters differed between groups in the delayed phase. In the overall phase, the olanzapine group exhibited significantly higher rates of ‘no use of rescue therapy’ (91.7% vs 76.7%, P=0.0244) and ‘no significant nausea’ (91.7% vs 78.3%, P=0.0408).
 

Table 3. Comparison of antiemetic efficacy during cycle 1 of doxorubicin/cyclophosphamide between olanzapine and netupitant/palonosetron groups
 
The median time to first vomiting was not reached in either group (P=0.3902). Quality of life results during cycle 1 of AC in the olanzapine and NEPA groups, determined using the FLIE questionnaire, are shown in the Figure. There were no significant differences in the nausea domain, vomiting domain, or total score of the FLIE questionnaire between the two groups.
 

Figure. Comparison of quality of life (assessed using Functional Living Index-Emesis questionnaire) throughout cycle 1 of doxorubicin/cyclophosphamide between olanzapine and netupitant/palonosetron groups
 
Antiemetic efficacies over multiple cycles of AC in the olanzapine and NEPA groups are shown in Table 4. In the acute phase, the NEPA group exhibited significantly higher rates of total control in cycle 2 (olanzapine vs NEPA: 59.6% vs 81.7%, P=0.0087) and cycle 4 (63.2% vs 86.7%, P=0.0032). No parameters differed between groups in the delayed phase. In the overall phase, the NEPA group exhibited significantly higher rates of total control in cycle 3 (55.4% vs 73.3%, P=0.0430) and cycle 4 (54.4% vs 75.0%, P=0.0195).
 

Table 4. Comparison of complete response, complete protection, and total control over multiple cycles between olanzapine and netupitant/palonosetron groups
 
Discussion
Chemotherapy-induced nausea and vomiting is a frustrating adverse effect for patients receiving anticancer treatment.13 The administration of optimal antiemetic prophylaxis can help to maintain QOL, while potentially improving patient compliance in terms of completing planned therapies. In current antiemetic prophylaxis guidelines, the European Society for Medical Oncology/Multinational Association of Supportive Care in Cancer, the American Society of Clinical Oncology, and the United States National Comprehensive Cancer Network offer several options regarding antiemetic regimens for patients receiving AC(-like) chemotherapy. These options mainly involve the combination of a 5HT3RA and corticosteroids, with or without an NK1RA and olanzapine.2 3 4 In particular, the incorporation of olanzapine, an antipsychotic drug with antagonistic effects on various receptors (eg, dopamine and serotonin receptors),14 is increasingly regarded as a component of antiemetic prophylaxis for patients receiving anticancer treatment.
 
In an attempt to identify the best antiemetic regimen, Yokoe et al8 conducted a meta-analysis of randomised trials that tested various antiemetic regimens. The results indicated that olanzapine-based regimens demonstrated the best efficacy. Specifically, olanzapine in combination with an NK1RA, a 5HT3RA, and dexamethasone exhibited the greatest efficacy; other olanzapine-containing regimens (consisting of a 5HT3RA and dexamethasone) were also superior to regimens that lacked olanzapine. Moreover, even in the presence of earlier NK1RAs (eg, aprepitant, fosaprepitant, or rolapitant), regimens lacking olanzapine remained inferior.
 
Similar to the findings with olanzapine, Yokoe et al8 reported that triplet antiemetics involving NEPA were superior to conventional NK1RAs (eg, aprepitant, fosaprepitant, or rolapitant). Furthermore, Zhang et al15 directly compared NEPA-based antiemetic regimens with aprepitant-based triplet regimens in a randomised study that involved 800 patients who underwent administration of a cisplatin-containing regimen. Their results revealed that patients receiving NEPA and dexamethasone exhibited similar control of CINV, compared with patients receiving aprepitant, granisetron, and dexamethasone; however, NEPA-treated patients had a significantly lower requirement for rescue therapy. Additionally, in a recent study focused on patients with breast cancer who were undergoing AC chemotherapy, patients who received NEPA and dexamethasone demonstrated significantly higher rates of complete response, complete protection, and total control with enhanced QOL, compared to historical controls who received aprepitant, ondansetron, and dexamethasone; these benefits persisted over multiple cycles of chemotherapy.10
 
To our knowledge, no study has directly compared olanzapine- and NEPA-containing regimens. Using an indirect comparison approach, the present study showed that the olanzapine-based regimen had higher rates of ‘no use of rescue therapy’ and ‘no significant nausea’ in cycle 1 of AC, compared to the NEPA-based regimen. In contrast, assessments in subsequent cycles revealed that the NEPA-based regimen led to higher rates of total control in the acute phase (cycles 2 and 4) and the overall phase (cycles 3 and 4). The lack of difference in QOL between the two groups of patients may be related to the difference in adverse-effect profiles of the antiemetics used. For instance, the continued use of dexamethasone on days 2-3 in the NEPA group may have affected QOL among those patients because of its effects on mood, insomnia, gastrointestinal symptoms, and metabolic profiles.16 Indeed, a recent meta-analysis showed that, among patients receiving AC or moderately emetogenic chemotherapy, 3 days of dexamethasone did not provide additional benefit compared to 1 day of the agent.17 However, olanzapine has been associated with sedation and somnolence.18 Thus, after the completion of a phase 2 trial in Japan that suggested olanzapine was more effective at 5 mg than at 10 mg,19 the same group of investigators conducted a phase 3 study in which they tested the addition of daily olanzapine 5 mg to an aprepitant-based three-drug regimen; the results showed that, even at a lower dose of olanzapine, the olanzapine-containing regimen remained more efficacious than the olanzapine-free regimen for patients receiving cisplatin.20 Other adverse effects have been reported. 
Our analysis of olanzapine in combination with aprepitant, ondansetron, and dexamethasone revealed a significantly higher incidence of grade ≥2 neutropenia in the olanzapine arm than in the standard arm, although this increase was not accompanied by a significant difference in neutropenic fever.9 A few cases of olanzapine-induced neutropenia have been reported21; additionally, a recent randomised antiemetic study showed that patients who received an olanzapine-containing regimen had a higher frequency of severe neutropenia (without an increased incidence of neutropenic fever).22 Although the underlying mechanism remains unknown, the results of the aforementioned Japanese study20 suggest that olanzapine 5 mg could reduce the incidence of neutropenia. In contrast, in our previous trial of a NEPA-based regimen, we found that patients in the NEPA arm had significantly lower incidences of grade ≥2 neutropenia and neutropenic fever, compared to historical controls who received an aprepitant-based regimen.9 10
 
This study had some potential limitations. First, dexamethasone was used for only 1 day in the olanzapine-based regimen, whereas it was administered for 3 days in the NEPA-based regimen; this difference may have influenced the findings. Second, the use of data from two separate studies may have affected the generalisability of the findings because of slight variations in patient characteristics; the lack of blinding in both studies also increased the potential for patient-related reporting biases. Nonetheless, the original studies were conducted consecutively between 2017 and 2019 in a homogeneous group of Chinese patients with early-stage breast cancer who were receiving (neo)adjuvant AC chemotherapy, and both the original studies and the present comparison were analysed based on individual patient data. These factors support the validity of our comparison approach.
 
Conclusion
In conclusion, the present findings do not conclusively support the superiority of either the olanzapine-based regimen or the NEPA-based regimen in terms of antiemetic efficacy or QOL among patients with breast cancer who are receiving AC. Our previous study demonstrated that aprepitant has a limited effect when used with a 5HT3RA and dexamethasone23; we also found that NEPA was superior to aprepitant.10 Overall, the available data suggest that olanzapine-containing antiemetic regimens can be used without aprepitant, particularly when seeking to reduce medical expenses. Moreover, the available data support the previous conclusion that, in parts of the world where socio-economic limitations restrict the availability of NK1RAs, the use of olanzapine combined with a 5HT3RA and dexamethasone may be an effective low-cost alternative antiemetic regimen.8 24 Antiemetic efficacy may be enhanced if NEPA is administered in combination with dexamethasone and olanzapine as a four-drug antiemetic regimen; however, the efficacy of an olanzapine plus NEPA regimen in terms of controlling CINV should be confirmed in a trial setting.
 
Author contributions
Concept or design: W Yeo.
Acquisition of data: FKF Mo, W Yeo.
Analysis or interpretation of data: W Yeo, CCH Yip, FKF Mo.
Drafting of the manuscript: W Yeo, CCH Yip.
Critical revision of the manuscript for important intellectual content: L Li, TKH Lau, VTC Chan, CCH Kwok, JJS Suen, FKF Mo.
 
All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
 
Conflicts of interest
W Yeo has been involved in the Chemotherapy-Induced Nausea and Vomiting (CINV) Network in Asia and has provided lectures on CINV at events organised by Mundipharma International Limited, which supported the design of the NEPA study10 analysed in this post-hoc analysis but had no role in the present comparative analysis, data collection and analysis, decision to publish, or preparation of the manuscript.
 
Acknowledgement
We thank Ms Dong KT Lai, Ms Elizabeth Pang, Ms Vivian Chan, and Ms Maggie Cheung of the Department of Clinical Oncology, The Chinese University of Hong Kong for their support in contributing to patient enrolment and study monitoring.
 
Declaration
Data from this study were presented at the European Society for Medical Oncology Asia Virtual Congress 2020 on 27 November 2020.
 
Funding/support
This study was supported by an education grant from Madam Diana Hon Fun Kong Donation for Cancer Research (Grant No.: CUHK Project Code 7104870). The Donation had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
 
Ethics approval
The studies examined in this post-hoc analysis were approved by The Joint Chinese University of Hong Kong–New Territories East Cluster Institution Review Board of The Chinese University of Hong Kong and the Hong Kong Hospital Authority, and the Kowloon West Cluster Research Ethics Committee of the Hong Kong Hospital Authority (Ref No.: CREC 2016.013, CREC 2017.1609 and KW/FR-18-019[119-19]). All patient data in this study were anonymous and were based on the abovementioned reported studies. There was no additional work on retrieving patient records in this study.
 
References
1. Early Breast Cancer Trialists’ Collaborative Group (EBCTCG); Peto R, Davies C, et al. Comparisons between different polychemotherapy regimens for early breast cancer: meta-analyses of long-term outcome among 100,000 women in 123 randomised trials. Lancet 2012;379:432-44. Crossref
2. National Comprehensive Cancer Network. NCCN Clinical Practice Guidelines in Oncology, Antiemesis Version 1; 2019.
3. Herrstedt J, Roila F, Warr D, et al. 2016 updated MASCC/ESMO consensus recommendations: prevention of nausea and vomiting following high emetic risk chemotherapy. Support Care Cancer 2017;25:277-88. Crossref
4. Hesketh PJ, Kris MG, Basch E, et al. Antiemetics: American Society of Clinical Oncology Clinical Practice Guideline update. J Clin Oncol 2017;35:3240-61. Crossref
5. Popovic M, Warr DG, Deangelis C, et al. Efficacy and safety of palonosetron for the prophylaxis of chemotherapy-induced nausea and vomiting (CINV): a systematic review and meta-analysis of randomized controlled trials. Support Care Cancer 2014;22:1685-97. Crossref
6. Thomas AG, Stathis M, Rojas C, Slusher BS. Netupitant and palonosetron trigger NK1 receptor internalization in NG108-15 cells. Exp Brain Res 2014;232:2637-44. Crossref
7. Stathis M, Pietra C, Rojas C, Slusher BS. Inhibition of substance P-mediated responses in NG108-15 cells by netupitant and palonosetron exhibit synergistic effects. Eur J Pharmacol 2012;689:25-30. Crossref
8. Yokoe T, Hayashida T, Nagayama A, et al. Effectiveness of antiemetic regimens for highly emetogenic chemotherapy-induced nausea and vomiting: a systematic review and network meta-analysis. Oncologist 2019;24:e347-57. Crossref
9. Yeo W, Lau TK, Li L, et al. A randomized study of olanzapine-containing versus standard antiemetic regimens for the prevention of chemotherapy-induced nausea and vomiting in Chinese breast cancer patients. Breast 2020;50:30-8. Crossref
10. Yeo W, Lau TK, Kwok CC, et al. NEPA efficacy and tolerability during (neo)adjuvant breast cancer chemotherapy with cyclophosphamide and doxorubicin. BMJ Support Palliat Care 2022;12:e264-70.
11. National Cancer Institute. Cancer Therapy Evaluation Program. 2021. Available from: https://ctep.cancer.gov/protocoldevelopment/electronic_applications/ctc.htm#ctc_40. Accessed 3 Feb 2023.
12. Martin AR, Pearson JD, Cai B, Elmer M, Horgan K, Lindley C. Assessing the impact of chemotherapy-induced nausea and vomiting on patients’ daily lives: a modified version of the Functional Living Index-Emesis (FLIE) with 5-day recall. Support Care Cancer 2003;11:522-7. Crossref
13. Griffin AM, Butow PN, Coates AS, et al. On the receiving end. V: patient perceptions of the side effects of cancer chemotherapy in 1993. Ann Oncol 1996;7:189-95. Crossref
14. Bymaster FP, Nelson DL, DeLapp NW, et al. Antagonism by olanzapine of dopamine D1, serotonin2, muscarinic, histamine H1 and alpha 1-adrenergic receptors in vitro. Schizophr Res 1999;37:107-22. Crossref
15. Zhang L, Lu S, Feng J, et al. A randomized phase III study evaluating the efficacy of single-dose NEPA, a fixed antiemetic combination of netupitant and palonosetron, versus an aprepitant regimen for prevention of chemotherapy-induced nausea and vomiting (CINV) in patients receiving highly emetogenic chemotherapy (HEC). Ann Oncol 2018;29:452-8. Crossref
16. Roila F, Ruggeri B, Ballatori E, et al. Aprepitant versus metoclopramide, both combined with dexamethasone, for the prevention of cisplatin-induced delayed emesis: a randomized, double-blind study. Ann Oncol 2015;26:1248-53. Crossref
17. Okada Y, Oba K, Furukawa N, et al. One-day versus three-day dexamethasone in combination with palonosetron for the prevention of chemotherapy-induced nausea and vomiting: a systematic review and individual patient data-based meta-analysis. Oncologist 2019;24:1593-600. Crossref
18. Navari RM, Qin R, Ruddy KJ, et al. Olanzapine for the prevention of chemotherapy-induced nausea and vomiting. N Engl J Med 2016;375:134-42. Crossref
19. Abe M, Hirashima Y, Kasamatsu Y, et al. Efficacy and safety of olanzapine combined with aprepitant, palonosetron, and dexamethasone for preventing nausea and vomiting induced by cisplatin-based chemotherapy in gynecological cancer: KCOG-G1301 phase II trial. Support Care Cancer 2016;24:675-82. Crossref
20. Hashimoto H, Abe M, Tokuyama O, et al. Olanzapine 5 mg plus standard antiemetic therapy for the prevention of chemotherapy-induced nausea and vomiting (J-FORCE): a multicentre, randomised, double-blind, placebo-controlled, phase 3 trial. Lancet Oncol 2020;21:242-9. Crossref
21. Malhotra K, Vu P, Wang DH, Lai H, Faziola LR. Olanzapine-induced neutropenia. Ment Illn 2015;7:5871. Crossref
22. Gjafa E, Ng K, Grunewald T, et al. Neutropenic sepsis rates in patients receiving bleomycin, etoposide and cisplatin chemotherapy using olanzapine and reduced doses of dexamethasone compared to a standard antiemetic regimen. BJU Int 2021;127:205-11. Crossref
23. Yeo W, Mo FK, Suen JJ, et al. A randomized study of aprepitant, ondansetron and dexamethasone for chemotherapy-induced nausea and vomiting in Chinese breast cancer patients receiving moderately emetogenic chemotherapy. Breast Cancer Res Treat 2009;113:529-35. Crossref
24. Babu G, Saldanha SC, Kuntegowdanahalli Chinnagiriyappa L, et al. The efficacy, safety, and cost benefit of olanzapine versus aprepitant in highly emetogenic chemotherapy: a pilot study from South India. Chemother Res Pract 2016;2016:3439707. Crossref

Chest computed tomography analysis of lung sparing morphology: differentiation of COVID-19 pneumonia from influenza pneumonia and bacterial pneumonia using the arched bridge and vacuole signs

© Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
 
ORIGINAL ARTICLE
Chest computed tomography analysis of lung sparing morphology: differentiation of COVID-19 pneumonia from influenza pneumonia and bacterial pneumonia using the arched bridge and vacuole signs
Tiffany Y So, FRANZCR1; Simon CH Yu, FRCR1; WT Wong, FRCR2; Jeffrey KT Wong, FRCR1; Heather Lee, FRCR3; YX Wang, MMed, PhD1
1 Department of Imaging and Interventional Radiology, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong
2 Department of Anaesthesia and Intensive Care, Faculty of Medicine, The Chinese University of Hong Kong, Hong Kong
3 Department of Diagnostic Radiology, Princess Margaret Hospital, Hong Kong
 
Corresponding author: Prof YX Wang (yixiang_wang@cuhk.edu.hk)
 
 
Abstract
Introduction: This study evaluated the arched bridge and vacuole signs, which constitute morphological patterns of lung sparing in coronavirus disease 2019 (COVID-19), then examined whether these signs could be used to differentiate COVID-19 pneumonia from influenza pneumonia or bacterial pneumonia.
 
Methods: In total, 187 patients were included: 66 patients with COVID-19 pneumonia, 50 patients with influenza pneumonia and positive computed tomography findings, and 71 patients with bacterial pneumonia and positive computed tomography findings. Images were independently reviewed by two radiologists. The incidences of the arched bridge sign and/or vacuole sign were compared among the COVID-19 pneumonia, influenza pneumonia, and bacterial pneumonia groups.
 
Results: The arched bridge sign was much more common among patients with COVID-19 pneumonia (42/66, 63.6%) than among patients with influenza pneumonia (4/50, 8.0%; P<0.001) or bacterial pneumonia (4/71, 5.6%; P<0.001). The vacuole sign was also much more common among patients with COVID-19 pneumonia (14/66, 21.2%) than among patients with influenza pneumonia (1/50, 2.0%; P=0.005) or bacterial pneumonia (1/71, 1.4%; P<0.001). The signs occurred together in 11 (16.7%) patients with COVID-19 pneumonia, but they did not occur together in patients with influenza pneumonia or bacterial pneumonia. The arched bridge and vacuole signs predicted COVID-19 pneumonia with respective specificities of 93.4% and 98.4%.
 
Conclusion: The arched bridge and vacuole signs are much more common in patients with COVID-19 pneumonia and can help differentiate COVID-19 pneumonia from influenza and bacterial pneumonia.
 
 
New knowledge added by this study
  • On computed tomography, the arched bridge sign is characterised by ground-glass opacities or consolidation with an arched margin outlining unaffected lung parenchyma. The vacuole sign refers to a focal oval or round lucent area (typically <5 mm) that is present within ground-glass opacities or sites of consolidation.
  • These signs were commonly observed in patients with coronavirus disease 2019 (COVID-19) in Hong Kong, consistent with data from other populations.
  • Patients with COVID-19 pneumonia are much more likely to exhibit the arched bridge sign and/or the vacuole sign, compared with patients who have influenza pneumonia or bacterial pneumonia.
Implications for clinical practice or policy
  • The presence of the arched bridge sign and/or the vacuole sign on computed tomography may support a diagnosis of COVID-19 pneumonia and assist in differentiation from other types of pneumonia.
  • The duration of total hospitalisation did not differ between patients with COVID-19 pneumonia who had and did not have these two signs, suggesting that they do not indicate a better or worse prognosis if appropriate treatments are administered.
 
 
Introduction
A diagnosis of coronavirus disease 2019 (COVID-19) is made on the basis of epidemiological and clinical history, as well as the results of severe acute respiratory syndrome coronavirus 2 real-time reverse transcriptase polymerase chain reaction (RT-PCR) testing. Chest computed tomography (CT) has been proposed as a useful alternative investigation method for COVID-19 diagnosis or triage, particularly in healthcare settings with restricted access to RT-PCR testing and in the context of lower RT-PCR sensitivity during early stages of the disease; it may also be useful for imaging-mediated evaluation of disease severity and progression.1 2 The most common CT findings in early-stage COVID-19 pneumonia (illness days 0-5) are pure ground-glass opacities (GGOs); the second most common finding is consolidation.3 4 In the later stages (illness days 6-17), findings usually evolve to a combination of GGOs, consolidation, and reticular opacities with architectural distortion.4 These imaging features are not specific to COVID-19 pneumonia; they can overlap with other types of viral or bacterial pneumonia, particularly influenza pneumonia, as well as other non-infectious inflammatory lung diseases.5 6 Influenza, one of the most common causes of viral pneumonia,7 and bacterial pneumonia, historically the most common type of community-acquired pneumonia worldwide,8 maintained high incidences during the early COVID-19 pandemic when this study was conducted; thus, they had the potential to substantially contribute to hospitalisations in this period. However, COVID-19 pneumonia and other types of viral or bacterial pneumonia distinctly differ in terms of their disease course, temporal progression, and available therapeutics9 10 11; thus, there is a need for early and accurate differentiation among these entities.
 
Studies in 2020 revealed several CT imaging features that can aid in differential diagnosis. Compared with influenza pneumonia, patients with COVID-19 pneumonia are more likely to exhibit a peripheral distribution,12 13 14 patchy combination of GGOs and consolidation,15 fine reticular opacities,16 and vascular thickening or enlargement16 17; patients with influenza pneumonia are more likely to exhibit nodules,18 tree-in-bud sign,18 bronchial wall thickening,15 lymphadenopathy,16 and pleural effusions.12 In the past, diffuse airspace consolidation, centrilobular nodules, bronchial wall thickening, and mucous impaction19 have been identified as typical signs of bacterial pneumonia. Nevertheless, CT assessment of COVID-19 generally remains challenging, with reported accuracies for radiologists ranging from 60% to 83%16 in terms of differentiating patients with COVID-19 pneumonia from patients with influenza pneumonia; considering these rates, further studies of relevant imaging findings are needed.
 
A report by Wu et al20 highlighted the arched bridge sign, which may be a distinct CT feature of COVID-19 pneumonia. In their analysis of 11 patients with COVID-19 pneumonia, the sign was present in 72.7%.20 The arched bridge sign refers to a specific pattern of GGOs or consolidation, commonly in a subpleural location, which forms an arched contour with a smooth concave margin towards the pleural side. The arched margin outlines the spared parenchyma between the GGOs or consolidation and the pleural surface. Another reported sign, termed the vacuole sign,21 22 23 24 presumably reflects a morphological pattern of parenchymal sparing in areas of affected lung. The vacuole sign refers to a focal oval or round lucent area (typically <5 mm) that is observed within GGOs or sites of consolidation. In clinical practice, we often observed these two novel signs on CT scans of patients with COVID-19 pneumonia. We hypothesised that these two signs are common in patients with COVID-19 pneumonia and thus could be used to differentiate such pneumonia from other types of infection-related pneumonia. However, considering the limited prior evidence (solely from small retrospective studies20 21 22 23 24) regarding the prevalence of the vacuole sign in COVID-19 pneumonia, and because the arched bridge sign has—to our knowledge—only been reported in a single previous publication,20 additional assessments of these signs are needed. The diagnostic utility of the arched bridge and vacuole signs in COVID-19 pneumonia has not been directly assessed in prior reports, nor has it been compared between COVID-19 pneumonia and other types of infection-related pneumonia. In this study, we evaluated the arched bridge and vacuole signs in patients with COVID-19 pneumonia, then examined whether these signs could be used to differentiate such pneumonia from influenza pneumonia or bacterial pneumonia.
 
Methods
Patients
This retrospective study included consecutive patients who were admitted to two hospitals in Hong Kong (Prince of Wales Hospital and Princess Margaret Hospital) with RT-PCR–confirmed COVID-19, along with positive CT findings, from 24 January 2020 to 16 April 2020. These patients represent most patients with COVID-19 in Hong Kong during the study period, when all patients with confirmed COVID-19 were hospitalised regardless of clinical status; moreover, Princess Margaret Hospital also served as a centralised treatment centre for patients with COVID-19. The study recruitment period reflects the early days of the COVID-19 pandemic in Hong Kong, during which CT examinations were commonly performed during the diagnosis and treatment of patients with COVID-19. All patients with COVID-19 underwent complete PCR-based assessment of multiple respiratory pathogens on admission; patients with COVID-19 were excluded from the present study if they exhibited evidence of other concomitant viral or bacterial respiratory infections.
 
The influenza pneumonia and bacterial pneumonia comparison groups comprised consecutive patients who were admitted to Prince of Wales Hospital in Hong Kong, with pure influenza pneumonia or pure bacterial pneumonia and positive CT findings from 20 February 2018 to 13 January 2020. The diagnosis of pure influenza pneumonia was determined by RT-PCR–mediated detection of influenza A or B viral RNA, in the absence of evidence (eg, respiratory or blood cultures, PCR tests, or serological tests) suggesting concomitant infection with other viral or bacterial pathogens. The diagnosis of pure bacterial pneumonia was determined by positive bacterial culture on sputum or bronchoalveolar lavage, in the absence of evidence suggesting concomitant infection with other viral or bacterial pathogens. Patients with pre-existing lung parenchymal disease (eg, interstitial lung disease) or known lung malignancy were excluded from the study.
 
Image acquisition
Computed tomography scans were performed using 64-section multidetector scanners (LightSpeed VCT or LightSpeed Pro 32, GE Medical Systems, Milwaukee [WI], United States). The following scan parameters were used: voltage, 120 kV; tube current, 50-502 mA; and slice thickness, 0.625 mm or 1.25 mm. Scans were performed with the patient in the supine position during end-inspiration.
 
Image evaluation
All CT images were reviewed in random order by two trained radiologists (TY So and YX Wang) with 7 and 5 years of experience in diagnostic chest imaging, respectively, using a dedicated picture archiving and communication system workstation. Each radiologist was blinded to demographic and clinical information for all patients. The images were reviewed independently by each radiologist; discrepancies were resolved by discussion, and consensus findings are reported.
 
Each CT image was initially subjected to broad assessment of abnormalities. Subsequently, the arched bridge and vacuole signs were specifically assessed; the presence or absence of each sign was recorded. The arched bridge sign was defined as the presence of GGOs or consolidation with an arched concave margin outlining a region of spared lung; the vacuole sign was defined as the presence of a vacuole-like region of normal lung (<5 mm) within GGOs or sites of consolidation.21
 
For patients with COVID-19 pneumonia and patients with influenza pneumonia, CT findings of GGOs (hazy areas of parenchymal opacities that did not conceal underlying vessels), consolidation (parenchymal opacities that concealed underlying vessels), reticular opacities (coarse linear or curvilinear opacities, interlobular septal thickening, or subpleural reticulation), and crazy paving pattern (GGOs with interlobular and intralobular septal thickening) were recorded. Other signs such as air bronchograms (air-filled bronchi on a background of opaque lung), nodules (small rounded focal opacities <3 cm), cavitation (gas-filled spaces within sites of pulmonary consolidation), bronchiectasis, pleural retraction or thickening, pleural effusion, pericardial effusion, pneumothorax, and mediastinal lymphadenopathy (lymph nodes >1 cm in short-axis diameter) were also recorded. The distributions of pulmonary abnormalities were classified as unilateral or bilateral, and peripheral (involving mainly the peripheral one-third of the lung), central (involving mainly the central two-thirds of the lung), or diffuse (involving both peripheral and central regions). Lobar involvement was also recorded (right upper lobe, right middle lobe, right lower lobe, left upper lobe, and/or left lower lobe). For patients with bacterial pneumonia, only the arched bridge and vacuole signs were recorded; other CT changes and their distributions were not individually recorded. This approach was based on reports that COVID-19 pneumonia is easier to differentiate from bacterial pneumonia than from other types of viral pneumonia.6 25 26 This manuscript was written in accordance with the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines for reporting observational studies.
 
Statistical analysis
Imaging findings were compared using the chi-squared test or Fisher’s exact test, as appropriate, followed by Bonferroni correction. Comparisons of disease stage, severity, and clinical course among patients with COVID-19 who had the arched bridge and/or vacuole signs were performed using the non-parametric Mann-Whitney U test. P values <0.05 were considered indicative of statistical significance. For the arched bridge and vacuole signs, the sensitivity, specificity, positive predictive value, and negative predictive value were calculated, along with the respective 95% confidence intervals. All analyses were conducted using SPSS software (Windows version 25.0; IBM Corp, Armonk [NY], United States).
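The accuracy metrics listed above can be sketched as follows (the study itself used SPSS 25). The Wald interval used here is a simple stand-in, since the text does not specify which confidence-interval method was applied; the counts in the usage line are the arched bridge sign figures from the Results (42/66 patients with COVID-19 pneumonia; 4 + 4 = 8 of 121 patients with influenza or bacterial pneumonia).

```python
# Sketch of the diagnostic-accuracy metrics named above (the study used
# SPSS 25). The Wald interval is an assumption; the text does not state
# which confidence-interval method was applied.
import math

def prop_ci(k, n, z=1.96):
    """Proportion k/n with a Wald 95% confidence interval, clipped to [0, 1]."""
    p = k / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)

def diagnostic_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV, and NPV from a 2x2 diagnostic table."""
    return {
        "sensitivity": prop_ci(tp, tp + fn),
        "specificity": prop_ci(tn, tn + fp),
        "ppv": prop_ci(tp, tp + fp),
        "npv": prop_ci(tn, tn + fn),
    }

# Arched bridge sign: 42/66 COVID-19 vs 8/121 non-COVID-19 patients
m = diagnostic_metrics(tp=42, fn=24, fp=8, tn=113)
```

With these counts, sensitivity is 42/66 ≈ 63.6% and specificity is 113/121 ≈ 93.4%, consistent with the values reported in the Results.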
 
Results
Patients
Among 76 patients with bacterial pneumonia who were admitted for treatment during the study period, five patients with pre-existing lung parenchymal disease were excluded from the analysis: organising pneumonia (n=2), non-specific interstitial pneumonia (n=1), and idiopathic interstitial pneumonia of uncertain subtype (n=2). No patients with COVID-19 required exclusion because of concomitant viral or bacterial infections. The final study population comprised 187 patients: 66 patients with COVID-19 pneumonia, 50 patients with influenza pneumonia, and 71 patients with bacterial pneumonia. The following organisms were detected in patients with bacterial pneumonia: Streptococcus pneumoniae, Staphylococcus aureus, Haemophilus influenzae, Enterococcus spp., Klebsiella pneumoniae, Pseudomonas aeruginosa, Escherichia coli, Stenotrophomonas spp., Serratia spp., Acinetobacter spp., and Moraxella catarrhalis. Demographic and clinical characteristics of the study population are shown in Table 1. Compared with patients in the influenza pneumonia and bacterial pneumonia groups, patients with COVID-19 pneumonia tended to be younger and healthier.
 

Table 1. Comparison of demographic characteristics among patients with pneumonia in Hong Kong
 
Arched bridge and vacuole signs
The arched bridge and vacuole signs were present in 42 (63.6%) and 14 (21.2%) of 66 patients with COVID-19 pneumonia, respectively (Table 2). The arched bridge sign most commonly occurred in a subpleural location, and in all cases a smooth arched margin outlined the underside of the GGO or consolidation (Fig a and b). The vacuole sign was present within GGOs or sites of consolidation in various locations (Fig c and d). The arched bridge sign was much more common in patients with COVID-19 pneumonia than in patients with influenza pneumonia (63.6% vs 8.0%) or bacterial pneumonia (63.6% vs 5.6%, P<0.001). Similarly, the vacuole sign was much more common in patients with COVID-19 pneumonia than in patients with influenza pneumonia (21.2% vs 2.0%, P=0.005) or bacterial pneumonia (21.2% vs 1.4%, P<0.001).
 

Table 2. Comparison of patterns of lung sparing morphology among patients with pneumonia in Hong Kong
 

Figure. (a) The arched bridge sign. Axial computed tomography (CT) image (i) and magnified view of boxed area (ii) in a 56-year-old woman with coronavirus disease 2019 (COVID-19) pneumonia showing an arched ground-glass opacity (GGO) with a sharp underside outlining a semicircular region of spared lung. The typical subpleural location for the sign is evident. (b) Axial CT image (i) and magnified view of boxed area (ii) in a 35-year-old man with COVID-19 pneumonia showing a subpleural GGO with a sharp arched margin outlining two adjacent regions of spared lung, demonstrating a double arched bridge appearance. Other GGOs are also evident involving both central and peripheral lung parenchyma. (c) Vacuole sign. Axial CT image (i) and magnified view of boxed area (ii) in a 55-year-old woman with COVID-19 pneumonia showing a subpleural GGO with a few very small vacuole-like regions of sparing in the affected region. (d) Axial CT image (i) and magnified view of boxed area (ii) in a 56-year-old man with COVID-19 pneumonia showing a subpleural GGO with multiple very small vacuoles
 
The arched bridge and vacuole signs occurred together in 11 (16.7%) of 66 patients with COVID-19 pneumonia, but they did not occur together in any patients with influenza pneumonia or bacterial pneumonia. Additionally, a review of the five excluded patients with bacterial pneumonia and concurrent pre-existing lung parenchymal disease revealed that none of those patients exhibited the arched bridge sign or the vacuole sign.
 
In this study, the arched bridge and vacuole signs exhibited high specificities (93.4% and 98.4%, respectively) in terms of identifying COVID-19 pneumonia (Table 3), with moderate or low sensitivities (63.6% and 21.2%, respectively). They also exhibited high positive predictive values (84.0% and 87.5%, respectively) and high or moderate negative predictive values (82.5% and 69.6%, respectively).
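The indices above follow directly from a 2×2 contingency table. A minimal sketch, assuming counts reconstructed from the reported percentages for the arched bridge sign (42 of 66 patients with COVID-19 pneumonia positive; approximately 8 of the 121 patients with non–COVID-19 pneumonia positive), with normal-approximation 95% confidence intervals:

```python
import math

def diagnostic_performance(tp, fp, fn, tn, z=1.96):
    """Sensitivity, specificity, PPV, and NPV with normal-approximation 95% CIs."""
    def prop_ci(k, n):
        p = k / n
        half = z * math.sqrt(p * (1 - p) / n)
        return p, max(0.0, p - half), min(1.0, p + half)
    return {
        "sensitivity": prop_ci(tp, tp + fn),
        "specificity": prop_ci(tn, tn + fp),
        "ppv": prop_ci(tp, tp + fp),
        "npv": prop_ci(tn, tn + fn),
    }

# Counts for the non-COVID-19 groups are inferred from the reported
# percentages and may differ slightly from the study's exact data.
perf = diagnostic_performance(tp=42, fp=8, fn=24, tn=113)
for name, (p, lo, hi) in perf.items():
    print(f"{name}: {100 * p:.1f}% (95% CI {100 * lo:.1f}%-{100 * hi:.1f}%)")
```

The normal approximation is used here for brevity; the study does not state its interval method, and exact (Clopper-Pearson) intervals may differ slightly.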
 

Table 3. Diagnostic performances of the arched bridge and vacuole signs for coronavirus disease 2019 pneumonia
 
The relationships of the arched bridge and vacuole signs with disease course are shown in Table 4. Computed tomography was mainly performed early during hospitalisation, at a mean of 5.3 days after admission, suggesting that these two signs generally appeared at an early stage of disease. Comparisons of patients with COVID-19 pneumonia who had and did not have these two signs revealed that the arched bridge sign was associated with more extensive lung involvement (diseased lobes: 4.0 [present] vs 2.4 [absent], P<0.001). This trend was not evident for the vacuole sign (diseased lobes: 3.8 [present] vs 3.3 [absent]). There was no significant difference in the duration of total hospitalisation between patients with COVID-19 who had and did not have these two signs, suggesting that the signs were not associated with a better or worse prognosis if appropriate treatment was administered.
 

Table 4. Comparison of disease stage, severity, and clinical course among patients with coronavirus disease 2019 pneumonia according to arched bridge sign and vacuole sign statuses
 
Other computed tomography findings
Table 5 shows the comparison of other CT findings between COVID-19 pneumonia and influenza pneumonia. No significant differences were observed in the incidences of GGOs, consolidation, reticular opacities, or crazy paving between patients with COVID-19 and patients with influenza pneumonia (all P>0.05). Air bronchograms (P=0.003), nodules (P=0.009), cavitation (P=0.004), bronchiectasis (P<0.001), pleural effusion (P<0.001), pericardial effusion (P=0.032), and mediastinal lymphadenopathy (P<0.001) were significantly more common in patients with influenza pneumonia.
 

Table 5. Comparison of other computed tomography findings between patients with coronavirus disease 2019 pneumonia and patients with influenza pneumonia
 
Abnormalities were predominantly bilateral in both patients with COVID-19 pneumonia (77.3%) and patients with influenza pneumonia (96.0%). The distribution was more likely to be peripheral in patients with COVID-19 pneumonia (51.5% vs 2.0%, P<0.001), and was more likely to be diffuse in patients with influenza pneumonia (98% vs 48.5%, P<0.001). The right upper lobe (P<0.001), right middle lobe (P<0.001), and left upper lobe (P<0.001) were less commonly involved in patients with COVID-19 pneumonia than in patients with influenza pneumonia.
 
Discussion
Arched bridge and vacuole signs
This study evaluated the incidences and diagnostic values of the arched bridge and vacuole signs among patients with COVID-19 pneumonia in a Hong Kong Chinese population. Since the initial description by Wu et al20 in a series of 11 patients with COVID-19, our study is the first to validate the arched bridge sign in patients with COVID-19. To our knowledge, this is also the first study to evaluate the vacuole sign in non–COVID-19–related pneumonia. The arched bridge sign was significantly more common in COVID-19 pneumonia than in influenza pneumonia or bacterial pneumonia. Additionally, the incidences of the vacuole sign and of both signs in combination were higher (or tended to be higher) in patients with COVID-19 pneumonia than in patients with influenza pneumonia or bacterial pneumonia. Our results imply that these two signs generally appeared at an early stage; the arched bridge sign was more likely to be observed in patients with more severe lung pathology. These results suggest that the arched bridge and vacuole signs can be used in CT-based identification of COVID-19 pneumonia, as well as efforts to differentiate COVID-19 pneumonia from other types of infection-related pneumonia. Currently, chest CT is not recommended for the screening or diagnosis of COVID-19 pneumonia when RT-PCR tests are available. In selected cases, CT can be used to monitor clinical progress and identify complications of the disease. In some scenarios, such as healthcare settings with restricted access to RT-PCR tests, CT can be a useful alternative investigation method for COVID-19 diagnosis or triage.27 28 When these two signs are observed on CT examinations performed for COVID-19 pneumonia or other indications during the COVID-19 pandemic, physicians should carefully consider a diagnosis of COVID-19 pneumonia. However, our findings indicated no significant difference in the duration of total hospitalisation between patients with COVID-19 pneumonia who had and did not have these two signs, suggesting that the signs are not indicative of a better or worse prognosis if appropriate treatments are administered.
 
The underlying pathophysiological mechanisms behind these signs remain unclear. However, the morphological appearances of the arched bridge and vacuole signs may indicate different pathophysiological mechanisms of lung sparing that occur during infection-related pneumonia. Histopathological examinations of lung biopsy tissues from patients with COVID-19 pneumonia have provided evidence of variations in diffuse alveolar damage.29 30 The curved concave margin in the arched bridge sign may be the result of secondary pulmonary lobule sparing, with the interlobular septum of the secondary pulmonary nodule forming some resistance to the spread of infection among lobules.20 In contrast, the vacuole sign (ie, a very small focal lucent area) may reflect a spared alveolar cluster or dilated alveolar sac within an area of otherwise involved lung.21 23 Zhang et al23 reported that the vacuole sign was often present in patients with advanced COVID-19 pneumonia, where alveolar sac dilation could result from damage to the alveolar wall.
 
The incidence of the arched bridge sign in patients with COVID-19 pneumonia (63.6%) was similar to the incidence reported by Wu et al20 (72.7%). The incidence of the vacuole sign (21.2%) in patients with COVID-19 pneumonia was also within the range reported in prior studies describing this sign (17%-66%).21 22 23 24 Notably, three additional case series have revealed spared airspaces in patients with COVID-19 pneumonia, comprising ‘round cystic changes’,31 ‘cystic air spaces’,32 and ‘cavity signs’,33 with prevalences of 10% to 30%; these phenomena may include the vacuole sign. However, these case series did not include formal definitions of the findings. The differences in definitions of the vacuole sign (and phenomena that include the sign) may also explain the disparate prevalences (17%-66%, as noted above) reported in the literature.
 
The arched bridge and vacuole signs differentiated COVID-19 pneumonia from influenza pneumonia and bacterial pneumonia with high specificities and high positive predictive values, suggesting that these signs can help to provide a specific imaging diagnosis of COVID-19 pneumonia. When encountering inconclusive CT features of COVID-19 pneumonia, these signs can be identified with minimal additional effort; their presence may be sufficient to increase suspicion or add to the evidence confirming a diagnosis of COVID-19 pneumonia. The respective sensitivities of the arched bridge and vacuole signs were moderate (63.6%) and low (21.2%); the arched bridge sign may be more useful in this context. Our findings suggest that the combined presence of the arched bridge and vacuole signs strongly supports a diagnosis of COVID-19 pneumonia.
 
Consistent with previous studies, the presence of nodules, cavitations, bronchiectasis, pleural effusion, pericardial effusion, and/or mediastinal lymphadenopathy was uncommon in patients with COVID-19 pneumonia; these features were more common in patients with influenza pneumonia.12 16 17 18 34 35 Our results indicated that COVID-19-related abnormalities on CT were generally bilateral and peripheral, compatible with the findings in prior studies.12 13 14
 
Limitations
This study had several limitations. First, it used a retrospective design, and patients were imaged in a cross-sectional manner at various time intervals after symptom onset. Computed tomography was not regularly performed, which hindered the monitoring or analysis of imaging signs over time. Second, CT was not routinely performed for patients with influenza pneumonia or bacterial pneumonia; it was performed based on clinical judgement, generally because of patient deterioration or poor response to treatment. We did not assess differences in the clinical features of patients with influenza pneumonia and patients with bacterial pneumonia between those who did and did not undergo CT. Third, we attempted to broaden our analysis of COVID-19 pneumonia by comparisons with both influenza pneumonia and bacterial pneumonia, whereas prior studies have generally been limited to comparisons of COVID-19 pneumonia with influenza pneumonia. However, we did not examine other types of viral pneumonia; we also did not conduct subgroup analysis according to influenza subtype. Additionally, we did not systematically compare the prognoses of patients with non–COVID-19 pneumonia who had and did not have the arched bridge or vacuole signs. This comparison was hindered by the small number of such patients, because these two signs were very uncommon in patients with non–COVID-19 pneumonia. However, additional analysis did not reveal a clear pattern whereby these two signs would be predictive of clinical prognosis in patients with non–COVID-19 pneumonia. Finally, the sample size in this study was moderate. Although the prevalences of the arched bridge and vacuole signs in our patients with COVID-19 pneumonia were consistent with findings in the literature, their diagnostic specificities should be validated in other types of pneumonia. Despite these limitations, the high diagnostic specificities of these CT signs provide insights that will be useful in future studies. Additional work is needed regarding the relationships of these CT signs with clinical status, and our findings require validation in larger and more diverse patient populations.
 
Conclusion
In conclusion, two morphological patterns of lung sparing, namely the arched bridge and vacuole signs, are much more common in patients with COVID-19 pneumonia; they have the potential to differentiate COVID-19 pneumonia from influenza pneumonia and bacterial pneumonia. In this study, these signs had high specificities and positive predictive values for COVID-19 pneumonia. The identification of these signs in clinical practice may be useful for increasing suspicion or providing confirmatory evidence to support a diagnosis of COVID-19 pneumonia.
 
Author contributions
Concept and design: TY So, SCH Yu, JYX Wang.
Acquisition of data: All authors.
Analysis and interpretation of data: All authors.
Drafting of the manuscript: TY So, SCH Yu, JYX Wang.
Critical revision of the manuscript for important intellectual content: All authors.
 
All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
 
Conflicts of interest
The authors have no conflicts of interest to disclose.
 
Acknowledgement
The authors thank Ms Apurva Sawhney, Department of Imaging and Interventional Radiology, The Chinese University of Hong Kong, for assistance with data collection.
 
Funding/support
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
 
Ethics approval
This study was approved by the Joint Chinese University of Hong Kong–New Territories East Cluster Clinical Research Ethics Committee (REC Ref. No.: 2020.232), which waived the requirement for informed consent due to the retrospective nature of the study. The study was conducted in compliance with the established ethical standards and principles of the Declaration of Helsinki.
 
References
1. Kim H. Outbreak of novel coronavirus (COVID-19): what is the role of radiologists? Eur Radiol 2020;30:3266-7. Crossref
2. Yang W, Sirajuddin A, Zhang X, et al. The role of imaging in 2019 novel coronavirus pneumonia (COVID-19). Eur Radiol 2020;30:4874-82. Crossref
3. Pan F, Ye T, Sun P, et al. Time course of lung changes at chest CT during recovery from coronavirus disease 2019 (COVID-19). Radiology 2020;295:715-21. Crossref
4. Wang Y, Dong C, Hu Y, et al. Temporal changes of CT findings in 90 patients with COVID-19 pneumonia: a longitudinal study. Radiology. 2020;296:E55-64. Crossref
5. Koo HJ, Lim S, Choe J, Choi SH, Sung H, Do KH. Radiographic and CT features of viral pneumonia. Radiographics 2018;38:719-39. Crossref
6. Sun Z, Zhang N, Li Y, Xu X. A systematic review of chest imaging findings in COVID-19. Quant Imaging Med Surg 2020;10:1058-79. Crossref
7. Marcos MA, Esperatti M, Torres A. Viral pneumonia. Curr Opin Infect Dis 2009;2:143-7. Crossref
8. Apisarnthanarak A, Mundy LM. Etiology of community-acquired pneumonia. Clin Chest Med 2005;26:47-55. Crossref
9. Sanders JM, Monogue ML, Jodlowski TZ, Cutrell JB. Pharmacologic treatments for coronavirus disease 2019 (COVID-19): a review. JAMA 2020;323:1824-36. Crossref
10. Wiersinga WJ, Rhodes A, Cheng AC, Peacock SJ, Prescott HC. Pathophysiology, transmission, diagnosis, and treatment of coronavirus disease 2019 (COVID-19): a review. JAMA 2020;324:782-93. Crossref
11. Zayet S, Kadiane-Oussou NJ, Lepiller Q, et al. Clinical features of COVID-19 and influenza: a comparative study on Nord Franche-Comte cluster. Microbes Infect 2020;22:481-8. Crossref
12. Lin L, Fu G, Chen S, et al. CT manifestations of coronavirus disease (COVID-19) pneumonia and influenza virus pneumonia: a comparative study. AJR Am J Roentgenol 2021;216:71-9. Crossref
13. Chung M, Bernheim A, Mei X, et al. CT imaging features of 2019 novel coronavirus (2019-nCoV). Radiology 2020;295:202-7. Crossref
14. Song F, Shi N, Shan F, et al. Emerging 2019 novel coronavirus (2019-nCoV) pneumonia. Radiology 2020;295:210-7. Crossref
15. Wang H, Wei R, Rao G, Zhu J, Song B. Characteristic CT findings distinguishing 2019 novel coronavirus disease (COVID-19) from influenza pneumonia. Eur Radiol 2020;30:4910-7. Crossref
16. Bai HX, Hsieh B, Xiong Z, et al. Performance of radiologists in differentiating COVID-19 from non–COVID-19 viral pneumonia on chest CT. Radiology 2020;296:E45-54. Crossref
17. Yin Z, Kang Z, Yang D, Ding S, Luo H, Xiao E. A comparison of clinical and chest CT findings in patients with influenza A (H1N1) virus infection and coronavirus disease (COVID-19). AJR Am J Roentgenol 2020;215:1065-71. Crossref
18. Liu M, Zeng W, Wen Y, Zheng Y, Lv F, Xiao K. COVID-19 pneumonia: CT findings of 122 patients and differentiation from influenza pneumonia. Eur Radiol 2020;30:5463-9. Crossref
19. Tanaka N, Matsumoto T, Kuramitsu T, et al. High resolution CT findings in community-acquired pneumonia. J Comput Assist Tomogr 1996;20:600-8. Crossref
20. Wu R, Guan W, Gao Z, et al. The arch bridge sign: a newly described CT feature of the coronavirus disease- 19 (COVID-19) pneumonia. Quant Imaging Med Surg 2020;10:1551-8. Crossref
21. Zhou S, Wang Y, Zhu T, Xia L. CT features of coronavirus disease 2019 (COVID-19) pneumonia in 62 patients in Wuhan, China. AJR Am J Roentgenol 2020;214:1287-94. Crossref
22. Sabri YY, Nassef AA, Ibrahim IM, Abd El Mageed MR, Khairy MA. CT chest for COVID-19, a multicenter study— experience with 220 Egyptian patients. Egypt J Radiol Nucl Med 2020;51:144. Crossref
23. Zhang L, Kong X, Li X, et al. CT imaging features of 34 patients infected with COVID-19. Clin Imaging 2020;68:226-31. Crossref
24. Zhou S, Zhu T, Wang Y, Xia L. Imaging features and evolution on CT in 100 COVID-19 pneumonia patients in Wuhan, China. Eur Radiol 2020;30:5446-54. Crossref
25. Elmokadem AH, Bayoumi D, Abo-Hedibah SA, El-Morsy A. Diagnostic performance of chest CT in differentiating COVID-19 from other causes of ground-glass opacities. Egypt J Radiol Nucl Med 2021;52:12. Crossref
26. Zheng F, Li L, Zhang X, et al. Accurately discriminating COVID-19 from viral and bacterial pneumonia according to CT images via deep learning. Interdiscip Sci 2021;13:273-85. Crossref
27. Simpson S, Kay FU, Abbara S, et al. Radiological Society of North America Expert consensus statement on reporting chest CT findings related to COVID-19. Endorsed by the Society of Thoracic Radiology, the American College of Radiology, and RSNA - Secondary Publication. J Thorac Imaging 2020;35:219-27. Crossref
28. Dennie C, Hague C, Lim RS, et al. Canadian Society of Thoracic Radiology/Canadian Association of Radiologists consensus statement regarding chest imaging in suspected and confirmed COVID-19. Can Assoc Radiol J 2020;71:470-81. Crossref
29. Bradley BT, Maioli H, Johnston R, et al. Histopathology and ultrastructural findings of fatal COVID-19 infections in Washington State: a case series. Lancet 2020;396:320-32. Crossref
30. Zhang H, Zhou P, Wei Y, et al. Histopathologic changes and SARS-CoV-2 immunostaining in the lung of a patient with COVID-19. Ann Intern Med 2020;172:629-32. Crossref
31. Shi H, Han X, Jiang N, et al. Radiological findings from 81 patients with COVID-19 pneumonia in Wuhan, China: a descriptive study. Lancet Infect Dis 2020;20:425-34. Crossref
32. Rodrigues RS, Barreto MM, Werberich GM, Marchiori E. Cystic airspaces associated with COVID-19 pneumonia. Lung India 2020;37:551-3. Crossref
33. Kong W, Agarwal PP. Chest imaging appearance of COVID-19 infection. Radiol Cardiothorac Imaging 2020;2:e200028. Crossref
34. Salehi S, Abedi A, Balakrishnan S, Gholamrezanezhad A. Coronavirus disease 2019 (COVID-19): a systematic review of imaging findings in 919 patients. AJR Am J Roentgenol 2020;215:87-93. Crossref
35. Xu Z, Pan A, Zhou H. Rare CT feature in a COVID-19 patient: cavitation. Diagn Interv Radiol 2020;26:380-1. Crossref

Fracture incidence and fracture-related mortality decreased with decreases in population mobility during the early days of the COVID-19 pandemic: an epidemiological study

© Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
 
ORIGINAL ARTICLE
Fracture incidence and fracture-related mortality decreased with decreases in population mobility during the early days of the COVID-19 pandemic: an epidemiological study
Janus SH Wong, MB, BS, MRCSEd1; Christian X Fang, FRCSEd, FHKCOS1; Alfred LH Lee, MB, BS, MRCP2; Dennis KH Yee, FRCSEd, FHKCOS3; Kenneth MC Cheung, FRCS (Eng), FHKCOS1; Frankie KL Leung, FRCSEd, FHKCOS1
1 Department of Orthopaedics and Traumatology, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong
2 Department of Microbiology, Prince of Wales Hospital, Hong Kong
3 Department of Orthopaedics and Traumatology, Alice Ho Miu Ling Nethersole Hospital, Hong Kong
 
Corresponding author: Dr Christian X Fang (cfang@hku.hk)
 
 
Abstract
Introduction: We investigated the impact of coronavirus disease 2019 (COVID-19) social distancing measures on fracture incidence and fracture-related mortality, as well as associations with population mobility.
 
Methods: In total, 47 186 fractures were analysed across 43 public hospitals from 22 November 2016 to 26 March 2020. Considering the smartphone penetration of 91.5% in the study population, population mobility was quantified using Apple Inc’s Mobility Trends Report, an index of internet location services usage volume. Fracture incidences were compared between the first 62 days of social distancing measures and corresponding preceding epochs. Primary outcomes were associations between fracture incidence and population mobility, quantified by incidence rate ratios (IRRs). Secondary outcomes included fracture-related mortality rate (death within 30 days of fracture) and associations between emergency orthopaedic healthcare demand and population mobility.
 
Results: Overall, 1748 fewer fractures than projected were observed during the first 62 days of COVID-19 social distancing (fracture incidence: 321.9 vs 459.1 per 100 000 person-years, P<0.001); the relative risk was 0.690, compared with mean incidences during the same period in the previous 3 years. Population mobility exhibited significant associations with fracture incidence (IRR=1.0055, P<0.001), fracture-related emergency department attendances (IRR=1.0076, P<0.001), hospital admissions (IRR=1.0054, P<0.001), and subsequent surgery (IRR=1.0041, P<0.001). Fracture-related mortality decreased from 4.70 (in prior years) to 3.22 deaths per 100 000 person-years during the COVID-19 social distancing period (P<0.001).
 
Conclusion: Fracture incidence and fracture-related mortality decreased during the early days of the COVID-19 pandemic; they demonstrated significant temporal associations with daily population mobility, presumably as a collateral effect of social distancing measures.
 
 
New knowledge added by this study
  • A significant reduction in fracture incidence was observed during the early days of the coronavirus disease 2019 pandemic.
  • Daily fracture incidence was temporally associated with population mobility.
Implications for clinical practice or policy
  • Data regarding population mobility could facilitate estimation of fracture incidence and be used (along with many other factors) to estimate clinical service demand for timely management of public health responses involving changes in population mobility.
  • As digital literacy increases, population digital usage patterns could support epidemiological investigations and address gaps in conventional data sources.
 
 
Introduction
The coronavirus disease 2019 (COVID-19) pandemic, which began in early 2020, has resulted in unprecedented large-scale public health responses. Stringent regional social distancing measures (eg, quarantine, school closures, and restrictions at work and recreation destinations) were rapidly implemented during the early days of the pandemic as forms of non-pharmacological intervention.1 Although there is evidence that such measures can temporarily contain the spread of severe acute respiratory syndrome coronavirus 2,2 collateral effects among non–COVID-19–related conditions have also been reported.3 Trauma is the leading cause of death and disability among young adults worldwide,4 but the effects of the COVID-19 pandemic on injuries and fracture incidence within Hong Kong have not been fully elucidated. This uncertainty has hindered healthcare resource deployment and clinical service demand estimation in times of stringency. We sought to address this problem using ‘big data’ sources and regional clinical data repositories, which allow researchers to rapidly delineate epidemiological associations with potential applications in forecasting models, while avoiding resource-intensive collection of conventional epidemiological information and protecting patient anonymity.
 
We presumed that restrictions on citizen mobility, in concert with social distancing, were associated with reductions in musculoskeletal injuries during the early days of the COVID-19 pandemic. Specifically, we hypothesised that reduced population mobility was associated with reductions in fracture incidence and fracture-related healthcare needs during the early days of the pandemic. We investigated these relationships by analysing daily multicentre hospital data registries in Hong Kong, along with digital population mobility datasets published by a technology company. Our main outcome measurement was skeletal fractures, which served as a specific surrogate for musculoskeletal trauma.
 
Methods
Data collection
This study was conducted in Hong Kong, a high-income region (with gross domestic product per capita of HK$357 667 in 20205) that was among the first areas affected by COVID-19; social distancing measures were implemented during the early days of the pandemic.
 
Using the Clinical Data Analysis and Reporting System of the Hospital Authority, anonymised patient records were retrieved from all 43 public hospitals in Hong Kong for the period from 22 November 2016 to 20 May 2020. In Hong Kong, up to 90% of hospital bed-days occur in public hospitals, which manage nearly all critical emergencies in the region.6 Anonymised clinical data were retrieved, including time of initial injury presentation, emergency department triage, trauma category, hospital admission, diagnosis, and surgical procedures. Diagnoses and procedures were encoded in accordance with the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) by treating physicians at the date of hospital discharge, based on clinical and radiological investigations and intraoperative findings. The ICD-9-CM codes that met the inclusion criteria (ie, fractures under the purview of, or commonly admitted under the care of, the orthopaedic and traumatology service) were all codes from 805 to 829 (inclusive). Duplicate records arising from fracture reassessment at follow-up attendances, hospitalisation after emergency department attendance, and elective hospital re-admissions (ie, episodes assigned to the same patient unique identifier with identical diagnostic codes, which occurred within 30 days of the index episode) were regarded as a single event to avoid double counting. Pathological fractures and records with missing diagnosis codes or admission times were excluded from the analysis.
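The 30-day deduplication rule can be sketched as follows; the record layout and field names are hypothetical, not the study's actual schema:

```python
from datetime import datetime, timedelta

def deduplicate(records, window_days=30):
    """Collapse episodes with the same patient identifier and diagnosis code
    occurring within `window_days` of the index episode into a single event.
    Each record is a (patient_id, icd9_code, timestamp) tuple."""
    index_times = {}  # (patient_id, code) -> timestamp of current index episode
    kept = []
    for pid, code, ts in sorted(records, key=lambda r: r[2]):
        key = (pid, code)
        if key in index_times and ts - index_times[key] <= timedelta(days=window_days):
            continue  # follow-up attendance or re-admission of the same fracture
        index_times[key] = ts  # start a new index episode
        kept.append((pid, code, ts))
    return kept

records = [
    ("A1", "813", datetime(2020, 1, 5)),   # index fracture episode
    ("A1", "813", datetime(2020, 1, 20)),  # re-admission within 30 days: dropped
    ("A1", "813", datetime(2020, 3, 1)),   # >30 days after index: new event
    ("B2", "820", datetime(2020, 1, 6)),
]
print(len(deduplicate(records)))  # 3 events remain
```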
 
Time intervals
The ‘COVID-19 epoch’ was defined as the period from 25 January 2020 (activation of the government’s ‘emergency’ response and commencement of social distancing policies7) to 26 March 2020; this arbitrarily chosen 62-day period included all patients with fractures who presented during that period. This epoch was compared with the 9 weeks preceding the onset of the COVID-19 pandemic (ie, 22 November 2019 to 24 January 2020), as well as the same period in each of the previous 3 years to adjust for seasonality-related variations8 (ie, 25 January to 26 March in 2017, 2018, and 2019). Differences between actual and projected daily fracture incidences were calculated based on mean values at the same time of year over the previous 3 years. Fracture-related mortality rates, defined as the numbers of deaths within 30 days after initial fracture presentation per 100 000 person-years, were compared. The Chi squared test was used to detect differences in fracture incidence and fracture-related mortality between the COVID-19 pandemic and pre-pandemic epochs.
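The projection step can be sketched as follows; the daily counts are hypothetical, for illustration only:

```python
def projected_daily_counts(prior_years):
    """Project each day's fracture count as the mean of the same calendar
    day across prior years (seasonality adjustment, as in the study)."""
    return [sum(day) / len(day) for day in zip(*prior_years)]

# Hypothetical counts for three consecutive calendar days in each prior year.
prior = [
    [21, 19, 23],  # 2017
    [20, 22, 21],  # 2018
    [19, 19, 22],  # 2019
]
projected = projected_daily_counts(prior)
observed = [14, 15, 16]  # hypothetical 2020 counts for the same days
deficit = sum(p - o for p, o in zip(projected, observed))
print(projected, round(deficit, 1))  # [20.0, 20.0, 22.0] 17.0
```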
 
Quantifying population mobility
Surrogate data concerning population mobility were retrieved from Mobility Trends Reports9—an aggregate daily measure of geographical direction requests on Apple Maps, a service established by Apple Inc, which holds the largest market share of electronic mobile devices (including smartphones and tablets) in Hong Kong.10 The walking index was used as a surrogate measure of population mobility, considering the smartphone penetration rate of 91.5%11 among the 7.50 million residents of Hong Kong.12
 
Data analysis
Associations between daily fracture incidence and population mobility were determined by incidence rate ratios (IRRs) using quasi-Poisson regression. Secondary analysis involved associations between mobility index and fracture repair surgeries, all types of orthopaedic emergency department attendances, orthopaedic hospital admissions, and emergency orthopaedic surgeries.
 
Because medical records were timestamped in Hong Kong time (8 hours ahead of Greenwich Mean Time), they were converted to Pacific Time to match the time intervals listed in Mobility Trends Reports; this conversion ensured that data were temporally matched for analysis.
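This conversion can be illustrated with Python's standard zoneinfo module (the timestamp is hypothetical):

```python
from datetime import datetime
from zoneinfo import ZoneInfo  # standard library in Python 3.9+

# A record timestamped 12:00 on 1 February 2020 in Hong Kong (UTC+8)
# falls on 31 January in Pacific Time (UTC-8 in winter), so day-level
# alignment with the Mobility Trends Reports needs an explicit conversion.
hk_time = datetime(2020, 2, 1, 12, 0, tzinfo=ZoneInfo("Asia/Hong_Kong"))
pacific_time = hk_time.astimezone(ZoneInfo("America/Los_Angeles"))
print(pacific_time.isoformat())  # 2020-01-31T20:00:00-08:00
```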
 
To determine whether the mobility associations simply reflected changes in health-seeking behaviour, we included analyses of diseases with no physiological basis for an association with population mobility—these ‘controls’ were appendicitis, cellulitis, and abscess (ICD-9-CM diagnosis codes 540 and 682). Statistical analysis was performed using R software, version 3.6.2 (R Foundation for Statistical Computing, Vienna, Austria). Quasi-Poisson regression was used to model the relationship between the population mobility index and the daily incidences of fractures and fracture-related events; the population mobility index was the explanatory variable, whereas the daily incidences of various events were response variables. A quasi-Poisson distribution was preferred over a Poisson distribution, considering the presence of significant overdispersion among some response variables (in the form of count data) when a dispersion parameter was included. In accordance with standard statistical methods, the natural logarithm was utilised as the link function. Incidence rate ratios were reported and represented by the following formula:
 
Estimated incidence = IRR^PMI × BIR
 
where IRR represents the incidence rate ratio, PMI represents the population mobility index, and BIR represents the baseline incidence rate. The IRR, which quantifies the relationship between the mobility index and fracture incidence, is multiplicative in nature: for every unit increase in the mobility index, the estimated incidence is multiplied by the IRR. If the IRR is >1, the estimated incidence increases multiplicatively as the mobility index increases; if the IRR is <1, it decreases. Multiple comparisons were adjusted by Bonferroni correction, and the threshold for statistical significance was regarded as P<0.00227 (0.05/22).
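As a numerical illustration of the formula, using the IRR of 1.0055 reported for daily fracture incidence in this study (the baseline rate and mobility-index values below are hypothetical):

```python
def estimated_incidence(bir, irr, pmi):
    """Estimated incidence = IRR**PMI * BIR, multiplicative in the mobility index."""
    return (irr ** pmi) * bir

# IRR = 1.0055 is the fracture-incidence estimate reported in this study;
# the baseline rate (bir) and mobility-index values are hypothetical.
irr = 1.0055
high = estimated_incidence(bir=1.0, irr=irr, pmi=100)
low = estimated_incidence(bir=1.0, irr=irr, pmi=60)
# A 40-point fall in the mobility index scales expected incidence by
# 1.0055 ** -40, ie roughly a 20% reduction.
print(round(low / high, 3))  # 0.803
```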
 
Results
In total, 59 931 fracture-related medical records from orthopaedic emergency department attendances, hospital admissions, and surgeries were reviewed. After exclusion of 11 498 linked episodes, 284 pathological fractures, 786 follow-up attendances, 175 hospital re-admissions, and two episodes with missing admission times, 47 186 fractures were included in the analysis. Descriptive statistics regarding daily fracture incidences, controls, and fracture-related surgeries during COVID-19 social distancing are shown in Table 1. Intra-year and inter-year comparison cohorts are presented in Table 2.
 

Table 1. Incidences of fractures and surgeries during the early days of the coronavirus disease 2019 social distancing (25 January to 26 March 2020)
 

Table 2. Incidences of fractures before and during the early days of the coronavirus disease 2019 pandemic
 
Fracture incidence during COVID-19 social distancing
A reduction of 1748 fractures in the actual versus projected incidence (321.9 vs 459.1 per 100 000 person-years, P<0.001) was observed during the COVID-19 epoch; the relative risk was 0.690 (95% confidence interval [CI]=0.678-0.702), compared with mean incidences in the previous 3 years (ie, inter-year cohort) [Table 2]. Differences in fracture incidence between the pandemic and pre-pandemic epochs are shown in Figure 1.
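As a rough arithmetic check, the reported rate difference can be converted back into a fracture count. The population figure below is an assumption (approximately 7.5 million, in line with Hong Kong census estimates), not a number stated in this paper:

```python
# Back-of-the-envelope check of the reported fracture deficit, assuming a
# Hong Kong population of roughly 7.5 million (an approximation).

projected_rate = 459.1   # fractures per 100 000 person-years (projected)
actual_rate = 321.9      # fractures per 100 000 person-years (observed)
population = 7_500_000   # assumed population size
days = 62                # 25 January to 26 March 2020

person_years = population * days / 365
deficit = (projected_rate - actual_rate) / 100_000 * person_years
print(round(deficit))    # -> 1748
```

Under this assumed population, the rate difference reproduces the reported reduction of 1748 fractures over the 62-day epoch.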
 

Figure 1. Daily fracture incidences (triangles) before (22 November 2019 to 24 January 2020) and during (25 January to 26 March 2020) the early days of coronavirus disease 2019 social distancing, with comparison to the same period in the previous 3 years (dots in different shades of grey). There were 1748 fewer fractures than projected
 
Fracture incidences, population mobilities, and controls are depicted in Figure 2. The first two COVID-19 cases in Hong Kong were reported on 23 January 202013; three additional cases were reported on 24 January 2020. Social distancing measures were implemented on 25 January 2020; these included suspension of schools, initiation of ‘work from home’ measures among civil servants, and suspension of hospital visitations. Mandatory border quarantine was enforced on 8 February 2020. The sharpest decrease in mobility was observed on 24 January 2020; population mobility subsequently remained at low levels, in conjunction with cancellations of large-scale social and sporting events, as well as the imposition of travel restrictions with quarantine measures for returning travellers.7
 

Figure 2. Fracture incidences, population mobilities (based on mobility index data from Apple Inc’s Mobility Trends Reports), and controls over time in the study population. The largest decrease in population mobility coincided with the first confirmed case of coronavirus disease 2019 in Hong Kong
 
Associations of fracture incidence with population mobility
Fracture incidence was positively associated with the population mobility index (IRR=1.0055, 95% CI=1.0044-1.0066, P<0.001). Analyses of fracture incidence according to anatomical location revealed associations of the population mobility index with upper limb fractures (IRR=1.0073, 95% CI=1.0057-1.0088, P<0.001) and lower limb fractures (IRR=1.0045, 95% CI=1.0030-1.0060, P<0.001) [Fig 3].
 

Figure 3. Associations of fracture incidence with population mobility. Fracture incidence was associated with mobility index according to quasi-Poisson regression, with incidence rate ratios of 1.0055 (95% confidence interval [CI]=1.0044-1.0066) for all fractures, 1.0073 (95% CI=1.0057-1.0088) for upper limb fractures, and 1.0045 (95% CI=1.0030-1.0060) for lower limb fractures (all P<0.001)
 
The population mobility index was associated with the incidences of fractures involving the radius and ulna (IRR=1.0079, 95% CI=1.0057-1.0101, P<0.001), hand and fingers (IRR=1.0069, 95% CI=1.0039-1.0098, P<0.001), femoral neck (IRR=1.0065, 95% CI=1.0035-1.0095, P<0.001), and tibia and fibula (IRR=1.0097, 95% CI=1.0044-1.0151, P<0.001) [Fig 4]. However, after Bonferroni correction, the population mobility index did not exhibit statistically significant associations with trochanteric hip fractures (IRR=1.0008, P=0.683), spine fractures (IRR=0.996, P=0.183), or pelvic fractures (IRR=1.0064, P=0.00799).
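The Bonferroni reasoning in this paragraph can be sketched as follows; the 0.0001 values are hypothetical stand-ins for results reported only as P<0.001:

```python
# Applying the study's Bonferroni threshold (0.05/22) to the site-specific
# P values reported above. Values of 0.0001 are stand-ins for "P<0.001".

p_values = {
    "radius and ulna": 0.0001,
    "hand and fingers": 0.0001,
    "femoral neck": 0.0001,
    "tibia and fibula": 0.0001,
    "trochanteric hip": 0.683,
    "spine": 0.183,
    "pelvis": 0.00799,
}
threshold = 0.05 / 22
significant = {site for site, p in p_values.items() if p < threshold}

# Pelvic fractures (P=0.00799) fall below the conventional 0.05 cut-off but
# not below the corrected threshold, so they are not declared significant.
assert "pelvis" not in significant
assert "tibia and fibula" in significant
```

This is why the pelvic result, nominally significant at the 0.05 level, is reported as non-significant after correction.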
 

Figure 4. Incidence rate ratios indicating relationships between fracture incidence and population mobility index. Incidence rate ratios of fractures are grouped according to anatomical locations with 95% confidence intervals indicated on each bar. Bars in dark grey and asterisks in y-axis labels indicate statistically significant associations (P<0.00227). Note ‘control groups’ of diseases in grey, which were included to investigate possibility of confounding between mobility index and disease incidence by alterations in health-seeking behaviour; no statistically significant associations were present in these groups
 
Stronger associations were observed for fractures that typically occur in younger patients (eg, tibia, fibula, hand, and finger fractures), as well as femoral neck fractures, which typically occur in older patients. Digital literacy, manual dexterity, visual acuity, and higher internet and smartphone usage among younger residents11 may increase the sensitivity of the population mobility index as a surrogate of mobility in the corresponding age-groups.
 
The incidences of cellulitis, abscesses, and appendicitis were not associated with the population mobility index (P>0.00227). These findings support the hypothesis that the association between fracture incidence and population mobility was not driven solely by changes in health-seeking behaviour; if it had been, corresponding reductions in these control conditions would also have been observed.
 
Secondary exploratory analysis of surgeries, emergency department attendances, and hospital admissions
The daily population mobility index was associated with the number of patients admitted on a particular day who subsequently underwent fracture repair surgeries (IRR=1.0041, 95% CI=1.0020-1.0062, P<0.001). The population mobility index was also associated with all types of emergency orthopaedic surgeries (IRR=1.0040, 95% CI=1.0021-1.0058, P<0.001), attendances at orthopaedic emergency departments (IRR=1.0076, 95% CI=1.0064-1.0087, P<0.001), and emergency orthopaedic hospital admissions (IRR=1.0054, 95% CI=1.0043-1.0064, P<0.001). Additionally, the numbers of orthopaedic patients triaged as critical, emergent, and urgent (ie, patients who require physician attention within 30 minutes of attendance) were associated with the population mobility index (IRR=1.0063, 95% CI=1.0054-1.0073, P<0.001). Whereas the numbers of traffic-related and sports-related trauma cases were associated with the population mobility index (IRR=1.008, 95% CI=1.0063-1.0097 and IRR=1.013, 95% CI=1.0092-1.0158, respectively, both P<0.001), the number of assault-related trauma cases was not (P=0.238).
 
Fracture-related mortality rate
Forty-nine patients with fractures died within 30 days of presentation during the COVID-19 epoch. This constituted a mortality rate of 3.22 deaths per 100 000 person-years, which was lower than the rate of 4.70 deaths per 100 000 person-years during the period before the pandemic (P<0.001); thus, there were around 19 fewer fracture-related deaths in the Hong Kong population during the 62-day study period. Four patients with fractures had COVID-19 (ie, they had positive results in nasopharyngeal swab reverse transcriptase-polymerase chain reaction tests for severe acute respiratory syndrome coronavirus 2) and survived beyond 30 days after initial fracture presentation. The change in mortality was presumably explained by reduced fracture incidence: 30-day mortality among patients with fractures did not significantly differ between the COVID-19 epoch (1.2%, 49 deaths in 4101 patients) and the preceding period (1.0%, 175 deaths in 17 198 patients) [P=0.305].
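The 30-day mortality proportions quoted above can be reproduced directly from the reported counts (a minimal Python sketch):

```python
# Checking the 30-day mortality proportions quoted above, using the
# patient and death counts reported in the text.

deaths_covid, patients_covid = 49, 4101    # COVID-19 epoch
deaths_pre, patients_pre = 175, 17198      # preceding period

mortality_covid = deaths_covid / patients_covid * 100
mortality_pre = deaths_pre / patients_pre * 100
print(round(mortality_covid, 1))  # -> 1.2
print(round(mortality_pre, 1))    # -> 1.0
```

The similar case-level proportions (1.2% vs 1.0%) support the interpretation that the fall in fracture-related deaths reflected fewer fractures rather than lower per-fracture mortality.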
 
Discussion
This study analysed 47 186 fractures in Hong Kong, prior to and during the early days of the COVID-19 pandemic. Population mobility was assessed through aggregate digital footprints, using the volume of location service requests as a surrogate marker, considering the high smartphone and internet penetration in Hong Kong11; importantly, datasets of aggregate digital footprints have been published to facilitate efforts to control COVID-19.9 The findings support our hypothesised relationship between fracture incidence and population mobility.
 
Fractures incur substantial healthcare costs; for example, fragility fractures incurred costs of 37.5 billion euros, along with the loss of 1.0 million quality-adjusted life years, among the six largest European countries in 2017.14 Some fractures (eg, hip fractures) warrant early surgical management to mitigate the morbidity and mortality associated with surgical delays.15 Guidance regarding early surgical management remained in effect, even during the early days of the COVID-19 pandemic.16 Despite the best available tools, fracture prediction remains difficult; there are additional challenges associated with epidemiological projections of the specific time points when such fractures occur. Accordingly, hospitals and public health entities experience difficulties in estimating emergency trauma service load and allocating limited healthcare resources. Our findings suggest that population mobility indices, which are freely and publicly accessible, can provide insights regarding fracture incidence. Population mobility may be useful in quantitative modelling of fracture-related inpatient and surgical theatre service demand, using the IRRs described in this study.
 
Although there is evidence to support the efficacy of social distancing measures with respect to COVID-19 transmission,2 our findings emphasise the collateral impacts of pandemic-related interventions on non-communicable diseases. We found that fracture incidence decreased when population mobility was hindered by social distancing measures; the relative reduction in overall fractures appeared to be similar to the effect of established pharmacological interventions on fragility fractures.17 Although this relationship appears to contradict the common notion that physical activity confers a protective effect against fractures in both young and old age-groups, 18 19 associations of increased fracture risk with specific types of exercises (eg, bicycling), or regular participation in other exercise and sports activities, have been described.20 Thus, long-term benefits (eg, increased bone mineral density) may be accrued at the expense of increased exposure to fracture risk when engaging in physical activity. Although the long-term impact of reduced population mobility on fracture incidence remains unclear, vitamin D deficiency caused by prolonged time indoors (ie, without sunlight exposure) is an established risk factor for future fractures.21
 
The strengths of our study include its inclusion of data from all public hospitals in Hong Kong, which allowed extensive analysis of rare events such as fractures. Our database has a high (>96%) positive predictive value for fractures,22 presumably because data entry is conducted by impartial registered medical practitioners. Furthermore, high internet and smartphone penetration increased the sensitivity of the population mobility analysis, such that the mobility index was geographically specific to the study population. Pedestrian and road traffic densities, which are indirectly represented by the population mobility index, could also precipitate accidents, falls, and subsequent fracture risk. Additionally, potential confounding based on health-seeking behaviour was partially mitigated by the inclusion of ‘control’ groups. Fortunately, all hospitals involved in the study maintained full emergency service during the early days of the COVID-19 pandemic23; this maintenance of emergency service minimised potential confounding by hospitals that were unable to provide service to patients with fractures.
 
Limitations of the study involved deficiencies in the population mobility index. For example, travel between familiar places and travel where navigation guidance is unnecessary, as well as the usage of alternative electronic service providers, were not considered. Therefore, the population mobility index served as a more specific (rather than sensitive) tool for assessment of population mobility. Global positioning system (GPS)–based mobility tracking would theoretically allow more extensive data collection, thus providing greater detection sensitivity; however, such mobility tracking would cause substantial privacy issues, resulting in legal and ethical challenges.
 
Notably, older adults are less adept in smartphone usage (62.2% of residents aged ≥65 years reported internet usage in 202011), and the digital population mobility index does not adequately illustrate this division in the population. Furthermore, fractures in older adults are largely caused by osteoporosis, whereas high-energy injury mechanisms are observed in younger individuals.24 Therefore, social distancing may have a negligible effect on the incidences of osteoporotic fractures sustained indoors. We caution against using population mobility data as the sole source of estimates for health service planning because that approach could underestimate fragility fracture service demand.
 
Additionally, the fracture incidence data were drawn from a public healthcare database that captures only approximately 90% of the population’s health service demand. During the early days of the COVID-19 pandemic, instances of diversion to the private sector, attendances in private clinics, and visits to alternative practitioners were not coded; the lack of these data may have led to underestimation of total fracture incidence. Finally, we caution against generalising these findings to regions with less internet and smartphone penetration.
 
Conclusion
During the early days of the COVID-19 pandemic, fracture incidence and fracture-related mortality considerably decreased with the implementation of government social distancing measures that targeted population mobility. This unique opportunity enabled the identification of collateral associations and revealed that population mobility could be used (along with many other factors) to estimate clinical service demand.
 
Author contributions
Concept or design: JSH Wong, DKH Yee.
Acquisition of data: JSH Wong.
Analysis or interpretation of data: JSH Wong, ALH Lee, DKH Yee, CX Fang.
Drafting of the manuscript: JSH Wong, ALH Lee, CX Fang.
Critical revision of the manuscript for important intellectual content: CX Fang, DKH Yee, FKL Leung, KMC Cheung.
 
All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
 
Conflicts of interest
All authors have disclosed no conflicts of interest.
 
Funding/support
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
 
Ethics approval
Ethics approval was granted by the Institutional Review Board of The University of Hong Kong/ Hospital Authority Hong Kong West Cluster (HKU/HA HKW IRB Ref No.: UW 20-275), and investigations were carried out in accordance with the Declaration of Helsinki. The requirement for patient informed consent was waived by the Board because the study used anonymised data and the risk of identification was low.
 
References
1. Leung K, Wu JT, Liu D, Leung GM. First-wave COVID-19 transmissibility and severity in China outside Hubei after control measures, and second-wave scenario planning: a modelling impact assessment. Lancet 2020;395:1382-93.
2. Cousins S. New Zealand eliminates COVID-19. Lancet 2020;395:1474.
3. De Filippo O, D’Ascenzo F, Angelini F, et al. Reduced rate of hospital admissions for ACS during Covid-19 outbreak in Northern Italy. N Engl J Med 2020;383:88-9.
4. Krug EG, Sharma GK, Lozano R. The global burden of injuries. Am J Public Health 2000;90:523-6.
5. Census and Statistics Department, Hong Kong SAR Government. Table 31: Gross Domestic Product (GDP), implicit price deflator of GDP and per capita GDP. 2020. Available from: https://www.censtatd.gov.hk/en/web_table.html?id=31. Accessed 26 Apr 2020.
6. Food and Health Bureau, Hong Kong SAR Government. Report of the Strategic Review on Healthcare Manpower Planning and Professional Development. 2017. Available from: https://www.fhb.gov.hk/en/press_and_publications/otherinfo/180500_sr/srreport.html. Accessed 20 May 2020.
7. Leung GM, Cowling BJ, Wu JT. From a sprint to a marathon in Hong Kong. N Engl J Med 2020;382:e45.
8. Yee DK, Fang C, Lau TW, Pun T, Wong TM, Leung F. Seasonal variation in hip fracture mortality. Geriatr Orthop Surg Rehabil 2017;8:49-53.
9. Apple Inc. COVID-19–mobility trends reports. Available from: https://www.apple.com/covid19/mobility. Accessed 26 Apr 2020.
10. Statcounter GlobalStats. Mobile & tablet vendor market share Hong Kong. Jan – Mar 2020. Available from: https://gs.statcounter.com/vendor-market-share/mobile-tablet/hong-kong/#monthly-202001-202003. Accessed 26 Apr 2020.
11. Census and Statistics Department, Hong Kong SAR Government. Thematic Household Survey Report No. 69: Personal Computer and Internet Penetration. 2020. Available from: https://www.ogcio.gov.hk/en/about_us/facts/doc/householdreport2020_69.pdf. Accessed 5 May 2020.
12. Census and Statistics Department, Hong Kong SAR Government. Population– Overview. 2020. Available from: https://www.censtatd.gov.hk/hkstat/sub/so20.jsp. Accessed 12 Apr 2020.
13. Centre for Health Protection, Department of Health, Hong Kong SAR Government. Latest situation of novel coronavirus infection in Hong Kong. Available from: https://chp-dashboard.geodata.gov.hk/covid-19/en.html. Accessed 12 Apr 2022.
14. Borgström F, Karlsson L, Ortsäter G, et al. Fragility fractures in Europe: burden, management and opportunities. Arch Osteoporos 2020;15:59.
15. Leung F, Lau TW, Kwan K, Chow SP, Kung AW. Does timing of surgery matter in fragility hip fractures? Osteoporos Int 2010;21 Suppl 4:S529-34.
16. British Orthopaedic Association. COVID BOAST-Management of patients with urgent orthopaedic conditions and trauma during the coronavirus pandemic. Available from: https://www.boa.ac.uk/resources/covid-19-boasts-combined1.html. Accessed 13 Feb 2023.
17. Tsuda T, Hashimoto Y, Okamoto Y, Ando W, Ebina K. Meta-analysis for the efficacy of bisphosphonates on hip fracture prevention. J Bone Miner Metab 2020;38:678-86.
18. Fritz J, Cöster ME, Nilsson JA, Rosengren BE, Dencker M, Karlsson MK. The associations of physical activity with fracture risk—a 7-year prospective controlled intervention study in 3534 children. Osteoporos Int 2016;27:915-22.
19. Morseth B, Ahmed LA, Bjørnerem Å, et al. Leisure time physical activity and risk of non-vertebral fracture in men and women aged 55 years and older: the Tromsø study. Eur J Epidemiol 2012;27:463-71.
20. Appleby PN, Allen NE, Roddam AW, Key TJ. Physical activity and fracture risk: a prospective study of 1898 incident fractures among 34,696 British men and women. J Bone Miner Metab 2008;26:191-8.
21. Nilson F, Moniruzzaman S, Andersson R. A comparison of hip fracture incidence rates among elderly in Sweden by latitude and sunlight exposure. Scand J Public Health 2014;42:201-6.
22. Sing CW, Woo YC, Lee AC, et al. Validity of major osteoporotic fracture diagnosis codes in the Clinical Data Analysis and Reporting System in Hong Kong. Pharmacoepidemiol Drug Saf 2017;26:973-6.
23. Hospital Authority, Hong Kong SAR Government. HA adjusts service provision to focus on combatting epidemic. 2020. Press Release. Available from: https://www.info.gov.hk/gia/general/202002/10/P2020021000711.htm. Accessed 20 May 2020.
24. Bergh C, Wennergren D, Möller M, Brisby H. Fracture incidence in adults in relation to age and gender: a study of 27,169 fractures in the Swedish Fracture Register in a well-defined catchment area. PLoS One 2020;15:e0244291.

Ten-year refractive and visual outcomes of intraocular lens implantation in infants with congenital cataract

© Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
 
ORIGINAL ARTICLE  CME
Joyce JT Chan, FRCOphth; Emily S Wong, FCOphthHK; Carol PS Lam, FCOphthHK; Jason C Yam, FRCSEd
Department of Ophthalmology and Visual Sciences, The Chinese University of Hong Kong, Hong Kong Eye Hospital, Hong Kong
 
Corresponding author: Dr JC Yam (yamcheuksing@cuhk.edu.hk)
 
 
Abstract
Introduction: There is no consensus regarding optimal target refraction after intraocular lens implantation in infants. This study aimed to clarify relationships of initial postoperative refraction with long-term refractive and visual outcomes.
 
Methods: This retrospective review included 14 infants (22 eyes) who underwent unilateral or bilateral cataract extraction and primary intraocular lens implantation before the age of 1 year. All infants had ≥10 years of follow-up.
 
Results: All eyes exhibited myopic shift over a mean follow-up period of 15.9 ± 2.8 years. The greatest myopic shift occurred in the first postoperative year (mean=-5.39 ± 3.50 dioptres [D]), but smaller amounts continued beyond the tenth year (mean=-2.64 ± 2.02 D between 10 years postoperatively and last follow-up). Total myopic shift at 10 years ranged from -21.88 to -3.75 D (mean=-11.62 ± 5.14 D). Younger age at operation was correlated with larger myopic shifts at 1 year (P=0.025) and 10 years (P=0.006) postoperatively. Immediate postoperative refraction was a predictor of spherical equivalent refraction at 1 year (P=0.015) but not at 10 years (P=0.116). Immediate postoperative refraction was negatively correlated with final best-corrected visual acuity (BCVA) (P=0.018). Immediate postoperative refraction of ≥+7.00 D was correlated with worse final BCVA (P=0.029).
 
Conclusion: Considerable variation in myopic shift hinders the prediction of long-term refractive outcomes in individual patients. When selecting target refraction in infants, low to moderate hyperopia (<+7.00 D) should be considered to balance the avoidance of high myopia in adulthood with the risk of worse long-term visual acuity related to high postoperative hyperopia.
 
 
New knowledge added by this study
  • The greatest myopic shift occurred in the first year after cataract surgery, but smaller shifts continued beyond the tenth year. Overall, 50% of eyes exhibited myopic shift >-2.00 dioptres between the tenth postoperative year and last follow-up.
  • Considerable variation in refractive change after intraocular lens implantation in infants aged <1 year hinders the prediction of long-term refractive outcomes in individual patients. Immediate postoperative refraction was not correlated with spherical equivalent refraction at 10 years postoperatively.
  • Immediate postoperative refraction of ≥+7.00 dioptres was correlated with worse final visual acuity.
Implications for clinical practice or policy
  • When selecting target refraction in infants, low to moderate hyperopia (<+7.00 dioptres) should be considered to balance the avoidance of high myopia in adulthood with the risk of worse long-term visual acuity related to high postoperative hyperopia.
 
 
Introduction
Appropriate optical correction after cataract extraction in infants is important for efforts to avoid amblyopia. Primary intraocular lens (IOL) implantation allows constant in situ optical correction during the critical years of visual development, while avoiding the expenses and compliance issues associated with contact lenses.1 Disadvantages include increased rates of surgical complications and re-operations,2 as well as the inability to modify IOL power during ocular growth. A recent report by the American Academy of Ophthalmology suggested that IOL implantation can be safely conducted in children aged >6 months.3 However, because of the unpredictable nature of ocular growth, it remains challenging to select a target refraction in infants that allows achievement of optimal long-term visual and refractive outcomes.
 
Surgeons target various initial hyperopia values, ranging from +5.00 dioptres (D) to +10.50 D,4 5 6 7 8 9 to compensate for the rapid myopic shift that occurs during infancy. However, prediction of the myopic shift remains difficult; significant hyperopia in infants requires stringent optical correction to prevent amblyopia, and some studies have linked high initial hyperopia to worse visual acuity.10 11 This retrospective study aimed to clarify the relationships of initial postoperative refraction with 10-year spherical equivalent refraction (SER) and long-term best-corrected visual acuity (BCVA) after IOL implantation in infants.
 
Methods
Inclusion and exclusion criteria
This retrospective study included patients who underwent unilateral or bilateral congenital cataract extraction and primary IOL implantation before the age of 1 year between 1997 and 2009 at a single secondary and tertiary referral eye centre. Only patients with ≥10 years of follow-up were included. Eyes with associated ocular co-morbidities (eg, persistent foetal vasculature and glaucoma) were excluded.
 
Surgical technique and follow-up
The patients’ baseline characteristics (eg, age, axial length [determined by applanation A-scan biometry], and keratometry) were recorded. Intraocular lens powers were calculated using the Sanders–Retzlaff–Kraff II formula. The operating surgeon selected the target refraction and IOL power, considering the patient’s age (all cases) and refractive error in the fellow eye (unilateral cases). All operations were performed using similar techniques, including the creation of a 3.0-mm scleral tunnel, anterior continuous curvilinear capsulorhexis, and lens removal by automated irrigation and aspiration. Heparin-surface-modified polymethyl methacrylate IOLs or acrylic foldable IOLs were implanted. The IOL was placed in the capsular bag or in the sulcus. All wounds were sutured. In some cases, primary posterior curvilinear capsulorhexis and anterior vitrectomy were performed. Because of reports that a significant number of eyes in young infants required secondary posterior capsule opening despite primary posterior capsulotomy,12 13 this procedure was omitted in some eyes to increase the likelihood of achieving capsular IOL implantation. Postoperatively, all eyes were treated with intensive topical steroid and antibiotic medication. Patients were assessed on postoperative day 1, week 1, week 2, and week 4; they were then assessed every 3 to 6 months. When clinically significant posterior capsular opacification developed, secondary posterior capsulotomy was performed promptly. Glasses were used for postoperative optical correction; in some cases, contact lenses were also used. Amblyopia treatment by patching was performed as necessary.
 
Outcome measures and statistical analysis
Spherical equivalent refraction at 2 weeks postoperatively was regarded as immediate postoperative refraction. Serial refractions at each year of postoperative follow-up were recorded, and SERs were calculated as the algebraic sum of the sphere and half the cylindrical power. Postoperative axial length was measured using non-contact optical biometry, which was less invasive than applanation biometry.
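The SER calculation described above amounts to a one-line formula; a minimal sketch (the example refraction is hypothetical):

```python
# Spherical equivalent refraction (SER) as defined above:
# the sphere plus half the cylindrical power (values in dioptres).

def spherical_equivalent(sphere: float, cylinder: float) -> float:
    return sphere + cylinder / 2

# Example: a refraction of +6.00 / -1.50 gives an SER of +5.25 D.
print(spherical_equivalent(6.00, -1.50))  # -> 5.25
```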
 
Statistical analysis was performed using Microsoft Excel and SPSS (Windows version 21.0; IBM Corp, Armonk [NY], United States). Best-corrected visual acuities were converted to logarithm of the minimum angle of resolution (logMAR) values for statistical analysis. Correlations between continuous variables were assessed by Spearman correlation. Differences between groups were analysed by the Mann–Whitney U test. Preoperative axial length and keratometry were compared with values at the last follow-up using the paired-samples Wilcoxon signed-rank test. The independent-sample Kruskal–Wallis test was used to compare 10-year SER and BCVA values among groups with immediate postoperative refraction ≤+3.50 D, +3.50 to +7.00 D, and ≥+7.00 D. Partial correlation analysis was performed to detect correlations of immediate postoperative refraction with spherical refraction at 1 year and 10 years after adjustment for age at operation. Multiple linear regression was performed for multivariate analysis of statistically significant factors identified during univariate analysis. P values <0.05 were considered statistically significant.
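The visual-acuity conversion used for analysis follows the standard definition, logMAR = log10(1/decimal acuity); a minimal sketch of that standard conversion (not code from the study):

```python
import math

# Standard conversion from Snellen acuity to logMAR:
# logMAR = log10(1 / decimal acuity).

def snellen_to_logmar(numerator: float, denominator: float) -> float:
    decimal_acuity = numerator / denominator
    return math.log10(1 / decimal_acuity)

# 20/20 vision corresponds to logMAR 0.00; 20/40 to roughly logMAR 0.30.
assert abs(snellen_to_logmar(20, 20) - 0.0) < 1e-9
assert abs(snellen_to_logmar(20, 40) - 0.301) < 1e-3
```

Because logMAR is a logarithmic scale, differences in logMAR correspond to ratios of acuity, which is what makes the converted values suitable for the nonparametric comparisons described above.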
 
Results
Twenty-two eyes of 14 patients were included in this study. One eye in one patient with bilateral cataract was excluded because it was surgically treated after the patient reached 1 year of age. One eye in another patient with bilateral cataract was excluded because it exhibited secondary glaucoma. During surgery, heparin-surface-modified polymethyl methacrylate IOLs were implanted in three eyes, whereas acrylic foldable IOLs were implanted in 19 eyes. The IOL was placed in the capsular bag in 18 eyes and in the sulcus in four eyes. Additionally, primary posterior curvilinear capsulorhexis and anterior vitrectomy were performed in 13 eyes. For postoperative optical correction, all 14 patients wore glasses; four patients (including two with unilateral cataract) also wore contact lenses. Thirteen patients underwent amblyopia treatment by patching.
 
The Table summarises the baseline characteristics, refractive outcomes, and visual outcomes of eyes included in this study. All 22 eyes exhibited myopic shift, ranging from -21.88 to -3.75 D at 10 years. Figures 1 and 2 show the amounts of myopic shift and SER, respectively, at 1 to 10 years postoperatively and at last follow-up. The greatest myopic shift occurred in the first postoperative year, but smaller shifts continued beyond the tenth year. Ninety percent of eyes exhibited myopic shift >-2.00 D between the third postoperative year and last follow-up (mean myopic shift: -6.40 ± 3.29 D; range, -12.00 to -1.63 D). These proportions were 82% between the sixth postoperative year and last follow-up (mean myopic shift: -4.14 ± 2.35 D; range, -9.38 to -1.13 D), and 50% between the tenth postoperative year and last follow-up (mean myopic shift: -2.64 ± 2.02 D; range, -6.75 to -0.125 D).
 

Table. Baseline characteristics and follow-up results of patients who underwent unilateral or bilateral cataract extraction and primary intraocular lens implantation before the age of 1 year
 

Figure 1. Magnitude of myopic shift per year from 1 to 10 years postoperatively, and between 10 years postoperatively and last follow-up, after primary implantation of intraocular lens in infants aged <1 year. Boxes: quartile 1 to quartile 3 (interquartile range). Lines: medians. Whiskers: maximum and minimum values excluding potential outliers and extreme values. Circles: potential outliers, more than 1.5 interquartile ranges but at most 3 interquartile ranges below quartile 1 or above quartile 3. Asterisks: extreme values, more than 3 interquartile ranges below quartile 1 or above quartile 3
 

Figure 2. Spherical equivalent refraction immediately after operation, at 1 year to 10 years, and at last follow-up after primary intraocular lens implantation in infants aged <1 year. Boxes: quartile 1 to quartile 3 (interquartile range). Lines: medians. Whiskers: maximum and minimum values excluding potential outliers and extreme values. Circles: potential outliers, more than 1.5 interquartile ranges but at most 3 interquartile ranges below quartile 1 or above quartile 3
 
Factors affecting myopic shift at 1 year and at 10 years
In univariate analysis, a larger myopic shift at 1 year postoperatively was correlated with younger age at operation (R=0.585, P=0.004), more hyperopic immediate postoperative refraction (R=-0.533, P=0.011), and a need for secondary posterior capsulotomy (U=20, z=-2.066, P=0.04). One-year myopic shift was not correlated with initial axial length (R=0.038, P=0.878), and it did not differ between unilateral (median=-5.81 D) and bilateral cases (median=-4.38 D) [U=41.5, z=0.469, P=0.652]. Multiple linear regression was performed for statistically significant factors identified during univariate analysis. Only age at operation remained statistically significant (P=0.025); immediate postoperative refraction (P=0.191) and a need for secondary posterior capsulotomy (P=0.781) were not significant in multivariate analysis.
 
The total amount of myopic shift at 10 years postoperatively was correlated with age at operation (r=0.579, P=0.006), but it was not correlated with immediate postoperative refraction (r=-0.339, P=0.133) or initial axial length (r=0.291, P=0.241). There was no difference in the amount of myopic shift at 10 years between unilateral (median=-14.62 D) and bilateral cases (median=-10.50 D) [U=40.5, z=1.357, P=0.185] or between eyes that required secondary posterior capsulotomy (median=-11.25 D) and eyes that did not (median=-6.19 D) [U=24, z=-1.645, P=0.112].
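Several of the between-group comparisons above use the Mann-Whitney U test. A minimal sketch of how the U statistic is computed (illustrative only; the study's statistical software is not specified in this passage):

```python
def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U statistic for group_a, using midranks for ties."""
    combined = sorted(group_a + group_b)

    def midrank(v):
        # Average 1-based rank of all occurrences of v in the pooled sample
        first = combined.index(v)
        count = combined.count(v)
        return first + (count + 1) / 2

    rank_sum_a = sum(midrank(v) for v in group_a)
    n_a = len(group_a)
    return rank_sum_a - n_a * (n_a + 1) / 2

# Hypothetical myopic shifts (D) in two groups
unilateral = [-14.6, -12.0, -9.5]
bilateral = [-10.5, -8.0, -6.2]
u = mann_whitney_u(unilateral, bilateral)
```

U ranges from 0 (all of group_a ranks below group_b) to n_a x n_b (the reverse); the z and P values reported in the text follow from the distribution of U.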
 
Factors affecting spherical equivalent refraction at 1 year and at 10 years
Spherical equivalent refraction (SER) at 1 year did not significantly differ between unilateral (median=-2.69 D) and bilateral cases (median=+1.13 D) [U=59, z=1.959, P=0.053] or between eyes that required secondary capsulotomy (median=+0.31 D) and eyes that did not (median=+1.42 D) [U=32.5, z=-1.143, P=0.261]. Partial correlation analysis showed that, after adjustment for age at operation, immediate postoperative refraction (r=0.522, P=0.015) was a statistically significant predictor of SER at 1 year.
 
In contrast, SER at 10 years postoperatively was significantly more myopic in unilateral cases (median=-10.63 D) than in bilateral cases (median=-4.81 D) [U=49.5, z=2.264, P=0.017]. This finding may be related to surgeon preference for less hyperopic target refractions in unilateral cases, which can match the refraction of the fellow eye and potentially prevent significant postoperative anisometropia. Indeed, after adjustment for age, immediate postoperative SER was significantly less hyperopic in unilateral cases than in bilateral cases (P=0.025). A need for secondary posterior capsulotomy (U=28, z=-1.325, P=0.205) was not correlated with SER at 10 years. After adjustment for laterality, neither age at operation (P=0.066) nor immediate postoperative refraction (P=0.116) was a statistically significant predictor of SER at 10 years. There was no significant difference in 10-year SER among eyes with immediate postoperative refraction ≤+3.50 D, +3.50 to +7.00 D, and ≥+7.00 D (P=0.439), as shown in Figure 3.
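The "after adjustment for" analyses above correspond to partial correlation, which can be sketched as: regress each variable on the covariate, then correlate the residuals. This is a generic illustration, not the study's computation:

```python
from statistics import mean

def _residuals(y, x):
    """Residuals after a simple least-squares regression of y on x."""
    mx, my = mean(x), mean(y)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    alpha = my - beta * mx
    return [b - (alpha + beta * a) for a, b in zip(x, y)]

def _pearson_r(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def partial_r(x, y, covariate):
    """Correlation of x and y with the linear effect of the covariate removed."""
    return _pearson_r(_residuals(x, covariate), _residuals(y, covariate))
```

For example, `partial_r(postop_refraction, ser_10y, age_at_op)` (hypothetical variable names) would estimate the age-adjusted association between immediate postoperative refraction and 10-year SER.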
 

Figure 3. Ten-year spherical equivalent refraction in eyes with immediate postoperative refraction ≤+3.50 dioptres (D), +3.50 to +7.00 D, and ≥+7.00 D. Boxes: quartile 1 to quartile 3 (interquartile range). Lines: medians. Whiskers: maximum and minimum values excluding potential outliers and extreme values. Circles: potential outliers, more than 1.5 interquartile ranges but at most 3 interquartile ranges below quartile 1 or above quartile 3
 
Subgroup analysis was performed for bilateral cases only. Multiple regression analysis showed that at 1 year, both age at operation (P=0.014) and immediate postoperative refraction (P=0.024) remained significant predictors of SER after unilateral cases had been excluded. At 10 years postoperatively, age at operation was a significant predictor of SER (P=0.015), whereas immediate postoperative refraction was not (P=0.135).
 
Axial length and keratometry
Mean preoperative axial length was 19.12 mm, whereas mean axial length at 10 years was 24.82 mm. There were no differences in initial axial length (U=32, z=0.894, P=0.421) or total axial length change (U=22, z=-0.224, P=0.875) between unilateral and bilateral cases. Final axial length was significantly greater than preoperative axial length (z=3.823, P<0.0005). Total axial length change was strongly correlated with total myopic shift (r=-0.791, P<0.0005). There was no difference between preoperative and final keratometry values (z=0.081, P=0.936). The total change in the mean keratometry value was not correlated with total myopic shift (r=-0.168, P=0.490).
 
Final best-corrected visual acuity
At the last follow-up, 11 eyes (50%) had a final best-corrected visual acuity (BCVA) of 0.18 logMAR or better, six eyes (27%) had moderate amblyopia with BCVA of 0.3 to 0.6 logMAR, and five eyes (23%) had severe amblyopia with BCVA of 0.7 to 1.0 logMAR. There was a statistically significant correlation between immediate postoperative refraction and final BCVA (r=0.440, P=0.041). Best-corrected visual acuity was worse in eyes that required secondary capsulotomy (U=74.5, z=1.995, P=0.049). Multiple regression revealed that a need for secondary capsulotomy was no longer a significant predictor of BCVA (P=0.299), whereas immediate postoperative refraction remained a significant predictor of BCVA (P=0.018). Best-corrected visual acuity was significantly worse in eyes with immediate postoperative refraction of +7.00 D or higher than in eyes with lower levels of immediate postoperative hyperopia (P=0.029) [Fig 4]. There were no significant correlations of final BCVA with age at operation (r=-0.041, P=0.856), SER at 10 years (r=0.011, P=0.963), SER at last follow-up (r=-0.122, P=0.589), or laterality (U=48.5, z=-1.087, P=0.300).
 

Figure 4. Logarithm of the minimum angle of resolution (logMAR) best-corrected visual acuity in eyes with immediate postoperative refraction ≤+3.50 dioptres (D), +3.50 to +7.00 D, and ≥+7.00 D. Boxes: quartile 1 to quartile 3 (interquartile range). Lines: median. Whiskers: maximum and minimum values excluding potential outliers and extreme values. Circles: potential outliers, more than 1.5 interquartile ranges but at most 3 interquartile ranges below quartile 1 or above quartile 3. Asterisks: extreme values, more than 3 interquartile ranges below quartile 1 or above quartile 3
 
Complications and re-operations
Seventeen eyes underwent 21 re-operations in total, 17 of which were secondary posterior capsulotomies. All nine eyes that did not undergo primary posterior capsulorhexis and anterior vitrectomy required secondary capsulotomy; one of the nine eyes required secondary capsulotomy twice. Seven of 13 eyes with primary posterior capsulorhexis and anterior vitrectomy required secondary capsulotomy. Three eyes underwent injection of intracameral tissue plasminogen activator, one eye underwent dissection of fibrinous membrane, and one eye required removal of anterior capsular phimosis. Notably, anterior capsular phimosis did not develop in any other eyes. One eye developed secondary glaucoma and was excluded from this study.
 
Discussion
Two important goals in the management of congenital cataract are the achievement of good long-term visual acuity and the minimisation of refractive error in adulthood. This study focused on long-term outcomes after primary intraocular lens (IOL) implantation in infants, all of whom had ≥10 years of follow-up. Myopic shift was present in all eyes, and its magnitude varied considerably. Immediate postoperative refraction was not a statistically significant predictor of SER at 10 years. Moreover, there was a statistically significant negative correlation between immediate postoperative refraction and final BCVA. Finally, immediate postoperative refraction of +7.00 D or higher was correlated with worse final BCVA.
 
Refractive change in a growing eye
Refractive change in a normal growing eye involves a complex interaction among axial length elongation, corneal curvature flattening, and the reduction of crystalline lens power.14 Additional effects on ocular growth (eg, related to the presence or laterality of congenital cataract, age at corrective surgery, initial axial length, postoperative visual input, and compliance with postoperative amblyopia therapy) remain uncertain.15 The presence of an intraocular lens magnifies myopic shift in a growing eye: the lens has constant power and moves anteriorly, away from the retina, during ocular growth, which hinders the extrapolation of data from phakic eyes.5 Figure 5 shows the mean SER during the first decade of life in pseudophakic eyes from patients in the present study, compared with normal eyes from the ongoing population-based Hong Kong Children Eye Study.16 At 10 years after corrective surgery, the mean SER was -6.48 D in pseudophakic eyes, whereas it was -0.72 D in normal eyes of age-matched children. The mean axial length at 10 years was 24.82 mm in pseudophakic eyes, whereas it was 23.79 mm in normal eyes of age-matched children.16 These data imply a greater myopic shift and a greater increase in axial length among pseudophakic eyes, both of which continue beyond the first 2 years of life; notably, these increases are relative to the mean growth of normal eyes in Hong Kong children, who exhibit a higher prevalence of myopia than other populations.16
 

Figure 5. Mean spherical equivalent refraction during the first decade of life in pseudophakic eyes from patients in the present study compared with normal eyes from the Hong Kong Children Eye Study
 
Refractive change after primary intraocular lens implantation in infants
Several other studies of myopic shift have revealed considerable refractive change after primary IOL implantation in infants aged <1 year. At 5 years postoperatively, the Infant Aphakia Treatment Study revealed a mean myopic shift of -8.97 D for infants surgically treated at the age of 1 month and -7.22 D for infants surgically treated at the age of 6 months,9 whereas Negalur et al17 found a median myopic shift of -8.43 D after the same duration of follow-up in infants operated before the age of 6 months. Fan et al18 reported a mean myopic shift of -7.11 D at 3 years postoperatively in infants operated before the age of 1 year; Lu et al19 reported a mean myopic shift of -6.46 D at 2 years in 22 eyes, as well as a mean myopic shift of -8.67 D at 6 years in three eyes, among infants operated between the age of 6 and 12 months. In our study, which had a longer follow-up period, the mean 10-year myopic shift was -11.62 ± 5.14 D and myopic progression continued beyond 10 years postoperatively. These findings highlight the importance of using long-term data to guide management decisions, including the selection of target refraction and the determination of appropriate timing for enhancement procedures (eg, IOL exchange).
 
Our results showed that myopic shift was greatest in the first postoperative year and was correlated with age at operation, which is consistent with findings in the literature.9 10 17 18 19 20 21 Because age at operation is most frequently associated with the magnitude of refractive change, many surgeons prefer to adjust initial hyperopia according to age. McClatchey et al22 recommended targets of +5.00 to +8.00 D in infants aged <1 year, whereas Valera Cornejo et al4 selected targets of +7.00 to +9.00 D in infants of the same age-group. The results of the Infant Aphakia Treatment Study suggested that, to achieve emmetropia at 5 years, immediate postoperative hyperopia should be +10.50 D from 4 to 6 weeks of age and +8.50 D from 7 weeks to 6 months of age.9 However, our results showed considerable variation in myopic shift at 10 years (range, -21.88 to -3.75 D); after adjustment for age, immediate postoperative refraction was not a statistically significant predictor of SER at 10 years. Other studies have shown that initial refraction and IOL undercorrection were not significantly associated with the magnitude or rate of myopic shift9 18 22 23; they also revealed large and unpredictable variations in refractive outcomes after IOL implantation in young infants.10 20 21 22 24 25 At the 3-year follow-up, refractive change ranged from +2.00 to -15.50 D in a study by Gouws et al26 and from -0.47 to -10.69 D in a study by Fan et al.18 Although we observed a trend towards more myopic 10-year refractions in groups with lower initial postoperative hyperopia, there were no significant differences because of substantial variability in the data (Fig 3). 
The Infant Aphakia Treatment Study showed that the actual and expected amounts of myopic shift differed in a large percentage of patients; 50% of patients exhibited differences of +3.00 to +14.00 D from expected values.9 Therefore, age-adjusted suggested targets only compensate for the mean expected myopic shift; large interpatient variability will often result in unanticipated long-term outcomes for individual patients. Correlation analysis in our study revealed that age at operation only explained 58% of the variance in myopic shift at 10 years. This correlation is presumably influenced by other factors that contribute to myopic progression, such as genetics, ethnicity, outdoor exposure, education level, and extent of near work.27
 
Long-term best-corrected visual acuity
The achievement of optimal long-term BCVA is another important goal of surgical treatment for congenital cataract. In our study, immediate postoperative refraction of ≥+7.00 D was correlated with worse BCVA. Similarly, in a study of infants who underwent surgery between the ages of 2 and 21 months, with ≥4 years of follow-up, Magli et al10 found that BCVA was better in infants with initial spherical refraction between +1.00 and +3.00 D than in infants with initial spherical refraction >+3.00 D. In a study that included older children who underwent surgery at or before the age of 8.5 years, Lowery et al11 found that low early postoperative hyperopia (+1.75 to +5.00 D) yielded better long-term BCVA, compared with refractions <+1.75 or >+5.00 D, in unilateral cases; no difference was observed in bilateral cases. Another study of older children (surgically treated between the ages of 2 and 6 years) revealed no difference in BCVA between initial postoperative refractive errors of near emmetropia versus undercorrection of +2.00 to +5.50 D23; no patients had initial refraction values >+5.50 D. High initial postoperative hyperopia requires good compliance with refractive correction; in infants, it also requires amblyopia treatment because younger children are at higher risk of developing amblyopia. Hyperopia is more amblyogenic than myopia because young children have higher demands for near vision28; moreover, hyperopia causes defocusing in both distance and near vision, particularly among pseudophakic patients, who have lost accommodation. Studies have shown variable compliance with optical correction and amblyopia treatment after congenital cataract surgery19 29; the use of high-plus spectacles is associated with various optical aberrations. Additionally, contact lenses are suboptimal because one of the original aims of intraocular lens implantation is to avoid the need for contact lenses.
Myopia is comparatively less amblyogenic because it allows retention of near vision, particularly if the amount of myopia remains low until later in childhood when visual development is more mature.30 Therefore, parental motivation and the likelihood of compliance should be included in decisions regarding postoperative refraction. Ideally, high myopia in adulthood should be minimised. However, this goal should be balanced with the risks of amblyopia and long-term poor vision. Therefore, the selection of high hyperopia (>+7.00 D) as an initial postoperative target refraction should be avoided when possible.
 
Strengths and limitations
A major strength of our study was the long follow-up period. Additionally, it only included infants who underwent IOL implantation before the age of 1 year because the refractive change in this group exhibits the greatest variability and is most challenging to predict.5
 
There were some limitations in this study. First, it used a retrospective design and included a small number of patients. Second, there was no objective monitoring of compliance with optical correction or amblyopia treatment. Third, few unilateral cases were included, which may have hindered the detection of larger myopic shifts after IOL implantation in unilateral cases. Notably, some previous studies revealed larger myopic shifts after IOL implantation in such cases.4 17 21 22
 
In conclusion, the large and variable refractive change after IOL implantation in infants aged <1 year hinders the prediction of long-term refractive outcomes in individual patients. When selecting target refraction in infants, low to moderate hyperopia (<+7.00 D) should be considered to balance the avoidance of high myopia in adulthood against the risk of worse long-term visual acuity related to high postoperative hyperopia.
 
Author contributions
Concept or design: All authors.
Acquisition of data: JJT Chan.
Analysis or interpretation of data: JJT Chan.
Drafting of the manuscript: JJT Chan.
Critical revision of the manuscript for important intellectual content: All authors.
 
All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
 
Conflicts of interest
As an editor of the journal, JC Yam was not involved in the peer review process. Other authors have disclosed no conflicts of interest.
 
Funding/support
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
 
Ethics approval
Ethics approval was granted by the Research Ethics Committee (Kowloon Central/Kowloon East), Hospital Authority (Ref No.: KC/KE-19-0059/ER-4). The requirement for patient consent was waived by the ethics board owing to the retrospective nature of the study. The study was conducted in accordance with the ethical principles of the Declaration of Helsinki.
 
References
1. Kumar P, Lambert SR. Evaluating the evidence for and against the use of IOLs in infants and young children. Expert Rev Med Devices 2016;13:381-9.
2. Infant Aphakia Treatment Study Group; Lambert SR, Lynn MJ, et al. Comparison of contact lens and intraocular lens correction of monocular aphakia during infancy: a randomized clinical trial of HOTV optotype acuity at age 4.5 years and clinical findings at age 5 years. JAMA Ophthalmol 2014;132:676-82.
3. Lambert SR, Aakalu VK, Hutchinson AK, et al. Intraocular lens implantation during early childhood: a report by the American Academy of Ophthalmology. Ophthalmology 2019;126:1454-61.
4. Valera Cornejo DA, Flores Boza A. Relationship between preoperative axial length and myopic shift over 3 years after congenital cataract surgery with primary intraocular lens implantation at the National Institute of Ophthalmology of Peru, 2007-2011. Clin Ophthalmol 2018;12:395-9.
5. McClatchey SK, Parks MM. Theoretic refractive changes after lens implantation in childhood. Ophthalmology 1997;104:1744-51.
6. Enyedi LB, Peterseim MW, Freedman SF, Buckley EG. Refractive changes after pediatric intraocular lens implantation. Am J Ophthalmol 1998;126:772-81.
7. Astle WF, Ingram AD, Isaza GM, Echeverri P. Paediatric pseudophakia: analysis of intraocular lens power and myopic shift. Clin Experiment Ophthalmol 2007;35:244-51.
8. Yam JC, Wu PK, Ko ST, Wong US, Chan CW. Refractive changes after pediatric intraocular lens implantation in Hong Kong children. J Pediatr Ophthalmol Strabismus 2012;49:308-13.
9. Weakley DR Jr, Lynn MJ, Dubois L, et al. Myopic shift 5 years after intraocular lens implantation in the Infant Aphakia Treatment Study. Ophthalmology 2017;124:822-7.
10. Magli A, Forte R, Carelli R, Rombetto L, Magli G. Long-term outcomes of primary intraocular lens implantation for unilateral congenital cataract. Semin Ophthalmol 2015;31:1-6.
11. Lowery RS, Nick TG, Shelton JB, Warner D, Green T. Long-term visual acuity and initial postoperative refractive error in pediatric pseudophakia. Can J Ophthalmol 2011;46:143-7.
12. Vasavada A, Chauhan H. Intraocular lens implantation in infants with congenital cataracts. J Cataract Refract Surg 1994;20:592-8.
13. Plager DA, Yang S, Neely D, Sprunger D, Sondhi N. Complications in the first year following cataract surgery with and without IOL in infants and older children. J AAPOS 2002;6:9-14.
14. Gordon RA, Donzis PB. Refractive development of the human eye. Arch Ophthalmol 1985;103:785-9.
15. Indaram M, VanderVeen DK. Postoperative refractive errors following pediatric cataract extraction with intraocular lens implantation. Semin Ophthalmol 2018;33:51-8.
16. Yam JC, Tang SM, Kam KW, et al. High prevalence of myopia in children and their parents in Hong Kong Chinese population: the Hong Kong Children Eye Study. Acta Ophthalmol 2020;98:e639-48.
17. Negalur M, Sachdeva V, Neriyanuri S, Ali M, Kekunnaya R. Long-term outcomes following primary intraocular lens implantation in infants younger than 6 months. Indian J Ophthalmol 2018;66:1088-93.
18. Fan DS, Rao SK, Yu CB, Wong CY, Lam DS. Changes in refraction and ocular dimensions after cataract surgery and primary intraocular lens implantation in infants. J Cataract Refract Surg 2006;32:1104-8.
19. Lu Y, Ji YH, Luo Y, Jiang YX, Wang M, Chen X. Visual results and complications of primary intraocular lens implantation in infants aged 6 to 12 months. Graefes Arch Clin Exp Ophthalmol 2010;248:681-6.
20. O’Keefe M, Fenton S, Lanigan B. Visual outcomes and complications of posterior chamber intraocular lens implantation in the first year of life. J Cataract Refract Surg 2001;27:2006-11.
21. Hoevenaars NE, Polling JR, Wolfs RC. Prediction error and myopic shift after intraocular lens implantation in paediatric cataract patients. Br J Ophthalmol 2011;95:1082-5.
22. McClatchey SK, Dahan E, Maselli E, et al. A comparison of the rate of refractive growth in pediatric aphakic and pseudophakic eyes. Ophthalmology 2000;107:118-22.
23. Lambert SR, Archer SM, Wilson ME, Trivedi RH, del Monte MA, Lynn M. Long-term outcomes of undercorrection versus full correction after unilateral intraocular lens implantation in children. Am J Ophthalmol 2012;153:602-8.e1.
24. Barry JS, Ewings P, Gibbon C, Quinn AG. Refractive outcomes after cataract surgery with primary lens implantation in infants. Br J Ophthalmol 2006;90:1386-9.
25. Plager DA, Kipfer H, Sprunger DT, Sondhi N, Neely DE. Refractive change in pediatric pseudophakia: 6-year follow-up. J Cataract Refract Surg 2002;28:810-5.
26. Gouws P, Hussin HM, Markham RH. Long term results of primary posterior chamber intraocular lens implantation for congenital cataract in the first year of life. Br J Ophthalmol 2006;90:975-8.
27. Morgan IG, French AN, Ashby RS, et al. The epidemics of myopia: aetiology and prevention. Prog Retin Eye Res 2018;62:134-49.
28. Pascual M, Huang J, Maguire MG, et al. Risk factors for amblyopia in the vision in preschoolers study. Ophthalmology 2014;121:622-9.e1.
29. Drews-Botsch CD, Hartmann EE, Celano M, Infant Aphakia Treatment Study Group. Predictors of adherence to occlusion therapy 3 months after cataract extraction in the Infant Aphakia Treatment Study. J AAPOS 2012;16:150-5.
30. O’Hara MA. Pediatric intraocular lens power calculations. Curr Opin Ophthalmol 2012;23:388-93.

Assessment of healthcare quality among village clinicians in rural China: the role of internal work motivation

Hong Kong Med J 2023 Feb;29(1):57-65 | Epub 9 Feb 2023
© Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
 
ORIGINAL ARTICLE (HEALTHCARE IN MAINLAND CHINA)
Assessment of healthcare quality among village clinicians in rural China: the role of internal work motivation
Q Gao, PhD; L Peng, MSc; S Song, MSc; Y Zhang, MSc; Y Shi, PhD
Center for Experimental Economics in Education, Shaanxi Normal University, Xi'an, China
 
Corresponding author: Prof Y Shi (shiyaojiang7@gmail.com)
 
 
Abstract
Introduction: The quality of primary care is important for health outcomes among residents in China. There is evidence that internal work motivation improves the quality of healthcare provided by clinicians. However, few empirical studies have examined the relationship between internal work motivation and clinical performance among village clinicians in rural China. This study was performed to evaluate healthcare quality among village clinicians and to explore its relationship with their internal work motivation.
 
Methods: We collected survey data using a standardised patient method and a structured questionnaire. We observed 225 interactions between standardised patients and village clinicians from 21 counties in three provinces. We used logistic regression models to analyse the relationships between work motivation and healthcare quality, then conducted heterogeneity analysis.
 
Results: Healthcare quality among village clinicians was generally low. There was a significantly positive correlation between internal work motivation and healthcare quality among village clinicians (P<0.1). Additionally, the positive effect of internal work motivation on healthcare quality was strongest among clinicians who received financial incentives and had a lighter workload (fewer patients per month) [P<0.1].
 
Conclusion: Healthcare quality among village clinicians requires urgent improvement. We recommend implementing financial incentives to stimulate internal work motivation among village clinicians, thus improving their clinical performance.
 
 
New knowledge added by this study
  • Internal work motivation was positively correlated with healthcare quality among village clinicians in rural China.
  • The positive correlation was strongest among clinicians who received financial incentives and had a lighter workload (fewer patients per month).
Implications for clinical practice or policy
  • Healthcare quality among village clinicians in rural China should be enhanced by improving their internal work motivation.
  • Interventions that include financial incentives should be implemented to strengthen the positive effect of internal work motivation on healthcare quality among clinicians.
 
 
Introduction
Village clinics, the first tier of rural health systems in China, are responsible for preventing and treating common diseases among rural residents.1 2 However, the quality of healthcare provided by village clinicians may be unsatisfactory in rural China.3 4 Village clinicians generally have a low level of education and limited medical qualifications.4 There is some evidence that, among village clinicians, formal schooling is primarily at the vocational school level; most (84.3%) of these clinicians hold only the basic medical certification necessary to practise medicine in rural areas.5 Moreover, despite limited empirical evaluation, available data indicate that rural primary clinicians have low diagnostic quality and provide poor management of chronic diseases.6 A 2012 study in Shaanxi Province revealed that 41% of diagnoses were incorrect; treatments were considered correct or partially correct in 53% of clinician-patient interactions.5 A systematic review of 24 studies published between 2000 and 2012 showed that the rate of antibiotic use in rural clinics was much higher than the rate recommended by the World Health Organization.7 8
 
The Chinese Government has recognised the need to strengthen primary healthcare in rural areas. To improve health among rural residents, the government has recently issued multiple policies that are intended to improve service capacity within primary medical systems.9 10 For example, to improve clinical knowledge among village clinicians, several government departments jointly implemented a plan in 2013, which focused on the provision of continuing education for clinicians.11 In 2019, the Basic Medical and Health Promotion Law of the People’s Republic of China emphasised the need to support the development of primary medical institutions and implement various policies that would improve primary medical service capabilities.12
 
Although improvements in internal work motivation among village clinicians may help to enhance their medical performance, few empirical studies have examined the relationship between these two characteristics among village clinicians in rural China. Theory-focused research indicates that internal work motivation is important for improvements in clinician performance.13 14 Other theory-based research in China has suggested that clinicians with higher internal motivation are more likely to deliver higher-quality work.15 16 Quantitative analyses of clinician behaviour, primarily conducted in other countries, have also revealed positive effects of internal work motivation on healthcare quality and work performance among clinicians.17 18 19 To our knowledge, empirical studies of work motivation in China have primarily focused on individuals in business careers and similar occupations; few have considered groups of clinicians.20 21 22 Thus, there have been few empirical studies involving village clinicians in rural China.
 
This study explored the relationship between internal work motivation and healthcare quality among village clinicians in rural China. First, using a standardised patient method and questionnaire interviews, we evaluated healthcare quality and internal work motivation among village clinicians. Second, we examined the relationships between internal work motivation and healthcare quality among village clinicians. Third, we conducted heterogeneity analysis with a focus on clinician workload and financial incentives.
 
Methods
Sampling and data collection
Our study sampling was conducted in the rural areas of three prefectures, each located in one of three provinces: Sichuan, Shaanxi, and Anhui. Representative samples were selected using a multistage random sampling method. First, 21 sample counties were randomly selected from the sample prefectures. Next, 10 townships from each sampled county were randomly chosen as sample townships; 209 sample townships were selected because one sample county contained only nine townships. Then, one village was randomly selected from each township. Finally, all village clinics in each sample village were included, and one standardised patient interaction was completed in each sample village.
 
We conducted two sets of surveys in 2015 to collect data regarding basic characteristics, internal work motivation, and healthcare quality among village clinicians. In the first set of surveys, we primarily gathered information regarding the characteristics of village clinics and village clinicians. Specifically, we used a structured facility questionnaire to enquire about the value of each sample clinic’s medical instruments and its institutional net income in 2014 (both in Renminbi [RMB]), as well as the length of the daily lunch break (hours). We recorded the following characteristics of sample village clinicians: age, gender, level of education and clinical qualifications, duration of service, monthly salary (in RMB), number of training days in 2014, clinician workload (mean number of patients per month), mean duration of consultation per patient (minutes), and any financial incentives. Additionally, we asked village clinicians to respond to questions regarding internal work motivation.
 
In the second set of surveys, we used a standardised patient method to evaluate the quality of healthcare provided by sample village clinicians. This method avoids problems such as the Hawthorne effect and recall bias, accurately assesses healthcare quality among clinicians, and is widely used in other countries.23 24 We recruited 63 individuals (ie, standardised patients; 21 in each province) to present three predetermined disease cases of diarrhoea, tuberculosis, and unstable angina in a standardised manner. Generally, we randomly allocated one standardised patient to each sample clinic to report a case that had been randomly selected prior to allocation.
 
Measurement of healthcare quality
We evaluated the quality of healthcare provided by village clinicians using three indicators: process quality, diagnostic accuracy, and treatment accuracy. We assigned a process quality value of 1 to clinicians who completed more than the mean percentage of suggested items, indicating a high-quality enquiry process. Otherwise, the process quality value was 0. Regarding diagnostic and treatment accuracies, we assigned a value of 0 to an ‘incorrect’ result, based on predetermined criteria. Otherwise, ‘correct’ or ‘partly correct’ results were assigned an accuracy value of 1. The treatment was also considered correct if the clinician referred the patient to a higher-level hospital.
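The binarisation rules described above can be written as small scoring helpers. This is a sketch with hypothetical names; the study's actual checklists and grading criteria are not reproduced here:

```python
def process_quality(pct_completed, mean_pct):
    """1 = high-quality enquiry (more than the mean percentage of suggested items)."""
    return 1 if pct_completed > mean_pct else 0

def diagnosis_score(result):
    """'correct' or 'partly correct' scores 1; 'incorrect' scores 0."""
    return 1 if result in ("correct", "partly correct") else 0

def treatment_score(result, referred_to_higher_level=False):
    """As for diagnosis, but referral to a higher-level hospital also counts as correct."""
    if referred_to_higher_level:
        return 1
    return 1 if result in ("correct", "partly correct") else 0
```
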
 
Measurement of internal work motivation
According to Amabile and Mueller,25 an individual’s work motivation is defined as internal work motivation if it originates from love and interest. The internal motivation instrument in our study included four items, such as ‘because I like what I do for a living'. Responses to the four items were rated on a 7-point Likert-type scale, ranging from 1 (strongly disagree) to 7 (strongly agree). In this study, we assigned a value of 0 to responses indicating disagreement or neutrality (original scores 1-4) and a value of 1 to responses indicating agreement (original scores 5-7). The total score of the four items represented a clinician's level of internal work motivation; it ranged from 0 to 4, and a higher score indicated a higher level of motivation.
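As a minimal sketch, the dichotomise-and-sum scoring rule above can be expressed as follows (the function name and input format are illustrative assumptions):

```python
# Minimal sketch of the scoring rule above: dichotomise each 7-point
# response (1-4 -> 0, 5-7 -> 1) and sum the four item scores.

def motivation_score(responses):
    """responses: four Likert ratings, each an integer from 1 to 7."""
    assert len(responses) == 4 and all(1 <= r <= 7 for r in responses)
    return sum(1 if r >= 5 else 0 for r in responses)
```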
 
The Cronbach’s α value of the internal work motivation questionnaire was 0.826, which indicated that the scale had good internal consistency. The Kaiser–Meyer–Olkin value of the questionnaire was 0.705, indicating that the scale had good structural validity. These results confirmed that the questionnaire was an acceptable measurement tool.
 
Statistical analysis
Stata 15.0 (StataCorp, College Station [TX], United States) was used to perform descriptive and regression analyses of the collected data. Logistic regression models with a significance threshold of P<0.1 were used to analyse relationships between internal work motivation and healthcare quality.26 27 28 Two interaction terms, clinician workload × internal motivation and financial incentive × internal motivation, were added to the model for heterogeneity analyses. All regression analyses were adjusted for fixed effects of disease cases, standardised patients, and the coder.
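The shape of the interaction model can be sketched as follows. This is an illustrative plain-Python simulation, not the authors' Stata code: it includes only the financial incentive × motivation interaction, uses synthetic data with made-up coefficients, and omits the fixed effects for brevity.

```python
import math
import random

# Illustrative sketch (not the authors' Stata analysis): logistic regression
# with a financial incentive x motivation interaction term, fitted to
# synthetic data by gradient ascent on the log-likelihood.

def fit_logit(X, y, lr=0.02, iters=2000):
    """X: rows with a leading 1.0 for the intercept; y: 0/1 outcomes."""
    w = [0.0] * len(X[0])
    for _ in range(iters):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi))
            p = 1.0 / (1.0 + math.exp(-z))
            for j, xj in enumerate(xi):
                w[j] += lr * (yi - p) * xj  # per-observation gradient step
    return w

random.seed(1)
rows, outcomes = [], []
for _ in range(200):
    motivation = random.randint(0, 4)   # 0-4 internal motivation score
    incentive = random.randint(0, 1)    # financial incentive (yes/no)
    # Design row: intercept, main effects, incentive x motivation interaction
    rows.append([1.0, motivation, incentive, motivation * incentive])
    true_logit = -2.0 + 0.5 * motivation + 0.8 * motivation * incentive
    prob = 1.0 / (1.0 + math.exp(-true_logit))
    outcomes.append(1 if random.random() < prob else 0)

weights = fit_logit(rows, outcomes)
```

A positive fitted interaction coefficient (the fourth weight) would correspond to the paper's finding that financial incentives strengthen the effect of internal motivation.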
 
Results
Characteristics of sample village clinicians and clinics
In total, 225 village clinicians from 225 village clinics were included in this study. Table 1 describes the basic characteristics of sample village clinicians. The mean age of the clinicians was 49.20 years, and 196 clinicians (87.11%) were men. Among the 225 clinicians, 25 (11.11%) had attended college or above, whereas seven (3.11%) had a practising clinician qualification. Each clinician examined a mean of 171 patients per month. Mean salaries for village clinicians were particularly low (slightly >1900 RMB per month), and 103 clinicians (45.78%) had received financial incentives.
 

Table 1. Characteristics of sample village clinics and clinicians (n=225)
 
Table 1 also describes the characteristics of sample village clinics. The mean value of medical equipment was 920 RMB, and the mean institutional net income in 2014 was 25 500 RMB. However, only 86 clinics (38.22%) had a medical equipment value above the mean. This result indicates that the value of medical equipment varied considerably among sample clinics, and the value of medical equipment in most clinics was inadequate. Notably, clinics had a mean lunch break length of <1 hour.
 
Healthcare quality among village clinicians
The unannounced standardised patients completed 225 disease cases (57, 87, and 81 cases of diarrhoea, angina, and tuberculosis, respectively). Table 2 shows the healthcare quality among sample village clinicians determined via three disease cases. On average, the clinicians completed 17% of the recommended consultation and examination items. Furthermore, 129 clinicians (57.33%) completed fewer than the mean number of recommended consultation and examination items. Among all types of cases, 73 clinicians (32.44%) provided a completely or partially correct diagnosis, and 94 clinicians (41.78%) provided correct or partly correct treatments. Although the results of these three indicators varied among diseases, for each disease the percentages of clinicians who completed an above-average number of recommended consultation and examination items, made a correct diagnosis, or gave a correct treatment were generally low.
 

Table 2. Healthcare quality among sample village clinicians determined via three disease cases
 
Internal work motivation of village clinicians
Table 3 shows the levels of internal work motivation among sample village clinicians. Overall, 213 clinicians (94.67%) believed that ‘I like what I do for a living’ or ‘I enjoy my job’ motivated their work in clinics. Furthermore, 187 (83.11%) and 206 (91.56%) clinicians indicated that their respective main work motivations were ‘because my job is interesting’ and ‘because my job is fun’. Integration of the scores for the four items revealed that the mean overall score for internal work motivation was 3.64 ± 0.85 (range, 0-4).
 

Table 3. Internal work motivation of sample village clinicians (n=225)
 
Relationships between internal work motivation and healthcare quality among village clinicians
Table 4 presents the results of logistic regression analysis of the relationship between internal work motivation and healthcare quality among village clinicians. Internal work motivation had a positive effect on clinical performance among sample clinicians. Specifically, for each one-unit increase in internal work motivation, village clinicians were 42.17% (P<0.1) and 45.61% (P<0.1) more likely to provide a correct or partially correct diagnosis and treatment, respectively.
 

Table 4. Logistic regression analysis of relationships between internal work motivation and healthcare quality among sample village clinicians
 
Table 5 shows the results of heterogeneity analysis from the perspective of clinician workload and financial incentives. The clinician workload × internal motivation interaction was significantly negatively correlated with diagnostic accuracy, whereas the financial incentive × internal motivation interaction was significantly positively correlated with treatment accuracy (P<0.1). These results indicate that a heavier workload could hinder the positive effect of internal motivation on diagnostic accuracy among village clinicians. Furthermore, among village clinicians who received financial incentives, the positive effect of their internal work motivation on their treatments was stronger than the corresponding effect among village clinicians who did not receive financial incentives.
 

Table 5. Heterogeneity analysis of the relationship between internal work motivation and healthcare quality among sample village clinicians
 
Discussion
This study evaluated the healthcare quality among village clinicians in rural China and its relationship with internal work motivation among these clinicians, through an analysis of 225 rural village clinicians from three provinces in 2015. There were three main findings. First, healthcare quality among village clinicians needed to be improved. Second, village clinicians with stronger internal work motivation were more likely to offer appropriate treatment. Third, village clinicians with a lighter workload (fewer patients per month) or financial incentives exhibited a stronger positive correlation between internal motivation and healthcare quality.
 
Generally, interactions between unannounced standardised patients and sample village clinicians showed that village clinics in rural China provided poor-quality healthcare. On average, village clinicians completed only 17% of the recommended consultation and examination items. The rates of diagnostic accuracy and treatment accuracy (including correct or partly correct treatment) were 32.44% and 41.78%, respectively. Our findings of poor healthcare quality are comparable with the results of other studies performed at primary health centres in rural China. For example, a study based on the patient’s perspective, conducted in Guangdong Province, highlighted the difficulty in maintaining adequate coordination among primary medical services.29 A survey using a standardised patient method revealed that healthcare quality was worse in rural China than in primary care settings in Nairobi, Kenya.30 A systematic analysis of rural township health centres in Shandong Province also indicated a need for improved healthcare quality among primary care clinicians.16
 
We found that internal work motivation was generally high among village clinicians. The mean internal work motivation score was 3.64 ± 0.85, indicating that most village clinicians liked their jobs and were interested in their careers. Consistent with our findings, previous studies in other countries showed that most medical workers had high levels of internal work motivation.31 32 33 Although few empirical studies have evaluated internal work motivation among village clinicians, there is some evidence that rural primary care clinicians in China experience meaning and pleasure from engaging in medical work.13 Additionally, similar to results in other countries, we found that among the intrinsic factors, most village clinicians believed that a love for their career motivated them to work.33 34 35
 
Consistent with data from studies in other countries,17 19 our empirical analysis demonstrated significant positive correlations between internal work motivation and healthcare quality among village clinicians in rural China. According to affect heuristic theory, this relationship presumably arises because individuals rely on emotions to make behavioural decisions, and a positive attitude will lead to higher-quality behaviours.36 Empirical results from other countries support this assumption. A study in the United States demonstrated the importance of internal work motivation in medical behaviour decisions; clinicians with higher internal motivation were more willing to maintain higher quality in their work.17 The findings of studies in developing countries, such as Ghana and Indonesia, also indicated that work motivation can significantly improve the quality of medical services provided by clinicians.19 37 Thus, efforts to stimulate internal work motivation among village clinicians may help to improve their healthcare quality.
 
The results of heterogeneity analysis showed that the positive effect of internal work motivation on healthcare quality varied according to clinician workload and financial incentives. Specifically, internal work motivation had a stronger positive effect on clinical performance among village clinicians who had a lighter workload (fewer patients per month). This is presumably because clinicians with a heavier workload (more patients) are more likely to experience burnout38 and a decreased sense of autonomy,39 40 which could reduce internal motivation and ultimately lead to a decline in work performance.14 39 41 Additionally, compared with village clinicians who did not receive financial incentives, clinicians who received financial incentives experienced a stronger positive effect on healthcare quality because of their internal work motivation. Studies of clinicians, combined with the results of theoretical analyses in other fields (ie, motivational synergy theory and self-decision theory), suggest that the provision of financial incentives encourages a belief of greater competence among clinicians; this belief, in conjunction with internal work motivation, enables clinicians to maintain high quality in their work.13 14 41 42 Previous empirical studies have also demonstrated that performance-related financial incentives can improve internal work motivation among employees, leading to improvements in performance.43
 
The results of these heterogeneity analyses support efforts to enhance the positive effect of internal work motivation on healthcare quality by providing appropriate incentives for clinicians. Consistent with this perspective, the Chinese Government has been implementing incentive programmes during the past decade to improve healthcare quality among primary care clinicians in rural China.10 11 For example, the government is actively restructuring the salary and performance system, while asserting that healthcare systems at all levels should engage in combined efforts to provide additional financial incentives.44
 
To further promote internal work motivation among clinicians and improve their work performance, we recommend revising governmental incentive policies, building on existing policies. Specifically, medical institutions at all levels should establish performance accountability,45 and emphasis should be placed on including physician performance in assessments to incentivise high-quality healthcare. Furthermore, medical institutions at all levels should provide additional financial incentives to clinicians based on assessments of patient experiences. These programmes could strengthen the positive effect of internal motivation on work performance among clinicians and improve their healthcare quality. Additionally, the workload of primary clinicians should be carefully managed to preserve the positive effect of their intrinsic motivation on job performance.
 
This study had a few limitations. First, it was a cross-sectional study, and the results represent correlations rather than causal relationships. Second, because we randomly selected samples from Sichuan, Shaanxi, and Anhui provinces, our results may not be fully representative of village clinicians and village clinics throughout rural China. Third, the reported level of internal work motivation may have been overestimated because this variable was self-reported by village clinicians.
 
Conclusion
Overall, healthcare quality was poor among village clinicians in rural China. Furthermore, there were positive correlations between internal work motivation and healthcare quality among rural village clinicians; these positive correlations were stronger among clinicians with financial incentives and lighter workload. Our findings suggest that the Chinese Government should implement policies to provide financial incentives for clinicians, with the goal of enhancing internal work motivation among village clinicians and improving their healthcare quality.
 
Author contributions
Concept or design: Q Gao, Y Shi, L Peng.
Acquisition of data: L Peng, S Song, Y Zhang.
Analysis or interpretation of data: L Peng, Q Gao, S Song.
Drafting of the manuscript: L Peng, Q Gao, Y Shi.
Critical revision of the manuscript for important intellectual content: All authors.
 
Conflicts of interest
As an International Editorial Advisory Board member of the journal, Y Shi was not involved in the peer review process. Other authors have disclosed no conflicts of interest.
 
Acknowledgement
All authors thank the standardised patients and investigators for their contribution and hard work.
 
Funding/support
This work received funding from 111 Project (Grant No.: B16031), National Natural Science Foundation of China (Grant No.: 72203134) and Innovation Capability Support Program of Shaanxi, China (Grant No.: 2022KRM007).
 
Ethics approval
Ethical approval was obtained from the Institutional Review Board of Sichuan University, China (Protocol No.: K2015025). The board approved the verbal consent procedure. Participants in this study were informed of the survey procedure and consented to publication.
 
References
1. Babiarz KS, Miller G, Yi H, Zhang L, Rozelle S. China’s new cooperative medical scheme improved finances of township health centers but not the number of patients served. Health Aff (Millwood) 2012;31:1065-74. Crossref
2. Shi Y, Xue H, Wang H, Sylvia S, Medina A, Rozelle S. Measuring the quality of doctors’ health care in rural China: an empirical research using standardized patients [in Chinese]. Stud Labour Econ 2016;4:48-71.
3. Guo W, Sylvia S, Umble K, Chen Y, Zhang X, Yi H. The competence of village clinicians in the diagnosis and treatment of heart disease in rural China: a nationally representative assessment. Lancet Reg Health West Pac 2020;2:100026. Crossref
4. Tan J, Yao Y, Wang Q. Comparative analysis of the development and service utilisation of health resources in Guangxi township hospitals and the whole country [in Chinese]. Soft Sci Health 2019;33:46-50.
5. Sylvia S, Shi Y, Xue H, et al. Survey using incognito standardized patients shows poor quality care in China’s rural clinics. Health Policy Plan 2015;30:322-33. Crossref
6. Li X, Lu J, Hu S, et al. The primary health-care system in China. Lancet 2017;390:2584-94. Crossref
7. Yin X, Song F, Gong Y, et al. A systematic review of antibiotic utilization in China. J Antimicrob Chemother 2013;68:2445-52. Crossref
8. Virtual Health Library. Using indicators to measure country pharmaceutical situations: fact book on WHO level I and level II monitoring indicators. 2006. Available from: https://pesquisa.bvsalud.org/portal/resource/pt/mis-19254. Accessed 30 Sep 2021.
9. Communist Party of China Central Committee, State Council, People’s Republic of China. “Healthy China 2030” Programme [in Chinese]. 2016. Available from: http://www.gov.cn/zhengce/2016-10/25/content_5124174.htm. Accessed 30 Sep 2021.
10. State Council, People’s Republic of China. “Thirteenth Five-Year” Sanitation and Health Plan [in Chinese]. 2017. Available from: http://www.gov.cn/zhengce/content/2017-01/10/content_5158488.htm. Accessed 30 Sep 2021.
11. National Health and Family Planning Commission, National Development and Reform Commission, Ministry of Education of the People’s Republic of China, Ministry of Finance of the People’s Republic of China, National Administration of Traditional Chinese Medicine. National Rural Doctor Education Plan (2011-2020) [in Chinese]. 2013. Available from: http://www.gov.cn/gzdt/2013-10/30/content_2518099.htm. Accessed 20 Jan 2023.
12. State Council, People’s Republic of China. The 15th Meeting of the Standing Committee of the 13th National People’s Congress. Basic Medical Hygiene and Health Promotion Law of the People’s Republic of China [in Chinese]. 2019. Available from: http://www.gov.cn/xinwen/2019-12/29/content_5464861.htm. Accessed 5 Oct 2021.
13. Yuan B, Meng Q. Behaviour determinants of rural health workers: based on work motivation theory analysis [in Chinese]. Chin Health Econ 2012;31:50-2.
14. Kao AC. Driven to care: aligning external motivators with intrinsic motivation. Health Serv Res 2015;50 Suppl 2:2216-22. Crossref
15. Yuan B, Meng Q, Hou Z, Sun X, Song K. Analysis on incentive mechanism and motivation of rural health providers [in Chinese]. Chin J Health Policy 2010;3:3-9.
16. Yuan S, Meng Q, Sun X. Studying on the quality of medical services in township health centres in view of structural quality [in Chinese]. Chin Health Serv Manage 2012;29:841-4.
17. Green EP. Payment systems in the healthcare industry: an experimental study of physician incentives. J Econ Behav Organ 2014;106:367-78. Crossref
18. Mangkunegara AP, Agustine R. Effect of training, motivation and work environment on physicians’ performance. Acad J Interdiscip Stud 2016;5:173-88. Crossref
19. Al Aluf W, Sudarsih S, Musemedi DP, Supriyadi S. Assessing the impact of motivation, job satisfaction, and work environment on the employee performance in healthcare services. Int J Sci Technol Res 2017;6:337-41.
20. Hou X, Lu F. Effects of work values of millennial employees, intrinsic motivation on job performance: the moderating effect of organizational culture. Manage Rev 2018;30:157-68.
21. Li W, Mei J. Intrinsic motivation and employee performance: based on the effects of work engagement as intermediary [in Chinese]. Manage Rev 2013;25:160-7.
22. Liao J, Jing Z, Liu W, Wang X. Promotion opportunity stagnation and job performance: the mediating role of intrinsic motivation and perceived insider status. Ind Eng Manage 2015;20:15-21.
23. Das J, Holla A, Das V, Mohanan M, Tabak D, Chan B. In urban and rural India, a standardized patient study showed low levels of provider training and huge quality gaps. Health Aff (Millwood) 2012;31:2774-84. Crossref
24. Rethans JJ, Gorter S, Bokken L, Morrison L. Unannounced standardised patients in real practice: a systematic literature review. Med Educ 2007;41:537-49. Crossref
25. Amabile TM, Mueller J. Studying creativity, its processes, and its antecedents: an exploration of the componential theory of creativity. In: Zhou J, Shalley CE, editors. Handbook of Organizational Creativity. New York: Lawrence Erlbaum Associates; 2008: 33-64.
26. Lee KI, Koval JJ. Determination of the best significance level in forward step-wise logistic regression. Commun Stat Simul Comput 1997;2:559-75. Crossref
27. Maneejuk P, Yamaka W. Significance test for linear regression: how to test without P-values? J Appl Stat 2020;31:827-45. Crossref
28. Feise R. Do multiple outcome measures require p-value adjustment? BMC Med Res Methodol 2002;2:8. Crossref
29. Feng S. Analysis of the quality of primary care and the influencing factors in rural Guangdong Province—based on demander perspective [in Chinese]. Chin Health Serv Manage 2016;33:824-7.
30. Daniels B, Dolinger A, Bedoya G, et al. Use of standardised patients to assess quality of healthcare in Nairobi, Kenya: a pilot, cross-sectional study with international comparisons. BMJ Glob Health 2017;2:e000333.
31. Settineri S, Merlo EM, Frisone F, et al. The experience of health and suffering in the medical profession. Mediterr J Clin Psychol 2018;6:1-14.
32. Toode K, Routasalo P, Helminen M, Suominen T. Hospital nurses’ individual priorities, internal psychological states and work motivation. Int Nurs Rev 2014;61:361-70. Crossref
33. Tsounis A, Sarafis P, Bamidis PD. Motivation among physicians in Greek public health-care sector. Br J Med Med Res 2014;4:1094-105. Crossref
34. Ferraro T, dos Santos NR, Moreira JM, Pais L. Decent work, work motivation, work engagement and burnout in physicians. Int J Appl Posit Psychol 2020;5:13-35. Crossref
35. Lochner L, Wieser H, Mischo-Kelling M. A qualitative study of the intrinsic motivation of physicians and other health professionals to teach. Int J Med Educ 2012;3:209-15. Crossref
36. Slovic P, Finucane ML, Peters E, MacGregor DG. The affect heuristic. Eur J Oper Res 2007;177:1333-52. Crossref
37. Alhassan RK, Spieker N, van Ostenberg P, Ogink A, Nketiah-Amponsah E, de Wit TF. Association between health worker motivation and healthcare quality efforts in Ghana. Hum Resour Health 2013;11:37. Crossref
38. Ward ZD, Morgan ZJ, Peterson LE. Family physician burnout does not differ with rurality. J Rural Health 2021;37:755-61. Crossref
39. Wang Y, Hu XJ, Wang HH, et al. Follow-up care delivery in community-based hypertension and type 2 diabetes management: a multi-centre, survey study among rural primary care physicians in China. BMC Fam Pract 2021;22:224. Crossref
40. Shirom A, Nirel N, Vinokur AD. Work hours and caseload as predictors of physician burnout: the mediating effects by perceived workload and by autonomy. Appl Psychol 2010;59:539-65. Crossref
41. Deci EL, Ryan RM. Intrinsic Motivation and Self-Determination in Human Behavior. New York: Springer; 1985. Crossref
42. Amabile TM. Motivational synergy: toward new conceptualizations of intrinsic and extrinsic motivation in the workplace. Hum Resour Manage Rev 1993;3:185-201. Crossref
43. Eisenberger R, Aselage J. Incremental effects of reward on experienced performance pressure: positive outcomes for intrinsic interest and creativity. J Organiz Behav 2009;30:95-117. Crossref
44. Qin J, Li S, Lin C. Reform progress and development strategy of incentive mechanism for training and use of general practitioners in China [in Chinese]. Chin Gen Pract 2020;23:2351-8.
45. Li X, Krumholz HM, Yip W, et al. Quality of primary health care in China: challenges and recommendations. Lancet 2020;395:1802-12. Crossref

Cost-minimisation analysis of intravenous versus subcutaneous trastuzumab regimen for breast cancer management in Hong Kong

Hong Kong Med J 2023 Feb;29(1):16-21 | Epub 3 Feb 2023
© Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
 
ORIGINAL ARTICLE  CME
Vivian WY Lee, PharmD1; Franco WT Cheng, MClinPharm2
1 Centre for Learning Enhancement And Research, The Chinese University of Hong Kong, Hong Kong
2 Department of Pharmacology and Pharmacy, Li Ka Shing Faculty of Medicine, The University of Hong Kong, Hong Kong
 
Corresponding author: Prof Vivian WY Lee (vivianlee@cuhk.edu.hk)
 
 
Abstract
Introduction: In 2017, breast cancer was the most common cancer and third leading cause of cancer death among women in Hong Kong. Approximately 20% of patients were human epidermal growth factor receptor-2 (HER2)-positive. This study was conducted to investigate cost differences between intravenous and subcutaneous trastuzumab regimens in Hong Kong using medical resources utilisation data from other countries.
 
Methods: A cost-minimisation model was developed to compare the cost of total care, including direct medical cost and full-time equivalent (FTE) hours. The drug acquisition cost was obtained from the manufacturer, whereas the costs for hospitalisation and clinic visits were acquired from the Hong Kong Gazette. Time (in FTE hours) was determined by literature review. All costs were expressed in US dollars (US$1 = HK$7.8). Costs were not discounted because of the short time horizon. One-way deterministic sensitivity analysis was performed to identify the effects of changes in drug acquisition cost, changes in FTE hours (based on confidence intervals reported), and changes in body weight (±20%).
 
Results: Literature review indicated that 0.18 FTE hour of nursing time (7.9 hours) and 0.14 FTE hour of pharmacist time (6.2 hours) could be saved each week if the subcutaneous formulation was used. Using data in 2017, after 18 cycles of treatment with subcutaneous trastuzumab, the drug acquisition and healthcare professional time costs were reduced by US$9451.28 and US$566.16, respectively, yielding an annual savings of over US$8 million.
 
Conclusion: The subcutaneous formulation of trastuzumab is a potential cost-saving therapy for HER2-positive breast cancer patients in Hong Kong. The drug acquisition cost was the parameter with the greatest effect on the total cost of treatment.
 
 
New knowledge added by this study
  • The results of this study suggest that the subcutaneous formulation of trastuzumab would be a cost-saving therapy for HER2-positive breast cancer patients in Hong Kong.
  • The drug acquisition cost was the parameter with the greatest effect on the total cost of treatment.
Implications for clinical practice or policy
  • The high drug acquisition cost of trastuzumab may prevent patients from receiving effective treatment.
  • The subcutaneous formulation of trastuzumab is expected to remain more cost-effective, despite the potential emergence of biosimilar trastuzumab.
 
 
Introduction
In 2017, breast cancer was the most common cancer and third leading cause of cancer death among women in Hong Kong.1 Additionally, an estimated 20% of breast cancers in Hong Kong were human epidermal growth factor receptor-2 (HER2)-positive.2 3
 
Intravenous (IV) trastuzumab, in combination with chemotherapy, is licensed for the treatment of HER2-positive early-stage breast cancer and metastatic breast cancer. It must be reconstituted into solution; the loading dose is infused over 90 minutes, and each maintenance dose is infused over 30 minutes.4 Additionally, IV trastuzumab is dosed according to each patient’s body weight, with a loading dose of 8 mg/kg followed by a maintenance dose of 6 mg/kg every 3 weeks.4 This regimen consumes considerable healthcare resources, including drug preparation and administration time, clinic and chair time, and physician time dedicated to patient interaction.5
 
A fixed-dose subcutaneous (SC) formulation of trastuzumab was developed to allow drug administration over approximately 5 minutes, which is much shorter than the duration of IV infusion. The 600-mg dose of SC trastuzumab every 3 weeks is non-inferior to the IV formulation with respect to efficacy and tolerability.6 7 Furthermore, approximately 90% of patients preferred SC over IV administration of trastuzumab in the PrefHer (Preference for subcutaneous or intravenous administration of trastuzumab in patients with HER2-positive early breast cancer) randomised crossover trials,8 9 which were designed to assess patient preference and healthcare professional satisfaction with both treatment options.
 
Data from other countries have demonstrated that the SC formulation of trastuzumab requires less time for drug preparation and administration and uses fewer consumables.10 11 12 13 A cost-minimisation analysis (CMA) study in Greece demonstrated that the total cost of therapy per patient was 21 870 euros (€) with the SC formulation of trastuzumab versus €23 118 with the IV formulation; the investigators concluded that use of the SC formulation would provide cost savings for the Greek healthcare system.10 A study in Spain revealed similar findings: use of the SC formulation of trastuzumab led to cost savings of 19.4% to 28.8% in the hospital.11 Additionally, a time-and-motion study in New Zealand compared medical resource utilisation between the IV and SC formulations of trastuzumab in patients with HER2-positive breast cancer; the potential cost saving was NZ$96.94 per patient per cycle.12 Furthermore, a time-and-motion sub-study13 from the PrefHer trials involving eight countries (Canada, France, Switzerland, Denmark, Italy, Russia, Spain, and Turkey) demonstrated time savings in patient chair time, administration by healthcare professionals, and drug preparation.
 
The SC formulation of trastuzumab is expected to provide cost savings in other countries. However, healthcare systems and modes of clinical services differ between Hong Kong and other countries. Therefore, this study was conducted to investigate cost differences between IV and SC trastuzumab regimens in Hong Kong medical settings, using medical resources utilisation data from other countries.
 
Methods
Cost methods and data sources
A CMA model was developed to compare the cost of total care. The CMA approach was used because the clinical efficacy and safety profiles of IV and SC trastuzumab regimens are similar, as demonstrated in previous studies7 14 15; this fulfils the CMA requirement that the two treatments demonstrate similar efficacy. In the CMA, we compared direct medical costs related to the IV and SC trastuzumab regimens, which produce equivalent health outcomes; a CMA focuses solely on selection of the least costly option. In this study, the CMA was conducted from a hospital perspective. All direct medical costs and full-time equivalent (FTE) hours were included. Drugs, clinic visits for drug administration, specialist out-patient clinic visits, and consumables were regarded as direct medical costs. The time horizon was 18 cycles of treatment, which mimics the duration of treatment for early-stage HER2-positive breast cancer. Drug acquisition cost data were obtained from the manufacturer, whereas costs for hospitalisation and clinic visits were acquired from the 2017 Hong Kong Gazette.16 The drug acquisition cost was based on the doses used in previous clinical trials: an IV loading dose of 8 mg/kg and maintenance dose of 6 mg/kg every 3 weeks versus an SC fixed dose of 600 mg every 3 weeks. A mean body weight of 57.3 kg was used, based on data from the 2016 Hong Kong Cancer Registry.3
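The dosing arithmetic behind the drug acquisition comparison can be sketched as follows. Only total drug quantities are computed here; the actual acquisition costs were manufacturer-supplied and are not reproduced.

```python
# Dosing arithmetic only; actual acquisition costs came from the
# manufacturer and are not reproduced here.

CYCLES = 18          # treatment course for early-stage disease
WEIGHT_KG = 57.3     # mean body weight (2016 Hong Kong Cancer Registry)

# IV: weight-based 8 mg/kg loading dose, then 6 mg/kg every 3 weeks
iv_total_mg = 8 * WEIGHT_KG + (CYCLES - 1) * 6 * WEIGHT_KG
# SC: fixed 600 mg every 3 weeks, independent of body weight
sc_total_mg = 600 * CYCLES
```

Note that at this mean body weight the SC course delivers more milligrams in total (10 800 mg vs 6303 mg), so the cost comparison depends on the per-dose pricing of the fixed SC formulation rather than on total milligrams alone.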
 
Estimated FTE hour values were obtained from previous literature. These values were regarded as the time (in hours) required for drug preparation and administration, divided by 44 hours, the weekly average working hours for such tasks. The FTE hour values were then converted to monetary values, calculated as the median hourly rate received by individuals in each position. In Hong Kong, nurses and pharmacists are mainly involved in drug preparation and administration; thus, the salaries of these positions were used for estimation of FTE hour values.
 
All costs were expressed in US dollars (US$1 = HK$7.8), using 2016 as the fiscal year. Because of the short time horizon in the study, no costs were discounted.
 
Literature review
Medical resources and FTE hour values were determined by literature review in Embase and MEDLINE, using the key words ‘subcutaneous’, ‘trastuzumab’, ‘time’, ‘cost’, and ‘medical resources’.
 
Statistical analyses
The CMA was conducted from the healthcare payer perspective. All continuous variables were described as means ± standard deviations and medians with ranges.
 
A drug budget impact forecast analysis was performed to determine how changes in the total cost of treatment regimens, including direct medical costs and FTE hours, would impact healthcare expenditures in Hong Kong. Each individual parameter, namely drug acquisition cost for each formulation (±20%), patient body weight (±20%), and time and consumables reported in the literature (based on confidence intervals reported) were analysed independently within specified ranges, whereas other factors were fixed at base-case values. The analysis parameters were chosen based on the findings in previous cost-effectiveness studies.17 A simulation model was used to run 10 000 iterations of the forecast model; for each iteration, model parameters were input as shown in Table 1. We assumed that cost changes were consistent with the beta distribution around the mean. One-way deterministic sensitivity analysis was also performed to evaluate the extent to which the total cost would be affected by changes in the drug acquisition cost for each formulation (±20%), changes in times and consumables obtained from literature (based on confidence intervals reported), and changes in body weight (±20%); this approach is consistent with the methodology used in another cost-effectiveness analysis focused on trastuzumab.17 Figure 1 summarises the analysis process of this study.
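The probabilistic part of this procedure can be sketched as follows. All numeric inputs here are hypothetical placeholders, and Beta(2, 2) is one symmetric choice consistent with "beta distribution around the mean"; the authors' exact shape parameters are not reported:

```python
import random

def sample_around_mean(mean, spread=0.20, a=2.0, b=2.0):
    """Draw from a Beta(a, b) rescaled to [mean*(1-spread), mean*(1+spread)].

    A symmetric Beta(2, 2) keeps the draws centred on the base-case mean.
    """
    lo, hi = mean * (1.0 - spread), mean * (1.0 + spread)
    return lo + (hi - lo) * random.betavariate(a, b)

def forecast_savings(iv_cost, sc_cost, staff_saving, n_iter=10_000, seed=1):
    """Monte Carlo distribution of per-patient savings (IV minus SC, plus staff time)."""
    random.seed(seed)
    diffs = [
        sample_around_mean(iv_cost) - sample_around_mean(sc_cost)
        + sample_around_mean(staff_saving)
        for _ in range(n_iter)
    ]
    return sum(diffs) / n_iter, min(diffs), max(diffs)
```

The one-way deterministic sensitivity analysis corresponds to fixing all but one argument at its base-case mean and moving that argument to the ends of its stated range.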
 

Table 1. Parameters and costs of the drug budget impact forecast model
 

Figure 1. Flowchart of the analysis process
 
Results
In total, 11 studies were identified, eight of which were eligible for analysis.12 18 19 20 21 22 23 24 Three studies were excluded because they did not report the time required for administration or preparation. Six of the eight eligible studies reported both pharmacist time for preparation and nursing time for administration of IV and SC trastuzumab; the remaining two reported only the time differences between the two formulations. Among the six studies that reported preparation time, four reported the total drug preparation time required for IV and SC trastuzumab, whereas the remaining two reported only time differences. If the SC formulation were used, 0.18 FTE hour of nursing time (7.9 hours) and 0.14 FTE hour of pharmacist time (6.2 hours) could be saved each week. Table 2 summarises the findings from these studies.
 

Table 2. Summary of findings on healthcare professional time
 
After 18 cycles of treatment with SC trastuzumab, the drug acquisition and healthcare professional time costs were reduced by US$9451.28 and US$566.16, respectively, compared with IV trastuzumab. Therefore, US$10 017.44 could be saved for each patient who completed 18 cycles of treatment. The cost of consumables was excluded because only two studies reported this information, and the contributions to overall costs were minimal (NZ$15.2712 and GBP0.6421, respectively). Table 3 summarises the direct medical costs of IV and SC formulations.
 

Table 3. Total cost of care for 18 cycles of treatment with intravenous trastuzumab versus subcutaneous trastuzumab
 
Sensitivity analysis
The drug budget impact forecast model was most affected by body weight and drug acquisition cost. Cost differences between the IV and SC formulations were reduced by decreases in body weight and IV trastuzumab cost, as well as an increase in SC trastuzumab cost. The effects of changes in nursing time and pharmacist time were smaller. Table 1 summarises the model parameters, and Figure 2 illustrates the effects of each variable on cost differences.
 

Figure 2. Tornado diagram of factors affecting total cost of treatment
 
Drug budget impact forecast
In 2017, 4373 women in Hong Kong were diagnosed with invasive breast cancer,1 and approximately 20% of them were HER2-positive.2 3 Furthermore, trastuzumab was the most commonly used targeted therapy (95.3%).3 Assuming that the SC formulation had been used instead of the IV formulation for all HER2-positive patients receiving trastuzumab, these 2017 figures imply an annual saving of over US$8.3 million in Hong Kong.
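That projection is a simple product of the figures quoted above; the function name is hypothetical, and the per-patient saving is the US$10 017.44 derived earlier:

```python
def projected_annual_saving(new_cases=4373, her2_positive_share=0.20,
                            trastuzumab_uptake=0.953, saving_per_patient=10017.44):
    """Annual saving if every HER2-positive trastuzumab recipient switched to SC."""
    return new_cases * her2_positive_share * trastuzumab_uptake * saving_per_patient

# projected_annual_saving() is roughly US$8.35 million, ie, 'over US$8.3 million'
```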
 
Discussion
The results of this study suggest that SC trastuzumab would be cost-saving relative to its IV counterpart in Hong Kong. Even if lower-cost biosimilar trastuzumab becomes available, the SC formulation will remain less expensive unless there is a substantial reduction in the acquisition cost of IV trastuzumab.
 
As body weight decreases, the necessary dosage and corresponding expenditures are expected to decrease. Paradoxically, ≤20% increases in body weight had a neutral effect in the analysis. This result could be related to a substantial amount of drug wastage when using weight-based IV trastuzumab, which is consistent with previous findings.19 25 Therefore, further studies are needed to determine the optimal route of administration for patients who are underweight or do not require full doses of trastuzumab because of their clinical conditions.
 
Although the SC formulation is expected to save time for healthcare professionals,26 27 28 the present analysis suggests that its contribution to the total cost of care is minimal. The cost of drug acquisition has the greatest effect on financial burden.
 
The use of data from previous time-and-motion studies in other countries may not be appropriate for medical settings in Hong Kong. Further studies should be conducted in Hong Kong to estimate the actual cost savings with respect to healthcare professional time, although theoretical time savings may not accurately represent actual time savings because of clinical activities conducted during administration of trastuzumab.29 Furthermore, data from other countries exhibited wide distributions in terms of standard deviation and range. Nevertheless, the influence of the SC formulation on the total cost-saving effect may be limited, as demonstrated in the sensitivity analysis.
 
Although the costs of clinic visits and chemotherapy were assumed to be identical throughout 18 cycles of treatment between the two formulations, some patients can receive SC trastuzumab in ambulatory care settings. Thus, the mean savings may have been underestimated in our model.
 
There were several limitations in this study. First, because of the small number of studies identified in the literature review, consumables could not be included in the CMA. Second, societal costs and patient preferences were not considered because such information is unavailable in Hong Kong; a more patient-centred approach would provide greater insights. Third, time-and-motion analysis and waste handling in Hong Kong were not considered; these factors may have a specific impact on drug preparation and administration times and costs. Fourth, the costs of adverse drug reactions were not included because these costs were assumed to be equal for the IV and SC trastuzumab regimens. However, this assumption may be incorrect, particularly with regard to infusion-related reactions.
 
Conclusion
The results of this study suggest that the SC formulation of trastuzumab would be a cost-saving therapy for HER2-positive breast cancer patients in Hong Kong. The drug acquisition cost was the parameter with the greatest effect on the total cost of treatment.
 
Author contributions
Concept or design: VWY Lee.
Acquisition of data: FWT Cheng.
Analysis or interpretation of data: Both authors.
Drafting of the manuscript: FWT Cheng.
Critical revision of the manuscript for important intellectual content: VWY Lee.
 
Both authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
 
Conflicts of interest
Both authors disclosed no conflicts of interest.
 
Declaration
The datasets generated and/or analysed in this study are available from the corresponding author on reasonable request.
 
Funding/support
This research received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
 
Ethics approval
Not applicable because this study did not involve human participants.
 
References
1. Hong Kong Cancer Registry, Hospital Authority, Hong Kong SAR Government. Female breast cancer in 2017. Available from: https://www3.ha.org.hk/cancereg/pdf/factsheet/2017/breast_2017.pdf. Accessed 11 May 2020.
2. Yau TK, Sze H, Soong IS, Hioe F, Khoo US, Lee AW. HER2 overexpression of breast cancers in Hong Kong: prevalence and concordance between immunohistochemistry and in-situ hybridisation assays. Hong Kong Med J 2008;14:130-5.
3. Hong Kong Breast Cancer Foundation. Hong Kong Breast Cancer Registry Report No. 8. 2016. Available from: http://www.hkbcf.org/download/bcr_report8/hkbcf_report_2016_full_report.pdf. Accessed 21 May 2017.
4. Genentech. Highlights of prescribing information. 2016. Available from: http://www.gene.com/download/pdf/herceptin_prescribing.pdf. Accessed 8 Aug 2016.
5. Kruse GB, Amonkar MM, Smith G, Skonieczny DC, Stavrakas S. Analysis of costs associated with administration of intravenous single-drug therapies in metastatic breast cancer in a U.S. population. J Manag Care Pharm 2008;14:844-57.
6. Ismael G, Hegg R, Muehlbauer S, et al. Subcutaneous versus intravenous administration of (neo)adjuvant trastuzumab in patients with HER2-positive, clinical stage I-III breast cancer (HannaH study): a phase 3, open-label, multicentre, randomised trial. Lancet Oncol 2012;13:869-78.
7. Jackisch C, Kim SB, Semiglazov V, et al. Subcutaneous versus intravenous formulation of trastuzumab for HER2-positive early breast cancer: updated results from the phase III HannaH study. Ann Oncol 2015;26:320-5.
8. Pivot X, Gligorov J, Müller V, et al. Preference for subcutaneous or intravenous administration of trastuzumab in patients with HER2-positive early breast cancer (PrefHer): an open-label randomised study. Lancet Oncol 2013;14:962-70.
9. Pivot X, Gligorov J, Müller V, et al. Patients’ preferences for subcutaneous trastuzumab versus conventional intravenous infusion for the adjuvant treatment of HER2-positive early breast cancer: final analysis of 488 patients in the international, randomized, two-cohort PrefHer study. Ann Oncol 2014;25:1979-87.
10. Mylonas C, Kourlaba G, Fountzilas G, Skroumpelos A, Maniadakis N. Cost-minimization analysis of trastuzumab intravenous versus trastuzumab subcutaneous for the treatment of patients with HER2+ early breast cancer and metastatic breast cancer in Greece. Value Health 2014;17:A640-1.
11. Gutierrez F, Nazco G, Viña M, Bullejos M, Gonzalez I, Valcarcel C. Economic impact of using subcutaneous trastuzumab. Value Health 2014;17:A641.
12. North RT, Harvey VJ, Cox LC, Ryan SN. Medical resource utilization for administration of trastuzumab in a New Zealand oncology outpatient setting: a time and motion study. Clinicoecon Outcomes Res 2015;7:423-30.
13. De Cock E, Pivot X, Hauser N, et al. A time and motion study of subcutaneous versus intravenous trastuzumab in patients with HER2-positive early breast cancer. Cancer Med 2016;5:389-97.
14. Van den Nest M, Glechner A, Gold M, Gartlehner G. The comparative efficacy and risk of harms of the intravenous and subcutaneous formulations of trastuzumab in patients with HER2-positive breast cancer: a rapid review. Syst Rev 2019;8:321.
15. Jackisch C, Stroyakovskiy D, Pivot X, et al. Subcutaneous vs intravenous trastuzumab for patients with ERBB2-positive early breast cancer: final analysis of the HannaH phase 3 randomized clinical trial. JAMA Oncol 2019;5:e190339.
16. Government Logistics Department, Hong Kong SAR Government. Hospital Authority Ordinance (Chapter 113). Revisions to list of charges. Available from: https://www.gld.gov.hk/egazette/pdf/20172124/egn201721243884.pdf. Accessed 30 May 2017.
17. Kurian AW, Thompson RN, Gaw AF, Arai S, Ortiz R, Garber AM. A cost-effectiveness analysis of adjuvant trastuzumab regimens in early HER2/neu-positive breast cancer. J Clin Oncol 2007;25:634-41.
18. Olofsson S, Norrlid H, Karlsson E, Wilking U, Ragnarson Tennvall G. Societal cost of subcutaneous and intravenous trastuzumab for HER2-positive breast cancer—an observational study prospectively recording resource utilization in a Swedish healthcare setting. Breast 2016;29:140-6.
19. Ponzetti C, Canciani M, Farina M, Era S, Walzer S. Potential resource and cost saving analysis of subcutaneous versus intravenous administration for rituximab in non-Hodgkin’s lymphoma and for trastuzumab in breast cancer in 17 Italian hospitals based on a systematic survey. Clinicoecon Outcomes Res 2016;8:227-33.
20. De Cock E, Pan YI, Tao S, Baidin P. Time savings with trastuzumab subcutaneous (SC) injection versus trastuzumab intravenous (IV) infusion: a time and motion study in 3 Russian centers. Value Health 2014;17:A653.
21. Nawaz S, Samanta K, Lord S, Diment V, Mcnamara S. Cost savings with Herceptin® (trastuzumab) SC vs IV administration: a time & motion study. Breast 2013;22(S1):S112.
22. De Cock E, Tao S, Alexa U, Pivot X, Knoop A. Abstract P5-15-07: Time savings with trastuzumab subcutaneous (SC) injection vs. trastuzumab intravenous (IV) infusion: first results from a Time-and-Motion study (T&M). Cancer Res 2014;72(24 Suppl):P5-15-07.
23. De Cock E, Tao S DM-P, Millar D CN. Time savings with trastuzumab subcutaneous (SC) injection vs. trastuzumab intravenous (IV) infusion: a time and motion study in 5 Canadian centres. Proceedings of the Canadian Association for Population Therapeutics (CAPT) Annual Conference; 2013 Nov 17-19; Toronto, Canada.
24. Samanta K, Moore L, Jones G, Evason J, Owen G. PCN39 potential time and cost savings with herceptin (trastuzumab) subcutaneous (SC) injection versus herceptin intravenous (IV) infusion: results from three different English patient settings. Value Health 2012;15:A415.
25. Nestorovska A, Naumoska Z, Grozdanova A, et al. Subcutaneous vs intravenous administration of trastuzumab in HER2+ breast cancer patients: a Macedonian cost-minimization analysis. Value Health 2015;18:A463.
26. Rojas L, Muñiz S, Medina L, et al. Cost-minimization analysis of subcutaneous versus intravenous trastuzumab administration in Chilean patients with HER2-positive early breast cancer. PLoS One 2020;15:e0227961.
27. O’Brien GL, O’Mahony C, Cooke K, et al. Cost minimization analysis of intravenous or subcutaneous trastuzumab treatment in patients with HER2-positive breast cancer in Ireland. Clin Breast Cancer 2019;19:e440-51.
28. Lopez-Vivanco G, Salvador J, Diez R, et al. Cost minimization analysis of treatment with intravenous or subcutaneous trastuzumab in patients with HER2-positive breast cancer in Spain. Clin Transl Oncol 2017;19:1454-61.
29. Papadmitriou K, Trinh XB, Altintas S, Van Dam PA, Huizing MT, Tjalma WA. The socio-economical impact of intravenous (IV) versus subcutaneous (SC) administration of trastuzumab: future prospectives. Facts Views Vis Obgyn 2015;7:176-80.

Correlation between primary family caregiver identity and maternal depression risk in poor rural China

Hong Kong Med J 2022;28(6):457–65 | Epub 7 Dec 2022
© Hong Kong Academy of Medicine. CC BY-NC-ND 4.0
 
ORIGINAL ARTICLE (HEALTHCARE IN MAINLAND CHINA)
Correlation between primary family caregiver identity and maternal depression risk in poor rural China
N Wang, PhD; M Mu, MSc; Z Liu, MSc; Z Reheman, MSc; J Yang, PhD; W Nie, MSc; Y Shi, PhD; J Nie, PhD
Center for Experimental Economics in Education, Shaanxi Normal University, Xi’an, China
 
Corresponding author: Dr J Yang (jyang0716@163.com)
 
 
Abstract
Introduction: Prenatal and postpartum depression are important public health challenges because of their long-term adverse impacts on maternal and neonatal health. This study investigated the risk of maternal depression among pregnant and postpartum women in poor rural China, along with the correlation between primary family caregiver identity and maternal depression risk.
 
Methods: Pregnant women and new mothers were randomly selected from poor rural villages in the Qinba Mountains area in Shaanxi. Basic demographic information was collected regarding the women and their primary family caregivers. The Edinburgh Postnatal Depression Scale was used to identify women at risk of depression, and the Perceived Social Support Scale was used to evaluate perceived family support.
 
Results: This study included 220 pregnant women and 473 new mothers. The mean proportions of women at risk of prenatal and postpartum depression were 19.5% and 18.6%, respectively. Regression analysis showed that identification of the baby’s grandmother as the primary family caregiver was negatively correlated with maternal depression risk (β=-0.979, 95% confidence interval [CI]=-1.946 to -0.012, P=0.047). However, the husband’s involvement in that role was not significantly correlated with maternal depression risk (β=-0.499, 95% CI=-1.579 to 0.581, P=0.363). Identification of the baby’s grandmother as the primary family caregiver was positively correlated with family support score (β=0.967, 95% CI=-0.062 to 1.996, P=0.065).
 
Conclusion: Prenatal and postpartum depression are prevalent in poor rural China. The involvement of the baby’s grandmother as the primary family caregiver may reduce maternal depression risk, but the husband’s involvement in that role has no effect.
 
 
New knowledge added by this study
  • Prenatal and postpartum depression are prevalent in poor rural areas of China. Despite evidence regarding the importance of family support during prenatal and postpartum periods, husbands in poor rural China did not provide effective support.
  • There was a persistent risk of maternal depression during both prenatal and postpartum periods.
  • Maternal depression persists in the absence of external interventions.
Implications for clinical practice or policy
  • High-quality family support is necessary to ensure that pregnant women maintain good mental health. Compared with husbands, grandmothers may be better primary caregivers because they are experienced in terms of parenting and housework.
  • Husbands in poor rural China should receive training that enables them to provide effective maternal care.
 
 
Introduction
Maternal depression is a common mental health problem during the prenatal and postpartum periods. The World Health Organization estimates that approximately 10% of pregnant women and 13% of postpartum women worldwide have mental health problems, mainly depression.1 In China, the prevalence of maternal depression ranges from 8.2% to 28.5%.2 3 4 5 6 7 Women in urban areas have access to specialised maternity care services and mental health services that can help manage these mental health problems and difficulties.8 However, these commercialised services are usually expensive and distant from poor rural areas of China. Therefore, it is particularly important for pregnant and postpartum women in poor rural areas to rely on family and social relationships for reasonable care and support.
 
There is evidence that the level of perceived social support, particularly family support, is associated with a woman’s mental health status during pregnancy.9 10 China’s rapid societal and economic development has resulted in substantial changes to family structure in both urban and rural areas. For example, modern couples are more likely to live with only their children, rather than with family members from multiple generations.11 When grandparents are absent from a family’s daily life, the husband’s role becomes more important because he must engage more in housework12 and provide greater support.
 
The changes in primary family caregiver identity during the prenatal and postpartum periods reflect this transformation of family structure.13 Multiple studies in developed countries and in urban areas of China have suggested that husbands are able to care for their wives and children during pregnancy and after delivery; moreover, a husband’s companionship has a positive impact on the mental health status of his pregnant wife.2 10 14 However, in poor rural areas, no consensus has been reached concerning whether a husband can provide effective family support for his pregnant wife.15 For example, husbands usually have lower awareness of maternity care because of limited education and limited housework experience. Furthermore, in a traditional Chinese family with patrilocal features, the husband is the main worker and is responsible for the economic well-being of the family,16 whereas the wife stays at home and cares for the family. This stereotypical traditional household arrangement prevents some men from providing maternal care, regardless of their presence at home. Accordingly, the grandmother (the mother of the baby’s mother) becomes a possible caregiver for the mother and baby,17 although this arrangement may lead to mother-in-law conflict.18
 
Here, using data from a large-scale survey of pregnant and postpartum women in poor rural areas, we analysed the status of maternal mental health in poor rural areas, with family support as an intermediate variable, to understand the correlation between primary family caregiver identity and maternal depression risk.
 
Methods
Sampling
The data analysed in this study were collected during a survey of maternal and neonatal health and nutrition statuses among residents of poor rural villages in the Qinba Mountains area; the survey was conducted by Shaanxi Normal University from March 2019 to April 2019. The Qinba Mountains area spans six provincial-level regions: Gansu, Sichuan, Shaanxi, Chongqing, Henan, and Hubei; its primary portion is situated in southern Shaanxi. In 2019, the per capita annual disposable income in the Qinba Mountains area was RMB 11 443, similar to that of rural residents in poverty-stricken counties (RMB 11 567).19 In 2018, the mean poverty rate in this area was 3.6%; for comparison, the national mean was 1.7% and the rate in poverty-stricken counties was 4.5%.20 This study included women aged ≥18 years who were either pregnant (≥4 weeks of gestation) or in the postpartum period (0-6 months after delivery).
 
A multilevel cluster-based random sampling method was used in this study. First, 13 national-level poor counties in two prefectures in the Qinba Mountains area were selected. Then, a list of villages was obtained for each county, and the total numbers of pregnant women and households with babies aged ≤6 months in each village were counted with assistance from local government officials. Considering the financial limitations and overall feasibility of the study, villages with very small (<3) or very large (>15) numbers of eligible households were excluded. Stata 15.0 (StataCorp, College Station [TX], United States) was used for data analysis. The sample size was estimated to achieve a sampling standard error (SE) of 0.03 with a 95% confidence interval (CI), assuming an average incidence of 0.15 for the independent variables based on our pilot study. On this basis, 131 villages were randomly selected as sample villages, and all households in these villages that met the above criteria were considered eligible for the study.
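Under simple random sampling, the stated targets (incidence 0.15, SE 0.03) imply a minimum sample size via SE = √(p(1 − p)/n). The sketch below ignores the design effect of cluster sampling, which the authors do not report, so it is illustrative only:

```python
import math

def min_sample_size(p, target_se):
    """Smallest n with sqrt(p * (1 - p) / n) <= target_se, under simple random sampling."""
    return math.ceil(p * (1.0 - p) / target_se ** 2)

# min_sample_size(0.15, 0.03) gives 142, before any cluster design-effect inflation
```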
 
Data collection
The data used in this study were collected through face-to-face interviews. To ensure accuracy and consistency during data collection, enumerators were selected from a group of interested university students in Xi’an. The enumerators underwent extensive training, then completed a pilot study with 20 participants prior to formal data collection. Each eligible participant received a consent form with information regarding programme objectives, procedures, potential risks, and benefits, as well as an explanation of privacy protection. Participants provided oral consent for inclusion in the study before engaging in a face-to-face interview with a single enumerator. Each interview only involved the participant, and interruptions from other family members were avoided.
 
Assessments
Basic participant information
A questionnaire was used to collect basic participant information, including age, education level, and self-rated health status, along with whether the baby had been born and whether it was the firstborn child. The women were also asked whether they had access to any support groups where mothers could seek help and exchange parenting experiences. Furthermore, they were asked nine yes/no questions regarding family assets (eg, possession of a computer, an air conditioner, and a car). These questions were included to characterise maternal social interactions and household assets so that these factors could be controlled for in the regression analysis. Each participant’s decision-making power was measured using an eight-item scale compiled by Peterman et al.21 A higher score on the decision-making power scale was presumed to indicate greater autonomy concerning childcare and the management of other family issues.
 
Primary family caregivers
A questionnaire was used to collect information about all family members who had lived in the participant’s home for >3 months, because these members were the most likely primary caregivers and the most likely to influence maternal care. Each participant was asked to identify the family member who served as the primary family caregiver, ie, the member who provided the most care for the participant and her baby during the prenatal and postpartum periods. Considering the sample size and sample distribution, three primary family caregiver categories were used in this study: the husband; the baby’s grandmother (the mother of the baby’s mother or of the baby’s father); and other family members or no caregiver.
 
Edinburgh Postnatal Depression Scale
The Edinburgh Postnatal Depression Scale (EPDS) is a 10-item scale used to identify women at risk of maternal depression.22 23 The total EPDS score ranges from 0 to 30, where a higher score indicates a greater risk of depression. Although the original cut-off value was an EPDS score of ≥13 points, we used the standard cut-off value in China (≥9.5 points24 25) as an indicator of sufficient depression risk to merit psychiatric examination and possible treatment. Previous research has demonstrated that the EPDS has satisfactory reliability and validity. Specifically, Wang et al26 reported that the EPDS had a content validity ratio of 0.93 and good internal consistency (Cronbach’s α coefficient of 0.76). The correlation coefficients between the 10 individual item scores and the total score ranged from 0.37 to 0.67, with P values <0.01.
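The screening rule reduces to summing the ten item scores and comparing the total with the cut-off. A minimal sketch follows; the function name is hypothetical, with items scored 0 to 3 as in the standard EPDS:

```python
def epds_at_risk(item_scores, cutoff=9.5):
    """Flag depression risk: total of ten 0-3 item scores at or above the cut-off."""
    if len(item_scores) != 10 or not all(0 <= s <= 3 for s in item_scores):
        raise ValueError("EPDS has 10 items, each scored 0-3")
    return sum(item_scores) >= cutoff  # with integer items, >=9.5 means >=10
```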
 
Perceived Social Support Scale
The Perceived Social Support Scale, developed by Zimet et al27 and translated into Chinese by Jiang,28 is a 12-item self-assessment questionnaire that measures three sources of social support (ie, three subscales): family support, friends’ support, and other people’s support. Responses to questionnaire items are recorded using a seven-point Likert scale that ranges from ‘completely negative’ to ‘completely positive’ (1-7 points), indicating the respondent’s level of agreement with each item. The total score is 84 points (28 points per subscale), and a higher score indicates the receipt of greater social support. The Cronbach’s α coefficient of the scale is 0.88; the Cronbach’s α coefficients for family support, friends’ support, and other people’s support subscales are 0.81, 0.85, and 0.91, respectively.27 Because this study focused on family support, only the family support subscale was used as an intermediate variable to analyse the correlation between primary family caregiver identity and maternal depression risk.
 
Statistical methods
Stata 15.1 software was used to clean the data and perform statistical analyses. Descriptive statistics were presented as means ± standard deviations. The F test and t test were used to detect differences in depression scores among subgroups of women with different characteristics. Multiple linear regression was used to explore correlations between primary family caregiver identity and maternal depression risk or family support score. P values <0.05 were considered statistically significant. Additionally, SEs were clustered at the village level because observations within the same village are correlated, which could otherwise bias the SEs in multiple linear regression.
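Village-level clustering of SEs can be illustrated with a plain sandwich (CR0) estimator; this is a generic sketch with synthetic data, not the authors' Stata code:

```python
import numpy as np

def ols_cluster_se(x, y, groups):
    """OLS with cluster-robust (CR0 sandwich) standard errors.

    x: 1-D regressor, y: outcome, groups: cluster label per observation.
    """
    X = np.column_stack([np.ones(len(y)), np.asarray(x)])  # add intercept
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ np.asarray(y)
    resid = np.asarray(y) - X @ beta
    meat = np.zeros((X.shape[1], X.shape[1]))
    for g in np.unique(groups):
        mask = np.asarray(groups) == g
        score = X[mask].T @ resid[mask]  # summed score within the cluster
        meat += np.outer(score, score)
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))
```

When observations are positively correlated within villages, these SEs are typically larger than naive OLS SEs, which is exactly the bias the clustering adjustment guards against.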
 
Results
In total, 715 women were interviewed, including 220 pregnant women and 495 new mothers. Twenty-two participants with missing values were excluded to ensure sample uniformity throughout the analysis. Thus, the analyses in this study were based on data from 693 participants (220 pregnant women and 473 new mothers), corresponding to a valid response rate of 96.9% (693/715).
 
Maternal depression risk in poor rural areas
Among the 220 pregnant women, 37 (16.8%), 66 (30.0%), and 117 (53.2%) were in the early, middle, and late stages of pregnancy, respectively (Table 1). In total, 226 of the 473 new mothers (47.8%) had babies aged 1 to 3 months, whereas 247 new mothers (52.2%) had babies aged 4 to 6 months.
 

Table 1. Maternal depression risk in poor rural areas
 
The mean maternal EPDS score was 5.85 and the proportion of women at risk of depression was 18.9% (131/693). The proportion of women at risk of depression was generally stable regardless of pregnancy stage. Specifically, the proportion of women at risk of depression during early pregnancy was 16.2% (6/37); during middle and late pregnancy, the proportions of women at risk were slightly increased. The proportions of women at risk of depression were 16.8% (38/226) and 20.2% (50/247) at 1-3 months and 4-6 months after delivery, respectively. However, the maternal EPDS scores and proportions of women at risk of depression did not significantly differ according to pregnancy stage or time since delivery.
 
Univariate analysis of maternal depression risk
Overall, the mean participant age was 28.13 ± 4.70 years. In total, 239 women (34.5%; mean age, 25.52 ± 3.95 years) reported that the current pregnancy or ≤6-month-old baby was their firstborn child. The remaining 454 women (65.5%; mean age, 29.50 ± 4.49 years) were experienced mothers who already had children and were familiar with caring for them. Overall, 116 women (16.7%) had an education level above junior high school. Self-rated health status was good in 89 women (12.8%), and 102 women (14.7%) were involved in a parenting support group. Table 2 summarises the participant characteristics.
 

Table 2. Univariate analysis of maternal depression risk (n=693)
 
As shown in Table 2, the participants were clustered into three groups according to primary caregiver identity: the husband for 151 women (21.8%), the baby’s grandmother for 452 women (65.2%), and other family members or no caregiver for 90 women (13.0%). The mean EPDS scores of women in the three groups were 6.23 ± 4.34, 5.56 ± 4.01, and 6.63 ± 4.84, respectively (P=0.039). Additionally, univariate analysis revealed statistically significant differences in depression scores according to education level, self-rated health status, and parenting support group involvement. There were no statistically significant differences in other variables.
 
Correlation between primary family caregiver identity and maternal depression risk
As shown in Table 3, identification of the baby’s grandmother as the primary family caregiver was significantly negatively correlated with EPDS score (β=-0.979, 95% CI=-1.946 to -0.012, P=0.047). However, identification of the husband as the family caregiver was not significantly correlated with EPDS score (β=-0.499, 95% CI=-1.579 to 0.581, P=0.363).
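The adjusted coefficients reported in Table 3 can be read against a multiple linear regression model of the following general form (a sketch only; the exact covariate set is assumed to comprise the Table 2 variables, with 'other family members or no caregiver' as the reference category for the two caregiver indicators):

\[
\text{EPDS}_i = \beta_0 + \beta_1\,\text{Grandmother}_i + \beta_2\,\text{Husband}_i + \boldsymbol{\gamma}^{\top}\mathbf{x}_i + \varepsilon_i
\]

where \(\text{Grandmother}_i\) and \(\text{Husband}_i\) are 0/1 indicators of the primary family caregiver and \(\mathbf{x}_i\) collects the adjustment covariates. Under this coding, \(\beta_1 = -0.979\) estimates the adjusted difference in EPDS score when the baby's grandmother, rather than another family member or no one, is the primary caregiver.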
 

Table 3. Multiple linear regression analysis of correlation between primary family caregiver identity and maternal depression risk
 
Correlation between primary family caregiver identity and family support score
As shown in Table 4, after adjustment for other variables, there was no significant correlation between identification of the husband as the primary family caregiver and the family support score (β=0.375, 95% CI=-0.704 to 1.455, P=0.493). Identification of the baby’s grandmother as the primary family caregiver was positively correlated with family support score, although this association only approached statistical significance (β=0.967, 95% CI=-0.062 to 1.996, P=0.065). Furthermore, identification of the baby’s grandmother as the primary family caregiver had the largest standardised regression coefficient among the three caregiver categories, suggesting that pregnant and postpartum women felt the greatest family support when the baby’s grandmother was the primary family caregiver.
 

Table 4. Multiple linear regression analysis of correlation between primary family caregiver identity and family support score
 
Discussion
Maternal depression risk in poor rural areas
In this study, the overall proportion of women at risk of maternal depression was 18.9%, including a mean proportion of 19.5% among pregnant women and a mean proportion of 18.6% among women ≤6 months postpartum. This overall proportion of women at risk of maternal depression is much higher than the proportion in a western urban area of China (12.4%)29 and comparable with the proportions in low- and middle-income countries such as Ethiopia (19.9%)30—both previous studies also used the EPDS to identify women at risk of maternal depression. The high proportion in the present study may be related to the location (poor rural areas): compared with women in urban areas, women in poor rural areas are more likely to have a lower socio-economic status.31 The lack of knowledge regarding mental health and its services in rural areas also makes women in such areas more likely to become depressed if they do not receive timely treatment for mental health problems.32 Therefore, the mental health of rural mothers should receive greater attention from their family members and the relevant health departments.
 
This study also revealed a persistent risk of depression during the prenatal and postpartum periods (Table 1). Notably, the proportion did not substantially decrease by 6 months after delivery. Yue et al33 investigated the mental health of caregivers for babies aged 6 to 36 months in a rural area in western China. Their results showed that the proportion of caregivers at risk of depression was similar to the proportion in the present study. These findings suggest that maternal depression persists in the absence of external intervention. Thus, there is an urgent need for timely external mental health interventions among pregnant women and mothers of young children. The present study also showed that the maternal depression risk in poor rural areas is influenced by factors such as a woman’s education level, self-rated health status, and parenting support group involvement. These results are consistent with the findings by Zhou et al,7 Lancaster et al,10 and Lee et al.18
 
Correlation between primary family caregiver identity and maternal depression risk
Our results showed that identification of the husband as the primary family caregiver was not significantly correlated with maternal depression risk in poor rural areas (Table 3). This finding was considerably different from the results of previous studies in urban areas. Xie et al34 found that insufficient or poor-quality emotional support from the husband was significantly associated with an increased risk of postpartum depression among mothers in Changsha, Hunan Province, China. Similarly, in a study of mothers in Beijing, China, Wan et al2 found that the proportions of women at risk of maternal depression were 1.9- to 2.6-fold higher among women without support from the husband before and after delivery than among women with such support. The results of these studies suggest that the husband’s involvement as the primary family caregiver can reduce the risk of maternal depression in urban areas, but this effect was not apparent in poor rural areas.
 
We also found that maternal depression risk was significantly lower when the baby’s grandmother was identified as the primary family caregiver (Table 3). Our results are consistent with the findings by Wan et al2 in a study of 342 pregnant women in Beijing, China: during the ‘confinement’ period, care and support from the baby’s grandmother(s) were important for relieving depression. However, Lee et al18 showed that mother-in-law conflict remains prominent in China, which may have negative emotional outcomes for grandmothers and new mothers. Although pregnant and postpartum women in poor rural areas may experience similar conflict, our findings suggest that support from the baby’s grandmother(s) remains predominantly positive.
 
Correlation between family support and maternal depression risk
We attempted to determine why support from the husband did not reduce maternal depression risk in poor rural areas by analysing a potential mediating variable. Initially, we hypothesised that the positive effect of the husband acting as the primary family caregiver would be offset by the loss of income caused by the husband’s inability to seek work opportunities in other locations. However, data analysis revealed that the husband’s role as the primary caregiver was not associated with family income or the family asset index (online supplementary Table 1). Thus, we explored the effect of family support. Multiple previous studies demonstrated that family support influenced maternal depression risk14; consistent with those findings, our analysis showed that family support was significantly negatively correlated with maternal depression risk (online supplementary Table 2).
 
There may be two main reasons for this negative correlation. First, husbands in poor rural areas have insufficient knowledge and skills related to maternal care.16 Husbands lack first-hand experience of childbirth and childcare and can only acquire the relevant knowledge through education. However, compared with men in urban areas, men in poor rural areas have lower levels of education and may be less inclined to learn on their own, making it more difficult to acquire such knowledge and skills.35 In contrast, grandmothers are more experienced overall, which may enable them to provide more effective family support. For example, based on their own experience, grandmothers can help new mothers to prepare for and manage pain that sometimes occurs during breastfeeding, which can alleviate anxiety and provide a feeling of greater support.17 Second, in poor rural areas, husbands may lack sufficient time and energy to provide effective family care. Compared with families in urban areas, families in poor rural areas are more economically disadvantaged33; therefore, husbands in such families may prioritise financial stability and be unable to expend time or energy in support of maternal care, despite their physical presence in the home. In contrast, the baby’s grandmother(s) may have sufficient time and energy to provide effective maternal care (eg, by feeding the baby and changing diapers), thus relieving the mother’s psychological stress.
 
The findings in this analysis of women in poor rural areas differ from the results of studies in urban areas, indicating important differences in family structure between urban and rural areas. There is evidence that a gradual transformation of the family is underway in urban areas, whereby husbands have begun to actively engage in caregiving. However, the transformation of family structure is much slower in poor rural areas,13 and husbands in those areas are not yet prepared for this new role. Because of constraints regarding their education level and skills, as well as family finances, husbands in poor rural areas continue to prioritise financial stability36; their support does not have a positive impact on the risk of maternal depression. Thus, women in poor rural areas must continue to rely on family members outside of the nuclear family, such as the baby’s grandmother(s), to assume some caregiving responsibilities.
 
Commercialised and specialised mental health counselling services in urban areas play important roles in improving maternal mental health.8 Xiao37 found that postnatal care through a postpartum confinement (‘zuoyuezi’) centre provided continuous physical, psychological, and emotional support that was sufficient to reduce the incidence of postpartum depression. However, such centres are not available in poor rural areas. Therefore, it is important to promote better caregiving from family members, including husbands. For example, husbands could receive training that enables them to provide practical support, as well as guidance concerning the early identification of depressive tendencies and the development of communication skills for psychological adjustment.
 
Limitations
This study had some limitations. First, its cross-sectional design prevented the assessment of maternal depression trends during pregnancy and after delivery, although such an assessment could have been conducted in a cohort study. Second, this study focused on primary family caregiver identity and did not explore the type or form of caregiving provided. Third, all participants were residents of rural northwest China, and thus the results may not be generalisable to other populations. These limitations should be addressed in future studies.
 
Conclusions and policy implications
The prevalence of maternal depression is high in poor rural areas of Shaanxi Province. Identification of the husband as the primary family caregiver was not significantly correlated with maternal depression risk, whereas the involvement of the baby’s grandmother in that role was significantly negatively correlated with maternal depression risk. Based on our findings, we make the following suggestions. In rural areas, high-quality family support is necessary to ensure that pregnant women maintain good mental health. Compared with husbands, grandmothers may be better primary caregivers because they are more experienced in terms of parenting and housework. Husbands in poor rural China should receive training that enables them to provide effective maternal care.
 
Author contributions
Concept or design: N Wang, M Mu, J Yang, Y Shi, J Nie.
Acquisition of data: N Wang, M Mu, Z Liu, R Zulihumaer, W Nie.
Analysis or interpretation of data: N Wang, M Mu, J Yang, J Nie.
Drafting of the manuscript: N Wang, M Mu, J Yang, Y Shi, J Nie.
Critical revision of the manuscript for important intellectual content: All authors.
 
All authors had full access to the data, contributed to the study, approved the final version for publication, and take responsibility for its accuracy and integrity.
 
Conflicts of interest
As an International Editorial Advisory Board member of the journal, Y Shi was not involved in the peer review process. Other authors have disclosed no conflicts of interest.
 
Acknowledgement
The authors thank the study participants and the enumerators who conducted data collection.
 
Funding/support
The authors are supported by the 111 Project (Grant No. B16031), Soft Science Research Project of Xi’an Science and Technology Plan (Grant No. 2021-0059), the Fundamental Research Funds for the Central Universities (Grant No. 2021CSWY024) and the Fundamental Research Funds for the Central Universities (Grant No. 2021CSWY025) of China.
 
Ethics approval
The study was approved by the Medical Ethics Committee of Shaanxi Normal University and Xi’an Jiaotong University of China (No: 2020-1240). Each eligible participant received a consent form with information regarding programme objectives, procedures, potential risks, and benefits, as well as an explanation of privacy protection. Participants provided oral consent for inclusion in the study before engaging in a face-to-face interview with a single enumerator.
 
References
1. Ceulemans M, Foulon V, Ngo E, et al. Mental health status of pregnant and breastfeeding women during the COVID-19 pandemic—a multinational cross-sectional study. Acta Obstet Gynecol Scand 2021;100:1219-29. Crossref
2. Wan EY, Moyer CA, Harlow SD, Fan Z, Jie Y, Yang H. Postpartum depression and traditional postpartum care in China: role of Zuoyuezi. Int J Gynecol Obstet 2009;104:209-13. Crossref
3. Zeng Y, Cui Y, Li J. Prevalence and predictors of antenatal depressive symptoms among Chinese women in their third trimester: a cross-sectional survey. BMC Psychiatry 2015;15:66. Crossref
4. Zhang Y, Zou S, Cao Y, Zhang Y. Relationship between domestic violence and postnatal depression among pregnant Chinese women. Int J Gynecol Obstet 2012;116:26-30. Crossref
5. Song Y, Li W. The impact of social support and antepartum emotion on postpartum depression [in Chinese]. China J Health Psychol 2014;22:909-11.
6. Xu M, Li C, Zhang K, Sun S, Zhang H, Jin J. Comparison of prenatal depression and social support differences between Korean and Han pregnant women in Yanbian region [in Chinese]. Matern Child Health Care China 2015;30:524-7.
7. Zhou X, Liu H, Li X, Zhang S, Li F, Zhao Z. Depression and risk factors of women in their second-third trimester of pregnancy in Shaanxi province [in Chinese]. Chin Nurs Manage 2019;19:1005-11.
8. Liu H, Zhang C. Comparative study on service efficiency of China’s urban and rural health systems [in Chinese]. China Soft Sci 2011;10:102-13.
9. Leahy-Warren P, McCarthy G, Corcoran P. First-time mothers: social support, maternal parental self-efficacy and postnatal depression. J Clin Nurs 2012;21:388-97. Crossref
10. Lancaster CA, Gold KJ, Flynn HA, Yoo H, Marcus SM, Davis MM. Risk factors for depressive symptoms during pregnancy: a systematic review. Am J Obstet Gynecol 2010;202:5-14. Crossref
11. Tang C. A review of modernisation theory and its development on family [in Chinese]. Sociol Stud 2010;25:199-222,246.
12. Högberg U. The World Health Report 2005: “make every mother and child count”—including Africans. Scand J Public Health 2005;33:409-11. Crossref
13. Zhao F, Ji Y, Chen F. Conjugal or parent-child relationship? Factors influencing the main axis of Chinese family relations in transition period [in Chinese]. J Chin Women Stud 2021;4:97-112.
14. Li D, Xu X, Liu J, Wu P. The relationship between life event and pregnancy stress: the mediating effect of mental health and the moderating effect of husband support [in Chinese]. J Psychol Sci 2013;36:876-83.
15. Dennis CL, Ross L. Women’s perceptions of partner support and conflict in the development of postpartum depressive symptoms. J Adv Nurs 2006;56:588-99. Crossref
16. Yang H, Lu Y, Ren L. A study on the impact of bearing two children on urban youth’s work-family balance: the empirical analysis based on the third survey of Chinese women’s social status [in Chinese]. Popul Econ 2016;9:1-9.
17. Hu J, Wang Y. Study on related risk factors of patients with pre-and postnatal depression [in Chinese]. Chin Nurs Res 2010;24:765-7.
18. Lee DT, Yip AS, Chan SS, Lee FF, Leung TY, Chung TK. Determinants of postpartum depressive symptomatology—a prospective multivariate study among Hong Kong Chinese Women [in Chinese]. Chin Ment Health J 2005;9:54-9.
19. National Bureau of Statistics of China. Income of villagers in poor rural areas in 2019 [in Chinese]. Available from: http://www.stats.gov.cn/xxgk/sjfb/zxfb2020/202001/t20200123_1767753.html. Accessed 30 Nov 2022.
20. Sun J, Zhang J, Li C, Lu Y. Strategic judgements and development suggestions on the development of poverty-stricken areas in China [in Chinese]. Manage World 2019;35:150-9,185.
21. Peterman A, Schwab B, Roy S, Hidrobo M, Gilligan DO. Measuring women’s decision making: indicator choice and survey design experiments from cash and food transfer evaluations in Ecuador, Uganda, and Yemen. World Dev 2021;141:105387. Crossref
22. Adouard F, Glangeaud-Freudenthal NM, Golse B. Validation of the Edinburgh postnatal depression scale (EPDS) in a sample of women with high-risk pregnancies in France. Arch Womens Ment Health 2005;8:89-95. Crossref
23. Zhang H, Li L. Comparative analysis of 3 postpartum depression scales abroad [in Chinese]. Chin J Nurs 2007;2:186-8.
24. Lee DT, Yip SK, Chiu HF, et al. Detecting postnatal depression in Chinese women. Validation of the Chinese version of the Edinburgh Postnatal Depression Scale. Br J Psychiatry 1998;172:433-7. Crossref
25. Liu Y, Zhang L, Guo N, Li J, Jiang H. Research progress of Edinburgh Postnatal Depression Scale in screening of perinatal depression [in Chinese]. Chin J Mod Nurs 2021;27:5026-31.
26. Wang Y, Guo X, Lau Y, Chan KS, Yin L, Chen J. Psychometric evaluation of the Mainland Chinese version of the Edinburgh Postnatal Depression Scale. Int J Nurs Stud 2009;46:813-23. Crossref
27. Zimet GD, Dahlem NW, Zimet SG, Farley GK. The multidimensional scale of perceived social support. J Pers Assess 1988;52:30-41. Crossref
28. Yan R, Liu L, Zhang L. Analysis of the relationship between social support self-efficacy and health behaviours among college students [in Chinese]. Chin J Sch Health 2010;31:362-3.
29. Zhang X, Li X, Li Y, Quan X, Li X. Study on status and influencing factors of antenatal depression among urban women in western China [in Chinese]. Matern Child Health Care China 2019;34:5275-7.
30. Dibaba Y, Fantahun M, Hindin MJ. The association of unwanted pregnancy and social support with depressive symptoms in pregnancy: evidence from rural southwestern Ethiopia. BMC Pregnancy Childbirth 2013;13:135. Crossref
31. Wu H, Rao J. Research review on left-behind women [in Chinese]. J Chin Agric Univ Soc Sci 2009;26:18-23.
32. Hou F, Cerulli C, Wittink MN, Caine ED, Qiu P. Depression, social support and associated factors among women living in rural China: a cross-sectional study. BMC Womens Health 2015;15:28. Crossref
33. Yue A, Gao J, Yang M, Swinnen L, Medina A, Rozelle S. Caregiver depression and early child development: a mixed-methods study from rural China. Front Psychol 2018;9:2500. Crossref
34. Xie RH, Yang J, Liao S, Xie H, Walker M, Wen SW. Prenatal family support, postnatal family support and postpartum depression. Aust N Z J Obstet Gynaecol 2010;50:340-5. Crossref
35. Wei Q, Zhang C, Hao B, Wang X. Study on parenting concepts and behaviours of caregivers with children under 3 years old in rural areas of China [in Chinese]. Matern Child Health Care China 2017;32:1759-61.
36. Hao J, Wang F, Huang J. Male labour migration, women empowerment and household protein intake: evidence from less developed rural areas in China [in Chinese]. China Rural Econ 2021;8:125-44.
37. Xiao G. A practical study of puerperal health care in a postpartum confinement centre [in Chinese]. Chin J Mode Drug Appl 2016;10:271-2.
