Predicting Mechanical Complications After Adult Spinal Deformity Operation Using a Machine Learning Based on Modified Global Alignment and Proportion Scoring With Body Mass Index and Bone Mineral Density
Abstract
Objective
This study aimed to create an ideal machine learning model to predict mechanical complications in adult spinal deformity (ASD) surgery based on GAPB (modified global alignment and proportion scoring with body mass index and bone mineral density) factors.
Methods
Between January 2009 and December 2018, 238 consecutive patients with ASD who underwent fusion of at least 4 levels and were followed up for ≥2 years were included in the study. The data were stratified into training (n=167, 70%) and test (n=71, 30%) sets and input to machine learning algorithms, including logistic regression, random forest, gradient boosting, and deep neural network.
Results
Body mass index, bone mineral density, the relative pelvic version score, the relative lumbar lordosis score, and the relative sagittal alignment score of the global alignment and proportion score differed significantly between the complication and no-complication groups in both the training and test sets (p<0.05). In the training set, the areas under the receiver operating characteristic curve (AUROCs) for logistic regression, gradient boosting, random forest, and deep neural network were 0.871 (0.817–0.925), 0.942 (0.911–0.974), 1.000 (1.000–1.000), and 0.947 (0.915–0.980), respectively, and the accuracies were 0.784 (0.722–0.847), 0.868 (0.817–0.920), 1.000 (1.000–1.000), and 0.856 (0.803–0.909), respectively. In the test set, the AUROCs were 0.785 (0.678–0.893), 0.808 (0.702–0.914), 0.810 (0.710–0.910), and 0.730 (0.610–0.850), respectively, and the accuracies were 0.732 (0.629–0.835), 0.718 (0.614–0.823), 0.732 (0.629–0.835), and 0.620 (0.507–0.733), respectively. The random forest achieved the highest AUROC in both the training and test sets.
Conclusion
This study created a comprehensive model to predict mechanical complications after ASD surgery. The best test-set accuracy for predicting mechanical complications was 73.2%. This information may help surgeons reduce the risk of mechanical complications after ASD surgery.
INTRODUCTION
Adult spinal deformity (ASD) is a disorder that is prevalent worldwide [1]. Patients with ASD experience significant low back and leg pain, stooping, and poorer health-related quality of life (HRQoL) than the general population. Although spinal surgery for correcting ASD is invasive, it is effective in symptomatic cases where conservative treatment is often unsuccessful [2]. However, surgical correction of ASD is a difficult procedure with a high risk of intraoperative and postoperative complications [3]. The estimated incidences of morbidity and mortality due to surgical correction are 31.3% and 0.5%, respectively [3]. Because complications of ASD surgery are common, surgical alignment targets have been proposed, such as the Scoliosis Research Society-Schwab classification and age-adjusted alignment goals [4,5]. There are also scoring systems, such as the global alignment and proportion (GAP) score, which predicts mechanical complications after ASD surgery, and the modified global alignment and proportion scoring with body mass index and bone mineral density (GAPB) system, which combines body mass index (BMI) and bone mineral density (BMD) with the GAP score [6,7].
Most studies have been performed using simple statistical techniques such as linear regression and logistic regression, and in practice, they provide information on mean values that do not properly reflect the characteristics of the population. However, in the past few years, the medical field has increasingly adopted computational techniques that allow the processing of large amounts of data and the creation of complex mathematical models that describe the relationships between different variables. The idea behind artificial intelligence is to create a system that mimics the natural ability of humans to continuously learn as they access new data and apply it to new situations in the future. Our research team reported that GAPB predicts mechanical complications better than other systems related to ASD [6]. This study aimed to create an ideal machine learning model to predict mechanical complications in ASD surgery based on the GAPB system.
MATERIALS AND METHODS
1. Patient Population
This was a retrospective analysis of surgically treated patients with ASD enrolled from 2009 to 2017. This study was approved by the Institutional Review Board (IRB) of the Ajou University Hospital (IRB No. 2022-0546-008). Written informed consent was obtained from all participants. The inclusion criteria were as follows: (1) ASD surgery to correct sagittal imbalance; (2) at least one of the following radiological criteria: coronal Cobb angle >20°, sagittal vertical axis >5 cm, pelvic tilt (PT) >25°, thoracic kyphosis >60°, or pelvic incidence minus lumbar lordosis (PI–LL) >10°; (3) posterior spinal instrumentation and fusion spanning ≥4 levels; and (4) a follow-up period of ≥2 years. Patients with ASD due to a syndrome, autoimmune disease, infection, tumor, or other pathological condition were excluded. Between January 2009 and December 2017, 491 patients with ASD underwent surgery at our hospital. Of these, 253 were excluded because they had a follow-up period of <2 years, did not undergo corrective surgery for ASD, or had ≤3 surgical levels. The remaining 238 consecutive patients with sagittal imbalance who underwent ASD surgery were included in the study.
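The inclusion criteria above can be read as a simple screening rule. The following is a minimal Python sketch of that rule, not the authors' code: the `Candidate` class, its field names, and the example values are hypothetical, and the exclusion criteria (syndromic, autoimmune, infectious, tumor-related, or other pathological causes) are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record holding the measurements named in the inclusion criteria."""
    cobb_deg: float          # coronal Cobb angle, degrees
    sva_cm: float            # sagittal vertical axis, cm
    pt_deg: float            # pelvic tilt, degrees
    tk_deg: float            # thoracic kyphosis, degrees
    pi_minus_ll_deg: float   # pelvic incidence minus lumbar lordosis, degrees
    fused_levels: int        # number of instrumented levels
    followup_months: float   # length of follow-up, months


def meets_radiological_criteria(c: Candidate) -> bool:
    # At least one radiological threshold must be exceeded.
    return (c.cobb_deg > 20 or c.sva_cm > 5 or c.pt_deg > 25
            or c.tk_deg > 60 or c.pi_minus_ll_deg > 10)


def is_eligible(c: Candidate) -> bool:
    # Radiological criterion plus >= 4 fused levels and >= 2 years of follow-up.
    return (meets_radiological_criteria(c)
            and c.fused_levels >= 4
            and c.followup_months >= 24)


# Illustrative values only; they do not come from the study data.
example = Candidate(cobb_deg=12, sva_cm=7.5, pt_deg=30, tk_deg=35,
                    pi_minus_ll_deg=22, fused_levels=6, followup_months=30)
print(is_eligible(example))
```

Applied to the reported cohort, such a screen would correspond to the reduction from 491 operated patients to the 238 who were included.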
2. Data Collection
Demographic data, radiologic parameters, surgical characteristics, and HRQoL data were collected from the electronic medical records for all 238 included patients. Demographic data included age, sex, BMI, BMD, and GAP score variables. Yilgor et al. [7] created the GAP score, whose overall goal is to provide patient-specific spinopelvic alignment guidance and to predict mechanical complications. Subsequently, Noh et al. [8,9] developed the GAPB system by adding BMI and BMD to the GAP score. Factors frequently used to predict mechanical complications after ASD surgery were used to create the artificial intelligence model.
The following sagittal alignment parameters were measured: PI, PT, lumbar lordosis (LL [L1–S1]), PI–LL, and global tilt. Radiographic measures included preoperative, postoperative, and final follow-up alignment parameters. We defined mechanical complications after ASD surgery as proximal junctional kyphosis, proximal junctional failure, distal junctional kyphosis, distal junctional failure, rod fracture, and implant-related complications, and investigated their prevalence. Proximal junctional kyphosis was defined as a ≥10° increase in kyphosis between the upper instrumented vertebra (UIV) and UIV+2 between the early postoperative and 2-year follow-up radiographs. Proximal junctional failure was defined as a fracture of the UIV or UIV+1, pullout of instrumentation at the UIV, and/or sagittal subluxation. Distal junctional kyphosis/failure referred to a ≥10° increase in the kyphosis angle between the lowest instrumented vertebra (LIV) and LIV−1, and/or pullout of instrumentation at the LIV. Rod fracture referred to single or double rod breakage. Implant-related complications included other radiographic implant-related complications such as screw loosening, breakage, or pullout, or interbody graft, hook, or screw dislodgement. HRQoL was measured using the Oswestry Disability Index, the Scoliosis Research Society-22 Spinal Deformity Questionnaire, and Short Form-36.
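The radiographic definition of proximal junctional kyphosis given above is a threshold on the change in one angle. A minimal Python sketch of that rule follows; the function name and the sign convention (kyphosis expressed as a positive angle in degrees) are assumptions made for illustration.

```python
PJK_THRESHOLD_DEG = 10.0  # threshold stated in the definition above


def is_pjk(early_postop_deg: float, followup_deg: float) -> bool:
    """Return True when the UIV to UIV+2 kyphosis angle increased by >= 10 degrees.

    Both angles are assumed to be measured the same way, with kyphosis positive.
    """
    return (followup_deg - early_postop_deg) >= PJK_THRESHOLD_DEG


# Illustrative angles only; they are not taken from the study.
print(is_pjk(8.0, 19.0))   # increase of 11 degrees
print(is_pjk(8.0, 17.0))   # increase of 9 degrees
```

The same comparison, applied to the LIV and LIV−1 angle, would cover the angular part of the distal junctional definition.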
3. Prediction Models and Evaluation
The patients were randomly divided into training (n=167, 70%) and test (n=71, 30%) datasets (Fig. 1). The training set was used to develop the models, and the test set was used to evaluate them. Among the models that can be implemented in R, we compared logistic regression, which is the conventional and widely used approach; gradient boosting, a representative boosting method; random forest, a representative bagging method; and a deep neural network, which has recently attracted considerable attention. We performed 4 analyses to classify the occurrence of complications. First, univariable and multivariable logistic regressions were used. Variables with p<0.05 in the univariable analysis were entered into the multivariable analysis, and the final multivariable model was determined using a stepwise variable selection method. Second, the gradient boosting model was created with the R package “xgboost,” and variable importance was visualized. For this analysis, a maximum tree depth of 2, a learning rate of 0.3, and 20 boosting rounds were used. Third, random forest classification was performed using the R package “randomForest.” For this analysis, the number of trees was set to 500, and the number of variables used in each tree was set to 5, which yielded the largest Kappa value. Fourth, a deep neural network was fitted using the R package “nnet.” For this analysis, a hidden layer of 10 units was employed.
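The 70/30 division can be illustrated with a short Python sketch. This is not the authors' R code: the function name, the seed, and the use of integer patient indices are assumptions, and the sketch performs a plain random split without stratifying by outcome.

```python
import random


def split_train_test(ids, train_fraction=0.7, seed=42):
    """Shuffle the ids with a fixed (hypothetical) seed and cut at the training fraction."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# 238 patients, as in the study; indices stand in for patient records.
train_ids, test_ids = split_train_test(range(238))
print(len(train_ids), len(test_ids))  # 167 71
```

Rounding 0.7 × 238 = 166.6 to the nearest integer gives the reported sizes of 167 and 71.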
Diagnostic performance was evaluated using the area under the receiver operating characteristic curve (AUROC), area under the precision-recall curve (AUPRC), accuracy, sensitivity, and specificity for each dataset. To calculate the accuracy, sensitivity, and specificity, the optimal cutoff points were computed using the Youden index. Comparisons of AUROC, AUPRC, accuracy, sensitivity, and specificity were performed using generalized estimating equations.
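To make these metrics concrete, the sketch below computes an AUROC by pairwise comparison of scores and selects the cutoff that maximizes the Youden index J = sensitivity + specificity − 1. It is a plain-Python illustration on made-up labels and scores, not the authors' implementation; the function names and the rule "predict a complication when score ≥ cutoff" are assumptions.

```python
def auroc(labels, scores):
    """Probability that a random positive scores higher than a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def youden_cutoff(labels, scores):
    """Return (cutoff, J, sensitivity, specificity) for the cutoff with the largest J."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (t, j, sens, spec)
    return best


# Toy data for illustration only (1 = complication, 0 = no complication).
labels = [0, 0, 1, 0, 1, 1, 1]
scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.9]

area = auroc(labels, scores)
cutoff, j, sens, spec = youden_cutoff(labels, scores)
print(round(area, 3), cutoff, j, sens, spec)
```

On this toy example the cutoff of 0.6 gives a sensitivity of 0.75 and a specificity of 1.0; accuracy, sensitivity, and specificity are then reported at that single cutoff, whereas the AUROC summarizes all cutoffs.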
4. Statistical Analysis
Descriptive statistics are presented as frequencies and percentages for categorical variables and as means and standard deviations for continuous variables. To compare the characteristics of patients in the complication and no complication groups, the chi-square test (or Fisher exact test) was used for categorical variables and an independent 2-sample t-test was used for continuous variables. All statistical analyses were performed using SAS 9.4 (SAS Institute Inc., Cary, NC, USA). Statistical significance was set at p< 0.05.
RESULTS
1. Patient Demographics
Two hundred thirty-eight patients underwent ASD surgery (204 females [86%], 34 males [14%]); their demographic data are shown in Table 1. Of those patients, 167 (70.2%) were assigned to the training set and 71 (29.8%) to the test set. The patients’ mean age and follow-up period were 67.1±6.17 years and 28.54±4.25 months, respectively. The mean ages of patients in the training and test sets were 67.80±7.49 years and 66.94±6.98 years, respectively. In the training set, BMI, BMD, the relative pelvic version score, the relative lumbar lordosis score, and the relative sagittal alignment score differed significantly between the groups with and without complications. In the test set, the same 5 variables differed significantly between the groups with and without complications.
2. Logistic Regression Model
The results of the univariate and multivariate logistic regression analyses are presented in Table 2. The following variables were significantly related to mechanical complications of ASD surgery in univariate logistic regression: BMI, BMD, relative pelvic version score, relative lumbar lordosis score, and relative sagittal alignment score. In the multivariate logistic regression, BMD and relative lumbar lordosis score were significantly related to mechanical complications of ASD surgery.
3. Gradient Boosting Model
The results of the gradient boosting analysis are shown in Fig. 2. BMI, BMD, and relative lumbar lordosis score were the most important variables in the gradient boosting model.
4. Random Forest Model
The results of the random forest analyses are shown in Fig. 3. BMI, BMD, and relative lumbar lordosis score were the most important variables in the random forest model. Because random forest can overfit the training set, its training-set performance must be interpreted cautiously in light of the test-set results.
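One simple way to gauge overfitting is to compare each model's training AUROC with its test AUROC. The Python sketch below does this arithmetic using the AUROC point estimates reported in this article; the variable names are illustrative, and the gap is only a rough indicator, not a formal test.

```python
# AUROC point estimates as reported in this article (training vs. test set).
train_auroc = {
    "logistic regression": 0.871,
    "gradient boosting": 0.942,
    "random forest": 1.000,
    "deep neural network": 0.947,
}
test_auroc = {
    "logistic regression": 0.785,
    "gradient boosting": 0.808,
    "random forest": 0.810,
    "deep neural network": 0.730,
}

# Difference between training and test AUROC for each model.
gap = {m: round(train_auroc[m] - test_auroc[m], 3) for m in train_auroc}

for model, g in sorted(gap.items(), key=lambda kv: kv[1]):
    print(f"{model}: {g:.3f}")
```

By this measure the drop is 0.086 for logistic regression, 0.134 for gradient boosting, 0.190 for random forest, and 0.217 for the deep neural network, so the perfect training-set fit of the random forest should not be taken as its expected performance on new patients.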
5. Deep Neural Network Model
The results of the deep neural network analyses are shown in Fig. 4. The most important variables in this model were the lordosis distribution index score and relative sagittal alignment score.
6. Diagnostic Performance of the Machine Learning Models
The AUROCs and AUPRCs for the 4 machine learning models are presented in Table 3. In the training set, the AUROCs for the logistic regression, gradient boosting, random forest, and deep neural network models were 0.871 (0.817–0.925), 0.942 (0.911–0.974), 1.000 (1.000–1.000), and 0.947 (0.915–0.980), respectively; the AUPRCs were 0.793 (0.677–0.895), 0.93 (0.878–0.965), 1.000 (1.000–1.000), and 0.942 (0.898–0.972), respectively; and the accuracies were 0.784 (0.722–0.847), 0.868 (0.817–0.920), 1.000 (1.000–1.000), and 0.856 (0.803–0.909), respectively. In the test set, the AUROCs for the same models were 0.785 (0.678–0.893), 0.808 (0.702–0.914), 0.810 (0.710–0.910), and 0.730 (0.610–0.850), respectively; the AUPRCs were 0.711 (0.523–0.87), 0.717 (0.529–0.89), 0.748 (0.554–0.882), and 0.667 (0.475–0.818), respectively; and the accuracies were 0.732 (0.629–0.835), 0.718 (0.614–0.823), 0.732 (0.629–0.835), and 0.620 (0.507–0.733), respectively. The random forest achieved the highest AUROC in both the training and test sets. Fig. 5 shows the AUPRCs of each model in the training and test sets.
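The text does not state how the intervals in parentheses were computed. As a worked check, the Python sketch below applies the normal-approximation (Wald) interval for a proportion to the test-set accuracy of 0.732 with n=71. The count of 52 correct predictions is inferred from 0.732 × 71 ≈ 52 and is an assumption, as is the choice of interval method.

```python
import math


def wald_interval(correct, n, z=1.96):
    """Normal-approximation 95% interval for a proportion: p +/- z * sqrt(p * (1 - p) / n)."""
    p = correct / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# 52 of 71 is an inferred count, not a figure reported in the article.
acc, lower, upper = wald_interval(52, 71)
print(f"{acc:.3f} ({lower:.3f}, {upper:.3f})")  # 0.732 (0.629, 0.835)
```

The result agrees with the reported 0.732 (0.629–0.835), and the width of roughly ±0.10 shows how much uncertainty a test set of 71 patients leaves around the accuracy estimate.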
DISCUSSION
The prevalence of mechanical complications, with radiologic and clinical manifestations, after surgery for ASD is reported to be 30%, and more than 50% of these patients undergo revision surgery [10]. Soroceanu et al. [11] reported a radiographic and implant-related complication rate of 31.7%, with reoperation for mechanical correction required in 52.6% of these complications. Many aspects of ASD surgery show notable variability, including the occurrence of complications and outcomes [12]. GAPB is a system used to predict mechanical complications after ASD surgery that includes both patient-specific and radiological factors [6]. In this study, we constructed a model to predict mechanical complications after ASD surgery using GAPB factors. The GAPB system, which includes BMI and BMD, predicted mechanical complications better than the GAP scoring system [8]. In particular, Noh et al. [9] reported that GAPB better predicted mechanical complications in the moderately disproportioned and severely disproportioned GAP groups. Park et al. [13] reported that osteoporosis and obesity are important risk factors for proximal junctional kyphosis, proximal junctional failure, and other mechanical complications. Since most elderly patients undergoing ASD surgery have low muscle mass and severe osteoporosis, BMI and osteoporosis are essential considerations when discussing mechanical complications. Recently, several studies using machine learning algorithms, such as random forest, gradient boosting, and neural networks, have been conducted in the field of spine surgery [14]. Yagi et al. [15] created a postsurgical complication prediction model for ASD surgery using spinal alignment, demographic data, and surgical invasiveness; 170 participants were enrolled in that study.
A decision tree for 2-year postoperative complications was constructed and confirmed by splitting the data in a 7:3 ratio for training and testing, with external validation in 25 patients with ASD who underwent surgery at different hospitals [15]. For the test sample, the predictive model was 92% accurate, the AUC was 0.963, and the external validation was 84% accurate. Lafage et al. [16] created a machine learning model to determine the upper vertebra in ASD surgery. The samples were stratified into 3 groups: 70% for training, 15% for validation, and 15% for performance testing. A neural network model was used, and the results showed an accuracy of 81.0%, precision of 87.5%, and recall of 87.5%. Pellisé et al. [17] created a model to predict the incidence of adverse events after ASD surgery using a random forest model. The model was trained using 80% of the data, with the remaining 20% used as the test set, and showed adequate predictive accuracy, with AUCs ranging from 0.67 to 0.92 [17]. Durand et al. [18] created a model for predicting blood transfusion following surgery for ASD. A total of 1,029 patients were analyzed and divided into datasets for training (n=824) and validation (n=205). The random forest model showed an AUC of 0.85 (95% confidence interval, 0.80–0.90) and was reported to show better predictive ability than single-decision tree models [18]. Ames et al. [19] created a model to predict the cost of surgery for ASD, using regression tree and random forest models to predict the occurrence of treatment costs exceeding $100,000. The regression tree analysis using CTREE yielded an adjusted R² of 56% at 90 days and 35.6% at 2 years for direct cost forecasting. Random C-forest regression analysis showed an adjusted R² of 57.4% at 90 days and 28.8% at 2 years for direct cost forecasting. Peng et al. [20] created a model to predict proximal junctional kyphosis after surgery in adolescent patients with idiopathic scoliosis. They reported that random forest has great value for predicting the individual risk of developing proximal junctional kyphosis after long instrumentation and fusion surgery in patients with Lenke 5 adolescent idiopathic scoliosis. Jain et al. [21] created a model to predict discharge delay, medical complications, and readmission within 90 days after long-segment posterior lumbar spine fusion surgery using logistic regression, random forest, and elastic net.
In our study, we created a model to predict the mechanical complications that occur after ASD surgery. We used logistic regression, gradient boosting, random forest, and deep neural networks. Important factors were BMD, BMI, relative lumbar lordosis score, lordosis distribution index score, and relative sagittal alignment score. The patients were randomly divided into training (70%) and test (30%) datasets. In the training set, the AUROC for random forest was 1.000 and the accuracy was 1.000. In the test set, the AUROC for random forest was 0.810 and the accuracy was 0.732. Random forest achieved the highest AUROC in both the training and test sets.
This study has several limitations. Because our models were built using retrospective data, future efforts to update these models are required. Additionally, the causes of mechanical complications after ASD correction are multifactorial. Many factors affect the outcome of surgery, including the surgical method, upper instrumentation level, muscle mass, and various underlying conditions, and these factors were not included in the models. Overfitting caused by the small sample size is a further limitation, and we plan to conduct further studies with larger samples.
However, the GAPB system is helpful in predicting mechanical complications after ASD surgery [6]. Noh et al. [9] reported that the GAPB system was more meaningful in the moderately disproportioned and severely disproportioned GAP groups. We believe that developing models that predict mechanical complications through machine learning will be helpful.
CONCLUSION
This study created a comprehensive model to predict mechanical complications after ASD surgery. The best test-set accuracy for predicting mechanical complications was 73.2%. This information may help surgeons reduce the risk of mechanical complications after ASD surgery.
Notes
Conflict of Interest
The authors have nothing to disclose.
Funding/Support
This study received no specific grant from any funding agency in the public, commercial, or not-for-profit sectors.
Author Contribution
Conceptualization: SHN, SHK, KHK; Data curation: SHN, HSL, GEP, SHK, KHK; Formal analysis: SHN, SHK, KHK; Methodology: SHN, HSL, GEP, SHK, KHK; Project administration: SHN, HSL, GEP, SHK, KHK; Visualization: SHN, YH, JYP, SUK, DKC, KSK, YEC, SHK, KHK; Writing - original draft: SHN; Writing - review & editing: SHN.