The Quantitative Evaluation of Automatic Segmentation in Lumbar Magnetic Resonance Images


Neurospine > Volume 21(2); 2024 > Article
Liang, Fang, Lin, Yang, Chang, Chang, Ko, Tu, Fay, Wu, Huang, Hu, Chen, and Kuo: The Quantitative Evaluation of Automatic Segmentation in Lumbar Magnetic Resonance Images



This study aims to overcome challenges in lumbar spine imaging, particularly lumbar spinal stenosis, by developing an automated segmentation model using advanced techniques. Traditional manual measurement and lesion detection methods are limited by subjectivity and inefficiency. The objective is to create an accurate and automated segmentation model that identifies anatomical structures in lumbar spine magnetic resonance imaging scans.


Leveraging a dataset of 539 lumbar spinal stenosis patients, the study utilizes the residual U-Net for semantic segmentation in sagittal and axial lumbar spine magnetic resonance images. The model, trained to recognize specific tissue categories, is paired with a geometry algorithm for anatomical structure quantification. Intersection over Union (IOU) and Dice coefficients are used to validate the residual U-Net’s segmentation accuracy. A novel rotation matrix approach is introduced for detecting bulging discs, assessing dural sac compression, and measuring yellow ligament thickness.


The residual U-Net achieves high precision in segmenting lumbar spine structures, with mean IOU values ranging from 0.82 to 0.93 across various tissue categories and views. The automated quantification system provides measurements for intervertebral disc dimensions, dural sac diameter, yellow ligament thickness, and disc hydration. Consistency between training and testing datasets assures the robustness of automated measurements.


Automated lumbar spine segmentation with residual U-Net and deep learning exhibits high precision in identifying anatomical structures, facilitating efficient quantification in lumbar spinal stenosis cases. The introduction of a rotation matrix enhances lesion detection, promising improved diagnostic accuracy, and supporting treatment decisions for lumbar spinal stenosis patients.


Degenerative lumbar spondylolisthesis and degenerative disc disease are potential causes of lumbar spinal stenosis. This condition may manifest with clinical symptoms like low back pain, sciatica, and neurogenic claudication, restricting movement and disrupting daily posture [1,2]. While medication or physical therapy proves effective for most patients, surgery becomes necessary for those unresponsive to conservative treatments. Surgical intervention involves decompression and stabilization to address instability [3,4]. Preoperative assessment for surgical indications and outcomes relies on radiological examinations, including magnetic resonance imaging (MRI) scans.
Degenerative changes in lumbar regions, as revealed by MRI scans, encompass disc height reduction, osteophyte formation, and degenerative disc herniation, constituting over 90% of lumbar spinal central and lateral stenosis cases [5,6]. The integrity of intervertebral discs, composed of the nucleus pulposus, ring-shaped cartilaginous endplates, and collagenous annulus fibrosus layers, plays a pivotal mechanical role. The extracellular matrix in these discs manages tensile strength and osmotic pressure, facilitating load transmission across the spinal column in response to body weight and daily activities [7], but this capacity diminishes with age [8]. Intervertebral disc degeneration, exacerbated by accumulated compressive loads, is implicated in lumbar spinal stenosis observed in MRI scans [9]. T2-weighted MRI is a potent tool for detecting morphologic changes in intervertebral disc degeneration, including height loss and water-intensity loss [10]. Degeneration at this level may correlate with adjacent endplate degeneration. Signal intensity changes in MRI scans appear to signify a spectrum of vertebral body marrow changes associated with lumbar degenerative disease [11]. Previous studies measured intervertebral discs with quantitative parameters, which showed a greater ability to reflect the aging effect of degeneration [12]. However, limitations such as manual segmentation, feature point marking, subjectivity, and the limits of human visual perception hindered consistency and efficiency. Consequently, treatment decisions for lumbar spinal stenosis patients still rely largely on self-reports and physical examinations [13]. Advances in clinical imaging techniques promise more robust measurements, overcoming the drawbacks of time-consuming processes, labor intensity, and the need for specialized domain knowledge.
To address the challenges of clinically evaluating lumbar spinal stenosis, spine index measurement methods based on segmentation and deep learning algorithms have been developed. Traditionally, semisegmentation software was used to separate different spinal tissues and measure anatomical structures [14]. However, these manual processes proved time-consuming, labor-intensive, and dependent on specialized expertise. The integration of deep learning algorithms, particularly convolutional neural networks (CNNs), introduced automated segmentation for identifying lumbar spine anatomical structures. Utilizing a 3-dimensional (3D) CNN on MRI scans from 23 patients, an automatic supervised segmentation approach for vertebral body formation was developed [15]. The accuracy and superiority of MRI-based 3D-CNN images were compromised without the detection of soft tissue such as intervertebral discs, dural sac, and yellow ligaments. In a separate study, a fully automated cervical segmentation architecture utilized a deep, fully connected CNN to reduce misdiagnosis [16]. The centroid of the vertebrae was located using probabilistic spatial regression, achieving a Dice similarity coefficient of 0.84 and a shape error of 1.69 mm. To extend segmentation to all vertebrae, including cervical, thoracic, and lumbar spines, a 3D U-Net and deconvolution network were employed in an iterative segmentation model [17]. Cross-entropy was applied as the loss function for multi-label classification. Midsagittal diameter and cross-sectional area of the dural sac and intervertebral discs were automatically determined using a CNN with cross-space distance-preserving regularization, yielding reliable performance (mean absolute error range: 1.04 ± 0.09 mm to 3.54 ± 0.28 mm) [18].
Furthermore, a pretrained U-Net-based CNN, incorporating a region proposal network and postprocessing with a thresholding method for distinguishing fat and muscular tissue, demonstrated the applicability of pretrained networks for limited image datasets. It produced favorable results for segmenting paraspinal muscles and exhibited excellent Intersection over Union (IOU) and Dice similarity coefficient (DSC) values for segmenting intervertebral discs [19]. While CNNs have become widely utilized in medical image recognition and detection, the complexity of optimization increases with depth. To mitigate this issue, residual convolution was introduced, leading to the development of a residual U-Net [20].
Despite the time saved by automated segmentation, manual landmark annotation remained a time-consuming step. Many methods focused on landmark detection in spine index measurement, using forecasted heatmaps to construct landmarks from which spine indices were derived. These landmarks, identified as locations with the highest peak values in the forecasted heatmaps, facilitated the definition and calculation of the contour and anatomy of intervertebral discs, vertebrae, dural sac, and lamina. Previous studies demonstrated various landmark-based analyses [21,22]. Although CNN models performed well in spine segmentation, there is a notable gap in evidence supporting the application of this technique in MRI with lesion detection, particularly for conditions such as lumbar spinal stenosis, to enable robust detection and support imaging diagnosis.
This study utilizes MRI images from patients with lumbar spinal stenosis to implement an automatic segmentation technique for model training. The primary objective of the model is to identify various anatomical structures such as the intervertebral disc, vertebral body, and yellow ligament. Furthermore, the model aims to provide global quantitative measurements for both normal and abnormal levels at each lumbar spine segment.


1. Data Preparation

This study conducted a retrospective analysis of axial and sagittal T2-weighted/T1-weighted (T2W/T1W) MRI studies at a single medical center. The inclusion criteria encompassed patients undergoing lumbar spine MRI due to lumbar spinal stenosis, while individuals under the age of 20, those with severe artifacts in lumbar MRI, vertebral fractures, postsurgery conditions, and those with trauma or malignancy were excluded from the dataset. MR images were acquired on a 3-Tesla scanner (MR750, GE Medical Systems, Milwaukee, WI, USA).
To formulate the lumbar spine semantic segmentation model, the dataset was divided into training, validation, and testing sets. On average, labeling each slice takes 10 to 15 minutes, posing a significant challenge to this process. To streamline this, 3 professional clinicians were enlisted to annotate the ground truth images using the open-source software ITK-SNAP [23]. We utilized the ‘polygon’ and ‘paintbrush’ features within ITK-SNAP for this purpose. The polygon tool allowed for shape editing by adjusting vertices on the image, while the paintbrush tool facilitated quick drawing and refinement using mouse input, accommodating masks of various shapes and sizes. To reduce variability in MRI scans, we adjusted the contrast and intensity of images to ensure accurate delineation of each anatomical structure. The annotation process involved identifying key anatomical structures such as the intervertebral disc, vertebral body, and dural sac in sagittal views (Fig. 1A), as well as the intervertebral disc, dural sac, lamina, and yellow ligament in axial views (Fig. 1B).
To identify the appropriate slices in axial view, a dual-view setup within ITK-SNAP was utilized to ensure precise alignment of corresponding locations across axial and midsagittal views. The selected axial slice of the disc was defined as the one cutting closest to the half-height of the disc in sagittal view. Five appropriate axial slices were extracted, each from the lumbar level discs: L1/L2, L2/L3, L3/L4, L4/L5, and L5/S1 in the axial MRI scan.
The study was performed according to the Helsinki Declaration (http://www.wma.net/en/30publications/10policies/b3/ ) and approved by the Institutional Review Board (IRB) of Taipei Veterans General Hospital (IRB No. 2023-01-019AC) where the experiment was performed.

2. The Architecture of the Residual U-Net

The lumbar spine tissue semantic segmentation task employed the residual U-Net, whose architecture is depicted in Fig. 2A. The encoding phase was initiated by inputting 320 × 320 pixel lumbar spine T1W MR images into the residual U-Net, which begins with a convolution layer, followed by a batch normalization layer with the Rectified Linear Unit (ReLU) activation function, and a second convolution layer with an identity map addition. Two residual units followed this, each comprising 2 stacks of batch normalization layers with the ReLU activation function, followed by a convolution layer and an identity map addition.
Subsequently, the encoding features traversed the bridge compartment before entering the decoding phase. The decoding phase comprised 3 stacks of up-sampling layers, concatenate layers, and residual units, with additional corresponding skip connections to mitigate gradient vanishing during deep network training. The output segmentation mask results were classified into 4 classes (intervertebral disc, dural sac, ligamentum flavum, and lamina) in axial view lumbar spine MRIs, and 3 classes (intervertebral disc, vertebral body, and dural sac) in sagittal view lumbar magnetic resonance (MR) images.
The residual U-Net model was trained with a batch size of 10 for 2,000 epochs, utilizing the Adadelta optimizer with a learning rate of 0.001. Training data were augmented with adjustments to brightness, contrast, rotation, and translation to enhance model generalization. TensorFlow (https://www.tensorflow.org) was used to implement all networks, and the code was executed on a server equipped with an RTX 2080 Ti GPU.
After automatic segmentation, we identified anatomical structures in both sagittal and axial views. Fig. 2B summarizes the detected lesions within each of these anatomical structures.

3. The Measurement of Performance Evaluations

The trained model was evaluated on both the training and testing datasets by comparing each predicted output image with its ground truth image and measuring the discrepancy between the predicted and labeled regions. The Dice coefficient score was used as the primary measure of segmentation quality. It quantifies the overlap between 2 images on a scale from 0 to 1, where a score of 1 denotes complete overlap between the segmented output and the ground truth.
The Dice coefficient score is a standard metric for evaluating medical image segmentation. It quantifies the extent of spatial overlap between the predicted segmentation (A) and the ground truth segmentation (B). Equation 1 defines the Dice coefficient score,
$$\text{Dice coefficient score}(A,B) = \frac{2\,|A \cap B|}{|A| + |B|} \qquad (1)$$
where |·| represents the number of voxels in a segmentation. The score lies between 0 and 1, where 1 denotes an identical pair of masks and 0 indicates no overlap between the two. This quantification reflects the ability of the model to accurately demarcate and identify relevant anatomical features within MR images.
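As a minimal sketch of how this score can be computed for a pair of binary masks, the following Python function counts overlapping foreground voxels. The function name, the nested-list mask representation, and the convention that two empty masks score 1 are illustrative assumptions, not details taken from the study's implementation.

```python
def dice_coefficient(pred, truth):
    """Dice score between two binary masks given as equally sized 2D lists of 0/1."""
    pred_flat = [v for row in pred for v in row]
    truth_flat = [v for row in truth for v in row]
    if len(pred_flat) != len(truth_flat):
        raise ValueError("masks must have the same shape")
    # |A ∩ B|: voxels that are foreground in both masks
    overlap = sum(1 for p, t in zip(pred_flat, truth_flat) if p and t)
    # |A| + |B|: foreground voxels counted separately in each mask
    total = sum(1 for p in pred_flat if p) + sum(1 for t in truth_flat if t)
    # Assumed convention: two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy masks (made up for illustration): 3 foreground voxels each, 2 shared.
pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
score = dice_coefficient(pred, truth)  # 2 * 2 / (3 + 3)
```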
The IOU, also referred to as the Jaccard index, is a second measure of segmentation model performance. It is the ratio of the intersection of the predicted and true segments to their union, with values ranging from 0, indicating no overlap, to 1, signifying perfect agreement between the predicted and ground truth images. Equation 2 defines the IOU,
$$\text{IOU}(A,B) = \frac{|A \cap B|}{|A \cup B|} \qquad (2)$$
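The ratio described above can be sketched in a few lines of Python. The function names and the nested-list mask format are assumptions made for illustration. The second helper uses the standard identity Dice = 2·IOU / (1 + IOU), which holds for any single pair of masks.

```python
def iou(pred, truth):
    """Intersection over Union (Jaccard index) of two binary masks (2D lists of 0/1)."""
    pairs = [(p, t) for row_p, row_t in zip(pred, truth) for p, t in zip(row_p, row_t)]
    intersection = sum(1 for p, t in pairs if p and t)
    union = sum(1 for p, t in pairs if p or t)
    # Assumed convention: two empty masks are treated as a perfect match.
    return 1.0 if union == 0 else intersection / union


def dice_from_iou(j):
    """Convert an IOU value for one mask pair into the equivalent Dice score."""
    return 2.0 * j / (1.0 + j)


# Toy masks (made up for illustration): 2 shared voxels, 4 voxels in the union.
pred = [[0, 1, 1],
        [0, 1, 0]]
truth = [[0, 1, 0],
         [0, 1, 1]]
j = iou(pred, truth)    # 2 / 4
d = dice_from_iou(j)    # 2 * 0.5 / 1.5
```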

4. The Measurement of Intervertebral Disc Geometry Index

The computer vision method identified landmarks of the intervertebral disc to measure the geometry of each lumbar spine intervertebral disc (Fig. 3A). In the sagittal view of lumbar MR images, the leftmost and rightmost points of the intervertebral disc were identified to calculate the width (Wdc, Fig. 3B). The height (Hdc, Fig. 3B) was defined as the distance between the 2 points at which the normal to the line connecting the leftmost and rightmost points, drawn through the intervertebral disc centroid, intersects the border of the intervertebral disc. The intervertebral disc area was estimated from the segmented pixels, and the volume was obtained by integrating the area over the slices.
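The width and height definitions can be illustrated with a short Python sketch. It is not the study's code: the function name, the `spacing_mm` parameter, the averaging of tied leftmost or rightmost pixels, and the half-pixel band used to approximate the normal line through the centroid are all assumptions. Distances are measured between pixel centers.

```python
import math


def disc_width_height(pixels, spacing_mm=1.0):
    """Estimate disc width and height from the (x, y) pixels of one segmented disc."""
    pts = list(pixels)
    min_x = min(p[0] for p in pts)
    max_x = max(p[0] for p in pts)
    # Assumption: ties in the extreme columns are resolved by averaging their y values.
    left_ys = [p[1] for p in pts if p[0] == min_x]
    right_ys = [p[1] for p in pts if p[0] == max_x]
    left = (min_x, sum(left_ys) / len(left_ys))
    right = (max_x, sum(right_ys) / len(right_ys))
    width = math.dist(left, right)
    if width == 0:
        raise ValueError("mask is too small to define a disc axis")
    # Centroid of the mask
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    # Unit vector along the left-right axis and its normal
    ux, uy = (right[0] - left[0]) / width, (right[1] - left[1]) / width
    nx, ny = -uy, ux
    # Pixels within half a pixel of the normal line through the centroid
    along_normal = [
        (x - cx) * nx + (y - cy) * ny
        for x, y in pts
        if abs((x - cx) * ux + (y - cy) * uy) <= 0.5
    ]
    height = max(along_normal) - min(along_normal)
    return width * spacing_mm, height * spacing_mm


# Toy example: an 11 x 5 pixel rectangle with a hypothetical 0.5 mm pixel spacing.
rect = [(x, y) for x in range(11) for y in range(5)]
w_mm, h_mm = disc_width_height(rect, spacing_mm=0.5)
```

For a real mask the pixel list would come from the segmentation output, and the spacing from the image header.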
Lumbar alignment was measured by detecting each vertebral structure. Spinal listhesis was detected by measuring the distance (L, Fig. 3C) between 2 reference lines derived from the edges of the upper and lower vertebrae in the sagittal view. Reference lines were identified using the 4 endpoints of the vertebrae. Under normal conditions, the measured L should not exceed 0; patients would be reported with spinal listhesis if L is greater than 0.
Intervertebral disc herniation was quantified by measuring the diameter of the dural sac (Dds) in sagittal and axial views. In the sagittal view (Fig. 3D), the diameters of the corresponding dural sacs of each lumbar intervertebral disc were calculated. Dds was determined as the distance between the points at which the normal vector passing through the intervertebral disc centroid intersects the dural sac. The compression of the dural sac caused by the bulging disc was defined by the distance between the line of posterior endplates and the edge of the dural sac (Dist, Fig. 3C). In the axial view (Fig. 3E), Dds at the corresponding lumbar intervertebral disc was evaluated as the length of the part of the line connecting the intervertebral disc centroid and the bottom of the lamina that lies within the dural sac. To assess noncentral herniated discs, the dural sac diameter was also measured over a sweep of ±30° (Fig. 3F). A herniation away from the central line could be identified from a decreased dural sac diameter and the angle at which it occurred. Furthermore, the thickness of the yellow ligament was measured (Fig. 3G). The yellow ligament was split into middle (YLm) and lateral (YLl) parts. To locate the yellow ligament precisely, the YLm was defined as the length of the intersection between the yellow ligament and the line connecting the dural sac centroid and the lowest point of the yellow ligament. Once the YLm was specified, the line connecting the dural sac centroid and the lowest point was taken as the reference and swept ±30° to measure the thickness of the yellow ligament. The YLl was determined as the longest distance between the dural sac centroid and the intersection points of the yellow ligament.
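The ±30° sweep can be sketched as a 2D rotation matrix applied to a reference direction, with the length of mask crossed by each rotated ray recorded per angle. This is only an illustration under assumptions: the function names, the ray-marching step of 0.25 pixel, the 5° increment, and the set-of-pixels mask format are hypothetical and do not come from the study.

```python
import math


def rotate(vec, degrees):
    """Apply the 2D rotation matrix [[cos, -sin], [sin, cos]] to a vector."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    x, y = vec
    return (c * x - s * y, s * x + c * y)


def chord_length(mask, origin, direction, step=0.25, max_dist=200.0):
    """Approximate length (in pixels) of the ray from origin that lies inside mask.

    mask is a set of integer (x, y) pixels; the ray is sampled every `step` pixels.
    """
    norm = math.hypot(direction[0], direction[1])
    dx, dy = direction[0] / norm, direction[1] / norm
    inside = 0
    for i in range(int(max_dist / step) + 1):
        t = i * step
        pixel = (round(origin[0] + t * dx), round(origin[1] + t * dy))
        if pixel in mask:
            inside += 1
    return inside * step


def sweep_diameters(mask, origin, reference, half_angle=30, increment=5):
    """Diameter of the mask along the reference ray rotated from -half_angle to +half_angle."""
    return {
        angle: chord_length(mask, origin, rotate(reference, angle))
        for angle in range(-half_angle, half_angle + 1, increment)
    }


# Toy "dural sac": a filled circle of radius 10 px centred 30 px from the origin.
sac = {(x, y) for x in range(-12, 13) for y in range(18, 43)
       if x * x + (y - 30) ** 2 <= 100}
profile = sweep_diameters(sac, origin=(0, 0), reference=(0, 1))
```

Plotting `profile` against angle gives a curve of diameter versus sweep angle. A local drop in that curve at some angle would point to compression in that direction.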

5. The Measurement of Disc Hydration

The degree of intervertebral disc degeneration is closely tied to the loss of water content in the nucleus pulposus. To assess this degeneration, we calculated the signal intensity difference (∆SI) within intervertebral disc areas. Both T1W and T2W MR images of the lumbar spine in sagittal view were utilized, obtaining intervertebral disc signal intensity through an intervertebral disc mask generated beforehand by the residual U-Net model. The T1W/T2W pixel intensities of the intervertebral disc served as features for the K-means model, facilitating clustering into hydrated and dehydrated groups. The centroids of these groups were identified, and their distance calculated. Additionally, the mean intensity of cerebrospinal fluid (CSF) was computed as a normalization factor to minimize signal intensity variation under different MR imaging conditions. Equation 3 expresses the formula for ∆SI between the centroids.
where SI_CSF is the distance between the origin and the cluster center of the CSF area, and SI_hydration and SI_dehydration are the distances between the origin and the cluster centers of the hydrated and dehydrated groups, respectively.
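The clustering step can be sketched with a small two-cluster K-means on (T1W, T2W) intensity pairs, written in plain Python. Everything here is illustrative: the function names, the deterministic initialization, and the toy intensities are assumptions, as is treating the brighter cluster as the hydrated one. The final `normalized_difference` helper shows one plausible way to combine the three distances. It is an assumed form and should not be read as the study's Equation 3.

```python
import math


def two_means(points, iterations=50):
    """Plain 2-cluster K-means on 2D points; returns the two cluster centers."""
    ordered = sorted(points, key=lambda p: math.hypot(p[0], p[1]))
    # Assumed deterministic start: the dimmest and the brightest point.
    centers = [ordered[0], ordered[-1]]
    for _ in range(iterations):
        groups = [[], []]
        for p in points:
            k = 0 if math.dist(p, centers[0]) <= math.dist(p, centers[1]) else 1
            groups[k].append(p)
        new_centers = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def cluster_signal_intensities(disc_pixels, csf_pixels):
    """Distances from the origin to the dehydrated, hydrated, and CSF cluster centers."""
    low, high = sorted(two_means(disc_pixels), key=lambda c: math.hypot(c[0], c[1]))
    csf_center = tuple(sum(c) / len(csf_pixels) for c in zip(*csf_pixels))
    # Assumption: the brighter disc cluster corresponds to hydrated tissue.
    si_dehydration = math.hypot(low[0], low[1])
    si_hydration = math.hypot(high[0], high[1])
    si_csf = math.hypot(csf_center[0], csf_center[1])
    return si_dehydration, si_hydration, si_csf


def normalized_difference(si_dehydration, si_hydration, si_csf):
    """ASSUMED combination, for illustration only; not the source's Equation 3."""
    return (si_hydration - si_dehydration) / si_csf


# Made-up (T1W, T2W) intensities, not data from the study.
disc = [(10, 10), (12, 12), (11, 11), (50, 50), (52, 52), (51, 51)]
csf = [(100, 100), (100, 100)]
delta = normalized_difference(*cluster_signal_intensities(disc, csf))
```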

6. Statistical Analysis

In the statistical analysis, we used confidence intervals (CIs) to interpret the observed results. With a significance level of 0.05, each interval gives an estimated range within which the true population parameter lies with 95% confidence. CIs thus convey the statistical uncertainty of the estimates and the precision and reliability of the experimental findings.
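The text does not state how the intervals were computed. As one common choice, the sketch below builds a 95% CI for a mean using the normal approximation (mean ± z × standard error). The function name and the sample scores are hypothetical and are not results from this study.

```python
import math
import statistics


def mean_ci(values, confidence=0.95):
    """Mean and normal-approximation confidence interval for a list of values."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    mean = statistics.fmean(values)
    standard_error = statistics.stdev(values) / math.sqrt(n)
    # Two-sided critical value, about 1.96 for a 95% interval
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return mean, mean - z * standard_error, mean + z * standard_error


# Made-up per-image scores used only to exercise the function.
scores = [0.90, 0.92, 0.94]
mean, lower, upper = mean_ci(scores)
```

With small samples, a t-distribution critical value would give a wider and more conservative interval than the normal approximation used here.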


1. Demography of Data

A total of 539 patients were included from the Taipei Veterans General Hospital. The dataset consisted of 268 males and 271 females. The mean age was 59.4 ± 18.9 years (range, 43–78 years) in males and 65.3 ± 14.3 years (range, 46–81 years) in females. The training dataset consisted of 207 males and 171 females, while 32 males and 21 females were included in the validation dataset used during training. To evaluate the performance of the trained model, a testing dataset consisting of 56 males and 52 females was used.

2. The Performance of Segmentation by Using Residual U-Net

To address challenges related to gradient explosion, vanishing gradients, and optimization complexity, the residual U-Net was developed. Additionally, skip connections were employed to mitigate overfitting, recognized as a more effective strategy than randomly deactivating units. In this study, the residual U-Net model was chosen for the semantic segmentation task on lumbar spine MR images. The sagittal lumbar spine MR images involved the segmentation of 3 tissue categories: intervertebral disc, vertebra, and dural sac. To validate the segmentation results, Table 1 reports the mean IOU, mean Dice coefficient, and their 95% CIs. The mean IOU values were 0.91 for intervertebral disc, 0.93 for vertebra, and 0.93 for dural sac. The corresponding mean Dice coefficients were 0.91, 0.93, and 0.87, respectively.
In axial view lumbar MRIs, segmentation involved 4 tissue categories: intervertebral disc, lamina, dural sac, and yellow ligament. The overall mean IOU values were 0.82 for intervertebral disc, 0.85 for lamina, 0.82 for dural sac, and 0.83 for yellow ligaments. The mean Dice coefficients were 0.82 for intervertebral disc, 0.84 for lamina, 0.82 for dural sac, and 0.83 for the yellow ligaments.

3. The Quantification of Anatomical Structure in Lumbar Spine

The automated quantification system for lumbar spine anatomical structures was utilized for measurements following tissue segmentation by the residual U-Net model. In the sagittal view, the dimensions of the intervertebral disc height (Hdc) and width (Wdc) were individually measured at each level, with mean values of 10.8 mm and 33.1 mm, respectively. The diameter of the dural sac (Dds) was also measured, yielding an average of 10.6 mm. Sagittal view calculations were also performed to assess disc hydration, resulting in an average hydration level of 55.4%. In the axial view, measurements included the central line diameter (Dds) and middle thickness (YLm) and lateral thickness of the yellow ligament (YLl), with average values of 10.7 mm, 2.1 mm, and 2.5 mm, respectively (Table 2). A comparison between the training and testing datasets revealed no significant differences in anatomical measurements, as indicated by p-values between each pair of groups (p > 0.05).

4. Case Illustration

A 65-year-old man experienced persistent back pain radiating to both lower limbs for several months. After attempts at conservative treatment, a lumbar MRI scan was conducted, revealing a bulging disc at the L4–5 level resulting in lumbar spinal stenosis. The axial and sagittal images from the scan were used as testing data for the automatic segmentation model. In the sagittal view, the compression of the dural sac (Dist) at L4–5 was measured at 5.3 mm. The disc hydration at the index level was 18%, a value below the normal range (Fig. 4A). Additionally, the diameter of the dural sac (Dds) at L4–5 was significantly smaller than at other levels (Fig. 4B). In the axial view, the rotation matrix displayed a smooth inverted U-shaped curve at the normal level (Fig. 4C), which deteriorated at the L4–5 level, suggesting lumbar spinal stenosis with dural sac compression (Fig. 4C). Measurements indicated that the yellow ligament thickness was 3.1 mm laterally and 3.9 mm medially. After quantitative analysis, it was concluded that the bulging disc at L4–5, coupled with a hypertrophic yellow ligament, contributed to lumbar spinal stenosis at the same level, aligning with clinical observations.


The lumbar degenerative disease encompasses disc degeneration, facet hypertrophy, and hypertrophic yellow ligaments, leading to lumbar spinal stenosis and associated clinical symptoms. Instabilities like retrolisthesis or spondylolisthesis can exacerbate these clinical conditions. Regarding disc degeneration, the Modic classification categorizes disc conditions into 4 types based on MRI scans: type 0 signifies a normal disc and vertebral body appearance; type I involves bone marrow edema within the vertebral body and hypervascularization; type II indicates fatty replacements of the red bone marrow within the vertebral body, and type III involves subchondral bone sclerosis [24,25]. Additionally, disc-only degeneration can be assessed using the Pfirrmann grading scale, which categorizes discs into grade I (normal disc), grade II (inhomogeneous disc with normal height and clear nucleus/annulus distinction), grade III (inhomogeneous gray disc with a blurred border between the nucleus and annulus and normal to slightly decreased height), grade IV (inhomogeneous hypointense dark gray disc with significant height loss), and grade V (inhomogeneous black disc with disc space collapse) [10,24]. While these classification systems offer varied disc degeneration grading, they lack specific quantitative scales. The introduction of deep learning models allows for the quantitative evaluation of central and lateral lumbar stenosis. However, there remains a need for a comprehensive evaluation of each anatomical structure in the lumbar spine and the identification of the specific components causing lumbar spinal stenosis [26].
This study employed a combination of the residual U-Net and a geometry algorithm to establish an automatic system for the quantitative analysis of spinal anatomical structures. The residual U-Net’s segmentation capability effectively distinguished various labeled tissues, facilitating subsequent measurements. While numerous methods, such as the K-means clustering algorithm [27], deep CNN [28], and U-Net [29], have been utilized for medical segmentation in recent years, they often faced challenges in handling extensive hidden layers and larger images. These limitations were primarily attributed to computational resource constraints and the “vanishing gradient” problem. To address the vanishing gradient issue, the residual unit was introduced [30]. Previous research has demonstrated the effectiveness of the residual U-Net in enhancing the performance of deep convolutional networks and addressing class imbalance issues when compared to traditional U-Net and other improved variants [31].
The model took 2 different views as input. In the sagittal view, this study extracted and subsequently measured discs, vertebral bodies, and the dural sac. Additionally, the study included a demonstration of disc hydration. A prior investigation introduced a semantic segmentation network (BianqueNet) to enhance the histogram method, enabling automatic calculation for measuring disc hydration [32]. However, the histogram method exhibited drawbacks, such as overenhancement leading to pronounced peaks for frequently occurring gray levels. This overenhancement posed a significant challenge and overlooked the importance of local enhancement while being susceptible to the mean-shift problem. To overcome these issues associated with the histogram method, K-means clustering was employed to quantify disc hydration levels. K-means is a straightforward yet effective unsupervised learning algorithm designed to address clustering problems. This method categorizes a given dataset into a predetermined number of clusters, providing a solution to the overenhancement issue [33]. Consequently, the signal intensity was divided into 3 clusters representing disc hydration, disc dehydration, and dural sac signal, mitigating the challenges associated with overenhancement.
In the axial view, the residual U-Net accurately identified discs, the dural sac, yellow ligament, and lamina. This study introduces a novel approach for detecting bulging discs and compression of the dural sac using a rotation matrix to measure the diameter of the dural sac and the thickness of the yellow ligament. Hypertrophy of the yellow ligament is considered a contributing factor in the development of lumbar spinal stenosis [34]. The key advantage of this algorithm lies in its ability to not only detect stenosis but also determine the direction in which the intervertebral disc is compressed. To assess the healthy condition of the intervertebral disc, we established a scheme in healthy subjects to define the normal range.
In clinical practice, lumbar degeneration is assessed using various grading systems tailored to clinical presentations. The Meyerding classification grades spondylolisthesis in patients with lumbar instability via lateral radiographs [35]. MRI examinations aid in evaluating disc degeneration by assessing morphological changes in intervertebral discs and endplates [10]. Another retrospective radiologic study [36] aimed to establish a qualitative grading system for lumbar spinal stenosis and determine its reliability and clinical relevance. Dural sac cross-sectional area differs significantly between symptomatic and asymptomatic individuals. The study introduced a 7-grade classification based on dural sac morphology observed on T2 axial MRI, involving 95 subjects. Results showed substantial intraobserver and moderate interobserver agreement, with surgical patients exhibiting smaller dural sac cross-sectional areas and a higher proportion of higher grades. Various factors contribute to lumbar degeneration and stenosis, including disc degeneration and facet joint and yellow ligament hypertrophy. While previous grading systems focused on single anatomical degeneration [10,11,35,36], this study compared a 3D-CNN model, suggesting that a global assessment across different anatomical structures could enhance clinical diagnosis. While experienced clinical experts may find neural network introduction limiting for diagnosis, it could aid residency training. Prior studies indicate neural networks assist trainees in identifying urgent findings and gaining diagnostic confidence [37,38]. Especially in the field of neurosurgery, a deep learning artificial intelligence model significantly enhanced the diagnostic abilities of novice physicians when it came to identifying pediatric skull fractures on plain radiographs [39]. Moreover, it effectively discerned the traits of hydrocephalus from computed tomography images of the brain, automating the analysis process for junior doctors [40].
Overall, the proposed model offers a comprehensive assessment of lumbar spinal stenosis, benefiting both experienced experts and residents in training.
This study involved a retrospective review of MR images focusing on patients diagnosed with lumbar degenerative disease or lumbar spinal stenosis. In the data training phase, levels that had undergone surgery were excluded to establish the normal range for each anatomical structure. However, all images were retrospectively reviewed from patients who had undergone lumbar surgery, potentially introducing bias in case selection for data training. During data testing and validation, all levels were utilized to distinguish between normal and abnormal findings. To enhance clinical applicability, a grading system will be developed, employing a global evaluation of each anatomical structure to establish correlations with the severity of lumbar spinal stenosis. This system aims to support future surgical decision-making processes.


This study introduces an advanced approach to lumbar degenerative diseases, particularly lumbar spinal stenosis, utilizing a residual U-Net and deep learning algorithm for automated segmentation of lumbar spine MRI scans. The residual U-Net demonstrates high accuracy in semantic segmentation, yielding precise measurements of intervertebral discs, vertebral bodies, dural sac, lamina, and yellow ligaments. The proposed method, applied to a dataset of 539 lumbar spinal stenosis patients, showcases its potential for clinical use. Additionally, a novel algorithm employing a rotation matrix detects bulging discs and dural sac compression, offering valuable insights into pathology. The findings highlight the efficiency and reliability of this automated segmentation technique, providing a promising avenue for improving diagnostic accuracy and guiding treatment decisions in lumbar spinal stenosis.


Conflict of Interest

The authors have nothing to disclose.


This work was financially supported by the Taiwan Food and Drug Administration under contract numbers MOHW109-FDA-M-114-000747, MOHW110-FDA-D-114-000741, MOHW111-FDA-H-114-000741, MOHW112-FDAH-114-000741, and MOHW113-FDA-D-113-000741.

Author Contribution

Conceptualization: YWL, TCL, YYC; Formal analysis: YTF, TCL, CRY, HWH; Data curation: YWL, CCC, HKC, CCK, THT, LYF, JCW, WCH; Methodology: YTF, TCL; Project administration: JCW, WCH, YYC, CHK; Writing – original draft: YWL, TCL, CHK; Writing – review & editing: JCW, CHK.

Fig. 1.
The ground truth labeling in magnetic resonance images. The ground truth images highlight anatomical structures: in sagittal views (A), the intervertebral disc, vertebral body, and dural sac are marked in dark red, blue, and yellow, respectively. In axial views (B), the intervertebral disc, dural sac, lamina, and yellow ligament are marked in dark red, yellow, purple, and green, respectively.
Fig. 2.
The schema of automatic spinal measurement. (A) Flow chart of automatic segmentation using the residual U-Net. (B) After automatic segmentation, anatomical structures in both sagittal and axial views were identified to detect lesions within each of these anatomical structures. MRI, magnetic resonance imaging.
Fig. 3.
The strategy of spinal parameter measurement in sagittal and axial views. (A) The discs and vertebral bodies were measured using T2 magnetic resonance (MR) images and the segmentation from the residual U-Net. The leftmost and rightmost points of the intervertebral disc were identified to calculate the width (Wdc, B). The height (Hdc, B) was defined by the points at which the normal vector of the line connecting the leftmost point, rightmost point, and intervertebral disc centroid intersected the border of the intervertebral disc. Spinal listhesis was detected by measuring the distance (L, C) between 2 reference lines derived from the edges of the upper and lower vertebrae in the sagittal view. The compression of the dural sac caused by the bulging disc was defined as the distance between the line of the posterior endplates and the edge of the dural sac (Dist, C). (D) The diameter of the corresponding dural sac (Dds) was calculated for each lumbar intervertebral disc. Dds was determined from the distance between the intersections of the dural sac with the normal vector passing through the intervertebral disc centroid. (E) In the axial orientation, the diameter of the dural sac (Dds) was defined as the distance between the top and bottom points of the dural sac, and the diameter of the intervertebral disc (Dd) was defined in the same way. (F) Stenosis in the axial view was measured using a rotation matrix. (G) Both the medial thickness (YLm) and lateral thickness (YLl) of the yellow ligaments were measured on axial T2-weighted MR images.
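The width and height definitions in this caption can be illustrated with a minimal sketch. It assumes the disc segmentation is available as a list of (x, y) pixel coordinates, that the height is taken along the normal through the disc centroid, and that pixel spacing is isotropic; the function name, the `spacing_mm` and `tol` parameters, and the rectangular test mask are all hypothetical and are not taken from the article.

```python
import math

def disc_width_height(pixels, spacing_mm=1.0, tol=0.5):
    """Estimate disc width and height from the pixels of one segmented disc.

    pixels: iterable of (x, y) pixel coordinates inside the disc mask.
    spacing_mm: assumed isotropic pixel spacing (hypothetical parameter).
    tol: half-width, in pixels, of the band treated as lying on the normal line.
    """
    pts = list(pixels)
    # Leftmost and rightmost points define the width line.
    left = min(pts, key=lambda p: p[0])
    right = max(pts, key=lambda p: p[0])
    width = math.dist(left, right)
    if width == 0:
        raise ValueError("mask must span more than one column")
    # Unit vector along the width line and its normal.
    ux, uy = (right[0] - left[0]) / width, (right[1] - left[1]) / width
    nx, ny = -uy, ux
    # Centroid of the mask.
    cx = sum(x for x, _ in pts) / len(pts)
    cy = sum(y for _, y in pts) / len(pts)
    # Project the pixels close to the normal line (through the centroid)
    # onto the normal; the spread of the projections is the height.
    along_normal = [
        (x - cx) * nx + (y - cy) * ny
        for x, y in pts
        if abs((x - cx) * ux + (y - cy) * uy) <= tol
    ]
    if not along_normal:
        raise ValueError("no mask pixels lie on the normal through the centroid")
    height = max(along_normal) - min(along_normal)
    return width * spacing_mm, height * spacing_mm

# Hypothetical mask: a 31 x 11 pixel rectangle at 0.5 mm per pixel.
rect = [(x, y) for x in range(31) for y in range(11)]
width_mm, height_mm = disc_width_height(rect, spacing_mm=0.5)
print(width_mm, height_mm)
```

Distances here are measured between pixel centers, so a mask that is 31 pixels wide yields a width of 30 pixel units; whether the article measures edge to edge is not stated.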
Fig. 4.
Illustration of a 65-year-old man who experienced persistent back pain, with a bulging disc at the L4–5 level resulting in spinal stenosis. (A) In the sagittal view, vertebral bodies (marked in blue), intervertebral discs (marked in red), and dural sac (marked in yellow) were identified. The compression of the dural sac at L4–5 was measured at 5.3 mm. The disc hydration at the index level was 18%, indicating a value below the normal range. (B) The diameter of the dural sac (Dds) at L4–5 was significantly smaller than at other levels. (C) In the axial view, the rotation matrix displayed a smooth inverted U-shaped curve at the normal level (L2–3), which deteriorated at the L4–5 level, suggesting spinal stenosis with dural sac compression. Measurements indicated that the yellow ligament thickness was 3.1 mm laterally and 3.9 mm medially.
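This section does not spell out how the rotation matrix is turned into the curve shown in panel C. As a purely illustrative sketch of the underlying operation, the code below rotates a contour about its centroid with a standard 2D rotation matrix and records the contour's extent along one axis at each angle. The profile definition, the function names, and the elliptical test contour are assumptions and should not be read as the authors' algorithm.

```python
import math

def rotate(points, theta, center=(0.0, 0.0)):
    """Rotate (x, y) points counterclockwise by theta radians about center."""
    c, s = math.cos(theta), math.sin(theta)
    cx, cy = center
    return [
        (cx + c * (x - cx) - s * (y - cy), cy + s * (x - cx) + c * (y - cy))
        for x, y in points
    ]

def extent_profile(contour, n_angles=19):
    """Extent of the contour along the y axis after rotation.

    Angles are evenly spaced from -90 to +90 degrees; each entry of the
    returned list is (angle_in_degrees, extent).
    """
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    profile = []
    for k in range(n_angles):
        theta = -math.pi / 2 + k * math.pi / (n_angles - 1)
        ys = [y for _, y in rotate(contour, theta, (cx, cy))]
        profile.append((math.degrees(theta), max(ys) - min(ys)))
    return profile

# Hypothetical contour: an ellipse 12 units wide and 8 units tall.
ellipse = [
    (6 * math.cos(2 * math.pi * k / 360), 4 * math.sin(2 * math.pi * k / 360))
    for k in range(360)
]
for angle, extent in extent_profile(ellipse):
    print(f"{angle:6.1f} deg  extent {extent:5.2f}")
```

A regular contour gives a smooth profile under this construction, while an indented contour would give an irregular one, which is the kind of contrast the caption describes between L2–3 and L4–5.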
Table 1.
Performance of segmentation in anatomical structures
Segmented structure   Mean IOU   95% CI      Mean Dice coefficient
Sagittal view
 Disc                 0.91       0.87–0.95   0.91
 Vertebral body       0.93       0.89–0.97   0.93
 Dural sac            0.93       0.89–0.97   0.87
Axial view
 Disc                 0.82       0.80–0.84   0.82
 Lamina               0.85       0.83–0.87   0.84
 Dural sac            0.82       0.80–0.84   0.82
 Yellow ligament      0.83       0.82–0.85   0.83

IOU, Intersection over Union; CI, confidence interval.
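For readers unfamiliar with the two metrics in Table 1, the following minimal sketch computes them for one pair of binary masks using their standard definitions. The function name and the toy masks are hypothetical; the article's own evaluation code is not shown in this section.

```python
def iou_and_dice(pred, truth):
    """Return (IoU, Dice) for two same-shaped binary masks given as nested lists."""
    inter = union = pred_sum = truth_sum = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            p, t = bool(p), bool(t)
            inter += p and t      # pixel labeled in both masks
            union += p or t       # pixel labeled in at least one mask
            pred_sum += p
            truth_sum += t
    if union == 0:
        # Both masks are empty; treating this as perfect agreement is a convention.
        return 1.0, 1.0
    iou = inter / union
    dice = 2 * inter / (pred_sum + truth_sum)
    return iou, dice

# Toy masks (illustrative values only).
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
iou, dice = iou_and_dice(pred, truth)
print(round(iou, 3), round(dice, 3))
```

For a single pair of masks the two metrics are linked by Dice = 2·IoU / (1 + IoU).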

Table 2.
Normal range of segmented structure in sagittal and axial views
Variable             Level
                     L1–2        L2–3        L3–4        L4–5        L5–S1
Sagittal view
 Hdc (mm)            9.2–9.8     10.0–10.8   10.1–10.7   11.7–12.3   11.4–11.8   10.8
 Wdc (mm)            30.9–31.1   32.8–33.1   34.6–35.0   32.4–33.0   33.4–34.0   33.1
 Disc hydration (%)  51.7–52.8   53.1–54.2   54.9–56.5   55.2–56.5   58.6–60.3   55.4
 Dds (mm)            12.0–12.5   10.8–11.3   10.7–11.2   9.2–9.8     8.9–9.4     10.6
Axial view
 Dds (mm)            10.8–11.7   10.9–11.8   10.1–11.2   10.1–11.0   9.3–10.2    10.7
 YLm (mm)            1.9–2.2     2.0–2.2     2.2–2.5     1.8–2.1     1.7–1.8     2.1
 YLl (mm)            2.7–2.8     2.5–2.8     2.0–2.1     2.8–3.0     2.1–2.4     2.5

The normal range was defined by the 95% confidence interval.

Hdc, disc height; Wdc, disc width; Dds, diameter of dural sac; YLm, medial thickness of yellow ligament; YLl, lateral thickness of yellow ligament.
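As an illustration of how a range like those in Table 2 could be derived, the sketch below computes a 95% confidence interval for a mean using the normal approximation (mean ± 1.96 × standard error). The article does not state which interval method was used, so the 1.96 multiplier, the function name, and the sample values are assumptions.

```python
import math
import statistics

def ci95(values):
    """95% confidence interval for the mean, normal approximation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two measurements")
    mean = statistics.fmean(values)
    # Standard error of the mean from the sample standard deviation.
    sem = statistics.stdev(values) / math.sqrt(n)
    half_width = 1.96 * sem
    return mean - half_width, mean + half_width

# Hypothetical disc heights in mm (not data from the study).
heights = [9.0, 9.5, 10.0, 9.5]
low, high = ci95(heights)
print(f"{low:.2f}-{high:.2f} mm")
```

With small samples, a t-distribution multiplier would give a wider interval than 1.96.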


1. An HS, Anderson PA, Haughton VM, et al. Introduction: disc degeneration: summary. Spine (Phila Pa 1976) 2004;29:2677-8.
2. Kuo CH, Chang PY, Wu JC, et al. Dynamic stabilization for L4-5 spondylolisthesis: comparison with minimally invasive transforaminal lumbar interbody fusion with more than 2 years of follow-up. Neurosurg Focus 2016;40:E3.
3. Kuo CH, Chang PY, Tu TH, et al. The effect of lumbar lordosis on screw loosening in dynesys dynamic stabilization: four-year follow-up with computed tomography. BioMed Res Int 2015;2015:152435.
4. Yeh MY, Kuo CH, Wu JC, et al. Changes of facet joints after dynamic stabilization: continuous degeneration or slow fusion? World Neurosurg 2018;113:e45-50.
5. Kornblum MB, Fischgrund JS, Herkowitz HN, et al. Degenerative lumbar spondylolisthesis with spinal stenosis: a prospective long-term study comparing fusion and pseudarthrosis. Spine (Phila Pa 1976) 2004;29:726-33. discussion 733-4.
6. Steurer J, Roner S, Gnannt R, et al. Quantitative radiologic criteria for the diagnosis of lumbar spinal stenosis: a systematic literature review. BMC Musculoskelet Disord 2011;12:175.
7. Urban JP, Roberts S. Degeneration of the intervertebral disc. Arthritis Res Ther 2003;5:120-30.
8. Hassan CR, Lee W, Komatsu DE, et al. Evaluation of nucleus pulposus fluid velocity and pressure alteration induced by cartilage endplate sclerosis using a poro-elastic finite element analysis. Biomech Model Mechanobiol 2021;20:281-91.
9. Stokes IA, Iatridis JC. Mechanical conditions that accelerate intervertebral disc degeneration: overload versus immobilization. Spine (Phila Pa 1976) 2004;29:2724-32.
10. Pfirrmann CWA, Metzdorf A, Zanetti M, et al. Magnetic resonance classification of lumbar intervertebral disc degeneration. Spine (Phila Pa 1976) 2001;26:1873-8.
11. Modic MT, Steinberg PM, Ross JS, et al. Degenerative disk disease: assessment of changes in vertebral body marrow with MR imaging. Radiology 1988;166(1 Pt 1):193-9.
12. Abdollah V, Parent EC, Battié MC. Reliability and validity of lumbar disc height quantification methods using magnetic resonance images. Biomed Tech (Berl) 2019;64:111-7.
13. Lyle MA, Manes S, McGuinness M, et al. Relationship of physical examination findings and self-reported symptom severity and physical function in patients with degenerative lumbar conditions. Phys Ther 2005;85:120-33.
14. Videman T, Battié MC, Gibbons LE, et al. Aging changes in lumbar discs and vertebrae and their interaction: a 15-year follow-up study. Spine J 2014;14:469-78.
15. Korez R, Likar B, Pernuš F, et al. Model-based segmentation of vertebral bodies from MR images with 3D CNNs. In: Ourselin S, Joskowicz L, Sabuncu M, et al., editors. Medical image computing and computer-assisted intervention – MICCAI 2016. MICCAI 2016. Lecture Notes in Computer Science, vol 9901. Cham (Switzerland): Springer; 2016. p. 433-41.

16. Al Arif SMMR, Knapp K, Slabaugh G. Fully automatic cervical vertebrae segmentation framework for X-ray images. Comput Methods Programs Biomed 2018;157:95-111.
17. Chuang CH, Lin CY, Tsai YY, et al. Efficient triple output network for vertebral segmentation and identification. IEEE Access 2019;7:117978-85.
18. Lin L, Tao X, Pang S, et al. Multiple axial spine indices estimation via dense enhancing network with cross-space distance-preserving regularization. IEEE J Biomed Health Inform 2020;24:3248-57.
19. Shen H, Huang J, Zheng Q, et al. A deep-learning–based, fully automated program to segment and quantify major spinal components on axial lumbar spine magnetic resonance images. Phys Ther 2021;101:pzab041.
20. Zarvani M, Saberi S, Azmi R, et al. Residual learning: a new paradigm to improve deep learning-based segmentation of the left ventricle in magnetic resonance imaging cardiac images. J Med Signals Sens 2021;11:159-68.
21. Hao DJ, Duan K, Liu TJ, et al. Development and clinical application of grading and classification criteria of lumbar disc herniation. Medicine (Baltimore) 2017;96:e8676.
22. Hughes A, Makirov SK, Osadchiy V. Measuring spinal canal size in lumbar spinal stenosis: description of method and preliminary results. Int J Spine Surg 2015;9:3.
23. Yushkevich PA, Piven J, Hazlett HC, et al. User-guided 3D active contour segmentation of anatomical structures: significantly improved efficiency and reliability. Neuroimage 2006;31:1116-28.
24. Kuo CH, Huang WC, Wu JC, et al. Radiological adjacent-segment degeneration in L4–5 spondylolisthesis: comparison between dynamic stabilization and minimally invasive transforaminal lumbar interbody fusion. J Neurosurg Spine 2018;29:250-8.
25. Xia W, Liu C, Duan S, et al. The influence of spinal-pelvic parameters on the prevalence of endplate Modic changes in degenerative thoracolumbar/lumbar kyphosis patients. PLoS One 2018;13:e0197470.
26. Roller BL, Boutin RD, O’Gara TJ, et al. Accurate prediction of lumbar microdecompression level with an automated MRI grading system. Skeletal Radiol 2021;50:69-78.
27. Dhanachandra N, Manglem K, Chanu YJ. Image segmentation using K-means clustering algorithm and subtractive clustering algorithm. Procedia Comput Sci 2015;54:764-71.
28. Bousabarah K, Ruge M, Brand JS, et al. Deep convolutional neural networks for automated segmentation of brain metastases trained on clinical data. Radiat Oncol 2020;15:87.
29. Wang Z, Xiao P, Tan H. Spinal magnetic resonance image segmentation based on U-net. J Radiat Res Appl Sci 2023;16:100627.
30. Borawar L, Kaur R. ResNet: solving vanishing gradient in deep networks. In: Mahapatra RP, Peddoju SK, Roy S, Parwekar P, et al., editors. Proceedings of International Conference on Recent Trends in Computing. Lecture notes in networks and systems, vol 600. Singapore: Springer; 2023. p. 235-47.

31. Shereena VB, Raju G. Medical ultrasound image segmentation using Multi-Residual U-Net architecture. Multimed Tools Appl 2024;83:27067-88.
32. Zheng HD, Sun YL, Kong DW, et al. Deep learning-based high-accuracy quantitation for lumbar intervertebral disc degeneration from MRI. Nat Commun 2022;13:841.
33. Qi J, Yu Y, Wang L, et al. K*-Means: an effective and efficient K-means clustering algorithm. In: 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom); 2016 Oct 8-10; Atlanta (GA), USA.
34. Abbas J, Hamoud K, Masharawi YM, et al. Ligamentum flavum thickness in normal and stenotic lumbar spines. Spine (Phila Pa 1976) 2010;35:1225-30.
35. Meyerding HW. Low backache and sciatic pain associated with spondylolisthesis and protruded intervertebral disc: incidence, significance, and treatment. JBJS 1941;23:461-70.

36. Schizas C, Theumann N, Burn A, et al. Qualitative grading of severity of lumbar spinal stenosis based on the morphology of the dural sac on magnetic resonance images. Spine (Phila Pa 1976) 2010;35:1919-24.
37. Lee JH, Ha EJ, Kim D, et al. Application of deep learning to the diagnosis of cervical lymph node metastasis from thyroid cancer with CT: external validation and clinical utility for resident training. Eur Radiol 2020;30:3066-72.
38. Yi PH, Kim TK, Yu AC, et al. Can AI outperform a junior resident? Comparison of deep neural network to first-year radiology residents for identification of pneumothorax. Emerg Radiol 2020;27:367-75.
39. Choi JW, Cho YJ, Ha JY, et al. Deep learning-assisted diagnosis of pediatric skull fractures on plain radiographs. Korean J Radiol 2022;23:343-54.
40. Duan W, Zhang J, Zhang L, et al. Evaluation of an artificial intelligent hydrocephalus diagnosis model based on transfer learning. Medicine (Baltimore) 2020;99:e21229.

Copyright © The Korean Spinal Neurosurgery Society.
