Automatic and observable spinal cord compression grading system for lung cancer spinal metastasis
Introduction
Lung cancer has the highest morbidity and mortality rates among all cancers (1), and bone metastasis of lung cancer is very common, with spinal metastasis being the most frequent site (accounting for about 70%) (2-5). About 46% of patients with lung cancer bone metastasis experience skeletal-related events such as pathological fractures, malignant pain, hypercalcemia, and spinal cord compression, which seriously affect patients' quality of life and survival. To improve quality of life, prolong survival, relieve symptoms, and prevent or treat pathological fractures, it is usually necessary to evaluate whether surgical intervention is needed for lung cancer bone metastasis. The location and morphology of the tumor are important considerations in the surgical indications of nerve compression and spinal instability (6-10), among which the severity of epidural spinal cord compression [ESCC (11)] is one of the key indicators. The Spine Oncology Study Group (SOSG) classifies the severity of ESCC on axial T2-weighted magnetic resonance imaging (MRI) at the most severely compressed spinal level, graded as 0, 1, 2, and 3. Low-grade compression (ESCC 0 and 1) is generally treated with radiotherapy alone, while high-grade compression (ESCC 2 and 3) is best treated with surgical decompression and internal fixation followed by radiotherapy. Therefore, evaluating the ESCC grade provides important information for multidisciplinary management and shared decision-making between doctors and patients, including determining the timing of surgery, the invasiveness of the surgical approach, radiotherapy, drug therapy, and palliative care. Traditional ESCC grading depends heavily on the empirical imaging evaluation of orthopedists or radiologists, and suffers from strong subjectivity, long reading times, and limited consistency between observers (12-14). A fully automated ESCC grading system can therefore reduce human error and provide a more objective and efficient auxiliary tool for clinical practice.
In recent years, with the rapid development of deep learning (DL) and artificial intelligence (AI) technologies, the evaluation of spinal diseases has advanced considerably, including spinal metastasis detection, spinal disease research, and cervical spinal cord compression assessment. However, the clinical application of MRI-based ESCC assessment for lung cancer spinal metastasis remains limited (3,15-25). Some studies on the detection of metastatic bone lesions focus on initial segmentation of the lesion area, followed by manual feature selection for classification. Peter et al. and Jan et al. proposed integrated methods based on three approaches for segmenting two types of lesions, utilizing Markov random fields with intensity probability models, two-stage graph cut methods, and decision trees trained on hundreds of texture and shape features (26,27). Merali et al. used the ResNet50 deep convolutional neural network (CNN) to classify cervical spinal cord compression in MRI slices, achieving an accuracy of 92.39% and an area under the curve (AUC) of 0.94 (28). Wang et al. used a deep neural network consisting of three identical subnetworks for multi-resolution vertebral analysis and detection of spinal metastases; however, the dataset consisted of only 26 cases, and the proposed method showed less accurate boundary detection of the affected regions (29). Chmelik et al. applied deep CNNs for segmentation and classification of spinal metastatic lesions in computed tomography (CT) imaging (19). Liu et al. proposed a multi-view CNN framework trained with multi-channel input consisting of different view regions of lung nodules for classification (22). Lang et al. differentiated spinal metastases originating from lung cancer versus other cancers using dynamic contrast-enhanced (DCE) sequences from 30 lung cancer and 31 non-lung cancer cases, achieving an average accuracy of 0.71±0.043 with a CNN classifier (30).
However, there is no research on MRI-based segmentation and compression severity classification of lung cancer spinal metastases. Because these metastases are morphologically complex and invasive on MRI, and are subject to interference from adjacent tissues, accurate detection and severity classification of spinal cord compression pose a major challenge. At the same time, the scarcity of high-quality annotated data increases the difficulty of model training and validation. Therefore, we propose a multi-tissue segmentation network for metastases based on adaptive pooling and cross-attention that makes the compression visually observable, followed by a classification network that applies transfer learning (TL) to the segmentation features to achieve automated grading of compression severity. Our main contributions can be summarized as follows:
- We constructed a dataset containing 577 lung cancer spinal metastasis MRI slices with annotations of lesions.
- We have proposed a novel feature segmentation network based on the pyramid cross-attention network (PCAN) to extract multiple tissues. The network was validated on a lung cancer spinal metastasis MRI dataset, and achieved state-of-the-art multi-object segmentation accuracy. The output results were used to construct a feature dataset for classification.
- We have proposed a new dual attention network (DAN), which applies TL to achieve grading evaluation and helps orthopedists develop more reasonable patient treatment plans.
- We have proposed a novel progressive compression grading assessment method, which is different from the traditional method of direct classification with multiple MRI slices. First, multi-tissue feature segmentation was performed through PCAN to help clinicians quickly identify the positional relationship between tumor tissue and spinal cord. Second, based on the multi-tissue features identified by PCAN, DAN was used to perform an automated grading evaluation of compression. It provides clinicians with more objective and efficient evaluation indicators.
Methods
Data acquisition and proposed framework
Participants
This is a single-center, cross-sectional study. A total of 88 patients with spinal metastases from lung cancer [mean ± standard deviation (SD): age 52.6±6.7 years, height 168.1±8.3 cm, and weight 58.3±9.3 kg] were enrolled. The inclusion criteria were: (I) radiotherapy and medical treatment for local pain were ineffective, and the expected survival was more than 3 months; (II) symptoms of spinal cord compression were present, such as increased muscle tension in the lower limbs, unstable gait, bowel and bladder disorders, or paralysis; and (III) spinal instability [Spinal Instability Neoplastic Score (SINS) (31) >7], or specimens were obtained for further diagnosis or treatment. The exclusion criteria were: (I) responsiveness to medical or radiological therapies, with an anticipated survival of less than 3 months; and (II) physical condition incompatible with surgical intervention. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments and was approved by the Ethics Committee of Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences [No. GDREC2019574J(R1)]. Before the experiment, all participants provided informed consent and completed a questionnaire on subject information.
Data collection
This study selected 577 MRI T2 cross-sectional images from 88 cases as experimental data under the guidance of professional intermediate-level physicians. The cases were classified and annotated independently by three intermediate-level clinical experts, and the final classification and annotation were determined by majority voting; Fleiss’ kappa (32) for inter-rater reliability among the three annotators was κ=0.93. The MRI datasets are stored in Digital Imaging and Communications in Medicine (DICOM) format. The clinical experts used ITK-SNAP (33) on a Windows system to label the tumor, spine, and spinal cord, and saved the labeled and unlabeled slices in Neuroimaging Informatics Technology Initiative (NIfTI) format for segmentation training. Red represents bone tumors, green represents the spine, blue represents the spinal cord, and unannotated areas are treated as background, giving four classification targets in total. Additionally, the images were classified according to the ESCC method into four grades. Grade 0: tumor limited to the bone; grade 1: epidural tumor extension without displacement of the thecal sac; grade 2: tumor compressing the spinal cord without circumferential extension or cerebrospinal fluid space obstruction; grade 3: tumor with circumferential epidural extension, causing severe spinal cord compression and cerebrospinal fluid space obstruction. In this study, grades 0 and 1 were combined as the low-risk compression category (grade 0), while grades 2 and 3 were merged as the high-risk compression category (grade 1). The original MRI slice images were extracted from the DICOM series, and the gold standard classification was determined by the majority vote of the three annotators. The two-dimensional images were stored in separate folders by classification. The training set consisted of 462 images, with 333 low-risk (grade 0) and 129 high-risk (grade 1) images. The validation set contained 115 images, with 83 low-risk (grade 0) and 32 high-risk (grade 1) images.
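To make the label and grade conventions concrete, the following sketch (with nibabel as an assumed NIfTI reader and a hypothetical file name) illustrates the class indexing and the grade-merging rule described above; the index assignment itself is illustrative, not taken from the published pipeline.

```python
import numpy as np
import nibabel as nib  # assumed NIfTI reader; not named in the original pipeline

# Segmentation targets (index assignment is illustrative): 0 = background,
# 1 = spine (green), 2 = bone tumor (red), 3 = spinal cord (blue)
mask = nib.load('case001_label.nii.gz').get_fdata().astype(np.int64)  # hypothetical path

def merge_escc_grade(grade: int) -> int:
    """ESCC grades 0-1 -> low-risk (0); grades 2-3 -> high-risk (1)."""
    return 0 if grade <= 1 else 1
```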
The hardware environment for neural network model training and testing consisted of one server equipped with 32 Intel Xeon CPUs (E5-2695 v4 @ 2.10 GHz), 128 GB of RAM, and an NVIDIA TITAN Xp high-performance GPU (12 GB VRAM, 3840 CUDA cores), configured with NVIDIA’s CUDA parallel computing library.
The software environment for neural network model training and testing used Ubuntu 16.04 as the operating system, Python 2.7 as the programming language, vi as the editor, and PyTorch (1.1.0) as the neural network framework along with scikit-learn.
Proposed framework
This manuscript proposes a cross-attention-based pyramid multi-scale pooling segmentation network and spinal cord compression grading assessment system for few-shot MRI of spinal metastatic cancer (Figure 1). The system takes a progressive grade assessment approach and consists of three main components. The first is the PCAN for multi-tissue segmentation, built on a U-shaped encoder-decoder architecture: dual pooling down-samples the features, pyramid pooling then performs multi-level deep fusion and filtering of the compressed features, and skip connections between the encoding and decoding layers use a cross-attention mechanism to cross-fuse the features from both sides. This design addresses the complex textures, unclear boundaries, and high segmentation difficulty of spinal MRI cross-sectional images, improving the accuracy of target region segmentation. The second component builds a multi-tissue feature heatmap for the subsequent grading assessment. The third is the DAN for compression grading, which adopts a coding compression structure (34), adds position and channel attention modules to fuse the features, and uses TL and fine-tuning to improve the assessment of spinal cord compression caused by bone tumors.
Multi-tissue segmentation in MRI data
Network architecture
Unet (35) was first proposed at the Medical Image Computing and Computer-Assisted Intervention (MICCAI) conference in 2015 and achieved exciting results in biomedical image segmentation; U-shaped segmentation networks have since been widely used in medical image segmentation (36,37). Based on the encoding-decoding U-shaped framework (38), we propose the PCAN, a pyramid multi-scale pooling and cross-attention network, to address the morphological variability and blurred boundaries of metastatic tumor MRI (Figure 2). First, the encoding compression path is improved by replacing the original max-pooling downsampling with dual-pooling downsampling. Then, at the bottom of the encoder, a pyramid pooling module (PPM) filters and fuses multi-level, multi-scale features of the compressed representations, enhancing the network’s receptive field across multiple dimensions. Finally, in the skip connections, a cross-attention module enhances the mutual attention between encoded and decoded features, improving the overall target recognition capability of the network.
Dual pooling module
The dual-pooling module effectively integrates the feature filtering and compression capabilities of average pooling and max pooling (39-41). The pooling process serves as both feature selection and information filtering. By concatenating average pooling with max pooling and combining max pooling with upsampling, the module enhances the contextual features between targets. It extracts the overall feature information of the target (such as overall grayscale and shape), removes local noise interference within the target, and highlights the target’s boundary and internal texture features.
Given the input feature $x$, the process begins with average pooling followed by a 1×1 convolution:

$$A = \mathrm{Conv}\big(\mathrm{Avgpool}(x)\big)$$

Next, max pooling is applied to the result $A$, followed by a 1×1 convolution, $M = \mathrm{Conv}\big(\mathrm{Maxpool}(A)\big)$. Then, the max-pooling features are upsampled to match the output dimensions. Finally, the average-pooling features and the upsampled max-pooling features are concatenated along the channel dimension to obtain the dual-pooling feature:

$$D = \mathrm{Concat}\big(A, \mathrm{up}(M)\big)$$

where $x$ is the input feature, $A$ is the output of average pooling and a 1×1 convolution, $M$ is the output of max pooling and a 1×1 convolution, $D$ is the final output feature obtained by concatenating the dual-pooling features, $\mathrm{Conv}$ represents a 1×1 convolution operation, $\mathrm{up}$ refers to a 2× upsampling operation, and $\mathrm{Avgpool}$ and $\mathrm{Maxpool}$ represent average pooling and max pooling operations, respectively.
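A minimal PyTorch sketch of the dual-pooling block implied by these formulas, assuming 2×2 pooling windows and channel widths left unchanged by the 1×1 convolutions (neither is fixed by the text):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualPooling(nn.Module):
    """Dual-pooling downsampling: the average branch captures overall grayscale
    and shape; the cascaded max branch highlights boundaries and texture."""
    def __init__(self, channels):
        super().__init__()
        self.avg_conv = nn.Conv2d(channels, channels, kernel_size=1)
        self.max_conv = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        a = self.avg_conv(F.avg_pool2d(x, 2))    # A = Conv(Avgpool(x))
        m = self.max_conv(F.max_pool2d(a, 2))    # M = Conv(Maxpool(A))
        m_up = F.interpolate(m, size=a.shape[2:], mode='bilinear',
                             align_corners=False)  # up(M): 2x upsampling back to A
        return torch.cat([a, m_up], dim=1)       # D = Concat(A, up(M))
```

Note that the block halves the spatial resolution and doubles the channel count, consistent with its role as a downsampling stage.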
PPM
PPM was first proposed by Zhao et al. (42) and has since been widely used in various segmentation tasks. For example, Zhang et al. (43) integrated PPM to improve multi-organ feature fusion and achieved a Dice score of 91.58%. PPM improves segmentation performance by aggregating contextual features through global average pooling at four different receptive field scales. The coarsest scale is 1×1 global adaptive pooling, followed by 2×2, 3×3, and 6×6 pooling. By partitioning features across multiple spatial scales, PPM promotes deep fusion of contextual relationships and semantic features. Afterwards, features at different levels are upsampled to match the input size through bilinear interpolation and then concatenated to form a unified representation, effectively integrating multi-scale contextual information.
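For reference, a compact PyTorch sketch of the PPM as described (42); the per-branch channel width is an assumption:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """PSPNet-style pyramid pooling: adaptive average pooling at 1x1, 2x2,
    3x3, and 6x6 scales, 1x1 convolution, bilinear upsampling, concatenation."""
    def __init__(self, in_ch, branch_ch=64):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(s),
                          nn.Conv2d(in_ch, branch_ch, kernel_size=1, bias=False))
            for s in (1, 2, 3, 6))

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [x] + [F.interpolate(stage(x), size=(h, w), mode='bilinear',
                                     align_corners=False) for stage in self.stages]
        return torch.cat(feats, dim=1)  # multi-scale context fused with input
```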
Cross-attention module
In encoder-decoder architecture networks, attention modules are commonly employed in skip connections to enhance the feature information of the encoder by applying attention weights to the compressed feature information from the decoder, thereby emphasizing the target features. Unlike traditional attention modules, this study proposes a cross-attention module that facilitates bidirectional feature cross-attention between the encoder and decoder, followed by feature concatenation. Through bilinear cross-attention (Figure 2), the contextual features of the encoder and decoder are cross-enhanced and deeply fused, accelerating the convergence speed of network training and improving the accuracy of target segmentation.
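The exact bilinear formulation is not spelled out here, so the following is only a simplified gated sketch of bidirectional encoder-decoder cross-attention, not the published module: each stream is re-weighted by a sigmoid gate computed from the other before concatenation.

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Bidirectional skip-connection attention (simplified sketch): encoder
    and decoder features mutually gate each other, then are concatenated."""
    def __init__(self, enc_ch, dec_ch):
        super().__init__()
        self.enc_gate = nn.Sequential(nn.Conv2d(dec_ch, enc_ch, 1), nn.Sigmoid())
        self.dec_gate = nn.Sequential(nn.Conv2d(enc_ch, dec_ch, 1), nn.Sigmoid())

    def forward(self, enc, dec):
        # dec is assumed already upsampled to enc's spatial size
        enc_att = enc * self.enc_gate(dec)   # decoder context re-weights encoder
        dec_att = dec * self.dec_gate(enc)   # encoder context re-weights decoder
        return torch.cat([enc_att, dec_att], dim=1)
```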
Spinal cord compression grading network
We propose a DAN based on the residual network (Figure 3), which includes position and channel attention modules to enhance compression recognition. These modules integrate spatial position and cross-channel correlation features. Finally, global average pooling fuses the features fed into the fully connected layer to achieve compression grade recognition.
The position attention module fuses cross-spatial information from different local feature maps. Two convolution layers after the input feature $P$ perform horizontal and vertical filtering to enhance spatial location recognition; the results are reshaped and transposed to generate a new feature map $G$ and feature matrix $Q$. The spatial weight feature matrix $S$ is obtained by matrix multiplication and a softmax operation over $G$ and $Q$:

$$S_{xy} = \frac{\exp(G_x \cdot Q_y)}{\sum_{y=1}^{N} \exp(G_x \cdot Q_y)}$$

where $S_{xy}$ represents the influence of the $y$ position on the $x$ position in the spatial coordinate system, and $N$ is the number of spatial positions. The characteristic matrix between any two points is calculated and output by matrix multiplication.
Similarly, the channel attention module extracts horizontal and vertical features with convolution kernels, and integrates them into features between high-dimensional channels by matrix multiplication:

$$X_{xy} = \frac{\exp(P_x \cdot P_y)}{\sum_{y=1}^{C} \exp(P_x \cdot P_y)}$$

where $X_{xy}$ denotes the attention of the $x$ channel relative to the $y$ channel, $P_x$ and $P_y$ are the channel feature vectors, and $C$ is the number of channels.
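A PyTorch sketch of the two modules following the formulas above, in the style of dual-attention networks; the channel reduction in the position branch and the learnable residual weight are assumptions:

```python
import torch
import torch.nn as nn

class PositionAttention(nn.Module):
    """Spatial attention per the S_xy formula: pairwise position affinities
    are computed by matrix multiplication followed by softmax."""
    def __init__(self, ch):
        super().__init__()
        self.g = nn.Conv2d(ch, ch // 8, 1)  # horizontal/vertical filtering branches
        self.q = nn.Conv2d(ch, ch // 8, 1)
        self.v = nn.Conv2d(ch, ch, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable residual weight

    def forward(self, p):
        b, c, h, w = p.shape
        g = self.g(p).flatten(2).transpose(1, 2)    # B x N x C' (feature map G)
        q = self.q(p).flatten(2)                    # B x C' x N (feature matrix Q)
        s = torch.softmax(torch.bmm(g, q), dim=-1)  # B x N x N: S_xy
        v = self.v(p).flatten(2)                    # B x C x N
        out = torch.bmm(v, s.transpose(1, 2)).reshape(b, c, h, w)
        return self.gamma * out + p

class ChannelAttention(nn.Module):
    """Channel attention per the X_xy formula: a C x C affinity matrix
    captures the correlation between any two channels."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, p):
        b, c, h, w = p.shape
        f = p.flatten(2)                                              # B x C x N
        att = torch.softmax(torch.bmm(f, f.transpose(1, 2)), dim=-1)  # B x C x C
        out = torch.bmm(att, f).reshape(b, c, h, w)
        return self.gamma * out + p
```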
Results
Validation with the multi-tissue segmentation
To demonstrate the tissue segmentation performance of the proposed method, we first applied consistent preprocessing to the input images. The image resolution was adjusted to 512×512, followed by contrast-limited adaptive histogram equalization (CLAHE) (44) to enhance contrast between pixels. Additionally, to address the small sample size, data augmentation techniques such as vertical, horizontal, and combined flips were applied to expand the training dataset.
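A sketch of this preprocessing with OpenCV; the CLAHE clip limit and tile grid size are assumptions, since the text only names the method (44):

```python
import cv2

def preprocess(slice_u8):
    """Resize a uint8 grayscale slice to 512x512 and apply CLAHE."""
    img = cv2.resize(slice_u8, (512, 512), interpolation=cv2.INTER_LINEAR)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(img)

def flip_augment(img, mask):
    """Vertical (0), horizontal (1), and combined (-1) flips of image + mask."""
    pairs = [(img, mask)]
    for code in (0, 1, -1):
        pairs.append((cv2.flip(img, code), cv2.flip(mask, code)))
    return pairs
```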
The performance of the proposed method was validated on the private clinical dataset. Because the dataset is limited (only 88 cases and 577 MRI slices), we used four-fold cross-validation to strengthen the validation, with 22 cases per fold. First, four-fold cross-validation was used to select the network with the optimal evaluation parameters. Second, TL was used to train on all data to improve the overall segmentation and recognition ability of the network. Figure 4 shows the mean pixel accuracy (MPA) and mean intersection over union (MIOU) curves over training epochs for the four-fold cross-validation.
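Case-level four-fold splitting (22 cases per fold) can be expressed with scikit-learn, which is part of the stated software environment; the shuffling seed and the case-level grouping detail are assumptions:

```python
import numpy as np
from sklearn.model_selection import KFold

cases = np.arange(88)                    # split at case level, not slice level
for train_idx, val_idx in KFold(n_splits=4, shuffle=True,
                                random_state=0).split(cases):
    assert len(val_idx) == 22            # 22 held-out cases per fold
    # ... train on slices from cases[train_idx], validate on cases[val_idx]
```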
For training the segmentation network, preprocessed grayscale images are used as single-channel input, and the output has four classes (background, vertebrae, bone tumor, and spinal cord). The training batch size is 10. The Adam algorithm (45), with a weight decay of 0.0002 and a learning rate of 0.0002, is used as the optimizer to adapt the global learning rate and improve training speed, and cross-entropy loss (46) is used as the loss function. We used the validation set (25% of the training data) to monitor the loss and stopped training if there was no improvement for 10 consecutive epochs.
The formula for the single-sample cross-entropy is:

$$L = -\sum_{i=1}^{n} y_i \log(p_i)$$

For the batch sample cross-entropy, the formula is:

$$L_{batch} = -\frac{1}{bs}\sum_{j=1}^{bs}\sum_{i=1}^{n} y_{ji} \log(p_{ji})$$

where $y$ is the ground truth, $p$ is the prediction probability, $bs$ represents the batch size, and $n$ is the number of categories.
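Putting the stated hyperparameters together, a minimal training-loop sketch; PCAN, the data loaders, and validate() are placeholders, not the authors' code:

```python
import torch
import torch.nn as nn

model = PCAN(in_channels=1, num_classes=4)   # hypothetical constructor
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, weight_decay=2e-4)
criterion = nn.CrossEntropyLoss()            # expects class-index masks (0-3)

best_loss, patience = float('inf'), 0
for epoch in range(200):                     # upper bound; early stopping decides
    model.train()
    for images, masks in train_loader:       # batch size 10, single-channel slices
        optimizer.zero_grad()
        loss = criterion(model(images), masks)
        loss.backward()
        optimizer.step()
    val_loss = validate(model, val_loader)   # placeholder validation pass
    if val_loss < best_loss:
        best_loss, patience = val_loss, 0
    else:
        patience += 1
        if patience >= 10:                   # no improvement for 10 epochs
            break
```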
Model evaluation is conducted using the MPA and MIOU:

$$\mathrm{MPA} = \frac{1}{n}\sum_{i=1}^{n} \frac{|A_i \cap B_i|}{|B_i|}, \qquad \mathrm{MIOU} = \frac{1}{n}\sum_{i=1}^{n} \frac{|A_i \cap B_i|}{|A_i \cup B_i|}$$

where $n$ is the number of classes, $A_i$ is the network’s segmented region for class $i$, and $B_i$ is the gold standard annotation region for class $i$. The MIOU evaluates the network’s accuracy in semantic segmentation by averaging the intersection over union between the segmented region and the ground truth across classes.
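Both metrics can be computed per class and averaged, as in this NumPy sketch:

```python
import numpy as np

def mpa_miou(pred, gt, n_classes=4):
    """Per-class pixel accuracy and IoU, averaged over classes (MPA, MIOU)."""
    pa, iou = [], []
    for c in range(n_classes):
        a, b = pred == c, gt == c            # A_c: prediction, B_c: gold standard
        inter = np.logical_and(a, b).sum()
        union = np.logical_or(a, b).sum()
        if b.sum() > 0:
            pa.append(inter / b.sum())
        if union > 0:
            iou.append(inter / union)
    return float(np.mean(pa)), float(np.mean(iou))
```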
Under the same configuration environment and dataset, comparative training was conducted on three advanced baseline networks [Unet (38), ResUnet (47), and PSPnet (42)] and on three networks we improved upon these foundations (ResAgUnet, PSPResUnet, and PCAN). Figure 5 shows the convergence of training loss for the different networks across epochs. Unet exhibited the slowest loss convergence, requiring approximately 50 epochs to approach the loss values of the other networks. PSPnet converged significantly faster than Unet, demonstrating that the pyramid scene parsing (PSP) module enhances the network’s feature learning in small-sample tasks. ResUnet also converged considerably faster than Unet, indicating that the residual module facilitates better feature learning, and ResAgUnet improved further over ResUnet, indicating that the attention module enhances the network’s feature learning capability even more. The training loss convergence of PCAN was close to that of ResAgUnet, inheriting the feature learning advantages of both the PSP module and the attention module.
To observe the multi-target feature recognition, we added a feature visualization output module before the final softmax classification layer of the network. This module transforms the features of each classification target into grayscale [0–255] and then applies a color map to generate a heatmap. Figure 6 displays the feature heatmap distributions for the different segmentation targets: the first column shows the segmentation result, the second the spine, the third the tumor, and the fourth the spinal cord. In these heatmaps, darker red areas represent stronger or more prominent features, while yellow regions indicate features of moderate intensity.
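The grayscale-to-heatmap conversion described here corresponds to the following OpenCV sketch; the JET color map is an assumption consistent with the red/yellow description:

```python
import cv2
import numpy as np

def feature_heatmap(feat):
    """Rescale a feature map to [0, 255] grayscale and color-map it;
    high activations appear red, moderate ones yellow."""
    f = feat.astype(np.float32) - feat.min()
    f = (255.0 * f / (f.max() + 1e-8)).astype(np.uint8)
    return cv2.applyColorMap(f, cv2.COLORMAP_JET)
```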
The segmentation of the spine, spinal cord, bone tumors, and background in target images directly affects the extraction of target features and classification performance. To evaluate the segmentation capability of the PCAN, we compared its segmentation results with other traditional methods, as detailed in Table 1.
Table 1
| Methods | MPA | MIOU | Se | Sp |
|---|---|---|---|---|
| Unet | 0.789 | 0.677 | 0.796 | 0.919 |
| ResUnet | 0.791 | 0.683 | 0.799 | 0.916 |
| ResAgUnet | 0.805 | 0.691 | 0.808 | 0.923 |
| PSPnet | 0.801 | 0.689 | 0.802 | 0.918 |
| PSPResUnet | 0.812 | 0.703 | 0.812 | 0.920 |
| PCAN | 0.819 | 0.711 | 0.823 | 0.922 |
PCAN gets the best MPA of 0.819 and the best MIOU of 0.711. MIOU, mean intersection over union; MPA, mean pixel accuracy; PCAN, pyramid cross-attention network; Se, sensitivity; Sp, specificity.
Among these, the Unet shows the lowest performance in terms of MPA and MIOU, with values of 0.789 and 0.677, respectively. ResUnet, which integrates residual modules to enhance feature filtering and transmission capabilities, achieves slight improvements in both MPA and MIOU compared to Unet. ResAgUnet, built on top of ResUnet, incorporates an attention module to enhance target segmentation capabilities, leading to a noticeable improvement in target recognition over ResUnet. PSPnet, which utilizes a PPM, shows slightly lower MPA and MIOU compared to ResAgUnet. PSPResUnet, which combines both residual and PPMs for feature recognition, demonstrated an improvement in target segmentation performance over the previous two methods.
The PCAN includes PPM, serial dual pooling, and cross-attention modules, which achieve improvements in the segmentation of large objects with complex internal textures. Its target recognition and segmentation performance exceeds other networks, with an MPA of 0.819, MIOU of 0.711, average sensitivity of 0.823, and average specificity of 0.922. Figure 7 shows the comparison of multi-target segmentation results of different networks, including the ground truth.
Validation with the spinal cord compression grading
We used the segmentation network feature maps as input data to the classification network. The input features have three channels (spine, spinal cord, and bone tumor features), and the output has one channel. The batch size is 10 and the dropout rate is 0.25. The RMSProp algorithm (48) is used as the optimizer, with a weight decay of 0.0002 and a learning rate of 0.0002; it optimizes training parameters using an exponentially weighted moving average of squared gradients (second-order momentum). The loss function is BCEWithLogitsLoss (49), which measures the closeness between the actual evaluation level and the expected level. Additionally, four-fold cross-validation is applied. We used the validation dataset to monitor the loss and stopped training if there was no improvement for 10 consecutive epochs.
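A sketch of the stated classification training configuration; the DAN constructor and data loader are placeholders:

```python
import torch
import torch.nn as nn

model = DAN(in_channels=3, dropout=0.25)     # hypothetical constructor
optimizer = torch.optim.RMSprop(model.parameters(), lr=2e-4, weight_decay=2e-4)
criterion = nn.BCEWithLogitsLoss()           # single logit vs. binary grade label

for features, grades in train_loader:        # 3-channel feature maps, batch size 10
    optimizer.zero_grad()
    loss = criterion(model(features).squeeze(1), grades.float())
    loss.backward()
    optimizer.step()
```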
To address the few-shot dataset, we utilized the segmentation network to extract key features, pre-trained the classification network on the large-scale ImageNet-1000 dataset, and then used TL on our dataset to obtain the optimal network parameters. The key features of the spinal cord, spine, and tumor were extracted to form corresponding weight feature matrices, which were then transformed and combined into three-channel color images. These images served as input data for the compression grading classification network. As shown in Figure 8, feature maps of different grades are used as input to the classification network to enhance classification accuracy. Figure 8A,8B represent grade 0, indicating low-risk compression with an intact spinal canal and an uncompressed spinal cord; Figure 8C,8D represent grade 1, indicating high-risk compression with bone tumor invading the spinal canal region.
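The TL recipe can be illustrated with a torchvision backbone (ResNet101 stands in here, since DAN's definition is not public in code form); the per-group learning rates are assumptions:

```python
import torch
import torch.nn as nn
import torchvision

# Stand-in backbone with ImageNet-1000 pretrained weights
net = torchvision.models.resnet101(pretrained=True)
net.fc = nn.Linear(net.fc.in_features, 1)    # new single-logit grading head

# Fine-tuning: a smaller learning rate for pretrained layers than for the
# freshly initialized head (the split ratio is an assumption)
optimizer = torch.optim.RMSprop([
    {'params': [p for n, p in net.named_parameters() if not n.startswith('fc')],
     'lr': 2e-5},
    {'params': net.fc.parameters(), 'lr': 2e-4},
], weight_decay=2e-4)
```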
The detection performance was evaluated against other traditional methods on the given dataset using four metrics: accuracy (ACC), precision, recall, and F1-score:

$$\mathrm{ACC} = \frac{TP+TN}{TP+TN+FP+FN}, \quad \mathrm{Precision} = \frac{TP}{TP+FP}, \quad \mathrm{Recall} = \frac{TP}{TP+FN}, \quad \mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where $TP$, $TN$, $FP$, and $FN$ denote true positives, true negatives, false positives, and false negatives, respectively.
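All four metrics, as well as the AUC reported below, are available in scikit-learn, which is part of the stated software environment:

```python
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

acc = accuracy_score(y_true, y_pred)     # y_true, y_pred: binary grade labels
prec = precision_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)     # y_score: predicted probabilities
```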
Figure 9A illustrates the training convergence of DAN, DenseNet (50), ResNet101, and InceptionV3 (51) networks under non-TL, comparing the number of training epochs and loss. Figure 9B shows the loss curves of DAN with TL, and DenseNet, ResNet101, and InceptionV3 with fine-tuned TL. Comparing the two figures, it is evident that with TL, the training loss of the network significantly decreases from the 2nd epoch and stabilizes with minor fluctuations starting from the 5th epoch. TL greatly enhances the efficiency of network training. Among the networks, DAN exhibits faster loss convergence under non-TL, while in TL, DAN-TL shows similar loss convergence speed to the other networks.
Figure 9C shows the validation set accuracy of the different networks at different training epochs, and Figure 9D shows the same using the fine-tuning TL method. The results show that through TL, the target network achieves better classification performance with fewer training epochs, along with a certain improvement in classification accuracy. Figure 9E,9F present the receiver operating characteristic (ROC) curves and corresponding AUC values of the different networks without and with TL. The ROC curve and AUC provide a more robust evaluation of model performance than single-threshold metrics. Among non-TL models, DAN achieves the highest AUC of 0.953, followed by ResNet101. In the TL setting, DAN-TL also attains the best performance with an AUC of 0.963.
The compression severity of spinal metastasis from lung cancer is primarily manifested by tumor invasion into the spinal canal, leading to compression of the spinal cord. In severe cases, paralysis may occur. This manuscript combines feature extraction enhancement and TL methods. First, the features of the spine, tumor, and spinal cord targets are segmented and extracted. Then, the DAN is employed, and TL is used to train the network for spinal cord compression level classification and recognition. Table 2 presents the classification performance results of different networks and training methods. Each network has two classification results: one for non-TL and one for TL. In the case of non-TL, InceptionV3 shows the lowest classification performance, possibly due to its design for large training datasets, leading to insufficient training on smaller sample datasets. DAN performs the best in terms of classification accuracy, followed by ResNet.
Table 2
| Methods | ACC | Precision | F1-score | Recall |
|---|---|---|---|---|
| DenseNet | 0.91 | 0.81 | 0.91 | 0.85 |
| DenseNet-TL | 0.91 | 0.84 | 0.91 | 0.87 |
| ResNet101 | 0.91 | 0.81 | 0.91 | 0.85 |
| ResNet101-TL | 0.93 | 0.85 | 0.91 | 0.88 |
| InceptionV3 | 0.85 | 0.76 | 0.79 | 0.76 |
| InceptionV3-TL | 0.93 | 0.87 | 0.91 | 0.84 |
| VGG16 | 0.89 | 0.81 | 0.89 | 0.84 |
| VGG16-TL | 0.92 | 0.84 | 0.90 | 0.87 |
| DAN | 0.92 | 0.83 | 0.91 | 0.86 |
| DAN-TL | 0.95 | 0.89 | 0.94 | 0.89 |
Comparison of the conventional methods and the proposed approach. DAN achieves an ACC of 0.92, and the accuracy of almost all methods is improved through TL; DAN-TL achieves the best ACC of 0.95. ACC, accuracy; DAN, dual attention network; TL, transfer learning.
With TL, InceptionV3-TL demonstrates a substantial improvement in classification accuracy over its non-TL counterpart, highlighting that TL from large datasets to small sample datasets significantly enhances classification performance. DAN also shows a modest improvement in classification accuracy with TL compared to non-TL. When comparing the results of TL and non-TL across different networks, the TL-based classification accuracy consistently surpasses the non-TL results. Overall, the proposed DAN demonstrates the highest accuracy in evaluating the spinal cord compression severity of spinal metastasis from lung cancer.
Discussion
In this study, an improved model for medical image feature region segmentation and classification based on feature enhancement and TL was established. An optimized pyramid pooling and cross-attention front-end feature segmentation network was proposed. By designing attention modules, a feature heatmap sample dataset was constructed to realize a novel progressive classification network, establishing a spinal cord compression severity assessment-assisted diagnostic model. This model was applied to an MRI dataset of spinal metastases from lung cancer, and comparative experiments with other state-of-the-art deep neural network methods were conducted to evaluate its recognition and classification performance. In Wang et al.’s dataset (29), most local regions in the MRI sequences did not contain any metastatic lesions, the number of cases was only 26, which was insufficient, and precise detection of lesions was not achieved. Our dataset addresses these issues through precise annotations by professional physicians.
The metastatic cancer spinal cord compression grading assessment model includes an advanced lightweight PCAN and an improved attention-based DAN. In the PCAN, serial dual pooling was introduced to address the overfitting caused by complex internal tissue textures during feature recognition, cross-attention was used for encoding-decoding feature transmission and fusion to enhance target recognition, and in the intermediate layer a PPM was used to filter and combine compressed features at different levels, improving overall target recognition capability. Applied to a small-sample MRI dataset, the model improved both the accuracy and precision of target segmentation compared to traditional segmentation networks. Through PCAN segmentation, key features such as tumors, spine, and spinal cord were extracted and fused to generate key target feature maps as inputs for compression severity classification. In the DAN, position and channel attention modules and a progressive classification mode were employed to enhance the algorithm’s operational efficiency and robustness. Compared to other traditional classification networks such as DenseNet and ResNet101, accuracy increased by 1% and precision improved by 2%. To further improve classification performance on a small-sample dataset, we performed TL from the ImageNet-1000 pre-trained network model. After only 20 epochs of training, classification accuracy increased by 3% and precision by 6%, reaching a classification accuracy of 95% and a precision of 89%.
Potential limitations of the metastatic cancer spinal cord compression grading assessment model include the relatively small MRI dataset used for training and validation, which may not cover all possible cases, such as across different MRI scanners, tumor types, and patient demographics. The dataset annotated by only three clinicians may contain subjective errors. Although we used data augmentation technology to expand the training samples, continuous validation and dataset refinement are necessary in subsequent clinical applications to improve the system’s coverage, reliability, and stability. At the same time, this is a progressive compression grading assessment method, which includes two networks. It is more complex than the traditional classification network and requires more computing power.
The lung cancer spinal metastasis spinal cord compression grading classification assessment model provides an intelligent solution for clinical spinal cord compression grading. It aids physicians in comprehensively evaluating the compression severity of bone metastasis in lung cancer and formulating subsequent surgical and pharmacological treatment strategies for patients. To better promote the clinical use of the system, we will continue to address new issues and challenges encountered during its application. Additionally, further research on auxiliary parameters for lung cancer bone metastasis and three-dimensional segmentation and reconstruction of lung cancer bone metastasis will be conducted to provide comprehensive visual assistance and parameter evaluation for clinical diagnosis.
In summary, our manuscript proposes an innovative solution for intelligent spinal cord compression grading assessment of bone tumors based on MRI feature extraction and TL. Currently, the method is based on two-dimensional images. In the future, we plan to establish a small-sample voxel dataset based on three-dimensional image series, study curriculum learning for uncertainty and generative adversarial networks to enhance segmentation capabilities, and utilize multi-task joint learning to improve model learning efficiency. The development of three-dimensional image region segmentation, classification, and parameter analysis will provide new insights and parameter evaluation for clinical diagnosis.
Conclusions
The grading of spinal cord compression in patients with spinal metastases from lung cancer serves as a critical indicator for surgical intervention. This progressive and observable grading system has been validated, achieving an accuracy of 0.95 and a precision of 0.89 in compression assessment. Its application can assist orthopedic surgeons in formulating appropriate treatment strategies for patients.
Acknowledgments
None.
Footnote
Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-922/dss
Funding: This research was supported by the grant from Guangzhou Science and Technology Program (No. 2025A04J4527).
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-922/coif). Gengyuan Wang reports that this research was supported by the grant from Guangzhou Science and Technology Program (No. 2025A04J4527). G.L. is affiliated with Guangzhou Rural Commercial Bank Co., Ltd. The other authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments and was approved by the Ethics Committee of Guangdong Provincial People’s Hospital, Guangdong Academy of Medical Sciences [No. GDREC2019574J(R1)]. Before the experiment, all participants had provided informed consent and completed a questionnaire on subject information.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Chen W, Zheng R, Baade PD, Zhang S, Zeng H, Bray F, Jemal A, Yu XQ, He J. Cancer statistics in China, 2015. CA Cancer J Clin 2016;66:115-32. [Crossref] [PubMed]
- Mundy GR. Metastasis to bone: causes, consequences and therapeutic opportunities. Nat Rev Cancer 2002;2:584-93. [Crossref] [PubMed]
- Macedo F, Ladeira K, Pinho F, Saraiva N, Bonito N, Pinto L, Goncalves F. Bone Metastases: An Overview. Oncol Rev 2017;11:321. [Crossref] [PubMed]
- Ryan C, Stoltzfus KC, Horn S, Chen H, Louie AV, Lehrer EJ, Trifiletti DM, Fox EJ, Abraham JA, Zaorsky NG. Epidemiology of bone metastases. Bone 2022;158:115783. [Crossref] [PubMed]
- Kuchuk M, Kuchuk I, Sabri E, Hutton B, Clemons M, Wheatley-Price P. The incidence and clinical impact of bone metastases in non-small cell lung cancer. Lung Cancer 2015;89:197-202. [Crossref] [PubMed]
- Katakami N, Kunikane H, Takeda K, Takayama K, Sawa T, Saito H, Harada M, Yokota S, Ando K, Saito Y, Yokota I, Ohashi Y, Eguchi K. Prospective study on the incidence of bone metastasis (BM) and skeletal-related events (SREs) in patients (pts) with stage IIIB and IV lung cancer-CSP-HOR 13. J Thorac Oncol 2014;9:231-8. [Crossref] [PubMed]
- Klimo P Jr, Schmidt MH. Surgical management of spinal metastases. Oncologist 2004;9:188-96. [Crossref] [PubMed]
- Coleman RE. Metastatic bone disease: clinical features, pathophysiology and treatment strategies. Cancer Treat Rev 2001;27:165-76. [Crossref] [PubMed]
- Tsuya A, Kurata T, Tamura K, Fukuoka M. Skeletal metastases in non-small cell lung cancer: a retrospective study. Lung Cancer 2007;57:229-32. [Crossref] [PubMed]
- Rosen LS, Gordon D, Tchekmedyian NS, Yanagihara R, Hirsh V, Krzakowski M, Pawlicki M, De Souza P, Zheng M, Urbanowitz G, Reitsma D, Seaman J. Long-term efficacy and safety of zoledronic acid in the treatment of skeletal metastases in patients with nonsmall cell lung carcinoma and other solid tumors: a randomized, Phase III, double-blind, placebo-controlled trial. Cancer 2004;100:2613-21. [Crossref] [PubMed]
- Bilsky MH, Laufer I, Fourney DR, Groff M, Schmidt MH, Varga PP, Vrionis FD, Yamada Y, Gerszten PC, Kuklo TR. Reliability analysis of the epidural spinal cord compression scale. J Neurosurg Spine 2010;13:324-8. [Crossref] [PubMed]
- Tubiana-Hulin M. Incidence, prevalence and distribution of bone metastases. Bone 1991;12:S9-10. [Crossref] [PubMed]
- Gorelik N, Chong J, Lin DJ. Pattern Recognition in Musculoskeletal Imaging Using Artificial Intelligence. Semin Musculoskelet Radiol 2020;24:38-49. [Crossref] [PubMed]
- Hernandez RK, Wade SW, Reich A, Pirolli M, Liede A, Lyman GH. Incidence of bone metastases in patients with solid tumors: analysis of oncology electronic medical records in the United States. BMC Cancer 2018;18:44. [Crossref] [PubMed]
- Azimi P, Mohammadi HR, Benzel EC, Shahzadi S, Azhari S, Montazeri A. Artificial neural networks in neurosurgery. J Neurol Neurosurg Psychiatry 2015;86:251-6. [Crossref] [PubMed]
- Azimi P, Benzel EC, Shahzadi S, Azhari S, Mohammadi HR. Use of artificial neural networks to predict surgical satisfaction in patients with lumbar spinal canal stenosis: clinical article. J Neurosurg Spine 2014;20:300-5. [Crossref] [PubMed]
- Azimi P, Benzel EC, Shahzadi S, Azhari S, Mohammadi HR. The prediction of successful surgery outcome in lumbar disc herniation based on artificial neural networks. J Neurosurg Sci 2016;60:173-7.
- Nolting J. Developing a neural network model for health care. AMIA Annu Symp Proc 2006;2006:1049.
- Chmelik J, Jakubicek R, Walek P, Jan J, Ourednicek P, Lambert L, Amadori E, Gavelli G. Deep convolutional neural network-based segmentation and classification of difficult to define metastatic spinal lesions in 3D CT data. Med Image Anal 2018;49:76-88. [Crossref] [PubMed]
- Bevilacqua V, Brunetti A, Trotta GF, Carnimeo L, Marino F, Alberotanza V, Scardapane A. A deep learning approach for hepatocellular carcinoma grading. International Journal of Computer Vision and Image Processing (IJCVIP) 2017;7:1-18.
- Menegotto AB, Becker CDL, Cazella SC. Computer-aided diagnosis of hepatocellular carcinoma fusing imaging and structured health data. Health Inf Sci Syst 2021;9:20. [Crossref] [PubMed]
- Liu K, Kang G. Multiview convolutional neural networks for lung nodule classification. Int J Imaging Syst Technol 2017;27:12-22.
- Ertosun MG, Rubin DL. Automated Grading of Gliomas using Deep Learning in Digital Pathology Images: A modular approach with ensemble of convolutional neural networks. AMIA Annu Symp Proc 2015;2015:1899-908.
- Khawaldeh S, Pervaiz U, Rafiq A, Alkhawaldeh RS. Noninvasive grading of glioma tumor using magnetic resonance imaging with convolutional neural networks. Applied Sciences 2017;8:27.
- Ishikawa Y, Washiya K, Aoki K, Nagahashi H. Brain tumor classification of microscopy images using deep residual learning. SPIE BioPhotonics Australasia 2016;10013:222-31.
- Peter R, Malinsky M, Ourednicek P, Jan J. 3D CT spine data segmentation and analysis of vertebrae bone lesions. In: 2013 35th annual international conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE; 2013:2376-9.
- Jan J, Chmelik J, Jakubicek R, Ourednicek P, Amadori E, Gavelli G. Spine lesion analysis in 3D CT data–Reporting on research progress. AIP Conf Proc 2018;1956:020001.
- Merali Z, Wang JZ, Badhiwala JH, Witiw CD, Wilson JR, Fehlings MG. A deep learning model for detection of cervical spinal cord compression in MRI scans. Sci Rep 2021;11:10473. [Crossref] [PubMed]
- Wang J, Fang Z, Lang N, Yuan H, Su MY, Baldi P. A multi-resolution approach for spinal metastasis detection using deep Siamese neural networks. Comput Biol Med 2017;84:137-46. [Crossref] [PubMed]
- Lang N, Zhang Y, Zhang E, Zhang J, Chow D, Chang P, Yu HJ, Yuan H, Su MY. Differentiation of spinal metastases originated from lung and other cancers using radiomics and deep learning based on DCE-MRI. Magn Reson Imaging 2019;64:4-12. [Crossref] [PubMed]
- Fisher CG, DiPaola CP, Ryken TC, Bilsky MH, Shaffrey CI, Berven SH, et al. A novel classification system for spinal instability in neoplastic disease: an evidence-based approach and expert consensus from the Spine Oncology Study Group. Spine (Phila Pa 1976) 2010;35:E1221-9. [Crossref] [PubMed]
- Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull 1971;76:378-82.
- Yushkevich PA, Gao Y, Gerig G. ITK-SNAP: An interactive tool for semi-automatic segmentation of multi-modality biomedical images. In: 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE; 2016:3342-45.
- Tammina S. Transfer learning using vgg-16 with deep convolutional neural network for classifying images. International Journal of Scientific and Research Publications (IJSRP) 2019;9:143-50.
- Ronneberger O, Fischer P, Brox T. U-net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer International Publishing; 2015:234-41.
- Zunair H, Ben Hamza A. Sharp U-Net: Depthwise convolutional network for biomedical image segmentation. Comput Biol Med 2021;136:104699. [Crossref] [PubMed]
- Verma R, Kumar N, Patil A, Kurian NC, Rane S, Graham S, et al. MoNuSAC2020: A Multi-Organ Nuclei Segmentation and Classification Challenge. IEEE Trans Med Imaging 2021;40:3413-23. [Crossref] [PubMed]
- Falk T, Mai D, Bensch R, Çiçek Ö, Abdulkadir A, Marrakchi Y, et al. U-Net: deep learning for cell counting, detection, and morphometry. Nat Methods 2019;16:67-70. [Crossref] [PubMed]
- Vaswani A, Shazeer N, Parmar N, Uszkoreit J, Jones L, Gomez AN, Kaiser L, Polosukhin I. Attention is all you need. In: Advances in Neural Information Processing Systems (NIPS 2017). 2017.
- Mnih V, Heess N, Graves A, Kavukcuoglu K. Recurrent models of visual attention. Advances in neural information processing systems (NIPS 2014). 2014.
- Petit O, Thome N, Rambour C, Themyr L, Collins T, Soler L. U-net transformer: Self and cross attention for medical image segmentation. In: International Workshop on Machine Learning in Medical Imaging. Cham: Springer International Publishing; 2021:267-76.
- Zhao H, Shi J, Qi X, Wang X, Jia J. Pyramid scene parsing network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017:2881-90.
- Zhang L, Yin X, Liu X, Liu Z. Medical image segmentation by combining feature enhancement Swin Transformer and UperNet. Sci Rep 2025;15:14565. [Crossref] [PubMed]
- Reza AM. Realization of the contrast limited adaptive histogram equalization (CLAHE) for real-time image enhancement. J VLSI Signal Process Syst Signal Image Video Technol 2004;38:35-44.
- Jais IKM, Ismail AR. Adam optimization algorithm for wide and deep neural network. Knowledge Engineering and Data Science 2019;2:10.
- Zhang Z, Sabuncu MR. Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels. Adv Neural Inf Process Syst 2018;32:8792-802.
- He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:770-8.
- Zou F, Shen L, Jie Z, Zhang W, Liu W. A sufficient condition for convergences of adam and rmsprop. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019:11127-35.
- Qin Y, Miao W, Qian C. A high-precision fall detection model based on dynamic convolution in complex scenes. Electronics 2024;13:1141.
- Zhu Y, Newsam S. Densenet for dense flow. In: 2017 IEEE International Conference on Image Processing (ICIP). IEEE; 2017:790-4.
- Xia X, Xu C, Nan B. Inception-v3 for flower classification. In: 2017 2nd International Conference on Image, Vision, and Computing (ICIVC). IEEE; 2017:783-7.

