Original Article

Contrastive report and multiparametric dual-region magnetic resonance imaging learning for the preoperative prediction of axillary lymph node metastasis in breast cancer

Hanyu Shen1,2#, Wenju Cui1,2#, Yunsong Peng3, Yilin Leng1,2, Xiang Zhang4,5, Gang Yuan2, Jian Zheng2

1Division of Life Sciences and Medicine, School of Biomedical Engineering (Suzhou), University of Science and Technology of China, Hefei, China; 2Department of Medical Imaging, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, Suzhou, China; 3Guizhou Province International Science and Technology Cooperation Base for Precision Imaging Diagnosis and Treatment, Key Laboratory of Advanced Medical Imaging and Intelligent Computing of Guizhou Province, Department of Radiology, Guizhou Provincial People’s Hospital, Guiyang, Guizhou, China; 4Department of Radiology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China; 5Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Medical Research Center, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China

Contributions: (I) Conception and design: H Shen, W Cui; (II) Administrative support: J Zheng; (III) Provision of study materials or patients: X Zhang; (IV) Collection and assembly of data: H Shen; (V) Data analysis and interpretation: H Shen; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work as co-first authors.

Correspondence to: Xiang Zhang, MD. Department of Radiology, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, 107 Yanjiang West Road, Yuexiu District, Guangzhou 510120, China; Guangdong Provincial Key Laboratory of Malignant Tumor Epigenetics and Gene Regulation, Medical Research Center, Sun Yat-sen Memorial Hospital, Sun Yat-sen University, Guangzhou, China. Email: zhangx345@mail.sysu.edu.cn; Gang Yuan, PhD. Department of Medical Imaging, Suzhou Institute of Biomedical Engineering and Technology, Chinese Academy of Sciences, 88 Keling Road, Science & Technology City, Suzhou New District, Suzhou 215163, China. Email: yuangang@sibet.ac.cn.

Background: The accurate prediction of axillary lymph node metastasis (ALNM) is crucial for determining the surgical extent and making treatment decisions for breast cancer patients. However, methods that incorporate clinical diagnostic reports into models to evaluate ALNM remain underdeveloped and insufficiently validated. This study aimed to investigate the potential of a multimodal deep learning (DL) model that integrates the magnetic resonance imaging (MRI) characteristics of breast tumors and axillary lymph nodes (ALNs) with clinical diagnostic report-derived textual features for accurate ALNM differentiation.

Methods: This study retrospectively enrolled 804 breast cancer patients, of whom 396 were diagnosed with ALNM and 408 without ALNM. First, a vision-language model [Breast Axillary Lymph Nodes-Contrastive Language-Image Pre-training (BALN-CLIP)] with a Vision Transformer (ViT) as the visual encoder and BioClinical Bidirectional Encoder Representations from Transformers (BioClinicalBERT) as the text encoder was constructed. The model was trained using a contrastive learning strategy on the tumor and ALN data. Second, the fine-tuned visual and text encoders were extracted from BALN-CLIP to develop a multimodal model [Multimodal Multiparametric Axillary Lymph Network (MM-AXLNet)] that incorporated orthogonal fusion and cross-attention modules (CAMs), and integrated dynamic contrast-enhanced (DCE) sequences, T2-weighted imaging (T2WI) sequences, and clinical diagnostic reports for ALNM prediction across dual regions. Finally, the performance of our model was compared with that of a single-region model, a model without report features, and radiologist assessments. The accuracy, sensitivity, specificity, receiver operating characteristic (ROC) curves, and area under the curve (AUC) values of the ROC curves of the models were evaluated.

Results: In the five-fold cross-validation, the dual-region MM-AXLNet that incorporated report features performed optimally, achieving a mean accuracy of 0.819±0.020, AUC of 0.885±0.015, precision of 0.821±0.016, sensitivity of 0.820±0.007, specificity of 0.846±0.019, and F1-score of 0.819±0.018 in the test set, demonstrating statistically significant superiority compared to the other models. The diagnostic performance of this optimal model was superior to that of the radiologists.

Conclusions: MM-AXLNet, which integrates multiparametric dual-region MRI with report information through dual-region contrastive learning, enables the accurate preoperative prediction of ALNM in breast cancer, facilitating clinical treatment decision-making.

Keywords: Multimodal deep learning (multimodal DL); multiparametric dual-region magnetic resonance imaging (multiparametric dual-region MRI); clinical diagnostic report; axillary lymph node metastasis (ALNM); breast cancer


Submitted Jul 05, 2025. Accepted for publication Oct 21, 2025. Published online Dec 31, 2025.

doi: 10.21037/qims-2025-1485


Introduction

Breast cancer is the most common cancer in women worldwide (1). Its high incidence poses a serious threat to women’s health and quality of life (1). Axillary lymph node metastasis (ALNM) is an important indicator for evaluating breast cancer progression, and the status of axillary lymph nodes (ALNs) typically determines the treatment regimen, surgical extent, and whether radiotherapy and chemotherapy are required after mastectomy (2,3). Therefore, the accurate assessment of lymph node status is crucial for clinical pathological staging, treatment decision-making, and prognostic evaluation in breast cancer patients (4).

Traditionally, ALN dissection (ALND) and sentinel lymph node biopsy (SLNB) have served as the standard surgical methods for assessing ALN involvement (5,6). However, both ALND and SLNB are invasive procedures that may lead to complications such as pain, lymphedema, numbness, and limited arm function (7). Magnetic resonance imaging (MRI) provides certain advantages in evaluating ALNM. However, because imaging assessment depends on the subjective experience of radiologists, it may yield false positives and false negatives, leading to unnecessary surgery. Therefore, a non-invasive, accurate method urgently needs to be established to predict ALNM preoperatively and improve clinical decision-making.

In recent years, radiomics and machine learning have made significant advances in breast cancer diagnosis, treatment, and prognosis prediction. However, radiomic feature extraction remains highly dependent on manual annotation, introducing subjectivity and dataset sensitivity that limit its clinical translation. With the development of artificial intelligence (AI), deep learning (DL) algorithms have been widely applied in image diagnosis and prediction owing to their speed, accuracy, and reproducibility (8). They have also achieved excellent performance in ALNM prediction, offering greater feasibility for clinical translation. Chen et al. (9) used DenseNet to extract features from dynamic contrast-enhanced (DCE) and diffusion-weighted imaging (DWI) sequences, incorporating clinical features such as patient age to build an image-clinical model for ALNM prediction that achieved an area under the curve (AUC) of 0.71 on the test set.

Current DL trends favor the use of multiparametric MRI for ALNM prediction. For instance, Wang et al. (10) established a ResNet50 model based on DWI, T1-weighted imaging (T1WI), and T2-weighted imaging (T2WI) images of primary breast tumors, and further enhanced its performance through stacked ensemble learning. Ren et al. (11) predicted breast cancer ALNM through convolutional neural networks (CNNs) based on the multiparametric MRI of ALNs, which achieved an optimal AUC of 0.882. These studies demonstrate that combining DL models with multiparametric MRI can improve prediction results.

Although imaging-based DL models have shown promise, they predominantly rely on image features alone, overlooking valuable clinical diagnostic report information. This single-modality approach limits the comprehensive characterization of lesions. Clinical diagnostic reports encapsulate expert interpretations and highlight subtle diagnostic clues beyond imaging data. Therefore, integrating textual information through a multimodal framework may better exploit their complementary strengths and enhance ALNM prediction performance.

Currently, multimodal research is advancing rapidly, with the emergence of Transformer architectures driving deeper cross-modal interactions. Models such as Contrastive Language-Image Pre-training (CLIP) have significantly enhanced multimodal representation capabilities. In medical image analysis, multimodal data fusion has become a key strategy for improving diagnostic and predictive performance. Many studies have focused on training vision-text models in the medical domain, such as PubMed Central CLIP (PMC-CLIP) (12), A Foundation Language-Image Model of the Retina (FLAIR) (13), and Retinal Contrastive Language-Image Pretraining (RET-CLIP) (14).

Clinical diagnostic reports, carefully crafted by radiologists, encapsulate their reasoning processes and provide detailed, standardized descriptions of bilateral breast and ALN findings. As such, they can serve as a source of high-precision weak annotations for MRI analysis. Previous studies (15-18) successfully implemented multimodal image-text pre-training by applying contrastive learning to X-ray images and their associated radiological reports. These studies employed visual and textual encoders. Unlike X-ray images, multiparametric MRI provides superior contrast, facilitating the identification of more complex lesions. Integrating clinical diagnostic reports into neural network models can enhance model interpretability.

Orthogonal fusion facilitates mutual feature enhancement and redundancy elimination, which can address the insufficient fusion between different sequences. Meanwhile, the CAM allows attention interactions between different embeddings, capable of coordinating and enhancing the weights of effective features. However, the performance of DL networks based on orthogonal fusion and cross-attention fusion for evaluating ALNM in breast MRI remains unclear.

An effective diagnostic assistive model could provide accurate and stable diagnostic results as a reference for radiologists. To achieve this goal, we proposed an innovative multimodal DL framework that integrates DCE- and T2WI-based tumor and ALN region analysis. Leveraging dual-region contrastive learning, the model incorporates clinical report information to enhance attention to critical lesion features. This study aimed to explore the potential of the proposed model in predicting ALNM in breast cancer. We present this article in accordance with the TRIPOD+AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1485/rc).


Methods

Study design and patients

The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of the Sun Yat-sen Memorial Hospital, Sun Yat-sen University (No. SYSKY-2024-896-01), and the requirement of individual consent for this retrospective analysis was waived.

In this retrospective study, we collected the breast MRI images and clinical diagnostic reports of 1,650 breast cancer patients who underwent MRI examinations at the Sun Yat-sen Memorial Hospital, Sun Yat-sen University between May 2015 and January 2023.

Patients were included in the study if they met the following inclusion criteria: (I) were female and aged over 18 years, and had histologically confirmed stage I–III invasive breast cancer; (II) had undergone MRI examination before surgery or treatments such as chemotherapy; and (III) had undergone surgery and SLNB/ALND treatment and had pathologically confirmed ALN status. Patients were excluded from the study if they met any of the following exclusion criteria: (I) had not undergone any surgery or had incomplete postoperative pathological results; (II) had poor-quality MRI images; and/or (III) had distant metastasis or other malignant tumors.

All the included patients underwent DCE-MRI and T2WI. This study used the 14th phase of the DCE sequence, T2WI sequence images, and corresponding clinical diagnostic reports. The clinical diagnostic reports were written by radiologists after completing MRI image interpretation, with the report content focusing solely on the imaging features observed in the MRI images; the radiologists had no access to other clinical information that might affect the objectivity of the text content. Further issues encountered during image inclusion are detailed in the supplementary material (Appendix 1). Additionally, all patients underwent ALND on the ipsilateral side of the breast cancer, with pathological results confirming lymph node metastasis status.

Figure 1 provides the patient selection process flowchart. A total of 804 patients were ultimately included in the study, of whom 396 had ALNM and 408 did not. The patients had a mean age of 49.78±10.45 years. The dataset was randomly divided into five groups, each accounting for approximately 20%, and five-fold cross-validation with a 4:1 training-test split was performed to evaluate model performance.
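The 4:1 five-fold partitioning described above can be sketched as follows. The use of scikit-learn's `StratifiedKFold` (preserving the ALNM/non-ALNM class balance in each fold) and the random seed are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical labels: 396 ALNM-positive (1) and 408 ALNM-negative (0) patients.
labels = np.array([1] * 396 + [0] * 408)
patient_ids = np.arange(len(labels))

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
folds = []
for train_idx, test_idx in skf.split(patient_ids, labels):
    # Each fold uses ~80% of patients for training and ~20% for testing.
    folds.append((train_idx, test_idx))

print(len(folds))        # 5 folds
print(len(folds[0][1]))  # roughly 804 / 5 patients in each test fold
```

Every patient appears in exactly one test fold, so the five test folds together cover the whole cohort once.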

Figure 1 Patient enrollment flow chart. MRI, magnetic resonance imaging; ROI, region of interest.

MRI technique

The breast MRI scans were performed using 1.5 T (Magnetom Avanto; Siemens Healthcare Solutions, Erlangen, Germany) and 3.0 T (Achieva; Philips Medical Systems, Amsterdam, Netherlands) MRI scanners, equipped with 8-channel phased-array breast coils. Axial imaging sequences, including T2WI, non-contrast-enhanced T1WI, DWI, and DCE imaging, were acquired. Apparent diffusion coefficient (ADC) maps derived from DWI images were obtained automatically using the built-in software on the MRI scanner.

For the DCE imaging, 0.1 mmol/kg of gadopentetate dimeglumine was injected intravenously using an MRI-compatible power syringe at a flow rate of 3.5 mL/s, after which two initial dynamic measurements were obtained. Subsequently, a 20-mL normal saline solution flush was administered. The DCE data acquisition comprised 40 measurements, with a temporal resolution of 8 seconds. After DCE imaging, delayed contrast-enhanced T1WI (CE-T1WI) was performed in the axial and coronal orientations. Comprehensive information on the MRI sequences is provided in the supplementary material (Table S1).

MRI image data segmentation and preprocessing

Each patient’s dataset included multi-sequence breast MRI image data. For each patient, T2WI and DCE-MRI images were selected as the input image data. Two radiologists (Ya Qiu, with 6 years of experience in breast MRI tumor segmentation, and Jiayi Huang, with 4 years of experience in breast MRI tumor segmentation) used ITK-SNAP 3.6.0 software to manually segment the primary tumor, as well as all visible lymph nodes and surrounding fat, vessels, and other areas in the ipsilateral axillary region on the multiparametric MRI sequences. The radiologists conducted the segmentation process blinded to the histopathological information about the breast tumors and ALNs, and delineated both the primary tumor and ALN regions layer by layer. The image data preprocessing for the DCE and T2WI sequences included the window width and level adjustment, normalization, and selection of model input images [two-dimensional (2D) slices]. Using the primary tumor and ALN masks, the tumor and ALN regions of interest (ROIs) were centrally cropped from the 2D slices, with the background pixels set to zero. An example of the ROI cropping for both a primary tumor and ALN based on a DCE sequence is shown in Figure 2. Detailed preprocessing procedures are described in the supplementary material (Appendix 2).
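The mask-based ROI cropping and normalization described above can be sketched as follows. The crop size, the min-max normalization scheme, and the centroid-centered window are assumptions for illustration; the paper's exact window width/level adjustment is not reproduced here.

```python
import numpy as np

def crop_roi(slice_2d, mask_2d, size=64):
    """Center-crop a square ROI around a binary mask; background set to zero.

    A simplified sketch of the preprocessing described in the paper; the
    crop size and normalization details here are illustrative assumptions.
    """
    # Zero out everything outside the segmented region.
    masked = slice_2d * (mask_2d > 0)
    # Min-max normalize the masked slice to [0, 1].
    lo, hi = masked.min(), masked.max()
    if hi > lo:
        masked = (masked - lo) / (hi - lo)
    # Find the mask centroid and crop a size x size window around it.
    ys, xs = np.nonzero(mask_2d)
    cy, cx = int(ys.mean()), int(xs.mean())
    half = size // 2
    padded = np.pad(masked, half, mode="constant")
    return padded[cy:cy + size, cx:cx + size]

# Toy example: a synthetic 128x128 slice with a rectangular "lesion" mask.
img = np.random.rand(128, 128).astype(np.float32)
mask = np.zeros((128, 128), dtype=np.uint8)
mask[50:70, 60:80] = 1
roi = crop_roi(img, mask, size=64)
print(roi.shape)  # (64, 64)
```

Padding before cropping keeps the window in bounds even when the lesion lies near the image edge.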

Figure 2 User-cropped ROIs. The upper panel shows the cropped tumor region; the lower panel shows the cropped axillary lymph node region. ROI, region of interest.

Clinical diagnostic report data preprocessing

To translate the Chinese clinical diagnostic reports into English, the pre-trained MarianMT model provided by the Helsinki-NLP project, which supports automatic Chinese-English translation, was used. To extract the tumor and ALN descriptions from the clinical diagnostic reports, the pre-trained large generative model Qwen-Chat was employed, using designed prompts as inputs to generate structured reports. An example of the translation and structuring process of the clinical diagnostic reports is provided in Figure 3. Detailed preprocessing procedures for clinical diagnostic reports are described in the supplementary material (Appendix 3).
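The structuring step can be illustrated with a minimal sketch. The prompt template and the keyword-based fallback parser below are purely illustrative assumptions; the actual pipeline used MarianMT for translation and Qwen-Chat with designed prompts, whose exact wording is not given in the paper.

```python
# Illustrative prompt for a generative model (the paper's real prompts are
# not published); the field names TUMOR/ALN are assumptions.
PROMPT_TEMPLATE = (
    "Extract two fields from the breast MRI report below.\n"
    "Return exactly:\nTUMOR: <description>\nALN: <description>\n\n"
    "Report:\n{report}"
)

def build_prompt(report_en: str) -> str:
    return PROMPT_TEMPLATE.format(report=report_en)

def naive_structure(report_en: str) -> dict:
    """Rule-based fallback: route each sentence to tumor or ALN by keyword."""
    tumor, aln = [], []
    for sent in report_en.split(". "):
        if "axillary" in sent.lower() or "lymph node" in sent.lower():
            aln.append(sent.strip())
        else:
            tumor.append(sent.strip())
    return {"tumor": ". ".join(tumor), "aln": ". ".join(aln)}

report = ("An irregular mass in the left breast shows rim enhancement. "
          "Several enlarged axillary lymph nodes with loss of fatty hilum.")
parsed = naive_structure(report)
print(parsed["aln"])
```

In practice the generative model handles the many phrasing variants that a keyword rule cannot, which is why a large language model was used for this step.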

Figure 3 Translation and structuring of clinical diagnostic reports. T1WI, T1-weighted imaging; T2WI, T2-weighted imaging.

Fine-tuning model development

This study adopted a transfer learning approach, utilizing open-source pre-trained visual encoders and text encoders, and transferring them to train and fine-tune the study’s dataset. As shown in Figure 4, we fine-tuned a vision-language model called Breast Axillary Lymph Nodes-CLIP (BALN-CLIP) under the CLIP paradigm using constructed dual-region image-text quadruplets that integrated text features from clinical diagnostic reports. Each patient’s quadruplet contained tumor and ALN images, along with structured clinical diagnostic descriptions of the corresponding regions. DCE-MRI served as the imaging input during fine-tuning. BALN-CLIP uses two visual encoders and two text encoders to extract image features and text features from images and clinical diagnostic reports, respectively. The visual encoders employed the base version Vision Transformer-base (ViT-base) pre-trained on the ImageNet dataset, while the text encoders used BioClinical Bidirectional Encoder Representations from Transformers (BioClinicalBERT) pre-trained on the Medical Information Mart for Intensive Care III (MIMIC-III) clinical database and PubMed database.
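The dual-encoder layout can be sketched as follows. The 768-dimensional output is standard for both ViT-base and BioClinicalBERT; the 256-dimensional shared embedding space and the two-layer projection head are assumptions for illustration, since the paper defers architectural details to its supplementary material.

```python
import torch
import torch.nn as nn

class ProjectionHead(nn.Module):
    """Maps an encoder's 768-d output into a shared, L2-normalized
    embedding space for contrastive alignment (sizes are assumptions)."""
    def __init__(self, in_dim=768, out_dim=256):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(in_dim, out_dim), nn.GELU(),
                                  nn.Linear(out_dim, out_dim))

    def forward(self, x):
        return nn.functional.normalize(self.proj(x), dim=-1)

# One visual and one text projection per region (tumor / ALN): four heads,
# matching BALN-CLIP's two visual and two text encoders.
heads = {f"{m}_{r}": ProjectionHead()
         for m in ("vis", "txt") for r in ("tumor", "aln")}

feat = torch.randn(8, 768)      # a batch of encoder [CLS] features
z = heads["vis_tumor"](feat)
print(z.shape)                  # torch.Size([8, 256])
```

Normalizing the projected embeddings makes the dot products used by the contrastive objective equivalent to cosine similarities.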

Figure 4 Framework diagram of the BALN-CLIP model. DCE, dynamic contrast-enhanced; MLP, multilayer perceptron.

During the fine-tuning process, a contrastive learning strategy was applied to enhance the representation of the MRI images by leveraging structured text features extracted from clinical diagnostic reports, thereby achieving the integration of text and image features. The fine-tuned model enhances the encoder’s comprehension of image content through relevant clinical diagnostic reports, thereby improving the accuracy of image analysis, and utilizes the refined visual and textual encoders for the subsequent classification tasks. Detailed explanations of the model architecture and mathematical principles are provided in the supplementary material (Appendix 4).
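The contrastive objective used in CLIP-style fine-tuning can be sketched as a symmetric InfoNCE loss over a batch of image-report pairs; the temperature value below is a common default, not a value reported in the paper.

```python
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric image-text contrastive loss (CLIP-style InfoNCE).

    Matched image/report pairs sit on the diagonal of the batch
    similarity matrix; each row/column is a softmax classification
    over the batch. Temperature is an assumed default.
    """
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature     # (B, B) similarities
    targets = torch.arange(img_emb.size(0))
    loss_i2t = F.cross_entropy(logits, targets)      # image -> matching text
    loss_t2i = F.cross_entropy(logits.t(), targets)  # text -> matching image
    return (loss_i2t + loss_t2i) / 2

torch.manual_seed(0)
img = torch.randn(4, 256)
txt = img + 0.01 * torch.randn(4, 256)  # nearly matched pairs -> small loss
print(clip_loss(img, txt).item())
```

In BALN-CLIP this loss is applied per region, pulling each tumor (or ALN) image embedding toward the embedding of its own structured report description and pushing it away from the others in the batch.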

In the experiment, the dataset was divided into five folds, with only four folds selected each time as the training set to fine-tune the model weights, saving the encoder weights corresponding to the minimum training loss. The training parameters for the BALN-CLIP model are detailed in the supplementary material (Appendix 5). All the code was implemented in PyTorch, and mixed precision training was performed on an NVIDIA GeForce RTX 4090 Graphics Processing Unit (GPU).

ALNM prediction model construction

After fine-tuning the BALN-CLIP model, two visual encoders ϕ_v^t(·) and ϕ_v^a(·) and two text encoders ϕ_t^t(·) and ϕ_t^a(·), targeting the tumor and ALN regions respectively, were extracted, and the corresponding parameters were frozen to develop the Multimodal Multiparametric Axillary Lymph Network (MM-AXLNet) model for predicting ALNM. The entire model framework, a multimodal prediction model, is illustrated in Figure 5A. For the DCE and T2WI sequences, the shared visual encoder ϕ_v^t(·) was used to extract the image features V_DCE^t and V_T2WI^t from the tumor region, and the shared visual encoder ϕ_v^a(·) was used to extract the image features V_DCE^a and V_T2WI^a from the ALN region. For the structured clinical diagnostic reports of the tumor and ALN regions, ϕ_t^t(·) and ϕ_t^a(·) were used to extract the text features T^t and T^a, respectively.

Figure 5 Architecture of the MM-AXLNet and fusion module mechanism. (A) Framework diagram of the MM-AXLNet model; (B) framework diagram of the OFM; (C) representation of T2WI features projection onto the DCE features, and a schematic diagram of the component orthogonal to the DCE features. C, concat; DCE, dynamic contrast-enhanced; FC, fully connected layer; MLP, multilayer perceptron; OFM, orthogonal fusion module; P, pooling; T2WI, T2-weighted imaging.

Through the orthogonal fusion module (OFM), V_DCE^t and V_T2WI^t, and V_DCE^a and V_T2WI^a, were fused separately to obtain the image-side fusion features V_DCE+T2WI^t and V_DCE+T2WI^a, respectively. V_DCE+T2WI^t and V_DCE+T2WI^a were then concatenated with their respective regional text features T^t and T^a to obtain the fusion features E^t and E^a for each region. Using the CAM, the features E^t and E^a were merged to obtain E^c. E^t, E^a, and E^c were then concatenated before being input into a multilayer perceptron (MLP) to generate the prediction scores. The workflow of the OFM is shown in Figure 5B, and its mathematical projection analysis is presented in Figure 5C. A detailed analysis of the model construction process and the mathematical principles of the architecture are provided in the supplementary material (Appendix 6).
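The projection scheme of Figure 5C can be sketched at a high level: the T2WI feature's component along the DCE feature is removed, and only the orthogonal residual (the non-redundant information) is kept alongside the DCE feature. The actual OFM in MM-AXLNet includes pooling and fully connected layers omitted here.

```python
import torch

def orthogonal_fuse(v_dce, v_t2, eps=1e-8):
    """Orthogonal fusion sketch: subtract from the T2WI feature its
    projection onto the DCE feature, then concatenate DCE with the
    orthogonal residual. A simplified stand-in for the paper's OFM."""
    # Projection of v_t2 onto v_dce: (<v_t2, v_dce> / |v_dce|^2) * v_dce
    scale = (v_t2 * v_dce).sum(dim=-1, keepdim=True) / (
        (v_dce * v_dce).sum(dim=-1, keepdim=True) + eps)
    v_proj = scale * v_dce
    v_orth = v_t2 - v_proj          # component orthogonal to v_dce
    return torch.cat([v_dce, v_orth], dim=-1)

torch.manual_seed(0)
v_dce = torch.randn(2, 128)
v_t2 = torch.randn(2, 128)
fused = orthogonal_fuse(v_dce, v_t2)
print(fused.shape)                  # torch.Size([2, 256])
orth = fused[:, 128:]
print((orth * v_dce).sum(dim=-1))   # numerically ~0: residual is orthogonal
```

Because the shared component is removed, the concatenated feature carries the DCE information once and only the T2WI information that the DCE feature does not already contain.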

Additionally, this study constructed two single-region models based on the tumor region and the ALN region of the images: MM-AXLNet_tumor and MM-AXLNet_ALN. These models omitted the CAM from the original MM-AXLNet framework, directly inputting either E^t or E^a into the MLP for metastasis prediction.

In the experiment, five-fold cross-validation was used for all three models to obtain the metastasis prediction results, with four subsets used for training and one subset used for testing. The encoder weights used for each fold were derived from the corresponding fold of the BALN-CLIP model fine-tuned on the same training set, with the final results representing the average across all five folds. The training parameters for the MM-AXLNet model are detailed in the supplementary material (Appendix 5). Code implementation was also based on PyTorch, with algorithm training conducted on an NVIDIA GeForce RTX 4090.

Comparisons with other models

To investigate the effect of the model architecture on evaluation performance, ablation experiments were conducted to compare the following classification models: Model A, a dual-region classification model based on the DCE sequence; Model B, a dual-region classification model based on the DCE sequence with an added CAM; Model C, a dual-region classification model based on both the DCE and T2WI sequences with an added CAM; and Model D, a dual-region classification model based on the DCE and T2WI sequences incorporating both an OFM and a CAM. All the visual encoders in these models were initialized with the fine-tuned encoders from the BALN-CLIP model, and the final prediction scores were generated using an MLP. Notably, Model D did not incorporate the text features derived from the clinical diagnostic reports, which enabled the contribution of the clinical report-derived information to the model’s performance to be assessed.

Additionally, to further assess the importance of fine-tuning the BALN-CLIP model, the fine-tuning process of the BALN-CLIP model (Figure 4) was omitted, and all the parameters of the pre-trained visual encoder (ViT-base) and text encoder (BioClinicalBERT) were frozen directly. The remaining OFM, CAM, and top-level classifier (MLP) in the MM-AXLNet model were then trained and evaluated (designated as Model E). The BALN-CLIP visual encoder was also replaced with ResNet-18, and the resulting MM-AXLNet model was evaluated (designated as Model F). Finally, to evaluate the independent predictive capability of the clinical diagnostic report text features themselves, the text encoder and weights from the fine-tuned BALN-CLIP model were used to construct a dual-region baseline model based solely on text features, which performed classification prediction through an MLP (designated as Model G).

Given that metastasis prediction in DL is traditionally based on CNNs, and considering the widespread adoption of the ResNet-18 architecture in related studies, we selected ResNet-18, ViT-Base, MedCLIP-ResNet18, and MedCLIP-ViT for comparison. All these models used DCE sequences and corresponding tumor masks as image inputs. For the multimodal models (which were MedCLIP variants), structured clinical diagnostic reports of the tumor region were used as the text input. The visual encoders for all models were initialized with ImageNet pre-trained weights.

Five-fold cross-validation was performed to evaluate model performance.

Performance and comparison with the MM-AXLNet model

The following criteria were used to evaluate ALN metastasis on MRI: (I) a short-axis diameter of 10 mm or greater; (II) a long- to short-axis ratio of less than 2; and (III) fatty hilum loss (19-21).
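The three morphological criteria above can be expressed as a simple rule. Note that the paper does not state how the readers combined the criteria, so the any-criterion-positive rule below is an assumption for illustration.

```python
def mri_suspicious_aln(short_axis_mm, long_axis_mm, fatty_hilum_present):
    """Flag an ALN as suspicious on MRI using the three reader criteria:
    (I) short-axis diameter >= 10 mm, (II) long/short-axis ratio < 2,
    (III) loss of the fatty hilum. Combining them with a logical OR
    is an assumption; the paper does not specify the combination rule.
    """
    c1 = short_axis_mm >= 10
    c2 = (long_axis_mm / short_axis_mm) < 2
    c3 = not fatty_hilum_present
    return c1 or c2 or c3

print(mri_suspicious_aln(12, 20, True))   # True: short axis >= 10 mm
print(mri_suspicious_aln(6, 15, True))    # False: no criterion met
```

A rounder node (lower long/short ratio) is more suspicious, which is why criterion (II) uses a ratio below 2 rather than above it.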

Two radiologists (Yuqing Peng and Xiang Zhang with 5 and 9 years of experience in breast MRI evaluation, respectively) independently assessed the morphological characteristics of the ALNs on MRI without knowledge of the final pathological diagnosis of lymph node metastasis. If any disagreements arose between the two radiologists, a third radiologist (Jun Shen, with 23 years of experience in breast MRI evaluation) made the determination. Subsequently, the predictive performance of the radiologists on each fold was evaluated, the results were averaged across the five folds, and their sensitivity, specificity, and accuracy were compared with those of the MM-AXLNet model to assess the clinical feasibility of our approach.

Statistical analysis

The statistical analysis was performed using Python (version 3.10.14) and R software (version 4.0.3). For the continuous variables, the Student’s t-test or Mann-Whitney U test was used for comparisons. For the categorical variables, the Chi-squared test or Fisher’s exact test was used for comparisons. Model performance was evaluated by calculating the sensitivity, specificity, accuracy, precision, and F1-score, as well as plotting the receiver operating characteristic (ROC) curves and calculating the AUCs of the ROC curves. For the different models mentioned above, the DeLong test was used for the statistical comparison of the AUC values, and a P value <0.05 was considered statistically significant.


Results

Baseline characteristics

The baseline characteristics of the study population are presented in Table 1. All the patients underwent ALND, and 396 cases of lymph node metastasis were confirmed. The mean age of the patients was 49.63±10.15 years in the non-ALNM group and 49.92±10.77 years in the ALNM group. The Student’s t-test revealed no significant difference between the two groups in terms of age (t=0.446, P=0.656). In the MRI-based radiological assessment, in the non-ALNM group, 391 patients (95.8%) were correctly diagnosed as negative, and 17 (4.2%) were misclassified as positive, while in the ALNM group, 143 patients (36.1%) were misclassified as negative and 253 (63.9%) were correctly diagnosed as positive. The Chi-squared analysis revealed a statistically significant difference between the two groups in terms of the MRI-based assessment.

Table 1

Baseline information

Characteristics Total (n=804) ALNM (n=396) Non-ALNM (n=408) P
Age (years), mean ± SD 49.78±10.45 49.92±10.77 49.63±10.15 0.656
MRI-based ALN metastasis, n (%) <0.05
   Low-risk [0] 534 (66.4) 143 (36.1) 391 (95.8)
   High-risk [1] 270 (33.6) 253 (63.9) 17 (4.2)

ALNM, axillary lymph node metastasis; ALN, axillary lymph node; MRI, magnetic resonance imaging; SD, standard deviation.

Diagnostic performance of models

The MM-AXLNet model, which was trained on dual regions, including the tumor and ALN regions, demonstrated superior performance, achieving an AUC of 0.885, accuracy of 0.819, precision of 0.821, sensitivity of 0.820, F1-score of 0.819, and specificity of 0.846 on the test set. The five-fold cross-validation ROC curves are shown in Figure 6. MM-AXLNet outperformed MM-AXLNet_ALN (AUC: 0.885 vs. 0.815, DeLong test P<0.05) and MM-AXLNet_tumor (AUC: 0.885 vs. 0.780, DeLong test P<0.05) on the test set. The performance of different models in each training and test set through the five-fold cross-validation analysis is presented in Table 2 and Figure 7.

Figure 6 Five-fold cross-validation ROC curves of the MM-AXLNet model in the (A) training set, and (B) test set. AUC, area under the curve; ROC, receiver operating characteristic.

Table 2

Diagnostic performance of various models in the training and test sets

Dataset Model AUC Accuracy Precision Sensitivity Specificity F1-score P value
Training MM-AXLNet_tumor 0.894±0.021 0.813±0.020 0.816±0.021 0.813±0.021 0.773±0.029 0.813±0.021
MM-AXLNet_ALN 0.904±0.021 0.834±0.020 0.829±0.030 0.833±0.020 0.838±0.039 0.834±0.020
MM-AXLNet 0.974±0.007 0.911±0.015 0.913±0.015 0.911±0.015 0.904±0.022 0.911±0.015
Testing MM-AXLNet_tumor 0.780±0.042 0.723±0.025 0.727±0.026 0.725±0.025 0.717±0.063 0.723±0.025 0.029
MM-AXLNet_ALN 0.799±0.017 0.741±0.015 0.744±0.014 0.742±0.015 0.769±0.054 0.741±0.014 0.047
MM-AXLNet 0.885±0.015 0.819±0.020 0.821±0.016 0.820±0.017 0.846±0.019 0.819±0.018

Data are presented as mean ± SD. The MM-AXLNet model and other models were compared using the DeLong test. ALN, axillary lymph node; AUC, area under the curve; SD, standard deviation.

Figure 7 ROC curves of the three models in the (A) training set and (B) test set. ALN, axillary lymph node; AUC, area under the curve; ROC, receiver operating characteristic.

Ablation study analysis

Table 3 sets out the contribution of each embedding in our network architecture, showing the performance of Models A–D on the dataset, with corresponding ROC curves displayed in Figure 8A. Notably, the addition of the T2WI sequences and clinical diagnostic reports significantly improved model performance. Incorporating the T2WI sequences increased the AUC by 5.9% and the accuracy by 3.2%, while adding the clinical diagnostic reports further enhanced the AUC by 8.5% and the accuracy by 7.8%. All these improvements were statistically significant. The integration of textual features notably improved ALNM prediction, with the largest gain observed in the AUC (test set AUC: 0.885 vs. 0.800, DeLong test P<0.05). Model G, the text-only model, had an AUC of 0.778 on the test set, confirming that the clinical diagnostic report text itself had a certain independent predictive capability. Compared to the pure-text Model G, the complete multimodal model further improved the AUC by 13.7%, indicating that text features play a complementary and enhancing role in multimodal fusion. Further, the orthogonal fusion and CAMs resulted in additional performance gains.

Table 3

Diagnostic performance of the models in the ablation experiments on the test set

Model	DCE	Attn	T2WI	Dolg	Text	Adapter_VE	AUC	Accuracy	Precision	Sensitivity	Specificity	F1-score	P value
A	✓	–	–	–	–	ViT-base	0.727	0.672	0.679	0.672	0.748	0.669	0.001
B	✓	✓	–	–	–	ViT-base	0.734	0.688	0.697	0.688	0.767	0.685	0.002
C	✓	✓	✓	–	–	ViT-base	0.793	0.720	0.728	0.719	0.770	0.712	0.016
D	✓	✓	✓	✓	–	ViT-base	0.800	0.741	0.753	0.743	0.833	0.741	0.049
E	✓	✓	✓	✓	✓	–	0.787	0.730	0.738	0.730	0.793	0.728	0.022
F	✓	✓	✓	✓	✓	ResNet-18	0.797	0.748	0.750	0.748	0.724	0.747	0.049
G	–	–	–	–	✓	–	0.778	0.722	0.723	0.722	0.716	0.705	0.019
Ours	✓	✓	✓	✓	✓	ViT-base	0.885	0.819	0.821	0.820	0.846	0.819	–

Adapter_VE denotes the visual encoder used during the fine-tuning of the BALN-CLIP model; “–” indicates that the BALN-CLIP fine-tuning step was omitted on this study’s dataset and that the visual and text encoders were adopted directly in their original pre-trained states. For Model E, all parameters were frozen during the subsequent model training, and for Model G, the visual encoder was not used. The MM-AXLNet model and the other models were compared using the DeLong test. Attn, attention module; AUC, area under the curve; DCE, dynamic contrast-enhanced; Dolg, orthogonal fusion module; T2WI, T2-weighted imaging.

Figure 8 ROC curves of the ablation experimental models. (A) ROC curves of the ablation experiments for each embedding in the network architecture; (B) ROC curves of the ablation experiments for the BALN-CLIP model during fine-tuning. AUC, area under the curve; ROC, receiver operating characteristic.

Table 3 also presents the results of another ablation experiment evaluating the effect of the BALN-CLIP model on MM-AXLNet model performance, with ROC curves depicted in Figure 8B. Compared to Models E and F, MM-AXLNet achieved accuracy improvements of 8.9% and 7.1%, and AUC increases of 9.8% and 8.8%, respectively, all of which were statistically significant.
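The AUC comparisons above rely on the DeLong test. As a self-contained illustration of how two models’ AUCs can be compared on the same test cases, the sketch below estimates each AUC with the Mann-Whitney statistic and bootstraps a confidence interval for the paired difference — a simplified stand-in for DeLong’s analytic variance, run here on synthetic scores rather than the study’s data:

```python
import random

def auc(labels, scores):
    """Mann-Whitney estimate of the ROC AUC: the probability that a
    randomly chosen positive case scores higher than a negative one."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_diff(labels, scores_a, scores_b, n_boot=2000, seed=0):
    """Paired bootstrap of AUC(A) - AUC(B) for two models evaluated on the
    same cases; returns a 95% CI for the difference."""
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # a resample must contain both classes
            continue
        diffs.append(auc(ys, [scores_a[i] for i in idx])
                     - auc(ys, [scores_b[i] for i in idx]))
    diffs.sort()
    return diffs[int(0.025 * len(diffs))], diffs[int(0.975 * len(diffs))]
```

If the resulting interval excludes zero, the AUC difference is significant at the 5% level, analogous to a DeLong P value below 0.05.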

Table 4 shows the differences in the parameter counts, floating point operations (FLOPs), and inference latency across different architectures. Lightweight Models A and B had the lowest parameter counts and inference latency (approximately 166 M, 5–6 ms), but their predictive performance was relatively limited, making them suitable only for extremely resource-constrained scenarios. The medium-complexity Models C, D, F, and G achieved a more balanced performance between accuracy and efficiency, better meeting the dual requirements of “high accuracy and low latency” for primary hospitals. The high-performance Model E and our MM-AXLNet model, despite showing optimal AUC performance, had the highest inference latency (approximately 19–20 ms), making them more suitable for central hospitals with sufficient computational resources or research environments.

Table 4

Comparison of parameter counts, computational complexity, and inference latency across different model architectures

| Model | Total params (M) | FLOPs (G) | Avg inference latency (ms) |
|-------|------------------|-----------|----------------------------|
| A | 166.60 | 31.40 | 5.47 |
| B | 166.63 | 31.40 | 5.59 |
| C | 170.00 | 62.80 | 10.68 |
| D | 170.30 | 62.80 | 10.76 |
| E | 378.10 | 143.90 | 19.56 |
| F | 253.70 | 87.90 | 13.98 |
| G | 208.50 | 81.10 | 8.68 |
| Ours | 378.10 | 143.90 | 19.81 |

The unit “G” represents giga and “M” represents million. Avg, average; FLOP, floating point operation; params, parameters.
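Latency figures such as those in Table 4 are typically obtained with a warm-up-then-average timing harness. The sketch below uses a toy workload standing in for a real model forward pass (a GPU measurement would additionally require device synchronization before each timestamp, which is omitted here):

```python
import time

def measure_latency(fn, warmup=10, repeats=100):
    """Average wall-clock latency of a zero-argument callable, in ms.
    Warm-up iterations are discarded so one-time caching and allocation
    effects do not skew the average."""
    for _ in range(warmup):
        fn()
    t0 = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - t0) / repeats * 1e3

# Toy stand-in for a forward pass; a real measurement would wrap
# something like `lambda: model(batch)`.
latency_ms = measure_latency(lambda: sum(i * i for i in range(10_000)))
```

Averaging over many repeats, as done here, smooths out scheduler jitter, which is why single-run timings are rarely reported.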

Comparative experiment analysis

The comparison results for the MM-AXLNet model and the ResNet-18, ViT-base, MedCLIP-ResNet50, and MedCLIP-ViT models are summarized in Table 5, with the corresponding ROC curves shown in Figure 9. Notably, the multimodal models outperformed the baseline models on the test set. The ability of the MM-AXLNet model to predict ALN metastasis surpassed that of all the comparative prediction models, with AUC improvements exceeding 15%.

Table 5

Diagnostic performance of comparative models and radiologists for ALN metastasis

| Model | AUC | Accuracy | Precision | Sensitivity | Specificity | F1-score | P value |
|-------|-----|----------|-----------|-------------|-------------|----------|---------|
| ResNet-18 | 0.666 | 0.654 | 0.655 | 0.654 | 0.667 | 0.653 | <0.001 |
| ViT-base | 0.693 | 0.657 | 0.665 | 0.658 | 0.715 | 0.653 | <0.001 |
| MedCLIP-ResNet | 0.708 | 0.679 | 0.680 | 0.680 | 0.677 | 0.679 | <0.001 |
| MedCLIP-ViT | 0.731 | 0.713 | 0.715 | 0.713 | 0.761 | 0.712 | <0.001 |
| Radiologists | – | 0.800 | 0.938 | 0.639 | 0.958 | 0.759 | 0.036 |
| Ours | 0.885 | 0.819 | 0.821 | 0.820 | 0.846 | 0.819 | – |

The MM-AXLNet model and other models were compared using the DeLong test. ALN, axillary lymph node; AUC, area under the curve.

Figure 9 ROC curves of comparative experimental models and radiologists (showing specificity and sensitivity points) on the test set. AUC, area under the curve; ROC, receiver operating characteristic.

Comparison with radiologists’ performance

As shown in Table 5, the radiologists achieved an accuracy of 0.800, a sensitivity of 0.639, and a specificity of 0.958 on the test set. In Figure 9, the radiologists’ operating point (sensitivity and specificity) is plotted alongside the model ROC curves. On this study’s single-center dataset, the MM-AXLNet model outperformed the radiologists on the test set, with a particularly marked improvement in sensitivity (0.820 vs. 0.639). Thus, the MM-AXLNet model effectively reduced the risk of missed diagnoses while maintaining superior classification performance across varying decision thresholds, and its superiority over the radiologists was statistically significant (P=0.036).
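All of the threshold-dependent metrics in Table 5 derive from the four confusion-matrix counts. The sketch below uses hypothetical counts (tp=23, fp=2, tn=46, fn=13 — chosen only to mimic the high-specificity, low-sensitivity reader pattern, not taken from the study’s data) to show the arithmetic:

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard classification metrics from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)   # recall / true-positive rate
    specificity = tn / (tn + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {"accuracy": round(accuracy, 3),
            "precision": round(precision, 3),
            "sensitivity": round(sensitivity, 3),
            "specificity": round(specificity, 3),
            "f1": round(f1, 3)}

# Hypothetical counts illustrating a conservative reader: few false
# positives (high specificity) at the cost of many missed positives.
metrics = diagnostic_metrics(tp=23, fp=2, tn=46, fn=13)
```

With these counts, sensitivity is 23/36 ≈ 0.639 and specificity is 46/48 ≈ 0.958, showing how a conservative decision threshold trades missed metastases for a low false-positive rate.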


Discussion

ALNM is a critical prognostic factor in breast cancer (22), and the timely, accurate detection of lymph node metastasis is crucial for guiding surgical decisions and adjuvant therapy. In this study, we developed a multimodal model, MM-AXLNet, based on DCE imaging and T2WI, incorporating textual features extracted from clinical diagnostic reports for preoperative ALNM prediction. When applied to tumor and ALN regions, MM-AXLNet achieved promising results, with test set AUCs ranging from 0.861 to 0.903 across the five-fold cross-validation, demonstrating both the model’s superiority and the important contribution of textual and ALN features in predicting ALNM. These findings suggest that the MM-AXLNet model that incorporates textual features has strong potential for the non-invasive, preoperative identification of ALNM in breast cancer patients.

Radiologists primarily determine ALNM based on morphological and structural features on MRI, focusing on suspicious nodes while potentially missing clinically negative nodes harboring micrometastases, which reduces sensitivity. This assessment relies heavily on visual interpretation, requires expertise and experience, and is subject to inter-reader variability and information loss due to subjective judgment. Although ALND and SLNB remain the standard surgical procedures for evaluating ALN status (5,6), they are invasive. Further, the negative rate of SLNB can be as high as 70–80%, meaning that a large proportion of patients undergo unnecessary surgery (23). Thus, an accurate, convenient, and non-invasive method for identifying ALNM preoperatively urgently needs to be developed to better guide clinical decision-making, prognostic assessment, and personalized treatment planning.

Recent advances in multimodal research have led to significant breakthroughs in the integration of imaging and text, particularly through pre-trained models, generative approaches, and multimodal fusion strategies. In parallel, the field of NLP has made substantial progress in the structured analysis and extraction of information from clinical diagnostic reports. In the multimodal domain, models such as CLIP, which align visual and textual representations via contrastive learning, have demonstrated the feasibility of mapping images and text in a shared semantic space. These developments provide a strong foundation for the training strategy proposed in this study, where textual features extracted from clinical reports guide the model to focus on key imaging features. Dai et al. (24) showed that structured radiology reports can effectively direct DL models to relevant MRI features in brain lesions, improving detection performance. Building on this, our model further optimizes the strategy by applying contrastive learning to structured features, thereby improving both disease prediction performance and model interpretability. To our knowledge, this was the first study to incorporate textual features into an ALNM prediction analysis.

Additionally, rapid advances in AI have profoundly affected healthcare, particularly in early cancer detection, treatment response prediction, and prognosis assessment (25,26). Radiomics has been shown to extract subtle imaging features outside human perception, with radiomics-based MRI models developed for the preoperative prediction of ALNM (27,28). However, radiomics features are typically hand-crafted and vary across studies, whereas DL enables the automatic extraction of hierarchical feature representations and has been successfully applied to breast cancer detection and diagnosis (9-11). In our study, the MM-AXLNet model was constructed using DCE and T2WI sequences with consistent spatial resolution, guided by textual features from diagnostic reports. Our results demonstrate that the multiparametric model significantly outperformed models based solely on DCE-MRI, resulting in a statistically significant improvement in the AUC.

Most DL studies (10,29-34) have focused solely on features from the primary tumor, and have not considered the distinct differences between the tumor microenvironment of the primary site and the ALN. Our study showed that deep features extracted from the axillary region can effectively predict ALNM and outperform models relying only on tumor-derived features. Further, the study showed that models combining tumor and ALN features (AUC =0.885) outperformed models using only tumor features (AUC =0.780) or only ALN features (AUC =0.799). These findings further confirm the critical contribution of ALN features in metastasis assessment.

Previous studies (35,36) have shown that orthogonal fusion enhances complementary information between features while reducing redundancy, and that cross-attention mechanisms efficiently capture dynamic interactions across modalities. Building on this, we incorporated an OFM to integrate DCE and T2WI image features, and a CAM to fuse tumor and ALN image-text features. These modules guided the model to focus more effectively on metastasis-relevant information. The ablation studies confirmed that both OFM and CAM significantly improved ALN status classification, further validating their value.

Moreover, consistent with previous research (11,32), our MM-AXLNet model, which integrated tumor and ALN regions, outperformed the radiologists in predicting ALNM. On this study’s single-center dataset, the performance of MM-AXLNet in both the tumor and ALN regions was superior to that of the radiologists, indicating that the model has clinical application potential and could serve as an auxiliary tool for radiologists. However, multicenter external validation and further work on interpretability are essential before clinical implementation.

This study had a number of limitations, which could be addressed in subsequent research. First, as a retrospective analysis, it was subject to inevitable selection bias and had a limited sample size. Larger-scale, prospective studies need to be conducted to evaluate AI models embedded in routine breast cancer diagnostic and treatment workflows. Second, the volumetric ROIs for the tumors and ALNs were manually segmented, which is both time-consuming and labor-intensive; in the future, advanced AI techniques could be used to achieve more automated segmentation. Third, this study employed a 2D network for ALNM prediction, but three-dimensional (3D) feature extraction may capture more comprehensive information; future research will explore whether 3D models display enhanced prediction performance. Fourth, this study was conducted at a single center without external validation, potentially limiting the model’s generalizability; multicenter studies with larger, more diverse cohorts are essential to provide higher-level evidence for clinical application. Additionally, due to the retrospective design, the lymph node labels on imaging could not be matched to the pathological findings at the level of individual nodes, as pathology could only qualitatively determine whether metastasis was present in the axillary region. Future prospective studies could establish precise imaging-pathology correspondence for individual lymph nodes to improve the model’s prediction accuracy for individual lymph node metastasis.


Conclusions

In this study, we developed MM-AXLNet, a multimodal model that integrates MRI features and structured clinical report information, based on tumor and ALN regions using DCE and T2WI sequences. By incorporating orthogonal fusion and cross-attention mechanisms, the model effectively enhanced classification performance. Compared to models without text feature integration, MM-AXLNet demonstrated superior diagnostic accuracy in preoperative ALNM prediction. Following further validation in larger populations and model calibration, our model holds strong potential as a non-invasive decision support tool to assist clinicians in breast cancer management.


Acknowledgments

We sincerely thank all study participants and referring technicians who contributed to this research. Special thanks go to Yuzhu Cao for technical support provided throughout this study.


Footnote

Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1485/rc

Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1485/dss

Funding: This work was supported by the National Natural Science Foundation of China (grant Nos. 62371449 and 82302286), Jiangsu Provincial Key Research and Development Program Social Development Project (grant No. BE2022720), the Science and Technology Program of Guangzhou, China (grant No. 2025A03J4118), and the Guizhou Provincial Health Commission Science Technology Fund Project (grant No. gzwkj2024-474).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1485/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of the Sun Yat-sen Memorial Hospital, Sun Yat-sen University (No. SYSKY-2024-896-01), and individual consent for this retrospective analysis was waived.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 2021;71:209-49. [Crossref] [PubMed]
  2. Ruan D, Sun L. Diagnostic Performance of PET/MRI in Breast Cancer: A Systematic Review and Bayesian Bivariate Meta-analysis. Clin Breast Cancer 2023;23:108-24. [Crossref] [PubMed]
  3. Lowes S, Leaver A, Cox K, Satchithananda K, Cosgrove D, Lim A. Evolving imaging techniques for staging axillary lymph nodes in breast cancer. Clin Radiol 2018;73:396-409. [Crossref] [PubMed]
  4. Tafreshi NK, Kumar V, Morse DL, Gatenby RA. Molecular and functional imaging of breast cancer. Cancer Control 2010;17:143-55. [Crossref] [PubMed]
  5. Lyman GH, Somerfield MR, Bosserman LD, Perkins CL, Weaver DL, Giuliano AE. Sentinel Lymph Node Biopsy for Patients With Early-Stage Breast Cancer: American Society of Clinical Oncology Clinical Practice Guideline Update. J Clin Oncol 2017;35:561-4. [Crossref] [PubMed]
  6. Mamounas EP, Kuehn T, Rutgers EJT, von Minckwitz G. Current approach of the axilla in patients with early-stage breast cancer. Lancet 2017;S0140-6736(17)31451-4.
  7. Kootstra JJ, Hoekstra-Weebers JE, Rietman JS, de Vries J, Baas PC, Geertzen JH, Hoekstra HJ. A longitudinal comparison of arm morbidity in stage I-II breast cancer patients treated with sentinel lymph node biopsy, sentinel lymph node biopsy followed by completion lymph node dissection, or axillary lymph node dissection. Ann Surg Oncol 2010;17:2384-94. [Crossref] [PubMed]
  8. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artificial intelligence in radiology. Nat Rev Cancer 2018;18:500-10. [Crossref] [PubMed]
  9. Chen Y, Wang L, Dong X, Luo R, Ge Y, Liu H, Zhang Y, Wang D. Deep Learning Radiomics of Preoperative Breast MRI for Prediction of Axillary Lymph Node Metastasis in Breast Cancer. J Digit Imaging 2023;36:1323-31. [Crossref] [PubMed]
  10. Wang Z, Sun H, Li J, Chen J, Meng F, Li H, Han L, Zhou S, Yu T. Preoperative Prediction of Axillary Lymph Node Metastasis in Breast Cancer Using CNN Based on Multiparametric MRI. J Magn Reson Imaging 2022;56:700-9. [Crossref] [PubMed]
  11. Ren T, Lin S, Huang P, Duong TQ. Convolutional Neural Network of Multiparametric MRI Accurately Detects Axillary Lymph Node Metastasis in Breast Cancer Patients With Pre Neoadjuvant Chemotherapy. Clin Breast Cancer 2022;22:170-7. [Crossref] [PubMed]
  12. Lin W, Zhao Z, Zhang X, Wu C, Zhang Y, Wang Y, Xie W. PMC-CLIP: Contrastive language-image pre-training using biomedical documents. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer; 2023:525-36.
  13. Silva-Rodríguez J, Chakor H, Kobbi R, Dolz J, Ben Ayed I. A Foundation Language-Image Model of the Retina (FLAIR): encoding expert knowledge in text supervision. Med Image Anal 2025;99:103357. [Crossref] [PubMed]
  14. Du J, Guo J, Zhang W, Yang S, Liu H, Li H, Wang N. Ret-clip: A retinal image foundation model pre-trained with clinical diagnostic reports. In: International conference on medical image computing and computer-assisted intervention. Springer; 2024:709-19.
  15. Zhang X, Wu C, Zhang Y, Xie W, Wang Y. Knowledge-enhanced visual-language pre-training on chest radiology images. Nat Commun 2023;14:4542. [Crossref] [PubMed]
  16. Chauhan G, Liao R, Wells W, Andreas J, Wang X, Berkowitz S, Horng S, Szolovits P, Golland P. Joint Modeling of Chest Radiographs and Radiology Reports for Pulmonary Edema Assessment. Med Image Comput Comput Assist Interv 2020;12262:529-39. [Crossref] [PubMed]
  17. Huang SC, Shen L, Lungren MP, Yeung S. Gloria: A multimodal global-local representation learning framework for label-efficient medical image recognition. In: Proceedings of the IEEE/CVF international conference on computer vision; 2021:3942-51.
  18. Wang Z, Wu Z, Agarwal D, Sun J. MedCLIP: Contrastive Learning from Unpaired Medical Images and Text. Proc Conf Empir Methods Nat Lang Process 2022;2022:3876-87.
  19. Memarsadeghi M, Riedl CC, Kaneider A, Galid A, Rudas M, Matzek W, Helbich TH. Axillary lymph node metastases in patients with breast carcinomas: assessment with nonenhanced versus uspio-enhanced MR imaging. Radiology 2006;241:367-77. [Crossref] [PubMed]
  20. Chang JM, Leung JWT, Moy L, Ha SM, Moon WK. Axillary Nodal Evaluation in Breast Cancer: State of the Art. Radiology 2020;295:500-15. [Crossref] [PubMed]
  21. Ecanow JS, Abe H, Newstead GM, Ecanow DB, Jeske JM. Axillary staging of breast cancer: what the radiologist should know. Radiographics 2013;33:1589-612. [Crossref] [PubMed]
  22. Verheuvel NC, Ooms HW, Tjan-Heijnen VC, Roumen RM, Voogd AC. Predictors for extensive nodal involvement in breast cancer patients with axillary lymph node metastases. Breast 2016;27:175-81. [Crossref] [PubMed]
  23. Boughey JC, Moriarty JP, Degnim AC, Gregg MS, Egginton JS, Long KH. Cost modeling of preoperative axillary ultrasound and fine-needle aspiration to guide surgery for invasive breast cancer. Ann Surg Oncol 2010;17:953-8. [Crossref] [PubMed]
  24. Dai L, Lei J, Ma F, Sun Z, Du H, Zhang H, Jiang J, Wei J, Wang D, Tan G, Song X, Zhu J, Zhao Q, Ai S, Shang A, Li Z, Zhang Y, Li Y. Boosting Deep Learning for Interpretable Brain MRI Lesion Detection through the Integration of Radiology Report Information. Radiol Artif Intell 2024;6:e230520. [Crossref] [PubMed]
  25. Gillies RJ, Kinahan PE, Hricak H. Radiomics: Images Are More than Pictures, They Are Data. Radiology 2016;278:563-77. [Crossref] [PubMed]
  26. Carriero A, Groenhoff L, Vologina E, Basile P, Albera M. Deep Learning in Breast Cancer Imaging: State of the Art and Recent Advancements in Early 2024. Diagnostics (Basel) 2024;14:848. [Crossref] [PubMed]
  27. Zhang J, Zhang Z, Mao N, Zhang H, Gao J, Wang B, Ren J, Liu X, Zhang B, Dou T, Li W, Wang Y, Jia H. Radiomics nomogram for predicting axillary lymph node metastasis in breast cancer based on DCE-MRI: A multicenter study. J Xray Sci Technol 2023;31:247-63. [Crossref] [PubMed]
  28. Romeo V, Kapetas P, Clauser P, Rasul S, Cuocolo R, Caruso M, Helbich TH, Baltzer PAT, Pinker K. Simultaneous 18F-FDG PET/MRI Radiomics and Machine Learning Analysis of the Primary Breast Tumor for the Preoperative Prediction of Axillary Lymph Node Status in Breast Cancer. Cancers (Basel) 2023;15:5088. [Crossref] [PubMed]
  29. Zhang X, Liu M, Ren W, Sun J, Wang K, Xi X, Zhang G. Predicting of axillary lymph node metastasis in invasive breast cancer using multiparametric MRI dataset based on CNN model. Front Oncol 2022;12:1069733. [Crossref] [PubMed]
  30. Chen M, Kong C, Lin G, Chen W, Guo X, Chen Y, Cheng X, Chen M, Shi C, Xu M, Sun J, Lu C, Ji J. Development and validation of convolutional neural network-based model to predict the risk of sentinel or non-sentinel lymph node metastasis in patients with breast cancer: a machine learning study. EClinicalMedicine 2023;63:102176. [Crossref] [PubMed]
  31. Nguyen S, Polat D, Karbasi P, Moser D, Wang L, Hulsey K, Çobanoğlu MC, Dogan B, Montillo A. Preoperative Prediction of Lymph Node Metastasis from Clinical DCE MRI of the Primary Breast Tumor Using a 4D CNN. Med Image Comput Comput Assist Interv 2020;12262:326-34. [Crossref] [PubMed]
  32. Zhou LQ, Wu XL, Huang SY, Wu GG, Ye HR, Wei Q, Bao LY, Deng YB, Li XR, Cui XW, Dietrich CF. Lymph Node Metastasis Prediction from Primary Breast Cancer US Images Using Deep Learning. Radiology 2020;294:19-28. [Crossref] [PubMed]
  33. Zheng X, Yao Z, Huang Y, Yu Y, Wang Y, Liu Y, Mao R, Li F, Xiao Y, Wang Y, Hu Y, Yu J, Zhou J. Deep learning radiomics can predict axillary lymph node status in early-stage breast cancer. Nat Commun 2020;11:1236. [Crossref] [PubMed]
  34. Tang X, Zhang H, Mao R, Zhang Y, Jiang X, Lin M, Xiong L, Chen H, Li L, Wang K, Zhou J. Preoperative Prediction of Axillary Lymph Node Metastasis in Patients With Breast Cancer Through Multimodal Deep Learning Based on Ultrasound and Magnetic Resonance Imaging Images. Acad Radiol 2025;32:1-11. [Crossref] [PubMed]
  35. Yang M, He D, Fan M, Shi B, Xue X, Li F, Ding E, Huang J. Dolg: Single-stage image retrieval with deep orthogonal fusion of local and global features. In: Proceedings of the IEEE/CVF International conference on Computer Vision; 2021:11772-81.
  36. Labbaki S, Minary P. Orthogonal sequential fusion in multimodal learning. IEEE; 2025.
Cite this article as: Shen H, Cui W, Peng Y, Leng Y, Zhang X, Yuan G, Zheng J. Contrastive report and multiparametric dual-region magnetic resonance imaging learning for the preoperative prediction of axillary lymph node metastasis in breast cancer. Quant Imaging Med Surg 2026;16(1):56. doi: 10.21037/qims-2025-1485
