Original Article

Dual-discriminator network-based classification method for breast ultrasound imaging

Xue Zhao, Huanyu Zhao, Zhiying Cheng

Department of Medical Imaging, Chifeng Municipal Hospital, Chifeng, China

Contributions: (I) Conception and design: X Zhao; (II) Administrative support: Z Cheng; (III) Provision of study materials or patients: All authors; (IV) Collection and assembly of data: X Zhao, H Zhao; (V) Data analysis and interpretation: X Zhao, H Zhao; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

Correspondence to: Zhiying Cheng, MMed. Department of Medical Imaging, Chifeng Municipal Hospital, No. 152, Changqing Street, Hongshan District, Chifeng 024000, China. Email: 18810796898@163.com.

Background: Breast cancer is a malignant tumor with significant morbidity and mortality worldwide. Early detection and diagnosis via ultrasound imaging are crucial for effective treatment and improved patient outcomes. Because labeled medical images are scarce, transferring knowledge learned from large natural image datasets to smaller medical datasets has become a common strategy for improving model performance. However, applying these methods to medical image classification poses several challenges, including small dataset size, data imbalance, the need for strong feature extraction capabilities, and the necessity for model interpretability. This study aims to address data imbalance and enhance both the accuracy and interpretability of breast ultrasound image classification for breast cancer diagnosis.

Methods: We introduce a novel Dual-Discriminator Generative Adversarial Network (GAN) for iterative data synthesis on unbalanced datasets, and we incorporate channel and spatial attention mechanisms to recognize intricate details during classification. Extensive experiments on various benchmarks demonstrate that our method achieves competitive performance compared with state-of-the-art methods, with improved performance on both the Breast Ultrasound Images Dataset (BUSI) and our self-constructed dataset.

Results: Our method achieved a top accuracy of 96.0% on the BUSI dataset and 95.8% on our self-constructed evaluation dataset.

Conclusions: Qualitative evaluation through visualization methods further supports the practical diagnostic value of our method in identifying relevant image features for classification.

Keywords: Breast cancer; ultrasound imaging; deep learning; transfer learning; data synthesis


Submitted Nov 09, 2024. Accepted for publication Feb 13, 2026. Published online Apr 08, 2026.

doi: 10.21037/qims-2024-2496


Introduction

Breast cancer is one of the most common malignant tumors. If not treated in a timely manner, it may spread to other organs, seriously affecting the patient's quality of life and survival. Breast ultrasound is a widely used diagnostic tool in breast cancer evaluation. It demonstrates high sensitivity for identifying suspicious lesions, with studies reporting a pooled sensitivity of 80.1% and specificity of 88.4% in global settings. In specialized clinical settings, ultrasound achieves even higher performance metrics, including a sensitivity of 82.0%, a specificity of 99.3%, and an area under the receiver operating characteristic (ROC) curve of 0.91 (1). Ultrasound can complement mammography and magnetic resonance imaging (MRI) in breast cancer care. Mammography excels at detecting microcalcifications but struggles with dense breast tissue. In contrast, ultrasound maintains higher sensitivity in dense breasts and is particularly valuable for evaluating palpable masses or abnormalities detected on mammograms. MRI, while highly sensitive for detecting multifocal or recurrent cancers, has lower specificity and is costlier and more resource-intensive (2). Breast ultrasound is non-invasive and radiation-free, and it provides a painless patient experience compared with some alternative diagnostic methods. The role of ultrasound in breast cancer screening deserves re-evaluation, and it has the potential to become a more critical and independent screening tool.

In recent years, researchers have achieved excellent results in medical image analysis tasks via deep learning, such as breast ultrasound image classification (3-6). The conventional method involves fine-tuning Convolutional Neural Networks (CNNs), previously trained on ImageNet, with medical datasets. However, compared with general computer vision tasks, medical image classification exhibits several distinct characteristics (7-9). First, dataset size: medical images are often scarcer and more expensive to collect than general images, resulting in smaller datasets. Second, dataset categories: the number of categories in medical image classification tasks is usually far smaller than in general tasks, although this does not correlate with task complexity. Third, feature extraction ability: deep learning models for medical image classification require stronger feature extraction capabilities to distinguish the subtle differences between lesions. Fourth, dataset imbalance: most medical image data is concentrated on common diseases, while rare diseases lack sufficient training data; this imbalance impairs the model's capacity to distinguish between classes. Finally, interpretability: because clear evidence is necessary for a disease diagnosis, medical classification requires strong interpretability and cannot rely solely on a black-box model.

Considering the aforementioned characteristics, high-quality annotated images are the most direct remedy. In practice, however, gathering high-quality annotated medical datasets is difficult. On one hand, rare and complex diseases are scarce, and collection is further hindered by patient privacy and data security concerns. On the other hand, annotating medical images is costly, requiring specialized annotation tools and the involvement of experienced medical professionals. To mitigate the impact of these difficulties on medical image research, we present an exemplary self-developed ultrasound dataset, as shown in Figure 1.

Figure 1 Samples of the self-constructed ultrasound dataset. (A) indicates normal images; (B) indicates malignant images; and (C) indicates benign images.

Currently, augmenting medical images via image generation technology is an effective way to enhance the performance of medical models (10,11). Constructing synthetic data can improve a network's generalization to unseen data. Some works utilize traditional data augmentation methods, such as rotation and random cropping, to increase the data volume. However, manually designed image processing operations can hardly cover the distribution of the entire dataset (12,13). Moreover, different medical images require distinct data augmentation strategies. Recently, image generation methods based on Generative Adversarial Networks (GANs) or diffusion models have been employed for the automatic augmentation of medical images (14,15). These methods allow us to sample from the entire data distribution and augment the dataset more flexibly. Notably, StyleGAN (16) demonstrates style transfer potential for generating high-resolution facial images; this approach can be advantageous for generating tailored medical images. Additionally, diffusion models with conditional generation can potentially address the challenge of generating rare medical images under text or image constraints (17). The contributions of our work are as follows. (I) We introduce a novel Dual-Discriminator GAN designed specifically for iterative data synthesis on unbalanced datasets. It targets the generation of samples for imbalanced categories and ensures that the generated images are realistic and consistent with their category features. (II) We construct a CNN with channel and spatial attention mechanisms and perform iterative training on the original dataset and the synthesized data. (III) Our proposed method is validated on a publicly available dataset as well as our own collected dataset, demonstrating its effectiveness. Compared with traditional data augmentation methods, our approach achieves higher coverage of the data distribution and can be flexibly applied in the data processing pipeline of any medical image classification model.

Breast cancer classification via ultrasound images is a vital research area. Researchers initially utilized traditional machine learning methodologies for breast cancer classification. Manually designed features, such as texture and shape characteristics, are extracted from the images and identified as benign or malignant using classifiers such as support vector machines (SVM) or random forests (18-20). However, these approaches rely heavily on expert-defined characteristics. Deep learning, notably the CNN, has shown substantial potential in breast cancer classification. CNN architectures for breast cancer classification typically involve modifications to networks such as VGGNet, ResNet, or DenseNet (21-23). Deep learning models outperform traditional machine learning methods in breast cancer classification, adeptly identifying distinctive features from ultrasound images. Owing to the paucity of labeled data, transfer learning has been employed in breast cancer classification. Pretrained CNN models are usually initialized on immense natural image datasets like ImageNet and then fine-tuned on smaller ultrasound image datasets (24). This approach enables models to utilize parameters pre-learned from natural images and adapt them to breast cancer classification, reducing the number of medical images required for model convergence. Inspired by model ensemble strategies, researchers have developed ensemble models that merge predictions from multiple models for the breast cancer classification task (25); such ensembles enhance generalization by aggregating predictions. Notably, Ayana et al. (26) reported achieving state-of-the-art (SOTA) performance on the Breast Ultrasound Images Dataset (BUSI). Their study introduces a breast ultrasound classification approach using a Vision Transformer (ViT) and implements a multistage transfer learning strategy based on ViT.

Transfer learning transfers information from one domain to enhance performance on related tasks or domains, and it has become a promising strategy in medical image analysis in recent years. We provide an overview of transfer learning in medical imaging in this part, with an emphasis on common paradigms and significant advances. Homogeneous transfer learning, which accomplishes identical tasks across different datasets, has been studied extensively. Numerous studies demonstrate the effectiveness of pre-training deep neural networks on generic image datasets like ImageNet and then fine-tuning them on medical image datasets. Notably, Alzubaidi et al. present novel methods for progressive layer unfreezing and selective initialization to employ pretrained models more effectively (27). Heterogeneous transfer learning, which applies diverse tasks across the same dataset or domain, has also shown substantial results. For instance, to augment primary tasks like disease classification or detection, leveraging knowledge from other tasks, such as image segmentation or anatomical landmark localization, can be useful. These methods employ cross-task representations and engage in multi-task learning. Notably, Chen et al. propose an attention-guided multi-task learning framework adept at distinguishing task-specific features and enhancing performance across numerous medical image analysis tasks (28). Domain adaptation transfer learning employs separate datasets for diverse tasks, thereby managing distribution shifts among various medical imaging modalities or patient cohorts. Adversarial domain adaptation methodologies like domain adversarial neural networks (DANN) or cycle-consistent GANs (CycleGAN) have gained popularity for aligning feature distributions while preserving task-specific details (29,30). For instance, Hoffman et al. introduce a cycle-consistent adversarial domain adaptation framework that efficiently translates images from the source to the target domain, yielding improved performance in cross-domain medical image analysis tasks (31). Data augmentation or data simulation to expand the dataset has emerged as a promising solution to the scarcity of labeled data. Data augmentation techniques artificially expand the size and diversity of the training dataset, and various methods, such as rotation, translation, cropping, and flipping, have been studied for breast ultrasound images. To address the limitations of small datasets, common augmentations such as random rotations and flips are often used; beyond these, a novel augmentation method named Multi-Scale Super-Pixel Elastic has been proposed to augment images. Similarly, Hijab et al. fine-tune a VGG16 network to classify breast tumors in ultrasound images; to address overfitting, they employ image augmentation to expand the dataset size tenfold and achieve excellent performance (32).

Recently, researchers have applied GANs to breast cancer classification by utilizing conditional information. Conditional GANs can generate ultrasound images conditioned on specific characteristics, such as the presence or absence of tumors. When the generation process is precisely regulated under such conditions, these models can produce ultrasound images relevant to the classification task, thereby improving the accuracy of breast cancer classification (33). We present this article in accordance with the CLEAR reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2024-2496/rc).


Methods

Dataset

To evaluate the performance of our proposed method, we utilize the open-source BUSI (34) and a self-constructed ultrasound dataset. Figure 2 displays samples from the widely used breast ultrasound image dataset, BUSI. BUSI includes breast ultrasound images of 600 female patients aged 25 to 75 years, all collected in 2018. It consists of 780 PNG images with an average size of 500×500 pixels, presented together with ground truth images, and is divided into three classes: normal, benign, and malignant. Our self-constructed ultrasound dataset was compiled by the Medical Imaging Department of Chifeng Municipal Hospital. It consists of 117 benign and 77 malignant breast images, rigorously labeled and checked by senior radiologists. Given the scarcity and imbalance of the data, training deep learning models directly on these datasets would not yield excellent results. We therefore applied mainstream data augmentation methods to expand the data volume, including RandomHorizontalFlip, ColorJitter, GaussianBlur, and RandomResizedCrop, as shown in Figure 3. Following the common setting, we split the data into training and testing sets at a ratio of 4:1, with absolutely no sample overlap between the two sets. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the institutional ethics board of Chifeng Municipal Hospital (No. CK20250721), and informed consent was obtained from all patients.

Figure 2 Samples of the BUSI dataset. (A) indicates benign images; (B) indicates malignant images. BUSI, Breast Ultrasound Images Dataset.
Figure 3 The data augmentation methods used for pre-processing, including RandomHorizontalFlip, ColorJitter, GaussianBlur, and RandomResizedCrop.
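For illustration, this pre-processing might look as follows in torchvision, together with the 4:1 split; this is a minimal sketch, and the parameter values and the fixed seed are our assumptions rather than the study's exact settings.

```python
import torch
from torchvision import transforms
from torch.utils.data import random_split

# The four augmentations named above; magnitudes are illustrative assumptions.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
    transforms.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),
    transforms.ToTensor(),
])

def split_dataset(dataset, train_ratio=0.8, seed=42):
    """4:1 train/test split with no sample overlap between the two subsets."""
    n_train = int(len(dataset) * train_ratio)
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, len(dataset) - n_train],
                        generator=generator)
```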

Proposed method

Our approach is divided into two parts. In the first part, we introduce a novel Dual-Discriminator GAN specifically designed for iterative data synthesis on unbalanced datasets. In the first synthesis round, images are sampled from the entire training dataset. In subsequent rounds, we only generate samples that the model cannot classify correctly, which typically means focused generation for under-represented categories. In addition, within the GAN framework, we carefully design the Dual-Discriminator structure: one discriminator is responsible for identifying whether generated images are real or fake, while the second discriminator is dedicated to learning the true category to which an image belongs. The Dual-Discriminator GAN thus ensures that generated images are not only more realistic but also aligned with their specific class features. In the second part, we construct a CNN with attention that is iteratively trained on both the original dataset and the data synthesized in the first part. Learning the synthesized images round by round gradually improves classification performance, whereas learning all the data at once can exacerbate confusion. In particular, we equip the CNN with both channel and spatial attention mechanisms to significantly increase its ability to learn and recognize intricate details. This enhancement allows the network to dynamically focus on salient features, improving performance, especially for the minute yet critical features that are often overlooked during training. We select the sigmoid function for binary classification problems and the softmax function for multi-class classification problems. Following the common setting, we select the classification threshold that maximizes accuracy on the test set. The iterative synthesis rounds are shown in Figure 4.

Figure 4 A schematic diagram of our proposed method. The upper figure shows the two-stage data synthesis process, and the lower figure shows the model architecture of CGAN. CGAN, Conditional Generative Adversarial Networks.

Dual-Discriminator GAN for iterative data synthesis

One of the distinguishing features of breast tumor ultrasound images is significant data imbalance. In practice, normal images far outnumber tumor images, and within the subset of tumor images, benign tumors are more common than malignant tumors. For example, in the commonly used BUSI, the ratio of benign to malignant samples is about 2:1. The scarcity and heterogeneity of malignant tumors add complexity to the challenge of classifying breast ultrasound images. In this paper, we design a Dual-Discriminator GAN that performs iterative data synthesis over several rounds to address data imbalance. We use the Conditional GAN (CGAN) as the base model so that the model can generate images of the corresponding category. The CGAN feeds the label along with the noise into the generator input, so during prediction the generator produces images of the category specified by the label. The input condition label y must not only be fused with the noise z at the input but also with the feature map at each level of the generator and discriminator. Note that when the discriminator is fed with real images or fake images produced by the generator, the classification label is also provided. The discriminator of the CGAN must accurately classify whether the data is real or generated, taking into account the category label y. The CGAN model architecture is shown in Figure 4. The loss function consists of two parts:

\[
\mathcal{L}_{adv}(D) = -\frac{1}{N}\left[\sum_{i=1}^{N}\log D(x_i, y_i) + \sum_{j=1}^{N}\log\big(1 - D\big(G(z_j, c_j)\big)\big)\right]
\]

where $x_i$ are the real images, $y_i$ are the corresponding true labels, $G(z_j, c_j)$ are the generated images, and $z_j$ and $c_j$ are the noise vectors and categorical codes.

\[
\mathcal{L}_{G} = -\frac{1}{N}\sum_{i=1}^{N}\log D\big(G(z_i, y_i)\big)
\]

where $G(z_i, y_i)$ is the generated data sample based on the noise vector $z_i$ and the categorical constraint $y_i$. In addition to the above, we incorporate another discriminator whose role is to discern the degree of alignment between generated images and their respective categories. Notably, the image classifier in “Deep transfer learning with channel and spatial attention” inherits the parameters of this discriminator for the purpose of learning to classify the generated image classes. We intend that, during GAN training, the model not only identifies real and fake images but also learns the differences between image categories. We use the weights trained at this stage as the initial weights for the subsequent classification of breast tumor images:

\[
\mathcal{L}_{class}(D) = -\frac{1}{N}\sum_{i=1}^{N}\sum_{k=1}^{M} y_{i,k}\,\log D_k(x_i, y_i)
\]

where $D_k(x_i, y_i)$ represents the output of the $k$-th class head in the auxiliary classification head of the discriminator, and $y_{i,k}$ is the one-hot encoded ground truth vector for the class label of the real image $x_i$.
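To make the objective concrete, the sketch below is a hedged illustration of this design, not the released implementation: the layer sizes, the embedding-based label fusion, and the binary cross-entropy formulation of the adversarial terms are our assumptions, chosen to match the three equations above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalGenerator(nn.Module):
    """Label conditioning as described above: the label y is embedded, fused
    with the noise z at the input, and re-injected at each level.
    Layer sizes are illustrative assumptions."""
    def __init__(self, z_dim=100, n_classes=3, embed_dim=32, img_channels=1):
        super().__init__()
        self.embed = nn.Embedding(n_classes, embed_dim)
        self.fc = nn.Linear(z_dim + embed_dim, 128 * 8 * 8)
        self.up1 = nn.ConvTranspose2d(128 + embed_dim, 64, 4, stride=2, padding=1)
        self.up2 = nn.ConvTranspose2d(64 + embed_dim, img_channels, 4, stride=2, padding=1)

    def forward(self, z, y):
        e = self.embed(y)
        h = self.fc(torch.cat([z, e], dim=1)).view(-1, 128, 8, 8)
        e1 = e[:, :, None, None].expand(-1, -1, 8, 8)            # fuse label at level 1
        h = torch.relu(self.up1(torch.cat([h, e1], dim=1)))      # -> (B, 64, 16, 16)
        e2 = e[:, :, None, None].expand(-1, -1, 16, 16)          # fuse label at level 2
        return torch.tanh(self.up2(torch.cat([h, e2], dim=1)))   # -> (B, C, 32, 32)

def dual_discriminator_losses(D_adv, D_cls, G, x_real, y_real, z, c):
    """L_adv(D), L_G, and L_class(D) from the equations above. D_adv is the
    real/fake discriminator (sigmoid output, conditioned on the label);
    D_cls is the class discriminator (logits over M classes)."""
    x_fake = G(z, c)
    # L_adv(D): push real images toward 1 and generated images toward 0.
    loss_d = -(torch.log(D_adv(x_real, y_real) + 1e-8).mean()
               + torch.log(1 - D_adv(x_fake.detach(), c) + 1e-8).mean())
    # L_G: the generator tries to make D_adv accept its samples.
    loss_g = -torch.log(D_adv(x_fake, c) + 1e-8).mean()
    # L_class(D): cross-entropy of the class discriminator on real images.
    loss_cls = F.cross_entropy(D_cls(x_real), y_real)
    return loss_d, loss_g, loss_cls
```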

Our focus is on synthesizing malignant images to address data imbalance. Initially, we employ traditional data augmentation, such as horizontal flipping, cropping, noise addition, and brightness, contrast, and saturation adjustment, to double the data volume. Subsequently, we utilize the Dual-Discriminator GAN to further expand the number of augmented images twofold. Notably, excessive reliance on conventional image augmentation techniques may inadvertently degrade the intrinsic quality of the training dataset. While these methods are designed to artificially expand the dataset and enhance model robustness, their overuse can introduce artificial and potentially misleading variations that do not reflect real-world data distributions. This can result in suboptimal model performance, particularly in terms of generalization to unseen data, as the model may learn spurious patterns introduced by the augmentation process rather than the underlying semantic features of the images.

This approach allows us to effectively expand the dataset and enhance the diversity of the malignant images.
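The round-based procedure can be summarized as in the following sketch (ours; the round count, per-round sample size, and re-sampling policy are assumptions). Round 1 draws target labels across the whole training set; each later round targets only the classes of samples the current classifier misclassifies. Classifier re-training on the accumulated data between rounds is elided.

```python
import torch

def iterative_synthesis(G, classifier, train_loader, n_rounds=3,
                        per_round=200, z_dim=100, device="cpu"):
    synthesized = []
    # Round 1: sample target labels from the entire training set.
    target_labels = torch.cat([y for _, y in train_loader])
    for _ in range(n_rounds):
        idx = torch.randint(len(target_labels), (per_round,))
        y = target_labels[idx].to(device)
        z = torch.randn(per_round, z_dim, device=device)
        with torch.no_grad():
            synthesized.append((G(z, y).cpu(), y.cpu()))
        # (Re-train `classifier` on original + synthesized data here; elided.)
        # Later rounds: keep only labels the classifier still gets wrong,
        # concentrating generation on imbalanced, hard categories.
        wrong = []
        with torch.no_grad():
            for x, y_true in train_loader:
                pred = classifier(x.to(device)).argmax(dim=1).cpu()
                wrong.append(y_true[pred != y_true])
        wrong = torch.cat(wrong)
        if len(wrong) == 0:
            break
        target_labels = wrong
    return synthesized
```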

Deep transfer learning with channel and spatial attention

We select a deep CNN, ResNet, for breast ultrasound image classification. Its primary innovation is residual learning, which enables the training of very deep networks by allowing gradients to flow through the layers more effectively during backpropagation. During training, we follow the transfer learning paradigm, using the weights of a ResNet model pre-trained on the ImageNet dataset as the initial weights for our model. As outlined in “Dual-Discriminator GAN for iterative data synthesis”, the ResNet network functions as the class discriminator, and all of its parameters are updated during that training. In this stage, we employ the weights of the class discriminator as the initial weights and update only the parameters of the last two fully connected (fc) layers. This approach enables us to leverage the discriminative capabilities of the GAN model while fine-tuning the classification network specifically for the task of breast tumor classification. We select ResNet-50, ResNet-101, and ResNet-152 as the classifiers to evaluate our method.
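A minimal sketch of this fine-tuning setup using torchvision follows; the two-layer head standing in for "the last two fc layers", its width of 256, and the fallback to ImageNet weights when no discriminator weights are supplied are our assumptions.

```python
import torch.nn as nn
from torchvision import models

def build_classifier(n_classes=3, discriminator_state=None):
    """Start from ImageNet weights (or the class-discriminator weights from
    the GAN stage, if provided) and train only the final layers."""
    model = models.resnet152(weights=models.ResNet152_Weights.IMAGENET1K_V1)
    if discriminator_state is not None:
        model.load_state_dict(discriminator_state, strict=False)
    for p in model.parameters():
        p.requires_grad = False                 # freeze the backbone
    model.fc = nn.Sequential(                   # replacement head: these layers train
        nn.Linear(model.fc.in_features, 256),
        nn.ReLU(inplace=True),
        nn.Linear(256, n_classes),
    )
    return model
```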

To increase the model's ability to learn discriminative features and improve its performance on the breast ultrasound classification task, we incorporate the Squeeze-and-Excitation Network (SENet) structure to implement channel and spatial attention in the ResNet model.

Channel attention integration

In the ResNet model, after a convolutional layer, the global average pooling (GAP) operation is applied across the spatial dimensions of the feature map. This operation squeezes the spatial information and produces a channel descriptor, which captures channel-wise statistics. The channel descriptor is then fed into a small fully connected network consisting of one or more fully connected layers. These layers learn channel-specific importance weights or scaling factors by capturing interdependencies among different channels. Activation functions, such as ReLU or sigmoid, may be applied between the fully connected layers. The importance weights obtained from the fully connected network are applied as scaling factors to the original feature map. This rescales the features of each channel, allowing the network to selectively emphasize or suppress certain channels based on their importance scores. The rescaled feature map is then added element-wise to the original input feature map, incorporating channel attention. This operation allows the network to recalibrate channel-wise information and enhance discriminative features.
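The channel attention just described corresponds closely to the standard SE block; a minimal sketch follows. The reduction ratio r=16 is an assumption carried over from the SENet paper, and the final residual add follows the description above.

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Channel attention: squeeze (GAP), excite (small fc network), rescale."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)      # global average pooling
        self.excite = nn.Sequential(                # small fully connected network
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
            nn.Sigmoid(),                           # channel importance in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.excite(self.squeeze(x).view(b, c)).view(b, c, 1, 1)
        # Rescale each channel, then add back to the input, as described above.
        return x + x * w
```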

Spatial attention integration

Following the channel attention integration, the feature map that incorporates channel attention is passed through a spatial attention module, which aims to capture spatial dependencies within each channel. The spatial attention module typically consists of convolutional layers that capture local spatial information and generate spatial attention maps. These maps highlight informative spatial locations and suppress irrelevant or noisy regions. The spatial attention maps are then multiplied element-wise with the feature map that incorporates channel attention. This operation modulates the features of each spatial location based on their importance scores, allowing the network to attend to relevant spatial information and enhance discriminative spatial patterns.
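A minimal sketch of such a spatial attention module follows; pooling the channel dimension and the 7×7 kernel follow common practice (e.g., CBAM) and are our assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """A convolution over pooled channel statistics yields a per-location map."""
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)             # (B, 1, H, W)
        mx, _ = x.max(dim=1, keepdim=True)            # (B, 1, H, W)
        attn = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * attn                               # element-wise modulation
```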

By integrating the SENet structure into the ResNet model and incorporating both channel and spatial attention mechanisms, the network can effectively capture interdependencies among channels and spatial dependencies within each channel.
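To make the integration concrete, the following sketch (ours, not the authors' released code) wraps an existing ResNet bottleneck block with the two attention modules defined above; where exactly the modules are inserted is our assumption.

```python
import torch.nn as nn

class AttentionBottleneck(nn.Module):
    """Apply channel then spatial attention after a torchvision ResNet block
    (SEBlock and SpatialAttention from the sketches above)."""
    def __init__(self, block, channels):
        super().__init__()
        self.block = block
        self.channel_attn = SEBlock(channels)
        self.spatial_attn = SpatialAttention()

    def forward(self, x):
        out = self.block(x)              # original residual block output
        out = self.channel_attn(out)     # recalibrate channels
        return self.spatial_attn(out)    # modulate spatial locations

# Usage sketch: wrap every block of ResNet's last stage.
# for i, blk in enumerate(model.layer4):
#     model.layer4[i] = AttentionBottleneck(blk, channels=2048)
```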


Results

Evaluation metrics

Accuracy

Accuracy is the most straightforward metric to evaluate the performance of a classification model. It measures the proportion of correctly classified images out of the total number of images.

ROC curve

The ROC curve is a graphical representation of the model’s diagnostic ability. It plots the true positive rate (TPR) on the y-axis against the false positive rate (FPR) on the x-axis at various threshold levels.

A model with a ROC curve closer to the top-left corner of the plot (where TPR is high and FPR is low) is considered to have better performance. The area under the ROC curve (AUC) is a more informative metric than accuracy, especially for imbalanced datasets; it measures the model's ability to distinguish between classes across all threshold settings. The AUC score ranges from 0 to 1, where 1 indicates a perfect model and 0.5 indicates a model no better than random guessing. A higher AUC value means the model is better at correctly classifying images.

F1 score

The F1 score is a measure of a model’s accuracy that considers both the precision and the recall of the model. It is the harmonic mean of precision and recall and is particularly useful when the class distribution is imbalanced. A high F1 score indicates that the model has good precision and recall, meaning it correctly identifies the positive class without a high number of false positives or false negatives.

Confusion matrix

A confusion matrix is a table that is often used to define the performance of a classification algorithm. It shows the number of correct and incorrect predictions for each class. The confusion matrix provides a clear overview of the model’s performance, allowing for the calculation of various other metrics, such as precision, recall, and the F1 score.
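The four metrics above can be computed, for example, with scikit-learn. In this sketch, `y_true`, `y_pred`, and `y_score` (per-class probabilities) are assumed model outputs on the test set, and macro averaging over the three classes is our choice.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             roc_auc_score)

def evaluate(y_true, y_pred, y_score):
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        # Macro-averaged one-vs-rest AUC for the multi-class setting.
        "auc": roc_auc_score(y_true, y_score, multi_class="ovr", average="macro"),
        "f1": f1_score(y_true, y_pred, average="macro"),
        "confusion_matrix": confusion_matrix(y_true, y_pred),
    }
```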

Quantitative evaluation

We employ ResNet-50, ResNet-101, and ResNet-152 as the base models mentioned in the methodology section. We synthesize data from the BUSI dataset, amplifying the final synthesized dataset to 10 times the original size. This expanded dataset is subsequently used to train the CNNs described in “Proposed method”. For the task of classifying breast tumor images using the synthesized images, we compare the performance of training on the original dataset against the synthesized dataset. The training duration is set to a consistent 100 epochs across all experiments. It is crucial to highlight that for the control group, which did not use synthesized data, we applied conventional data augmentation techniques to expand the original dataset tenfold. This ensures an equitable data volume between the two training methodologies over an identical number of epochs. Figure 5 illustrates the training loss for the three base models with and without the incorporation of synthesized data. The relatively shallow ResNet-50 exhibits a significantly higher training loss than the two deeper networks, suggesting that it is less effective at learning this specific task. For each base model, a discernible reduction in loss is observed when training with synthesized data, indicating that our synthesized data is more conducive to model learning than data generated through standard augmentation methods. The analysis reveals that augmenting the training process with both original and synthetic data significantly enhances model performance. This positive effect is likely due to the changed characteristics of the augmented data: our data augmentation method specifically focuses on generating more challenging samples to address the class imbalance in the original dataset.

Figure 5 Training loss of the three base models with and without synthesized data.

In Figure 6, we show the accuracy of the models on the validation set during training. The performance of ResNet-50 is notably lower than that of the two deeper networks. Benefiting from its deeper architecture, ResNet-152 demonstrates superior performance in this classification task, outperforming the other two networks. We observe that the introduction of synthesized data has a positive effect on all networks, with an increase in accuracy of 2–3 percentage points.

Figure 6 Validation accuracy of the three base models during training.

We note that synthesized data may have a more pronounced effect on deeper networks. Furthermore, ResNet-152 exhibits the fastest convergence when trained with synthesized data, indicating that synthesized data not only enhances the learning process but also accelerates model convergence.

We present ROC curves for the three classification models in Figure 7; the differences in AUC reflect the performance differences of the three networks on the classification task. Table 1 compares our method with the SOTA (26) on BUSI. The SOTA work reports the performance of six models with CNN and ViT base networks. Our method achieves optimal performance on both accuracy and F1 score. Table 2 shows the performance of our approach on the self-constructed dataset, where the first three rows act as a control group that directly uses the corresponding network for classification. Table 2 also shows that our proposed method is equally effective on our self-constructed dataset.

Figure 7 The ROC curve of our method. ROC, receiver operating characteristic.

Table 1

Classification results on the BUSI dataset

Methods                Framework    Accuracy (95%)    AUC (95%)    F1 score (95%)
ResNet-50 (26)         CNN          0.862             0.879        0.864
EfficientNetB2 (26)    CNN          0.851             0.864        0.856
InceptionNetV3 (26)    CNN          0.854             0.870        0.859
ViT-B/16 (26)          ViT          0.942             0.9541       0.936
ViT-B/32 (26)          ViT          0.934             0.9548       0.932
ViT-L/32 (26)          ViT          0.918             0.9351       0.912
ResNet-50 (ours)       CNN          0.891             0.885        0.891
ResNet-101 (ours)      CNN          0.925             0.912        0.921
ResNet-152 (ours)      CNN          0.960             0.951        0.938

The italicized values represent state-of-the-art (SOTA) performance. 95% means the value is calculated at the 95% confidence level. AUC, area under the curve; BUSI, Breast Ultrasound Images Dataset; CNN, Convolutional Neural Network; ViT, Vision Transformer.

Table 2

Classification results on our self-constructed dataset

Methods                Framework    Accuracy (95%)    AUC (95%)    F1 score (95%)
ResNet-50              CNN          0.726             0.817        0.844
ResNet-101             CNN          0.812             0.842        0.852
ResNet-152             CNN          0.937             0.860        0.860
ResNet-50 (ours)       CNN          0.819             0.845        0.881
ResNet-101 (ours)      CNN          0.921             0.901        0.913
ResNet-152 (ours)      CNN          0.958             0.931        0.930

The italicized values represent state-of-the-art (SOTA) performance. 95% means the value is calculated at the 95% confidence level. AUC, area under the curve; CNN, Convolutional Neural Network.


Discussion

We analyze the confusion matrices generated by the three networks trained with synthetic data in Figure 8. Specifically, ResNet-50 misclassifies 56 images, ResNet-101 misclassifies 32 images, and ResNet-152 misclassifies 24 images. For benign images that are incorrectly classified, both ResNet-50 and ResNet-101 mainly misclassify them as normal images, while ResNet-152 primarily misclassifies them as malignant images. Concerning misclassified malignant images, ResNet-50 and ResNet-152 mostly misclassify them as normal images, whereas ResNet-101 mostly misclassifies them as benign images. For normal images that are incorrectly classified, ResNet-50 and ResNet-101 primarily misclassify them as malignant images, while ResNet-152 mostly misclassifies them as benign images.

Figure 8 The confusion matrix for our method. From left to right, the base models are ResNet-50, ResNet-101, and ResNet-152.

We present the classification visualization of six test images by the ResNet-152 network described above in Figure 9. Five methods are employed to visualize, via heat maps, the activation areas corresponding to the classification results. Visualization techniques for CNNs are critical tools for understanding how they operate internally; using them, we can determine the regions the model focuses on when making specific decisions. Grad-CAM is a popular visualization method for identifying the regions in an image that most significantly influence the model's predictions. It uses the model's gradient information to generate a class activation map (CAM), emphasizing the areas in the input image most relevant to the prediction. Grad-CAM++ is an upgraded version of Grad-CAM that rectifies potential inaccuracies in the original mapping process by considering the spatial consistency of gradients when generating the CAM. EigenCAM is a feature map-based visualization technique that uses principal component analysis (PCA); it identifies the patterns most representative of class activations by analyzing the covariance matrices of feature maps. LayerCAM is well suited to any layer in the model, focusing not just on the final feature map, and thereby generates CAMs corresponding to different depths. Lastly, XGrad-CAM is an extension of Grad-CAM that allows the use of different types of pooling layers, such as global max pooling (GMP), in addition to GAP. In summary, we believe the heat maps generated by Grad-CAM++ and LayerCAM show the best correspondence between activation areas and actual lesion locations. This suggests that in real clinical practice, physicians tend to derive diagnoses from the image characteristics in these relevant regions. We also believe these visualization techniques can help convey more practical diagnostic recommendations for the task of classifying breast ultrasound images.

Figure 9 Representative heatmap visualizations for three malignant samples (A-C) by our ResNet-152 network.
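As an illustration of the gradient-based CAM family discussed above, here is a minimal hook-based Grad-CAM sketch (ours, not the code used to produce Figure 9); `layer` would be, e.g., `model.layer4[-1]` for a torchvision ResNet.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, x, target_class, layer):
    """x is a single pre-processed image with batch size 1."""
    feats, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))
    logits = model(x)
    model.zero_grad()
    logits[0, target_class].backward()          # gradient of the target score
    h1.remove(); h2.remove()
    w = grads["a"].mean(dim=(2, 3), keepdim=True)            # GAP over gradients
    cam = F.relu((w * feats["a"]).sum(dim=1, keepdim=True))  # weighted feature maps
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear",
                        align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize to [0, 1]
```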

Conclusions

In this paper, we propose a novel Dual-Discriminator GAN-based method for breast ultrasound image classification. The Dual-Discriminator GAN addresses the challenge of dataset imbalance by iteratively synthesizing data, which significantly enhances the representation of underrepresented classes. The integration of channel and spatial attention mechanisms within our CNN architecture allows for more refined feature extraction, enabling the model to discern subtleties in ultrasound images that may be critical for accurate classification. Extensive experiments demonstrate our method's superiority over existing techniques, with improved accuracy and faster convergence. The qualitative evaluation through visualization techniques, such as Grad-CAM++ and LayerCAM, further corroborates the practical diagnostic value of our method. These tools not only illuminate the model's decision-making process but also provide actionable insights for clinical practice.


Acknowledgments

None.


Footnote

Reporting Checklist: The authors have completed the CLEAR reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2024-2496/rc

Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2024-2496/dss

Funding: None.

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2024-2496/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the institutional ethics board of Chifeng Municipal Hospital (No. CK20250721) and informed consent was taken from all the patients.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.


References

  1. Sood R, Rositch AF, Shakoor D, Ambinder E, Pool KL, Pollack E, Mollura DJ, Mullen LA, Harvey SC. Ultrasound for Breast Cancer Detection Globally: A Systematic Review and Meta-Analysis. J Glob Oncol 2019;5:1-17. [Crossref] [PubMed]
  2. Aristokli N, Polycarpou I, Themistocleous SC, Sophocleous D, Mamais I. Comparison of the diagnostic performance of Magnetic Resonance Imaging (MRI), ultrasound and mammography for detection of breast cancer based on tumor type, breast density and patient's history: A review. Radiography (Lond) 2022;28:848-56. [Crossref] [PubMed]
  3. Xie J, Song X, Zhang W, Dong Q, Wang Y, Li F, Wan C. A novel approach with dual-sampling convolutional neural network for ultrasound image classification of breast tumors. Phys Med Biol 2020;65: [Crossref] [PubMed]
  4. Al-Dhabyani W, Gomaa M, Khaled H, Aly F. Deep learning approaches for data augmentation and classification of breast masses using ultrasound images. Int J Adv Comput Sci Appl 2019;10:1-11.
  5. Daoud MI, Abdel-Rahman S, Alazrai R. Breast ultrasound image classification using a pretrained convolutional neural network. 2019 15th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS). Sorrento, Italy. IEEE;2019:167-71.
  6. Sharma AK, Nandal A, Dhaka A, Dixit R. Medical image classification techniques and analysis using deep learning networks: a review. In: Patgiri R, Biswas A, Roy P, editors. Health informatics: a computational perspective in healthcare. Singapore: Springer; 2021:233-58.
  7. Cai L, Gao J, Zhao D. A review of the application of deep learning in medical image classification and segmentation. Ann Transl Med 2020;8:713. [Crossref] [PubMed]
  8. Kim HE, Cosa-Linan A, Santhanam N, Jannesari M, Maros ME, Ganslandt T. Transfer learning for medical image classification: a literature review. BMC Med Imaging 2022;22:69. [Crossref] [PubMed]
  9. Smitha P, Shaji L, Mini MG. A review of medical image classification techniques. International Conference on VLSI, Communication & Instrumentation 2011;11:34-8.
  10. Garcea F, Serra A, Lamberti F, Morra L. Data augmentation for medical imaging: A systematic literature review. Comput Biol Med 2023;152:106391. [Crossref] [PubMed]
  11. Chen Y, Yang XH, Wei Z, Heidari AA, Zheng N, Li Z, Chen H, Hu H, Zhou Q, Guan Q. Generative Adversarial Networks in Medical Image augmentation: A review. Comput Biol Med 2022;144:105382. [Crossref] [PubMed]
  12. Khan AR, Khan S, Harouni M, Abbasi R, Iqbal S, Mehmood Z. Brain tumor segmentation using K-means clustering and deep learning with synthetic data augmentation for classification. Microsc Res Tech 2021;84:1389-99. [Crossref] [PubMed]
  13. Tandon R, Agrawal S, Chang A, Band SS. VCNet: Hybrid Deep Learning Model for Detection and Classification of Lung Carcinoma Using Chest Radiographs. Front Public Health 2022;10:894920. [Crossref] [PubMed]
  14. Shi H, Lu J, Zhou Q. A novel data augmentation method using style-based gan for robust pulmonary nodule segmentation. 2020 Chinese Control and Decision Conference (CCDC). Hefei, China. IEEE;2020:2486-91.
  15. Moghadam PA, Van Dalen S, Martin KC, Lennerz J, Yip S, Farahani H, Bashashati A. A morphology focused diffusion probabilistic model for synthesis of histopathology images. Proceedings of the IEEE/CVF winter conference on applications of computer vision. 2023:2000-9.
  16. Karras T, Laine S, Aila T. A Style-Based Generator Architecture for Generative Adversarial Networks. IEEE Trans Pattern Anal Mach Intell 2021;43:4217-28. [Crossref] [PubMed]
  17. Ho J, Jain A, Abbeel P. Denoising diffusion probabilistic models. Advances in neural information processing systems 2020;33:6840-51.
  18. Abdel-Nasser M, Melendez J, Moreno A, Omer OA, Puig D. Breast tumor classification in ultrasound images using texture analysis and super-resolution methods. Engineering Applications of Artificial Intelligence 2017;59:84-92.
  19. Mishra AK, Roy P, Bandyopadhyay S, Das SK. Breast ultrasound tumour classification: A machine learning—radiomics based approach. Expert Systems 2021;38:e12713.
  20. Huang YL, Wang KL, Chen DR. Diagnosis of breast tumors with ultrasonic texture analysis using support vector machines. Neural Comput & Applic 2006;15:164-9.
  21. Singh R, Ahmed T, Kumar A, Singh AK, Pandey AK, Singh SK. Imbalanced breast cancer classification using transfer learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020;18:83-93. [Crossref] [PubMed]
  22. Ragab DA, Attallah O, Sharkas M, Ren J, Marshall S. A framework for breast cancer classification using Multi-DCNNs. Comput Biol Med 2021;131:104245. [Crossref] [PubMed]
  23. Wakili MA, Shehu HA, Sharif MH, Sharif MHU, Umar A, Kusetogullari H, Ince IF, Uyaver S. Classification of Breast Cancer Histopathological Images Using DenseNet and Transfer Learning. Comput Intell Neurosci 2022;2022:8904768. [Crossref] [PubMed]
  24. Ayana G, Dese K, Choe SW. Transfer Learning in Breast Cancer Diagnoses via Ultrasound Imaging. Cancers (Basel) 2021;13:738. [Crossref] [PubMed]
  25. Hosni M, Abnane I, Idri A, Carrillo de Gea JM, Fernández Alemán JL. Reviewing ensemble classification methods in breast cancer. Comput Methods Programs Biomed 2019;177:89-112. [Crossref] [PubMed]
  26. Ayana G, Choe SW. BUViTNet: Breast Ultrasound Detection via Vision Transformers. Diagnostics (Basel) 2022;12:2654. [Crossref] [PubMed]
  27. Alzubaidi L, Santamaría J, Manoufali M, Mohammed B, Fadhel MA, Zhang J, Al-Timemy AH, Al-Shamma O, Duan Y. Mednet: pre-trained convolutional neural network model for the medical imaging tasks. arXiv:2110.06512 [Preprint]. 2021. Available online: https://doi.org/10.48550/arXiv.2110.06512
  28. Chen B, Liu Y, Zhang Z, Lu G, Kong AWK. Transattunet: Multi-level attentionguided u-net with transformer for medical image segmentation. IEEE Transactions on Emerging Topics in Computational Intelligence, 2023. Available online: https://arxiv.org/pdf/2107.05274
  29. Ganin Y, Ustinova E, Ajakan H, Germain P, Larochelle H, Laviolette F, March M, Lempitsky V. Domain-adversarial training of neural networks. Journal of machine learning research 2016;17:1-35.
  30. Zhu JY, Park T, Isola P, Efros AA. Unpaired image-to-image translation using cycle-consistent adversarial networks. Proceedings of the IEEE international conference on computer vision. Venice, Italy. IEEE;2017:2223-32.
  31. Hoffman J, Tzeng E, Park T, Zhu JY, Isola P, Saenko K, Efros A, Darrell T. Cycada: Cycle-consistent adversarial domain adaptation. Proceedings of the 35th International Conference on Machine Learning. PMLR;2018:1989-98.
  32. Hijab A, Rushdi MA, Gomaa MM, Eldeib A. Breast cancer classification in ultrasound images using transfer learning. 2019 Fifth International Conference on Advances in Biomedical Engineering (ICABME). Tripoli, Lebanon. IEEE;2019:1-4.
  33. Yao Z, Luo T, Dong Y, Jia X, Deng Y, Wu G, et al. Virtual elastography ultrasound via generative adversarial network for breast cancer diagnosis. Nat Commun 2023;14:788. [Crossref] [PubMed]
  34. Al-Dhabyani W, Gomaa M, Khaled H, Fahmy A. Dataset of breast ultrasound images. Data Brief 2020;28:104863. [Crossref] [PubMed]
Cite this article as: Zhao X, Zhao H, Cheng Z. Dual-discriminator network-based classification method for breast ultrasound imaging. Quant Imaging Med Surg 2026;16(5):412. doi: 10.21037/qims-2024-2496
