A multibranch fusion network (MBF-Net) for intracranial large-vessel occlusion classification
Introduction
Background
Stroke poses a major threat to both human life and health (1). The most devastating type of ischemic stroke is acute ischemic stroke (AIS) caused by large-vessel occlusion (LVO), which is responsible for 60% of stroke-related disabilities and 90% of stroke-related deaths in clinical practice (2). The burden of the condition, however, can be significantly lessened if patients receive a correct diagnosis and treatment as soon as it manifests; this benefits the patient's rehabilitation and also significantly lessens the impact of complications incurred by the disease (3). Therefore, timely diagnosis of LVO and appropriate treatment based on the patient's symptoms are crucial.
Recent experimental studies have demonstrated that the prognosis of patients with anterior circulation LVO treated with endovascular therapy (EVT) is superior to that of patients treated with intravenous thrombolysis (4-6). Consequently, researchers and clinicians have increasingly focused on EVT as a preferred treatment option. However, both treatment modalities have an optimal treatment time window and are highly time-dependent, particularly EVT. Therefore, determining whether a patient has LVO, along with the exact location of the occluded artery, is critical. Radiologists typically use computed tomographic angiography (CTA) to confirm the status of intracranial LVO in patients and to identify the responsible vessel. CTA is also an integral imaging component for anterior circulation thrombectomy. However, a single head CTA examination can produce hundreds of high-resolution images that need to be reviewed, creating difficulties for radiologists and clinical imaging workflows and potentially delaying the patient's course of treatment. Therefore, to ensure that patients can receive the most effective surgical treatment possible, it is crucial for radiologists to quickly and accurately identify whether LVO is present and the location of the relevant vessels.
In many rural areas across China, medical resources remain scarce. There is a shortage of stroke specialists, and access to advanced imaging and interventional equipment is limited, severely hindering the timely diagnosis and treatment of stroke survivors. Studies have shown that the prevalence and mortality rates of stroke in rural regions are significantly higher than those in urban areas (7). Against this backdrop, a technology capable of rapidly and accurately identifying LVO and its occlusion site is essential to ensuring that patients receive the most effective surgical treatment in a timely manner.
In recent years, there have been significant efforts and positive outcomes in the task of identifying LVO. However, several issues persist. The bulk of the related research has focused solely on binary classification, failing to pinpoint the exact location of the occluded vessels. Moreover, these studies often use pre-existing network models for training and classification without adapting the networks to the unique requirements of the LVO task. Specifically, these networks are typically trained under the assumption of balanced data and do not address the specific characteristics of the medical condition. In practice, however, the collected data commonly exhibit a long-tail effect, resulting in severe imbalance among sample categories. Therefore, it is crucial to consider strategies for training a robust classification model with imbalanced input.
To address these challenges, we propose in this paper a multibranch fusion network (MBF-Net) for LVO classification with an auxiliary attention guidance module (AAGM). By leveraging the distinct learning capabilities of three branches, our fusion model effectively mitigates overfitting on head data and underfitting on tail data, thereby addressing the long-tail effect. Specifically, the first two branches of the network focus on extracting features related to healthy and patient populations, respectively. The third branch then dynamically integrates the feature information from the first two branches to pinpoint the responsible occluded vascular region. This fusion approach not only helps combat the long-tail effect but also prevents the model from disproportionately emphasizing head data at the expense of tail data, ensuring balanced classification accuracy across all classes rather than favoring the majority class. When the binary classification task is transformed into a multiclassification task for locating specific occluded vessels, detecting responsible vessels can be challenging due to physiological structural differences. To overcome this challenge, we introduce two modules to enhance the learning capacity of the model: branch hierarchical deep aggregation (BHDA) and semantic feature enhancement (SFE). BHDA aggregates feature information at different depths of each branch to preserve richer spatial location information, while SFE actively identifies and merges the most valuable characteristics on the feature maps of each branch to acquire more robust semantic information. Importantly, we include the AAGM to mimic the behavior of medical professionals, who focus on the blood vessel region during diagnosis; it guides the model to emphasize the blood vessel region and leads to more interpretable heat map visualizations. Notably, the proposed method not only assists clinicians in identifying critical information from a large number of CTA images but also achieves precise localization of the occluded vessel. This capability holds significant clinical value, as it can support the optimization of endovascular treatment strategies, such as the selection of the most appropriate thrombectomy route and the development of personalized surgical plans. In addition, the method substantially reduces the burden of manual annotation, offering key references and valuable insights for the development and integration of large-scale multimodal artificial intelligence (AI) models in stroke diagnosis and interventional decision-making. In this study, extensive tests were conducted on the LVO dataset to validate the efficacy of our proposed model. The primary contributions of our research are as follows:
- We propose MBF-Net for LVO classification that effectively addresses the long-tail effect problem in the dataset by utilizing the properties of several branches with various learning focuses;
- To address the challenge of detecting occluded responsible vessels due to physiological structural differences, we designed two modules, BHDA and SFE, which enhance the model’s ability to extract information and effectively improve the final classification accuracy;
- Through the application of heat map visualization, we demonstrate that our AAGM, which guides the model’s attention focus, can enhance the interpretability of the model’s decision-making results.
Related work
LVO detection
The risk of disability and mortality associated with AIS induced by LVO is frequently higher than that induced by other stroke types, but this risk can be greatly mitigated by the timely and efficient opening of the occluded arteries. Therefore, it is crucial to determine whether a patient has LVO. Prehospital assessment scales based on simplifications of the National Institutes of Health Stroke Scale (NIHSS) scoring items (8), such as the Rapid Arterial Occlusion Evaluation (RACE) scale (9) and the Prehospital Acute Stroke Severity (PASS) scale (10), are used to identify LVO early in the course of a stroke. Although scale-based methods can be conducted quickly, they do not account for all details and rely on the expertise of medical staff for practical utility, and thus their accuracy remains unsatisfactory. A growing number of studies are therefore turning to deep learning rather than prehospital LVO prediction scales, owing to deep learning's superior performance in medical applications. For instance, Stib et al. (11) preprocessed numerous multiphase CTA sequence images into single input images, employing maximum intensity projection and heuristic retinal vascular segmentation techniques to achieve good performance on an LVO dataset using DenseNet-121. Nishi et al. (12) proposed an alternative to direct prediction: their approach for clinical outcome prediction used high-dimensional imaging features collected from an ischemic lesion segmentation task. They adopted a joint training strategy for segmentation and classification tasks, and the final experimental results showed that this approach could effectively improve classification accuracy. Meanwhile, You et al. (13), using structured demographic data, clinical data, and non-contrast computed tomography (NCCT) imaging features acquired from a deep learning model, developed a multilevel AIS assessment model and successfully applied it to the LVO binary classification task. These works have proven the viability of employing deep learning to identify LVO, but the methods directly train existing networks on their own data without further investigation of the potential relationships between the characteristics of the training data and the chosen network architectures. Regarding this, Remedios et al. (14) examined how architectural components (such as parallel processing branches and residual connections) in convolutional neural networks (CNNs) affect LVO detection performance on medical images using a dataset of 300 samples. They discovered that networks with more skip connections have greater potential for generalization. In addition, a study on DeepSymNet (15) used input from the left and right hemispheres of the brain to compare voxel differences at comparable positions on both sides and determine the presence of LVO. Similarly, brain symmetry information was employed by Tolhuisen et al. (16) to detect LVO and to segment high-density lesions. However, none of these studies addressed the class imbalance issue in medical data, and all were restricted to determining the presence of LVO.
Long-tailed image classification
The long-tail effect, which frequently appears in real-world image data, notably medical images, can lead deep models to overemphasize head data while undervaluing learning from tail data. Several attempts have been made to address this issue. The initial strategy was to resample the data, which typically involves two approaches. One is undersampling via the omission of a portion of head data (17), but doing so destroys the original data distribution. The other is oversampling by repeatedly extracting data from tail classes (18), which can result in overfitting on tail data and significantly affect the model's ability to generalize. As research has advanced, there has been greater emphasis on altering the loss function to increase tail sample weights and decrease head weights in order to lessen the negative impacts of imbalanced classification. In their study, Cui et al. (19) defined the concept of the "class actual sample effective number", which is distinct from the sample number, and based the reweighting coefficient on it. In the study by Ren et al. (20), weights for training samples were automatically determined depending on the gradient direction. Moreover, Sinha et al. (21) and Yang et al. (22) varied the weight distribution dynamically in response to variations in model learning difficulty. Recently, decoupled learning-based approaches have also received attention. For instance, Zhou et al. (23) discovered that selecting the right data-sampling strategies for various learning tasks can significantly increase the accuracy of long-tail classification. On this foundation, Wang et al. (24) demonstrated through tests that selecting suitable loss functions, in addition to sampling strategies, for various task branches can also increase model performance. Furthermore, several two-stage fine-tuning methods (25,26) have been proposed and have achieved a degree of success in handling long-tail issues. These approaches divide the training process into two distinct stages: the network is initially trained using the original, unbalanced data, as is customary, and in the second stage, only rebalanced data are used to fine-tune the network with a smaller learning rate. We present this article in accordance with the CLEAR reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2255/rc).
Methods
As shown in Figure 1, the proposed MBF-Net consists of three branches, the conventional learning branch (Branch C), reverse sampling branch (Branch R), and balanced branch (Branch B), used for universal representation learning, patient representation learning, and classifier learning, respectively. The input sample-label pairs for the three branches, $(x_c, y_c)$, $(x_r, y_r)$, and $(x_b, y_b)$, are all derived from the same dataset, but the sampling strategies differ (see the “Branch samplers” section), with $x$ being a training sample, $y \in \{1, 2, \dots, L\}$ being its corresponding label, and $L$ being the number of categories.
The network's full training procedure can be summarized as follows: first, the sample-label pairs generated with the different sampling strategies are sent to the relevant network branches. They acquire their respective branch feature vectors $f_c$, $f_r$, and $f_b$ after passing through the modified backbone network with shared weights (see the “Modified backbone” section). Subsequently, for the feature vectors $f_c$ and $f_r$ of Branch C and Branch R, two different types of operations are completed. In the first step, the feature vectors are sent to their corresponding independent convolution layers $W_c$ and $W_r$ to obtain feature vectors $v_c$ and $v_r$, from which the classification probabilities $p_c$ and $p_r$ are then obtained. Second, feature vectors $f_c$ and $f_r$ are also input into the AAGM used to guide model attention tendencies (see the “Auxiliary attention guidance module” section) to obtain segmentation probability maps $s_c$ and $s_r$. The feature vector $f_b$ passes through $W_c$ and $W_r$ to output $v_{bc}$ and $v_{br}$, which are then input into the adaptive fusion module (AFM) (see the “AFM” section) to acquire the fused feature vector $v_b$, which in turn is sent to the classifier to obtain the classification probability $p_b$. Finally, the overall loss function of the network is constructed using the classification probabilities $p_c$, $p_r$, and $p_b$, along with the segmentation probability maps $s_c$ and $s_r$. The cumulative classification loss $\mathcal{L}_{cls}$ is calculated via the classification probabilities $p_c$, $p_r$, and $p_b$ as follows:

$$\mathcal{L}_{cls} = \alpha E(p_c, y_c) + (1 - \alpha) E(p_r, y_r) + E(p_b, y_b) \qquad [1]$$

where $E(\cdot)$ represents the cross-entropy loss function, and $y_c$, $y_r$, and $y_b$ are the real classification labels corresponding to the three branches. Consistent with the description in reference (23), $\alpha$ is used to shift the learning focus of the triplet branches:

$$\alpha = 1 - \left(\frac{T}{T_{max}}\right)^2 \qquad [2]$$

where $T$ and $T_{max}$ are the current epoch and the number of total training epochs, respectively. For the segmentation maps, $s_c$ and $s_r$ are used to construct the auxiliary guidance loss function $\mathcal{L}_{aux}$, which is calculated as follows:

$$\mathcal{L}_{aux} = E_b(s_c, g_c) + E_b(s_r, g_r) \qquad [3]$$

where $E_b(\cdot)$ represents the binary cross-entropy loss function, and $g_c$ and $g_r$ are the binary segmentation labels of the corresponding branches. The final overall loss function of the entire network can be described as follows:

$$\mathcal{L} = \lambda \mathcal{L}_{cls} + \beta \mathcal{L}_{aux} \qquad [4]$$

where the relative weights of each component of the loss are adjusted by the variables $\lambda$ and $\beta$. The final validation experiment for weight selection indicated that the best outcome occurs when $\lambda$ is set to 1 and $\beta$ is set to 0.5.
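Under the reconstruction above, the training objective can be sketched as follows. This is a minimal PyTorch sketch, not the authors' code; the quadratic $\alpha$ schedule follows BBN (23), and all variable names (p_c, s_c, g_c, etc.) are illustrative:

```python
import torch.nn as nn

ce = nn.CrossEntropyLoss()     # E(.) in Eq. [1]
bce = nn.BCEWithLogitsLoss()   # E_b(.) in Eq. [3], applied to segmentation logits

def cumulative_alpha(epoch: int, total_epochs: int) -> float:
    """Eq. [2]: shifts focus from Branch C to Branch R as training proceeds."""
    return 1.0 - (epoch / total_epochs) ** 2

def mbf_loss(p_c, y_c, p_r, y_r, p_b, y_b,   # branch logits and class labels
             s_c, g_c, s_r, g_r,             # segmentation logits and float masks
             epoch, total_epochs, lam=1.0, beta=0.5):
    alpha = cumulative_alpha(epoch, total_epochs)
    l_cls = alpha * ce(p_c, y_c) + (1 - alpha) * ce(p_r, y_r) + ce(p_b, y_b)
    l_aux = bce(s_c, g_c) + bce(s_r, g_r)    # Eq. [3]
    return lam * l_cls + beta * l_aux        # Eq. [4]
```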
Branch samplers
This study employed a variety of sampling strategies for the different branches (27). A detailed schematic is provided in Figure 2. For Branch C, which focuses on learning the image feature representation of healthy individuals, we employ a uniform sampler that maintains the original data distribution. It ensures that each sample is chosen with equal probability in each sampling period and thus ultimately draws more normal samples (since normal samples far outnumber patient samples in the dataset), which is favorable for learning the normal-sample feature representation. Branch R employs a reverse sampler, in which the sampling probability of each class is proportional to the reciprocal of its sample size, meaning that the more samples a class has, the lower its sampling probability. The benefit of this is that when vessel occlusion occurs, this branch pays greater attention to the less-numerous patient samples and learns more of the patients' feature information. Because both of these branches have their own focus groups, using the output of either one as the inference result is biased. We therefore employ Branch B as the final output branch (corresponding to the inference stage). The purpose of this branch is to treat all categories in the dataset equally and then combine the important feature representations learned by the previous two branches to predict the final outcome. To this end, a class-balanced sampler is implemented, which assigns equal sampling probabilities to all categories. This facilitates the final classification learning by keeping the final sampling quantity of each class consistent in this branch. A minimal sketch of these samplers follows.
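The three class-level strategies can be realized with per-sample weights; the sketch below is our reading of the description above (function name and weighting scheme are illustrative, not the authors' code):

```python
import numpy as np
from torch.utils.data import WeightedRandomSampler

def make_sampler(labels: np.ndarray, strategy: str) -> WeightedRandomSampler:
    """Per-sample weights realizing the three class-level sampling strategies."""
    counts = np.bincount(labels).astype(float)
    if strategy == "uniform":       # Branch C: keep the original distribution
        class_w = np.ones_like(counts)
    elif strategy == "balanced":    # Branch B: equal class probabilities
        class_w = 1.0 / counts      # class mass = counts * w = constant
    elif strategy == "reverse":     # Branch R: class probability ∝ 1/class size
        class_w = 1.0 / counts**2   # class mass = counts * w ∝ 1/counts
    else:
        raise ValueError(strategy)
    sample_w = class_w[labels]      # expand class weights to per-sample weights
    return WeightedRandomSampler(sample_w.tolist(),
                                 num_samples=len(labels), replacement=True)
```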
Modified backbone
The length and orientation of the cerebral blood vessels vary with each person's unique physiological structure and are not immediately discernible. Consequently, the multiclassification task of correctly identifying the occluded responsible vessels is more challenging than the binary classification of determining the presence of LVO, because it requires the backbone network to extract more abundant and effective feature information. Following an experimental evaluation of widely used classic classification backbone networks, the high-resolution network (HR-Net) (28) was ultimately chosen as the fundamental backbone network.
With constant cross-resolution information exchange, HR-Net is able to preserve high-quality spatial information and to extract robust contextual semantic information. Given the small sample size of our dataset, we employed a three-stage HR-Net in this study to reduce the possibility of model overfitting. Even with HR-Net extracting reliable contextual semantic information, further improvement is possible: (I) feature extraction for each block can be enhanced; and (II) the strategy for multilevel feature fusion in the final classification can be refined. Consequently, we modified the original network and created two unit modules: BHDA and SFE. The former aggregates feature information at various depths in each stage of each branch to maintain more abundant spatial information, while the latter actively searches for and incorporates the most valuable information on the feature maps of each branch to acquire more robust semantic features. With these two modules, the network can extract more useful classification information and thus effectively address the difficulty of categorizing the occluded responsible vessels caused by idiosyncrasies of physiological structure.
BHDA
The network must save higher-quality spatial feature information to achieve precise positioning of the occluded responsible blood vessels, a task which the BHDA module excels at. In order to retain higher-quality spatial feature information in each branch with less overhead cost, this module integrates shallow-level and deep-level feature information at each stage of each branch. This aids in the network’s ability to determine the exact positioning of LVOs. Figure 3 illustrates the structure of the BHDA module. The BHDA method groups two blocks (each consisting of two 3×3 convolutions) into a node and then connects the blocks and nodes on the branch in the form of a tree. This involves more than merely aggregating all of the block units and nodes along the tree’s upward routing. In order to better retain spatial information, we also provide the output of the aggregation node (node) as input to the following group of blocks. In comparison to DenseNet, the BHDA module offers faster computation. This is because DenseNet requires skip connections at each layer, whereas the BHDA module achieves similar functionality with less computational cost. The operations of the BHDA module can be formally expressed as follows:
$$x_n^1 = B(o_{n-1}), \quad x_n^2 = B(x_n^1), \quad o_n = A(x_n^1, x_n^2, o_{n-1}) \qquad [5]$$

where $x_n^1$ and $x_n^2$ are the outputs of the two blocks under the $n$th node, $o_n$ is the output of the $n$th node, $A(\cdot)$ is the deep layer aggregation (DLA) (29) operation, and $B(\cdot)$ is the convolution block. The ultimate output of BHDA includes a greater abundance of feature information at various depth levels as compared with the output produced by continuous convolution operations.
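A minimal sketch of one BHDA branch consistent with Eq. [5] is given below; the block composition, channel counts, and the 1×1-convolution aggregation are illustrative assumptions rather than the exact implementation:

```python
import torch
import torch.nn as nn

class Block(nn.Module):
    """Two 3x3 convolutions, as in the BHDA description (illustrative)."""
    def __init__(self, ch: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True))
    def forward(self, x):
        return self.body(x)

class BHDA(nn.Module):
    """Eq. [5]: each node aggregates its two blocks plus the previous node's
    output, and the node output feeds the next group of blocks."""
    def __init__(self, ch: int, num_nodes: int = 2):
        super().__init__()
        self.blocks = nn.ModuleList([nn.ModuleList([Block(ch), Block(ch)])
                                     for _ in range(num_nodes)])
        # A(.): a 1x1 conv fusing the concatenated inputs of each node
        self.nodes = nn.ModuleList([nn.Conv2d(3 * ch, ch, 1)
                                    for _ in range(num_nodes)])
    def forward(self, x):
        o = x
        for (b1, b2), agg in zip(self.blocks, self.nodes):
            x1 = b1(o)
            x2 = b2(x1)
            o = agg(torch.cat([x1, x2, o], dim=1))  # aggregation node output
        return o
```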
Semantic feature enhancement
An abundance of robust, high-level semantic information can ensure the network's ability to accurately classify the occluded responsible blood vessels. Informed by previous work (30), we developed an SFE module that actively seeks out and incorporates the most valuable features in the feature maps of each branch to obtain richer semantic information, thereby improving the network's final classification performance for occluded responsible blood vessels. The combination of cross-attention and self-attention is the core of the SFE module (see Figure 4 for the specific structure). The module uses the low-level feature map $X^l$ as Q to query the highest-level feature map $X^h$ as K and V and then obtains the new feature map $\hat{X}^l$ after active interaction, which is the same size as $X^l$ (when $l = h$, it becomes a self-attention mechanism). In this process, the highest-level semantic information is enhanced in the regions of the low-level feature maps through attention mechanisms in order to fulfill the needs of the high-level features. The enhanced low-level features are then uniformly downsampled to the same size as the high-level semantic feature maps and concatenated with them to enhance the high-level semantic features. This top-down, nonlocal interaction thus allows important low-level feature information to be enhanced and integrated into the high-level feature maps, improving the network's capacity to extract higher-level semantic information. It can be expressed as follows:

$$q_i = f_q(x_i^l), \quad k_j = f_k(x_j^h), \quad v_j = f_v(x_j^h) \qquad [6]$$

$$\hat{x}_i = \sum_{\forall j} F_{nom}\big(F_{sim}(q_i, k_j)\big) \, F_{mul}(v_j) \qquad [7]$$

where $x_i^l$ and $x_j^h$ are the feature values at positions $i$ and $j$, respectively, on the low-level feature map $X^l$ and the highest-level feature map $X^h$; $q_i$, $k_j$, and $v_j$ are the values obtained via the transformation functions $f_q$, $f_k$, and $f_v$; $F_{sim}$ is a function for measuring similarity; $F_{mul}$ is a weight aggregation function (in this case, matrix multiplication); $F_{nom}$ is a normalization function (softmax); and $\hat{x}_i$ is the output value at position $i$ following the feature interaction transformation of $x_i^l$. Following the completion of all transformations, all new features are unified to the size of $X^h$, and the final feature output is acquired via 1×1 convolution.
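A minimal sketch of the cross-attention at the core of SFE, under Eqs. [6]-[7], is shown below; the 1×1-convolution transforms and the $\sqrt{C}$ scaling are our assumptions, and the enhanced output would subsequently be downsampled and concatenated with the high-level map as described above:

```python
import torch.nn as nn
import torch.nn.functional as F

class SFE(nn.Module):
    """Low-level map queries the highest-level map (cross-attention)."""
    def __init__(self, ch: int):
        super().__init__()
        self.f_q = nn.Conv2d(ch, ch, 1)  # query transform on the low-level map
        self.f_k = nn.Conv2d(ch, ch, 1)  # key transform on the highest-level map
        self.f_v = nn.Conv2d(ch, ch, 1)  # value transform on the highest-level map

    def forward(self, x_low, x_high):
        b, c, h, w = x_low.shape
        q = self.f_q(x_low).flatten(2).transpose(1, 2)   # (B, HW_low, C)
        k = self.f_k(x_high).flatten(2)                  # (B, C, HW_high)
        v = self.f_v(x_high).flatten(2).transpose(1, 2)  # (B, HW_high, C)
        attn = F.softmax(q @ k / c ** 0.5, dim=-1)       # F_nom(F_sim(q, k))
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)  # same size as x_low
        return out
```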
AFM
Branch C and Branch R concentrate on the vascular feature information of healthy individuals and patients, respectively, so predicting results using either branch alone is biased. Branch B, however, can balance the learning focus of vascular features between healthy individuals and patients and is suitable for final classification because its AFM can dynamically and adaptively fuse the feature information of the first two branches (see Figure 5 for the details of the AFM structure). The new feature vectors $v_{bc}$ and $v_{br}$ ($v_{bc}, v_{br} \in \mathbb{R}^{C \times H \times W}$) are produced by feeding the Branch B feature $f_b$ through the distinct convolution layers $W_c$ and $W_r$ of the other two branches. After entering the module, they take different routes to complete weight allocation in spatial position and channel response, and the final feature fusion is then completed adaptively based on the respective weights. To extract weights in spatial position, two convolution layers $\phi_c$ and $\phi_r$ with kernel size 1×1 are first used to compress $v_{bc}$ and $v_{br}$ by halving the number of channels. They are then concatenated along the channel dimension to produce $U$. The function $\varphi$ is then used to extract the mapping values of the two input features in spatial position as follows:

$$S = \varphi(U) = \varphi\big([\phi_c(v_{bc}); \phi_r(v_{br})]\big) \qquad [8]$$

where $S \in \mathbb{R}^{2 \times H \times W}$ encodes the importance of each pixel value in the spatial positions of the two input features. $S$ is sent into the softmax function to acquire the final weights $w_c$ and $w_r$ of $v_{bc}$ and $v_{br}$ in spatial position, capturing the dependency relationship between the two input features. The operation to compute the spatial attention weights is formulated as follows:

$$w_k^{(i,j)} = \frac{\exp\big(S_k^{(i,j)}\big)}{\sum_{k'=1}^{2} \exp\big(S_{k'}^{(i,j)}\big)}, \quad k \in \{c, r\} \qquad [9]$$

where $S_k^{(i,j)}$ is the pixel value at position $(i, j)$ on the $k$th channel of $S$, and $w_c^{(i,j)}$ and $w_r^{(i,j)}$ are the pixel weights of features $v_{bc}$ and $v_{br}$ at spatial location $(i, j)$, respectively. The idea of obtaining weights in the channel dimension is similar to that described in another work (31). The two input features, $v_{bc}$ and $v_{br}$, are first added element by element at the corresponding channel positions. Global average pooling is then employed on the summed responses to acquire the global channel information $z$. Finally, feature interaction is completed via a fully connected layer. This module is unique in its final extraction operation, as the channel mappings for the two branches are separately extracted via two independent extraction functions, as follows:

$$e_c = F_{fc}(z; \theta_c), \quad e_r = F_{fc}(z; \theta_r) \qquad [10]$$

where $F_{fc}$ is a fully connected function, and $\theta$ is its weight parameter. Additionally, softmax operations are executed on the values at corresponding channel positions of $e_c$ and $e_r$ to acquire the relative weights of the two input features, quantifying the impact of the two input features on the final output in the channel dimension, as follows:

$$\beta_c^m = \frac{\exp(e_c^m)}{\exp(e_c^m) + \exp(e_r^m)}, \quad \beta_r^m = 1 - \beta_c^m \qquad [11]$$

where $m$ indexes the current $m$th channel, $m \in \{1, \dots, C\}$, and $\beta_c = \{\beta_c^m\}$ and $\beta_r = \{\beta_r^m\}$ are the channel weight sets of the two branches. After the weighting of spatial position and channel response is combined, the final output of this module, $v_b$, is the fused result of adding the respective position elements of $v_{bc}$ and $v_{br}$, as follows:

$$v_b = (w_c \otimes \beta_c) \odot v_{bc} + (w_r \otimes \beta_r) \odot v_{br} \qquad [12]$$

where $\otimes$ broadcasts the spatial and channel weights to a common shape and $\odot$ denotes element-wise multiplication.
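The following is a minimal sketch consistent with Eqs. [8]-[12] as reconstructed above; the module and variable names, and the broadcasting details, are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AFM(nn.Module):
    """Parallel spatial- and channel-wise weighting of the two Branch B
    projections, followed by adaptive element-wise fusion."""
    def __init__(self, ch: int):
        super().__init__()
        self.compress_c = nn.Conv2d(ch, ch // 2, 1)  # phi_c in Eq. [8]
        self.compress_r = nn.Conv2d(ch, ch // 2, 1)  # phi_r in Eq. [8]
        self.spatial = nn.Conv2d(ch, 2, 1)           # varphi: two spatial maps
        self.fc_c = nn.Linear(ch, ch)                # independent channel heads
        self.fc_r = nn.Linear(ch, ch)                # (Eq. [10])

    def forward(self, v_c, v_r):
        # Spatial weights (Eqs. [8]-[9])
        u = torch.cat([self.compress_c(v_c), self.compress_r(v_r)], dim=1)
        w = F.softmax(self.spatial(u), dim=1)        # (B, 2, H, W)
        w_c, w_r = w[:, :1], w[:, 1:]
        # Channel weights (Eqs. [10]-[11])
        z = F.adaptive_avg_pool2d(v_c + v_r, 1).flatten(1)    # global channel info
        e = torch.stack([self.fc_c(z), self.fc_r(z)], dim=1)  # (B, 2, C)
        beta = F.softmax(e, dim=1)
        b_c = beta[:, 0].unsqueeze(-1).unsqueeze(-1)          # (B, C, 1, 1)
        b_r = beta[:, 1].unsqueeze(-1).unsqueeze(-1)
        # Fusion (Eq. [12]): combine both weightings element-wise
        return (w_c * b_c) * v_c + (w_r * b_r) * v_r
```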
AAGM
In the medical field, traditional deep models have weak explanatory power for their decision results. To address this issue, we designed the AAGM. This module mimics the behavior of radiologists, who focus on the blood vessel area when diagnosing illness. Because segmentation tasks are used to guide the model's attention, the model focuses more on the blood vessel area, thus enhancing the interpretability of the model's decision results. This module is similar to one proposed previously (32), and the details of its structure are provided in Figure 6. The feature vector $f_0$ is obtained from the initial convolution, the stage features are obtained from each branch in the final stage of the HR-Net, and $f_c$ and $f_r$ are the highest-level semantic feature vectors generated by Branch C and Branch R through the main network. To generate segmentation maps $s_c$ and $s_r$ of the same size as the original image, continuous upsampling, skip connections, and convolution operations are applied. With the true binary segmentation labels $g_c$ and $g_r$, the AAGM's loss function can be calculated as in Eq. [3].
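As a rough illustration, the following is a minimal sketch of a UNet-style decoder head of the kind described; the channel sizes, names, and stage count are illustrative assumptions, and the actual AAGM structure is given in Figure 6:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AAGMDecoder(nn.Module):
    """Upsample the highest-level branch feature, merge shallower skip
    features, and predict a 1-channel vessel segmentation map."""
    def __init__(self, in_ch: int, skip_chs=(64, 32)):
        super().__init__()
        self.stages = nn.ModuleList()
        ch = in_ch
        for s in skip_chs:  # skip channels ordered from deep to shallow
            self.stages.append(nn.Sequential(
                nn.Conv2d(ch + s, s, 3, padding=1),
                nn.BatchNorm2d(s),
                nn.ReLU(inplace=True)))
            ch = s
        self.head = nn.Conv2d(ch, 1, 1)  # segmentation logits, BCE-supervised

    def forward(self, x, skips):
        # `skips` runs from deeper to shallower features; the final skip is
        # the initial-convolution feature at (near) full resolution.
        for stage, skip in zip(self.stages, skips):
            x = F.interpolate(x, size=skip.shape[-2:], mode="bilinear",
                              align_corners=False)
            x = stage(torch.cat([x, skip], dim=1))  # skip connection + conv
        return self.head(x)
```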
This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Experimental setup
LVO dataset
The First Affiliated Hospital of Chongqing Medical University provided the CTA and NCCT images used as the source data for training and testing the proposed deep learning model. The primary objective of our study was to identify and categorize instances of intracranial LVO, with a specific focus on occlusions of the bilateral internal carotid arteries (ICAs) and the first segment of the bilateral middle cerebral artery (MCA-M1). To maintain the targeted approach of our work and ensure the precision of our results, we established strict inclusion criteria based on the location of the vascular occlusion, excluding patients with occlusions outside these predefined arterial segments. Ultimately, 435 patients were included after the exclusion of those with poor image quality and those whose occlusion location did not match the research scope. Of these, 276 were healthy controls, while the remaining 159 were patients (64 with right MCA occlusion, 28 with right ICA occlusion, 47 with left MCA occlusion, and 20 with left ICA occlusion). All CTA scans were performed with a SOMATOM Definition Flash CT scanner (Siemens Healthineers, Erlangen, Germany). The typical acquisition parameters were as follows: voltage, 120 kVp; exposure, 250 mAs; slice thickness, 0.75 mm; and pitch, 0.8. Contrast-enhanced images were obtained with a 70- to 100-mL bolus of iodinated contrast medium at a rate of 4–5 mL/s. This study employed a stratified sampling approach for data partitioning to ensure experimental rigor. Initially, we used stratified sampling to randomly select 20% of the samples from each category in the original dataset to form an independent test set. Following this extraction, data augmentation techniques were applied offline to increase the data volume of the categories with fewer samples. Subsequently, the remaining data were divided into training and validation sets in an 8:2 ratio (a sketch of this partitioning is given below). Figure 7 presents the final dataset generation process used in this study. To acquire the raw data, we first used hospital workstation equipment to remove interference from superfluous information, such as that related to the skull. We then obtained images of the cerebral vessels via maximum intensity projection. Finally, the case-level categorization labels and pixel-level responsible-vessel annotations were completed by three radiologists with extensive clinical expertise, each with over 5 years of experience in neurovascular imaging. Annotations were aggregated through a consensus approach, in which initial independent annotations were followed by blinded reviews and group discussions to resolve any discrepancies, ensuring the reliability and accuracy of the final labels.
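A minimal sketch of this partitioning, assuming scikit-learn, is shown below; the function name, inputs, and random seed are illustrative:

```python
from sklearn.model_selection import train_test_split

def split_dataset(paths, labels, seed=0):
    """Stratified 20% test hold-out, then an 8:2 train/validation split.
    Offline augmentation of minority classes would be applied between the
    two splits (seed and function name are illustrative)."""
    tv_p, test_p, tv_y, test_y = train_test_split(
        paths, labels, test_size=0.20, stratify=labels, random_state=seed)
    # ... offline augmentation of minority classes on (tv_p, tv_y) here ...
    tr_p, val_p, tr_y, val_y = train_test_split(
        tv_p, tv_y, test_size=0.20, stratify=tv_y, random_state=seed)
    return (tr_p, tr_y), (val_p, val_y), (test_p, test_y)
```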
Implementation details
The experiments were conducted on a GeForce RTX 2070 SUPER GPU (Nvidia, Santa Clara, CA, USA) with 8 GB of memory and were built on the PyTorch framework. The batch size was 16, and the input image size was 256×256. Adaptive moment estimation (Adam) was chosen as the network optimizer, with a momentum term of 0.9 and an initial learning rate of 0.0001. With a warm-up strategy, the learning rate increased over the first 10 epochs and then decayed by a factor of 0.1 at the 80th and 120th training epochs. No pretrained model was used. Model weights were initialized with the Kaiming method. To ensure effective training with limited data, we applied image augmentation (rotation, flipping, and scaling) and dropout and used a compact HR-Net architecture. The total number of training epochs was 150.
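A minimal sketch of the optimizer and learning-rate schedule as we read them is shown below; the exact warm-up shape is an assumption, and `model` is a stand-in:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 5)  # stand-in for MBF-Net; illustrative only

# Adam with beta1 = 0.9 (the "momentum" above) and an initial lr of 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))

def lr_lambda(epoch: int) -> float:
    """Linear warm-up over the first 10 epochs, then step decay by 0.1
    at epochs 80 and 120 (150 epochs total)."""
    if epoch < 10:
        return (epoch + 1) / 10.0
    if epoch < 80:
        return 1.0
    if epoch < 120:
        return 0.1
    return 0.01

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```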
During the training phase, the three branches processed images sampled with different strategies to enhance learning diversity and mitigate class imbalance. The AAGM and AFM were also active: the AAGM contributes to attention learning via auxiliary segmentation, while the AFM adaptively fuses feature representations. During the inference phase, only Branch B was used to make the final prediction. The AFM remained active to integrate features from the other two branches; in contrast, the AAGM and its segmentation supervision were disabled, as they are designed only to guide feature learning during training. This setup ensured efficient inference while preserving the performance benefits learned during training. Sensitivity and precision were used to assess how well the model performed in each class, while accuracy and macro F1-score were used to assess the model's overall effectiveness. These metrics are defined as follows:
$$\text{Sensitivity} = \frac{TP}{TP + FN}, \quad \text{Precision} = \frac{TP}{TP + FP}, \quad \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \text{F1-MA} = \frac{1}{N} \sum_{i=1}^{N} \frac{2 \times \text{Precision}_i \times \text{Sensitivity}_i}{\text{Precision}_i + \text{Sensitivity}_i} \qquad [13]$$

In Eq. [13], TP is the number of true positives, FP is the number of false positives, FN is the number of false negatives, TN is the number of true negatives, and N is the number of classes.
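These metrics can be computed one-vs-rest per class and macro-averaged; a minimal sketch of our reading of Eq. [13] follows:

```python
import numpy as np

def macro_metrics(y_true: np.ndarray, y_pred: np.ndarray, n_classes: int):
    """Per-class sensitivity/precision/F1 (one-vs-rest), macro-averaged."""
    sens, prec, f1 = [], [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        s = tp / (tp + fn) if tp + fn else 0.0   # sensitivity (recall)
        p = tp / (tp + fp) if tp + fp else 0.0   # precision
        sens.append(s)
        prec.append(p)
        f1.append(2 * p * s / (p + s) if p + s else 0.0)
    acc = float(np.mean(y_true == y_pred))
    return np.mean(sens), np.mean(prec), acc, np.mean(f1)
```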
Results
Comparison to related work
First, we conducted a selection process for an appropriate backbone network. Given the small sample size of our medical dataset, we selected conventional classification backbones (28,33-38) with fewer parameters for the initial trials; the results were obtained under identical conditions, with one branch and no data augmentation of the dataset. Table 1 shows that the accuracy of most models was less than 60% and that their performance was poor. Considering that healthy controls accounted for more than half of the samples in the dataset, we concluded that in the absence of any adjustment strategy, the model's decision-making is heavily influenced by the head data (healthy controls), which hold an absolute advantage in quantity. This causes a significant shift in learning focus and impairs the learning of tail-class features. Second, although HR-Net achieved the second-best accuracy of 61.76%, 0.98 percentage points lower than the 62.74% of the 18-layer multiscale residual network (Res2Net-18), HR-Net has less than half the parameters of Res2Net-18. Given these facts, we chose HR-Net as the basic backbone for the subsequent experiments.
Table 1
| Network | Param (M) | ACC (%) | F1-score (%) |
|---|---|---|---|
| ResNet-18 (33) | 13.96 | 58.82 | 30.17 |
| Res2Net-18 (34) | 13.84 | 62.74 | 44.04 |
| ResNext-18 (35) | 15.44 | 53.92 | 25.22 |
| MobileNetV3 (36) | 6.04 | 54.90 | 29.55 |
| ShuffleNetV2 (37) | 6.36 | 55.88 | 23.53 |
| HR-Net (28) | 6.39 | 61.76 | 42.30 |
| DenseNet-121 (38) | 6.96 | 60.78 | 39.74 |
ACC, accuracy; HR-Net, high-resolution network; M, million; Param, number of network parameters.
We compared different algorithms for handling the problem of significantly imbalanced class sample numbers to illustrate the superiority of the network proposed in this paper. Table 2 displays the specific experimental outcomes (* indicates the best result). Based on the sensitivity index results, our proposed approach performed best in identifying right MCA occlusion, reaching 94.44%, but generated suboptimal outcomes in identifying nearly all other vascular occlusions. Overall, the performance of our MBF-Net network is satisfactory. However, we discovered that regardless of the approach utilized, the ability to recognize right ICA occlusion was poor. According to preliminary analysis, this may be related to the image quality of this class of samples. Specifically, the low sensitivity observed for the right ICA may be attributed to both the limited number of training cases and image quality inconsistencies, such as vascular ambiguity or motion artifacts, that complicate feature extraction. Regarding the left ICA class, only four samples were included in the test set due to the small original dataset size, limiting the statistical representativeness of the evaluation. We acknowledge this limitation and will expand the dataset in future work to provide more balanced and reliable validation.
Table 2
| Methods | Sen. R-MCA | Sen. R-ICA | Sen. L-MCA | Sen. L-ICA | Sen. normal | Sen. avg | Pre. R-MCA | Pre. R-ICA | Pre. L-MCA | Pre. L-ICA | Pre. normal | Pre. avg | ACC (%) | F1-MA (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CE | 55.56 | 33.33 | 66.67 | 20.00 | 91.80 | 53.47 | 62.50 | 50.00 | 58.82 | 66.67 | 78.87 | 63.37 | 71.68 | 55.39 |
| CDB (21) | 77.78 | 55.56 | 80.00 | 50.00 | 96.72* | 72.01 | 82.35 | 71.43 | 70.59 | 83.33 | 89.39 | 79.42 | 84.07 | 74.58 |
| LDAM-DRW (25) | 77.78 | 66.67* | 86.67* | 60.00 | 95.08 | 77.24 | 93.33* | 75.00 | 68.42 | 85.71 | 90.63* | 82.62 | 85.84 | 79.06 |
| CB-Focal (19) | 77.78 | 44.44 | 86.67* | 80.00 | 88.53 | 75.48 | 63.64 | 80.00 | 81.25 | 100.00* | 87.10 | 82.40 | 82.30 | 77.54 |
| BBN (23) | 66.67 | 66.67* | 80.00 | 80.00 | 95.08 | 77.68 | 85.71 | 66.67 | 75.00 | 100.00* | 87.88 | 83.05 | 84.96 | 79.86 |
| MTA-Net (39) | 77.78 | 60.00 | 86.67* | 86.67 | 77.78 | 77.78 | 78.87 | 70.59 | 83.33 | 100.00* | 80.00 | 82.56 | 84.45 | 83.11 |
| MBH-Net (40) | 66.67 | 44.44 | 80.00 | 77.78 | 80.00 | 77.33 | 66.67 | 90.00 | 75.00 | 100.00* | 85.71 | 83.47 | 86.25 | 82.42 |
| LVO-Net (41) | 66.67 | 66.67 | 80.00 | 100.00* | 86.67 | 80.00 | 90.00 | 90.00 | 100.00* | 78.95 | 87.88 | 88.93 | 86.93 | 81.86 |
| MBF-Net (w/o AAGM) | 83.33 | 44.44 | 80.00 | 90.00 | 95.08 | 78.57 | 68.18 | 100.00* | 92.31 | 90.00 | 90.63 | 88.22 | 86.73 | 81.01 |
| MBF-Net (with AAGM) | 94.44* | 55.56 | 80.00 | 80.00 | 93.44 | 80.69* | 80.95 | 100.00* | 75.00 | 100.00* | 90.48 | 89.29* | 87.61* | 83.37* |

*, the best value. All sensitivity (Sen.) and precision (Pre.) values are percentages; sensitivity denotes recall, and "normal" denotes cases without occlusion. AAGM, auxiliary attention guidance module; ACC, accuracy; avg, average; BBN, bilateral-branch network; CB-Focal, class-balanced focal loss; CDB, class-wise difficulty-balanced loss; CE, cross-entropy loss; F1-MA, macro F1-score; L-ICA, left internal carotid artery; L-MCA, left middle cerebral artery; LDAM-DRW, label-distribution-aware margin loss with deferred re-weighting; LVO, large-vessel occlusion; LVO-Net, large-vessel occlusion network; MBF-Net, multibranch fusion network; MBH-Net, multibranch hybrid network; MTA-Net, multitask attention network; R-ICA, right internal carotid artery; R-MCA, right middle cerebral artery; w/o, without.
For the precision index, our proposed MBF-Net model achieved a perfect score of 100% when diagnosing bilateral internal carotid artery occlusions, while its precision for other vascular occlusion types was lower. This indicates that MBF-Net is particularly reliable for ICA occlusions relative to other occlusion types. Finally, the overall accuracy and F1-score indicate that the proposed MBF-Net model is approximately 1–2 percentage points superior to the other optimal methods, providing a substantial advantage on LVO datasets with severely imbalanced class sample numbers.
Comparison to fusion methods
We compared the AFM module with other fusion methods to further determine its effectiveness. Table 3 presents the results of the experiment. The AFM space dimension (AFM-S) and AFM channel dimension (AFM-C) variants are obtained by splitting the AFM module. The 0.5+0.5 fusion approach directly combines the features produced by Branch C and Branch R with a weight of 0.5 for each branch. The bilateral-branch network (BBN) strategy uses a hyperparameter to dynamically fuse multibranch features, following the cumulative learning method of the BBN work. Similar to AFM, the attention-feature fusion module (AFFM) approach also performs adaptive fusion at the spatial and channel levels. The difference between the two is that our AFM employs a parallel form, whereas AFFM organizes the spatial and channel weights serially (i.e., it first completes channel weighting and then conducts spatial position weighting). From the data in Table 3, we can infer that the 0.5+0.5 approach, which uses direct addition without design, is unable to fully exploit significant semantic information, leading to poor classification accuracy. The combined effect of channel weighting and spatial position weighting is superior to either one used separately. Although both AFFM and AFM fuse spatial and channel weights, AFM outperforms AFFM, suggesting that a parallel organizational form is more appropriate for this task. Finally, the BBN strategy fixes the weights of each branch during testing even though it fuses dynamically during training, making inference inconsistent with the training strategy; this is perhaps why its overall impact is less than that of the AFM approach.
Table 3
| Fusion method | Avg-sen (%) | Avg-pre (%) | ACC (%) | F1-MA (%) |
|---|---|---|---|---|
| AFM-S | 74.60 | 81.17 | 80.53 | 76.18 |
| AFM-C | 69.36 | 78.64 | 81.42 | 71.87 |
| AFFM | 78.26 | 84.12 | 84.07 | 80.21 |
| 0.5+0.5 | 73.26 | 78.77 | 80.53 | 75.20 |
| BBN strategy | 69.56 | 85.92 | 83.19 | 75.56 |
| AFM | 80.69 | 89.29 | 87.61 | 83.37 |
0.5+0.5 denotes weighted fusion. ACC, accuracy; AFFM, attention-feature fusion module; AFM, adaptive fusion module; AFM-C, AFM channel dimension; AFM-S, AFM space dimension; Avg-pre, average precision; Avg-sen, average sensitivity; BBN, bilateral-branch network; F1-MA, macro F1-score.
Hyperparameter analysis
We conducted comparative experiments with various values of the pertinent parameters to test the reliability of the hyperparameters selected for our model. We set experimental breakpoints at increments of 0.1 over the range 0–1 for β, the weight of the segmentation loss in the loss function. The experimental findings are presented in Table 4, and Figure 8A plots the data from the table. We found that as the value of β rises, the model performance steadily improves, reaching its peak at β=0.5, after which it gradually degrades. At high weights, the outcomes are even worse than those obtained without the AAGM. This shows that modest attention guidance can enhance the network model's classification performance, whereas overly large guidance weights can distort the model's learning trajectory. For real-world settings, a value between 0.3 and 0.5 is recommended for β, as its effect is relatively stable across different class distributions.
Table 4
| β value | ACC (%) | F1-MA (%) |
|---|---|---|
| 0.1 | 80.53 | 72.83 |
| 0.2 | 82.30 | 73.96 |
| 0.3 | 82.30 | 74.09 |
| 0.4 | 83.19 | 78.20 |
| 0.5 | 87.61 | 83.37 |
| 0.6 | 86.72 | 82.75 |
| 0.7 | 81.42 | 73.33 |
| 0.8 | 82.30 | 72.62 |
| 0.9 | 81.41 | 72.88 |
| 1.0 | 78.76 | 65.17 |
ACC, accuracy; F1-MA, macro F1-score.
The effect of the type and quantity of blocks in the BHDA module on the experimental outcomes was also examined, with the results summarized in Table 5 and plotted in Figure 8B. Results acquired with the BHDA module convolution are denoted ACC-B and F1-B, and those obtained with HR-Net's original convolution are denoted ACC-A and F1-A; the numbers 2, 4, 6, and 8 after the letters A and B indicate the number of basic convolutions used in each block type. The results demonstrate that regardless of the block type, as the number of block layers rises, the model's performance initially rises and eventually falls. According to our analysis, poor performance may be caused by overfitting when the number of blocks is large and by insufficient representation ability when the number of blocks is small. The results also show that, for a given number of blocks, the experimental outcomes produced by the BHDA module convolution are superior to those produced by the original convolution of HR-Net. This is due to the BHDA module's added aggregation nodes, which allow the extraction of richer semantic information while indirectly raising the model's parameter count. When there are many blocks, this rise in parameter count cannot be ignored; however, at comparable parameter counts, the proposed BHDA module still performs best (B4 vs. A4, A6, and A8).
Table 5
| Type | ACC (%) | F1-MA (%) | Param (M) |
|---|---|---|---|
| A2/B2 | 77.88/82.30 | 69.58/73.96 | 4.99/5.45 |
| A4/B4 | 83.19/87.61 | 75.56/83.37 | 5.94/7.10 |
| A6/B6 | 85.84/84.07 | 80.51/79.79 | 6.90/10.36 |
| A8/B8 | 79.65/79.65 | 71.45/74.20 | 7.85/16.86 |
ACC, accuracy; BHDA, branch hierarchical deep aggregation; F1-MA, macro F1-score; M, million; Param, number of parameters.
Ablation study
As shown in Table 6, we conducted ablation experiments to assess the contribution of each network module to the outcome. From the first section of Table 6, it can be observed that the model's prediction performance improved with the addition of branches with different sampling strategies, raising the accuracy from the initial 78.76% to 83.19%. This is because merging information from different learning branches produces richer feature information, which enhances the final classification. The second section of Table 6 further illustrates the efficacy of our proposed modules: the model's performance decreased if any one of them was removed. Additionally, we compared the BHDA, SFE, and AAGM modules with other modules of similar function, namely DLA (29), the feature pyramid transformer (FPT) (30), and a U-shaped CNN (UNet) (32). Our modules showed a certain degree of improvement over these alternatives, further demonstrating the rationality and effectiveness of our designs. The BHDA and AFM modules accounted for the bulk of the performance enhancement. We hypothesize that this outcome is due to BHDA's aggregation nodes, which retain higher-quality spatial feature information, and AFM's capacity to selectively eliminate redundant information and prioritize critical information. Additionally, we found that although the AAGM's primary goal was to guide the model's attention, it ultimately also enhanced the model's capacity for learning expression, raising the outcome by 0.88 percentage points.
Table 6
| Bran. C | Bran. R | Bran. B | BHDA | DLA | SFE | FPT | AFM | AAGM | UNet | ACC (%) | F1-MA (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| √ | √ | √ | √ | 78.76 | 66.90 | ||||||
| √ | √ | √ | √ | √ | 82.30 | 73.75 | |||||
| √ | √ | √ | √ | √ | √ | 83.19 | 75.56 | ||||
| √ | √ | √ | √ | √ | 81.42 | 71.87 | |||||
| √ | √ | √ | √ | √ | √ | 84.96 | 79.03 | ||||
| √ | √ | √ | √ | √ | √ | 83.19 | 75.56 | ||||
| √ | √ | √ | √ | √ | √ | 86.73 | 81.01 | ||||
| √ | √ | √ | √ | √ | √ | √ | 87.18 | 80.06 | |||
| √ | √ | √ | √ | √ | √ | √ | 87.02 | 81.19 | |||
| √ | √ | √ | √ | √ | √ | √ | 86.95 | 82.13 | |||
| √ | √ | √ | √ | √ | √ | √ | 87.61 | 83.37 |
AAGM, auxiliary attention guidance module; ACC, accuracy; AFM, adaptive fusion module; BHDA, branch hierarchical deep aggregation; Bran. B, balanced branch; Bran. C, conventional learning branch; Bran. R, reverse sampling branch; DLA, deep layer aggregation; F1-MA, macro F1-score; FPT, feature pyramid transformer; SFE, semantic feature enhancement; UNet, U-shaped convolutional neural network for segmentation.
Although deep learning in medical image recognition has produced good experimental results, it is challenging for radiologists and patients to trust the decision-making outcomes due to its opaque internal workings and decision-making processes. Therefore, it is essential to help clinicians comprehend the foundation of the model’s decision-making when implementing deep learning technology in the field of medical imaging.
The AAGM generates attention maps, representing the image areas of focus. These maps indicate the regions the model concentrates on during the classification of intracranial LVOs. These attention maps can be used by clinicians to understand and validate the model’s reasoning. For instance, in cases in which the model identifies an occlusion, the attention map can highlight the specific vascular regions contributing to this decision, allowing clinicians to cross-check with their own assessments.
In this study, we further created heat maps of model predictions using heat map visualization technology (42) to aid clinicians in understanding the rationale underlying the deep models' conclusions. This technique is a form of visual explanation that gathers components helpful for categorization through the attention mechanism and compiles them into an interpretable support set, producing a high-weight region at the image level and enhancing model transparency. Examples of the decision-making evidence used by our proposed model and the various comparison approaches are shown in Figure 9. The first row of the figure depicts the original image with the occluded responsible vessel area outlined in red, which we hope the model will learn to use as a foundation for decision-making. Each column shows the visualizations of all approaches for the same class, with areas in deeper shades of red denoting more persuasive evidence. The figure shows that all algorithms had roughly four vessel areas as their evidence centers in the heat map column for healthy controls; class-balanced focal loss (CB-Focal) was the most effective, followed by MBF-Net, although the latter covered a smaller total range. However, MBF-Net (with AAGM) outperformed the other methods on the occluded-vessel evidence areas. Although some algorithms' focus areas did include the target areas (the regions outlined in red in the original image), their focus centers (the regions with the reddest shading) were noticeably off. If the focus areas underlying occlusion predictions are inadequate, clinicians will find it difficult to trust that the deep models truly base their predictions on critical-area information. When different vessels are blocked, MBF-Net (with AAGM) offers greater discriminative information: the position of its focus center shifts within a suitable range depending on the occluded arteries. This demonstrates the effectiveness of our proposed model.
Discussion
In this paper, we propose MBF-Net, designed for the classification of intracranial LVOs. It is important to emphasize that our primary goal is not to replace clinical decision-making with AI but rather to develop a clinically oriented intelligent assistance system that provides a timely and reliable “second opinion” in complex or high-workload scenarios, thereby improving diagnostic efficiency and reducing the risk of misdiagnosis. To begin with, the network takes advantage of the traits of multiple branches with different learning goals in order to alleviate the difficulty of model training brought on by the long-tail effect in the real collected dataset. The network then uses the BHDA and SFE modules to extract more robust semantic information and thereby address the classification challenge caused by differences in patient physiology. Additionally, to incorporate prior knowledge of radiologists' observational tendencies into the network's training process, we developed the AAGM. Finally, test results on the LVO dataset demonstrate that the accuracy of LVO classification can be greatly increased with our proposed network model.
However, we must also be aware of the following limitations and potential solutions for this work:
- To ensure the generalizability of the model, it is crucial to validate its performance using data from multiple medical facilities. Since this study relied on a single data source, it is necessary to include sample data from additional medical facilities in future research for training and validation purposes. This expansion in data sources will enhance the model’s ability to perform effectively across diverse clinical settings, improving its applicability and reliability.
- Radiologists must exert considerable effort to annotate the segmentation labels needed by the AAGM. However, high segmentation accuracy is not critical when this module is actually employed, as the segmentation labels serve only to provide attention guidance. Therefore, coarse labels produced by high-precision segmentation networks could be used in place of manually annotated fine labels.
- Our work focused on situations in which only one vessel is occluded, owing to the nature of the data actually collected. However, in real-world conditions, multiple vessels can be occluded simultaneously, which can be explored in future studies.
- Implementing MBF-Net in clinical settings presents challenges, such as integration with existing picture archiving and communication systems and ensuring data compatibility. To address these, adopting standardized data formats and using middleware can facilitate smoother integration. Extensive validation in diverse clinical environments will also be crucial to ensure reliability and accuracy. These steps are necessary to effectively incorporate MBF-Net into practical clinical use.
- In recent years, therapeutic strategies for ischemic stroke have expanded to include interventions targeting M2- and even M3-segment occlusions, extending beyond the traditional scope of LVO. Consequently, the clinical decision-making process is shifting from a binary LVO-based judgment toward more individualized criteria, such as collateral circulation quality and the extent of the ischemic penumbra—factors that may even outweigh strict adherence to the therapeutic time window. This shift in clinical paradigm offers new opportunities and value for the application of AI in stroke management. If AI systems can achieve accurate and rapid identification of collateral status or penumbral regions, they may serve as powerful tools for facilitating individualized stroke treatment. In future work, we plan to determine whether our proposed model—or its extended versions—can integrate perfusion imaging and other data types to intelligently assess these complex indicators, thereby promoting the deeper integration of AI into precision stroke management.
Conclusions
Our proposed network model demonstrated excellent performance on LVO datasets with severely imbalanced class sample numbers, indicating its potential for clinical application. Through the application of automated sample labeling and multicenter sample data, we intend to further enhance model performance in the future. Ultimately, MBF-Net could serve as a reliable auxiliary tool for radiologists, improving diagnostic efficiency and reducing the risk of misdiagnosis in clinical practice.
Acknowledgments
Part of the content of this work was previously presented in a conference and published as a preliminary version in the 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) (doi: 10.1109/BIBM55620.2022.9995268). The current version expands upon that preliminary work in terms of methodology, experiments, and clinical relevance.
Footnote
Reporting Checklist: The authors have completed the CLEAR reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-2255/rc
Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-24-2255/dss
Funding: This study was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2255/coif). X.L. received support from the Scientific Research Foundation of Chongqing University of Technology (Nos. 0121230235, 0103210650, and 2023ZDZ023), the Technology Campus Teaching Reform Project of Chongqing University of Technology (No. 0121249254), and the Youth Project of the Science and Technology Research Program of Chongqing Education Commission of China (Nos. KJQN202301145 and KJQN202301162). The other authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Wang W, Jiang B, Sun H, Ru X, Sun D, Wang L, Wang L, Jiang Y, Li Y, Wang Y, Chen Z, Wu S, Zhang Y, Wang D, Wang Y, Feigin VL. NESS-China Investigators. Prevalence, Incidence, and Mortality of Stroke in China: Results from a Nationwide Population-Based Survey of 480 687 Adults. Circulation 2017;135:759-71. [Crossref] [PubMed]
- Malhotra K, Gornbein J, Saver JL. Ischemic Strokes Due to Large-Vessel Occlusions Contribute Disproportionately to Stroke-Related Dependence and Death: A Review. Front Neurol 2017;8:651. [Crossref] [PubMed]
- Hsieh MJ, Chen YJ, Tang SC, Chen JH, Lin LC, Seak CJ, Lee JT, Chang KC, Lien LM, Chan L, Liu CH, Hsieh CY, Chern CM, Chen JC, Chiu TF, Hung SC, Ng CJ, Jeng JS. 2020 Guideline for Prehospital Management, Emergency Evaluation and Treatment of Patients With Acute Ischemic Stroke: A Guideline for Healthcare Professionals from the Taiwan Society of Emergency Medicine and Taiwan Stroke Society. J Acute Med 2021;11:12-7. [Crossref] [PubMed]
- Albers GW, Marks MP, Kemp S, Christensen S, Tsai JP, Ortega-Gutierrez S, et al. Thrombectomy for Stroke at 6 to 16 Hours with Selection by Perfusion Imaging. N Engl J Med 2018;378:708-18. [Crossref] [PubMed]
- Berkhemer OA, Fransen PS, Beumer D, van den Berg LA, Lingsma HF, Yoo AJ, et al. A randomized trial of intraarterial treatment for acute ischemic stroke. N Engl J Med 2015;372:11-20. [Crossref] [PubMed]
- Jansen IGH, Mulder MJHL, Goldhoorn RB. Endovascular treatment for acute ischaemic stroke in routine clinical practice: prospective, observational cohort study (MR CLEAN Registry). BMJ 2018;360:k949. [Crossref] [PubMed]
- Tu WJ, Zhao Z, Yin P, Cao L, Zeng J, Chen H, et al. Estimated Burden of Stroke in China in 2020. JAMA Netw Open 2023;6:e231455. [Crossref] [PubMed]
- Brott T, Adams HP Jr, Olinger CP, Marler JR, Barsan WG, Biller J, Spilker J, Holleran R, Eberle R, Hertzberg V. Measurements of acute cerebral infarction: a clinical examination scale. Stroke 1989;20:864-70. [Crossref] [PubMed]
- Pérez de la Ossa N, Carrera D, Gorchs M, Querol M, Millán M, Gomis M, Dorado L, López-Cancio E, Hernández-Pérez M, Chicharro V, Escalada X, Jiménez X, Dávalos A. Design and validation of a prehospital stroke scale to predict large arterial occlusion: the rapid arterial occlusion evaluation scale. Stroke 2014;45:87-91. [Crossref] [PubMed]
- Hastrup S, Damgaard D, Johnsen SP, Andersen G. Prehospital Acute Stroke Severity Scale to Predict Large Artery Occlusion: Design and Comparison With Other Scales. Stroke 2016;47:1772-6. [Crossref] [PubMed]
- Stib MT, Vasquez J, Dong MP, Kim YH, Subzwari SS, Triedman HJ, Wang A, Wang HC, Yao AD, Jayaraman M, Boxerman JL, Eickhoff C, Cetintemel U, Baird GL, McTaggart RA. Detecting Large Vessel Occlusion at Multiphase CT Angiography by Using a Deep Convolutional Neural Network. Radiology 2020;297:640-9. [Crossref] [PubMed]
- Nishi H, Oishi N, Ishii A, Ono I, Ogura T, Sunohara T, Chihara H, Fukumitsu R, Okawa M, Yamana N, Imamura H, Sadamasa N, Hatano T, Nakahara I, Sakai N, Miyamoto S. Deep Learning-Derived High-Level Neuroimaging Features Predict Clinical Outcomes for Large Vessel Occlusion. Stroke 2020;51:1484-92. [Crossref] [PubMed]
- You J, Tsang ACO, Yu PLH, Tsui ELH, Woo PPS, Lui CSM, Leung GKK. Automated Hierarchy Evaluation System of Large Vessel Occlusion in Acute Ischemia Stroke. Front Neuroinform 2020;14:13. [Crossref] [PubMed]
- Remedios LW, Lingam S, Remedios SW, Gao R, Clark SW, Davis LT, Landman BA. Comparison of convolutional neural networks for detecting large vessel occlusion on computed tomography angiography. Med Phys 2021;48:6060-8. [Crossref] [PubMed]
- Barman A, Inam ME, Lee S, Savitz S, Sheth S, Giancardo L. Determining ischemic stroke from CT-angiography imaging using symmetry-sensitive convolutional networks. In: 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019). IEEE; 2019:1873-7.
- Tolhuisen ML, Ponomareva E, Boers AMM, Jansen IGH, Koopman MS, Sales Barros R, Berkhemer OA, van Zwam WH, van der Lugt A, Majoie CBLM, Marquering HA. A convolutional neural network for anterior intra-arterial thrombus detection and segmentation on non-contrast computed tomography of patients with acute ischemic stroke. Applied Sciences 2020;10:4861.
- Buda M, Maki A, Mazurowski MA. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw 2018;106:249-59. [Crossref] [PubMed]
- Sarafianos N, Xu X, Kakadiaris IA. Deep imbalanced attribute classification using visual attention aggregation. In: Proceedings of the European Conference on Computer Vision (ECCV). 2018:680-97.
- Cui Y, Jia M, Lin TY, Song Y, Belongie S. Class-balanced loss based on effective number of samples. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019:9268-77.
- Ren M, Zeng W, Yang B, Urtasun R. Learning to reweight examples for robust deep learning. In: International conference on machine learning. PMLR; 2018:4334-43.
- Sinha S, Ohashi H, Nakamura K. Class-wise difficulty-balanced loss for solving class-imbalance. In: Proceedings of the Asian Conference on Computer Vision. Cham: Springer International Publishing; 2020.
- Yang Y, Chen S, Tan D, Yao R, Zhu S, Jia Y, Yang W, Shen Y. Fusion Branch Network with Class Learning Difficulty Loss Function for Recognition of Haematoma Expansion Signs in Intracerebral Haemorrhage. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2021:3448-55.
- Zhou B, Cui Q, Wei XS, Chen ZM. BBN: Bilateral-branch network with cumulative learning for long-tailed visual recognition. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2020:9719-28.
- Wang P, Han K, Wei XS, Zhang L, Wang L. Contrastive learning based hybrid networks for long-tailed image classification. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021:943-52.
- Cao K, Wei C, Gaidon A, Arechiga N, Ma T. Learning imbalanced datasets with label-distribution-aware margin loss. In: Advances in Neural Information Processing Systems 32 (NeurIPS 2019). 2019.
- Cui Y, Song Y, Sun C, Howard A, Belongie S. Large scale fine-grained categorization and domain-specific transfer learning. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018:4109-18.
- Li Y, Wang Y, Lin G, Lin Y, Wei D, Zhang Q, Ma K, Lu G, Zhang Z, Zheng Y. Triplet-branch network with prior-knowledge embedding for fatigue fracture grading. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer International Publishing; 2021:449-58.
- Wang J, Sun K, Cheng T, Jiang B, Deng C, Zhao Y, Liu D, Mu Y, Tan M, Wang X, Liu W, Xiao B. Deep High-Resolution Representation Learning for Visual Recognition. IEEE Trans Pattern Anal Mach Intell 2021;43:3349-64. [Crossref] [PubMed]
- Yu F, Wang D, Shelhamer E, Darrell T. Deep layer aggregation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018:2403-12.
- Zhang D, Zhang H, Tang J, Wang M, Hua X, Sun Q. Feature pyramid transformer. In: European Conference on Computer Vision. Cham: Springer International Publishing; 2020:323-39.
- Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018:7132-41.
- Ronneberger O, Fischer P, Brox T. U-Net: Convolutional networks for biomedical image segmentation. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Cham: Springer International Publishing; 2015:234-41.
- He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016:770-8.
- Gao SH, Cheng MM, Zhao K, Zhang XY, Yang MH, Torr P. Res2Net: A New Multi-Scale Backbone Architecture. IEEE Trans Pattern Anal Mach Intell 2021;43:652-62. [Crossref] [PubMed]
- Xie S, Girshick R, Dollár P, Tu Z, He K. Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017:1492-500.
- Howard A, Sandler M, Chu G, Chen LC, Chen B, Tan M, Wang W, Zhu Y, Pang R, Vasudevan V, Le QV, Adam H. Searching for mobilenetv3. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2019:1314-24.
- Ma N, Zhang X, Zheng HT, Sun J. Shufflenet v2: Practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision (ECCV). 2018:116-31.
- Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely connected convolutional networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017:4700-8.
- Ling Y, Wang Y, Dai W, Yu J, Liang P, Kong D. MTANet: Multi-Task Attention Network for Automatic Medical Image Segmentation and Classification. IEEE Trans Med Imaging 2024;43:674-85. [Crossref] [PubMed]
- Yao R, Tan D, Yang Y, Li Y, Liu J, Wu J, Chen S, Wang J. MBH-Net: Multi-branch hybrid network with auxiliary attention guidance for large vessel occlusion detection. In: 2022 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2022:872-6.
- Ma Y, Chen S, Xiong H, Yao R, Zhang W, Yuan J, Duan H. LVONet: automatic classification model for large vessel occlusion based on the difference information between left and right hemispheres. Phys Med Biol 2024; [Crossref]
- Li L, Wang B, Verma M, Nakashima Y, Kawasaki R, Nagahara H. Scouter: Slot attention-based classifier for explainable image recognition. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. 2021:1046-55.
(English Language Editor: J. Gray)

