Attacking medical images with minimal noise: exploiting vulnerabilities in medical deep-learning systems
Introduction
Advancements in deep neural network (DNN) technology have led to the widespread application of DNNs in image recognition, including in the medical imaging domain. These DNN-based methods have surpassed traditional image-processing technology and have even achieved human-competitive accuracy (1). However, several studies have shown that introducing artificial distortions to images can lead to DNN misclassifications (2-4), which has prompted the introduction of effective algorithms for generating adversarial samples, or adversarial images (4-7).
Szegedy et al. first demonstrated the susceptibility of DNNs to well-crafted artificial disturbances, which can be generated with the use of several back-propagation gradient algorithms to derive gradient information (8). Goodfellow et al. proposed the fast gradient sign algorithm, which was designed to calculate effective perturbations based on the premise that the linearity and high dimensionality of inputs are primary factors in a network’s vulnerability to minor perturbations (9). Fawzi et al. introduced a greedy search method for identifying perturbations predicated on the linearity of DNN decision boundaries (10). Additionally, Papernot et al. developed the adversarial saliency map, employed the Jacobian matrix to craft the map, and demonstrated this method’s efficacy in generating fixed-length perturbations along each axis (11). Beyond perturbations, alternative methods for generating adversarial images have been explored to induce misclassification in DNNs, including the use of artificial images (12) and image rotation (13). Adversarial perturbations have been applied in the fields of natural language processing (14,15), speech recognition (16), and malware classification (17).
A prevalent method for creating these images involves adding a minimal amount of finely tuned perturbation, imperceptible to the human eye, to natural images, which can cause DNNs to misclassify the images into entirely different categories. However, many previously developed approaches do not account for the extreme scenarios in which the perturbations are exceedingly subtle (i.e., the number of pixels modified is quite small). Moreover, generating adversarial images under these highly constrained conditions may provide novel insights into the geometric features and overall behavior of DNN models in high-dimensional spaces (18). For instance, the properties of adversarial images near a decision boundary can elucidate the shape of the boundary (19).
Su et al. introduced a technique that relies solely on probabilistic label information for feedback and requires the perturbation of only a single pixel to execute a black-box DNN attack (20). This method achieved effective results on the Kaggle Canadian Institute For Advanced Research 10 (CIFAR-10) dataset by altering only a single pixel. The strength of this method is that it constitutes a semi-black-box attack approach that depends exclusively on feedback from the black box (the probability label) without requiring access to DNN internal information, such as gradient and network structure details. Moreover, it is simpler than previous approaches, as it bypasses the complexity of searching for perturbations via explicit objective functions, focusing instead directly on the probabilistic label values of the target class.
After reviewing the findings of Su et al. (20), we concluded that the perturbation of several pixels can cause an image to be misclassified, and we speculated that a classification system could be deceived by disturbing a few pixels of a medical image. However, the resolution of the natural images used in Su et al.’s research (20) was relatively low (32×32), while the resolution of medical images is significantly higher (the medical images used in our study have a resolution of 224×224). Consequently, it would be impractical to achieve the desired effect by disturbing only a single pixel in the creation of an adversarial medical image. Therefore, a method capable of generating a high-resolution adversarial image with the minimum number of pixel perturbations needs to be established, and we conducted this study to develop solutions to this challenge.
We ultimately devised an approach that offers several advantages over other existing methods:
(I) Our approach requires minimal computing resources, as it does not employ gradient calculation but instead relies on a combination of random search and the differential evolution (DE) algorithm to identify an approximately optimal solution to the optimization problem. (II) Local minima are avoided through the use of random search, which ensures that all solutions within the current solution space have an equal probability of being selected. (III) Our method provides relatively quick convergence, as it ensures continuity of convergence at each iteration step; this is because the optimal solution at a given iteration is a subset of the solution from the previous iteration. (IV) Finally, the proposed method is capable of successfully attacking high-resolution medical images with minimal pixel modifications.
Methods
Dataset
Our model was evaluated using the following three publicly available datasets: (I) the Kaggle diabetic retinopathy (dr) dataset (21); (II) the chest radiograph of emphysema (cxr) dataset described by Wang et al. (22); and (III) the melanocytic lesion (derm) dataset from the International Skin Imaging Collaboration website (23), which comprises images of melanocytic lesions classified as benign or malignant. The images in these datasets are labeled as either diseased or non-diseased, and they were cropped to 224×224 pixels. This study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).
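As an illustration of this preprocessing, the following is a minimal sketch assuming a hypothetical folder layout with one sub-directory per class; the resize-then-crop transform is an assumption used to obtain the 224×224 input size described above.

```python
# Minimal preprocessing sketch (assumption: images stored in class-named folders,
# e.g., data/derm/benign and data/derm/malignant; paths are hypothetical).
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

preprocess = transforms.Compose([
    transforms.Resize(256),       # shrink the shorter side first
    transforms.CenterCrop(224),   # crop to the 224x224 input size used in this study
    transforms.ToTensor(),        # convert to a CxHxW float tensor in [0, 1]
])

# ImageFolder assigns one integer label per sub-folder (here 0 = non-diseased, 1 = diseased).
dataset = datasets.ImageFolder("data/derm", transform=preprocess)
loader = DataLoader(dataset, batch_size=16, shuffle=False)
```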
Problem description
Generating adversarial images can be viewed as an optimization problem with constraints. An input image is represented as a matrix, each element of which corresponds to a pixel. Let $f$ denote the classifier that processes the input images, and let $\mathbf{x}$ represent the original image. The perturbation matrix $\mathbf{e}(\mathbf{x})$ introduces alterations of the same dimensions as $\mathbf{x}$. The perturbation matrix and the original image are superimposed to form the perturbed image $\mathbf{x}+\mathbf{e}(\mathbf{x})$. The norm $\|\mathbf{e}(\mathbf{x})\|_0$ denotes the allowable modification, and $d$ denotes the maximum allowable modification, a positive real number bounding the magnitude of the perturbation set and ensuring that $\mathbf{e}(\mathbf{x})$ satisfies $\|\mathbf{e}(\mathbf{x})\|_0 \le d$. In targeted attacks, the adversaries aim to identify the optimal perturbation $\mathbf{e}(\mathbf{x})^{*}$ that solves the following optimization problem (20):

$$\max_{\mathbf{e}(\mathbf{x})} f_{\mathrm{adv}}\big(\mathbf{x}+\mathbf{e}(\mathbf{x})\big) \quad \text{subject to} \quad \|\mathbf{e}(\mathbf{x})\|_{0}\le d \qquad [1]$$

where $f_{\mathrm{adv}}$ denotes the probability label of the target (adversarial) class.
Eq. [1] focuses on determining the following: (I) which pixels should be altered and (II) the extent of perturbation required for each pixel. In our method, the equation is slightly different and is expressed as follows:

$$\min_{\mathbf{e}(\mathbf{x})} \|\mathbf{e}(\mathbf{x})\|_{0} \quad \text{subject to} \quad f_{t}\big(\mathbf{x}+\mathbf{e}(\mathbf{x})\big) < f_{t}(\mathbf{x}), \quad \|\mathbf{e}(\mathbf{x})\|_{0}\le d \qquad [2]$$

where $f_{t}$ is the probability label of the original image's true class $t$. Eq. [2] outlines the model employed in this study, whose purpose is to identify a perturbation matrix containing the fewest possible nonzero elements while the added perturbation decreases the probability label of the original image's class. Here, $d$ represents a small positive integer. Unlike previously described methods that alter a substantial number of pixels, our approach requires only a minimal number of alterations.
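To make Eq. [2] concrete, the sketch below evaluates the two quantities it involves for a candidate perturbation: the number of perturbed pixels (the L0 term to be minimized) and the positive-class probability label of the perturbed image. The `model` interface returning class probabilities is an assumption used for illustration, not the exact implementation employed in this study.

```python
import numpy as np

def perturbation_cost(model, x, e, positive_index=1):
    """Evaluate a candidate perturbation e for an original image x (both HxWx3 arrays in [0, 1]).

    Returns the number of perturbed pixels (the L0 term minimized in Eq. [2]) and the
    positive-class probability of the perturbed image; the caller checks that this
    probability has dropped (below 0.5 for an outright misclassification).
    """
    num_perturbed = int(np.count_nonzero(np.any(e != 0, axis=-1)))   # pixels with any RGB change
    probabilities = model(np.clip(x + e, 0.0, 1.0))                  # assumed: returns class probabilities
    return num_perturbed, float(probabilities[positive_index])
```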
The study by Su et al. achieved attacks on classification systems by altering only 1, 3, or 5 pixels, albeit on low-resolution images (20). However, considering the typical high resolution of medical images, we sought to determine the minimum number of pixels that need to be modified to induce misclassification in a high-resolution image-based classification system.
DE
DE, a widely recognized optimization algorithm for addressing complex multimodal optimization challenges (24), is a critical component of our algorithm. DE is categorized within the broader family of evolutionary algorithms (EAs). Notably, DE incorporates mechanisms that maintain population diversity during the selection phase, thereby enhancing the likelihood of identifying superior solutions more efficiently than do gradient-based methods or other EAs in practical applications. DE generates a new set of candidate solutions (children) based on the given population (parents) in each iteration. The survival of the children is contingent upon their superior fitness compared to their parents, thus concurrently promoting diversity and fitness enhancement.
DE applies to a broader array of optimization problems than do gradient-based methods, including those that are non-differentiable, noisy, or dynamic. The use of DE to generate adversarial images offers several advantages (20). (I) DE offers an enhanced probability of identifying global optima. As a meta-heuristic, DE is less prone to local minima than is gradient descent or greedy algorithms, partly due to its diversity-preserving mechanisms and its use of multiple candidate solutions. Additionally, given the complexity of the strict pixel-modification constraint employed in this study, DE is particularly suitable. (II) DE requires minimal information from the target system. Unlike traditional optimization methods, such as gradient descent or quasi-Newton methods, DE does not require the problem to be differentiable. This aspect is critical for adversarial image generation, as some networks are non-differentiable (25), and gradient computation demands extensive system information, which may not always be available. (III) Finally, the proposed method offers simplicity, as its effectiveness is independent of the classifier used. For executing an attack, knowledge of the probability label is sufficient, which simplifies the overall process.
Decrease group differential evolution (DGDE)
The proposed method, referred to as DGDE, was designed to generate adversarial images via the alteration of only a few pixels to deceive the classifier. This approach is grounded on two assumptions:
- A random perturbation matrix $\mathbf{e}_0(\mathbf{x})$ of the same size as the input image $\mathbf{x}$ can always be found. This random perturbation matrix is added to the input image to generate an adversarial image and make the classifier output wrong. Here, $C(\mathbf{x})$ is the classification class of the input image, $C(\mathbf{x})=0$ means that the input image is negative, and $C(\mathbf{x})=1$ means that the input image is positive:

$$C\big(\mathbf{x}+\mathbf{e}_0(\mathbf{x})\big) \ne C(\mathbf{x})$$
- It is possible to identify a subset of the given solution, with fewer perturbed pixels, that still leads to misclassification by the DNN, as follows:

$$\exists\, \mathbf{e}'(\mathbf{x}) \subset \mathbf{e}(\mathbf{x}): \quad C\big(\mathbf{x}+\mathbf{e}'(\mathbf{x})\big) \ne C(\mathbf{x}), \quad \|\mathbf{e}'(\mathbf{x})\|_{0} < \|\mathbf{e}(\mathbf{x})\|_{0}$$
By introducing a random noise matrix $\mathbf{e}_0(\mathbf{x})$ to the input image $\mathbf{x}$, we simulate adding random perturbations to each dimension of the image point in input space, pushing the input image across the classification boundary. This initial perturbation serves as a foundational solution for misclassification and can be represented as follows:

$$\mathbf{e}_0(\mathbf{x}) = (e_1, e_2, \ldots, e_n), \qquad e_i = (x_i, y_i, r_i, g_i, b_i)$$

Here, $\mathbf{e}_0(\mathbf{x})$ denotes the perturbation matrix, and each element $e_i$ of this perturbation vector represents the information of one pixel in the perturbation matrix; $x_i$ and $y_i$ specify the pixel location; and $r_i$, $g_i$, and $b_i$ correspond to the pixel’s red, green, and blue values, respectively.

The matrix $\mathbf{e}_g(\mathbf{x})$ is the perturbation generated in the $g$-th iteration. We randomly select elements from $\mathbf{e}_g(\mathbf{x})$ to create a subperturbation $\mathbf{e}'_g(\mathbf{x})$ and set the elements that are not chosen to zero. Next, the probability label of the prediction for $\mathbf{x}+\mathbf{e}'_g(\mathbf{x})$ is kept close to the probability label of the prediction for $\mathbf{x}+\mathbf{e}_g(\mathbf{x})$. According to the literature (20), very few pixels need to be changed to cause the input image to be misclassified; that is, although it is possible to attack an image by changing all of its pixels, a subset of the changed pixels suffices to turn the image into an adversarial image. In iteration $g+1$, the algorithm can therefore remove some disturbed pixels that did not contribute to the adversarial image in iteration $g$. Thus, the subperturbation $\mathbf{e}'_g(\mathbf{x})$ is an optimization of $\mathbf{e}_g(\mathbf{x})$, and we set it to be the next solution $\mathbf{e}_{g+1}(\mathbf{x})$.
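The following is a minimal sketch of this shrinking-perturbation idea, assuming a `predict_positive` function that returns the positive-class probability of an image; the random-subset step stands in for the full random-search-plus-DE machinery detailed below, so it illustrates the principle rather than reproducing the exact DGDE implementation.

```python
import numpy as np

def dgde_sketch(x, predict_positive, iterations=500, keep_fraction=0.9, seed=0):
    """Shrink an initially dense random perturbation while keeping misclassification.

    x: original image as an (H, W, 3) float array in [0, 1], assumed positive (diseased).
    predict_positive: callable returning the positive-class probability of an image.
    Returns the sparsest perturbation found that keeps the probability below 0.5.
    """
    rng = np.random.default_rng(seed)

    # Assumption 1: a dense random perturbation that already crosses the boundary.
    e = rng.uniform(-0.3, 0.3, size=x.shape)
    while predict_positive(np.clip(x + e, 0, 1)) >= 0.5:
        e = rng.uniform(-0.3, 0.3, size=x.shape)

    for _ in range(iterations):
        # Assumption 2: try a random subset of the currently perturbed pixels.
        perturbed = np.argwhere(np.any(e != 0, axis=-1))       # coordinates of perturbed pixels
        keep = rng.random(len(perturbed)) < keep_fraction       # randomly keep a fraction of them
        e_sub = np.zeros_like(e)
        for (i, j) in perturbed[keep]:
            e_sub[i, j] = e[i, j]

        # Accept the smaller perturbation only if it still causes misclassification.
        if predict_positive(np.clip(x + e_sub, 0, 1)) < 0.5:
            e = e_sub
    return e
```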
In Figure 1, a data point represents the original image $\mathbf{x}$ in the input space, with the decision boundary acting as the classification boundary. Initially, the algorithm introduces random noise to this point, creating a subspace of perturbed points; any point in this subspace corresponds to the original image altered by random noise. Once a perturbed data point across the classification boundary is identified, which is classified into a different category, it becomes apparent that this point is significantly distant from the original point, indicating low similarity between the two. This perturbed point then serves as the initial solution, and a subperturbation matrix is derived via random search, characterized by fewer perturbed pixels than its predecessor. By iterating these steps, the algorithm converges on the point nearest to the original point that still lies across the boundary. Consequently, the adversarial image denoted by this final point bears the highest resemblance to the original image.
Figure 1 also depicts the initial phase of the algorithm, where diverse random noises are applied to the original data point, resulting in a perturbed box encompassing various perturbed data points. This process enables the identification of a data point beyond the decision boundary. However, as this point is not the closest point to the original one, the algorithm seeks a new point that is closer to the original point and has fewer perturbations. After several iterations, the algorithm identifies the point with the minimal perturbation, indicating that the corresponding perturbed image is most akin to the original image. The pseudocode of DGDE is shown in Figure S1.
In the DGDE algorithm, the initialization, mutation, crossover, and greedy selection operations are introduced as follows (a code sketch of these operations is given after the list):
- Initialization: the population is initialized randomly as follows:

$$x_{i,j}^{0} = x_j^{\min} + \mathrm{rand}(0,1)\cdot\big(x_j^{\max} - x_j^{\min}\big), \quad i = 1, 2, \ldots, NP$$

- where $x_i^{0}$ represents the $i$-th chromosome in the 0th-generation population; $x_{i,j}^{0}$ represents the $j$-th gene of the $i$-th chromosome in the 0th generation; $x_j^{\max}$ and $x_j^{\min}$ denote the upper and lower bounds of $x_{i,j}$, respectively; $NP$ is the size of the population; and $\mathrm{rand}(0,1)$ represents a uniform distribution over the interval (0, 1).
- Mutation: in DGDE, the differential strategy involves randomly selecting two different individuals in the population and carrying out vector synthesis after scaling their vector difference as follows:

$$v_i^{g+1} = x_i^{g} + F\cdot\big(x_{r_1}^{g} - x_{r_2}^{g}\big), \quad r_1 \ne r_2 \ne i$$

- where $F$ is the scaling factor, and $x_i^{g}$ represents the $i$-th individual in the $g$-th generation population.
- Crossover: the crossover operation between individuals is conducted for the population $x_i^{g}$ and its intermediate $v_i^{g+1}$ of the $(g+1)$-th generation as follows:

$$u_{i,j}^{g+1} = \begin{cases} v_{i,j}^{g+1}, & \text{if } \mathrm{rand}(0,1) \le CR \ \text{or} \ j = j_{rand} \\ x_{i,j}^{g}, & \text{otherwise} \end{cases} \qquad [12]$$

- where $CR$ is the crossover probability, and $j_{rand}$ is a random integer in $\{1, 2, \ldots, D\}$, with $D$ the dimension of an individual. Eq. [12] indicates that $u_{i,j}^{g+1}$ is selected from $v_{i,j}^{g+1}$ and $x_{i,j}^{g}$ according to the crossover probability $CR$ or the condition $j = j_{rand}$.
- Greedy selection: our algorithm uses the greedy strategy of DE to select the individuals that enter the next-generation population as follows:

$$x_i^{g+1} = \begin{cases} u_i^{g+1}, & \text{if } \Phi\big(u_i^{g+1}\big) > \Phi\big(x_i^{g}\big) \\ x_i^{g}, & \text{otherwise} \end{cases}$$

- Here, $\Phi$ is the fitness function. If the fitness of the newly generated offspring $u_i^{g+1}$ is greater than that of the parent $x_i^{g}$, the child $x_i^{g+1}$ is the newly generated offspring; otherwise, the child is equal to the parent.
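For concreteness, the following is a minimal sketch of these four operations for a generic real-valued problem; the population size, bounds, and parameter values (NP, F, CR) are illustrative assumptions rather than the settings used in our experiments, and the fitness function is left abstract.

```python
import numpy as np

def differential_evolution(fitness, lower, upper, NP=30, F=0.5, CR=0.9,
                           generations=100, seed=0):
    """Generic DE loop with the initialization, mutation, crossover, and
    greedy-selection steps described above (maximizes `fitness`)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    D = lower.size

    # Initialization: x_ij = x_min_j + rand(0,1) * (x_max_j - x_min_j)
    pop = lower + rng.random((NP, D)) * (upper - lower)
    fit = np.array([fitness(ind) for ind in pop])

    for _ in range(generations):
        for i in range(NP):
            # Mutation: combine the current individual with a scaled difference
            # of two other randomly chosen individuals.
            r1, r2 = rng.choice([k for k in range(NP) if k != i], size=2, replace=False)
            v = np.clip(pop[i] + F * (pop[r1] - pop[r2]), lower, upper)

            # Crossover: take each gene from v with probability CR
            # (and force at least one gene, j_rand, to come from v).
            j_rand = rng.integers(D)
            mask = rng.random(D) <= CR
            mask[j_rand] = True
            u = np.where(mask, v, pop[i])

            # Greedy selection: the child replaces the parent only if it is fitter.
            fu = fitness(u)
            if fu > fit[i]:
                pop[i], fit[i] = u, fu
    return pop[np.argmax(fit)]
```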
Results
Accuracy of random noise attack
In our study, we employed a pretrained residual net 50 (ResNet-50) model (26) to assess the efficacy of our method. As set out in Table 1, random noise was deployed to attack the cxr, derm, and dr datasets, comprising 374, 103, and 1,156 samples, respectively. Based on the experimental outcomes, the success rates for the random noise attack in the datasets were 1.0 for cxr, 0.64 for derm, and 1.0 for dr.
Table 1 Success rates of the random noise attack

| Dataset | Success rate (positive to negative) |
|---|---|
| cxr | 1.0 |
| derm | 0.64 |
| dr | 1.0 |

cxr, chest radiograph of emphysema dataset; derm, melanocytic lesion dataset; dr, Kaggle diabetic retinopathy dataset.
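As an illustration of how the success rates in Table 1 can be computed, the following is a minimal sketch assuming a binary classifier built from a torchvision ResNet-50 and an iterable of positive (diseased) image tensors; the checkpoint path, noise magnitude, and two-class head are assumptions for illustration rather than the exact experimental setup.

```python
import torch
from torchvision import models

# Assumed setup: a ResNet-50 backbone with a two-class head, loaded with the weights
# of the classifier under attack (the checkpoint path is hypothetical).
model = models.resnet50()
model.fc = torch.nn.Linear(model.fc.in_features, 2)
model.load_state_dict(torch.load("classifier_cxr.pth", map_location="cpu"))
model.eval()

def random_noise_success_rate(positive_images, noise_scale=0.3):
    """Fraction of positive (diseased) images flipped to negative by uniform random noise.

    positive_images: iterable of 1x3x224x224 tensors in [0, 1]; class index 1 is
    assumed to be the positive (diseased) class.
    """
    flipped, total = 0, 0
    with torch.no_grad():
        for x in positive_images:
            noise = noise_scale * (2 * torch.rand_like(x) - 1)   # uniform noise in [-scale, scale]
            x_adv = (x + noise).clamp(0, 1)
            prob_positive = torch.softmax(model(x_adv), dim=1)[0, 1]
            flipped += int(prob_positive < 0.5)                  # counted as a successful attack
            total += 1
    return flipped / total
```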
Adversarial images
The effectiveness of the DGDE algorithm was evaluated across the three medical image datasets, the results of which are presented in Figure 2. The first column of Figure 2 displays the original image; the second column shows the image after the random noise attack; the third and fourth columns represent the adversarial images at 100 and 200 iterations, respectively; the final column exhibits the ultimate adversarial image; and a red circle highlights the disturbed pixels.
Observations of the perturbed images across different iterations indicated that the initial image, overlaid with random noise, was misclassified. The images at 100 and 200 iterations demonstrated a noticeable reduction in attacked pixels, while the probability label of the positive class for the adversarial image remained below 0.5. The final column of Figure 2 presents the optimal adversarial result, which closely resembles the original image.
The relationships among the number of perturbed pixels, the probability label of the positive class, and the number of iterations are presented in Figures 3,4. In these experiments, 15 samples were randomly selected, and each color in Figures 3,4 represents the result for one of these samples.
Figure 3 illustrates the relationship between the iteration count and the number of pixels across the three datasets. The results indicated that the number of perturbed pixels significantly decreased within the first 400 iterations.
Figure 4 shows that as the number of disturbed pixels decreased, the probability label for the perturbed image classification remained below 0.5, indicating consistent misclassification by the system. The curve indicates a sudden increase in the probability label for the diseased image after a specific iteration count, suggesting that the adversarial image found a more direct path to the classification boundary. As demonstrated in Figure 4, even when the pixel count was reduced to approximately 100, the confidence level remained notably low.
In Figure 5, the first row represents the classification outcomes of the original image by the trained network, while the second row shows the results for the adversarial image. The probability label of the positive class is displayed in the upper-left corner of each image. Our objective was to minimize the number of disturbed pixels while ensuring misclassification. However, a reduction in disturbed pixels was associated with an incremental increase in the probability label of the positive class, suggesting an improvement in the classification accuracy. Thus, it is necessary to strike a balance between minimizing the number of disturbed pixels and keeping the probability label of the positive class low.
As Figure 6 shows, in successful attacks, the range of minimally disturbed pixels varied from 7 to 55, equating to 0.0140% to 0.110% of the input image’s pixels. This percentage indicates a relatively low proportion of disturbed pixels.
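For reference, with 224×224-pixel input images, these bounds correspond to the following proportions:

$$\frac{7}{224\times224}\approx 0.0140\%, \qquad \frac{55}{224\times224}\approx 0.110\%$$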
As shown in Table 2, the mean number of disturbed pixels in a dataset is $\bar{n} = \frac{1}{N}\sum_{i=1}^{N} n_i$, where $N$ is the number of adversarial images, and $n_i$ is the number of perturbed pixels in the $i$-th adversarial image. Thus, we calculated that the average number of disturbed pixels in the cxr, derm, and dr samples was 30, 18, and 11, accounting for 0.0598%, 0.0359%, and 0.0219% of all pixels in the adversarial image, respectively.
Table 2 Number of perturbed pixels in the adversarial images

| Dataset | Max | Min | Mean | Std | Percentage (%) |
|---|---|---|---|---|---|
| cxr | 55 | 11 | 30 | 13.9 | 0.0598 |
| derm | 35 | 7 | 18 | 11.2 | 0.0359 |
| dr | 21 | 7 | 11 | 4.8 | 0.0219 |

cxr, chest radiograph of emphysema dataset; derm, melanocytic lesion dataset; dr, Kaggle diabetic retinopathy dataset; Max, maximum number of perturbed pixels; Min, minimum number of perturbed pixels; Std, standard deviation of the number of perturbed pixels.
Discussion
The research by He et al. (26), Lei et al. (27) and Jiang et al. (28) suggests that numerous data points are proximate to the decision boundary. To analyze assumptions about decision boundaries, data points can be incrementally moved in the input space, and a quantitative assessment can be conducted to determine the frequency at which the class labels changed. Su et al. (20) showed the feasibility of shifting data points across various dimensions to identify points at which class labels transition. However, it remains to be determined which data point beyond the classification boundary is most akin to the original one.
Our findings support Goodfellow et al.’s hypothesis that minor yet cumulative perturbations across multiple dimensions can induce significant output alterations (9). With only a few pixels altered, our method successfully manipulated numerous images, demonstrating the particular susceptibility of the classification models to this minimal-pixel attack strategy.
In our investigation, we could readily identify a perturbed data point crossing the decision boundary by injecting random noise into the original data point, despite this new point not being the closest to the original point. To locate the nearest data point beyond the classification boundary, we introduced the novel DGDE method.
In traditional pixel-attack methods, the number of pixels to be changed must be specified in advance for a successful attack. If the number of changed pixels is large, the attack is likely to succeed; conversely, if the number is small, the attack is likely to fail. This creates the problem of balancing attack success against concealment. Our approach solves this problem because it does not need the number of perturbed pixels to be specified, as the algorithm automatically finds the adversarial example with the minimum number of perturbed pixels.
Building on the studies of Su et al. (20), He et al. (26), Lei et al. (27) and Jiang et al. (28), as well as our own theoretical assumptions, we developed a straightforward model that yielded effective outcomes. This model does not require training related to deep learning; rather, it employs a random search and the DE algorithm to compute adversarial images closely resembling the original, specifically those with the minimal number of altered pixels. Our approach demands relatively low graphics processing unit resources and is feasible in terms of central processing power, as it relies on pretrained classifiers to assess adversarial images rather than deep-learning training.
Our method enables the identification of a relatively optimal subset through a random search, ensuring that the size of the given optimal solution is consistently smaller than that of the previous iteration. This approach guarantees that the algorithm continually finds a relatively superior solution, thereby avoiding local minima.
Owing to the random search, our algorithm has a certain probability of finding a shortcut that acquires a superior result directly rather than through stepwise iterations.
Our approach introduces a novel model that eschews deep-learning techniques. Instead, it employs the DGDE algorithm to determine the adversarial image most resembling the original and with the fewest disturbed pixels. As Figure 3 shows, the algorithm initially produces a perturbed image with the same dimensions as the original image; however, as the algorithm progresses, convergence is hastened because each set of optimal data points is consistently smaller than the previous set.
Conclusions
Our study effectively demonstrated how targeted attacks can cause classifiers to misclassify diseased images. The initial random noise attack yielded a high success rate against diseased images, and the subsequent DGDE optimization required the alteration of only a small number of pixels. However, we observed that the success rate of random noise attacks on negative images was significantly lower than that for positive images. Future studies will seek to clarify the underlying mechanisms of random noise attacks on images so as to differentiate methodologies for attacking positive images from those for attacking negative images.
Acknowledgments
Funding: None.
Footnote
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-1764/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436-44. [Crossref] [PubMed]
- Chen Z, Pawar K, Ekanayake M, Pain C, Zhong S, Egan GF. Deep Learning for Image Enhancement and Correction in Magnetic Resonance Imaging-State-of-the-Art and Challenges. J Digit Imaging 2023;36:204-30. [Crossref] [PubMed]
- Finlayson SG, Bowers JD, Ito J, Zittrain JL, Beam AL, Kohane IS. Adversarial attacks on medical machine learning. Science 2019;363:1287-9. [Crossref] [PubMed]
- Sorin V, Soffer S, Glicksberg BS, Barash Y, Konen E, Klang E. Adversarial attacks in radiology - A systematic review. Eur J Radiol 2023;167:111085. [Crossref] [PubMed]
- Ahmed S, Dera D, Hassan SU, Bouaynaya N, Rasool G. Failure Detection in Deep Neural Networks for Medical Imaging. Front Med Technol 2022;4:919046. [Crossref] [PubMed]
- Bortsova G, González-Gonzalo C, Wetstein SC, Dubost F, Katramados I, Hogeweg L, Liefers B, van Ginneken B, Pluim JPW, Veta M, Sánchez CI, de Bruijne M. Adversarial attack vulnerability of medical image analysis systems: Unexplored factors. Med Image Anal 2021;73:102141. [Crossref] [PubMed]
- Li Y, Liu S. Adversarial Attack and Defense in Breast Cancer Deep Learning Systems. Bioengineering (Basel) 2023;10:973. [Crossref] [PubMed]
- Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, Fergus R. Intriguing properties of neural networks. 2nd International Conference on Learning Representations 2014. doi: 10.48550/arXiv.1312.6199
- Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. 3rd International Conference on Learning Representations 2015:1-11. doi: 10.48550/arXiv.1412.6572
- Fawzi A, Moosavi-Dezfooli SM, Frossard P. The robustness of deep networks: A geometrical perspective. IEEE Signal Processing Magazine 2017;34:50-62. [Crossref]
- Papernot N, McDaniel P, Jha S, Fredrikson M, Swami A. The limitations of deep learning in adversarial settings. 2016 IEEE European Symposium on Security and Privacy (EuroS&P) 2016:372-87.
- Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: Visualising image classification models and saliency maps. 2nd International Conference on Learning Representations 2014. doi: 10.48550/arXiv.1312.6034
- Engstrom L, Tran B, Tsipras D, Schmidt L, Madry A. A rotation and a translation suffice: Fooling CNNs with simple transformations. CoRR abs/1712.02779, 2017. doi: 10.48550/arXiv.1712.02779
- Tiwari K, Zhang L. Implications of Minimum Description Length for Adversarial Attack in Natural Language Processing. Entropy (Basel) 2024;26:354. [Crossref] [PubMed]
- Zhang WE, Sheng QZ, Alhazmi A, Li C. Adversarial Attacks on Deep-learning Models in Natural Language Processing: A Survey. ACM Transactions on Intelligent Systems and Technology (TIST) 2020;11:1-41. [Crossref]
- Bhanushali AR, Mun H, Yun J. Adversarial Attacks on Automatic Speech Recognition (ASR): A Survey. IEEE Access 2024;12:88279-302.
- Yan S, Ren J, Wang W, Sun L, Zhang W, Yu Q. A Survey of Adversarial Attack and Defense Methods for Malware Classification in Cyber Security. IEEE Communications Surveys & Tutorials 2023;25:467-96. [Crossref]
- Yuan J, He ZH. Consistency-Sensitivity Guided Ensemble Black-Box Adversarial Attacks in Low-Dimensional Spaces. ICCV 2021:7758-66.
- Moosavi-Dezfooli SM, Fawzi A, Fawzi O, Frossard P. Universal adversarial perturbations. 2017 IEEE Conference on Computer Vision and Pattern Recognition 2017:86-94.
- Su JW, Vargas DV, Sakurai K. One Pixel Attack for Fooling Deep Neural Networks. IEEE Transactions on Evolutionary Computation 2019;23:828-41. [Crossref]
- Kaggle Diabetic Retinopathy Challenge. Available online: https://www.kaggle.com/c/diabetic-retinopathy-detection/data/
- Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. CVPR 2017:2097-106.
- International Skin Imaging Collaboration. Available online: https://www.isic-archive.com
- Das S, Suganthan PN. Differential Evolution: A Survey of the State-of-the-Art. IEEE Transactions on Evolutionary Computation 2011;15:4-31. [Crossref]
- Wu X, Huang Y, Guan H, Niu B, Lan F. Noise Non-Differentiable in Deep Learning End-to-End Image Watermarking Models. 2023 International Conference on Culture-Oriented Science and Technology (CoST) 2023:146-51.
- He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016:770-8.
- Lei S, He F, Yuan Y, Tao D. Understanding Deep Learning via Decision Boundary. IEEE Trans Neural Netw Learn Syst 2023; Epub ahead of print. [Crossref] [PubMed]
- Jiang H, Song Q, Kernec JL. Searching the Adversarial Example in the Decision Boundary. International Conference on UK-China Emerging Technologies (UCET) 2020:1-4.