Attacking medical images with minimal noise: exploiting vulnerabilities in medical deep-learning systems
Introduction
Advancements in deep neural network (DNN) technology have led to the widespread application of DNNs in image recognition, including in the medical imaging domain. These DNN-based methods have surpassed traditional image-processing technology and have even achieved human-competitive accuracy (1). However, several studies have shown that introducing artificial distortions to images can lead to DNN misclassifications (2-4), which has prompted the introduction of effective algorithms for generating adversarial samples, or adversarial images (4-7).
Szegedy et al. first demonstrated the susceptibility of DNNs to well-crafted artificial disturbances, which can be generated with the use of several back-propagation gradient algorithms to derive gradient information (8). Goodfellow et al. proposed the fast gradient sign algorithm, which was designed to calculate effective perturbations based on the premise that the linearity and high dimensionality of inputs are primary factors in a network’s vulnerability to minor perturbations (9). Fawzi et al. introduced a greedy search method for identifying perturbations predicated on the linearity of DNN decision boundaries (10). Additionally, Papernot et al. developed the adversarial saliency map, employed the Jacobian matrix to craft the map, and demonstrated this method’s efficacy in generating fixed-length perturbations along each axis (11). Beyond perturbations, alternative methods for generating adversarial images have been explored to induce misclassification in DNNs, including the use of artificial images (12) and image rotation (13). Adversarial perturbations have been applied in the fields of natural language processing (14,15), speech recognition (16), and malware classification (17).
A prevalent method for creating these images involves adding a minimal amount of finely tuned perturbation, imperceptible to the human eye, to natural images, which can cause DNNs to misclassify the images into entirely different categories. However, many previously developed approaches do not account for the extreme scenarios in which the perturbations are exceedingly subtle (i.e., the number of pixels modified is quite small). Moreover, generating adversarial images under these highly constrained conditions may provide novel insights into the geometric features and overall behavior of DNN models in high-dimensional spaces (18). For instance, the properties of adversarial images near a decision boundary can elucidate the shape of the boundary (19).
Su et al. introduced a technique that relies solely on probabilistic label information for feedback and requires the perturbation of only a single pixel to execute a black-box DNN attack (20). This method achieved effective results on the Kaggle Canadian Institute For Advanced Research 10 (CIFAR-10) dataset by altering only a single pixel. The strength of this method is that it constitutes a semi-black-box attack approach that depends exclusively on feedback from the black box (the probability label) without requiring access to DNN internal information, such as gradient and network structure details. Moreover, it is simpler than previous approaches, as it bypasses the complexity of searching for perturbations via explicit objective functions, focusing instead directly on the probabilistic label values of the target class.
After reviewing the findings of Su et al. (20), we concluded that the perturbation of several pixels can cause an image to be misclassified, and we speculated that a classification system could be deceived by disturbing a few pixels of a medical image. However, the resolution of the natural images used in Su et al.’s research (20) was relatively low (32×32), while the resolution of medical images is significantly higher (the medical images used in our study have a resolution of 224×224). Consequently, it would be impractical to achieve the desired effect by disturbing only a single pixel in the creation of an adversarial medical image. Therefore, a method capable of generating a high-resolution adversarial image with the minimum number of pixel perturbations needs to be established, and we conducted this study to develop solutions to this challenge.
We ultimately devised an approach that offers several advantages over other existing methods:
(I) Our approach requires minimal computing resources, as it does not employ gradient calculation but instead relies on a combination of random search and the differential evolution (DE) algorithm to identify an approximately optimal solution to the optimization problem. (II) Local minima are avoided through the use of random search, which ensures that all solutions within the current solution space have an equal probability of being selected. (III) Our method provides relatively quick convergence, as it ensures continuity of convergence at each iteration step; this is because the optimal solution at a given iteration is a subset of the solution from the previous iteration. (IV) Finally, the proposed method is capable of successfully attacking high-resolution medical images with minimal pixel modifications.
Methods
Dataset
Our model was evaluated using the following three publicly available datasets: (I) the Kaggle diabetic retinopathy (dr) dataset (21); (II) the chest radiograph of emphysema (cxr) dataset described by Wang et al. (22); and (III) the melanocytic lesion (derm) dataset from the International Skin Imaging Collaboration website (23), which comprises images of melanocytic lesions classified as benign or malignant. The images in these datasets are labeled as either diseased or non-diseased, and they were cropped to 224×224 pixels. This study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).
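As an illustration of this preprocessing, the following is a minimal sketch assuming a hypothetical folder layout with one sub-directory per class; the resize-then-crop transform is an assumption used to obtain the 224×224 input size described above.

```python
# Minimal preprocessing sketch (assumption: images stored in class-named folders,
# e.g., data/derm/benign and data/derm/malignant; paths are hypothetical).
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

preprocess = transforms.Compose([
    transforms.Resize(256),       # shrink the shorter side first
    transforms.CenterCrop(224),   # crop to the 224x224 input size used in this study
    transforms.ToTensor(),        # convert to a CxHxW float tensor in [0, 1]
])

# ImageFolder assigns one integer label per sub-folder (here 0 = non-diseased, 1 = diseased).
dataset = datasets.ImageFolder("data/derm", transform=preprocess)
loader = DataLoader(dataset, batch_size=16, shuffle=False)
```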
Problem description
Generating adversarial images can be viewed as an optimization problem with constraints. An input image is represented as a matrix, each element of which corresponds to a pixel. Let $f$ denote the classifier that processes the input images, and let $\mathbf{x}$ represent the original image. The perturbation matrix $\mathbf{e}(\mathbf{x})$ introduces alterations of the same dimensions as $\mathbf{x}$. The perturbation matrix and the original image are superimposed to form the perturbed image $\mathbf{x}+\mathbf{e}(\mathbf{x})$. The norm $\|\mathbf{e}(\mathbf{x})\|_0$ denotes the allowable modification, and $d$ denotes the maximum allowable modification, a positive real number bounding the magnitude of the perturbation set and ensuring that $\mathbf{e}(\mathbf{x})$ satisfies $\|\mathbf{e}(\mathbf{x})\|_0 \le d$. In targeted attacks, the adversaries aim to identify the optimal perturbation $\mathbf{e}(\mathbf{x})^{*}$ that solves the following optimization problem (20):

$$\max_{\mathbf{e}(\mathbf{x})} f_{\mathrm{adv}}\big(\mathbf{x}+\mathbf{e}(\mathbf{x})\big) \quad \text{subject to} \quad \|\mathbf{e}(\mathbf{x})\|_{0}\le d \qquad [1]$$

where $f_{\mathrm{adv}}$ denotes the probability label of the target (adversarial) class.
Eq. [1] focuses on determining the following: (I) which pixels should be altered and (II) the extent of perturbation required for each pixel. In our method, the equation is slightly different and is expressed as follows:

$$\min_{\mathbf{e}(\mathbf{x})} \|\mathbf{e}(\mathbf{x})\|_{0} \quad \text{subject to} \quad f_{t}\big(\mathbf{x}+\mathbf{e}(\mathbf{x})\big) < f_{t}(\mathbf{x}), \quad \|\mathbf{e}(\mathbf{x})\|_{0}\le d \qquad [2]$$

where $f_{t}$ is the probability label of the original image's true class $t$. Eq. [2] outlines the model employed in this study, whose purpose is to identify a perturbation matrix containing the fewest possible nonzero elements while the added perturbation decreases the probability label of the original image's class. Here, $d$ represents a small positive integer. Unlike previously described methods that alter a substantial number of pixels, our approach requires only a minimal number of alterations.
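To make Eq. [2] concrete, the sketch below evaluates the two quantities it involves for a candidate perturbation: the number of perturbed pixels (the L0 term to be minimized) and the positive-class probability label of the perturbed image. The `model` interface returning class probabilities is an assumption used for illustration, not the exact implementation employed in this study.

```python
import numpy as np

def perturbation_cost(model, x, e, positive_index=1):
    """Evaluate a candidate perturbation e for an original image x (both HxWx3 arrays in [0, 1]).

    Returns the number of perturbed pixels (the L0 term minimized in Eq. [2]) and the
    positive-class probability of the perturbed image; the caller checks that this
    probability has dropped (below 0.5 for an outright misclassification).
    """
    num_perturbed = int(np.count_nonzero(np.any(e != 0, axis=-1)))   # pixels with any RGB change
    probabilities = model(np.clip(x + e, 0.0, 1.0))                  # assumed: returns class probabilities
    return num_perturbed, float(probabilities[positive_index])
```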
The study by Su et al. achieved attacks on classification systems by altering only 1, 3, or 5 pixels, albeit on low-resolution images (20). However, considering the typical high resolution of medical images, we sought to determine the minimum number of pixels that need to be modified to induce misclassification in a high-resolution image-based classification system.
DE
DE, a widely recognized optimization algorithm for addressing complex multimodal optimization challenges (24), is a critical component of our algorithm. DE is categorized within the broader family of evolutionary algorithms (EAs). Notably, DE incorporates mechanisms that maintain population diversity during the selection phase, thereby enhancing the likelihood of identifying superior solutions more efficiently than do gradient-based methods or other EAs in practical applications. DE generates a new set of candidate solutions (children) based on the given population (parents) in each iteration. The survival of the children is contingent upon their superior fitness compared to their parents, thus concurrently promoting diversity and fitness enhancement.
DE applies to a broader array of optimization problems than do gradient-based methods, including those that are non-differentiable, noisy, or dynamic. The use of DE to generate adversarial images offers several advantages (20). (I) DE offers an enhanced probability of identifying global optima. As a meta-heuristic, DE is less prone to local minima than is gradient descent or greedy algorithms, partly due to its diversity-preserving mechanisms and its use of multiple candidate solutions. Additionally, given the complexity of the strict pixel-modification constraint employed in this study, DE is particularly suitable. (II) DE requires minimal information from the target system. Unlike traditional optimization methods, such as gradient descent or quasi-Newton methods, DE does not require the problem to be differentiable. This aspect is critical for adversarial image generation, as some networks are non-differentiable (25), and gradient computation demands extensive system information, which may not always be available. (III) Finally, the proposed method offers simplicity, as its effectiveness is independent of the classifier used. For executing an attack, knowledge of the probability label is sufficient, which simplifies the overall process.
Decrease group differential evolution (DGDE)
The proposed method, referred to as DGDE, was designed to generate adversarial images via the alteration of only a few pixels to deceive the classifier. This approach is grounded on two assumptions:
- A random perturbation matrix $\mathbf{e}_0(\mathbf{x})$ of the same size as the input image $\mathbf{x}$ can always be found. This random perturbation matrix is added to the input image to generate an adversarial image and make the classifier output wrong. Here, $C(\mathbf{x})$ is the classification class of the input image, $C(\mathbf{x})=0$ means that the input image is negative, and $C(\mathbf{x})=1$ means that the input image is positive:

$$C\big(\mathbf{x}+\mathbf{e}_0(\mathbf{x})\big) \ne C(\mathbf{x})$$
- It is possible to identify a subset of the given solution, with fewer perturbed pixels, that still leads to misclassification by the DNN, as follows:

$$\exists\, \mathbf{e}'(\mathbf{x}) \subset \mathbf{e}(\mathbf{x}): \quad C\big(\mathbf{x}+\mathbf{e}'(\mathbf{x})\big) \ne C(\mathbf{x}), \quad \|\mathbf{e}'(\mathbf{x})\|_{0} < \|\mathbf{e}(\mathbf{x})\|_{0}$$
By introducing a random noise matrix $\mathbf{e}_0(\mathbf{x})$ to the input image $\mathbf{x}$, we simulate adding random perturbations to each dimension of the image point in input space, pushing the input image across the classification boundary. This initial perturbation serves as a foundational solution for misclassification and can be represented as follows:

$$\mathbf{e}_0(\mathbf{x}) = (e_1, e_2, \ldots, e_n), \qquad e_i = (x_i, y_i, r_i, g_i, b_i)$$

Here, $\mathbf{e}_0(\mathbf{x})$ denotes the perturbation matrix, and each element $e_i$ of this perturbation vector represents the information of one pixel in the perturbation matrix; $x_i$ and $y_i$ specify the pixel location; and $r_i$, $g_i$, and $b_i$ correspond to the pixel’s red, green, and blue values, respectively.

The matrix $\mathbf{e}_g(\mathbf{x})$ is the perturbation generated in the $g$-th iteration. We randomly select elements from $\mathbf{e}_g(\mathbf{x})$ to create a subperturbation $\mathbf{e}'_g(\mathbf{x})$ and set the elements that are not chosen to zero. Next, the probability label of the prediction for $\mathbf{x}+\mathbf{e}'_g(\mathbf{x})$ is kept close to the probability label of the prediction for $\mathbf{x}+\mathbf{e}_g(\mathbf{x})$. According to the literature (20), very few pixels need to be changed to cause the input image to be misclassified; that is, although it is possible to attack an image by changing all of its pixels, a subset of the changed pixels suffices to turn the image into an adversarial image. In iteration $g+1$, the algorithm can therefore remove some disturbed pixels that did not contribute to the adversarial image in iteration $g$. Thus, the subperturbation $\mathbf{e}'_g(\mathbf{x})$ is an optimization of $\mathbf{e}_g(\mathbf{x})$, and we set it to be the next solution $\mathbf{e}_{g+1}(\mathbf{x})$.
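The following is a minimal sketch of this shrinking-perturbation idea, assuming a `predict_positive` function that returns the positive-class probability of an image; the random-subset step stands in for the full random-search-plus-DE machinery detailed below, so it illustrates the principle rather than reproducing the exact DGDE implementation.

```python
import numpy as np

def dgde_sketch(x, predict_positive, iterations=500, keep_fraction=0.9, seed=0):
    """Shrink an initially dense random perturbation while keeping misclassification.

    x: original image as an (H, W, 3) float array in [0, 1], assumed positive (diseased).
    predict_positive: callable returning the positive-class probability of an image.
    Returns the sparsest perturbation found that keeps the probability below 0.5.
    """
    rng = np.random.default_rng(seed)

    # Assumption 1: a dense random perturbation that already crosses the boundary.
    e = rng.uniform(-0.3, 0.3, size=x.shape)
    while predict_positive(np.clip(x + e, 0, 1)) >= 0.5:
        e = rng.uniform(-0.3, 0.3, size=x.shape)

    for _ in range(iterations):
        # Assumption 2: try a random subset of the currently perturbed pixels.
        perturbed = np.argwhere(np.any(e != 0, axis=-1))       # coordinates of perturbed pixels
        keep = rng.random(len(perturbed)) < keep_fraction       # randomly keep a fraction of them
        e_sub = np.zeros_like(e)
        for (i, j) in perturbed[keep]:
            e_sub[i, j] = e[i, j]

        # Accept the smaller perturbation only if it still causes misclassification.
        if predict_positive(np.clip(x + e_sub, 0, 1)) < 0.5:
            e = e_sub
    return e
```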
In Figure 1, a data point represents the original image $\mathbf{x}$ in the input space, with the decision boundary acting as the classification boundary. Initially, the algorithm introduces random noise to this point, creating a subspace of perturbed points; any point in this subspace corresponds to the original image altered by random noise. Once a perturbed data point across the classification boundary is identified, which is classified into a different category, it becomes apparent that this point is significantly distant from the original point, indicating low similarity between the two. This perturbed point then serves as the initial solution, and a subperturbation matrix is derived via random search, characterized by fewer perturbed pixels than its predecessor. By iterating these steps, the algorithm converges on the point nearest to the original point that still lies across the boundary. Consequently, the adversarial image denoted by this final point bears the highest resemblance to the original image.
Figure 1 also depicts the initial phase of the algorithm, where diverse random noises are applied to the original data point, resulting in a perturbed box encompassing various perturbed data points. This process enables the identification of a data point beyond the decision boundary. However, as this point is not the closest point to the original one, the algorithm seeks a new point that is closer to the original point and has fewer perturbations. After several iterations, the algorithm identifies the point with the minimal perturbation, indicating that the corresponding perturbed image is most akin to the original image. The pseudocode of DGDE is shown in Figure S1.
In the DGDE algorithm, the initialization, mutation, crossover, and greedy selection operations are introduced as follows (a code sketch of these operations is given after the list):
- Initialization: the population is initialized randomly as follows:

$$x_{i,j}^{0} = x_j^{\min} + \mathrm{rand}(0,1)\cdot\big(x_j^{\max} - x_j^{\min}\big), \quad i = 1, 2, \ldots, NP$$

- where $x_i^{0}$ represents the $i$-th chromosome in the 0th-generation population; $x_{i,j}^{0}$ represents the $j$-th gene of the $i$-th chromosome in the 0th generation; $x_j^{\max}$ and $x_j^{\min}$ denote the upper and lower bounds of $x_{i,j}$, respectively; $NP$ is the size of the population; and $\mathrm{rand}(0,1)$ represents a uniform distribution over the interval (0, 1).
- Mutation: in DGDE, the differential strategy involves randomly selecting two different individuals in the population and carrying out vector synthesis after scaling their vector difference as follows:

$$v_i^{g+1} = x_i^{g} + F\cdot\big(x_{r_1}^{g} - x_{r_2}^{g}\big), \quad r_1 \ne r_2 \ne i$$

- where $F$ is the scaling factor, and $x_i^{g}$ represents the $i$-th individual in the $g$-th generation population.
- Crossover: the crossover operation between individuals is conducted for the population $x_i^{g}$ and its intermediate $v_i^{g+1}$ of the $(g+1)$-th generation as follows:

$$u_{i,j}^{g+1} = \begin{cases} v_{i,j}^{g+1}, & \text{if } \mathrm{rand}(0,1) \le CR \ \text{or} \ j = j_{rand} \\ x_{i,j}^{g}, & \text{otherwise} \end{cases} \qquad [12]$$

- where $CR$ is the crossover probability, and $j_{rand}$ is a random integer in $\{1, 2, \ldots, D\}$, with $D$ the dimension of an individual. Eq. [12] indicates that $u_{i,j}^{g+1}$ is selected from $v_{i,j}^{g+1}$ and $x_{i,j}^{g}$ according to the crossover probability $CR$ or the condition $j = j_{rand}$.
- Greedy selection: our algorithm uses the greedy strategy of DE to select the individuals that enter the next-generation population as follows:

$$x_i^{g+1} = \begin{cases} u_i^{g+1}, & \text{if } \Phi\big(u_i^{g+1}\big) > \Phi\big(x_i^{g}\big) \\ x_i^{g}, & \text{otherwise} \end{cases}$$

- Here, $\Phi$ is the fitness function. If the fitness of the newly generated offspring $u_i^{g+1}$ is greater than that of the parent $x_i^{g}$, the child $x_i^{g+1}$ is the newly generated offspring; otherwise, the child is equal to the parent.
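For concreteness, the following is a minimal sketch of these four operations for a generic real-valued problem; the population size, bounds, and parameter values (NP, F, CR) are illustrative assumptions rather than the settings used in our experiments, and the fitness function is left abstract.

```python
import numpy as np

def differential_evolution(fitness, lower, upper, NP=30, F=0.5, CR=0.9,
                           generations=100, seed=0):
    """Generic DE loop with the initialization, mutation, crossover, and
    greedy-selection steps described above (maximizes `fitness`)."""
    rng = np.random.default_rng(seed)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    D = lower.size

    # Initialization: x_ij = x_min_j + rand(0,1) * (x_max_j - x_min_j)
    pop = lower + rng.random((NP, D)) * (upper - lower)
    fit = np.array([fitness(ind) for ind in pop])

    for _ in range(generations):
        for i in range(NP):
            # Mutation: combine the current individual with a scaled difference
            # of two other randomly chosen individuals.
            r1, r2 = rng.choice([k for k in range(NP) if k != i], size=2, replace=False)
            v = np.clip(pop[i] + F * (pop[r1] - pop[r2]), lower, upper)

            # Crossover: take each gene from v with probability CR
            # (and force at least one gene, j_rand, to come from v).
            j_rand = rng.integers(D)
            mask = rng.random(D) <= CR
            mask[j_rand] = True
            u = np.where(mask, v, pop[i])

            # Greedy selection: the child replaces the parent only if it is fitter.
            fu = fitness(u)
            if fu > fit[i]:
                pop[i], fit[i] = u, fu
    return pop[np.argmax(fit)]
```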
Results
Accuracy of random noise attack
In our study, we employed a pretrained residual net 50 (ResNet-50) model (26) to assess the efficacy of our method. As set out in Table 1, random noise was deployed to attack the cxr, derm, and dr datasets, comprising 374, 103, and 1,156 samples, respectively. Based on the experimental outcomes, the success rates for the random noise attack in the datasets were 1.0 for cxr, 0.64 for derm, and 1.0 for dr.
Table 1 Success rates of the random noise attack

| Dataset | Success rate (positive to negative) |
|---|---|
| cxr | 1.0 |
| derm | 0.64 |
| dr | 1.0 |

cxr, chest radiograph of emphysema dataset; derm, melanocytic lesion dataset; dr, Kaggle diabetic retinopathy dataset.
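As an illustration of how the success rates in Table 1 can be computed, the following is a minimal sketch assuming a binary classifier built from a torchvision ResNet-50 and an iterable of positive (diseased) image tensors; the checkpoint path, noise magnitude, and two-class head are assumptions for illustration rather than the exact experimental setup.

```python
import torch
from torchvision import models

# Assumed setup: a ResNet-50 backbone with a two-class head, loaded with the weights
# of the classifier under attack (the checkpoint path is hypothetical).
model = models.resnet50()
model.fc = torch.nn.Linear(model.fc.in_features, 2)
model.load_state_dict(torch.load("classifier_cxr.pth", map_location="cpu"))
model.eval()

def random_noise_success_rate(positive_images, noise_scale=0.3):
    """Fraction of positive (diseased) images flipped to negative by uniform random noise.

    positive_images: iterable of 1x3x224x224 tensors in [0, 1]; class index 1 is
    assumed to be the positive (diseased) class.
    """
    flipped, total = 0, 0
    with torch.no_grad():
        for x in positive_images:
            noise = noise_scale * (2 * torch.rand_like(x) - 1)   # uniform noise in [-scale, scale]
            x_adv = (x + noise).clamp(0, 1)
            prob_positive = torch.softmax(model(x_adv), dim=1)[0, 1]
            flipped += int(prob_positive < 0.5)                  # counted as a successful attack
            total += 1
    return flipped / total
```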
Adversarial images
The effectiveness of the DGDE algorithm was evaluated across the three medical image datasets, the results of which are presented in Figure 2. The first column of Figure 2 displays the original image; the second column shows the image after the random noise attack; the third and fourth columns represent the adversarial images at 100 and 200 iterations, respectively; the final column exhibits the ultimate adversarial image; and a red circle highlights the disturbed pixels.
Observations of the perturbed images across different iterations indicated that the initial image, overlaid with random noise, was misclassified. The images at 100 and 200 iterations demonstrated a noticeable reduction in attacked pixels, while the probability label of the positive class for the adversarial image remained below 0.5. The final column of Figure 2 presents the optimal adversarial result, which closely resembles the original image.
The relationships among the number of perturbed pixels, the probability label of the positive class, and the number of iterations are presented in Figures 3,4. In these experiments, 15 samples were randomly selected, and each color in Figures 3,4 represents the result for one of these samples.
Figure 3 illustrates the relationship between the iteration count and the number of pixels across the three datasets. The results indicated that the number of perturbed pixels significantly decreased within the first 400 iterations.
Figure 4 shows that as the number of disturbed pixels decreased, the probability label for the perturbed image classification remained below 0.5, indicating consistent misclassification by the system. The curve indicates a sudden increase in the probability label for the diseased image after a specific iteration count, suggesting that the adversarial image found a more direct path to the classification boundary. As demonstrated in Figure 4, even when the pixel count was reduced to approximately 100, the confidence level remained notably low.
In Figure 5, the first row represents the classification outcomes of the original image by the trained network, while the second row shows the results for the adversarial image. The probability label of the positive class is displayed in the upper-left corner of each image. Our objective was to minimize the number of disturbed pixels while ensuring misclassification. However, a reduction in disturbed pixels was associated with an incremental increase in the probability label of the positive class, suggesting an improvement in the classification accuracy. Thus, it is necessary to strike a balance between minimizing the number of disturbed pixels and keeping the probability label of the positive class low.
As Figure 6 shows, in successful attacks, the range of minimally disturbed pixels varied from 7 to 55, equating to 0.0140% to 0.110% of the input image’s pixels. This percentage indicates a relatively low proportion of disturbed pixels.
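For reference, with 224×224-pixel input images, these bounds correspond to the following proportions:

$$\frac{7}{224\times224}\approx 0.0140\%, \qquad \frac{55}{224\times224}\approx 0.110\%$$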
As shown in Table 2, the mean number of disturbed pixels in a dataset is $\bar{n} = \frac{1}{N}\sum_{i=1}^{N} n_i$, where $N$ is the number of adversarial images, and $n_i$ is the number of perturbed pixels in the $i$-th adversarial image. Thus, we calculated that the average number of disturbed pixels in the cxr, derm, and dr samples was 30, 18, and 11, accounting for 0.0598%, 0.0359%, and 0.0219% of all pixels in the adversarial image, respectively.
Table 2 Number of perturbed pixels in the adversarial images

| Dataset | Max | Min | Mean | Std | Percentage (%) |
|---|---|---|---|---|---|
| cxr | 55 | 11 | 30 | 13.9 | 0.0598 |
| derm | 35 | 7 | 18 | 11.2 | 0.0359 |
| dr | 21 | 7 | 11 | 4.8 | 0.0219 |

cxr, chest radiograph of emphysema dataset; derm, melanocytic lesion dataset; dr, Kaggle diabetic retinopathy dataset; Max, maximum number of perturbed pixels; Min, minimum number of perturbed pixels; Std, standard deviation of the number of perturbed pixels.
Discussion
The research by He et al. (26), Lei et al. (27) and Jiang et al. (28) suggests that numerous data points are proximate to the decision boundary. To analyze assumptions about decision boundaries, data points can be incrementally moved in the input space, and a quantitative assessment can be conducted to determine the frequency at which the class labels changed. Su et al. (20) showed the feasibility of shifting data points across various dimensions to identify points at which class labels transition. However, it remains to be determined which data point beyond the classification boundary is most akin to the original one.
Our findings support Goodfellow et al.’s hypothesis that minor yet cumulative perturbations across multiple dimensions can induce significant output alterations (9). With only a few pixels altered, our method successfully manipulated numerous images, demonstrating the particular susceptibility of the classification models to this minimal-pixel attack strategy.
In our investigation, we could readily identify a perturbed data point crossing the decision boundary by injecting random noise into the original data point, despite this new point not being the closest to the original point. To locate the nearest data point beyond the classification boundary, we introduced the novel DGDE method.
In traditional pixel-attack methods, the number of pixels to be changed must be specified in advance for a successful attack. If the number of changed pixels is large, the attack is likely to succeed; conversely, if the number is small, the attack is likely to fail. This creates the problem of balancing attack success against concealment. Our approach solves this problem because it does not need the number of perturbed pixels to be specified, as the algorithm automatically finds the adversarial example with the minimum number of perturbed pixels.
Building on the studies of Su et al. (20), He et al. (26), Lei et al. (27) and Jiang et al. (28), as well as our own theoretical assumptions, we developed a straightforward model that yielded effective outcomes. This model does not require training related to deep learning; rather, it employs a random search and the DE algorithm to compute adversarial images closely resembling the original, specifically those with the minimal number of altered pixels. Our approach demands relatively low graphics processing unit resources and is feasible in terms of central processing power, as it relies on pretrained classifiers to assess adversarial images rather than deep-learning training.
Our method enables the identification of a relatively optimal subset through a random search, ensuring that the size of the given optimal solution is consistently smaller than that of the previous iteration. This approach guarantees that the algorithm continually finds a relatively superior solution, thereby avoiding local minima.
Owing to the random search, our algorithm has a certain probability of finding a shortcut that acquires a superior result directly rather than through stepwise iterations.
Our approach introduces a novel model that eschews deep-learning techniques. Instead, it employs the DGDE algorithm to determine the adversarial image most resembling the original and with the fewest disturbed pixels. As Figure 3 shows, the algorithm initially produces a perturbed image with the same dimensions as the original image; however, as the algorithm progresses, convergence is hastened because each set of optimal data points is consistently smaller than the previous set.
Conclusions
Our study effectively demonstrated how targeted attacks can cause classifiers to misclassify diseased images. The initial random noise attack yielded a high success rate against diseased images, and the subsequent DGDE optimization required the alteration of only a small number of pixels. However, we observed that the success rate of random noise attacks on negative images was significantly lower than that for positive images. Future studies will seek to clarify the underlying mechanisms of random noise attacks on images so as to differentiate methodologies for attacking positive images from those for attacking negative images.
Acknowledgments
Funding: None.
Footnote
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-1764/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki (as revised in 2013).
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015;521:436-44. [Crossref] [PubMed]
- Chen Z, Pawar K, Ekanayake M, Pain C, Zhong S, Egan GF. Deep Learning for Image Enhancement and Correction in Magnetic Resonance Imaging-State-of-the-Art and Challenges. J Digit Imaging 2023;36:204-30. [Crossref] [PubMed]
- Finlayson SG, Bowers JD, Ito J, Zittrain JL, Beam AL, Kohane IS. Adversarial attacks on medical machine learning. Science 2019;363:1287-9. [Crossref] [PubMed]
- Sorin V, Soffer S, Glicksberg BS, Barash Y, Konen E, Klang E. Adversarial attacks in radiology - A systematic review. Eur J Radiol 2023;167:111085. [Crossref] [PubMed]
- Ahmed S, Dera D, Hassan SU, Bouaynaya N, Rasool G. Failure Detection in Deep Neural Networks for Medical Imaging. Front Med Technol 2022;4:919046. [Crossref] [PubMed]
- Bortsova G, González-Gonzalo C, Wetstein SC, Dubost F, Katramados I, Hogeweg L, Liefers B, van Ginneken B, Pluim JPW, Veta M, Sánchez CI, de Bruijne M. Adversarial attack vulnerability of medical image analysis systems: Unexplored factors. Med Image Anal 2021;73:102141. [Crossref] [PubMed]
- Li Y, Liu S. Adversarial Attack and Defense in Breast Cancer Deep Learning Systems. Bioengineering (Basel) 2023;10:973. [Crossref] [PubMed]
- Szegedy C, Zaremba W, Sutskever I, Bruna J, Erhan D, Goodfellow I, Fergus R. Intriguing properties of neural networks. 2nd International Conference on Learning Representations 2014. doi: 10.48550/arXiv.1312.6199
- Goodfellow IJ, Shlens J, Szegedy C. Explaining and harnessing adversarial examples. 3rd International Conference on Learning Representations 2015:1-11. doi: 10.48550/arXiv.1412.6572
- Fawzi A, Moosavi-Dezfooli SM, Frossard P. The robustness of deep networks: A geometrical perspective. IEEE Signal Processing Magazine 2017;34:50-62. [Crossref]
- Papernot N, McDaniel P, Jha S, Fredrikson M, Swami A. The limitations of deep learning in adversarial settings. 2016 IEEE European Symposium on Security and Privacy (EuroS&P) 2016:372-87.
- Simonyan K, Vedaldi A, Zisserman A. Deep inside convolutional networks: Visualising image classification models and saliency maps. 2nd International Conference on Learning Representations 2014. doi: 10.48550/arXiv.1312.6034
- Engstrom L, Tran B, Tsipras D, Schmidt L, Madry A. A rotation and a translation suffice: Fooling CNNs with simple transformations. CoRR abs/1712.02779, 2017. doi: 10.48550/arXiv.1712.02779
- Tiwari K, Zhang L. Implications of Minimum Description Length for Adversarial Attack in Natural Language Processing. Entropy (Basel) 2024;26:354. [Crossref] [PubMed]
- Zhang WE, Sheng QZ, Alhazmi A, Li C. Adversarial Attacks on Deep-learning Models in Natural Language Processing: A Survey. ACM Transactions on Intelligent Systems and Technology (TIST) 2020;11:1-41. [Crossref]
- Bhanushali AR, Mun H, Yun J. Adversarial Attacks on Automatic Speech Recognition (ASR): A Survey. IEEE Access 2024;12:88279-302.
- Yan S, Ren J, Wang W, Sun L, Zhang W, Yu Q. A Survey of Adversarial Attack and Defense Methods for Malware Classification in Cyber Security. IEEE Communications Surveys & Tutorials 2023;25:467-96. [Crossref]
- Yuan J, He ZH. Consistency-Sensitivity Guided Ensemble Black-Box Adversarial Attacks in Low-Dimensional Spaces. ICCV 2021:7758-66.
- Moosavi-Dezfooli SM, Fawzi A, Fawzi O, Frossard P. Universal adversarial perturbations. 2017 IEEE Conference on Computer Vision and Pattern Recognition 2017:86-94.
- Su JW, Vargas DV, Sakurai K. One Pixel Attack for Fooling Deep Neural Networks. IEEE Transactions on Evolutionary Computation 2019;23:828-41. [Crossref]
- Kaggle Diabetic Retinopathy Challenge. Available online: https://www.kaggle.com/c/diabetic-retinopathy-detection/data/
- Wang X, Peng Y, Lu L, Lu Z, Bagheri M, Summers RM. ChestX-ray8: Hospital-scale chest x-ray database and benchmarks on weakly-supervised classification and localization of common thorax diseases. CVPR 2017:2097-106.
- International Skin Imaging Collaboration. Available online: https://www.isic-archive.com
- Das S, Suganthan PN. Differential Evolution: A Survey of the State-of-the-Art. IEEE Transactions on Evolutionary Computation 2011;15:4-31. [Crossref]
- Wu X, Huang Y, Guan H, Niu B, Lan F. Noise Non-Differentiable in Deep Learning End-to-End Image Watermarking Models. 2023 International Conference on Culture-Oriented Science and Technology (CoST) 2023:146-51.
- He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. IEEE Conference on Computer Vision and Pattern Recognition (CVPR) 2016:770-8.
- Lei S, He F, Yuan Y, Tao D. Understanding Deep Learning via Decision Boundary. IEEE Trans Neural Netw Learn Syst 2023; Epub ahead of print. [Crossref] [PubMed]
- Jiang H, Song Q, Kernec JL. Searching the Adversarial Example in the Decision Boundary. International Conference on UK-China Emerging Technologies (UCET) 2020:1-4.