Simultaneous classification and delineation of seven types of histological growth patterns in lung adenocarcinomas using self-supervised learning and online hard patch mining
Introduction
According to recent global cancer statistics, lung cancer remains the leading cause of cancer-related fatalities worldwide (1), with lung adenocarcinomas (LUADs) being the most prevalent subtype in many regions. LUAD progresses through precancerous lesions such as atypical adenomatous hyperplasia (AAH) (2), which can advance to adenocarcinoma in situ (AIS), then to minimally invasive adenocarcinoma (MIA), and finally to invasive lung adenocarcinoma (INA) if untreated (3,4). Among INA cases, invasive non-mucinous adenocarcinomas (INMA) are the most common subtype. The 5th edition of the WHO classification recognizes various pattern-based subtypes within INMA, including lepidic, acinar, papillary, solid, complex gland, and micropapillary adenocarcinomas (as shown in Figure 1). These subtypes differ significantly in their differentiation and grading. Well-differentiated adenocarcinomas are lepidic predominant with less than 20% high-grade patterns, moderately differentiated adenocarcinomas are acinar/papillary predominant with less than 20% high-grade patterns, and high-grade adenocarcinomas are characterized by any predominant pattern with 20% or more high-grade patterns (5). Accurate identification of these growth patterns, normal tissue, and AIS is crucial, as AIS may coexist with INMA and serves as an essential indicator for pre-invasive lesions (6). Currently, the interpretation of histopathologic images relies heavily on pathologists, with manual analysis taking about one hour per image and being dependent on individual expertise (7,8).
The radiologic-pathologic correlation in the LUAD spectrum is primarily demonstrated through the imaging-based prediction and identification of lesions at various stages, ranging from noninvasive to minimally invasive and invasive. Imaging modalities, particularly low-dose computed tomography (CT) scans, have significantly enhanced the detection of sub-solid nodules (SSNs), which are strongly associated with lesions across the LUAD spectrum. Based on the IASLC/ATS/ERS classification, these lesions include AAH, AIS, MIA, and invasive pulmonary adenocarcinoma (IPA). Each of these lesions exhibits distinct imaging features, such as nodule size, shape, density, and margin characteristics, which correlate closely with pathologic aggressiveness and prognosis. Given the high prevalence of non-smoking-related lung cancer in Asian populations, low-dose CT screening plays a crucial role. Accurate imaging evaluation is therefore essential for the early and precise diagnosis of LUAD, enabling timely intervention and improving patient outcomes (9).
In recent years, deep learning (DL) has been widely used for medical image processing (10,11). The analysis of whole-slide images (WSIs) using automated or computer-aided diagnosis has gained recognition in the field of pathology (12). A WSI is typically created by digitally converting a glass tissue slide into a high-resolution virtual slide assembled from multiple images (13). Tellez et al. (14) proposed a method to compress WSIs into a low-dimensional latent space for training neural networks with WSI-level labels. However, this down-sampling process loses detailed information, affecting interpretation accuracy, and even the compressed WSI files demand substantial input/output (IO) capabilities, making clinical application challenging. To address these limitations, most studies have adopted a patch-based approach, splitting WSIs into patches of equal size for DL analysis (15-21). Inference is then performed on individual patches, and the patch-level results are aggregated into diagnostic results at the WSI level. This approach significantly reduces computational and IO costs while preserving WSI details. Furthermore, a self-supervised method trained on unannotated pathology slides has been proposed that constructs a histomorphological phenotype atlas and correlates it with lung cancer survival and immunophenotypes (22).
Despite the advantages of patch-based methods, several challenges remain. First, the sliding window operation may generate numerous background patches devoid of tissue (Figure 2), which hinder training efficiency if not removed. Second, accurately labeling all growth patterns in WSIs is difficult, resulting in sparse labeled data for patch-based DL tasks and hampering the training of high-performance models. Moreover, certain growth patterns share similar textures with others; for example, distinguishing the lepidic growth pattern from the papillary and acinar growth patterns can be challenging (23) (Figure 2). Correctly classifying such patches during model training presents a significant challenge. To address the sparsity of labeled data in patch-based LUAD classification, a semi-supervised framework with a dynamic confidence threshold and multi-teacher knowledge distillation has achieved accuracy comparable to supervised models and pathologists (24).
To address these challenges, our study presents a patch-based DL classification model integrated with self-supervised learning (SSL) and online hard patch mining (OHPM) training strategies. The main contributions of our research are as follows:
- The proposed model can identify and delineate AIS and six INMA growth patterns as well as normal lung tissue. Both the number of identified subtypes/growth patterns and the classification accuracy outperform those reported in available studies.
- We introduce pseudo-labeling and consistency regularization strategies based on image augmentation to enrich training samples with unlabeled data, enhancing training efficiency and classification accuracy.
- We develop a loss-based OHPM method that can detect hard patches with similar image features, improving the extraction of representative features and the ability to distinguish between hard patches.
- We introduce a preprocessing method based on image similarity that efficiently removes background patches (devoid of stained tissue) using the structural similarity index measure (SSIM) (25). This accelerates model training and may have broader applications in patch-based WSI analysis.
We present this article in accordance with the CLEAR reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2282/rc).
Methods
Dataset description
The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of Affiliated Beijing Chaoyang Hospital, Capital Medical University (Beijing, China; ethics approval number: 2023-8-23-73), and informed consent was obtained from all patients. The original dataset consisted of 400 histopathological sections from the Department of Pathology at Affiliated Beijing Chaoyang Hospital, Capital Medical University. Two pathologists meticulously examined all sections and reviewed relevant medical reports to confirm the presence of the targeted INMA growth patterns (acinar, complex gland, lepidic, micropapillary, papillary, solid), AIS, or normal lung tissue. The sections were scanned at 40× magnification using a KFPRO-005-EX scanner (Ningbo Konfoong Bioinformation Tech Co., Ltd., China) and a PANNORAMIC Digital Slide scanner (3DHISTECH Ltd., Hungary), with pixel sizes of 0.125 µm/pixel and 0.172 µm/pixel, respectively, and were digitized in MRXS format. Pathologists used the Automated Slide Analysis Platform to outline and annotate the contours of the six INMA growth patterns and AIS. MRXS WSIs were converted to Tagged Image File Format using SlideMaster (3DHISTECH Ltd., Hungary), with contours and annotations exported in XML format. To ensure result validity, two pathologists, each with over ten years of clinical experience, independently reviewed the annotations. Inter-observer agreement metrics were calculated and assessed before excluding any slides to ensure consistency. Ultimately, 112 WSIs with inconsistent annotations were excluded. The distribution of tissue types in the remaining 288 WSIs is shown in Figure 3.
In addition, we utilized the Dartmouth lung cancer histology dataset provided by Wei et al. (20) as an independent test set. This dataset consisted of 143 hematoxylin and eosin (H&E)-stained formalin-fixed paraffin-embedded (FFPE) LUAD WSIs from the Department of Pathology and Laboratory Medicine at Dartmouth-Hitchcock Medical Center (DHMC). Images were scanned using an Aperio AT2 and stored in TIFF format. This dataset was used for model inference to evaluate performance on a multi-center independent test set. During preprocessing, all patches were resized to 512×512 pixels. Patches with more than half of the total area labeled were considered validly labeled and used for subsequent training and evaluation.
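As a small illustration of this validity criterion, the check below (a hypothetical reading, assuming a binary per-pixel annotation mask is available for each patch; not the authors' code) keeps a patch only when more than half of its area is labeled:

```python
import numpy as np

def is_validly_labeled(label_mask: np.ndarray) -> bool:
    """Keep a patch only if more than half of its pixels carry a label,
    the criterion described above for training and evaluation."""
    return label_mask.astype(bool).mean() > 0.5
```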
Finally, we constructed an unlabeled dataset for SSL by randomly selecting 50 unlabeled patches from each WSI in the original dataset, requiring that each selected patch contain lung tissue.
The proposed workflow
Figure 4 illustrates the proposed workflow. The EfficientNet-B0 model (26) integrates SSL and OHPM to differentiate and delineate seven histological categories of LUAD alongside normal tissue. During the inference phase, the model analyzes patches containing lung tissue, and the predicted probabilities are aggregated into WSI-level probability maps based on the upper-left coordinates of the patches. The area of each growth pattern is calculated from these probability maps to facilitate grading according to the IASLC grading system. To assess the accuracy of the delineated growth patterns, the Intersection over Union (IoU) metric was computed between binary masks of tumor regions predicted by the model and those labeled by pathologists.
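As a concrete illustration of this aggregation step, the sketch below (not the authors' code; the names `aggregate_probability_map`, `patch_probs`, and `coords` are hypothetical) places each patch's predicted class probabilities on a patch-grid probability map keyed by the patch's upper-left coordinate and computes the IoU between binary masks:

```python
import numpy as np

def aggregate_probability_map(patch_probs, coords, wsi_shape,
                              patch_size=1024, n_classes=8):
    """Place each patch's class probabilities at the grid cell given by its
    upper-left coordinate, yielding a coarse WSI-level probability map."""
    grid_h, grid_w = wsi_shape[0] // patch_size, wsi_shape[1] // patch_size
    prob_map = np.zeros((grid_h, grid_w, n_classes), dtype=np.float32)
    for probs, (x, y) in zip(patch_probs, coords):
        prob_map[y // patch_size, x // patch_size] = probs
    return prob_map

def iou(pred_mask, label_mask):
    """Intersection over Union between binary masks of a tumor region."""
    inter = np.logical_and(pred_mask, label_mask).sum()
    union = np.logical_or(pred_mask, label_mask).sum()
    return inter / union if union > 0 else 0.0
```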
Dataset preprocessing
During the sliding window operation (1,024 pixels × 1,024 pixels), patches containing multiple LUAD types were excluded based on annotations from the original WSI. However, removing background patches (those devoid of tissue) proved challenging due to variations in RGB values from imperfect imaging and staining. As a result, explicit thresholding was not feasible. Instead, a similarity measure was employed to eliminate background patches. Initially, two background patches were randomly selected from each WSI, resulting in a total of 576 patches. The mean values and standard deviations for each RGB channel were calculated across all selected background patches, and a centroid was determined using k-means clustering. The SSIM (25) was then utilized to measure similarity between images, guiding the removal of background patches, as shown in Eq. [1]:
$$\mathrm{SSIM}(x,y)=\frac{\left(2\mu_x\mu_y+C_1\right)\left(2\sigma_{xy}+C_2\right)}{\left(\mu_x^2+\mu_y^2+C_1\right)\left(\sigma_x^2+\sigma_y^2+C_2\right)}\qquad[1]$$

where $x$ denotes a patch extracted using a sliding window from the WSI and $y$ is the reference image; $\mu_x$ and $\mu_y$ represent the respective mean values of $x$ and $y$; $\sigma_x$ and $\sigma_y$ represent the standard deviations of $x$ and $y$, respectively; $\sigma_{xy}$ signifies the mutual covariance between $x$ and $y$; and $C_1$ and $C_2$ are hyperparameters set to 1e−4 and 9e−4, respectively (25). In this study, the threshold for filtering background patches was set to 0.9 based on two main considerations. First, the SSIM index ranges from −1 to 1, with values closer to 1 indicating higher similarity between images, so a higher threshold effectively removes background regions while preserving lung tissue areas. Second, our validation procedure, guided by the pathologists’ expertise, checked whether screened patches were background or contained lung tissue, allowing the threshold to be adjusted to balance removal effectiveness against computational time. A threshold of 0.9 achieved this balance best.
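A minimal sketch of this filtering step is shown below, assuming the `structural_similarity` implementation from scikit-image (≥0.19 for the `channel_axis` argument); the reference image and its RGB values are illustrative stand-ins for the centroid derived from the sampled background patches:

```python
import numpy as np
from skimage.metrics import structural_similarity

# Illustrative reference background image built from a hypothetical RGB
# centroid of the 576 sampled background patches (values are assumptions).
centroid_rgb = np.array([235, 230, 238], dtype=np.uint8)
reference = np.ones((1024, 1024, 3), dtype=np.uint8) * centroid_rgb

def is_background(patch_rgb: np.ndarray, threshold: float = 0.9) -> bool:
    """Flag a patch as background when its SSIM with the reference image
    reaches the threshold (0.9 in this study)."""
    score = structural_similarity(patch_rgb, reference,
                                  channel_axis=2, data_range=255)
    return score >= threshold
```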
Subsequently, we randomly partitioned all preprocessed patches into training, validation, and testing sets at a ratio of 8:1:1, as detailed in Table 1.
Table 1
| Dataset partition | Acinar | Micropapillary | Normal | Complex gland | Lepidic | Papillary | Solid | AIS |
|---|---|---|---|---|---|---|---|---|
| Training | 2,262 | 566 | 19,008 | 477 | 8,487 | 838 | 427 | 21,904 |
| Validation | 272 | 61 | 2,356 | 66 | 1,086 | 111 | 61 | 2,753 |
| Testing | 283 | 69 | 2,449 | 68 | 1,052 | 103 | 53 | 2,670 |
| Sum | 2,817 | 696 | 23,813 | 611 | 10,625 | 1,052 | 541 | 27,307 |
AIS, adenocarcinoma in situ.
EfficientNet-based model for tissue classification
EfficientNet (26) is a family of image classification models designed to accommodate varying resource constraints by uniformly scaling depth, width, and resolution through a compound scaling approach. The baseline model, EfficientNet-B0, is optimized for accuracy and efficiency using AutoML techniques, and the family includes variants ranging from B0 to B7. We chose EfficientNet-B0 over larger variants such as B3 or B4 because it balances computational efficiency and performance: it avoids the excessive computational demands and overfitting risks of larger models, offers solid baseline accuracy, and can be deployed efficiently on devices with limited computational power, ensuring fast training and inference while maintaining acceptable accuracy for the task. In our study, the output layer of the model is replaced with a softmax layer that predicts the classification probabilities of the eight categories (shown as Px in Figure 5, where x represents the corresponding growth pattern name). The MBConv layer utilizes depthwise separable convolutions, which reduce the total number of model parameters by approximately one-third, and shortcut connections that enhance training efficiency.
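As a sketch of the head adaptation described above, assuming the torchvision implementation of EfficientNet-B0 (the paper does not specify its code base), the 1,000-way ImageNet classifier can be swapped for an 8-way output:

```python
import torch
import torch.nn as nn
from torchvision import models

# Load EfficientNet-B0 with ImageNet weights and replace the classification
# head with an 8-way output (six INMA patterns, AIS, and normal tissue).
model = models.efficientnet_b0(
    weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)
in_features = model.classifier[1].in_features   # 1280 for EfficientNet-B0
model.classifier[1] = nn.Linear(in_features, 8)

# Softmax class probabilities for one dummy RGB patch (P_x in Figure 5).
with torch.no_grad():
    logits = model(torch.randn(1, 3, 512, 512))
    probs = torch.softmax(logits, dim=1)
```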
SSL method for pseudo-label generation
The FixMatch method, as proposed in reference (27), uses an SSL approach to exploit the information in unlabeled patches by generating high-confidence pseudo-labels through consistency regularization and data augmentation. The process involves several key steps. Initially, supervised training is conducted using labeled data, where the loss value ($\mathcal{L}_s$) is computed using the cross entropy loss (CEL) metric. Weak augmentation is then applied to the unlabeled data, incorporating random horizontal and vertical flips. Strong augmentation is also performed, which includes random cropping, equalization, rotation, and flips, along with random adjustments to brightness, contrast, saturation, and hue applied with 50% probability via a color jitter operation. The trained model then predicts growth pattern probabilities on the weakly augmented unlabeled data, generating pseudo-labels from predictions exceeding a predefined confidence threshold ($\tau$). These pseudo-labels are subsequently used to compute the CEL value ($\mathcal{L}_u$) on the strongly augmented unlabeled data in a supervised manner. A pathologist verified the pseudo-labels generated for the different growth patterns to ensure their quality. Finally, the model parameters are optimized based on the combined loss $\mathcal{L}=\mathcal{L}_s+\mathcal{L}_u$. By minimizing this loss, the model is driven to produce consistent predictions across different augmentations of the same image, thereby enhancing its robustness and generalization capability. The algorithm detailing this procedure is presented in Table 2.
Table 2
| Algorithm 1 The proposed SSL algorithm |
| Input: EfficientNet-B0 model $f_\theta$; labeled batch $X$ with labels $Y$; unlabeled batch $U$; confidence threshold $\tau$. |
| $\mathcal{L}_s=\mathrm{CEL}(f_\theta(X),Y)$ (calculating CEL ($\mathcal{L}_s$) for labeled batch $X$) |
| Apply weak and strong augmentation to the unlabeled batch to obtain $U_w$ and $U_s$; $q=f_\theta(U_w)$ (predicting the probabilities for the weakly augmented batch $U_w$); $\hat{y}=\arg\max(q)$ (generating the pseudo-labels for unlabeled batch $U$) |
| $\mathcal{L}_u=\mathbb{1}(\max(q)\geq\tau)\cdot\mathrm{CEL}(f_\theta(U_s),\hat{y})$ (calculating CEL ($\mathcal{L}_u$) for the strongly augmented unlabeled batch $U_s$; the indicator $\mathbb{1}=1$ when $\max(q)\geq\tau$ is true, otherwise $\mathbb{1}=0$) |
| Optimization of the model parameters according to $\mathcal{L}=\mathcal{L}_s+\mathcal{L}_u$ |
CEL, cross entropy loss; SSL, self-supervised learning.
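The sketch below shows one FixMatch-style training step consistent with Algorithm 1; it is a minimal illustration rather than the authors' code, and the threshold value of 0.95 is the FixMatch default standing in for the paper's unspecified $\tau$:

```python
import torch
import torch.nn.functional as F

def ssl_step(model, x_lab, y_lab, u_weak, u_strong, tau=0.95):
    """One SSL step: supervised CEL (L_s) on labeled patches plus CEL (L_u)
    on strongly augmented unlabeled patches whose weakly augmented
    predictions exceed the confidence threshold tau."""
    loss_s = F.cross_entropy(model(x_lab), y_lab)

    with torch.no_grad():
        q = torch.softmax(model(u_weak), dim=1)   # probabilities on weak views
        max_prob, pseudo = q.max(dim=1)           # pseudo-labels
        mask = (max_prob >= tau).float()          # keep only confident ones

    loss_u = (F.cross_entropy(model(u_strong), pseudo,
                              reduction="none") * mask).mean()
    return loss_s + loss_u                        # combined loss L = L_s + L_u
```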
OHPM method
As noted in the introduction, patches with similar histological features across different growth patterns can impede model convergence. To tackle this issue, we propose the OHPM method, designed to enhance feature extraction from hard patches. Labeled data are divided into batches for model training. For each batch, the training loss of each patch is calculated using CEL, and the average loss of the batch serves as a threshold. Patches with losses exceeding this threshold are identified as hard patches and subjected to the strong augmentation process detailed in Section “SSL method for pseudo-label generation”. The augmented patches are combined with the corresponding hard patches to form new batches, which aid in model fine-tuning.
To validate the effectiveness of OHPM, we conducted an ablation experiment (Section “Ablation experiments”) comparing model performance with and without the OHPM method. The results demonstrate that incorporating OHPM improves the model’s ability to handle hard patches, thereby enhancing overall performance. Algorithm 2 outlines the OHPM method, and its detailed procedure is presented in Table 3.
Table 3
| Algorithm 2 The OHPM method |
| Input: EfficientNet-B0 model $f_\theta$; labeled batch $X$ with labels $Y$. $\ell_i=\mathrm{CEL}(f_\theta(x_i),y_i)$ (calculating CEL ($\ell_i$) for each patch $x_i$ in labeled batch $X$); $\bar{\ell}=\frac{1}{|X|}\sum_i\ell_i$ (calculating the average loss ($\bar{\ell}$) for labeled batch $X$) |
| Initialize hard patch list $H=\varnothing$ |
| for $x_i\in X$: |
| if $\ell_i>\bar{\ell}$: |
| apply strong augmentation to $x_i$ to obtain $x_i^{s}$ |
| append $x_i$ and $x_i^{s}$ to $H$ |
| end |
| end |
| Calculate CEL ($\mathcal{L}_h$) for the hard patches in $H$ |
| Optimize the model parameters according to $\mathcal{L}_h$ |
CEL, cross entropy loss; OHPM, online hard patch mining.
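A minimal sketch of one OHPM step consistent with Algorithm 2 is given below; `strong_augment` stands for a tensor-level strong-augmentation callable (e.g., a torchvision transforms pipeline) and is an assumption, as the paper does not publish its code:

```python
import torch
import torch.nn.functional as F

def ohpm_step(model, x, y, strong_augment):
    """Online hard patch mining: patches whose per-patch CEL exceeds the
    batch-average loss are paired with strongly augmented copies and used
    to compute an extra fine-tuning loss."""
    per_patch = F.cross_entropy(model(x), y, reduction="none")
    hard = per_patch > per_patch.mean()          # hard-patch mask
    if not hard.any():
        return per_patch.new_tensor(0.0)

    x_hard, y_hard = x[hard], y[hard]
    x_aug = strong_augment(x_hard)               # strongly augmented copies
    batch = torch.cat([x_hard, x_aug], dim=0)    # originals + augmentations
    targets = torch.cat([y_hard, y_hard], dim=0)
    return F.cross_entropy(model(batch), targets)
```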
Experimental configurations
The code was implemented using the PyTorch framework. Data preprocessing, model training, and inference were conducted on a server running CentOS 7 and equipped with dual NVIDIA V100 16GB Volta Graphics Processing Units (GPUs).
During the training phase, transfer learning was employed by initializing the model with weights pretrained on ImageNet. The batch size was set to 16, the initial learning rate was 0.0001, and the optimizer was AdamW. To address class imbalance, we employed a weighted random sampler to resample the training data based on the number of patches for each growth pattern.
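The sketch below illustrates this setup with hypothetical stand-in data (`train_labels`, `train_features`, and the placeholder classifier are illustrative, not the authors' pipeline); the batch size, learning rate, optimizer, and inverse-frequency resampling follow the settings stated above:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Hypothetical stand-ins: one class index per training patch (8 classes).
train_labels = torch.randint(0, 8, (1000,))
train_features = torch.randn(1000, 16)
train_dataset = TensorDataset(train_features, train_labels)

# Per-sample weights inversely proportional to class frequency, so rare
# growth patterns (e.g., solid, complex gland) are drawn more often.
class_counts = torch.bincount(train_labels, minlength=8).float()
sample_weights = (1.0 / class_counts)[train_labels]

sampler = WeightedRandomSampler(sample_weights,
                                num_samples=len(train_labels),
                                replacement=True)
loader = DataLoader(train_dataset, batch_size=16, sampler=sampler)

model = torch.nn.Linear(16, 8)                   # placeholder classifier
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
```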
To comprehensively evaluate the performance of the proposed framework and avoid data leakage, we adopted a case-based approach to dataset partitioning. Specifically, we used five-fold cross-validation to train and validate the models, ensuring that patches from the same case (i.e., the same WSI) were assigned exclusively to the training, validation, or testing set and never shared across sets. This strategy mitigates the risk of overfitting by preventing the model from seeing highly similar patches from the same case during both training and validation/testing. In addition, both the training and validation sets were guaranteed to contain all growth patterns included in the study. Training ran for 30 epochs, and the model with the minimum loss on the validation set was saved.
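A case-based split of this kind can be sketched with scikit-learn's GroupKFold, using the source-WSI id as the group key; the arrays below are synthetic placeholders, not the study's data:

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Hypothetical per-patch metadata: the source-WSI id serves as the group
# key so that patches from the same case never cross fold boundaries.
rng = np.random.default_rng(0)
wsi_ids = rng.integers(0, 288, size=5000)        # WSI of origin per patch
patch_labels = rng.integers(0, 8, size=5000)     # growth-pattern class

gkf = GroupKFold(n_splits=5)
for fold, (train_idx, test_idx) in enumerate(
        gkf.split(np.arange(len(patch_labels)), patch_labels,
                  groups=wsi_ids)):
    # No WSI may appear on both sides of the split (no case leakage).
    assert not set(wsi_ids[train_idx]) & set(wsi_ids[test_idx])
```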
Evaluation metrics
To assess the performance of our model, evaluation metrics including accuracy, precision, recall, F1 score (F1), Cohen’s kappa ($\kappa$), and IoU were calculated in Python using the scikit-learn library.
Accuracy is defined as:

$$\mathrm{Accuracy}=\frac{TP+TN}{TP+TN+FP+FN}\qquad[2]$$

Precision is defined as:

$$\mathrm{Precision}=\frac{TP}{TP+FP}\qquad[3]$$

Recall is defined as:

$$\mathrm{Recall}=\frac{TP}{TP+FN}\qquad[4]$$

where $TP$ denotes the number of correctly classified positive samples; $FP$ denotes the number of falsely classified positive samples; $TN$ indicates the number of correctly classified negative samples; and $FN$ is the number of falsely classified negative samples.

F1 is defined as:

$$F1=\frac{2\times\mathrm{Precision}\times\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}\qquad[5]$$

$\kappa$ is defined as:

$$\kappa=\frac{p_o-p_e}{1-p_e}\qquad[6]$$

where $p_o$ denotes the relative observed agreement among raters and $p_e$ is the hypothetical probability of chance agreement (28).

IoU is defined as:

$$\mathrm{IoU}=\frac{\left|A_L\cap A_P\right|}{\left|A_L\cup A_P\right|}\qquad[7]$$

where $A_L$ and $A_P$ represent the areas of the labeled and predicted regions, respectively.
95% confidence intervals (CIs) for precision, recall, F1, and IoU were estimated by bootstrap resampling (29) with 100 resamples.
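A sketch of this procedure using scikit-learn is given below; `bootstrap_ci` is a hypothetical helper illustrating the percentile bootstrap, not the authors' code:

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

def bootstrap_ci(y_true, y_pred, n_boot=100, alpha=0.05, seed=0):
    """Percentile-bootstrap 95% CIs for macro precision, recall, and F1,
    resampling prediction pairs with replacement n_boot times."""
    rng = np.random.default_rng(seed)
    stats, n = [], len(y_true)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample with replacement
        p, r, f1, _ = precision_recall_fscore_support(
            y_true[idx], y_pred[idx], average="macro", zero_division=0)
        stats.append((p, r, f1))
    lo, hi = np.quantile(np.array(stats), [alpha / 2, 1 - alpha / 2], axis=0)
    return lo, hi   # lower/upper bounds for (precision, recall, F1)
```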
Results
Comparison with peer studies
On the dataset of this study, we trained the methods proposed by Gertych et al. (19) and Wei et al. (20) with the same parameter settings used for our model and compared the five-fold cross-validation results. Tables 4,5 demonstrate that the proposed method achieves superior performance in all evaluation metrics across all categories. Additionally, the model’s parameter size is only 3.98 M, approximately 64% fewer parameters than the model proposed by Wei et al. (11.18 M). This lightweight model can analyze a WSI in under 1 minute and can consequently be deployed on edge computing devices to improve prediction efficiency.
Table 4
| Evaluation metrics | Algorithms | Acinar | Micropapillary | Normal | Complex gland |
|---|---|---|---|---|---|
| Accuracy | Gertych et al. (19) | 0.76±0.11 | 0.82±0.12 | 0.84±0.10 | 0.76±0.11 |
| Wei et al. (20) | 0.86±0.10 | 0.89±0.08 | 0.83±0.09 | 0.82±0.11 | |
| Our model | 0.93±0.06† | 0.95±0.04† | 0.98±0.01† | 0.92±0.03† | |
| Precision | Gertych et al. (19) | 0.81±0.10 | 0.80±0.13 | 0.83±0.12 | 0.77±0.11 |
| Wei et al. (20) | 0.88±0.09 | 0.87±0.11 | 0.86±0.07 | 0.84±0.09 | |
| Our model | 0.96±0.03† | 0.93±0.05† | 0.97±0.02† | 0.94±0.06† | |
| Recall | Gertych et al. (19) | 0.79±0.08 | 0.81±0.10 | 0.80±0.09 | 0.78±0.06 |
| Wei et al. (20) | 0.86±0.07 | 0.85±0.06 | 0.84±0.06 | 0.83±0.10 | |
| Our model | 0.95±0.01† | 0.97±0.02† | 0.96±0.03† | 0.96±0.02† | |
| F1 | Gertych et al. (19) | 0.80±0.08 | 0.80±0.11 | 0.81±0.12 | 0.77±0.10 |
| Wei et al. (20) | 0.87±0.07 | 0.86±0.09 | 0.85±0.06 | 0.83±0.07 | |
| Our model | 0.95±0.02† | 0.95±0.03† | 0.96±0.01† | 0.95±0.02† | |
| AUC | Gertych et al. (19) | 0.92±0.04 | 0.94±0.05 | 0.96±0.03 | 0.95±0.02 |
| Wei et al. (20) | 0.97±0.02 | 0.98±0.03 | 0.98±0.02 | 0.96±0.01 | |
| Our model | 0.997±0.002† | 0.998±0.001† | 0.996±0.002† | 0.994±0.001† |
Data are presented as mean ± standard deviation. †, the best results. AUC, area under the curve.
Table 5
| Evaluation metrics | Algorithms | Lepidic | Papillary | Solid | AIS |
|---|---|---|---|---|---|
| Accuracy | Gertych et al. (19) | 0.81±0.11 | 0.78±0.10 | 0.87±0.09 | 0.88±0.11 |
| Wei et al. (20) | 0.82±0.11 | 0.79±0.11 | 0.89±0.11 | 0.90±0.11 | |
| Our model | 0.90±0.01† | 0.93±0.05† | 0.98±0.01† | 0.95±0.03† | |
| Precision | Gertych et al. (19) | 0.79±0.12 | 0.76±0.10 | 0.82±0.09 | 0.86±0.08 |
| Wei et al. (20) | 0.88±0.10 | 0.84±0.07 | 0.88±0.06 | 0.90±0.07 | |
| Our model | 0.90±0.01† | 0.87±0.10† | 0.98±0.01† | 0.98±0.02† | |
| Recall | Gertych et al. (19) | 0.83±0.10 | 0.77±0.12 | 0.83±0.11 | 0.84±0.13 |
| Wei et al. (20) | 0.90±0.08 | 0.88±0.10 | 0.92±0.05 | 0.89±0.06 | |
| Our model | 0.91±0.04† | 0.97±0.01† | 0.99±0.01† | 0.96±0.02† | |
| F1 | Gertych et al. (19) | 0.81±0.11 | 0.76±0.12 | 0.82±0.08 | 0.85±0.07 |
| Wei et al. (20) | 0.89±0.07 | 0.86±0.08 | 0.90±0.06 | 0.89±0.05 | |
| Our model | 0.90±0.03† | 0.92±0.04† | 0.98±0.01† | 0.97±0.02† | |
| AUC | Gertych et al. (19) | 0.94±0.03 | 0.93±0.02 | 0.91±0.04 | 0.97±0.01 |
| Wei et al. (20) | 0.96±0.02 | 0.97±0.01 | 0.95±0.02 | 0.98±0.01 | |
| Our model | 0.992±0.003† | 0.998±0.001† | 0.998±0.001† | 0.996±0.002† |
Data are presented as mean ± standard deviation. †, the best results. AIS, adenocarcinoma in situ; AUC, area under the curve.
Performance of the best classification model
The mean and 95% CI of the precision, recall, F1, and IoU of the model with minimal loss in the five-fold cross-validation are presented for each growth pattern and the normal category in Table 6. The micropapillary, normal, complex gland, solid, and AIS categories exhibited precision and recall values of at least 0.96, indicating low rates of false positives and false negatives. However, the acinar and papillary categories displayed relatively lower precision, indicating that the model tended to misclassify patches of other categories as these patterns. The IoU for each category exceeded 0.83, except for complex gland (0.77), signifying strong agreement between the model predictions and the pathologists’ annotations for most growth patterns at the WSI level. Figure 6 illustrates examples of delineated growth patterns at the WSI level.
Table 6
| Histologic subtype | Precision (95% CI) | Recall (95% CI) | F1 (95% CI) | IoU (95% CI) |
|---|---|---|---|---|
| Acinar | 0.89 (0.5, 1) | 0.95 (0.5, 1) | 0.9 (0.67, 1) | 0.83 (0.75, 0.89) |
| Micropapillary | 0.99 (0.99, 1) | 0.97 (0.93, 1) | 0.98 (0.78, 1) | 0.83 (0.76, 0.87) |
| Normal | 0.99 (0.95, 1) | 1 (0.96, 1) | 0.99 (0.98, 1) | 0.99 (0.97, 0.99) |
| Complex gland | 1 (0.99, 1) | 0.99 (0.99, 1) | 0.99 (0.99, 1) | 0.77 (0.71, 0.82) |
| Lepidic | 0.9 (0.75, 1) | 0.91 (0.71, 1) | 0.9 (0.78, 1) | 0.87 (0.79, 0.94) |
| Papillary | 0.87 (0.5, 1) | 0.97 (0.67, 1) | 0.9 (0.67, 1) | 0.86 (0.81, 0.92) |
| Solid | 1 (0.99, 1) | 0.99 (0.99, 1) | 0.99 (0.99, 1) | 0.92 (0.88, 0.96) |
| AIS | 0.98 (0.92, 1) | 0.96 (0.89, 1) | 0.97 (0.92, 1) | 0.93 (0.89, 0.98) |
AIS, adenocarcinoma in situ; CI, confidence interval; IoU, Intersection over Union.
Comparison with the outlined results by pathologists
We compared the inference results of the proposed model with those of experienced pathologists. The arithmetic mean of Cohen’s kappa scores over 100-sample tests was 0.96 at the patch level (Table 7). We then aggregated the patch-level inference results and calculated Cohen’s kappa at the WSI level, which reached 0.97 (Table 7). This indicates that the proposed model consistently achieved results comparable to those of experienced pathologists. Figure 7 presents results from the proposed model and the pathologists.
Table 7
| Consistency type | Cohen’s kappa (95% CI) |
|---|---|
| Consistency of model predictions with pathologist annotations (patch level) | 0.96 (0.93, 0.99) |
| Consistency of model predictions with pathologist annotations (WSI level) | 0.97 (0.94, 0.98) |
Average Cohen’s kappa is calculated by averaging kappa scores for all subsamples. CI, confidence interval; WSI, whole-slide image.
Ablation experiments
To validate the performance gains from SSL and OHPM, three training configurations were used: first, the model trained on labeled patches without SSL or OHPM (baseline); second, the model trained with SSL using unlabeled patches (baseline + SSL); and third, the OHPM method integrated into SSL training (baseline + SSL + OHPM). Mean precision, recall, and F1 scores, along with their 95% CIs, are presented in Figure 8.
Impact of the SSL method
Compared with the baseline, the introduction of the SSL method improved the mean precision, recall, and F1 score for all categories. For the micropapillary, complex gland, and papillary categories, the improvements exceeded 0.1. The CI lengths for precision, recall, and F1 score also narrowed significantly relative to the baseline, suggesting that the SSL method enhanced the classification precision of the micropapillary, complex gland, papillary, and solid categories, with the narrower CIs indicating improved stability of the model’s classification. Table 8 reports the agreement, measured by accuracy, between the model’s predicted labels for the unlabeled samples included in training and the pathologist’s labels. The agreement for each subtype is at least 0.92, indicating that the pseudo-labels generated by the adopted SSL strategy are consistent with the pathologist’s diagnoses and effective in improving the model’s performance.
Table 8
| Histologic subtype | Accuracy |
|---|---|
| Acinar | 0.92 |
| Micropapillary | 0.93 |
| Normal | 0.96 |
| Complex gland | 0.92 |
| Lepidic | 0.94 |
| Papillary | 0.95 |
| Solid | 0.95 |
| AIS | 0.96 |
AIS, adenocarcinoma in situ.
Impact of the OHPM method
Compared to the Baseline + SSL model, the introduction of the OHPM method further improved the mean values of precision, recall, and F1 score for all categories. Specifically, the precision and F1 score values for the acinar and lepidic categories, as well as the recall of the lepidic and AIS categories, exhibited improvements greater than 0.1. Additionally, compared to baseline + SSL, the CI lengths for precision, recall, and F1 score were reduced to varying degrees. This suggests that the introduction of the OHPM method improved acinar, lepidic, and AIS classification precision, while the narrowing of the CI indicates further enhancement in the model’s classification stability.
Figure 9 illustrates the changes in loss values and validation accuracies during training under the different strategies. The red and green curves depict the training outcomes with and without the OHPM method, respectively. Incorporating the OHPM method does not significantly increase computational overhead or training time; instead, it yields a smoother decrease in training loss and a steadier increase in validation accuracy, resulting in lower final training loss and higher validation accuracy. These findings suggest that the OHPM method both improves the training process and boosts overall performance.
Comparison of different backbone networks
As shown in Table 9, the proposed model achieves an overall classification accuracy of 95%, a precision of 95%, a recall of 96%, and an F1 score of 95%, outperforming ResNet-50 (30), DenseNet-169 (31), and ViT-B (32). Compared to ResNet-50, our model improves accuracy by 4%, precision by 3%, recall by 4%, and F1 score by 3%. Compared to DenseNet-169, it improves accuracy, precision, recall, and F1 score by 4% each. Compared to ViT-B, it improves accuracy by 2%, precision by 2%, recall by 3%, and F1 score by 2%.
Table 9
| Model | Accuracy | Precision | Recall | F1 |
|---|---|---|---|---|
| ResNet-50 | 0.91 | 0.92 | 0.92 | 0.92 |
| DenseNet-169 | 0.91 | 0.91 | 0.92 | 0.91 |
| ViT-B | 0.93 | 0.93 | 0.93 | 0.93 |
| Our model | 0.95† | 0.95† | 0.96† | 0.95† |
†, the best results.
Impact of each component in the SSL method
Table 10 compares how different component compositions in the SSL approach affect the model’s classification results. The combination of pseudo-labeling and consistency regularization achieves the highest accuracy (95.54%), precision (95.18%), recall (87.56%), and F1 score (94.89%). In comparison, using only pseudo-labeling results in an accuracy of 85.23%, precision of 70.65%, recall of 72.33%, and F1 score of 71.48%. When only consistency regularization is applied, the accuracy is 86.50%, precision is 74.20%, recall is 73.38%, and F1 score is 73.79%. These results indicate that integrating both pseudo-labeling and consistency regularization substantially improves the model’s ability to classify various growth patterns of LUADs. The findings emphasize the importance of combining these strategies to maximize their individual strengths. This integration not only enhances the model’s accuracy but also achieves a better balance between precision and recall, as evidenced by the higher F1 score.
Table 10
| Method | Accuracy (%) | Precision (%) | Recall (%) | F1 (%) |
|---|---|---|---|---|
| Only pseudo-labeling | 85.23 | 70.65 | 72.33 | 71.48 |
| Only consistency regularization | 86.50 | 74.20 | 73.38 | 73.79 |
| Pseudo-labeling + consistency regularization | 95.54† | 95.18† | 87.56† | 94.89† |
†, the best results. SSL, self-supervised learning.
ANOVA
We performed an analysis of variance (ANOVA) to compare the performance of the three models—baseline, baseline + SSL, and baseline + SSL + OHPM—across the LUAD classification categories, examining precision, recall, and F1 score. The ANOVA revealed substantial performance differences among the models for the majority of categories. Notably, the baseline + SSL + OHPM model consistently achieved higher precision, recall, and F1 scores across most categories than its counterparts, implying that incorporating SSL and OHPM significantly refines the training process and yields more precise and dependable classification results. The statistical significance of these differences highlights the potential of these approaches to improve diagnostic precision in histopathological image analysis.
Test results on public datasets
In this study, the efficacy of the proposed method was evaluated on the Dartmouth lung cancer histology dataset. Its performance was compared with that of DiPalma et al. (33) and Sheikh et al. (34) using the same evaluation metrics employed in the present study. The detailed comparison is presented in Table 11, with all values extracted directly from the respective publications. The results demonstrate that our method achieved superior performance across all evaluation metrics for the various subtypes. Specifically, our method exhibited performance enhancements of 1.44%, 19.34%, 6.78%, and 19.16% over DiPalma et al. (33), and 0.99%, 1.04%, and 0.84% over Sheikh et al. (34). These improvements underscore the superiority of our method and validate its robust generalization capability.
Discussion
The present model exhibits better performance
Our model consistently outperformed those of the available studies across the evaluation metrics employed. This improvement can be attributed to several factors, including our model selection and the integration of SSL and OHPM into our study.
Firstly, we used the EfficientNet model architecture, which concurrently optimizes resolution, depth, and width, circumventing the limitations of optimizing each individually (26). As a result, it achieves better classification performance in several medical image classification tasks (35,36). Given the large size of WSIs, it is crucial to employ lightweight yet highly accurate classification models for sliding window-based WSI analysis methods. We chose EfficientNet-B0 as our classification network, with a model parameter size not exceeding 3.98 M, meeting the requirements for timely clinical analysis.
Secondly, the application of SSL augmented the efficiency of supervised training by capitalizing on a large volume of unlabeled data in tandem with a smaller labeled dataset. The critical aspect lies in the model’s accuracy in generating pseudo-labels for unlabeled data. In this study, we employed a consistency regularization scheme for the unlabeled data by integrating into the objective function the disparity between the predictions for two augmented versions of the same source image.
Lastly, our study revealed that certain morphological similarities among histological growth patterns in LUAD could yield inaccurate classification outcomes. The OHPM method led to a 2% improvement in classification precision for acinar and papillary categories (Figure 8). This outcome underscored the effectiveness of this approach, consistent with findings of Xu et al. (37). The method emulated a human learning approach, where repetitive learning of complex events resulted in the establishment of more connections among diverse neurons in the brain. Consequently, an increased number of conditioned reflex circuits formed, activating previously dormant synapses, which ultimately enhanced synaptic activity and facilitated a wider range of information transfer (38).
SSL and OHPM are effective solutions to improve model performance
Ablation experiments demonstrated that incorporating SSL and OHPM had a positive impact on precision, recall, F1 score, and model stability. Additionally, classification accuracy was compared before and after adopting SSL and OHPM (Figure 10). Figure 10A depicts the confusion matrix for the baseline model, revealing classification errors exceeding 0.1 (solid red squares): the model tended to classify acinar as lepidic, complex gland as acinar, and papillary as lepidic, resulting in lower classification accuracy for these three categories. In contrast, the baseline + SSL + OHPM model achieved accuracy exceeding 0.9 for every category (Figure 10B). This approach reduced the probability of misclassifying acinar, complex gland, and papillary as lepidic by 7%, 9%, and 12%, respectively (red dashed squares in Figure 10B), while classification accuracy for acinar, complex gland, and papillary increased by 9%, 16%, and 16%, respectively. However, challenges persisted for acinar, complex gland, lepidic, and AIS classification due to the histological similarities among these growth patterns (24,39).
Clinical applicability in terms of different imaging devices
Although our study employed an internal dataset for training and testing, the proposed method for classifying LUAD growth patterns holds promise for clinical application for two key reasons. Firstly, the hematoxylin-eosin staining used in this study is standard in most clinical practice for similar purposes (40), indicating that the proposed model is applicable to most current clinical histopathology diagnoses. Furthermore, morphological characteristics play a central role in the WSI-based interpretation of LUAD growth patterns, and the faithful reproduction of colors in WSI images relies on the precision of microscopes and digital scanners. The U.S. Food and Drug Administration has issued guidance emphasizing the consistent and reliable display of digital microscope images (41). In addition, the International Color Consortium recommends regular color calibration and compliance checks for all medical-grade displays, with a suggested interval of every 50 days, as these displays can change over time (42). These standardizations have been adopted by most manufacturers and should ensure the stability of the proposed model when dealing with images acquired from various imaging devices. Consequently, this method can be broadly applied in various clinical scenarios for LUAD staging.
Limitation and future work
The average IoU values over 100-sample tests for acinar, micropapillary, complex gland, lepidic, and papillary were below 0.9. This can be attributed to the relatively small labeled regions of these subtypes in our dataset, which made the IoU computation more sensitive to segmentation accuracy in these regions. Patches located at the boundaries of labeled regions covered only a fraction of the labeled area, leading to smaller numerator values in the IoU calculation and thus inevitably lower IoU values than for patches lying entirely within the labeled region. Because the labeled regions for these subtypes were small, the number of boundary patches did not significantly differ from the number of fully covered patches, so the IoU of boundary patches had a substantial impact on the overall IoU of the region. In future work, we plan to explore multi-scale learning to address the poor IoU of boundary patches by fusing features at different scales through methods such as pyramid pooling.
The similarity in histological features among the acinar, complex gland, lepidic, and AIS categories contributes to misclassification among these categories. ViT (32) has demonstrated superior performance to convolutional neural networks (CNNs) in certain fine-grained image classification tasks, primarily owing to its adaptive attention mechanism (43,44). In future research, we intend to explore fusing CNNs with ViT to enhance classification accuracy while keeping the model parameters in check (45,46). Furthermore, the patch-based classification method poses challenges in accurately delineating the boundaries of growth patterns, leading to slight inaccuracies in calculating the area of a specific growth pattern. We will develop a pixel-level segmentation model that allows precise area calculation, ultimately improving the accuracy of LUAD staging.
Conclusions
In this study, we developed an SSL framework to identify and delineate seven lung tumor growth patterns and normal lung tissue in WSIs. By leveraging weak data augmentation, we generated high-confidence pseudo-labels for unlabeled samples, thereby maximizing the use of unlabeled data. To address the challenge of feature extraction from difficult patches, we integrated an OHPM strategy. Extensive experiments confirmed the effectiveness of our approach: the model achieved performance comparable to pathologists, with an overall classification accuracy of 95.3%±5.5% and a recall of 96.8%±2.9%. Furthermore, the method’s ability to delineate growth areas, with an overall IoU of 87.5%±6.9%, contributed significantly to the diagnostic process and is particularly beneficial for assessing IPAs. As an automated solution, our model analyzes a WSI in approximately 1 minute on a standard workstation, roughly 60 times faster than manual interpretation. This approach reduces the workload of pathologists, enhances diagnostic accuracy, and advances computer-aided histological image analysis. The integration of SSL and OHPM improves model performance and shows promise for deployment in diverse clinical settings with varying resource constraints.
Acknowledgments
None.
Footnote
Reporting Checklist: The authors have completed the CLEAR reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-2282/rc
Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-24-2282/dss
Funding: This work was supported by
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-2282/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The study was approved by the Ethics Committee of Affiliated Beijing Chaoyang Hospital, Capital Medical University (Beijing, China; ethics approval number: 2023-8-23-73). Informed consent was taken from all the patients.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A, Bray F. Global Cancer Statistics 2020: GLOBOCAN Estimates of Incidence and Mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J Clin 2021;71:209-49. [Crossref] [PubMed]
- Wang Z, Li Z, Zhou K, Wang C, Jiang L, Zhang L, Yang Y, Luo W, Qiao W, Wang G, Ni Y, Dai S, Guo T, Ji G, Xu M, Liu Y, Su Z, Che G, Li W. Deciphering cell lineage specification of human lung adenocarcinoma with single-cell RNA sequencing. Nat Commun 2021;12:6500. [Crossref] [PubMed]
- Amin MB, Paner GP, Alvarado-Cabrero I, Young AN, Stricker HJ, Lyles RH, Moch H. Chromophobe renal cell carcinoma: histomorphologic characteristics and evaluation of conventional pathologic prognostic parameters in 145 cases. Am J Surg Pathol 2008;32:1822-34. [Crossref] [PubMed]
- Robbins P, Pinder S, de Klerk N, Dawkins H, Harvey J, Sterrett G, Ellis I, Elston C. Histological grading of breast carcinomas: a study of interobserver agreement. Hum Pathol 1995;26:873-9. [Crossref] [PubMed]
- Moreira AL, Ocampo PSS, Xia Y, Zhong H, Russell PA, Minami Y, et al. A Grading System for Invasive Pulmonary Adenocarcinoma: A Proposal From the International Association for the Study of Lung Cancer Pathology Committee. J Thorac Oncol 2020;15:1599-610. [Crossref] [PubMed]
- Cagle PT, Miller RA, Allen TC. 17 - Nonneuroendocrine Carcinomas (Excluding Sarcomatoid Carcinoma) and Salivary Gland Analogue Tumors of the Lung. In: Leslie KO, Wick MR, editors. Practical Pulmonary Pathology: A Diagnostic Approach (Third Edition). Third Edition. Elsevier; 2018. p. 573-596.e6.
- Krupinski EA, Graham AR, Weinstein RS. Characterizing the development of visual search expertise in pathology residents viewing whole slide images. Hum Pathol 2013;44:357-64. [Crossref] [PubMed]
- Arvaniti E, Fricker KS, Moret M, Rupp N, Hermanns T, Fankhauser C, Wey N, Wild PJ, Rüschoff JH, Claassen M. Automated Gleason grading of prostate cancer tissue microarrays via deep learning. Sci Rep 2018;8:12054. [Crossref] [PubMed]
- Wu FZ, Wu YJ, Tang EK. An integrated nomogram combined semantic-radiomic features to predict invasive pulmonary adenocarcinomas in subjects with persistent subsolid nodules. Quant Imaging Med Surg 2023;13:654-68. [Crossref] [PubMed]
- Xu X, Li C, Fan X, Lan X, Lu X, Ye X, Wu T. Attention Mask R-CNN with edge refinement algorithm for identifying circulating genetically abnormal cells. Cytometry A 2023;103:227-39. [Crossref] [PubMed]
- Li C, Yao G, Xu X, Yang L, Zhang Y, Wu T, Sun J. DCSegNet: Deep Learning Framework Based on Divide-and-Conquer Method for Liver Segmentation. IEEE Access 2020;8:146838-46.
- Kumar N, Gupta R, Gupta S. Whole Slide Imaging (WSI) in Pathology: Current Perspectives and Future Directions. J Digit Imaging 2020;33:1034-40. [Crossref] [PubMed]
- Hanna MG, Reuter VE, Hameed MR, Tan LK, Chiang S, Sigel C, Hollmann T, Giri D, Samboy J, Moradel C, Rosado A, Otilano JR 3rd, England C, Corsale L, Stamelos E, Yagi Y, Schüffler PJ, Fuchs T, Klimstra DS, Sirintrapun SJ. Whole slide imaging equivalency and efficiency study: experience at a large academic center. Mod Pathol 2019;32:916-28. [Crossref] [PubMed]
- Tellez D, Litjens G, van der Laak J, Ciompi F. Neural Image Compression for Gigapixel Histopathology Image Analysis. IEEE Trans Pattern Anal Mach Intell 2021;43:567-78. [Crossref] [PubMed]
- Alsubaie N, Raza SEA, Snead D, Rajpoot NM. Growth Pattern Fingerprinting for Automatic Analysis of Lung Adenocarcinoma Overall Survival. IEEE Access 2023;11:23335-46.
- Campanella G, Hanna MG, Geneslaw L, Miraflor A, Werneck Krauss Silva V, Busam KJ, Brogi E, Reuter VE, Klimstra DS, Fuchs TJ. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat Med 2019;25:1301-9. [Crossref] [PubMed]
- Chan TH, Cendra FJ, Ma L, Yin G, Yu L. Histopathology Whole Slide Image Analysis with Heterogeneous Graph Representation Learning. In: 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2023. p. 15661-70.
- Coudray N, Ocampo PS, Sakellaropoulos T, Narula N, Snuderl M, Fenyö D, Moreira AL, Razavian N, Tsirigos A. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat Med 2018;24:1559-67. [Crossref] [PubMed]
- Gertych A, Swiderska-Chadaj Z, Ma Z, Ing N, Markiewicz T, Cierniak S, Salemi H, Guzman S, Walts AE, Knudsen BS. Convolutional neural networks can accurately distinguish four histologic growth patterns of lung adenocarcinoma in digital slides. Sci Rep 2019;9:1483. [Crossref] [PubMed]
- Wei JW, Tafe LJ, Linnik YA, Vaickus LJ, Tomita N, Hassanpour S. Pathologist-level classification of histologic patterns on resected lung adenocarcinoma slides with deep neural networks. Sci Rep 2019;9:3358. [Crossref] [PubMed]
- Yang H, Chen L, Cheng Z, Yang M, Wang J, Lin C, Wang Y, Huang L, Chen Y, Peng S, Ke Z, Li W. Deep learning-based six-type classifier for lung cancer and mimics from histopathological whole slide images: a retrospective study. BMC Med 2021;19:80. [Crossref] [PubMed]
- Claudio Quiros A, Coudray N, Yeaton A, Yang X, Liu B, Le H, Chiriboga L, Karimkhan A, Narula N, Moore DA, Park CY, Pass H, Moreira AL, Le Quesne J, Tsirigos A, Yuan K. Mapping the landscape of histomorphological cancer phenotypes using self-supervised learning on unannotated pathology slides. Nat Commun 2024;15:4596. [Crossref] [PubMed]
- Kuhn E, Morbini P, Cancellieri A, Damiani S, Cavazza A, Comin CE. Adenocarcinoma classification: patterns and prognosis. Pathologica 2018;110:5-11.
- Wang Q, Zhang Y, Lu J, Li C, Zhang Y. Semi-supervised lung adenocarcinoma histopathology image classification based on multi-teacher knowledge distillation. Phys Med Biol 2024;
- Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 2004;13:600-12. [Crossref] [PubMed]
- Tan M, Le Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. In: Chaudhuri K, Salakhutdinov R, editors. Proceedings of the 36th International Conference on Machine Learning. PMLR; 2019. p. 6105-14.
- Sohn K, Berthelot D, Li C-L, Zhang Z, Carlini N, Cubuk ED, Kurakin A, Zhang H, Raffel C. FixMatch: simplifying semi-supervised learning with consistency and confidence. In: Proceedings of the 34th International Conference on Neural Information Processing Systems. 2020.
- Cohen J. A Coefficient of Agreement for Nominal Scales. Educ Psychol Meas 1960;20:37-46.
- Efron B. Bootstrap Methods: Another Look at the Jackknife. In: Kotz S, Johnson NL, editors. Breakthroughs in Statistics: Methodology and Distribution. New York, NY: Springer New York; 1992. p. 569-93.
- He K, Zhang X, Ren S, Sun J. Deep Residual Learning for Image Recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. p. 770-8.
- Huang G, Liu Z, Van Der Maaten L, Weinberger KQ. Densely Connected Convolutional Networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2017. p. 2261-9.
- Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, Dehghani M, Minderer M, Heigold G, Gelly S, Uszkoreit J, Houlsby N. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In: International Conference on Learning Representations. 2021.
- DiPalma J, Suriawinata AA, Tafe LJ, Torresani L, Hassanpour S. Resolution-based distillation for efficient histology image classification. Artif Intell Med 2021;119:102136. [Crossref] [PubMed]
- Sheikh TS, Kim JY, Shim J, Cho M. Unsupervised Learning Based on Multiple Descriptors for WSIs Diagnosis. Diagnostics (Basel) 2022;12:1480. [Crossref] [PubMed]
- Huang C, Wang W, Zhang X, Wang SH, Zhang YD. Tuberculosis Diagnosis Using Deep Transferred EfficientNet. IEEE/ACM Trans Comput Biol Bioinform 2023;20:2639-46. [Crossref] [PubMed]
- Batool A, Byun YC. Lightweight EfficientNetB3 Model Based on Depthwise Separable Convolutions for Enhancing Classification of Leukemia White Blood Cell Images. IEEE Access 2023;11:37203-15.
- Xu X, Li C, Lan X, Fan X, Lv X, Ye X, Wu T. A Lightweight and Robust Framework for Circulating Genetically Abnormal Cells (CACs) Identification Using 4-Color Fluorescence In Situ Hybridization (FISH) Image and Deep Refined Learning. J Digit Imaging 2023;36:1687-700. [Crossref] [PubMed]
- Bengtsson SL, Nagy Z, Skare S, Forsman L, Forssberg H, Ullén F. Extensive piano practicing has regionally specific effects on white matter development. Nat Neurosci 2005;8:1148-50. [Crossref] [PubMed]
- Borczuk AC. Updates in grading and invasion assessment in lung adenocarcinoma. Mod Pathol 2022;35:28-35. [Crossref] [PubMed]
- Morrison LE, Lefever MR, Lewis HN, Kapadia MJ, Bauer DR. Conventional histological and cytological staining with simultaneous immunohistochemistry enabled by invisible chromogens. Lab Invest 2022;102:545-53. [Crossref] [PubMed]
- fda.gov [Internet]. Technical performance assessment of digital pathology whole slide imaging devices: Guidance for industry and food and drug administration staff. Available online: https://www.fda.gov/regulatory-information/search-fda-guidance-documents/technical-performance-assessment-digital-pathology-whole-slide-imaging-devices
- color.org [Internet]. Visualization of medical content on color display systems. Available online: https://www.color.org/whitepapers.xalter
- Zhang Y, Cao J, Zhang L, Liu X, Wang Z, Ling F, Chen W. A free lunch from ViT: adaptive attention multi-scale fusion Transformer for fine-grained visual recognition. In: ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). 2022. p. 3234-8.
- He J, Chen J, Liu S, Kortylewski A, Yang C, Bai Y, Wang C, Yuille AL. TransFG: A Transformer Architecture for Fine-grained Recognition. In: Proceedings of the AAAI Conference on Artificial Intelligence. 2022. p. 852-60.
- Guo J, Han K, Wu H, Xu C, Tang Y, Xu C, Wang Y. CMT: Convolutional Neural Networks Meet Vision Transformers. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). 2022. p. 12175-85.
- Li K, Wang Y, Zhang J, Gao P, Song G, Liu Y, Li H, Qiao Y. UniFormer: Unifying Convolution and Self-Attention for Visual Recognition. IEEE Trans Pattern Anal Mach Intell 2023;45:12581-600. [Crossref] [PubMed]

