Radiomics based on 18F-FDG PET for predicting treatment response and prognosis in newly diagnosed diffuse large B-cell lymphoma patients: do lesion selection and segmentation methods matter?
Introduction
Diffuse large B-cell lymphoma (DLBCL) is the most common histologic subtype of non-Hodgkin lymphoma (NHL), accounting for approximately 30–40% of NHLs (1). Rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone (R-CHOP) therapy is the first-line regimen for treating DLBCL (2). However, up to 15% of patients experience relapse or are refractory to first-line therapy, with a median survival not exceeding one year (3). Ideally, these high-risk patients should be identified before therapy begins. Over the past two decades, the International Prognostic Index (IPI) has been recognized as a prognostic model (4). However, the IPI predicts refractory disease poorly, possibly because it carries no information on intratumoral functional and metabolic profiles (5,6). Accurate prediction of treatment outcome therefore remains a significant clinical challenge.
18F-fluorodeoxyglucose (18F-FDG) positron emission tomography/computed tomography (PET/CT) has become an established imaging modality for patients with DLBCL. Several studies have reported that metabolic tumor volume (MTV) and total lesion glycolysis (TLG) play important predictive roles in patients with DLBCL (7-9). However, these parameters do not reflect tumor heterogeneity, which ultimately contributes to treatment resistance and poor prognosis (10). Quantifying heterogeneity within a tumor can be achieved through radiomics analysis of PET images (11).
Radiomics refers to the extraction of large volumes of quantitative data from medical images to build predictive models (12). Machine learning is a branch of artificial intelligence that is based on the development and training of algorithms, in which computers learn from the data and perform predictions without previous specific programming (13). Radiomics-based machine learning has been applied for differential diagnosis, histological classification, treatment response and prognostic prediction in a variety of tumors (14-17), including DLBCL (18-20). Currently, radiomics analyses in DLBCL are based on predefined tumor segmentations, but the best segmentation threshold is still a matter of debate. For example, a threshold of 41% of the maximum standardized uptake value (SUVmax) has been validated by Sasanelli et al. as prognostic in DLBCL (21). Nevertheless, Ilyas et al. reported that an SUV threshold of 2.5 achieved the best interobserver agreement and was the easiest to apply (22). In addition, published reports have extracted radiomics features from different targets: some studies have used the hottest lesion (7), whereas others have used the largest lesion (23,24) or tumor segmentations at the patient level (25). Eertink et al. demonstrated that radiomics features at the patient level are more predictive than those of the hottest or largest lesion (26). However, another study reported that there were no significant differences between models based on different lesion selection approaches (27).
Therefore, the purpose of this study was to assess the effects of lesion selection and segmentation methods on the predictive power of baseline 18F-FDG PET radiomics features in DLBCL patients for treatment response and prognosis via machine learning techniques. Additionally, we investigated the potential value of adding radiomics features to the clinical features. We present this article in accordance with the TRIPOD reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-24-585/rc).
Methods
Study populations
The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Ethics Committee of West China Hospital (No. 2023-954), and informed consent was waived because this was a retrospective study. In this study, patients with lymphoma who underwent baseline 18F-FDG PET/CT examination at West China Hospital between January 2015 and December 2021 were selected. The inclusion criteria were as follows: (I) pathologically confirmed DLBCL; (II) treatment with R-CHOP or rituximab, etoposide, prednisone, vincristine, cyclophosphamide, and doxorubicin (R-EPOCH) for 6–8 cycles; and (III) 18F-FDG PET/CT examination at the end of treatment to assess the treatment response. The exclusion criteria were as follows: (I) incomplete clinical information or imaging data; (II) coexistent central nervous system lymphoma or other malignancies; and (III) volume of interest (VOI) voxels less than 64 after image resampling. Patients whose follow-up time was less than 2 years or who were lost to follow-up were excluded from prognosis prediction. The workflow of patient selection is shown in Figure 1.
Clinical variables, including sex, age, Ann Arbor stage, serum lactate dehydrogenase (LDH) level, B symptoms, Eastern Cooperative Oncology Group performance status, extranodal involvement, bulky disease, histological subtypes, and the IPI index, were recorded for each patient.
Treatment response and follow-up evaluation
The treatment response was assessed according to the Lugano response criteria (28). On the basis of these criteria, patients were divided into two groups: complete response (CR), with a score of 1–3, or non-CR, with a score of 4–5. Follow-up data were obtained from electronic medical records and telephone interviews. The primary endpoint for prognosis was 2-year event-free survival (EFS), with an event defined as relapse, progression, or death within 2 years.
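The two outcome labels described above can be expressed as a short sketch. The function names and arguments are hypothetical, and treating an event at exactly 24 months as falling within the 2-year window is an assumption; the text does not specify how boundary cases were handled.

```python
from typing import Optional


def response_group(score: int) -> str:
    """Map an end-of-treatment 5-point PET score to CR or non-CR."""
    if score not in (1, 2, 3, 4, 5):
        raise ValueError("score must be an integer from 1 to 5")
    return "CR" if score <= 3 else "non-CR"


def efs_event_2y(months_to_event: Optional[float],
                 followup_months: float) -> Optional[bool]:
    """Label 2-year event-free survival.

    months_to_event: time to relapse, progression, or death (None if no event).
    Returns True for an event within 24 months, False for at least 24 months
    of event-free follow-up, and None when follow-up is too short to decide
    (such patients would be excluded from prognosis prediction).
    """
    # Assumption: an event at exactly 24 months counts as "within 2 years".
    if months_to_event is not None and months_to_event <= 24:
        return True
    if followup_months >= 24:
        return False
    return None
```

For example, `response_group(4)` yields `"non-CR"`, and a patient followed for 12 months without an event yields `None`.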
18F-FDG PET/CT image acquisition
18F-FDG PET/CT scanning was performed as previously described (29). Briefly, whole-body PET/CT images were acquired on the same integrated PET/CT scanner (Gemini gxl16, Philips, the Netherlands). All patients fasted for at least 6 hours before intravenous injection of 18F-FDG (5.18 MBq/kg). Blood glucose levels were less than 11 mmol/L in all individuals. The CT scan parameters were 120 kV, 40 mAs, a 5.0 mm slice thickness and a 512×512 matrix. PET acquisition started 60±5 min after tracer administration, with 2.5 min per bed position. The acquired CT data were used for attenuation correction of all the PET images. As a quality control criterion, the mean hepatic SUV was required to be between 1.3 and 3.0. All procedures were conducted in accordance with the European Association of Nuclear Medicine (EANM) guidelines (30).
VOI drawing and feature extraction
Local image features extraction (LIFEx) software (31) was used to generate the VOIs. The hottest lesion, the largest lesion and all lesions at the patient level were chosen as the targets for radiomics feature extraction. For the patient level, all segmented lesions were aggregated into a single VOI by setting all voxels within any individual lesion to one and all voxels outside the segmented lesions to zero. Manual segmentation and four frequently used semiautomatic segmentation methods (SUV2.5, SUV4.0, 25% SUVmax and 41% SUVmax) were applied to delineate lesions (Figure 2). Two physicians manually adjusted the VOIs to ensure that the measurements were reliable. Any discrepancy was reviewed and resolved by a senior nuclear medicine specialist.
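The threshold-based segmentation rules and the patient-level aggregation can be illustrated with a minimal sketch. It operates on a flat list of voxel SUVs rather than a 3D image, the function names are hypothetical, and the use of "greater than or equal to" at the threshold is an assumption; it is not the LIFEx implementation.

```python
def fixed_threshold_mask(suv, threshold):
    """Absolute threshold (e.g. SUV 2.5 or 4.0): keep voxels at or above it."""
    return [1 if value >= threshold else 0 for value in suv]


def relative_threshold_mask(suv, fraction):
    """Relative threshold (e.g. 0.25 or 0.41): keep voxels at or above
    fraction * SUVmax, where SUVmax is taken over the supplied region."""
    cutoff = fraction * max(suv)
    return [1 if value >= cutoff else 0 for value in suv]


def aggregate_lesions(masks):
    """Patient-level VOI: a voxel is 1 if it lies in any lesion mask, else 0."""
    return [1 if any(voxel) else 0 for voxel in zip(*masks)]


# Toy region of six voxels with SUVmax = 10.0
region = [1.0, 2.6, 3.0, 8.0, 10.0, 4.5]
suv25 = fixed_threshold_mask(region, 2.5)        # [0, 1, 1, 1, 1, 1]
pct41 = relative_threshold_mask(region, 0.41)    # [0, 0, 0, 1, 1, 1]
patient = aggregate_lesions([[1, 0, 0], [0, 0, 1]])  # [1, 0, 1]
```

In this toy region the absolute SUV 2.5 threshold yields a larger VOI than the 41% SUVmax threshold, which is the kind of difference in delineation that the comparison of segmentation methods examines.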
Feature extraction followed the Image Biomarker Standardization Initiative (IBSI) reporting guidelines (32). Radiomics features were extracted from the PET images via open-source LIFEx. We did not extract radiomics features from CT because the CT component of the PET/CT scans was acquired as a low-dose noncontrast scan, in accordance with usual clinical practice. Images were spatially resampled to a voxel size of 1 mm × 1 mm × 1 mm. The PET intensities were discretized with a fixed bin count of 64 and absolute scale bounds between 0 and 20 (33). After preprocessing, 112 radiomics features were extracted from the PET images at the hottest-lesion, largest-lesion and patient levels. For patient-level feature extraction, all segmented lesions were aggregated into one VOI, and radiomics features were then extracted from the aggregated VOI. The extracted radiomics features included conventional imaging parameters and morphology, intensity, histogram, gray-level co-occurrence matrix (GLCM), gray-level run-length matrix (GLRLM), neighborhood gray-tone difference matrix (NGTDM), and gray-level size zone matrix (GLSZM) features. The full list of features is provided in Table S1.
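The intensity discretization step can be sketched as follows. This is an illustration of fixed-bin-count discretization with absolute bounds in the style of the IBSI definition, not the LIFEx code; the function name is hypothetical, and clipping values outside the 0–20 range into the first and last bins is an assumption.

```python
import math


def discretize_suv(suv, n_bins=64, lower=0.0, upper=20.0):
    """Map an SUV to a bin index in 1..n_bins using absolute bounds.

    With 64 bins between 0 and 20, each bin spans 20 / 64 = 0.3125 SUV units.
    """
    # Assumption: values outside [lower, upper] are clipped to the bounds.
    clipped = min(max(suv, lower), upper)
    if clipped == upper:
        return n_bins
    return math.floor(n_bins * (clipped - lower) / (upper - lower)) + 1


bins = [discretize_suv(v) for v in (0.0, 0.3125, 2.5, 10.0, 20.0, 25.0)]
# bins == [1, 2, 9, 33, 64, 64]
```

Because the bounds are absolute rather than derived from each lesion, the same SUV always maps to the same bin, regardless of the patient or the segmentation method.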
Feature selection and model construction
Multivariate logistic regression was used to identify potential independent clinical predictors of treatment response and patient prognosis. Clinical features that were statistically significant in the multivariate analysis were used to build the clinical models. The random forest (RF) method was used to screen radiomics features and thereby reduce the dimensionality of the feature set, and the extreme gradient boosting (XGBoost) machine learning classifier was used for radiomics model construction. Compared with advanced machine learning methods such as RF and XGBoost, least absolute shrinkage and selection operator (LASSO) logistic regression is simpler and easier to interpret. To evaluate whether advanced machine learning methods can improve the performance of prediction models, LASSO (λ=1,000) logistic regression was also used to construct simple radiomics models; the predictive performance of these simple models was lower than that of the complex models (Table S2).
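For reference, LASSO logistic regression is commonly written as the penalized likelihood problem below. This is the standard textbook form, given here as an illustration; the symbols (n patients, p features, outcome y_i, feature vector x_i, coefficients β) are notation introduced for this sketch, and the text does not state which scaling convention the reported λ value follows.

```latex
\hat{\beta}_0, \hat{\beta}
  = \arg\min_{\beta_0,\, \beta}
    \left\{
      -\frac{1}{n} \sum_{i=1}^{n}
        \bigl[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \bigr]
      + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
    \right\},
\qquad
p_i = \frac{1}{1 + \exp\!\bigl( -(\beta_0 + x_i^{\top} \beta) \bigr)}
```

The L1 penalty shrinks some coefficients exactly to zero, which is why the number of features retained by a LASSO model can be much smaller than the number retained by tree-based selection.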
A total of six types of treatment response prediction models and seven types of prognosis prediction models were developed in this study (Table S3): (I) Model 1, clinical model; (II) Model 2, MTV at the patient level; (III) Model 3, radiomics features at the patient level; (IV) Model 4, radiomics features for the hottest lesion; (V) Model 5, radiomics features for the largest lesion; (VI) Model 6, combination of the clinical predictors and radiomics features; (VII) Model 7, IPI.
Statistical analysis
We used IBM SPSS Statistics (version 27.0, IBM Corp) and Python (version 3.8) to perform the statistical analyses. The samples were randomly divided into a training set and a validation set at a ratio of 7:3. Differences in clinical characteristics between the training and validation cohorts were assessed via χ2 tests or Mann-Whitney U tests, as appropriate. When multiple comparisons were involved, the Bonferroni correction was used to control for type I error, and the adjusted P value is reported. Missing values were imputed with the median or mode. To correct for the imbalance between patients with and without CR, patients with non-CR were oversampled in each training set. Synthetic samples were generated with interpolated feature values via the synthetic minority oversampling technique (SMOTE), as implemented in the scikit-learn package in Python. The performance of all the models was evaluated via the area under the receiver operating characteristic curve (AUC), and AUCs were compared via the DeLong test (34). Diagnostic performance was assessed via sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV). A P value of less than 0.05 was considered statistically significant. Additionally, the relative importance of individual radiomics features was determined via the feature reduction method that yielded the highest predictive value.
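The oversampling and evaluation steps can be sketched in plain Python. This is a simplified illustration, not the library routine used in the study: the function names are hypothetical, the interpolation follows the general SMOTE idea for a single synthetic sample, and which class is coded as positive (1) is an assumption.

```python
import random


def smote_like_sample(minority, k=2, rng=None):
    """Return one synthetic sample on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    base = rng.choice(minority)
    others = [m for m in minority if m is not base]
    # Rank the remaining minority samples by squared Euclidean distance.
    others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
    neighbour = rng.choice(others[:k])
    gap = rng.random()  # interpolation weight in [0, 1)
    return [a + gap * (b - a) for a, b in zip(base, neighbour)]


def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def diagnostic_metrics(labels, predictions):
    """Sensitivity, specificity, PPV and NPV from binary predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

Oversampling is applied to the training set only, so the validation metrics are computed on the original, unmodified class distribution.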
Results
Patient characteristics
A total of 522 patients (266 males and 256 females) were included for treatment response prediction. The median age was 58 years. Of these 522 patients, 419 (80.3%) achieved CR after chemotherapy. The majority of patients had advanced Ann Arbor stage disease, normal LDH levels, no B symptoms and no bulky disease, but had extranodal involvement. No statistically significant difference was observed between the training and validation cohorts (Table 1).
Table 1
Characteristics | Treatment response: training cohort (n=365) | Treatment response: validation cohort (n=157) | Treatment response: P | Prognosis: training cohort (n=267) | Prognosis: validation cohort (n=115) | Prognosis: P |
---|---|---|---|---|---|---|
Age (years) | | | 0.747 | | | 0.965 |
≤60 | 210 (57.5%) | 88 (56.1%) | | 151 (56.6%) | 60 (52.2%) | |
>60 | 155 (42.5%) | 69 (43.9%) | | 116 (43.4%) | 55 (47.8%) | |
Gender | | | 0.181 | | | 0.461 |
Male | 193 (52.9%) | 73 (46.5%) | | 141 (52.8%) | 56 (48.7%) | |
Female | 172 (47.1%) | 84 (53.5%) | | 126 (47.2%) | 59 (51.3%) | |
Ann Arbor stage | | | 0.252 | | | 0.185 |
I/II | 178 (48.8%) | 68 (43.3%) | | 115 (43.1%) | 58 (50.4%) | |
III/IV | 187 (51.2%) | 89 (56.7%) | | 152 (56.9%) | 57 (49.6%) | |
LDH | | | 0.248 | | | 0.894 |
Normal | 206 (56.4%) | 80 (51.0%) | | 135 (50.6%) | 56 (48.7%) | |
Elevated (>220 U/L) | 159 (43.6%) | 77 (49.0%) | | 132 (49.4%) | 59 (51.3%) | |
B symptoms | | | 0.357 | | | 0.800 |
Yes | 105 (28.8%) | 39 (24.8%) | | 73 (27.3%) | 31 (27.0%) | |
No | 260 (71.2%) | 118 (75.2%) | | 194 (72.7%) | 84 (73.0%) | |
ECOG PS | | | 0.552 | | | 0.912 |
≤1 | 326 (89.3%) | 146 (93.0%) | | 241 (90.3%) | 105 (91.3%) | |
>1 | 39 (10.7%) | 11 (7.0%) | | 26 (9.7%) | 10 (8.7%) | |
Extranodal involvement | | | 0.110 | | | 0.877 |
<1 | 128 (35.1%) | 37 (23.6%) | | 67 (25.1%) | 28 (24.3%) | |
≥1 | 237 (64.9%) | 120 (76.4%) | | 200 (74.9%) | 87 (75.7%) | |
Bulky disease | | | 0.284 | | | 0.629 |
Yes | 33 (9.0%) | 19 (12.1%) | | 28 (10.5%) | 14 (12.2%) | |
No | 332 (91.0%) | 138 (87.9%) | | 239 (89.5%) | 101 (87.8%) | |
Pathological type | | | 0.248 | | | 0.952 |
GCB | 94 (25.8%) | 33 (21.0%) | | 55 (20.6%) | 25 (21.7%) | |
Non-GCB | 271 (74.2%) | 124 (79.0%) | | 212 (79.4%) | 90 (78.3%) | |
Treatment response/prognosis | | | 0.232 | | | 0.767 |
CR/non-event | 288 (78.9%) | 131 (83.4%) | | 184 (68.9%) | 81 (70.4%) | |
Non-CR/event | 77 (21.1%) | 26 (16.6%) | | 83 (31.1%) | 34 (29.6%) | |
IPI score | | | | | | 0.233 |
0–1 | – | – | – | 97 (36.3%) | 45 (39.1%) | |
2 | – | – | – | 69 (25.8%) | 32 (27.8%) | |
3 | – | – | – | 57 (21.3%) | 23 (20.1%) | |
4–5 | – | – | – | 44 (16.6%) | 15 (13.0%) | |
LDH, lactate dehydrogenase; ECOG PS, Eastern Cooperative Oncology Group Performance Status; GCB, germinal centre B cell; CR, complete response; IPI, International Prognostic Index.
In total, 382 patients (197 males and 185 females) were included for prognosis prediction. The median age was 58 years, and the median follow-up period was 40 months for the entire study population. A total of 117 of the 382 patients (31%) experienced relapse, progression, or death within 2 years. The majority of the included patients had advanced Ann Arbor stage disease, a low-to-moderate IPI score, no B symptoms and no bulky disease, but had extranodal involvement; LDH was normal in half of the patients (191/382). No statistically significant difference was observed between the training and validation cohorts (Table 1).
Clinical models
According to the results of the multivariate logistic regression analysis, age (P<0.001) and Ann Arbor stage (P<0.001) were significantly related to treatment response and together yielded an AUC of 0.622 (95% CI: 0.562–0.682). For prognosis prediction, multivariate logistic regression analysis revealed that the Ann Arbor stage (P<0.001) was an independent predictor of prognosis, with an AUC of 0.636 (95% CI: 0.569–0.703). The IPI yielded an AUC of 0.623 (95% CI: 0.579–0.667). The results of the logistic regression analyses are listed in Table S4 and Table S5, respectively.
Total MTV (TMTV) analysis
With manual segmentation, the median TMTV was 200 mL for patients who achieved CR and 388 mL for patients who did not. Similarly, with manual segmentation, the median TMTV was 676 mL for patients who experienced an event within 2 years and 255 mL for those who did not. In the validation cohort, TMTV models based on the SUV4.0 segmentation method yielded the highest AUC for predicting treatment response, whereas TMTV models based on manual segmentation yielded the highest AUC for prognosis prediction. However, no significant difference was observed in the AUC among the models (all P>0.05) (Table S6).
Comparison of VOI segmentation methods
The AUCs of the radiomics models developed with different segmentation methods are shown in Figure 3. For the hottest lesion, 41% SUVmax had the highest AUC (0.704, 95% CI: 0.652–0.756) for treatment response prediction, and manual segmentation had the highest AUC (0.636, 95% CI: 0.571–0.701) for prognosis prediction (Table 2); however, there was no significant difference in the AUC among the models (all P>0.05). Depending on the segmentation method, the number of features selected by the RF-XGBoost model varied between 76 and 79 (Figure 4). For all the segmentation methods, the ten most important features were always texture- or intensity-based (Tables S7,S8). For the LASSO-logistic models, the number of selected features varied between 1 and 24 depending on the segmentation method (Table S9).
Table 2
Segmentation | Sensitivity | Specificity | Accuracy | PPV | NPV | AUC* |
---|---|---|---|---|---|---|
Treatment response | ||||||
SUV 2.5 | 0.615 | 0.600 | 0.614 | 0.611 | 0.617 | 0.642 |
SUV 4.0 | 0.561 | 0.571 | 0.559 | 0.647 | 0.471 | 0.618 |
25% SUVmax | 0.674 | 0.689 | 0.672 | 0.735 | 0.606 | 0.638 |
41% SUVmax | 0.555 | 0.600 | 0.551 | 0.548 | 0.552 | 0.704 |
Manual | 0.664 | 0.724 | 0.648 | 0.757 | 0.553 | 0.695 |
Prognosis | ||||||
SUV 2.5 | 0.617 | 0.396 | 0.596 | 0.582 | 0.386 | 0.584 |
SUV 4.0 | 0.611 | 0.452 | 0.611 | 0.590 | 0.304 | 0.612 |
25% SUVmax | 0.587 | 0.342 | 0.587 | 0.581 | 0.440 | 0.626 |
41% SUVmax | 0.604 | 0.458 | 0.604 | 0.583 | 0.340 | 0.594 |
Manual | 0.579 | 0.485 | 0.579 | 0.550 | 0.268 | 0.636 |
*, AUC in the validation cohort. PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve; SUV, standardized uptake value; SUVmax, maximum standardized uptake value.
For the largest lesion, manual segmentation had the highest AUC (0.666, 95% CI: 0.620–0.712) for treatment response prediction, and 25% SUVmax had the highest AUC (0.705, 95% CI: 0.673–0.737) for prognosis prediction (Table 3); however, there was no significant difference in the AUC among the models (all P>0.05). Depending on the segmentation method, the number of features selected by the RF-XGBoost model varied between 73 and 79 (Figure 5). For all the segmentation methods, the ten most important features were always texture features (Tables S10,S11). For the LASSO-logistic models, the number of selected features varied between 1 and 9 depending on the segmentation method.
Table 3
Segmentation | Sensitivity | Specificity | Accuracy | PPV | NPV | AUC* |
---|---|---|---|---|---|---|
Treatment response | ||||||
SUV 2.5 | 0.563 | 0.583 | 0.563 | 0.722 | 0.412 | 0.616 |
SUV 4.0 | 0.559 | 0.558 | 0.559 | 0.558 | 0.558 | 0.626 |
25% SUVmax | 0.587 | 0.615 | 0.580 | 0.705 | 0.457 | 0.665 |
41% SUVmax | 0.604 | 0.648 | 0.603 | 0.567 | 0.632 | 0.657 |
Manual | 0.624 | 0.667 | 0.620 | 0.625 | 0.615 | 0.666 |
Prognosis | ||||||
SUV 2.5 | 0.611 | 0.444 | 0.633 | 0.601 | 0.294 | 0.633 |
SUV 4.0 | 0.613 | 0.598 | 0.616 | 0.600 | 0.410 | 0.667 |
25% SUVmax | 0.622 | 0.603 | 0.616 | 0.608 | 0.418 | 0.705 |
41% SUVmax | 0.623 | 0.482 | 0.623 | 0.606 | 0.408 | 0.700 |
Manual | 0.660 | 0.589 | 0.627 | 0.601 | 0.401 | 0.652 |
*, AUC in the validation cohort. PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve; SUV, standardized uptake value; SUVmax, maximum standardized uptake value.
At the patient level, SUV4.0 had the highest AUC (0.768, 95% CI: 0.709–0.827) for treatment response prediction, and 41% SUVmax had the highest AUC (0.699, 95% CI: 0.636–0.762) for prognosis prediction (Table 4); however, there was no significant difference in the AUC among the models (all P>0.05). Depending on the segmentation method, the number of features selected by the RF-XGBoost model varied between 74 and 83 (Figure 6). For all the segmentation methods, the ten most important features were always texture features (Tables S12,S13). For the LASSO-logistic models, the number of selected features varied between 1 and 13 depending on the segmentation method.
Table 4
Segmentation | Sensitivity | Specificity | Accuracy | PPV | NPV | AUC* |
---|---|---|---|---|---|---|
Treatment response | ||||||
SUV 2.5 | 0.632 | 0.677 | 0.623 | 0.688 | 0.567 | 0.656 |
SUV 4.0 | 0.690 | 0.643 | 0.691 | 0.743 | 0.620 | 0.768 |
25% SUVmax | 0.568 | 0.611 | 0.565 | 0.548 | 0.578 | 0.652 |
41% SUVmax | 0.697 | 0.588 | 0.681 | 0.658 | 0.714 | 0.744 |
Manual | 0.564 | 0.548 | 0.565 | 0.611 | 0.515 | 0.606 |
Prognosis | ||||||
SUV 2.5 | 0.612 | 0.440 | 0.627 | 0.592 | 0.265 | 0.617 |
SUV 4.0 | 0.591 | 0.405 | 0.595 | 0.586 | 0.341 | 0.652 |
25% SUVmax | 0.633 | 0.412 | 0.620 | 0.602 | 0.311 | 0.583 |
41% SUVmax | 0.641 | 0.422 | 0.627 | 0.607 | 0.360 | 0.699 |
Manual | 0.633 | 0.554 | 0.626 | 0.602 | 0.300 | 0.645 |
*, AUC in the validation cohort. PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve; SUV, standardized uptake value; SUVmax, maximum standardized uptake value.
Comparison of lesion selection approaches
The AUCs of the radiomics models constructed on the basis of a single lesion or at the patient level are shown in Figure 3. For treatment response prediction with the SUV2.5, SUV4.0, and 41% SUVmax segmentation methods, the highest AUCs were observed for the models constructed at the patient level. In contrast, with manual segmentation, the model built on the hottest lesion achieved the highest AUC, and with 25% SUVmax, the model built on the largest lesion yielded the highest AUC. Overall, the AUCs were greater for the treatment response prediction models constructed at the patient level, although there was no significant difference in the AUCs among the models (all P>0.05). For prognosis prediction, regardless of the segmentation method, the AUCs were the highest for the models constructed on the basis of the largest lesion, but there was no significant difference in the AUC among the models (all P>0.05).
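The comparison above can be reproduced from the validation-cohort AUCs reported in Tables 2–4. The following sketch simply tabulates those values and picks the lesion selection approach with the highest AUC for each segmentation method; the dictionary layout and function name are illustrative choices.

```python
# Validation-cohort AUCs transcribed from Table 2 (hottest lesion),
# Table 3 (largest lesion) and Table 4 (patient level).
AUC = {
    "treatment response": {
        "SUV 2.5":    {"hottest": 0.642, "largest": 0.616, "patient": 0.656},
        "SUV 4.0":    {"hottest": 0.618, "largest": 0.626, "patient": 0.768},
        "25% SUVmax": {"hottest": 0.638, "largest": 0.665, "patient": 0.652},
        "41% SUVmax": {"hottest": 0.704, "largest": 0.657, "patient": 0.744},
        "Manual":     {"hottest": 0.695, "largest": 0.666, "patient": 0.606},
    },
    "prognosis": {
        "SUV 2.5":    {"hottest": 0.584, "largest": 0.633, "patient": 0.617},
        "SUV 4.0":    {"hottest": 0.612, "largest": 0.667, "patient": 0.652},
        "25% SUVmax": {"hottest": 0.626, "largest": 0.705, "patient": 0.583},
        "41% SUVmax": {"hottest": 0.594, "largest": 0.700, "patient": 0.699},
        "Manual":     {"hottest": 0.636, "largest": 0.652, "patient": 0.645},
    },
}


def best_selection(endpoint):
    """For each segmentation method, return the lesion selection approach
    with the highest validation AUC for the given endpoint."""
    return {
        segmentation: max(by_lesion, key=by_lesion.get)
        for segmentation, by_lesion in AUC[endpoint].items()
    }


response_best = best_selection("treatment response")
prognosis_best = best_selection("prognosis")
```

Here `prognosis_best` maps every segmentation method to `"largest"`, while `response_best` gives `"patient"` for SUV 2.5, SUV 4.0 and 41% SUVmax, `"largest"` for 25% SUVmax and `"hottest"` for manual segmentation. These are rankings of point estimates only; none of the differences reached statistical significance.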
Added value of radiomics features
The AUCs and diagnostic measurements of the predictive models are presented in Table 5. For predicting treatment response, Model 2 and Model 3 used the SUV4.0 segmentation method, Model 4 used the 41% SUVmax segmentation method, and Model 5 used manual segmentation. Model 6 combined Models 1 and 5. The combined model had higher discriminative power than any other model (all P<0.05) (Figure 7). In addition, the sensitivity, specificity, PPV and NPV of the combined model were greater than those of the best clinical model.
Table 5
Model | Sensitivity | Specificity | Accuracy | PPV | NPV | AUC* |
---|---|---|---|---|---|---|
Treatment response | ||||||
Clinical (Model 1) | 0.669 | 0.669 | 0.669 | 0.557 | 0.765 | 0.622 |
TMTV, SUV4.0 (Model 2) | 0.686 | 0.686 | 0.686 | 0.696 | 0.675 | 0.755 |
Radiomics, patient level, SUV4.0 (Model 3) | 0.674 | 0.689 | 0.672 | 0.735 | 0.606 | 0.704 |
Radiomics, hottest lesion, 41% SUVmax (Model 4) | 0.624 | 0.667 | 0.620 | 0.625 | 0.615 | 0.666 |
Radiomics, largest lesion, manual (Model 5) | 0.690 | 0.643 | 0.691 | 0.743 | 0.620 | 0.768 |
Model 1+5 (Model 6) | 0.833 | 0.843 | 0.831 | 0.821 | 0.866 | 0.908 |
Prognosis | ||||||
Clinical (Model 1) | 0.603 | 0.320 | 0.603 | 0.592 | 0.334 | 0.636 |
TMTV, manual (Model 2) | 0.627 | 0.534 | 0.627 | 0.619 | 0.390 | 0.610 |
Radiomics, patient level, 41% SUVmax (Model 3) | 0.579 | 0.485 | 0.579 | 0.550 | 0.268 | 0.636 |
Radiomics, hottest lesion, manual (Model 4) | 0.622 | 0.603 | 0.616 | 0.608 | 0.418 | 0.705 |
Radiomics, largest lesion, 25% SUVmax (Model 5) | 0.641 | 0.422 | 0.627 | 0.607 | 0.360 | 0.653 |
Model 1+4 (Model 6) | 0.733 | 0.762 | 0.729 | 0.623 | 0.716 | 0.837 |
IPI (Model 7) | 0.611 | 0.491 | 0.611 | 0.654 | 0.542 | 0.623 |
*, AUC in the validation cohort. PPV, positive predictive value; NPV, negative predictive value; AUC, area under the curve; TMTV, total metabolic tumor volume; IPI, International Prognostic Index; SUV, standardized uptake value; SUVmax, maximum standardized uptake value.
For predicting prognosis, Model 2 and Model 4 both used manual segmentation, Model 3 used the 41% SUVmax segmentation method, and Model 5 used the 25% SUVmax segmentation method. Model 6 combined Model 1 and Model 4. The AUC of the combined model was the highest (all P<0.05) (Figure 7), and the combined model had better discriminative power than the best clinical model (P=0.022) and the IPI model (P=0.027). In addition, the sensitivity, specificity, PPV and NPV of the combined model were greater than those of the best clinical model and the IPI model. The AUC of the best clinical model was greater than that of the IPI model, but the difference was not significant (P=0.92).
Discussion
The main finding of this study is that predictive performance was largely unaffected by the choice of lesion selection approach and VOI segmentation method. Nevertheless, the radiomics feature values obtained with different lesion selection approaches and VOI segmentation methods differed substantially. Furthermore, our study supports the premise that radiomics features add value beyond that of currently used clinical parameters.
Currently, there is no consensus on the best lesion selection or segmentation method for DLBCL 18F-FDG PET/CT studies, although radiomics features can be influenced by the segmentation method (24). In addition, some studies have calculated radiomics features at the patient level because of tumor heterogeneity (25,26), whereas others have calculated radiomics features only for the hottest lesion (7,12) or the largest lesion (23,24), as texture features become difficult to interpret at the patient level. Therefore, studying the discriminative power of radiomics features in relation to lesion selection and segmentation methods is essential. In our study, the discriminative power was comparable among the lesion selection and segmentation approaches. These results are in line with those of previous DLBCL-related studies. Eertink et al. (27) reported that lesion selection approaches did not affect the ability of radiomics features to predict patient prognosis. However, they did not explore the impact of different segmentation methods. In a further study by Eertink et al. (35), 50 patients with progression or relapse within 2 years and 50 patients without progression were included, and the authors concluded that there was no substantial difference in the discriminative power of radiomics features among segmentation methods in DLBCL at the patient level and for the largest lesion. Similarly, another study revealed that lesion selection and segmentation methods did not affect the prognostic performance of radiomics models, whether features were extracted from all lesions, the largest lesion or the hottest lesion. Compared with these studies, our study included a larger sample from an Asian population. More importantly, our research revealed that lesion selection and segmentation methods did not affect the ability of radiomics features to predict treatment response.
To our knowledge, no studies have assessed the influence of lesion selection and segmentation methods on PET radiomics features and their predictive power for treatment response in DLBCL patients. Early prediction of treatment response can provide more appropriate treatment options for patients. As the manual segmentation method for all lesions at the patient level is time-consuming, the semiautomated segmentation method for the hottest or largest lesion could be a feasible approach for treatment response and prognosis assessment in DLBCL patients.
An important finding of our study is that a model integrating both radiomics and clinical features significantly improved predictive performance. This finding is consistent with a study (36) of treatment response prediction, which indicated that radiomics features could provide additional predictive value beyond conventional clinical features in lymphoma. However, compared with the model reported in that study (AUC =0.82), the combined model in our study had a greater AUC (0.908). For prognosis prediction, Jiang et al. (37) compared a combined model and a clinical model and reported that radiomics features extracted from PET images could help predict clinical outcomes in patients with DLBCL. Nevertheless, our study yielded a higher AUC than their study did. Similarly, another study revealed that a hybrid nomogram (with an AUC of 0.781) combining the IPI and radiomics features had additional predictive ability compared with the IPI (4). By comparison, the combined model in our study had a greater AUC for survival prediction (AUC =0.837). A possible explanation is that radiomics features reflect intratumoral metabolic heterogeneity (38,39), which is a determinant of treatment response and prognosis (40-43). Because malignancy is complex and involves multiple biological processes, taking both clinical and imaging features into account may provide more comprehensive disease characterization and better prognostication.
Our study demonstrated that texture features were among the 10 most important features in all the models. Similarly, Aide et al. reported that nine textural features (out of 19) were univariately significant (23). Parvez et al. reported that 3 textural features significantly predicted disease-free survival (12). Because different studies have used different features, it is difficult to compare them directly, and there is currently no consensus on the application of radiomics features. However, texture features might be preferred for translation into the clinic as these features are easy to understand and are related to disease characteristics that can be easily recognized in PET images.
Recently, machine learning applications have received increasing attention from researchers. The key concept of machine learning is to produce accurate predictions on new, unseen data after training on a finite dataset. Radiomics-based machine learning has been applied to a variety of tasks in solid and hematologic tumors (44-47). In this study, we used the RF feature selection method combined with the XGBoost classifier to construct the models. RF builds a series of decision trees, considering a subset of features or predictive variables at each node (48). XGBoost is a tree-based gradient boosting algorithm in which each newly added tree is fitted, in a computationally efficient manner, to reduce the error of the existing ensemble (49). To evaluate whether advanced machine learning methods can improve the performance of prediction models, LASSO logistic regression was also used to construct simple radiomics models. The predictive performance of the simple models was lower than that of the complex models, a finding that needs to be validated in large prospective studies.
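The tree-boosting scheme can be summarized by the general regularized objective below. This is the standard formulation of gradient tree boosting, shown as an illustration and not as the specific configuration used in this study; the symbols (loss l, learning rate η, number of leaves T, leaf weights w, penalties γ and λ) are notation introduced here, and this λ is unrelated to the LASSO penalty.

```latex
\hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta \, f_t(x_i),
\qquad
f_t = \arg\min_{f}
  \sum_{i=1}^{n} l\!\left( y_i,\; \hat{y}_i^{(t-1)} + f(x_i) \right)
  + \Omega(f),
\qquad
\Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^{2}
```

At round t, the new tree f_t is chosen to reduce the loss of the current ensemble, and the penalty Ω discourages overly complex trees.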
There are several limitations in this study. First, owing to the retrospective nature of the study, the findings need to be further validated prospectively. Second, the current study included patients from a single institution, and the sample size was limited. Therefore, our results need to be further validated in multicenter studies involving a larger cohort of patients. In addition, the follow-up period of this study was relatively short, and long-term follow-up of the included cohort is needed. Finally, protein expression and gene rearrangement are acknowledged prognostic factors but were not evaluated in our study because these data were not available for all patients.
Conclusions
This study revealed negligible differences in the predictive performance of radiomics features extracted via different lesion selection strategies and VOI segmentation methods. However, noteworthy differences were observed in the actual values of the radiomics features, as well as in the features selected, across the various lesion selection strategies and segmentation methods. Furthermore, a combined model that incorporates both radiomics features and clinical risk factors may have potential for predicting treatment response and prognosis in patients with DLBCL.
Acknowledgments
Funding: This study was supported by
Footnote
Reporting Checklist: The authors have completed the TRIPOD reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-24-585/rc
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-24-585/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki (as revised in 2013). The study was approved by the Ethics Committee of West China Hospital (No. 2023-954), and informed consent was waived because this was a retrospective study.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Ritter Z, Papp L, Zámbó K, Tóth Z, Dezső D, Veres DS, Máthé D, Budán F, Karádi É, Balikó A, Pajor L, Szomor Á, Schmidt E, Alizadeh H. Two-Year Event-Free Survival Prediction in DLBCL Patients Based on In Vivo Radiomics and Clinical Parameters. Front Oncol 2022;12:820136. [Crossref] [PubMed]
- Santiago R, Ortiz Jimenez J, Forghani R, Muthukrishnan N, Del Corpo O, Karthigesu S, Haider MY, Reinhold C, Assouline S. CT-based radiomics model with machine learning for predicting primary treatment failure in diffuse large B-cell Lymphoma. Transl Oncol 2021;14:101188. [Crossref] [PubMed]
- Crump M, Neelapu SS, Farooq U, Van Den Neste E, Kuruvilla J, Westin J, Link BK, Hay A, Cerhan JR, Zhu L, Boussetta S, Feng L, Maurer MJ, Navale L, Wiezorek J, Go WY, Gisselbrecht C. Outcomes in refractory diffuse large B-cell lymphoma: results from the international SCHOLAR-1 study. Blood 2017;130:1800-8. [Crossref] [PubMed]
- Zhang X, Chen L, Jiang H, He X, Feng L, Ni M, Ma M, Wang J, Zhang T, Wu S, Zhou R, Jin C, Zhang K, Qian W, Chen Z, Zhuo C, Zhang H, Tian M. A novel analytic approach for outcome prediction in diffuse large B-cell lymphoma by [18F]FDG PET/CT. Eur J Nucl Med Mol Imaging 2022;49:1298-310.
- Wight JC, Chong G, Grigg AP, Hawkes EA. Prognostication of diffuse large B-cell lymphoma in the molecular era: moving beyond the IPI. Blood Rev 2018;32:400-15. [Crossref] [PubMed]
- Yim SK, Yhim HY, Han YH, Jeon SY, Lee NR, Song EK, Jeong HJ, Kim HS, Kwak JY. Early risk stratification for diffuse large B-cell lymphoma integrating interim Deauville score and International Prognostic Index. Ann Hematol 2019;98:2739-48. [Crossref] [PubMed]
- Ceriani L, Gritti G, Cascione L, Pirosa MC, Polino A, Ruberto T, Stathis A, Bruno A, Moccia AA, Giovanella L, Hayoz S, Schär S, Dirnhofer S, Rambaldi A, Martinelli G, Mamot C, Zucca E. SAKK38/07 study: integration of baseline metabolic heterogeneity and metabolic tumor volume in DLBCL prognostic model. Blood Adv 2020;4:1082-92. [Crossref] [PubMed]
- Chang CC, Cho SF, Chuang YW, Lin CY, Chang SM, Hsu WL, Huang YF. Prognostic significance of total metabolic tumor volume on (18)F-fluorodeoxyglucose positron emission tomography/computed tomography in patients with diffuse large B-cell lymphoma receiving rituximab-containing chemotherapy. Oncotarget 2017;8:99587-600. [Crossref] [PubMed]
- Huang H, Xiao F, Han X, Zhong L, Zhong H, Xu L, Zhu J, Ni B, Liu J, Fang Y, Zhang M, Shen L, Wang T, Liu J, Shi Y, Chen Y, Zheng L, Liu Q, Chen F, Wang J. Correlation of pretreatment 18F-FDG uptake with clinicopathological factors and prognosis in patients with newly diagnosed diffuse large B-cell lymphoma. Nucl Med Commun 2016;37:689-98. [Crossref] [PubMed]
- Dagogo-Jack I, Shaw AT. Tumour heterogeneity and resistance to cancer therapies. Nat Rev Clin Oncol 2018;15:81-94. [Crossref] [PubMed]
- Lambin P, Rios-Velazquez E, Leijenaar R, Carvalho S, van Stiphout RG, Granton P, Zegers CM, Gillies R, Boellard R, Dekker A, Aerts HJ. Radiomics: extracting more information from medical images using advanced feature analysis. Eur J Cancer 2012;48:441-6. [Crossref] [PubMed]
- Parvez A, Tau N, Hussey D, Maganti M, Metser U. (18)F-FDG PET/CT metabolic tumor parameters and radiomics features in aggressive non-Hodgkin's lymphoma as predictors of treatment outcome and survival. Ann Nucl Med 2018;32:410-6. [Crossref] [PubMed]
- Manco L, Maffei N, Strolin S, Vichi S, Bottazzi L, Strigari L. Basic of machine learning and deep learning in imaging for medical physicists. Phys Med 2021;83:194-205. [Crossref] [PubMed]
- Toyama Y, Hotta M, Motoi F, Takanami K, Minamimoto R, Takase K. Prognostic value of FDG-PET radiomics with machine learning in pancreatic cancer. Sci Rep 2020;10:17024. [Crossref] [PubMed]
- Nasief H, Zheng C, Schott D, Hall W, Tsai S, Erickson B, Allen Li X. A machine learning based delta-radiomics process for early prediction of treatment response of pancreatic cancer. NPJ Precis Oncol 2019;3:25. [Crossref] [PubMed]
- Agner SC, Rosen MA, Englander S, Tomaszewski JE, Feldman MD, Zhang P, Mies C, Schnall MD, Madabhushi A. Computerized image analysis for identifying triple-negative breast cancers and differentiating them from other molecular subtypes of breast cancer on dynamic contrast-enhanced MR images: a feasibility study. Radiology 2014;272:91-9. [Crossref] [PubMed]
- Zheng J, Kong J, Wu S, Li Y, Cai J, Yu H, Xie W, Qin H, Wu Z, Huang J, Lin T. Development of a noninvasive tool to preoperatively evaluate the muscular invasiveness of bladder cancer using a radiomics approach. Cancer 2019;125:4388-98. [Crossref] [PubMed]
- Zhou Y, Ma XL, Pu LT, Zhou RF, Ou XJ, Tian R. Prediction of Overall Survival and Progression-Free Survival by the (18)F-FDG PET/CT Radiomic Features in Patients with Primary Gastric Diffuse Large B-Cell Lymphoma. Contrast Media Mol Imaging 2019;2019:5963607. [Crossref] [PubMed]
- Lue KH, Wu YF, Lin HH, Hsieh TC, Liu SH, Chan SC, Chen YH. Prognostic Value of Baseline Radiomic Features of (18)F-FDG PET in Patients with Diffuse Large B-Cell Lymphoma. Diagnostics (Basel) 2020;11:36. [Crossref] [PubMed]
- Frood R, Clark M, Burton C, Tsoumpas C, Frangi AF, Gleeson F, Patel C, Scarsbrook AF. Discovery of Pre-Treatment FDG PET/CT-Derived Radiomics-Based Models for Predicting Outcome in Diffuse Large B-Cell Lymphoma. Cancers (Basel) 2022;14:1711. [Crossref] [PubMed]
- Sasanelli M, Meignan M, Haioun C, Berriolo-Riedinger A, Casasnovas RO, Biggi A, Gallamini A, Siegel BA, Cashen AF, Véra P, Tilly H, Versari A, Itti E. Pretherapy metabolic tumour volume is an independent predictor of outcome in patients with diffuse large B-cell lymphoma. Eur J Nucl Med Mol Imaging 2014;41:2017-22. [Crossref] [PubMed]
- Ilyas H, Mikhaeel NG, Dunn JT, Rahman F, Møller H, Smith D, Barrington SF. Defining the optimal method for measuring baseline metabolic tumour volume in diffuse large B cell lymphoma. Eur J Nucl Med Mol Imaging 2018;45:1142-54. [Crossref] [PubMed]
- Aide N, Fruchart C, Nganoa C, Gac AC, Lasnon C. Baseline (18)F-FDG PET radiomic features as predictors of 2-year event-free survival in diffuse large B cell lymphomas treated with immunochemotherapy. Eur Radiol 2020;30:4623-32. [Crossref] [PubMed]
- Senjo H, Hirata K, Izumiyama K, Minauchi K, Tsukamoto E, Itoh K, Kanaya M, Mori A, Ota S, Hashimoto D, Teshima T; North Japan Hematology Study Group. High metabolic heterogeneity on baseline 18FDG-PET/CT scan as a poor prognostic factor for newly diagnosed diffuse large B-cell lymphoma. Blood Adv 2020;4:2286-96. [Crossref] [PubMed]
- Cottereau AS, Nioche C, Dirand AS, Clerc J, Morschhauser F, Casasnovas O, Meignan M, Buvat I. (18)F-FDG PET Dissemination Features in Diffuse Large B-Cell Lymphoma Are Predictive of Outcome. J Nucl Med 2020;61:40-5. [Crossref] [PubMed]
- Eertink JJ, van de Brug T, Wiegers SE, Zwezerijnen GJC, Pfaehler EAG, Lugtenburg PJ, van der Holt B, de Vet HCW, Hoekstra OS, Boellaard R, Zijlstra JM. (18)F-FDG PET baseline radiomics features improve the prediction of treatment outcome in diffuse large B-cell lymphoma. Eur J Nucl Med Mol Imaging 2022;49:932-42. [Crossref] [PubMed]
- Eertink JJ, Zwezerijnen GJC, Cysouw MCF, Wiegers SE, Pfaehler EAG, Lugtenburg PJ, van der Holt B, Hoekstra OS, de Vet HCW, Zijlstra JM, Boellaard R. Comparing lesion and feature selections to predict progression in newly diagnosed DLBCL patients with FDG PET/CT radiomics features. Eur J Nucl Med Mol Imaging 2022;49:4642-51. [Crossref] [PubMed]
- Cheson BD, Fisher RI, Barrington SF, Cavalli F, Schwartz LH, Zucca E, et al. Recommendations for initial evaluation, staging, and response assessment of Hodgkin and non-Hodgkin lymphoma: the Lugano classification. J Clin Oncol 2014;32:3059-68. [Crossref] [PubMed]
- Ou X, Wang J, Zhou R, Zhu S, Pang F, Zhou Y, Tian R, Ma X. Ability of (18)F-FDG PET/CT Radiomic Features to Distinguish Breast Carcinoma from Breast Lymphoma. Contrast Media Mol Imaging 2019;2019:4507694. [Crossref] [PubMed]
- Boellaard R, Delgado-Bolton R, Oyen WJ, Giammarile F, Tatsch K, Eschner W, et al. FDG PET/CT: EANM procedure guidelines for tumour imaging: version 2.0. Eur J Nucl Med Mol Imaging 2015;42:328-54. [Crossref] [PubMed]
- Nioche C, Orlhac F, Boughdad S, Reuzé S, Goya-Outi J, Robert C, Pellot-Barakat C, Soussan M, Frouin F, Buvat I. LIFEx: A Freeware for Radiomic Feature Calculation in Multimodality Imaging to Accelerate Advances in the Characterization of Tumor Heterogeneity. Cancer Res 2018;78:4786-9. [Crossref] [PubMed]
- Zwanenburg A, Vallières M, Abdalah MA, Aerts HJWL, Andrearczyk V, Apte A, et al. The Image Biomarker Standardization Initiative: Standardized Quantitative Radiomics for High-Throughput Image-based Phenotyping. Radiology 2020;295:328-38. [Crossref] [PubMed]
- Tixier F, Le Rest CC, Hatt M, Albarghach N, Pradier O, Metges JP, Corcos L, Visvikis D. Intratumor heterogeneity characterized by textural features on baseline 18F-FDG PET images predicts response to concomitant radiochemotherapy in esophageal cancer. J Nucl Med 2011;52:369-78. [Crossref] [PubMed]
- DeLong ER, DeLong DM, Clarke-Pearson DL. Comparing the areas under two or more correlated receiver operating characteristic curves: a nonparametric approach. Biometrics 1988;44:837-45.
- Eertink JJ, Pfaehler EAG, Wiegers SE, van de Brug T, Lugtenburg PJ, Hoekstra OS, Zijlstra JM, de Vet HCW, Boellaard R. Quantitative Radiomics Features in Diffuse Large B-Cell Lymphoma: Does Segmentation Method Matter? J Nucl Med 2022;63:389-95. [Crossref] [PubMed]
- Ben Bouallègue F, Tabaa YA, Kafrouni M, Cartron G, Vauchot F, Mariano-Goulart D. Association between textural and morphological tumor indices on baseline PET-CT and early metabolic response on interim PET-CT in bulky malignant lymphomas. Med Phys 2017;44:4608-19. [Crossref] [PubMed]
- Jiang C, Li A, Teng Y, Huang X, Ding C, Chen J, Xu J, Zhou Z. Optimal PET-based radiomic signature construction based on the cross-combination method for predicting the survival of patients with diffuse large B-cell lymphoma. Eur J Nucl Med Mol Imaging 2022;49:2902-16. [Crossref] [PubMed]
- Moon SH, Kim J, Joung JG, Cha H, Park WY, Ahn JS, Ahn MJ, Park K, Choi JY, Lee KH, Kim BT, Lee SH. Correlations between metabolic texture features, genetic heterogeneity, and mutation burden in patients with lung cancer. Eur J Nucl Med Mol Imaging 2019;46:446-54. [Crossref] [PubMed]
- Sala E, Mema E, Himoto Y, Veeraraghavan H, Brenton JD, Snyder A, Weigelt B, Vargas HA. Unravelling tumour heterogeneity using next-generation imaging: radiomics, radiogenomics, and habitat imaging. Clin Radiol 2017;72:3-10. [Crossref] [PubMed]
- Lue KH, Wu YF, Liu SH, Hsieh TC, Chuang KS, Lin HH, Chen YH. Intratumor Heterogeneity Assessed by (18)F-FDG PET/CT Predicts Treatment Response and Survival Outcomes in Patients with Hodgkin Lymphoma. Acad Radiol 2020;27:e183-92. [Crossref] [PubMed]
- Milgrom SA, Elhalawani H, Lee J, Wang Q, Mohamed ASR, Dabaja BS, et al. A PET Radiomics Model to Predict Refractory Mediastinal Hodgkin Lymphoma. Sci Rep 2019;9:1322. [Crossref] [PubMed]
- Jiang C, Huang X, Li A, Teng Y, Ding C, Chen J, Xu J, Zhou Z. Radiomics signature from [18F]FDG PET images for prognosis predication of primary gastrointestinal diffuse large B cell lymphoma. Eur Radiol 2022;32:5730-41.
- Zhou Y, Zhu Y, Chen Z, Li J, Sang S, Deng S. Radiomic Features of (18)F-FDG PET in Hodgkin Lymphoma Are Predictive of Outcomes. Contrast Media Mol Imaging 2021;2021:6347404. [Crossref] [PubMed]
- Bonekamp D, Kohl S, Wiesenfarth M, Schelb P, Radtke JP, Götz M, Kickingereder P, Yaqubi K, Hitthaler B, Gählert N, Kuder TA, Deister F, Freitag M, Hohenfellner M, Hadaschik BA, Schlemmer HP, Maier-Hein KH. Radiomic Machine Learning for Characterization of Prostate Lesions with MRI: Comparison to ADC Values. Radiology 2018;289:128-37. [Crossref] [PubMed]
- Cha KH, Hadjiiski LM, Samala RK, Chan HP, Cohan RH, Caoili EM, Paramagul C, Alva A, Weizer AZ. Bladder Cancer Segmentation in CT for Treatment Response Assessment: Application of Deep-Learning Convolution Neural Network-A Pilot Study. Tomography 2016;2:421-9. [Crossref] [PubMed]
- Ha S, Choi H, Cheon GJ, Kang KW, Chung JK, Kim EE, Lee DS. Autoclustering of Non-small Cell Lung Carcinoma Subtypes on (18)F-FDG PET Using Texture Analysis: A Preliminary Result. Nucl Med Mol Imaging 2014;48:278-86. [Crossref] [PubMed]
- Qian C, Jiang C, Xie K, Ding C, Teng Y, Sun J, Gao L, Zhou Z, Ni X. Prognosis Prediction of Diffuse Large B-Cell Lymphoma in (18)F-FDG PET Images Based on Multi-Deep-Learning Models. IEEE J Biomed Health Inform 2024;28:4010-23. [Crossref] [PubMed]
- Sylvester EVA, Bentzen P, Bradbury IR, Clément M, Pearce J, Horne J, Beiko RG. Applications of random forest feature selection for fine-scale genetic population assignment. Evol Appl 2018;11:153-65. [Crossref] [PubMed]
- Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. arXiv:1603.02754.