Transferring thrombus labels from CTPA to non-contrast CT via accurate pulmonary vessel registration: a critical step toward AI-based pulmonary embolism detection
Introduction
Pulmonary embolism (PE), a form of venous thromboembolism (VTE), poses a significant global health challenge as the third most common acute cardiovascular syndrome following myocardial infarction and stroke (1). As the most severe manifestation of VTE, PE has an estimated annual incidence of 60–120 per 100,000 population with an in-hospital mortality of approximately 14% (2). Early diagnosis and intervention, such as thrombolytic therapy, are critical for improving patient outcomes (3-6). However, PE remains frequently underdiagnosed, with recent studies reporting diagnostic delays in 12–36% of patients and a pooled mean diagnostic delay of 6.3 days (5,7). According to the 2019 European Society of Cardiology (ESC) guidelines, clinicians must first assess the probability of PE using clinical prediction rules (CPRs) before deciding whether to perform computed tomography (CT) pulmonary angiography (CTPA). This process can delay timely diagnosis and treatment (8,9). Additionally, CTPA carries risks of complications such as contrast-induced nephropathy, which can prevent some patients from undergoing this procedure (8,10).
Although D-dimer testing is effective for ruling out low-risk PE, its utilization remains suboptimal (11,12). In contrast, the use of CTPA is rapidly increasing, yet diagnostic yields are declining. Overreliance on CTPA not only poses health risks to patients but also places strain on healthcare systems (12-14). Non-contrast-enhanced CT (NCCT) is widely available and relatively inexpensive. Previous studies have shown that PE exhibits certain patterns on NCCT images and that thrombi can sometimes be identified visually. However, this process is highly challenging, and visual inspection alone is insufficient for the evaluation of small peripheral vessels, which restricts its application in PE diagnosis (15-18). Integrating artificial intelligence (AI) models to detect PE on NCCT images could help to overcome this limitation (19). Training such models requires determining ground truth on NCCT images (20-24). This is difficult through visual assessment alone, so we propose using image registration between enhanced and NCCT images to transfer thrombus landmarks from enhanced images to NCCT.
The accuracy of this process is highly dependent on the precision of image registration. PE thrombi are located within the pulmonary arteries and their branches (5). We can indirectly assess the accuracy of thrombus registration by validating the registration accuracy of pulmonary arteries and their branches on CTPA and NCCT images. Recent years have seen the emergence of numerous registration algorithms with significant potential in research and clinical applications. However, before these algorithms can be applied in clinical and research settings, their performance must be thoroughly evaluated (25-27). To date, no studies have specifically investigated the registration of pulmonary arteries and their branches on CTPA and NCCT images.
In this study, we evaluated three commonly used registration algorithms, with rigid registration as the baseline. The methods were assessed qualitatively using clinical visual evaluation and compared quantitatively using the Dice coefficient, Jaccard similarity coefficient [i.e., Intersection over Union (IoU)], and Hausdorff distance (HD). We also explored methods for optimizing registration algorithms and evaluated their outcomes.
Methods
Study design and data inclusion
We included patients diagnosed with PE via CTPA between February 2012 and October 2024 at three hospitals: Tianjin Haihe Hospital, Tianjin Medical University Cancer Institute and Hospital, and Tianjin Hospital. A total of 772 patients were initially identified. The exclusion criteria were as follows: (I) absence of either NCCT or CTPA images for the same patient; (II) PE diagnosis confirmed via upper abdominal CT; and (III) age under 18 years. Ultimately, 324 patients met the inclusion criteria. Based on imaging characteristics, these patients were categorized into four groups: normal imaging, inflammatory imaging, cancer imaging, and postoperative imaging. We randomly selected 10 patients from each group for validation. Figure 1 shows the detailed inclusion and exclusion criteria.
The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This retrospective study was approved by the Ethics Committee of Tianjin Haihe Hospital (approval No. 2025HHKT-001, dated 17 March 2025), the lead institution of this multi-center collaborative research. All participating hospitals were informed and agreed to the study. The requirement for informed consent was waived due to the retrospective nature of the study using fully anonymized data.
NCCT and CTPA acquisition parameters
Patients were positioned supine with arms raised and centered on the table. All NCCT and CTPA scans were performed consecutively during the same imaging session to ensure anatomical consistency. First, a contrast agent skin test was performed using a small test dose of iodinated contrast medium; by the time of subsequent CTPA acquisition, this test dose had been excreted by the kidneys and did not interfere with NCCT image interpretation.
NCCT scans were performed after confirming no allergic reaction, during deep inhalation, covering the lung apices to the lung bases. Scan parameters included 120 kV tube voltage, automatic tube current regulation, and two reconstruction settings: 5 mm thickness/5 mm interval and 1 mm thickness/1 mm interval. Window settings were 1,200–1,500 Hounsfield units (HU)/−400 to −600 HU for lung windows and 350–450 HU/40–60 HU for mediastinal windows. Images were reconstructed using standard (mediastinal window) and lung algorithms (lung window).
Immediately after NCCT completion, CTPA scans were performed using the same equipment and coverage without repositioning the patient. Contrast agent (300–350 mgI/mL) was administered via a right antecubital vein cannula using a dual-syringe high-pressure injector. The dosage was calculated as 1.2–2 mL/kg body weight, injected at 3.0 mL/s, followed by a saline flush. Scans were performed during the arterial phase with a 25–35 s delay. The additional effective radiation dose from NCCT is typically 1.0–4.0 mSv, depending on whether low-dose or standard protocols are used based on patient body habitus and clinical indication.
Defining ground truth on CTPA
Images were preprocessed to reduce differences in image quality and noise between scans and to ensure that image features were computed on a consistent specification. First, all images were resampled to a voxel size of 1×1×1 mm³ using linear interpolation; second, a Gaussian filter was applied for denoising; and finally, the images were normalized using grayscale discretization.
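As a rough illustration of two of the preprocessing steps described above (linear resampling to 1 mm spacing and grayscale discretization), the following is a minimal one-dimensional sketch in plain Python. It is not the study's pipeline: the function names, the 1D simplification, and the fixed bin count are assumptions made for illustration, and the Gaussian denoising step is omitted.

```python
def resample_linear_1d(values, spacing_mm, new_spacing_mm=1.0):
    """Resample a 1D intensity profile to new_spacing_mm using linear interpolation."""
    length_mm = (len(values) - 1) * spacing_mm
    n_out = int(length_mm / new_spacing_mm + 1e-9) + 1
    out = []
    for i in range(n_out):
        pos = i * new_spacing_mm / spacing_mm  # position in input index units
        lo = min(int(pos), len(values) - 1)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def discretize(values, n_bins=64):
    """Map intensities to integer gray levels 0..n_bins-1 (fixed bin count; an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


profile = [0.0, 10.0, 20.0]                      # samples spaced 2 mm apart
resampled = resample_linear_1d(profile, spacing_mm=2.0)
levels = discretize(resampled, n_bins=4)
```

In a real 3D workflow the same idea would be applied along all three axes by an imaging library; the sketch only shows what "resample, then discretize" means numerically.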
Image data were reviewed by a radiology graduate student (holding a Master’s degree with 1.5 years of specialized radiology residency training) to confirm the presence of PE. A radiologist and a technologist systematically validated all CTPA images and categorized them. To assess the reproducibility of disease categorization, interobserver agreement between the two radiologists for classifying patients into the four disease categories (normal, inflammatory, cancer, and postoperative) was evaluated using Cohen’s kappa statistic. An imaging physician performed rigid alignment of image pairs using bone alignment, following standard clinical procedures.
From the four PE groups, NCCT and CTPA image pairs were selected by stratified random sampling: 10 pairs each from normal, inflammatory, cancer, and postoperative patients, totaling 40 patients (40 NCCT/CTPA image pairs, 80 images). Using TotalSegmentator (v2.7.0; https://totalsegmentator.com/) with the “lung vessels” and “heart” tasks, pulmonary arteries and their branches were segmented on the 80 images (28,29). Two radiologists reviewed and modified the segmentations. As it was challenging to distinguish heart chambers from the pulmonary artery, the entire heart region, excluding the aorta, was outlined.
Registration
Multiple registration algorithms are available. This study used three: the generic mode of the SlicerElastix plugin (v1.0, PerkLab, Queen’s University, Kingston, Ontario, Canada) for 3D Slicer (v5.6.2; https://www.slicer.org/) (30,31), the SyN mode in ANTsPy (v0.5.4, University of Pennsylvania, Philadelphia, PA, USA) (26), and Demons registration in Python’s SimpleITK (v2.4.1; https://simpleitk.org/) (32-34). These algorithms are referred to as Elastix, ANTs, and Demons, respectively.
Registration parameters can affect algorithm accuracy. All three methods used officially recommended parameters. For Elastix, the parameter set was “Elastix”, and the preset was set to “generic (all)”. For ANTs, the transformation mode was set to “SyN” with GPU enabled. For Demons, the number of iterations was 200, the standard deviation was 1.0, and the smooth displacement field was “True”, with other settings default. Detailed parameters for all registration algorithms are provided in Table S1.
After registration, the segmentation labels of pulmonary arteries and their branches from the enhanced images were mapped to the fixed images (NCCT) using the transformation generated by registration. Nearest-neighbor interpolation was used to preserve binary label integrity.
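To make the role of nearest-neighbor interpolation concrete, the sketch below warps a small two-dimensional binary label with a displacement field. It is an illustrative toy under stated assumptions, not the tools used in the study: the function name, the 2D grid, and the convention that the displacement maps each fixed-grid voxel to a location in the moving label are all hypothetical choices.

```python
import math


def warp_label_nearest(moving_label, displacement):
    """Resample a binary label onto the fixed grid with nearest-neighbor lookup.

    moving_label: 2D list of 0/1 values (rows x cols).
    displacement: 2D list of (d_row, d_col) tuples; fixed voxel (r, c) is looked up
                  at moving location (r + d_row, c + d_col). This convention is an
                  assumption of the sketch.
    """
    rows, cols = len(moving_label), len(moving_label[0])
    warped = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            d_row, d_col = displacement[r][c]
            # Round to the nearest voxel instead of blending neighbors,
            # so the output can only contain values present in the input label.
            src_r = math.floor(r + d_row + 0.5)
            src_c = math.floor(c + d_col + 0.5)
            if 0 <= src_r < rows and 0 <= src_c < cols:
                warped[r][c] = moving_label[src_r][src_c]
    return warped


label = [[0, 0, 0],
         [0, 0, 1],
         [0, 0, 0]]
field = [[(0.0, 0.8)] * 3 for _ in range(3)]   # uniform sub-voxel shift along columns
warped = warp_label_nearest(label, field)
```

Linear interpolation would produce fractional values at label borders, which is why nearest-neighbor lookup is the usual choice for segmentation masks.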
Performance evaluation
Registration performance was assessed using the Dice coefficient, IoU, HD, and clinical physician scores. These measures evaluated the overlap between the manually corrected pulmonary vessel labels on the NCCT images and those on the CTPA images, both before deformable registration (original CTPA labels) and after it (CTPA labels transformed via the registration transformation fields).
The Dice coefficient measures sample similarity, ranging from 0 to 1, according to the following formula:
$$\mathrm{Dice}(A,B)=\frac{2\,|A\cap B|}{|A|+|B|}$$
where A and B are the segmentation masks of NCCT and registered CTPA, and |A∩B| is the overlapping voxel count. A Dice coefficient of 1.0 indicates complete overlap, whereas 0 means no overlap.
IoU also assesses similarity by calculating the ratio of the intersection to the union of two sets, ranging from 0 to 1. Its formula is as follows:
$$\mathrm{IoU}(A,B)=\frac{|A\cap B|}{|A\cup B|}$$
A value of 1.0 means complete overlap, whereas 0 indicates no overlap.
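The two overlap measures can be illustrated with a short sketch that treats each segmentation mask as a set of voxel coordinates. This is a minimal plain-Python illustration, not the SimpleITK-based computation used in the study; the function name and the convention for two empty masks are assumptions.

```python
def dice_and_iou(mask_a, mask_b):
    """Return (Dice, IoU) for two masks given as iterables of voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    intersection = len(a & b)
    union = len(a | b)
    if union == 0:
        # Both masks empty: treated here as perfect agreement (a convention, not a rule).
        return 1.0, 1.0
    dice = 2 * intersection / (len(a) + len(b))
    iou = intersection / union
    return dice, iou


ncct_mask = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
registered_ctpa_mask = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}
dice, iou = dice_and_iou(ncct_mask, registered_ctpa_mask)
```

In this toy case two of the four voxels in the union are shared, so IoU is 0.5 and Dice is 4/6.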
HD measures boundary alignment. The directed HD from set A to set B is defined as:
$$h(A,B)=\max_{a\in A}\,\min_{b\in B}\,d(a,b)$$
where a and b are points in images A and B, and d(a, b) is the distance between them. This study used HD95, the 95th percentile of boundary point distances, to mitigate the impact of a small number of outliers. Smaller values indicate higher similarity.
These three coefficients were calculated using Python’s SimpleITK (v2.4.1) toolkit (31).
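For readers who want to see how a 95th-percentile Hausdorff distance behaves, the following is a minimal plain-Python sketch on small point sets. It is not the SimpleITK routine used in the study. Several details are assumptions that differ between implementations: the nearest-rank percentile rule, taking the maximum of the two directed percentiles, and the function names.

```python
import math


def directed_distances(points_a, points_b):
    """Distance from each point in A to its nearest point in B."""
    return [min(math.dist(a, b) for b in points_b) for a in points_a]


def percentile_nearest_rank(values, q):
    """q-th percentile (integer q) using the nearest-rank rule."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def hd95(points_a, points_b):
    """Symmetric HD95: the larger of the two directed 95th-percentile distances."""
    d_ab = directed_distances(points_a, points_b)
    d_ba = directed_distances(points_b, points_a)
    return max(percentile_nearest_rank(d_ab, 95), percentile_nearest_rank(d_ba, 95))


# Twenty boundary points; B differs from A by one stray point 10 units away.
boundary_a = [(i, 0, 0) for i in range(20)]
boundary_b = boundary_a[:19] + [(19, 10, 0)]

classic_hd = max(max(directed_distances(boundary_a, boundary_b)),
                 max(directed_distances(boundary_b, boundary_a)))
robust_hd = hd95(boundary_a, boundary_b)
```

The classic maximum is driven entirely by the single stray point, whereas the 95th-percentile version ignores it, which is the motivation for HD95 given above.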
Two radiologists visually assessed registration performance on a five-point scale using the following criteria:
- Score 1: Unacceptable for clinical use;
- Score 2: Acceptable but requiring major modifications for most structures;
- Score 3: Acceptable but needing major modifications for some structures;
- Score 4: Acceptable with minor modifications needed;
- Score 5: Directly acceptable for clinical use.
After independent scoring, discrepancies were resolved through discussion to reach a final score for each image pair. Inter-rater reliability for the clinical physician scores was assessed using Cohen’s kappa statistic based on the independent assessments of the two radiologists prior to consensus discussion.
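Cohen's kappa corrects the raw agreement between two raters for the agreement expected by chance. The sketch below computes the unweighted statistic in plain Python; the function name and the toy ratings are illustrative assumptions and are not data from this study.

```python
from collections import Counter


def cohens_kappa(ratings_1, ratings_2):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(ratings_1) != len(ratings_2) or not ratings_1:
        raise ValueError("Both raters must score the same non-empty set of items.")
    n = len(ratings_1)
    observed = sum(a == b for a, b in zip(ratings_1, ratings_2)) / n
    counts_1, counts_2 = Counter(ratings_1), Counter(ratings_2)
    # Chance agreement: product of the two raters' marginal proportions per category.
    expected = sum(counts_1[k] * counts_2[k] for k in set(counts_1) | set(counts_2)) / n ** 2
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1 - expected)


rater_a = [1, 1, 2, 2]   # hypothetical scores
rater_b = [1, 1, 2, 1]
kappa = cohens_kappa(rater_a, rater_b)
```

Here the raters agree on three of four items (0.75), chance agreement is 0.5, and kappa is therefore 0.5.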
Statistical analysis
This study combined quantitative and qualitative methods to evaluate the performance of different registration algorithms. All statistical analyses were performed using jamovi (version 2.6.26.0; https://www.jamovi.org/) with a significance level of α=0.05 (two-tailed). Continuous variables (Dice, IoU, HD95) were presented as mean ± standard deviation, with box plots illustrating distribution. Ordered categorical data (clinical physician scores) were described using median and interquartile range (IQR) (P25, P75), supplemented by frequency distribution tables. Wilcoxon signed-rank tests were used to compare the best-performing registration method with the other two across the evaluation indicators, and mean differences were calculated. Effect sizes were quantified using the rank biserial correlation.
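The relationship between the Wilcoxon signed-rank statistic and the rank biserial correlation can be sketched in a few lines of plain Python. This is an illustration only and not the jamovi implementation; it assumes that the reported statistic is the sum of positive ranks, that zero differences are dropped, and that tied absolute differences receive average ranks. The function name is hypothetical.

```python
def wilcoxon_rank_biserial(x, y):
    """Return (W_plus, W_minus, rank-biserial r) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # zero differences are dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    total = w_plus + w_minus
    r_rb = (w_plus - w_minus) / total if total else 0.0
    return w_plus, w_minus, r_rb


# Hypothetical example: method X beats method Y in every one of 40 pairs.
x = [float(i + 1) for i in range(40)]
y = [0.0] * 40
w_plus, w_minus, r_rb = wilcoxon_rank_biserial(x, y)
```

With 40 pairs the ranks sum to 40×41/2 = 820, so a statistic of 820 together with a rank biserial correlation of 1 means that every paired difference had the same sign.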
Post-hoc power analysis was conducted using G*Power 3.1.9.7 (https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower) to evaluate the statistical power achieved with our validation sample size (n=40). Based on the observed Cohen’s dz effect sizes from Wilcoxon signed-rank comparisons between registration methods, power curves were generated to determine the achieved power for detecting large effects (dz >1.0) at an alpha level of 0.05 (two-tailed).
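The quantities involved in such a power analysis can be approximated with the Python standard library. The sketch below computes Cohen's dz from paired values and estimates two-tailed power with a normal approximation. This is a simplification: G*Power's matched-pairs procedure is based on the noncentral t distribution, so values will differ slightly for small samples. The function names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def cohens_dz(x, y):
    """Cohen's dz: mean of the paired differences divided by their standard deviation."""
    diffs = [a - b for a, b in zip(x, y)]
    return mean(diffs) / stdev(diffs)


def approx_power_paired(dz, n, alpha=0.05):
    """Two-tailed power for a paired comparison, normal approximation (an assumption)."""
    normal = NormalDist()
    z_crit = normal.inv_cdf(1 - alpha / 2)
    shift = abs(dz) * math.sqrt(n)
    return normal.cdf(shift - z_crit) + normal.cdf(-shift - z_crit)


dz_example = cohens_dz([2.0, 4.0, 6.0], [1.0, 2.0, 3.0])   # toy data, not study data
power_large_effect = approx_power_paired(dz=1.0, n=40)
power_null = approx_power_paired(dz=0.0, n=40)
```

Under this approximation, an effect of dz = 1.0 with 40 pairs already gives power above 0.99, and a null effect returns the nominal α of 0.05, as expected.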
Results
Figure 2 shows an example of rigidly registered CTPA and NCCT images, segmented by TotalSegmentator and corrected by a radiologist. The green area indicates the NCCT image segment, and the yellow area shows the CTPA image segment. For better visibility, non-enhanced images are used as the background, displaying only segmentation contours. Numbers from top to bottom represent the Dice coefficient, IoU, and HD95 metrics. The three-dimensional reconstruction with smoothing is shown in Figure 3.
Table 1 lists the four metrics after rigid registration of enhanced images. Inflammatory cases (Dice ≈0.659, IoU ≈0.495) and post-surgical cases (Dice ≈0.647, IoU ≈0.481) had slightly higher volume overlap than normal (Dice ≈0.616, IoU ≈0.449) and cancer cases (Dice ≈0.606, IoU ≈0.434). Cancer cases had slightly better boundary alignment (HD95 ≈5.59) than inflammatory (HD95 ≈5.97), post-surgical (HD95 ≈6.02), and normal cases (HD95 ≈6.08). Overall, Dice values (0.632±0.07) were higher than IoU values (0.465±0.07). The median clinical scores for all four conditions were below 3.
Table 1
| Disease category | Dice | IoU | HD95 | Physician score |
|---|---|---|---|---|
| Total | 0.632±0.07 | 0.465±0.07 | 5.91±2.34 | 1 (1, 2) |
| Normal | 0.616±0.08 | 0.449±0.08 | 6.08±2.27 | 1 (1, 1) |
| Inflammation | 0.659±0.07 | 0.495±0.07 | 5.97±3.56 | 2 (1, 2) |
| Cancer | 0.606±0.06 | 0.434±0.06 | 5.59±1.68 | 1 (1, 1) |
| Postoperative | 0.647±0.06 | 0.481±0.07 | 6.02±1.73 | 1 (1, 1) |
Data are presented as mean ± standard deviation or median (interquartile range). HD95, 95th percentile Hausdorff distance; IoU, Intersection over Union.
Figure 4 displays an enhanced image after different registration methods, with NCCT as the background. Red indicates Elastix, blue ANTs, yellow Demons, and green NCCT segmentation. After rigid registration, Dice was 0.623, IoU 0.452, and HD95 5.451. After Elastix, ANTs, and Demons registration, Dice coefficients were 0.776, 0.689, and 0.7; IoU values were 0.634, 0.526, and 0.538; and HD95 values were 4.33, 8.31, and 5.70.
Table 2 shows the overall results by disease category after the different registration methods were applied to the rigidly registered images. Post-surgical cases had the highest volume overlap (Dice ≈0.767, IoU ≈0.627), followed by inflammatory (Dice ≈0.749, IoU ≈0.602) and cancer cases (Dice ≈0.747, IoU ≈0.601), with normal cases the lowest (Dice ≈0.727, IoU ≈0.576). After registration, post-surgical cases had the best boundary alignment (HD95 ≈5.90), whereas cancer cases had the highest HD95 (≈5.97). Clinical scores for all four conditions improved, with medians reaching 4, and post-surgical cases had a higher P25 of 4.
Table 2
| Disease category | Dice | IoU | HD95 | Physician score |
|---|---|---|---|---|
| Normal | 0.727±0.072 | 0.576±0.089 | 5.93±2.22 | 4 (3, 4) |
| Inflammation | 0.749±0.064 | 0.602±0.080 | 5.92±2.27 | 4 (3, 4) |
| Cancer | 0.747±0.068 | 0.601±0.087 | 5.97±2.33 | 4 (3, 4) |
| Postoperative | 0.767±0.068 | 0.627±0.088 | 5.90±2.32 | 4 (4, 4) |
Data are presented as mean ± standard deviation or median (interquartile range). HD95, 95th percentile Hausdorff distance; IoU, Intersection over Union.
Table 3 presents the results of different registration methods. Compared to rigid registration, all three methods improved image registration. Elastix showed the best volume overlap (Dice ≈0.819, IoU ≈0.695) and boundary alignment (HD95 ≈3.42), achieving the highest clinical score [4 (IQR: 4, 5)]. ANTs had better volume overlap (Dice ≈0.732, IoU ≈0.578) than Demons but slightly worse boundary alignment (HD95 ≈7.89) than Demons (HD95 ≈6.49). ANTs and Demons had the same interquartile ranges for clinical scores, with Demons having a median of 4 and ANTs a median of 3.
Table 3
| Registration method | Dice | IoU | HD95 | Physician score |
|---|---|---|---|---|
| Rigid | 0.632±0.070 | 0.465±0.073 | 5.91±2.34 | 1 (1, 2) |
| Elastix | 0.819±0.030 | 0.695±0.043 | 3.42±0.57 | 4 (4, 5) |
| ANTs | 0.732±0.032 | 0.578±0.040 | 7.89±1.29 | 3 (3, 4) |
| Demons | 0.692±0.062 | 0.532±0.071 | 6.49±1.68 | 4 (3, 4) |
Data are presented as mean ± standard deviation or median (interquartile range). HD95, 95th percentile Hausdorff distance; IoU, Intersection over Union.
Violin plots, combining box plots and density distribution, illustrate data concentration and dispersion, showing the stability of registration effects across different methods. Figures 5-8 indicate that Elastix had median Dice and IoU values of 0.823 and 0.700, ANTs 0.732 and 0.577, and Demons 0.698 and 0.536. For HD95, Elastix’s median was 3.36, compared to 7.76 and 5.89 for the other methods. In clinical score box plots, Elastix’s median was 4.0, while the others were 3.0 and 3.5. In Dice and IoU violin plots, Elastix and Demons showed no outliers, whereas ANTs had some. In the HD95 violin plot, Elastix had one outlier at 4.74, ANTs one, and Demons four.
Table 4 shows the Wilcoxon signed-rank test results. Since the data were non-normally distributed, paired Wilcoxon signed-rank tests were used. For Dice and IoU, Elastix vs. ANTs and Demons had W=820 (P<0.001) and rank biserial correlation =1. For HD95, Elastix vs. ANTs and Demons had W=1 and 0 (P<0.001), with rank biserial correlation =−0.998 and −1, indicating strong effects.
Table 4
| Metric | Comparison | Statistic | P value | Mean difference | SE difference | Rank biserial effect size |
|---|---|---|---|---|---|---|
| Dice (Elastix) | Dice (ANTs) | 820 | <0.001 | 0.088 | 0.00351 | 1 |
| Dice (Elastix) | Dice (Demons) | 820 | <0.001 | 0.1255 | 0.00656 | 1 |
| IoU (Elastix) | IoU (ANTs) | 820 | <0.001 | 0.1185 | 0.00469 | 1 |
| IoU (Elastix) | IoU (Demons) | 820 | <0.001 | 0.162 | 0.00705 | 1 |
| HD95 (Elastix) | HD95 (ANTs) | 1 | <0.001 | −4.4425 | 0.21259 | −0.998 |
| HD95 (Elastix) | HD95 (Demons) | 0 | <0.001 | −2.77 | 0.23603 | −1 |
HD95, 95th percentile Hausdorff distance; IoU, Intersection over Union; SE, standard error.
Interobserver agreement analysis demonstrated high reliability for both disease categorization and clinical scoring. For classification of patients into the four disease categories, Cohen’s kappa was 0.927 [95% confidence interval (CI): 0.891–0.963, P<0.001], indicating almost perfect agreement between the two radiologists (Table 5).
Table 5
| N | Raters | Kappa | Z | P value | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| 324 | 2 | 0.927 | 25.6 | <0.001 | 0.891 | 0.963 |
CI, confidence interval.
For the clinical physician scoring of registration quality, inter-rater reliability was also high, with a Cohen’s kappa of 0.867 (95% CI: 0.806–0.930, P<0.001) (Table 6). Following independent scoring, discrepancies between raters were resolved through discussion to reach a final consensus score for each image pair.
Table 6
| N | Raters | Kappa | Z | P value | 95% CI lower | 95% CI upper |
|---|---|---|---|---|---|---|
| 40 | 2 | 0.867 | 20.4 | <0.001 | 0.806 | 0.930 |
Clinical scores were obtained for 4 registration methods per patient (total 160 assessments). CI, confidence interval.
Post-hoc power analysis was conducted to validate the statistical adequacy of our validation sample size (n=40). Figure 9 illustrates the power curve for dependent means comparison (matched pairs) with a total sample size of 40, two-tailed α=0.05. The curve demonstrates that statistical power increases rapidly with effect size, reaching and maintaining 1.00 (100% power) when Cohen’s dz exceeds approximately 0.9–1.0.
As detailed in Table 7, the observed Cohen’s dz values for comparisons between Elastix and the other registration methods ranged in absolute value from 1.16 to 4.83 across all evaluation metrics. Specifically, for Dice coefficient comparisons, effect sizes were 3.29 (Elastix vs. Rigid), 3.93 (Elastix vs. ANTs), and 3.10 (Elastix vs. Demons). For IoU, values ranged from 3.69 to 3.95; for HD95, from −3.33 to −1.16 (negative values indicate a lower HD95 with Elastix); and for clinical scores, from 1.59 to 4.83.
Table 7
| Metric | Elastix/Rigid | Elastix/ANTs | Elastix/Demons |
|---|---|---|---|
| Dice | 3.29 | 3.93 | 3.10 |
| IoU | 3.87 | 3.95 | 3.69 |
| HD95 | −1.16 | −3.33 | −2.09 |
| Clinical score | 4.83 | 2.21 | 1.59 |
Rows represent metrics, columns represent comparisons against baseline methods. HD95, 95th percentile Hausdorff distance; IoU, Intersection over Union.
All observed effect sizes exceeded dz =1.0 in absolute value, falling within the plateau region of the power curve where achieved power approaches 1.00. Even for the smallest observed effect size (HD95 comparison between Elastix and rigid registration, dz =−1.16), the statistical power exceeded 0.99. These results confirm that our sample of 40 validation cases provides sufficient statistical power to detect meaningful differences between registration algorithms with high confidence.
Table 8 lists the four metrics for different cases after Elastix registration. Post-surgical cases had the best volume overlap (Dice ≈0.837, IoU ≈0.721) and boundary alignment (HD95 ≈3.3). Cancer cases had slightly better volume overlap (Dice ≈0.821, IoU ≈0.698) and boundary alignment (HD95 ≈3.34) than inflammatory and normal cases. Inflammatory cases had slightly better volume overlap (Dice ≈0.811, IoU ≈0.683) than normal cases (Dice ≈0.807, IoU ≈0.677), with both having HD95 ≈3.51. All four conditions had clinical score medians ≥4, with post-surgical cases reaching 5 and interquartile ranges ≤1.
Table 8
| Disease category | Dice | IoU | HD95 | Physician score |
|---|---|---|---|---|
| Total | 0.819±0.030 | 0.695±0.043 | 3.42±0.57 | 4 (4, 5) |
| Normal | 0.807±0.032 | 0.677±0.045 | 3.51±0.53 | 4 (4, 5) |
| Inflammation | 0.811±0.029 | 0.683±0.042 | 3.51±0.51 | 4 (4, 4) |
| Cancer | 0.821±0.028 | 0.698±0.040 | 3.34±0.79 | 4 (4, 5) |
| Postoperative | 0.837±0.027 | 0.721±0.039 | 3.30±0.46 | 5 (4, 5) |
Data are presented as mean ± standard deviation or median (interquartile range). HD95, 95th percentile Hausdorff distance; IoU, Intersection over Union.
Interobserver agreement for disease classification between the two radiologists was excellent (κ=0.927, 95% CI: 0.891–0.963), indicating the high reproducibility of the disease classification system. Inter-rater reliability for clinical scoring was also high (κ=0.867, 95% CI: 0.806–0.93), suggesting that the clinical scoring system was robust and consistent across raters (Tables 5,6).
Discussion
This study explored the performance of different registration methods in registering pulmonary arteries and their branches in pre- and post-contrast chest CT images. Previous studies focused on lung, heart, and trachea registration but lacked specialized research on pulmonary vascular systems (26,35,36).
Elastix, ANTs, and Demons all improved volume overlap over rigid registration. Elastix achieved the highest Dice coefficient (0.819) and IoU (0.695) and the lowest HD95 (3.42), with significant improvements in volume overlap and boundary alignment. Wilcoxon signed-rank tests showed that Elastix outperformed the other two methods. Elastix demonstrated stable performance across different cases: Dice coefficients were approximately 0.807 for normal cases, 0.811 for inflammatory, 0.821 for cancer, and 0.837 for post-surgical. Clinical scores for Elastix-registered images had a median of 4, indicating that only minor modifications were needed for clinical application. The inclusion of heart chambers might have interfered with registration results, suggesting actual pulmonary artery and branch registration accuracy could be higher.
In volume overlap analysis, Dice values were generally higher than IoU values. According to their definitions, when |A∩B| >0:
$$\frac{\mathrm{Dice}}{\mathrm{IoU}}=\frac{2\,|A\cup B|}{|A|+|B|}=\frac{2\,|A\cup B|}{|A\cup B|+|A\cap B|}$$
For registered images where |A∩B| ≠0 and A ≠ B, |A∩B| < |A∪B|, so Dice/IoU ∈ (1,2).
Almost all registration methods, including rigid registration, performed better in post-surgical and inflammatory cases than in normal and cancer cases. This may be due to the smaller vascular volumes obtained in the post-surgical and inflammatory groups during manual segmentation. Post-surgical cases had partially removed vessels, and inflammatory cases had blurred lung-vessel boundaries, leading to potential omission of pulmonary vessel parts by clinicians. Compared to cancer and normal cases, post-surgical and inflammatory cases therefore had smaller vascular volumes requiring transformation field conversion. Moreover, post-surgical and inflammatory cases may have more feature points, such as surgical metal shadows and inflammatory lung consolidation, enhancing registration accuracy.
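Because |A| + |B| = |A∪B| + |A∩B|, the two overlap measures are linked for any single image pair by Dice = 2·IoU/(1 + IoU), which is why Dice is never smaller than IoU. The short sketch below encodes this identity; the function names are illustrative. Note that the identity holds per image pair and only approximately for group means: applied to the mean Elastix IoU of 0.695 it gives about 0.820, close to the reported mean Dice of 0.819.

```python
def dice_from_iou(iou):
    """Dice implied by an IoU value for the same pair of masks."""
    return 2 * iou / (1 + iou)


def iou_from_dice(dice):
    """Inverse relation: IoU implied by a Dice value."""
    return dice / (2 - dice)


implied_dice = dice_from_iou(0.695)          # mean Elastix IoU from Table 3
round_trip = iou_from_dice(dice_from_iou(0.5))
```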
Frequency distribution showed that Demons’ distribution was similar to that of rigid registration, whereas the distributions of Elastix and ANTs were more concentrated. This suggests that Demons registration is more dependent on original image quality, whereas Elastix and ANTs are more stable across various scenarios. The superior HD95 performance of rigid registration over ANTs and Demons may be explained by the manual boundary alignment performed by clinicians during rigid registration.
On a personal laptop computer [GeForce RTX 2060 (6GB), NVIDIA, Santa Clara, CA, USA], Elastix registration took about 1 min, Demons about 2 min 30 s, and ANTs about 3 min. Pre-cropping images to focus on the lungs can significantly improve registration speed and accuracy. Future large-cohort experiments using registered transformation fields to generate gold standards on NCCT images should involve pre-cropping non-enhanced and enhanced images to the same volume before rigid registration to enhance efficiency and accuracy.
This study might not entirely showcase each algorithm’s optimal performance. Although we tested multiple parameter combinations on a subset of images and ultimately chose the official recommended parameters, parameter selection can still influence algorithm performance, necessitating further research for comprehensive evaluation.
Limitations
This study selected three mature and representative registration methods: Elastix, ANTs, and Demons. However, many registration algorithms are available, so our conclusions may not identify the absolute best algorithm.
This multi-center study included data from three hospitals specializing in tumor, respiratory, and orthopedic diseases. Despite this, the dataset was limited, with a high proportion of cancer patients. To enhance representativeness, cases were divided into four groups with stratified sampling, but this approach may not cover all disease types affecting pulmonary images. In addition, although our post-hoc power analysis confirms adequate statistical power for detecting differences between registration methods within the validation cohort (n=40), we acknowledge that the overall sample size remains relatively modest for developing robust AI-based detection systems. The current study serves as a preliminary methodological evaluation, and we are committed to conducting large-scale multi-center validation studies with substantially expanded patient cohorts to confirm these findings and enhance generalizability.
The gold standard of pulmonary arteries and their branches was established via TotalSegmentator segmentation followed by clinical physician revision. However, terminal small vessels may be overlooked due to their shape and CT HU values similar to surrounding tissues. Thrombi, as part of the pulmonary arteries, can theoretically be registered via transformation fields if vascular registration is accurate. However, there is currently no direct evidence proving the alignment between registered gold standards and actual PE locations, requiring further validation in future studies.
Conclusions
This study serves as a preliminary experiment aimed at developing an AI model for detecting PE in NCCT images. Among the evaluated registration methods, Elastix performed best in Dice coefficient, IoU, and HD95, with clinical physician scores between 4 and 5, indicating “clinically acceptable with minimal or no modifications needed”. Thus, Elastix-generated registration fields can accurately transfer thrombus gold standards from enhanced to NCCT images. However, the reliability and validity of thrombus gold standards on NCCT require additional evaluation.
Acknowledgments
None.
Footnote
Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-aw-2373/dss
Funding: This study was supported by a grant from
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-aw-2373/coif). All authors declared that this study was supported by a grant from the Tianjin Municipal Education Commission (No. 2024ZXZD004). The authors have no other conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This retrospective study was approved by the Ethics Committee of Tianjin Haihe Hospital (approval No. 2025HHKT-001, dated March 17, 2025), the lead institution of this multi-center collaborative research. All participating hospitals were informed and agreed to the study. Informed consent was waived due to the retrospective nature of the study using fully anonymized data.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Raskob GE, Angchaisuksiri P, Blanco AN, Buller H, Gallus A, Hunt BJ, Hylek EM, Kakkar A, Konstantinides SV, McCumber M, Ozaki Y, Wendelboe A, Weitz JIISTH Steering Committee for World Thrombosis Day. Thrombosis: a major contributor to global disease burden. Arterioscler Thromb Vasc Biol 2014;34:2363-71. [Crossref] [PubMed]
- Freund Y, Cohen-Aubart F, Bloom B. Acute Pulmonary Embolism: A Review. JAMA 2022;328:1336-45. [Crossref] [PubMed]
- Duffett L, Castellucci LA, Forgie MA. Pulmonary embolism: update on management and controversies. BMJ 2020;370:m2177. [Crossref] [PubMed]
- Martinez Licha CR, McCurdy CM, Maldonado SM, Lee LS. Current Management of Acute Pulmonary Embolism. Ann Thorac Cardiovasc Surg 2020;26:65-71. [Crossref] [PubMed]
- Maughan BC, Jarman AF, Redmond A, Geersing GJ, Kline JA. Pulmonary embolism. BMJ 2024;384:e071662. [Crossref] [PubMed]
- Trott T, Bowman J. Diagnosis and Management of Pulmonary Embolism. Emerg Med Clin North Am 2022;40:565-81. [Crossref] [PubMed]
- van Maanen R, Trinks-Roerdink EM, Rutten FH, Geersing GJ. A systematic review and meta-analysis of diagnostic delay in pulmonary embolism. Eur J Gen Pract 2022;28:165-72. [Crossref] [PubMed]
- Cohen AT, Agnelli G, Anderson FA, Arcelus JI, Bergqvist D, Brecht JG, Greer IA, Heit JA, Hutchinson JL, Kakkar AK, Mottier D, Oger E, Samama MM, Spannagl MVTE Impact Assessment Group in Europe (VITAE). Venous thromboembolism (VTE) in Europe. The number of VTE events and associated morbidity and mortality. Thromb Haemost 2007;98:756-64. [Crossref] [PubMed]
- Konstantinides SV, Meyer G, Becattini C, Bueno H, Geersing GJ, Harjola VP, et al. 2019 ESC Guidelines for the diagnosis and management of acute pulmonary embolism developed in collaboration with the European Respiratory Society (ERS). Eur Heart J 2020;41:543-603. [Crossref] [PubMed]
- Doganay S, Oguz AK, Ergun I. Increased risk of contrast-induced acute kidney injury in patients with pulmonary thromboembolism. Ren Fail 2015;37:1138-44. [Crossref] [PubMed]
- Kline JA, Garrett JS, Sarmiento EJ, Strachan CC, Courtney DM. Over-Testing for Suspected Pulmonary Embolism in American Emergency Departments: The Continuing Epidemic. Circ Cardiovasc Qual Outcomes 2020;13:e005753. [Crossref] [PubMed]
- Perera M, Aggarwal L, Scott IA, Cocks N. Underuse of risk assessment and overuse of computed tomography pulmonary angiography in patients with suspected pulmonary thromboembolism. Intern Med J 2017;47:1154-60. [Crossref] [PubMed]
- Chean LN, Tan C, Hiskens MI, Rattenbury M, Sundaram P, Perara J, Smith K, Kumar P. Overuse of Computed Tomography Pulmonary Angiography and Low Utilization of Clinical Prediction Rules in Suspected Pulmonary Embolism Patients at a Regional Australian Hospital. Healthcare (Basel) 2024;12:278. [Crossref] [PubMed]
- Raji H, JavadMoosavi SA, Dastoorpoor M, Mohamadipour Z, Mousavi Ghanavati SP. Overuse and underuse of pulmonary CT angiography in patients with suspected pulmonary embolism. Med J Islam Repub Iran 2018;32:3. [Crossref] [PubMed]
- Ehsanbakhsh A, Hatami F, Valizadeh N, Khorashadizadeh N, Norouzirad F. Evaluating the Performance of Unenhanced Computed Tomography in the Diagnosis of Pulmonary Embolism. J Tehran Heart Cent 2021;16:156-61. [PubMed]
- Chien CH, Shih FC, Chen CY, Chen CH, Wu WL, Mak CW. Unenhanced multidetector computed tomography findings in acute central pulmonary embolism. BMC Med Imaging 2019;19:65. [Crossref] [PubMed]
- Tatco VR, Piedad HH. The validity of hyperdense lumen sign in non-contrast chest CT scans in the detection of pulmonary thromboembolism. Int J Cardiovasc Imaging 2011;27:433-40. [Crossref] [PubMed]
- Guo R, Deng M, Xi L, Zhang S, Xu W, Liu M. Chest non contrasted computed tomography in detecting acute pulmonary thromboembolism: A single center retrospective study. Exp Ther Med 2024;28:304. [Crossref] [PubMed]
- Hagen F, Vorberg L, Thamm F, Ditt H, Maier A, Brendel JM, Ghibes P, Bongers MN, Krumm P, Nikolaou K, Horger M. Improved detection of small pulmonary embolism on unenhanced computed tomography using an artificial intelligence-based algorithm - a single centre retrospective study. Int J Cardiovasc Imaging 2024;40:2293-304. [Crossref] [PubMed]
- Gore JC. Artificial intelligence in medical imaging. Magn Reson Imaging 2020;68:A1-4. [Crossref] [PubMed]
- Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artificial intelligence in radiology. Nat Rev Cancer 2018;18:500-10. [Crossref] [PubMed]
- Li M, Jiang Y, Zhang Y, Zhu H. Medical image analysis using deep learning algorithms. Front Public Health 2023;11:1273253. [Crossref] [PubMed]
- Thrall JH, Li X, Li Q, Cruz C, Do S, Dreyer K, Brink J. Artificial Intelligence and Machine Learning in Radiology: Opportunities, Challenges, Pitfalls, and Criteria for Success. J Am Coll Radiol 2018;15:504-8. [Crossref] [PubMed]
- Yang R, Yu Y. Artificial Convolutional Neural Network in Object Detection and Semantic Segmentation for Medical Imaging Analysis. Front Oncol 2021;11:638182. [Crossref] [PubMed]
- Keszei AP, Berkels B, Deserno TM. Survey of Non-Rigid Registration Tools in Medicine. J Digit Imaging 2017;30:102-16. [Crossref] [PubMed]
- Murphy K, van Ginneken B, Reinhardt JM, Kabus S, Ding K, Deng X, et al. Evaluation of registration methods on thoracic CT: the EMPIRE10 challenge. IEEE Trans Med Imaging 2011;30:1901-20. [Crossref] [PubMed]
- Zou J, Gao B, Song Y, Qin J. A review of deep learning-based deformable medical image registration. Front Oncol 2022;12:1047215. [Crossref] [PubMed]
- Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 2021;18:203-11. [Crossref] [PubMed]
- Wasserthal J, Breit HC, Meyer MT, Pradella M, Hinck D, Sauter AW, Heye T, Boll DT, Cyriac J, Yang S, Bach M, Segeroth M. TotalSegmentator: Robust Segmentation of 104 Anatomic Structures in CT Images. Radiol Artif Intell 2023;5:e230024. [Crossref] [PubMed]
- Klein S, Staring M, Murphy K, Viergever MA, Pluim JP. elastix: a toolbox for intensity-based medical image registration. IEEE Trans Med Imaging 2010;29:196-205. [Crossref] [PubMed]
- Shamonin DP, Bron EE, Lelieveldt BP, Smits M, Klein S, Staring M; Alzheimer's Disease Neuroimaging Initiative. Fast parallel image registration on CPU and GPU for diagnostic classification of Alzheimer's disease. Front Neuroinform 2014;7:50. [PubMed]
- Beare R, Lowekamp B, Yaniv Z. Image Segmentation, Registration and Characterization in R with SimpleITK. J Stat Softw 2018;86:8. [Crossref] [PubMed]
- Lowekamp BC, Chen DT, Ibáñez L, Blezek D. The Design of SimpleITK. Front Neuroinform 2013;7:45. [Crossref] [PubMed]
- Yaniv Z, Lowekamp BC, Johnson HJ, Beare R. SimpleITK Image-Analysis Notebooks: a Collaborative Environment for Education and Reproducible Research. J Digit Imaging 2018;31:290-303. [Crossref] [PubMed]
- Kadoya N, Fujita Y, Katsuta Y, Dobashi S, Takeda K, Kishi K, Kubozono M, Umezawa R, Sugawara T, Matsushita H, Jingu K. Evaluation of various deformable image registration algorithms for thoracic images. J Radiat Res 2014;55:175-82. [Crossref] [PubMed]
- Nielsen MS, Østergaard LR, Carl J. A new method to validate thoracic CT-CT deformable image registration using auto-segmented 3D anatomical landmarks. Acta Oncol 2015;54:1515-20. [Crossref] [PubMed]



