Transferring thrombus labels from CTPA to non-contrast CT via accurate pulmonary vessel registration: a critical step toward AI-based pulmonary embolism detection

Xiaobin Gong; Bingzhen Jia; Yuyang Zhang; Baolin Tan; Anwei He; Zhaoxiang Ye; Zhiheng Xing

doi:10.21037/qims-2025-aw-2373

Original Article


Xiaobin Gong¹#, Bingzhen Jia²#, Yuyang Zhang¹, Baolin Tan², Anwei He³, Zhaoxiang Ye⁴, Zhiheng Xing¹,²

¹Haihe Clinical School, Tianjin Medical University, Tianjin, China; ²Department of Radiology, Tianjin Haihe Hospital, TCM Key Research Laboratory for Infectious Disease Prevention for State Administration of Traditional Chinese Medicine, Tianjin Institute of Respiratory Diseases, Haihe Hospital, Tianjin University, Tianjin, China; ³First Department of Radiology, Tianjin Hospital, Tianjin, China; ⁴Department of Radiology, Tianjin Medical University Cancer Institute and Hospital/National Clinical Research Center for Cancer/Tianjin’s Clinical Research Center for Cancer/Key Laboratory of Cancer Prevention and Therapy/Tianjin Key Laboratory of Digestive Cancer/State Key Laboratory of Druggability Evaluation and Systematic Translational Medicine, Tianjin, China

Contributions: (I) Conception and design: Z Xing, X Gong; (II) Administrative support: Z Xing, Z Ye, A He; (III) Provision of study materials or patients: Z Xing, Z Ye, A He; (IV) Collection and assembly of data: X Gong, B Jia, Y Zhang, B Tan; (V) Data analysis and interpretation: X Gong; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work as co-first authors.

Correspondence to: Anwei He, MM. First Department of Radiology, Tianjin Hospital, No. 406 Jiefangnan Rd, Hexi Dist, Tianjin 300050, China. Email: heanwei2009@163.com; Zhaoxiang Ye, MD. Department of Radiology, Tianjin Medical University Cancer Institute and Hospital/National Clinical Research Center for Cancer/Tianjin’s Clinical Research Center for Cancer/Key Laboratory of Cancer Prevention and Therapy/Tianjin Key Laboratory of Digestive Cancer/State Key Laboratory of Druggability Evaluation and Systematic Translational Medicine, No. 1 Huanhu West Rd, Tiyuan North Rd, Hexi Dist, Tianjin 300060, China. Email: yezhaoxiang@163.com; Zhiheng Xing, MD. Haihe Clinical School, Tianjin Medical University, Tianjin, China; Department of Radiology, Tianjin Haihe Hospital, TCM Key Research Laboratory for Infectious Disease Prevention for State Administration of Traditional Chinese Medicine, Tianjin Institute of Respiratory Diseases, Haihe Hospital, Tianjin University, No. 890 Jingu Rd, Shuanggang, Jinnan Dist, Tianjin 300350, China. Email: 18920696025@189.cn.

Background: Pulmonary embolism (PE) remains a leading cause of cardiovascular mortality with a low early diagnosis rate, highlighting the urgent need for more efficient detection methods. Non-contrast-enhanced computed tomography (NCCT) offers a promising alternative to computed tomography pulmonary angiography (CTPA) but requires accurate image registration for artificial intelligence (AI) model training. This study evaluates and compares three commonly used image registration algorithms (Elastix, ANTs, and Demons) to determine their efficacy in aligning pulmonary arteries on CTPA and NCCT images for potential AI applications in PE detection.

Methods: This retrospective study included 324 PE patients diagnosed via CTPA across three hospitals. Pulmonary arteries and their branches were segmented on NCCT and CTPA images using TotalSegmentator with manual radiologist review. Three registration algorithms (Elastix, ANTs, Demons) were compared against rigid registration. Accuracy was assessed using the Dice coefficient, Jaccard similarity [Intersection over Union (IoU)], 95th percentile Hausdorff distance (HD95), and clinical physician scores (1–5 scale). Pairwise comparisons were performed using Wilcoxon signed-rank tests, with the rank biserial correlation as the effect size.

Results: Elastix demonstrated significantly superior performance compared to ANTs and Demons across all metrics. Elastix achieved Dice coefficient of 0.819±0.030 vs. 0.732±0.032 for ANTs and 0.692±0.062 for Demons (both P<0.001, effect size r=1.0). IoU values were 0.695±0.043 for Elastix, 0.578±0.040 for ANTs, and 0.532±0.071 for Demons (both P<0.001, r=1.0). HD95 was significantly lower for Elastix (3.42±0.57 mm) compared to ANTs (7.89±1.29 mm) and Demons (6.49±1.68 mm) (both P<0.001, r=−0.998 to −1.0). Median clinical physician scores for Elastix were 4 [interquartile range (IQR): 4, 5], indicating “clinically acceptable with minimal or no modifications needed”, vs. 3 (IQR: 3, 4) for ANTs and 4 (IQR: 3, 4) for Demons.

Conclusions: Among the evaluated registration methods, Elastix showed the highest accuracy and clinical applicability for transferring thrombus landmarks from enhanced CTPA to NCCT images (Dice =0.819, IoU =0.695, HD95 =3.42 mm, all P<0.001 vs. comparators). This study provides a foundation for developing AI models to detect PE in NCCT images, although further validation of the thrombus ground truth on NCCT is required.

Keywords: Pulmonary embolism (PE); unenhanced chest computed tomography (unenhanced chest CT); image registration; artificial intelligence (AI); algorithm evaluation

Submitted Nov 09, 2025. Accepted for publication Mar 13, 2026. Published online Apr 13, 2026.


Introduction

Pulmonary embolism (PE), a form of venous thromboembolism (VTE), poses a significant global health challenge as the third most common acute cardiovascular syndrome following myocardial infarction and stroke (1). As the most severe manifestation of VTE, PE has an estimated annual incidence of 60–120 per 100,000 population with an in-hospital mortality of approximately 14% (2). Early diagnosis and intervention, such as thrombolytic therapy, are critical for improving patient outcomes (3-6). However, PE remains frequently underdiagnosed, with recent studies reporting diagnostic delays in 12–36% of patients and a pooled mean diagnostic delay of 6.3 days (5,7). According to the 2019 European Society of Cardiology (ESC) guidelines, clinicians must first assess the probability of PE using clinical prediction rules (CPRs) before deciding whether to perform computed tomography (CT) pulmonary angiography (CTPA). This process can delay timely diagnosis and treatment (8,9). Additionally, CTPA carries risks of complications such as contrast-induced nephropathy, which can prevent some patients from undergoing this procedure (8,10).

Although D-dimer testing is effective for ruling out low-risk PE, its utilization remains suboptimal (11,12). In contrast, the use of CTPA is rapidly increasing, yet diagnostic yields are declining. Overreliance on CTPA not only poses health risks to patients but also places strain on healthcare systems (12-14). Non-contrast-enhanced CT (NCCT) is widely available and relatively inexpensive. Previous studies have shown that PE exhibits certain patterns on NCCT images and that thrombi can sometimes be identified visually. However, this is highly challenging, and visual inspection alone is insufficient for evaluating small peripheral vessels, which restricts the use of NCCT in PE diagnosis (15-18). Integrating artificial intelligence (AI) models to detect PE on NCCT images could help to overcome this limitation (19). Training such models requires determining ground truth on NCCT images (20-24). This is difficult through visual assessment alone, so we propose using image registration between enhanced and NCCT images to transfer thrombus landmarks from enhanced images to NCCT.

The accuracy of this process is highly dependent on the precision of image registration. PE thrombi are located within the pulmonary arteries and their branches (5). We can indirectly assess the accuracy of thrombus registration by validating the registration accuracy of pulmonary arteries and their branches on CTPA and NCCT images. Recent years have seen the emergence of numerous registration algorithms with significant potential in research and clinical applications. However, before these algorithms can be applied in clinical and research settings, their performance must be thoroughly evaluated (25-27). To date, no studies have specifically investigated the registration of pulmonary arteries and their branches on CTPA and NCCT images.

In this study, we evaluated three commonly used registration algorithms, together with rigid registration as a baseline, both qualitatively by clinical visual assessment and quantitatively using the Dice coefficient, Jaccard similarity coefficient [i.e., Intersection over Union (IoU)], and Hausdorff distance (HD). We also explored methods for optimizing registration algorithms and evaluated their outcomes.

Methods

Study design and data inclusion

We included patients diagnosed with PE via CTPA between February 2012 and October 2024 at three hospitals: Tianjin Haihe Hospital, Tianjin Medical University Cancer Institute and Hospital, and Tianjin Hospital. A total of 772 patients were initially identified. The exclusion criteria were as follows: (I) absence of either NCCT or CTPA images for the same patient; (II) PE diagnosis confirmed via upper abdominal CT; and (III) age under 18 years. Ultimately, 324 patients met the inclusion criteria. Based on imaging characteristics, these patients were categorized into four groups: normal imaging, inflammatory imaging, cancer imaging, and postoperative imaging. We randomly selected 10 patients from each group for validation. Figure 1 shows the detailed inclusion and exclusion criteria.

Figure 1 Study flowchart. CT, computed tomography; CTPA, computed tomography pulmonary angiography; NCCT, non-contrast-enhanced computed tomography; PE (+), pulmonary embolism positive.

The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This retrospective study was approved by the Ethics Committee of Tianjin Haihe Hospital (approval No. 2025HHKT-001, dated 17 March 2025), the lead institution of this multi-center collaborative research. All participating hospitals were informed and agreed to the study. The requirement for informed consent was waived due to the retrospective nature of the study using fully anonymized data.

NCCT and CTPA acquisition parameters

Patients were positioned supine with arms raised and centered on the table. All NCCT and CTPA scans were performed consecutively during the same imaging session to ensure anatomical consistency. First, a contrast agent skin test was performed using a small test dose of iodinated contrast medium; by the time of subsequent CTPA acquisition, this test dose had been excreted by the kidneys and did not interfere with NCCT image interpretation.

NCCT scans were performed after confirming no allergic reaction, during deep inhalation, covering the lung apices to the lung bases. Scan parameters included 120 kV tube voltage, automatic tube current regulation, and two reconstruction settings: 5 mm thickness/5 mm interval and 1 mm thickness/1 mm interval. Window settings were 1,200–1,500 Hounsfield units (HU)/−400 to −600 HU for lung windows and 350–450 HU/40–60 HU for mediastinal windows. Images were reconstructed using standard (mediastinal window) and lung algorithms (lung window).

Immediately after NCCT completion, CTPA scans were performed using the same equipment and coverage without repositioning the patient. Contrast agent (300–350 mgI/mL) was administered via a right antecubital vein cannula using a dual-syringe high-pressure injector. The dosage was calculated as 1.2–2 mL/kg body weight, injected at 3.0 mL/s, followed by a saline flush. Scans were performed during the arterial phase with a 25–35 s delay. The additional effective radiation dose from NCCT is typically 1.0–4.0 mSv, depending on whether low-dose or standard protocols are used based on patient body habitus and clinical indication.

Defining ground truth on CTPA

Images were preprocessed to reduce differences in image quality and noise between scans and to ensure that image features were computed on a consistent basis. First, all images were resampled to a voxel size of 1×1×1 mm³ using linear interpolation; second, a Gaussian filter was applied for denoising; and finally, the images were normalized using grayscale discretization.
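The final normalization step can be pictured as a fixed-bin-number discretization of CT intensities. The following is a minimal sketch only: the function name, the clipping window (−1,000 to 400 HU), and the number of bins (64) are hypothetical illustration values and are not parameters reported in this study. The resampling and Gaussian filtering steps are not shown.

```python
import numpy as np


def discretize_intensities(volume_hu, lower=-1000.0, upper=400.0, n_bins=64):
    """Map CT intensities (HU) to integer gray levels 0 .. n_bins - 1.

    lower, upper and n_bins are hypothetical illustration values,
    not settings taken from the study.
    """
    volume_hu = np.asarray(volume_hu, dtype=np.float64)
    # Clip to the chosen intensity window so extreme values do not dominate.
    clipped = np.clip(volume_hu, lower, upper)
    # Rescale to [0, 1], then split into n_bins equal-width bins.
    scaled = (clipped - lower) / (upper - lower)
    bins = (scaled * n_bins).astype(np.int64)
    # The upper edge (scaled == 1.0) belongs to the last bin.
    return np.minimum(bins, n_bins - 1)


example = np.array([-2000.0, -1000.0, -300.0, 400.0, 1000.0])
levels = discretize_intensities(example)
```

Applying the same window and bin count to both NCCT and CTPA volumes is what keeps the gray levels comparable between scans.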

Image data were reviewed by a radiology graduate student (holding a Master’s degree with 1.5 years of specialized radiology residency training) to confirm the presence of PE. A radiologist and a technologist systematically validated all CTPA images and categorized them. To assess the reproducibility of disease categorization, interobserver agreement between the two radiologists for classifying patients into the four disease categories (normal, inflammatory, cancer, and postoperative) was evaluated using Cohen’s kappa statistic. An imaging physician performed rigid alignment of image pairs using bone alignment, following standard clinical procedures.

From the four PE groups, 10 pairs of NCCT and CTPA images were randomly selected per group by stratified sampling (normal, inflammatory, cancer, and postoperative patients), totaling 40 patients (40 NCCT-CTPA image pairs, 80 images). Using TotalSegmentator (v2.7.0; https://totalsegmentator.com/) with the “lung vessels” and “heart” tasks, pulmonary arteries and their branches were segmented on the 80 images (28,29). Two radiologists reviewed and modified the segmentations. As it was challenging to distinguish heart chambers from the pulmonary artery, the entire heart region, excluding the aorta, was outlined.

Registration

Multiple registration algorithms are available. This study used three: the generic mode of the 3D Slicer (v5.6.2; https://www.slicer.org/) plugin SlicerElastix (v1.0, PerkLab, Queen’s University, Kingston, Ontario, Canada) (30,31), the SyN mode in ANTsPy (v0.5.4, University of Pennsylvania, Philadelphia, PA, USA) (26), and Demons registration in Python’s SimpleITK (v2.4.1; https://simpleitk.org/) (32-34). These algorithms are referred to as Elastix, ANTs, and Demons, respectively.

Registration parameters can affect algorithm accuracy. All three methods used officially recommended parameters. For Elastix, the parameter set was “Elastix”, and the preset was set to “generic (all)”. For ANTs, the transformation mode was set to “SyN” with GPU enabled. For Demons, the number of iterations was 200, the standard deviation was 1.0, and the smooth displacement field was “True”, with other settings default. Detailed parameters for all registration algorithms are provided in Table S1.

After registration, the segmentation labels of pulmonary arteries and branches from enhanced images were mapped to fixed images (NCCT) using the transformation matrix generated by registration. Nearest neighbor interpolation preserved binary label integrity.
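The reason nearest neighbor interpolation preserves label integrity can be illustrated with a small two-dimensional example. This is a simplified numpy sketch, not the SlicerElastix, ANTsPy, or SimpleITK resampling used in the study; the function name and the displacement-field convention (for each fixed-image pixel, the offset to the corresponding moving-image location) are assumptions made for illustration.

```python
import numpy as np


def warp_labels_nearest(labels, displacement):
    """Pull a 2D label map onto the fixed grid with nearest neighbor lookup.

    labels:       (H, W) integer label map defined on the moving image.
    displacement: (2, H, W) array; for each fixed-image pixel, the
                  (row, col) offset to its location in the moving image.
                  This convention is an assumption of the sketch.
    """
    h, w = labels.shape
    rows, cols = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Round the sampling position to the nearest moving-image pixel, so the
    # output can only contain label values that already exist in the input.
    src_r = np.rint(rows + displacement[0]).astype(int)
    src_c = np.rint(cols + displacement[1]).astype(int)
    inside = (src_r >= 0) & (src_r < h) & (src_c >= 0) & (src_c < w)
    warped = np.zeros_like(labels)  # positions outside the image stay 0
    warped[inside] = labels[src_r[inside], src_c[inside]]
    return warped


moving_labels = np.zeros((5, 5), dtype=np.uint8)
moving_labels[2, 2] = 1
field = np.zeros((2, 5, 5))
field[1] = -0.6  # sub-pixel shift along columns
warped_labels = warp_labels_nearest(moving_labels, field)
```

With linear interpolation, the same sub-pixel shift would produce fractional values between 0 and 1 that would then need thresholding; nearest neighbor lookup avoids this step.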

Performance evaluation

To assess registration performance, the Dice coefficient, IoU, HD95, and clinical physician scores were used to evaluate the agreement of the manually segmented pulmonary vessel labels between (I) the original (rigidly aligned) CTPA and NCCT images and (II) the registered CTPA images (labels mapped through the transformation fields) and NCCT images.

The Dice coefficient measures sample similarity, ranging from 0 to 1, according to the following formula:

$\mathrm{Dice} = \frac{2 \times |A \cap B|}{|A| + |B|}$ [1]

where A and B are the segmentation masks of NCCT and registered CTPA, and |A∩B| is the overlapping voxel count. A Dice coefficient of 1.0 indicates complete overlap, whereas 0 means no overlap.

IoU also assesses similarity by calculating the ratio of intersection to union of two sets, ranging from 0 to 1. Its formula is as follows:

$\mathrm{IoU} = \frac{|A \cap B|}{|A \cup B|}$ [2]

A value of 1.0 means complete overlap, whereas 0 indicates no overlap.
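Equations [1] and [2] can be computed directly from two binary masks. The sketch below is a minimal numpy illustration and is not the SimpleITK routine used in this study; the function name and the toy masks are hypothetical. Because both metrics depend only on |A∩B|, |A|, and |B|, each mask pair also satisfies Dice = 2·IoU/(1 + IoU).

```python
import numpy as np


def dice_and_iou(mask_a, mask_b):
    """Return (Dice, IoU) for two binary masks of identical shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    intersection = np.logical_and(a, b).sum()  # |A ∩ B|
    union = np.logical_or(a, b).sum()          # |A ∪ B|
    total = a.sum() + b.sum()                  # |A| + |B|
    if total == 0:
        raise ValueError("both masks are empty; Dice and IoU are undefined")
    return float(2.0 * intersection / total), float(intersection / union)


# Toy example: two 8-voxel slabs that share 4 voxels.
ncct_mask = np.zeros((4, 4), dtype=bool)
ncct_mask[0:2, :] = True
ctpa_mask = np.zeros((4, 4), dtype=bool)
ctpa_mask[1:3, :] = True
dice, iou = dice_and_iou(ncct_mask, ctpa_mask)
```

For the toy masks, the overlap is 4 voxels, the union is 12 voxels, and |A| + |B| is 16 voxels.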

HD measures boundary alignment. The HD from set A to B is defined as:

$\mathrm{HausdorffDistance}(A, B) = \max_{a \in A} \min_{b \in B} d(a, b)$ [3]

where a and b are points in sets A and B, and d(a, b) is the distance between them. This study used HD95, the 95th percentile of the boundary point distances, to reduce the impact of a small number of outliers. Smaller values indicate better boundary agreement.

These three coefficients were calculated using Python’s SimpleITK (v2.4.1) toolkit (31).
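For readers unfamiliar with HD95, the following brute-force numpy sketch shows one common convention: the larger of the two directed 95th percentile nearest-point distances. It is an assumption of this sketch that the inputs are boundary point coordinates expressed in millimeters; the study itself used SimpleITK, and conventions for HD95 differ slightly between toolkits, so values may not match exactly.

```python
import numpy as np


def directed_distances(points_a, points_b):
    """For each point in A, the Euclidean distance to its nearest point in B."""
    diff = points_a[:, None, :] - points_b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)


def hd95(points_a, points_b):
    """Symmetric 95th percentile Hausdorff distance between two point sets.

    Brute force (memory grows with len(A) * len(B)), so this is only
    suitable for small point sets.
    """
    points_a = np.asarray(points_a, dtype=np.float64)
    points_b = np.asarray(points_b, dtype=np.float64)
    d_ab = directed_distances(points_a, points_b)
    d_ba = directed_distances(points_b, points_a)
    return float(max(np.percentile(d_ab, 95), np.percentile(d_ba, 95)))


# Toy example: the second contour is the first one shifted by 3 mm along y.
contour_a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
contour_b = contour_a + np.array([0.0, 3.0, 0.0])
distance_mm = hd95(contour_a, contour_b)
```

Replacing the 95th percentile with the maximum recovers the classical Hausdorff distance of Eq. [3], taken in both directions.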

Two radiologists visually assessed registration performance using the following criteria:

1, Unacceptable for clinical use;
2, Acceptable but requiring major modifications for most structures;
3, Acceptable but needing major modifications for some structures;
4, Acceptable with minor modifications needed;
5, Directly acceptable for clinical use.

After independent scoring, discrepancies were resolved through discussion to reach a final score for each image pair. Inter-rater reliability for the clinical physician scores was assessed using Cohen’s kappa statistic based on the independent assessments of the two radiologists prior to consensus discussion.
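Cohen’s kappa compares the observed agreement between two raters with the agreement expected by chance. The sketch below computes the unweighted form with numpy; whether an unweighted or a weighted kappa was used for the ordinal 1–5 scores is not stated in this article, so the unweighted form, the function name, and the toy ratings are assumptions for illustration.

```python
import numpy as np


def cohen_kappa(rater_1, rater_2):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    r1 = np.asarray(rater_1)
    r2 = np.asarray(rater_2)
    if r1.shape != r2.shape:
        raise ValueError("both raters must score the same number of items")
    categories = np.union1d(r1, r2)
    p_observed = np.mean(r1 == r2)
    # Chance agreement: product of the raters' marginal proportions,
    # summed over all categories.
    p_expected = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in categories)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return float((p_observed - p_expected) / (1.0 - p_expected))


kappa_identical = cohen_kappa([1, 2, 3, 4, 5], [1, 2, 3, 4, 5])
kappa_chance = cohen_kappa([1, 1, 2, 2], [1, 2, 1, 2])
```

Identical ratings give a kappa of 1, whereas ratings that agree only as often as chance predicts give a kappa of 0.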

Statistical analysis

This study combined quantitative and qualitative methods to evaluate the performance of different registration algorithms. All statistical analyses were performed using jamovi (version 2.6.26.0; https://www.jamovi.org/) with a significance level of α=0.05 (two-tailed). Continuous variables (Dice, IoU, HD95) were presented as mean ± standard deviation, with box plots illustrating their distributions. Ordered categorical data (clinical physician scores) were described using the median and interquartile range (IQR) (P25, P75), supplemented by frequency distribution tables. Wilcoxon signed-rank tests were used to compare the best-performing registration method with each of the other two across the four indicators, and mean differences were calculated. Effect sizes were quantified using the rank biserial correlation; all pairwise comparisons yielded P<0.001 (Table 4).

Post-hoc power analysis was conducted using G*Power 3.1.9.7 (https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower) to evaluate the statistical power achieved with our validation sample size (n=40). Based on the observed Cohen’s d_z effect sizes from Wilcoxon signed-rank comparisons between registration methods, power curves were generated to determine the achieved power for detecting large effects (d_z >1.0) at an alpha level of 0.05 (two-tailed).

Results

Figure 2 shows an example of rigidly registered CTPA and NCCT images, segmented by TotalSegmentator and corrected by a radiologist. The green area indicates the NCCT image segment, and the yellow area shows the CTPA image segment. For better visibility, non-enhanced images are used as the background, displaying only segmentation contours. Numbers from top to bottom represent the Dice coefficient, IoU, and HD95 metrics. The three-dimensional reconstruction with smoothing is shown in Figure 3.

Figure 2 A sample from the normal imaging appearance group. The green areas are the ventricular and atrial regions finally determined by TotalSegmentator and clinicians. The yellow areas are the pulmonary artery branches of various levels finally determined. The top three metrics represent Dice coefficient, IoU, and HD95 values, respectively. HD95, 95th percentile Hausdorff distance; IoU, Intersection over Union.

Figure 3 3D reconstruction of the segmented regions in Figure 2 using the “volume rendering” function in 3D Slicer. The gray model represents the reconstructed entire heart and pulmonary vessels of various branches. 3D, three-dimensional.

Table 1 lists the four metrics after rigid registration of the enhanced images. Inflammatory cases (Dice ≈0.659, IoU ≈0.495) and post-surgical cases (Dice ≈0.647, IoU ≈0.481) had slightly higher volume overlap than normal (Dice ≈0.616, IoU ≈0.449) and cancer cases (Dice ≈0.606, IoU ≈0.434). Cancer cases had slightly better boundary alignment (HD95 ≈5.59 mm) than inflammatory (HD95 ≈5.97 mm), post-surgical (HD95 ≈6.02 mm), and normal cases (HD95 ≈6.08 mm). Overall, Dice values (0.632±0.07) were higher than IoU values (0.465±0.07). The median clinical scores for all four conditions were below 3.

Table 1

The testing results of different diseases after rigid registration

Disease category	Dice	IoU	HD95	Physician score
Total	0.632±0.07	0.465±0.07	5.91±2.34	1 (1, 2)
Normal	0.616±0.08	0.449±0.08	6.08±2.27	1 (1, 1)
Inflammation	0.659±0.07	0.495±0.07	5.97±3.56	2 (1, 2)
Cancer	0.606±0.06	0.434±0.06	5.59±1.68	1 (1, 1)
Postoperative	0.647±0.06	0.481±0.07	6.02±1.73	1 (1, 1)

Data are presented as mean ± standard deviation or median (interquartile range). HD95, 95th percentile Hausdorff distance; IoU, Intersection over Union.

Figure 4 displays an enhanced image after different registration methods, with NCCT as the background. Red indicates Elastix, blue ANTs, yellow Demons, and green NCCT segmentation. After rigid registration, Dice was 0.623, IoU 0.452, and HD95 5.451 mm. After Elastix, ANTs, and Demons registration, Dice coefficients were 0.776, 0.689, and 0.700; IoU values were 0.634, 0.526, and 0.538; and HD95 values were 4.33, 8.31, and 5.70 mm.

Figure 4 The same sample as in Figure 2. The green areas indicate the gold standard determined by TotalSegmentator and clinicians on non-contrast-enhanced images. The red, yellow, and blue areas represent the registration results of Elastix, ANTs, and Demons, respectively.

Table 2 shows the overall situation after different registration methods on rigidly registered images. Post-surgical cases had the highest volume overlap (Dice ≈0.767, IoU ≈0.627), followed by inflammatory (Dice ≈0.749, IoU ≈0.602) and cancer cases (Dice ≈0.747, IoU ≈0.601), with normal cases the lowest (Dice ≈0.727, IoU ≈0.576). After registration, post-surgical cases had the best boundary alignment (HD95 ≈5.9), whereas cancer cases had the highest HD95 (≈5.97). Clinical scores for all four conditions improved, with medians reaching 4, and post-surgical cases had a higher P25 of 4.

Table 2

The testing results of different diseases after different registration

Disease category	Dice	IoU	HD95	Physician score
Normal	0.727±0.072	0.576±0.089	5.93±2.22	4 (3, 4)
Inflammation	0.749±0.064	0.602±0.080	5.92±2.27	4 (3, 4)
Cancer	0.747±0.068	0.601±0.087	5.97±2.33	4 (3, 4)
Postoperative	0.767±0.068	0.627±0.088	5.90±2.32	4 (4, 4)

Data are presented as mean ± standard deviation or median (interquartile range). HD95, 95th percentile Hausdorff distance; IoU, Intersection over Union.

Table 3 presents the results of different registration methods. Compared to rigid registration, all three methods improved image registration. Elastix showed the best volume overlap (Dice ≈0.819, IoU ≈0.695) and boundary alignment (HD95 ≈3.42), achieving the highest clinical score [4 (IQR: 4, 5)]. ANTs had better volume overlap (Dice ≈0.732, IoU ≈0.578) than Demons but slightly worse boundary alignment (HD95 ≈7.89) than Demons (HD95 ≈6.49). ANTs and Demons had the same interquartile ranges for clinical scores, with Demons having a median of 4 and ANTs a median of 3.

Table 3

The testing results of different registration

Registration method	Dice	IoU	HD95	Physician score
Rigid	0.632±0.070	0.465±0.073	5.91±2.34	1 (1, 2)
Elastix	0.819±0.030	0.695±0.043	3.42±0.57	4 (4, 5)
ANTs	0.732±0.032	0.578±0.040	7.89±1.29	3 (3, 4)
Demons	0.692±0.062	0.532±0.071	6.49±1.68	4 (3, 4)

Data are presented as mean ± standard deviation or median (interquartile range). HD95, 95th percentile Hausdorff distance; IoU, Intersection over Union.

Violin plots, combining box plots and density distribution, illustrate data concentration and dispersion, showing the stability of registration effects across different methods. Figures 5-8 indicate that Elastix had median Dice and IoU values of 0.823 and 0.700, ANTs 0.732 and 0.577, and Demons 0.698 and 0.536. For HD95, Elastix’s median was 3.36, compared to 7.76 and 5.89 for the other methods. In clinical score box plots, Elastix’s median was 4.0, while the others were 3.0 and 3.5. In Dice and IoU violin plots, Elastix and Demons showed no outliers, whereas ANTs had some. In the HD95 violin plot, Elastix had one outlier at 4.74, ANTs one, and Demons four.

Figure 5 Violin plot of Dice coefficient values for different registration methods.

Figure 6 Violin plot of IoU values for different registration methods. IoU, Intersection over Union.

Figure 7 Violin plot of HD95 values for different registration methods. HD95, 95th percentile Hausdorff distance.

Figure 8 Violin plot of clinical score values for different registration methods.

Table 4 shows the Wilcoxon signed-rank test results. Since the data were non-normally distributed, paired Wilcoxon signed-rank tests were used. For Dice and IoU, Elastix vs. ANTs and Demons had W=820 (P<0.001) and rank biserial correlation =1. For HD95, Elastix vs. ANTs and Demons had W=1 and 0 (P<0.001), with rank biserial correlation =−0.998 and −1, indicating strong effects.

Table 4

The testing results of Wilcoxon signed rank test

Metric	Comparison	Statistic	P value	Mean difference	SE difference	Rank biserial effect size
Dice (Elastix)	Dice (ANTs)	820	<0.001	0.088	0.00351	1
Dice (Elastix)	Dice (Demons)	820	<0.001	0.1255	0.00656	1
IoU (Elastix)	IoU (ANTs)	820	<0.001	0.1185	0.00469	1
IoU (Elastix)	IoU (Demons)	820	<0.001	0.162	0.00705	1
HD95 (Elastix)	HD95 (ANTs)	1	<0.001	−4.4425	0.21259	−0.998
HD95 (Elastix)	HD95 (Demons)	0	<0.001	−2.77	0.23603	−1

HD95, 95th percentile Hausdorff distance; IoU, Intersection over Union; SE, standard error.
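The statistic of 820 in Table 4 can be understood from the structure of the signed-rank test: with 40 paired cases, the ranks 1 to 40 sum to 40×41/2 =820, so a positive-rank sum of 820 means that every paired difference favored Elastix. The sketch below assumes that the reported statistic is the sum of positive ranks (which is consistent with the values in Table 4) and uses the matched-pairs rank biserial correlation r = (W⁺ − W⁻)/(W⁺ + W⁻). The function name and the synthetic inputs are hypothetical; this is not the jamovi implementation.

```python
import numpy as np


def signed_rank_summary(x, y):
    """Return (W_plus, W_minus, rank-biserial r) for paired samples x and y."""
    d = np.asarray(x, dtype=np.float64) - np.asarray(y, dtype=np.float64)
    d = d[d != 0]  # zero differences carry no sign and are dropped
    if d.size == 0:
        raise ValueError("all paired differences are zero")
    abs_d = np.abs(d)
    # Rank the absolute differences from 1 .. n.
    order = np.argsort(abs_d, kind="mergesort")
    ranks = np.empty(d.size)
    ranks[order] = np.arange(1, d.size + 1)
    # Tied absolute differences share the average of their ranks.
    for value in np.unique(abs_d):
        tied = abs_d == value
        ranks[tied] = ranks[tied].mean()
    w_plus = float(ranks[d > 0].sum())
    w_minus = float(ranks[d < 0].sum())
    r = (w_plus - w_minus) / (w_plus + w_minus)
    return w_plus, w_minus, r


# Synthetic data: 40 pairs in which the first method is always higher.
method_a = np.arange(1, 41, dtype=np.float64)
method_b = np.zeros(40)
w_plus, w_minus, r_rb = signed_rank_summary(method_a, method_b)
```

When the first method is instead lower in every pair, as for HD95, the positive-rank sum is 0 and the rank biserial correlation is −1.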

Interobserver agreement analysis demonstrated high reliability for both disease categorization and clinical scoring. For classification of patients into the four disease categories, Cohen’s kappa was 0.927 [95% confidence interval (CI): 0.891–0.963, P<0.001], indicating almost perfect agreement between the two radiologists (Table 5).

Table 5

Interobserver agreement for disease category classification between radiologists

N	Raters	Kappa	Z	P value	95% CI lower	95% CI upper
324	2	0.927	25.6	<0.001	0.891	0.963

CI, confidence interval.

For the clinical physician scoring of registration quality, inter-rater reliability was also high, with a Cohen’s kappa of 0.867 (95% CI: 0.806–0.930, P<0.001) (Table 6). Following independent scoring, discrepancies between raters were resolved through discussion to reach a final consensus score for each image pair.

Table 6

Interobserver agreement for clinical physician scoring between radiologists

N	Raters	Kappa	Z	P value	95% CI lower	95% CI upper
40	2	0.867	20.4	<0.001	0.806	0.930

Clinical scores were obtained for 4 registration methods per patient (total 160 assessments). CI, confidence interval.

Post-hoc power analysis was conducted to validate the statistical adequacy of our validation sample size (n=40). Figure 9 illustrates the power curve for dependent means comparison (matched pairs) with a total sample size of 40, two-tailed α=0.05. The curve demonstrates that statistical power increases rapidly with effect size, reaching and maintaining 1.00 (100% power) when Cohen’s d_z exceeds approximately 0.9–1.0.

Figure 9 Power curve for dependent means comparison (matched pairs) demonstrating achieved statistical power with the validation sample size (n=40) at α=0.05 (two-tailed).

As detailed in Table 7, the absolute values of the observed Cohen’s d_z for comparisons between Elastix and the other registration methods ranged from 1.16 to 4.83 across all evaluation metrics. Specifically, for Dice coefficient comparisons, effect sizes were 3.29 (Elastix vs. Rigid), 3.93 (Elastix vs. ANTs), and 3.10 (Elastix vs. Demons). For IoU, values ranged from 3.69 to 3.95; for HD95, from −1.16 to −3.33; and for clinical scores, from 1.59 to 4.83.

Table 7

Post-hoc power analysis results (Cohen’s d_z values)

Metric	Elastix/Rigid	Elastix/ANTs	Elastix/Demons
Dice	3.29	3.93	3.10
IoU	3.87	3.95	3.69
HD95	−1.16	−3.33	−2.09
Clinic	4.83	2.21	1.59

Rows represent metrics, columns represent comparisons against baseline methods. HD95, 95th percentile Hausdorff distance; IoU, Intersection over Union.
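Cohen’s d_z for paired data is the mean of the paired differences divided by their standard deviation. As a rough consistency check, d_z can be back-calculated from the mean difference and SE in Table 4, assuming that the SE is the standard deviation of the paired differences divided by √n (n=40). The function name is hypothetical, and because the Table 4 inputs are rounded, the result is expected to be close to, but not identical with, the values in Table 7.

```python
import math


def cohens_dz_from_se(mean_diff, se_diff, n_pairs):
    """Back-calculate Cohen's d_z from a mean paired difference and its SE.

    Assumes se_diff = sd_diff / sqrt(n_pairs), which is an assumption
    about how the SE in Table 4 was obtained.
    """
    sd_diff = se_diff * math.sqrt(n_pairs)
    return mean_diff / sd_diff


# Dice, Elastix vs. ANTs (mean difference and SE taken from Table 4).
dz_dice_ants = cohens_dz_from_se(0.088, 0.00351, 40)
# HD95, Elastix vs. ANTs (negative because Elastix has the lower HD95).
dz_hd95_ants = cohens_dz_from_se(-4.4425, 0.21259, 40)
```

Both back-calculated values fall within about 0.05 of the corresponding Table 7 entries (3.93 and −3.33).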

All observed effect sizes exceeded the threshold of |d_z| =1.0, falling within the plateau region of the power curve where achieved power is approximately 1.00. Even for the smallest observed effect size (HD95 comparison between Elastix and rigid registration, d_z =−1.16), the statistical power exceeded 0.99. These results confirm that our sample of 40 validation cases provides sufficient statistical power to detect meaningful differences between registration algorithms with high confidence.

Table 8 lists the four metrics for different cases after Elastix registration. Post-surgical cases had the best volume overlap (Dice ≈0.837, IoU ≈0.721) and boundary alignment (HD95 ≈3.3). Cancer cases had slightly better volume overlap (Dice ≈0.821, IoU ≈0.698) and boundary alignment (HD95 ≈3.34) than inflammatory and normal cases. Inflammatory cases had slightly better volume overlap (Dice ≈0.811, IoU ≈0.683) than normal cases (Dice ≈0.807, IoU ≈0.677), with both having HD95 ≈3.51. All four conditions had clinical score medians ≥4, with post-surgical cases reaching 5 and interquartile ranges ≤1.

Table 8

The testing results of Elastix

Disease category	Dice	IoU	HD95	Physician score
Total	0.819±0.030	0.695±0.043	3.42±0.57	4 (4, 5)
Normal	0.807±0.032	0.677±0.045	3.51±0.53	4 (4, 5)
Inflammation	0.811±0.029	0.683±0.042	3.51±0.51	4 (4, 4)
Cancer	0.821±0.028	0.698±0.040	3.34±0.79	4 (4, 5)
Postoperative	0.837±0.027	0.721±0.039	3.30±0.46	5 (4, 5)

Data are presented as mean ± standard deviation or median (interquartile range). HD95, 95th percentile Hausdorff distance; IoU, Intersection over Union.

Interobserver agreement for disease classification between the two radiologists was excellent (κ=0.927, 95% CI: 0.891–0.963), indicating the high reproducibility of the disease classification system. Inter-rater reliability for clinical scoring was also high (κ=0.867, 95% CI: 0.806–0.93), suggesting that the clinical scoring system was robust and consistent across raters (Tables 5,6).

Discussion

This study explored the performance of different registration methods in registering pulmonary arteries and their branches in pre- and post-contrast chest CT images. Previous studies focused on lung, heart, and trachea registration but lacked specialized research on pulmonary vascular systems (26,35,36).

Elastix, ANTs, and Demons all improved volume overlap relative to rigid registration. Elastix achieved the highest Dice coefficient (0.819) and IoU (0.695) and the lowest HD95 (3.42 mm), with significant improvements in volume overlap and boundary alignment. Wilcoxon signed-rank tests showed that Elastix outperformed the other two methods. Elastix demonstrated stable performance across different cases: Dice coefficients were approximately 0.807 for normal cases, 0.811 for inflammatory, 0.821 for cancer, and 0.837 for post-surgical. Clinical scores for Elastix-registered images had a median of 4, indicating that minimal or no modifications were needed for clinical application. The inclusion of heart chambers might have interfered with registration results, suggesting that the actual registration accuracy of the pulmonary arteries and their branches could be higher.

In volume overlap analysis, Dice values were generally higher than IoU values. According to their definitions, when |A∩B| >0:

$\frac{\mathrm{Dice}}{\mathrm{IoU}} = \frac{2 |A \cap B| \, |A \cup B|}{(|A| + |B|) \, |A \cap B|} = \frac{2 |A \cup B|}{|A| + |B|} = \frac{2 |A \cup B|}{|A \cup B| + |A \cap B|}$ [4]

For registered images, |A∩B| ≠0 and |A∩B| ≤ |A∪B|, with equality only when A = B; therefore Dice/IoU ∈ [1,2), and Dice is strictly greater than IoU whenever A ≠ B.

Almost all registration methods, including rigid registration, performed better in post-surgical and inflammatory cases than in normal and cancer cases. This may be due to the smaller manually segmented vascular volumes in the former two groups. Post-surgical cases had partially removed vessels, and inflammatory cases had blurred lung-vessel boundaries, which may have led clinicians to omit parts of the pulmonary vessels. Consequently, compared with cancer and normal cases, post-surgical and inflammatory cases had smaller vascular volumes to be mapped through the transformation field. Moreover, post-surgical and inflammatory cases may have more feature points, such as surgical metal shadows and inflammatory lung consolidation, enhancing registration accuracy.

The frequency distributions showed that the distribution for Demons was similar to that for rigid registration, whereas the distributions for Elastix and ANTs were more concentrated. This suggests that Demons registration is more dependent on original image quality, whereas Elastix and ANTs are more stable across various scenarios. Rigid registration involved manual boundary alignment by clinicians, which may explain its superior HD95 performance over ANTs and Demons.

On a personal laptop computer [GeForce RTX 2060 (6GB), NVIDIA, Santa Clara, CA, USA], Elastix registration took about 1 min, Demons about 2 min 30 s, and ANTs about 3 min. Pre-cropping the images to the lung region can significantly improve registration speed and accuracy. Future large-cohort experiments that use registered transformation fields to generate gold standards on NCCT images should pre-crop the non-enhanced and enhanced images to the same volume before rigid registration to enhance efficiency and accuracy.
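The pre-cropping step suggested above could be sketched as follows. This is a minimal NumPy illustration under stated assumptions: a binary lung mask is already available for each volume, the helper names `bounding_box` and `crop_to_lungs` and the `margin` parameter are hypothetical, and the toy arrays stand in for real CT data. It is not the procedure used in the study.

```python
import numpy as np


def bounding_box(mask: np.ndarray, margin: int = 0) -> tuple:
    """Return per-axis slices enclosing all nonzero voxels, padded by `margin`."""
    coords = np.argwhere(mask)
    if coords.size == 0:
        raise ValueError("mask is empty")
    # Clip the padded box to the array bounds.
    lo = np.maximum(coords.min(axis=0) - margin, 0)
    hi = np.minimum(coords.max(axis=0) + 1 + margin, mask.shape)
    return tuple(slice(int(l), int(h)) for l, h in zip(lo, hi))


def crop_to_lungs(volume: np.ndarray, lung_mask: np.ndarray, margin: int = 5) -> np.ndarray:
    """Crop `volume` to the padded bounding box of `lung_mask`."""
    return volume[bounding_box(lung_mask, margin)]


# Toy volume and lung mask (hypothetical data, axes ordered z, y, x).
volume = np.random.default_rng(0).normal(size=(40, 64, 64))
lung_mask = np.zeros(volume.shape, dtype=bool)
lung_mask[10:30, 16:48, 8:56] = True

cropped = crop_to_lungs(volume, lung_mask, margin=5)
uncropped = crop_to_lungs(volume, lung_mask, margin=20)  # margin exceeds the borders
```

In a real pipeline the image origin would also have to be updated after cropping so that physical coordinates stay consistent between the NCCT and CTPA volumes. An image library that tracks this metadata would typically handle that step.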

This study might not entirely showcase each algorithm’s optimal performance. Although we tested multiple parameter combinations on a subset of images and ultimately chose the official recommended parameters, parameter selection can still influence algorithm performance, necessitating further research for comprehensive evaluation.

Limitations

This study selected three mature and representative registration methods: Elastix, ANTs, and Demons. However, many registration algorithms are available, so our conclusions may not identify the absolute best algorithm.

This multi-center study included data from three hospitals specializing in tumor, respiratory, and orthopedic diseases. Despite this multi-center design, the dataset was limited, with a high proportion of cancer patients. To enhance representativeness, cases were divided into four groups with stratified sampling, but this approach may not cover all disease types affecting pulmonary images. In addition, although our post-hoc power analysis confirms adequate statistical power for detecting differences between registration methods within the validation cohort (n=40), we acknowledge that the overall sample size remains relatively modest for developing robust AI-based detection systems. The current study serves as a preliminary methodological evaluation, and we are committed to conducting large-scale multi-center validation studies with substantially expanded patient cohorts to confirm these findings and enhance generalizability.

The gold standard for the pulmonary arteries and their branches was established via TotalSegmentator segmentation followed by revision by clinical physicians. However, terminal small vessels may be overlooked because their shape and CT attenuation (HU) values are similar to those of surrounding tissues. Thrombi, which lie within the pulmonary arteries, can theoretically be transferred via transformation fields if vascular registration is accurate. However, there is currently no direct evidence proving the alignment between registered gold standards and actual PE locations, and this requires further validation in future studies.
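The idea of transferring a thrombus label through a transformation field can be illustrated with a minimal NumPy sketch. Several assumptions apply: the field is a dense displacement field in voxel units on a shared grid, it follows a backward (pull) convention in which each output voxel p samples the label at p + u(p), and nearest-neighbour lookup is used so that the label stays binary. The function `warp_label_nearest` and the toy data are hypothetical; this is not the Elastix implementation, which operates in physical coordinates.

```python
import numpy as np


def warp_label_nearest(label: np.ndarray, displacement: np.ndarray) -> np.ndarray:
    """Resample an integer label map through a dense displacement field.

    label:        array of shape (Z, Y, X) holding integer labels.
    displacement: array of shape (3, Z, Y, X) in voxel units; output voxel p
                  takes the label found at p + displacement[:, p].
    Voxels whose source position falls outside the grid are set to 0.
    """
    if displacement.shape != (label.ndim,) + label.shape:
        raise ValueError("displacement must have shape (ndim, *label.shape)")
    grid = np.indices(label.shape)                   # (3, Z, Y, X) voxel coordinates
    src = np.rint(grid + displacement).astype(int)   # nearest source voxel
    inside = np.ones(label.shape, dtype=bool)
    for axis, size in enumerate(label.shape):
        inside &= (src[axis] >= 0) & (src[axis] < size)
        src[axis] = np.clip(src[axis], 0, size - 1)  # keep indices valid
    warped = label[tuple(src)]
    return np.where(inside, warped, 0)


# Toy thrombus label (hypothetical data): a 2x2x2 block of ones.
label = np.zeros((4, 6, 6), dtype=np.uint8)
label[1:3, 2:4, 2:4] = 1

# Uniform displacement of +1 voxel along the last axis (pull convention),
# so the block appears one voxel lower along that axis in the output.
displacement = np.zeros((3,) + label.shape)
displacement[2] = 1.0

warped = warp_label_nearest(label, displacement)
```

Nearest-neighbour sampling is chosen here because interpolating a binary label linearly would produce fractional values at the boundary. Whether a transferred label actually coincides with the thrombus on NCCT still depends on the accuracy of the underlying registration, as noted above.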

Conclusions

This study serves as a preliminary experiment aimed at developing an AI model for detecting PE in NCCT images. Among the evaluated registration methods, Elastix performed best in Dice coefficient, IoU, and HD95, with clinical physician scores between 4 and 5, indicating “clinically acceptable with minimal or no modifications needed”. Thus, Elastix-generated transformation fields appear suitable for transferring thrombus gold standards from contrast-enhanced CT to NCCT images. However, the reliability and validity of thrombus gold standards on NCCT require additional evaluation.

Acknowledgments

None.

Footnote

Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-aw-2373/dss

Funding: This study was supported by a grant from the Tianjin Municipal Education Commission (No. 2024ZXZD004).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-aw-2373/coif). All authors declared that this study was supported by a grant from the Tianjin Municipal Education Commission (No. 2024ZXZD004). The authors have no other conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. The study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. This retrospective study was approved by the Ethics Committee of Tianjin Haihe Hospital (approval No. 2025HHKT-001, dated March 17, 2025), the lead institution of this multi-center collaborative research. All participating hospitals were informed and agreed to the study. Informed consent was waived due to the retrospective nature of the study using fully anonymized data.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

1. Raskob GE, Angchaisuksiri P, Blanco AN, Buller H, Gallus A, Hunt BJ, Hylek EM, Kakkar A, Konstantinides SV, McCumber M, Ozaki Y, Wendelboe A, Weitz JI; ISTH Steering Committee for World Thrombosis Day. Thrombosis: a major contributor to global disease burden. Arterioscler Thromb Vasc Biol 2014;34:2363-71.
2. Freund Y, Cohen-Aubart F, Bloom B. Acute Pulmonary Embolism: A Review. JAMA 2022;328:1336-45.
3. Duffett L, Castellucci LA, Forgie MA. Pulmonary embolism: update on management and controversies. BMJ 2020;370:m2177.
4. Martinez Licha CR, McCurdy CM, Maldonado SM, Lee LS. Current Management of Acute Pulmonary Embolism. Ann Thorac Cardiovasc Surg 2020;26:65-71.
5. Maughan BC, Jarman AF, Redmond A, Geersing GJ, Kline JA. Pulmonary embolism. BMJ 2024;384:e071662.
6. Trott T, Bowman J. Diagnosis and Management of Pulmonary Embolism. Emerg Med Clin North Am 2022;40:565-81.
7. van Maanen R, Trinks-Roerdink EM, Rutten FH, Geersing GJ. A systematic review and meta-analysis of diagnostic delay in pulmonary embolism. Eur J Gen Pract 2022;28:165-72.
8. Cohen AT, Agnelli G, Anderson FA, Arcelus JI, Bergqvist D, Brecht JG, Greer IA, Heit JA, Hutchinson JL, Kakkar AK, Mottier D, Oger E, Samama MM, Spannagl M; VTE Impact Assessment Group in Europe (VITAE). Venous thromboembolism (VTE) in Europe. The number of VTE events and associated morbidity and mortality. Thromb Haemost 2007;98:756-64.
9. Konstantinides SV, Meyer G, Becattini C, Bueno H, Geersing GJ, Harjola VP, et al. 2019 ESC Guidelines for the diagnosis and management of acute pulmonary embolism developed in collaboration with the European Respiratory Society (ERS). Eur Heart J 2020;41:543-603.
10. Doganay S, Oguz AK, Ergun I. Increased risk of contrast-induced acute kidney injury in patients with pulmonary thromboembolism. Ren Fail 2015;37:1138-44.
11. Kline JA, Garrett JS, Sarmiento EJ, Strachan CC, Courtney DM. Over-Testing for Suspected Pulmonary Embolism in American Emergency Departments: The Continuing Epidemic. Circ Cardiovasc Qual Outcomes 2020;13:e005753.
12. Perera M, Aggarwal L, Scott IA, Cocks N. Underuse of risk assessment and overuse of computed tomography pulmonary angiography in patients with suspected pulmonary thromboembolism. Intern Med J 2017;47:1154-60.
13. Chean LN, Tan C, Hiskens MI, Rattenbury M, Sundaram P, Perara J, Smith K, Kumar P. Overuse of Computed Tomography Pulmonary Angiography and Low Utilization of Clinical Prediction Rules in Suspected Pulmonary Embolism Patients at a Regional Australian Hospital. Healthcare (Basel) 2024;12:278.
14. Raji H, JavadMoosavi SA, Dastoorpoor M, Mohamadipour Z, Mousavi Ghanavati SP. Overuse and underuse of pulmonary CT angiography in patients with suspected pulmonary embolism. Med J Islam Repub Iran 2018;32:3.
15. Ehsanbakhsh A, Hatami F, Valizadeh N, Khorashadizadeh N, Norouzirad F. Evaluating the Performance of Unenhanced Computed Tomography in the Diagnosis of Pulmonary Embolism. J Tehran Heart Cent 2021;16:156-61.
16. Chien CH, Shih FC, Chen CY, Chen CH, Wu WL, Mak CW. Unenhanced multidetector computed tomography findings in acute central pulmonary embolism. BMC Med Imaging 2019;19:65.
17. Tatco VR, Piedad HH. The validity of hyperdense lumen sign in non-contrast chest CT scans in the detection of pulmonary thromboembolism. Int J Cardiovasc Imaging 2011;27:433-40.
18. Guo R, Deng M, Xi L, Zhang S, Xu W, Liu M. Chest non contrasted computed tomography in detecting acute pulmonary thromboembolism: A single center retrospective study. Exp Ther Med 2024;28:304.
19. Hagen F, Vorberg L, Thamm F, Ditt H, Maier A, Brendel JM, Ghibes P, Bongers MN, Krumm P, Nikolaou K, Horger M. Improved detection of small pulmonary embolism on unenhanced computed tomography using an artificial intelligence-based algorithm - a single centre retrospective study. Int J Cardiovasc Imaging 2024;40:2293-304.
20. Gore JC. Artificial intelligence in medical imaging. Magn Reson Imaging 2020;68:A1-4.
21. Hosny A, Parmar C, Quackenbush J, Schwartz LH, Aerts HJWL. Artificial intelligence in radiology. Nat Rev Cancer 2018;18:500-10.
22. Li M, Jiang Y, Zhang Y, Zhu H. Medical image analysis using deep learning algorithms. Front Public Health 2023;11:1273253.
23. Thrall JH, Li X, Li Q, Cruz C, Do S, Dreyer K, Brink J. Artificial Intelligence and Machine Learning in Radiology: Opportunities, Challenges, Pitfalls, and Criteria for Success. J Am Coll Radiol 2018;15:504-8.
24. Yang R, Yu Y. Artificial Convolutional Neural Network in Object Detection and Semantic Segmentation for Medical Imaging Analysis. Front Oncol 2021;11:638182.
25. Keszei AP, Berkels B, Deserno TM. Survey of Non-Rigid Registration Tools in Medicine. J Digit Imaging 2017;30:102-16.
26. Murphy K, van Ginneken B, Reinhardt JM, Kabus S, Ding K, Deng X, et al. Evaluation of registration methods on thoracic CT: the EMPIRE10 challenge. IEEE Trans Med Imaging 2011;30:1901-20.
27. Zou J, Gao B, Song Y, Qin J. A review of deep learning-based deformable medical image registration. Front Oncol 2022;12:1047215.
28. Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nat Methods 2021;18:203-11.
29. Wasserthal J, Breit HC, Meyer MT, Pradella M, Hinck D, Sauter AW, Heye T, Boll DT, Cyriac J, Yang S, Bach M, Segeroth M. TotalSegmentator: Robust Segmentation of 104 Anatomic Structures in CT Images. Radiol Artif Intell 2023;5:e230024.
30. Klein S, Staring M, Murphy K, Viergever MA, Pluim JP. elastix: a toolbox for intensity-based medical image registration. IEEE Trans Med Imaging 2010;29:196-205.
31. Shamonin DP, Bron EE, Lelieveldt BP, Smits M, Klein S, Staring M; Alzheimer's Disease Neuroimaging Initiative. Fast parallel image registration on CPU and GPU for diagnostic classification of Alzheimer's disease. Front Neuroinform 2014;7:50.
32. Beare R, Lowekamp B, Yaniv Z. Image Segmentation, Registration and Characterization in R with SimpleITK. J Stat Softw 2018;86:8.
33. Lowekamp BC, Chen DT, Ibáñez L, Blezek D. The Design of SimpleITK. Front Neuroinform 2013;7:45.
34. Yaniv Z, Lowekamp BC, Johnson HJ, Beare R. SimpleITK Image-Analysis Notebooks: a Collaborative Environment for Education and Reproducible Research. J Digit Imaging 2018;31:290-303.
35. Kadoya N, Fujita Y, Katsuta Y, Dobashi S, Takeda K, Kishi K, Kubozono M, Umezawa R, Sugawara T, Matsushita H, Jingu K. Evaluation of various deformable image registration algorithms for thoracic images. J Radiat Res 2014;55:175-82.
36. Nielsen MS, Østergaard LR, Carl J. A new method to validate thoracic CT-CT deformable image registration using auto-segmented 3D anatomical landmarks. Acta Oncol 2015;54:1515-20.

Cite this article as: Gong X, Jia B, Zhang Y, Tan B, He A, Ye Z, Xing Z. Transferring thrombus labels from CTPA to non-contrast CT via accurate pulmonary vessel registration: a critical step toward AI-based pulmonary embolism detection. Quant Imaging Med Surg 2026;16(5):365. doi: 10.21037/qims-2025-aw-2373
