Synthesizing cone-beam projections in radiotherapy using deep learning network for patients with head and neck cancer

Yuhan Fan; Peng Huang; Jiawen Shang; Zhixing Chang; Zhihui Hu; Ke Zhang; Xin Xie; Zhiqiang Liu; Hui Yan

doi:10.21037/qims-2025-1-2731

Original Article

Yuhan Fan¹#, Peng Huang¹#, Jiawen Shang²#, Zhixing Chang¹, Zhihui Hu¹, Ke Zhang¹, Xin Xie³, Zhiqiang Liu¹, Hui Yan¹

¹Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China; ²State Key Laboratory of Cardiovascular Disease, Fuwai Hospital, National Center for Cardiovascular Disease, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China; ³Department of Radiation Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, Fuzhou, China

Contributions: (I) Conception and design: Y Fan, P Huang, J Shang; (II) Administrative support: X Xie, Z Liu, H Yan; (III) Provision of study materials or patients: Z Hu, K Zhang; (IV) Collection and assembly of data: Y Fan, P Huang, J Shang; (V) Data analysis and interpretation: Y Fan, P Huang, J Shang; (VI) Manuscript writing: All authors; (VII) Final approval of manuscript: All authors.

#These authors contributed equally to this work as co-first authors.

Correspondence to: Xin Xie, PhD. Department of Radiation Oncology, Clinical Oncology School of Fujian Medical University, Fujian Cancer Hospital, No. 456 Fuma Rd., Fuzhou 350011, China. Email: xiexin_crayon@163.com; Zhiqiang Liu, PhD; Hui Yan, PhD. Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, No. 17 Panjiayuannanli, Beijing 100021, China. Email: zhiqiang.liu@cicams.ac.cn; hui.yan@cicams.ac.cn.

Background: Initially, cone-beam (CB) projection was developed to serve as a two-dimensional (2D) radiograph for fast target localization in radiotherapy. Later, it was gradually replaced by three-dimensional (3D) cone-beam computed tomography (CBCT), which is reconstructed from multiple CB projections. Since the size of CB projections is relatively large, they are usually discarded to conserve clinical storage. To effectively regenerate these projections, a deep learning (DL) method was developed to synthesize CB projections from CBCT.

Methods: CB projection images from 50 patients under image-guided radiotherapy for head and neck cancer were collected. First, the digitally reconstructed radiograph (DRR) was created via a ray-tracing-based algorithm. Next, a DL network was built to learn the pixel-to-pixel correspondence between the DRRs and CB projections in the training set. Finally, the CB projections were synthesized from DRRs in the testing set by the trained DL model. Three DL networks, Attention-UNet, Residual AutoEncoder, and Pix2Pix, were examined. Ablation studies on the effects of scatter correction, CBCT resolution, and training sample size were conducted. Non-DL and DL methods were compared, and a clinically relevant downstream task was conducted. Three similarity metrics, peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), and video quality metric (VQM), were used to evaluate model performance.

Results: Among the three DL models, Attention-UNet achieved the best performance in terms of PSNR (29.71), SSIM (0.97), and VQM (0.20). The other two models showed comparable performance with PSNR, SSIM, and VQM values of 26.55, 0.96, and 0.27, respectively, for Residual AutoEncoder, and of 27.22, 0.94, and 0.25, respectively, for Pix2Pix. With scatter correction, the PSNR, SSIM, and VQM for Attention-UNet improved by 21.3%, 2.1%, and 39.3%, respectively. Under the finest voxel size (0.5×0.5×2.0 mm³), Attention-UNet achieved the highest PSNR (29.71±3.62), the highest SSIM (0.97±0.008), and the lowest VQM (0.20±0.091). Scatter correction and higher CBCT resolution effectively improved the prediction accuracy of these DL models.

Conclusions: The DL-based method can effectively synthesize CB projections from CBCT for patients with head and neck cancer. It may thus serve as a means of saving storage space for clinical CB projections.

Keywords: Radiotherapy; cone-beam computed tomography (CBCT); deep learning (DL); digitally reconstructed radiography

Submitted Dec 17, 2025. Accepted for publication Apr 20, 2026. Published online Jun 13, 2026.

Introduction

Cone-beam computed tomography (CBCT) is a routine imaging procedure for the positioning and target localization of cancer patients during radiotherapy (1-3). Traditional computed tomography (CT) produces a slice image in a single rotation with a fan-shaped X-ray beam, while CBCT produces a volume image in a single rotation with a cone-shaped X-ray beam. Compared with CT, CBCT provides several benefits for fast patient imaging, including less exposure, quick scanning, and convenient imaging geometry. Besides its applications in radiotherapy, CBCT is widely used for three-dimensional (3D) visualization and implant planning in dentistry (4-6), diagnosis and biopsy guidance in thoracic medicine (7,8), and defect detection in industrial processes (9,10).

In radiotherapy, a single 2D image or a pair of orthogonal 2D images [cone-beam (CB) projections] can be acquired with an electronic portal imaging device for on-board patient positioning. With advanced imaging techniques, a series of CB projections can be acquired in a minute and used to reconstruct 3D CBCT. The invention of CBCT provided immediate benefits in the field of target localization and object tracking, especially in radiotherapy (11-13). However, with the widespread application of CBCT, storage has become an issue, especially for the raw CB projections. To maintain normal clinical operation, these projections need to be routinely backed up, but they are frequently discarded due to insufficient storage space.

Similar to most medical images, CBCT data are compressed and then transferred to hospital information systems. The compression algorithms for this process are either lossless or lossy, depending on the requirements of the application (14-17). The selection of lossy or lossless algorithms for medical image compression has long been discussed. Several professional organizations have issued guidelines and standards for the use of compression in medical imaging applications. It is generally acknowledged that lossy compression algorithms can be used for a given image modality with no significant compromise to the clinical objective (18).

As stated above, CBCT data are backed up routinely, but the related CB projections are typically discarded due to their lower clinical value relative to their larger size. For example, a CB projection set containing 360–660 images (512×512 or 1,024×768 in 32-bit float) may have a corresponding CBCT dataset containing 50–100 images (256×256 or 512×512 in 16-bit integer). The size of CB projections is at least 10 times larger than that of CBCT data (19-21). Given the detailed anatomical information already contained in CBCT data, storing CB projections has lower priority. However, with the increasing demand of medical research in radiotherapy, retrospective studies are increasingly demonstrating the value of CB projections for tasks such as retrospective geometric verification (22), sparse-view/low-dose CT (23), and synthetic data augmentation (24). Therefore, there is a need to develop procedures that can effectively compress and restore CB projections.
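The size comparison above can be checked with a short back-of-the-envelope calculation. The sketch below uses only the image counts, matrix sizes, and bit depths quoted in the preceding paragraph; pairing the smaller projection format with the smaller CBCT format (and the larger with the larger) is an assumption made for illustration, and file headers and compression are ignored.

```python
# Rough, uncompressed storage estimate for one scan.
# Counts, matrix sizes and bit depths come from the text; the pairing of
# "small" with "small" and "large" with "large" is an illustrative assumption.

def dataset_bytes(n_images: int, rows: int, cols: int, bytes_per_pixel: int) -> int:
    """Uncompressed size of an image stack in bytes (headers ignored)."""
    return n_images * rows * cols * bytes_per_pixel

FLOAT32_BYTES = 4  # 32-bit float CB projections
INT16_BYTES = 2    # 16-bit integer CBCT slices

# Each entry: (projection stack in bytes, CBCT volume in bytes)
configs = {
    "small": (dataset_bytes(360, 512, 512, FLOAT32_BYTES),
              dataset_bytes(50, 256, 256, INT16_BYTES)),
    "large": (dataset_bytes(660, 1024, 768, FLOAT32_BYTES),
              dataset_bytes(100, 512, 512, INT16_BYTES)),
}

for name, (proj, cbct) in configs.items():
    print(f"{name}: projections {proj / 2**20:.0f} MiB, "
          f"CBCT {cbct / 2**20:.2f} MiB, ratio {proj / cbct:.1f}x")
```

Under these pairings the projection stack comes out several tens of times larger than the CBCT volume, which is consistent with the "at least 10 times" statement.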

A number of solutions for the high-performance compression of medical images have been proposed, including static image compression (25), dynamic image compression (26), and deep learning (DL)-based image compression algorithms (27,28). However, few studies have investigated the feasibility of restoring CB projections from 3D CBCT, which would effectively reduce the need to store CB projections. In our study, DL networks were employed to synthesize CB projections from CBCT. Considering the complexity of patient motion during imaging, we examined only cases of head and neck cancer, as these involve minimal target motion. The remainder of this paper is organized as follows. In the “Methods” section, the reconstruction processes of CBCT and digitally reconstructed radiographs (DRRs) are briefly introduced. Subsequently, the architectures of three DL networks for projection synthesis are described. Finally, the strengths and limitations of the proposed method are outlined, and the trajectory of future work is discussed. We present this article in accordance with the TRIPOD+AI reporting checklist (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1-2731/rc).

Methods

Data

The proposed method was evaluated on a clinical dataset consisting of CB projection images collected from 50 patients who received radical treatment under image-guided radiotherapy for head and neck cancer (age range, 34–60 years) at the Cancer Hospital of the Chinese Academy of Medical Sciences. All images were acquired under the same clinical imaging protocol. For each patient, about 10–20 CBCT scans were acquired over a period of 3–5 weeks. In each scan, 300–600 CB projections were acquired with an even angular spacing of 0.67°. The CB projections were obtained by an on-board imaging system of a linear accelerator (Varian Medical Systems, Palo Alto, CA, USA) under the following acquisition parameters: 100 kV and 10 mAs. The source-to-isocenter distance was 1,000 mm, while the isocenter-to-imager distance was 500 mm. The dimensions of each projection image were 512×384 pixels, with a pixel size of 0.77 mm at the isocenter plane.
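As a quick consistency check on the acquisition geometry, the magnification and the field of view implied by these numbers can be derived as follows. This is a minimal sketch using only the distances and pixel size stated above; treating 512 as the number of columns and 384 as the number of rows is an assumption, and the variable names are illustrative.

```python
# Geometry values quoted in the acquisition description.
SOURCE_TO_ISOCENTER_MM = 1000.0
ISOCENTER_TO_IMAGER_MM = 500.0
PIXEL_AT_ISOCENTER_MM = 0.77
N_COLS, N_ROWS = 512, 384  # assumed to be width x height

# Source-to-imager distance and geometric magnification of the isocenter plane.
source_to_imager_mm = SOURCE_TO_ISOCENTER_MM + ISOCENTER_TO_IMAGER_MM
magnification = source_to_imager_mm / SOURCE_TO_ISOCENTER_MM

# Pixel pitch scaled from the isocenter plane to the imager plane.
pixel_at_imager_mm = PIXEL_AT_ISOCENTER_MM * magnification

# Extent covered by one projection, measured at the isocenter plane.
fov_at_isocenter_mm = (N_COLS * PIXEL_AT_ISOCENTER_MM,
                       N_ROWS * PIXEL_AT_ISOCENTER_MM)

print(f"magnification: {magnification:.2f}")
print(f"pixel pitch at imager: {pixel_at_imager_mm:.3f} mm")
print(f"field of view at isocenter: "
      f"{fov_at_isocenter_mm[0]:.1f} x {fov_at_isocenter_mm[1]:.1f} mm")
```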

This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments and approved by the Ethics Committee of the National Cancer Center/Cancer Hospital of the Chinese Academy of Medical Sciences and Peking Union Medical College (approval No. NCC2018-016). The requirement for informed consent was waived due to the retrospective nature of the analysis.

The clinical CBCT data were reconstructed according to the image resolution specified by the given clinical protocol. To mimic CBCT in different image resolutions, an open-source reconstruction toolkit (version 2.7.0; github.com/RTKConsortium/RTK) was employed to reconstruct CBCT data from CB projection images. The CBCT data were reconstructed with the Feldkamp-Davis-Kress (FDK) algorithm, which incorporated Parker weighting for short-scan redundancy handling, frequency-domain apodization with a Hann window, and detector displacement correction. The voxel spacing and volume dimensions were specified in the reconstruction parameters. The geometry of the CB projections was obtained from the vendor-specific configuration file stored in the folder of each patient scan.
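To illustrate what frequency-domain apodization with a Hann window does in the filtering step of FDK, the sketch below builds the frequency response of a ramp filter multiplied by a Hann window. It is a simplified, stand-alone illustration and not the toolkit's implementation; the function name, the `cutoff` parameter, and the normalization of frequencies to the Nyquist frequency are assumptions made here.

```python
import math

def hann_ramp_filter(n_bins: int, cutoff: float = 1.0) -> list[float]:
    """Frequency response of a ramp filter apodized by a Hann window.

    Bin k follows the usual DFT ordering (non-negative frequencies first,
    then negative ones). Frequencies are normalized so that Nyquist = 1.
    `cutoff` is the fraction of Nyquist at which the window reaches zero.
    """
    if n_bins <= 0 or not 0.0 < cutoff <= 1.0:
        raise ValueError("n_bins must be positive and 0 < cutoff <= 1")
    response = []
    for k in range(n_bins):
        # Signed DFT frequency in cycles/sample, then normalized by Nyquist (0.5).
        signed_bin = k if k <= n_bins // 2 else k - n_bins
        freq = abs(signed_bin / n_bins / 0.5)
        if freq >= cutoff:
            window = 0.0  # fully suppressed beyond the cutoff
        else:
            window = 0.5 * (1.0 + math.cos(math.pi * freq / cutoff))
        response.append(freq * window)  # ramp |f| times Hann window
    return response

if __name__ == "__main__":
    for k, value in enumerate(hann_ramp_filter(8)):
        print(k, round(value, 4))
```

The plain ramp amplifies high-frequency noise in each projection row; the Hann window rolls the response off smoothly toward the cutoff, trading some spatial resolution for lower noise in the reconstructed volume.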

DRRs

A DRR is a 2D image computed from a 3D volumetric image through the simulation of the physical interactions produced by X-rays traversing the imaging space (29). Leveraging ray-tracing algorithms, DRRs replicate clinical X-ray geometry (source-to-detector distance, gantry angle, and detector pixel resolution) to facilitate calculation of the cumulative attenuation of virtual X-ray beams through the volume. DRRs are important for image registration with real X-ray projections during target verification in radiotherapy. Modern implementations include GPU-accelerated ray casting and differentiable rendering to optimize end-to-end workflows, enabling seamless integration with neural networks for enhanced prediction accuracy.

In our study, DRRs were generated from 3D CBCT via the fast algorithm of Joseph’s forward projection (30). The physical processes that generate a projection were divided into primary and secondary effects (29). The primary process consisted of assessing the attenuation of primary photons in the object. The secondary process accounted for correction factors, such as scatter. Scatter consists of photons that are deflected by the object but still reach the detector. The entire process was implemented with open-source, cross-platform, reconstruction toolkit software for fast CBCT and DRR reconstruction (version 2.7.0; github.com/RTKConsortium/RTK).
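The idea behind Joseph's forward projection can be shown in two dimensions: the ray is sampled once per pixel along its dominant axis, the image is linearly interpolated along the other axis, and the sum is scaled by the path length per step. The following is a minimal 2D sketch for a single ray, written for illustration only; it is not the toolkit's 3D cone-beam implementation, and the function name and coordinate convention (pixel centers at integer indices, x = column, y = row) are assumptions.

```python
import math

def joseph_line_integral(image, p0, p1, pixel_size=1.0):
    """Approximate the line integral of `image` along the segment p0 -> p1.

    image: list of rows, image[row][col]; p0, p1: (x, y) in pixel-index units.
    """
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    if dx == 0 and dy == 0:
        return 0.0
    n_rows, n_cols = len(image), len(image[0])

    # Step along the axis in which the ray travels furthest (major axis u)
    # and interpolate along the other one (minor axis v).
    x_major = abs(dx) >= abs(dy)
    if x_major:
        u0, u1, v0, slope, n_u, n_v = x0, x1, y0, dy / dx, n_cols, n_rows
    else:
        u0, u1, v0, slope, n_u, n_v = y0, y1, x0, dx / dy, n_rows, n_cols

    first = max(0, math.ceil(min(u0, u1)))
    last = min(n_u - 1, math.floor(max(u0, u1)))
    total = 0.0
    for u in range(first, last + 1):
        v = v0 + slope * (u - u0)      # ray position on the minor axis
        j = math.floor(v)
        w = v - j                      # linear interpolation weight
        for jj, weight in ((j, 1.0 - w), (j + 1, w)):
            if weight > 0 and 0 <= jj < n_v:
                total += weight * (image[jj][u] if x_major else image[u][jj])

    # Path length travelled per unit step along the major axis.
    step = pixel_size * math.sqrt(1.0 + slope * slope)
    return total * step

if __name__ == "__main__":
    phantom = [[1.0] * 4 for _ in range(4)]  # uniform 4 x 4 attenuation map
    print(joseph_line_integral(phantom, (-1, 1.5), (5, 1.5)))
```

In a DRR, one such integral of the attenuation coefficient is evaluated for every detector pixel along the ray from the source, and the primary intensity follows from the exponential attenuation law applied to that integral.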

DL models

Three DL models, Attention-UNet, Residual AutoEncoder, and Pix2Pix, were investigated in this study. Attention-UNet is a convolutional neural network initially designed for biomedical image segmentation and consists of a symmetric encoder-decoder pathway with skip connections bridging corresponding hierarchical layers (31). The Residual AutoEncoder is a deep neural network initially developed for unsupervised feature learning and comprises two symmetric components: an encoder and a decoder (32). Pix2Pix was specifically designed for image-to-image translation tasks and employs a generative adversarial network (GAN) architecture for image prediction applications (33).

All models used a custom data loader to generate realistic projections from DRRs. The training included mean squared error (MSE) loss for pixel-wise error measurement, while Pix2Pix integrated the VGG19 network to compute perceptual loss, ensuring both perceptual and pixel-level quality (33). During multiscale training, logs were recorded, and generated images and model weights were saved periodically to ensure stable and traceable training. The learning rate was set to 1e−5 for Attention-UNet and to 1e−4 for Residual AutoEncoder and Pix2Pix. The models were trained with a batch size of 10 for 100 epochs. Data augmentation was applied during training, in which identical random transformations were applied to each input-target pair. For Residual AutoEncoder, Gaussian noise (factor =0.1) was added to the input images. The entire workflow applied the Adam optimizer for parameter updates, with continuous comparison between high-resolution and generated images being used to ensure progressive optimization and high-quality results. All networks were trained on a GeForce RTX 4080 SUPER (Nvidia, Santa Clara, CA, USA).
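The requirement that identical random transformations be applied to each input-target pair can be sketched as follows. This is an illustrative example rather than the data loader used in the study: the choice of horizontal and vertical flips, the interpretation of the noise factor as the standard deviation of additive Gaussian noise, and all function names are assumptions.

```python
import random

def paired_random_flip(drr, projection, rng):
    """Draw the flips once and apply the same ones to the input and the target."""
    flip_horizontal = rng.random() < 0.5
    flip_vertical = rng.random() < 0.5

    def apply(image):
        if flip_horizontal:
            image = [row[::-1] for row in image]
        if flip_vertical:
            image = image[::-1]
        return [list(row) for row in image]  # always return a fresh copy

    return apply(drr), apply(projection)

def add_gaussian_noise(image, factor, rng):
    """Additive zero-mean Gaussian noise scaled by `factor` (assumed meaning)."""
    return [[pixel + factor * rng.gauss(0.0, 1.0) for pixel in row] for row in image]

def mse(pred, target):
    """Mean squared error between two images of the same shape."""
    squared = [(p - t) ** 2
               for pred_row, target_row in zip(pred, target)
               for p, t in zip(pred_row, target_row)]
    return sum(squared) / len(squared)

if __name__ == "__main__":
    rng = random.Random(0)
    drr = [[0.1, 0.2], [0.3, 0.4]]
    cb_projection = [[0.15, 0.25], [0.35, 0.45]]
    aug_drr, aug_proj = paired_random_flip(drr, cb_projection, rng)
    noisy_input = add_gaussian_noise(aug_drr, 0.1, rng)
    print(mse(noisy_input, aug_proj))
```

Drawing the random decisions once per pair keeps the DRR and its CB projection pixel-aligned, which a pixel-wise loss such as MSE relies on.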

Evaluation

The workflow of image synthesis is shown in Figure 1. CBCT data were first reconstructed in different resolutions via the FDK algorithm. The DRRs were then created with Joseph’s forward projection algorithm, and scatter correction was applied to the DRRs in postprocessing. A scatter-only projection was first estimated via Monte Carlo simulation. The estimated scatter was then mapped to the real imaging condition with the measured flood projection and output factor and then added to the raw DRRs (29). All DRRs and CB projections were normalized to the range of 0–1 using the global minimum and maximum pixel intensities computed over the entire dataset. The corrected DRRs and corresponding original CB projections constituted the dataset for DL network training and evaluation. The patient data were randomly divided at the patient level into three independent subsets: training (30 patients), validation (10 patients), and testing (10 patients) (random seed =42). Each patient was assigned to only one subset. Cross-validation was not performed due to the computational cost and fold-dependent tuning effect. In this study, raw DRRs, histogram-matched DRRs, and Attention-UNet results, all based on DRRs with scatter correction, were evaluated against the ground truth to compare non-DL and DL models. Subsequently, three DL models (Attention-UNet, Residual AutoEncoder, and Pix2Pix) were compared. Finally, ablation studies were conducted to investigate the effects of scatter correction, CBCT resolution, and sample size. To assess the significance of performance differences across the DL models, two-tailed paired t-tests were conducted for pairwise comparisons of the peak signal-to-noise ratio (PSNR) (34), structural similarity index measure (SSIM) (35), and video quality metric (VQM; with lower VQM indicating better quality).

To evaluate a clinically relevant downstream task, CBCT images were reconstructed from the synthesized projections of the three models and compared with those reconstructed from the original CB projections. The same reconstruction workflow was applied to all datasets, with a voxel size of 0.5×0.5×2.0 mm³. Hounsfield unit (HU) accuracy was calculated as the mean absolute error (MAE) between the CBCT images reconstructed from raw CB projections and the CBCT images reconstructed from synthesized CB projections, evaluated within a body structure contracted by 15 mm from the external contour.
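Three of the evaluation steps described above (the patient-level split, global min-max normalization, and PSNR on the normalized images) can be sketched in a few lines. The subset sizes and the random seed follow the text; how the shuffle was actually performed in the study is not stated, so the use of Python's `random` module, the function names, and the patient identifiers are assumptions.

```python
import math
import random

def split_patients(patient_ids, n_train=30, n_val=10, n_test=10, seed=42):
    """Shuffle patients once and cut into disjoint train/validation/test lists."""
    ids = sorted(patient_ids)
    if len(ids) < n_train + n_val + n_test:
        raise ValueError("not enough patients for the requested split")
    random.Random(seed).shuffle(ids)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

def global_min_max(images):
    """Minimum and maximum pixel value over every image in the dataset."""
    values = [pixel for image in images for row in image for pixel in row]
    return min(values), max(values)

def normalize(image, lo, hi):
    """Map pixel values to 0-1 using dataset-wide (not per-image) limits."""
    scale = hi - lo
    if scale == 0:
        raise ValueError("dataset has constant intensity")
    return [[(pixel - lo) / scale for pixel in row] for row in image]

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to `data_range`."""
    squared = [(p - t) ** 2
               for pred_row, target_row in zip(pred, target)
               for p, t in zip(pred_row, target_row)]
    mean_squared_error = sum(squared) / len(squared)
    if mean_squared_error == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mean_squared_error)

if __name__ == "__main__":
    train, val, test = split_patients(range(1, 51))
    print(len(train), len(val), len(test))
    print(round(psnr([[0.5, 0.5]], [[0.6, 0.6]]), 2))
```

Splitting by patient rather than by projection keeps all scans of one patient in a single subset, so the test metrics are not inflated by near-duplicate anatomy seen during training.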

Figure 1 The workflow of cone-beam projection synthesis. The dashed line between the corrected DRRs and the cone-beam projections represents the pairing relationship. CB, cone-beam; CBCT, cone-beam computed tomography; DL, deep learning; DRRs, digitally reconstructed radiographs.

Results

Comparison with traditional methods

The performance of the non-DL baselines and the DL model (Attention-UNet) is summarized in Table 1. Histogram matching improved the similarity between DRRs and CB projections compared with the raw DRRs, as reflected by the higher PSNR and SSIM values and a lower VQM. However, the Attention-UNet model consistently achieved the best performance across all evaluation metrics. Specifically, it produced the highest PSNR and SSIM values and the lowest VQM, indicating improved structural similarity and perceptual image quality.

Table 1

Comparison between non-deep learning baselines and a deep learning model

Method	PSNR↑	SSIM↑	VQM↓
Raw DRR	11.09±1.04	0.79±0.033	0.79±0.027
Histogram-matched DRR	18.81±2.44	0.87±0.036	0.53±0.089
Deep learning model (Attention-UNet)	29.71±3.62	0.97±0.008	0.20±0.091

Data are presented as mean ± standard deviation. ↑, larger is better; ↓, smaller is better. DRR, digitally reconstructed radiograph; PSNR, peak signal-to-noise ratio; SSIM, structural similarity index measure; VQM, video quality metric.

The effect of scatter correction

The performance of the three DL models with respect to the scatter correction is summarized in Table 2. When the DL models were trained on DRRs with scatter correction, the results of the three metrics showed consistent improvement. The PSNR values for Attention-UNet, Residual AutoEncoder, and Pix2Pix were 29.71±3.62, 26.55±2.53, and 27.22±2.72, respectively; the SSIM values were 0.97±0.008, 0.96±0.007, and 0.94±0.014, respectively; and the VQM values were 0.20±0.091, 0.27±0.075, and 0.25±0.084, respectively. Of these models, Attention-UNet showed the largest improvement in PSNR, SSIM, and VQM with scatter correction. After Bonferroni correction within each metric, Attention-UNet remained significantly superior to Pix2Pix in PSNR (raw P=0.0066; adjusted P=0.0198), SSIM (raw P=0.0003; adjusted P=0.0009), and VQM (raw P=0.0027; adjusted P=0.0081). Compared with Residual AutoEncoder, Attention-UNet showed a significant improvement in SSIM (raw P=0.0036; adjusted P=0.0108), whereas the differences in PSNR (raw P=0.0400; adjusted P=0.1200) and VQM (raw P=0.0452; adjusted P=0.1356) were not statistically significant after correction. In addition, Residual AutoEncoder produced a significantly better SSIM than did Pix2Pix (raw P=0.0021; adjusted P=0.0063), whereas no significant differences were observed in PSNR or VQM between these two models after correction.
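The adjusted P values reported above equal three times the raw values, which is what a Bonferroni correction over the three pairwise model comparisons within each metric would give. The sketch below reproduces the SSIM adjustments from the raw P values in the text and also shows how a paired t statistic is formed from per-case differences; the function names are illustrative, and converting the t statistic to a P value (which requires the t distribution) is left out.

```python
import math
import statistics

def paired_t_statistic(scores_a, scores_b):
    """t statistic of a paired t-test: mean difference over its standard error."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error

def bonferroni(p_values, n_comparisons):
    """Multiply each raw P value by the number of comparisons, capped at 1."""
    return [min(1.0, p * n_comparisons) for p in p_values]

# Raw SSIM P values quoted in the text, one per pairwise comparison.
raw_ssim_p = {
    "Attention-UNet vs Pix2Pix": 0.0003,
    "Attention-UNet vs Residual AutoEncoder": 0.0036,
    "Residual AutoEncoder vs Pix2Pix": 0.0021,
}
adjusted_ssim_p = dict(
    zip(raw_ssim_p, bonferroni(list(raw_ssim_p.values()), n_comparisons=3))
)

for comparison, p_adjusted in adjusted_ssim_p.items():
    verdict = "significant" if p_adjusted < 0.05 else "not significant"
    print(f"{comparison}: adjusted P = {p_adjusted:.4f} ({verdict})")
```

The same factor of three explains why the PSNR and VQM comparisons between Attention-UNet and Residual AutoEncoder lose significance: raw values of 0.0400 and 0.0452 become 0.1200 and 0.1356.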

Table 2

Model performance with respect to scatter correction

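Model	PSNR↑ (without scatter correction)	SSIM↑ (without scatter correction)	VQM↓ (without scatter correction)	PSNR↑ (with scatter correction)	SSIM↑ (with scatter correction)	VQM↓ (with scatter correction)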
Attention-UNet	24.55±2.47	0.95±0.012	0.33±0.081	29.71±3.62	0.97±0.008	0.20±0.091
Residual AutoEncoder	25.47±2.98	0.96±0.010	0.31±0.089	26.55±2.53	0.96±0.007	0.27±0.075
Pix2Pix	26.14±3.24	0.93±0.015	0.29±0.099	27.22±2.72	0.94±0.014	0.25±0.084

Data are presented as mean ± standard deviation. ↑, larger is better; ↓, smaller is better. PSNR, peak signal-to-noise ratio; SSIM, structural similarity index measure; VQM, video quality metric.
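The relative improvement attributable to scatter correction can be recomputed from the Attention-UNet row of Table 2. The sketch below uses the rounded table means, so its output may differ in the last digit from percentages derived from unrounded values; the function name and the `lower_is_better` flag are illustrative.

```python
def relative_improvement_percent(before, after, lower_is_better=False):
    """Improvement relative to the uncorrected value, as a percentage."""
    change = (before - after) if lower_is_better else (after - before)
    return 100.0 * change / before

# (without scatter correction, with scatter correction, lower_is_better)
table2_attention_unet = {
    "PSNR": (24.55, 29.71, False),
    "SSIM": (0.95, 0.97, False),
    "VQM": (0.33, 0.20, True),
}

improvement = {
    metric: relative_improvement_percent(before, after, lower)
    for metric, (before, after, lower) in table2_attention_unet.items()
}

for metric, value in improvement.items():
    print(f"{metric}: {value:.1f}% improvement")
```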

The effect of CBCT resolution

The performance of the three DL models with respect to the CBCT resolution is summarized in Table 3. The original and predicted projection images for three CBCT resolutions are compared in Figure 2. The CBCT resolutions represented by three voxel sizes (0.5×0.5×2.0 mm³, 1.0×1.0×4.0 mm³, and 2.0×2.0×8.0 mm³) were evaluated, with a larger voxel size indicating a lower CBCT resolution. For all three models, performance deteriorated as CBCT resolution decreased. For Attention-UNet, when the voxel size was 0.5×0.5×2.0 mm³, the model achieved the best predicted image with the highest PSNR (29.71±3.62) and SSIM (0.97±0.008) values and the lowest VQM value (0.20±0.091). As CBCT resolution decreased, the quality of the predicted image decreased. For example, when the CBCT voxel size increased to 2.0×2.0×8.0 mm³, the model yielded a poorer predicted result, with a PSNR of 28.27±3.10, an SSIM of 0.89±0.008, and a VQM of 0.23±0.083. This pattern was observed for all three models, with a lower CBCT resolution resulting in a poorer predicted image from the DL model. Figure 2 shows the effect of CBCT voxel size on the quality of the synthesized projections in two representative cases.

Table 3

The model performance with respect to CBCT resolution

Voxel size (mm³)	Attention-UNet PSNR↑	Attention-UNet SSIM↑	Attention-UNet VQM↓	Residual AutoEncoder PSNR↑	Residual AutoEncoder SSIM↑	Residual AutoEncoder VQM↓	Pix2Pix PSNR↑	Pix2Pix SSIM↑	Pix2Pix VQM↓
0.5×0.5×2.0	29.71±3.62	0.97±0.008	0.20±0.091	26.55±2.53	0.96±0.007	0.27±0.075	27.22±2.72	0.94±0.014	0.25±0.084
1.0×1.0×4.0	29.29±3.01	0.96±0.074	0.21±0.080	25.83±2.05	0.95±0.008	0.29±0.063	25.82±3.15	0.92±0.015	0.29±0.092
2.0×2.0×8.0	28.27±3.10	0.89±0.008	0.23±0.083	25.05±3.14	0.94±0.007	0.31±0.090	25.50±3.01	0.91±0.015	0.30±0.087

Data are presented as mean ± standard deviation. ↑, larger is better; ↓, smaller is better. CBCT, cone-beam computed tomography; PSNR, peak signal-to-noise ratio; SSIM, structural similarity index measure; VQM, video quality metric.

Figure 2 The effect of CBCT voxel size on quality of the synthesized projections for two representative cases. AE, AutoEncoder; CBCT, cone-beam computed tomography.

The representative case in Figure 3 provides a visual comparison between the CBCT reconstructed from the original CB projections and the CBCT reconstructed from the synthesized projections generated by the three models. The MAE between the reference CBCT and the CBCT reconstructed from synthesized projections was 61.81±12.1 HU for Attention-UNet, 73.20±16.5 HU for Residual AutoEncoder, and 77.39±18.1 HU for Pix2Pix. The CBCT reconstructed from the synthesized projections thus deviated to some extent in HU from the reference CBCT. This could cause dose variation as plan complexity increases in certain clinical scenarios, such as treatment replanning based on CBCT in adaptive radiotherapy.
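The HU accuracy reported here is an MAE restricted to a region inside the body. A minimal sketch of such a masked MAE for a single slice is given below; it assumes that the mask (for example, the body contour already contracted by 15 mm) is supplied as a binary image, and the function name and toy values are illustrative rather than taken from the study.

```python
def masked_mae(reference, synthesized, mask):
    """Mean absolute HU difference over the pixels where `mask` is truthy."""
    total, count = 0.0, 0
    for ref_row, syn_row, mask_row in zip(reference, synthesized, mask):
        for ref_hu, syn_hu, inside in zip(ref_row, syn_row, mask_row):
            if inside:
                total += abs(ref_hu - syn_hu)
                count += 1
    if count == 0:
        raise ValueError("mask selects no pixels")
    return total / count

if __name__ == "__main__":
    # Toy 2 x 3 slice in HU; the mask excludes the first column.
    reference_slice = [[-1000.0, 40.0, 60.0], [-1000.0, 300.0, 20.0]]
    synthesized_slice = [[-900.0, 50.0, 30.0], [-950.0, 260.0, 20.0]]
    body_mask = [[0, 1, 1], [0, 1, 1]]
    print(masked_mae(reference_slice, synthesized_slice, body_mask))
```

Restricting the comparison to the contracted body region keeps air outside the patient and the skin boundary, where reconstruction artifacts are common, from dominating the average.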

Figure 3 The comparison of the reconstructed CBCT from synthesized projections and from original cone-beam projections with a voxel size of 0.5×0.5×2.0 mm³. (A) The reconstructed CBCT from original cone-beam projections. (B) The reconstructed CBCT images from synthesized projections predicted by Attention-UNet. (C) The reconstructed CBCT images from synthesized projections predicted by Residual AutoEncoder. (D) The reconstructed CBCT images from synthesized projections predicted by Pix2Pix. CBCT, cone-beam computed tomography.

Sample size effect

An ablation study on sample size was conducted, the results of which are compared in Figure 4. As the sample size increased, PSNR and SSIM increased and VQM decreased across all three models. In addition, PSNR increased and VQM decreased substantially when the sample size increased from 5% to 40%, whereas the improvement became more modest beyond 40% in Attention-UNet and Residual AutoEncoder. For Pix2Pix, the performance consistently improved as the sample size increased. Moreover, SSIM improved substantially when the training sample size increased from 5% to 20% of the available training data. However, the increase in SSIM tended to saturate when the training sample size exceeded 20% of the available training data for the three DL models. The training time for different dataset sizes was also evaluated, the results of which are shown in Figure 4D. The training time increased approximately linearly with the increase in sample size.

Figure 4 The model performance with respect to the sample size. (A) PSNR, (B) SSIM, (C) VQM, and (D) time (hours). “Sample size” refers to the number of corrected DRRs. AE, AutoEncoder; DRRs, digitally reconstructed radiographs; PSNR, peak signal-to-noise ratio; SSIM, structural similarity index measure; VQM, video quality metric.

Discussion

In this study, a method for synthesizing 2D CB projections from 3D CBCT data via a DL model was developed. DRRs were first reconstructed from the existing CBCT data and then passed to the DL model for prediction. It was found that the predicted CB projection was highly similar to the original CB projection. With additional scatter correction and an increase in sample size, an even more faithful approximation could be achieved. With the predicted CB projections being highly similar to the originals, it would be possible to eliminate the need to back up the original CB projections in order to free up valuable clinical storage space.

The comparison with histogram-matched DRRs highlights the limitations of traditional intensity-based correction methods. Histogram matching performs only a global intensity remapping and does not account for spatially varying differences or the complex image characteristics between DRRs and CB projections. Consequently, although simple intensity normalization can partially reduce the domain discrepancy, it cannot adequately model the structural and intensity relationships present in CB projections. In contrast, the DL model is more effective in capturing these relationships, enabling more accurate projection synthesis.

Scatter correction of the DRRs effectively improved the prediction results, which suggests that including further corrections would also improve the prediction accuracy of the models. Other secondary physics corrections, such as beam hardening correction, may therefore be favorable, but they would be more complex and laborious to implement. It should be noted that certain corrections are highly dependent on calibration processes, which may introduce additional errors and uncertainties. Among the three models, Attention-UNet achieved the largest improvement after scatter correction, suggesting that this model is more sensitive to this correction. One possible reason for this is that Attention-UNet was trained using a loss function that included MSE and L1. Since PSNR is directly derived from MSE, a smaller MSE corresponds to a higher PSNR. Therefore, scatter correction may produce changes that are more directly rewarded by this objective, resulting in a larger PSNR gain and a lower VQM. However, given the limited number of test cases, there may be a degree of variation in model performance. A larger test set and broader evaluation metrics will be included in future work to further scrutinize this observation.

CBCT resolution affects the prediction accuracy of the models. Due to its skip connections, Attention-UNet emphasizes the preservation of local structural details. Therefore, as CBCT resolution decreased, SSIM decreased with the loss of structural detail in the image. The Residual AutoEncoder was explicitly optimized to minimize pixel differences. As CBCT resolution decreased, low-frequency content dominated both inputs and outputs, which resulted in the deterioration of all three metrics. For Pix2Pix, decreased CBCT resolution introduced a substantial loss of high-frequency details. Neither the adversarial component nor the L1 reconstruction loss could compensate for the information loss. As a result, PSNR, SSIM, and VQM were consistently degraded.

Previous studies have examined image compression with super-resolution techniques (36), with the primary aim of achieving high compression ratios. In contrast, our proposed approach represents an alternative storage strategy in that it synthesizes the data that requires storage. Dhont et al. developed a RealDRR framework to render more realistic DRRs from the planning CT. However, their work was not intended for projection compression, and scatter-related effects were not explicitly modeled through a dedicated correction step (37). Regarding more recent DL architectures for image-to-image translation, Pix2Pix was adopted in this study as a representative GAN-based model and demonstrated effective performance for the projection synthesis task. Nevertheless, the rapid evolution of medical image processing has introduced advanced generative frameworks, such as Transformer-based architectures (e.g., Swin-UNet and Vision Transformers). These models may further improve the modeling of complex structural and intensity relationships between DRRs and CB projections. Therefore, evaluating more advanced architectures represents an important direction for future research.

In our study, we observed a deviation between the reconstructed and reference CBCT images; similarly, other studies on synthesized CT have also reported differences in HU assignment for bone, soft tissue, and air. Although statistically significant, these differences have little impact on the doses to the planning target and critical organs (38). Palmér et al. evaluated the HU difference between synthesized CT and reference CT images, reporting a mean MAE of 67±14 HU and a mean gamma passing rate of 99.4% (range, 95.7–99.9%), in good agreement with the CT-based dose (39). Since more complex plans may have higher sensitivity to variations in HU assignment, care should be taken when adopting the reconstructed CBCT for highly modulated radiotherapy plans. The effect of reconstruction errors and novel DL-based methods should be investigated in the future.

There are several limitations to this study which should be addressed. First, only the head and neck region was examined, which entails a relatively high contrast and an abundance of bony structures. Other locations, such as the chest and abdomen, have a higher proportion of soft tissues and thus less contrast. They represent a greater challenge for CB projection synthesis and will be examined in our subsequent work. Second, other corrections for DRRs were not included. DRRs with secondary physics effect correction require additional data acquisition and calibration. This would necessitate greater effort and should be investigated in future research. Third, incorporating other learning modules, such as attention-based Transformers, may further improve the performance of the DL models examined in our study.

Conclusions

DL models provide an effective means of synthesizing CB projections from existing CBCT data for patients with head and neck cancer. The preliminary results show that with high CBCT resolution and a larger number of training samples, high-quality CB projections can be generated from the existing CBCT data. With the synthesized CB projections being highly similar to their corresponding originals, the proposed approach may help reduce the storage requirements of CB projection datasets and potentially alleviate the burden on clinical data storage systems. These advantages may be particularly attractive in image-guided radiotherapy, in which CBCT acquisition generates a large number of projection images that require storage.

Acknowledgments

None.

Footnote

Reporting Checklist: The authors have completed the TRIPOD+AI reporting checklist. Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1-2731/rc

Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1-2731/dss

Funding: This work was supported by the Non-profit Central Research Institute Fund of Chinese Academy of Medical Sciences (No. 2024-RW320-05), the National High Level Hospital Clinical Research Funding (Nos. 2025-LYZX-R-B06 and 2022-CICAMS-80102022203), Beijing Hope Run Special Fund of Cancer Foundation of China (No. LC2021B01), Beijing Natural Science Foundation (No. L2609075), the Joint Funds for the Innovation of Science and Technology, Fujian Province (No. 2023Y9442), and Fujian Provincial Health Technology Project (No. 2023GGA052).

Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-1-2731/coif). The authors have no conflicts of interest to declare.

Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. This study was conducted in accordance with the Declaration of Helsinki and its subsequent amendments. The ethics committee of National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College approved this study (No. NCC2018-016). Written informed consent was waived because of the retrospective design of the study.

Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.

References

Létourneau D, Wong JW, Oldham M, Gulam M, Watt L, Jaffray DA, Siewerdsen JH, Martinez AA. Cone-beam-CT guided radiation therapy: technical implementation. Radiother Oncol 2005;75:279-86.
Zhang Y, Jiang Z, Zhang Y, Ren L. A review on 4D cone-beam CT (4D-CBCT) in radiation therapy: Technical advances and clinical applications. Med Phys 2024;51:5164-80.
Liu H, Schaal D, Curry H, Clark R, Magliari A, Kupelian P, Khuntia D, Beriwal S. Review of cone beam computed tomography based online adaptive radiotherapy: current trend and future direction. Radiat Oncol 2023;18:144.
Jain A, Shil M, Sreepradha C, Rai S, Kaur I, Banka A. A Review on Cone-Beam Computed Tomography and its Application in Dentistry. J Pharm Bioallied Sci 2024;16:S38-40.
Chan F, Brown LF, Parashos P. CBCT in contemporary endodontics. Aust Dent J 2023;68:S39-55.
Kaaber L, Matzen LH, Schropp L, Spin-Neto R. Low-dose CBCT protocols in implant dentistry: a systematic review. Oral Surg Oral Med Oral Pathol Oral Radiol 2024;138:427-39.
Styrvoky K, Schwalk A, Pham D, Chiu HT, Rudkovskaia A, Madsen K, Carrio S, Kurian EM, De Las Casas L, Abu-Hijleh M. Shape-Sensing Robotic-Assisted Bronchoscopy with Concurrent use of Radial Endobronchial Ultrasound and Cone Beam Computed Tomography in the Evaluation of Pulmonary Lesions. Lung 2022;200:755-61.
Verhoeven RLJ, Kops SEP, Wijma IN, Ter Woerds DKM, van der Heijden EHFM. Cone-beam CT in lung biopsy: a clinical practice review on lessons learned and future perspectives. Ann Transl Med 2023;11:361.
Hattori H, Yatagawa T, Ohtake Y, Suzuki H. Learning Scatter Artifact Correction in Cone-Beam X-Ray CT Using Incomplete Projections with Beam Hole Array. Journal of Nondestructive Evaluation 2024;43:99.
Dreier T, Nilsson D, Espes E. In-line and at-line battery CT enabled by MetalJet sources. e-Journal of Nondestructive Testing 2024;29.
Wilson LJ, Hadjipanteli A, Østergaard DE, Bogaert E, Brown KF, DeJong R, Earley J, Edouard M, Khan M, Lindsay J, Der Himst JV, Ding GX, Wood T, Aznar MC, Jornet N, Ntentas G. Cone beam CT dose optimisation: A review and expert consensus by the 2022 ESTRO Physics Workshop IGRT working group. Radiother Oncol 2025;209:110958.
Hugo GD, Weiss E, Sleeman WC, Balik S, Keall PJ, Lu J, Williamson JF. A longitudinal four-dimensional computed tomography and cone beam computed tomography dataset for image-guided radiation therapy research in lung cancer. Med Phys 2017;44:762-71.
Park J, Park S, Kim J, Liu Z, Watkins W, Song W. TU-B-201B-04: Four-Dimensional Cone-Beam Computed Tomography and Digital Tomosynthesis Using Motion Signals Extracted from Fiducial Marker Inserted for Liver Cancer Radiation Therapy. Medical Physics 2010;37:3378.
Liu X, An P, Chen Y, Huang X. An improved lossless image compression algorithm based on Huffman coding. Multimedia Tools and Applications 2022;81:4781-95.
Elakkiya S, Thivya KS. Comprehensive Review on Lossy and Lossless Compression Techniques. Journal of The Institution of Engineers (India): Series B 2022;103:1003-12.
Brennecke R, Bürgel U, Rippin G, Post F, Rupprecht HJ, Meyer J. Comparison of image compression viability for lossy and lossless JPEG and Wavelet data reduction in coronary angiography. Int J Cardiovasc Imaging 2001;17:1-12.
Bui V, Chang LC, Li D, Hsu LY, Chen MY. Comparison of lossless video and image compression codecs for medical computed tomography datasets. 2016 IEEE International Conference on Big Data (Big Data). Washington, DC, USA: IEEE; 2016:3960-2.
Liu F, Hernandez-Cabronero M, Sanchez V, Marcellin MW, Bilgin A. The Current Role of Image Compression Standards in Medical Imaging. Information (Basel) 2017;8:131.
Waddington SP, McKenzie AL. Assessment of effective dose from concomitant exposures required in verification of the target volume in radiotherapy. Br J Radiol 2004;77:557-61.
Groh BA, Siewerdsen JH, Drake DG, Wong JW, Jaffray DA. A performance comparison of flat-panel imager-based MV and kV cone-beam CT. Med Phys 2002;29:967-75.
Howerton WB Jr, Mora MA. Advancements in digital imaging: what is new and on the horizon? J Am Dent Assoc 2008;139:20S-4S.
Barnes MP, Pomare D, Menk FW, Moraro B, Greer PB. Evaluation of the truebeam machine performance check (MPC): OBI X-ray tube alignment procedure. J Appl Clin Med Phys 2018;19:68-78.
Lin Y, Chen J, Wang H, Yang J, Guo J, Zhang Y, Li X. DeepSparse: A Foundation Model for Sparse-View CBCT Reconstruction. IEEE Trans Med Imaging 2026;45:3339-51.
Gardner M, Bouchta YB, Mylonas A, Mueller M, Cheng C, Chlap P, Finnegan R, Sykes J, Keall PJ, Nguyen DT. Realistic CT data augmentation for accurate deep-learning based segmentation of head and neck tumors in kV images acquired during radiation therapy. Med Phys 2023;50:4206-19.
Liang Y, Jia T, Li N, Liu X, Jiang J, Lu G, Zhao M. Review of Static Image Compression Algorithms. 2024 IEEE 7th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC). Chongqing: IEEE; 2024:222-31.
Tao L, Gao W, Li G, Zhang C. AdaNIC: Towards Practical Neural Image Compression via Dynamic Transform Routing. 2023 IEEE/CVF International Conference on Computer Vision (ICCV). Paris: IEEE; 2023:16833-42.
Chen J, Chen W, Xing Q, Yu F. Multi-Scale Contextual Coding for Human-Machine Vision of Volumetric Medical Images. IEEE Access 2025;13:145663-79.
Mishra D, Singh SK, Singh RK. Deep Architectures for Image Compression: A Critical Review. Signal Processing 2022;191:108346.
Staub D, Murphy MJ. A digitally reconstructed radiograph algorithm calculated from first principles. Med Phys 2013;40:011902.
Zhang S, Zhang Y, Tuo M, Zhang H. Fast algorithm for Joseph’s forward projection in iterative computed tomography reconstruction. Journal of Ambient Intelligence and Humanized Computing 2023;14:12535-48.
Oktay O, Schlemper J, Le Folgoc L, Lee M, Heinrich M, Misawa K, Mori K, McDonagh S, Hammerla NY, Kainz B, Glocker B, Rueckert D. Attention U-Net: Learning Where to Look for the Pancreas. arXiv 2018. arXiv:1804.03999.
Bank D, Koenigstein N, Giryes R. Autoencoders. In: Rokach L, Maimon O, Shmueli E. editors. Machine Learning for Data Science Handbook: Data Mining and Knowledge Discovery Handbook. Cham: Springer International Publishing; 2023:353-74.
Isola P, Zhu JY, Zhou T, Efros AA. Image-to-Image Translation with Conditional Adversarial Networks. 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Honolulu, HI, USA: IEEE; 2016:5967-76.
Sara U, Akter M, Uddin MS. Image Quality Assessment through FSIM, SSIM, MSE and PSNR—A Comparative Study. Journal of Computer and Communications 2019;7:8-18.
Wang Z, Bovik AC, Sheikh HR, Simoncelli EP. Image quality assessment: from error visibility to structural similarity. IEEE Trans Image Process 2004;13:600-12.
Chang Z, Shang J, Fan Y, Huang P, Hu Z, Zhang K, Dai J, Yan H. Deep learning-based super-resolution method for projection image compression in radiotherapy. Quant Imaging Med Surg 2025;15:8611-26.
Dhont J, Verellen D, Mollaert I, Vanreusel V, Vandemeulebroucke J. RealDRR - Rendering of realistic digitally reconstructed radiographs using locally trained image-to-image translation. Radiother Oncol 2020;153:213-9.
Singhrao K, Dugan CL, Calvin C, Pelayo L, Yom SS, Chan JW, Scholey JE, Singer L. Evaluating the Hounsfield unit assignment and dose differences between CT-based standard and deep learning-based synthetic CT images for MRI-only radiation therapy of the head and neck. J Appl Clin Med Phys 2024;25:e14239.
Palmér E, Karlsson A, Nordström F, Petruson K, Siversson C, Ljungberg M, Sohlin M. Synthetic computed tomography data allows for accurate absorbed dose calculations in a magnetic resonance imaging only workflow for head and neck radiotherapy. Phys Imaging Radiat Oncol 2021;17:36-42.

Cite this article as: Fan Y, Huang P, Shang J, Chang Z, Hu Z, Zhang K, Xie X, Liu Z, Yan H. Synthesizing cone-beam projections in radiotherapy using deep learning network for patients with head and neck cancer. Quant Imaging Med Surg 2026;16(7):523. doi: 10.21037/qims-2025-1-2731
