Non-rigid image-volume registration for human livers in laparoscopic surgery
Introduction
Identifying abnormal tissue in laparoscopic surgeries remains a significant challenge for surgeons, particularly in ensuring the success of cancer removal procedures. Preoperative medical images, such as computed tomography (CT) scans and magnetic resonance imaging (MRI), provide precise delineation of the cancerous area through segmentation, aiding in accurate localization before surgery starts. However, surgeons must mentally retain the location of the abnormal tissue throughout the operation, which increases the risk of lapses or errors.
Augmented reality (AR) in laparoscopic surgery is an emerging technique that overlays preoperative information onto intraoperative laparoscopic videos, enhancing surgeons’ real-time understanding of the surgical site (1-3). This technology plays a crucial role in surgical procedures by aiding accurate tumour localization, leading to optimal tumour removal and reducing the likelihood of recurrence (4). However, a key challenge lies in achieving precise and autonomous alignment of the preoperative and intraoperative shapes obtained from different modalities, commonly referred to as registration (5).
In laparoscopic surgery, unlike in orthopaedic (6) or plastic (7) procedures, tissue deformation is common, so registration is typically non-rigid; one example is the liver deformation caused by pneumoperitoneum in an animal model (8-10). Several studies focus on non-rigid registration techniques for aligning pre- and intra-operative 3D shapes (meshes or point clouds) reconstructed from stereo cameras (11-15). However, in the context of AR in laparoscopic surgery, despite the stereo nature of laparoscopy, the approach often involves image-volume registration (2D-3D registration) rather than reconstructing the surgical scene from binocular images. This is because stereo reconstruction performance is limited by the small baseline of clinical laparoscopes.
Image-volume registration for human organs, utilizing intra-operative monocular laparoscopic images and preoperative 3D meshes, has been extensively studied. This process typically involves an initial alignment step followed by refinement (3,16). As shown in Figure 1, the workflow for 2D-3D registration on laparoscopic images involves several steps: liver segmentation, silhouette contour extraction, point cloud projection, and registration. Segmentation is performed using a convolutional neural network (CNN) that automatically detects the liver region. The contour is extracted from the silhouette of the segmented liver. The point cloud is generated from the mesh reconstructed from the preoperative CT scans.
In the initial stage, objects are aligned using manually selected landmarks to perform standard rigid registration (17). The initial rigid registration is performed using anatomical landmarks (e.g., surface points or distinct anatomical features) extracted from both the preoperative 3D model and the intraoperative image. This method assumes that deformations such as stretching and shearing are not excessively severe, thereby enabling global registration. Robu et al. (18) introduced an initial alignment approach by framing the issue as a quadratic assignment problem, aiming to minimize dissimilarities between feature descriptors while incorporating constraints based on liver contours. In the refinement stage, two primary challenges emerge: establishing correspondences between two shapes and constructing an effective deformation model. For 2D-3D registration, many methods involve projecting the 3D model onto the 2D space, effectively transforming the problem into a 2D-2D registration task. Building on advances in liver anatomy detection, a recent work (19) introduced depth-driven geometric prompt learning for laparoscopic liver landmark detection. Methods that utilise point clouds (3,14) or anatomical features such as ridges (14,20-23) optimise objective functions by minimising the distance between corresponding features. Anatomical features offer an advantage in that 3D meshes used for registration are often texture-less, even though intra-operative laparoscopic images can contain abundant texture features detectable by general algorithms, such as scale invariant feature transform-based (24) or optical flow-based methods (25). In practice, the silhouette contour of the liver in a 2D image is a commonly used feature that can also be extracted from textureless meshes. Adagolodjo et al. (26), Espinel et al. (24), and Özgür et al. (27) employed the silhouette contour of 3D objects as input and used biomechanical models to estimate organ deformation. 
However, silhouette-based methods are highly sensitive to variations in object shape and the camera extrinsic parameters, which can result in ambiguous or multiple solutions.
Beyond two-step solutions, deep learning-based approaches have shown considerable promise in tackling this challenge. For instance, Nakao et al. (28) proposed an image-to-graph convolutional network for deformable registration of a 3D organ mesh with a low-contrast 2D projection image. Additionally, Zhao et al. (29) improved registration efficiency and accuracy by introducing a learned shape metric to enhance interpolation through training on specific metrics. Convolutional neural networks trained on biomechanical simulations of randomly generated, deforming organ-like structures have demonstrated the potential to generalise to new, patient-specific cases (30). However, these methods often require extensive preoperative data collection with accurate labelling of liver structures. In clinical settings, patient-specific livers are frequently abnormal, which limits the generalisability of these approaches.
To address these challenges, we employ a preoperative network training approach that maps organ boundaries to camera positions. Training is conducted on a 3D model segmented from preoperative CT scans, with contour features easily extracted from the 3D model under various camera viewpoints. In contrast to methods (31,32), which incorporate neural networks to predict deformations during model training, our method focuses on non-rigid registration driven by contour-based alignment. After aligning the initial frame, a differentiable mapping from the boundary to the camera position is utilised. During the refinement step, the network’s gradient is leveraged to efficiently converge to a local optimum, reducing the complexity of both data collection and training. This differentiable mapping not only accelerates the initial alignment process but also significantly improves the speed of online non-rigid registration.
In this paper, we propose a novel method for non-rigid image-volume registration (NRIVR) for human organs in minimally invasive surgery. The main contributions are:
- We introduce the first NRIVR method tailored specifically for laparoscopic surgery, addressing the unique challenges posed by the deformable nature of organs in minimally invasive procedures.
- We propose a differentiable mapping from the contour to the camera pose, enabling fast and accurate optimization in deformable registration tasks.
- Simulations and experiments on laparoscopic liver images were conducted to validate the performance of NRIVR.
Methods
The problem of NRIVR addressed in this paper is as follows: given a 3D volume model and an image of the same object in a different shape, with some corresponding landmarks, find the non-rigid registration between them.
The workflow is depicted in Figure 2. The inputs for the registration process include the 3D volume mesh segmented and reconstructed from the preoperative CT scan, as well as the segmented image obtained from laparoscopy. A virtual camera is used to project the 3D mesh onto a 2D space, but its pose is unknown and needs to be identified by minimizing the shape difference between the projected 3D mesh and the 2D image. The registration procedure involves two main steps:
- Preoperative training: a neural network is trained preoperatively to estimate the camera pose from the 2D contour of the 3D model. This step utilizes preoperative data to generate a contour-prior set of the 3D liver model, allowing for the mapping of different camera extrinsic parameters to 2D projections of the 3D model.
- Intraoperative alignment: during the surgery, the trained neural network is used to estimate the initial camera pose. The 3D model is then aligned to the 2D laparoscopic image by optimizing the camera pose and deformation field to minimize the shape difference between the projected 3D mesh and the 2D image. This iterative process involves updating the correspondence between the 2D image pixels and the projected 3D point cloud to refine the alignment.
3D-2D projection model
Based on the camera model, as shown in Figure 3, the projection of a 3D object, denoted as $\mathbf{X}$, in homogeneous coordinates onto a 2D image is
$$\mathbf{x} = \mathbf{P}\mathbf{X}$$
where the camera matrix $\mathbf{P} = \mathbf{K}[\mathbf{R} \mid \mathbf{t}]$ is given by the multiplication of the camera intrinsic matrix $\mathbf{K}$ and the extrinsic matrix $[\mathbf{R} \mid \mathbf{t}]$, combining the rotation matrix $\mathbf{R}$ and translation vector $\mathbf{t}$.
The intrinsic matrix $\mathbf{K}$, defined as
$$\mathbf{K} = \begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}$$
is composed of focal lengths ($f_x$ and $f_y$) and optical centres ($c_x$ and $c_y$), which are usually known.
Therefore, in AR, if the 3D object is to be properly overlapped on the 2D laparoscopic image, it is essential to accurately determine the camera’s extrinsic parameters ($\mathbf{R}$ and $\mathbf{t}$). The scale in registration can be realized by adjusting the camera’s position. Once these parameters are known, the 3D object can be projected with this camera configuration into 2D space and fused with the original 2D image.
The translation vector is defined as:
$$\mathbf{t} = [t_x, t_y, t_z]^\top$$
The pose of the camera is denoted as $\mathbf{p}$. With the neural network, there is:
$$\mathbf{p} = \mathrm{LCENN}(C)$$
where $C$ is the contour of the virtual 3D model and LCENN represents the long short-term memory-based camera estimation neural network.
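The projection model above can be illustrated with a minimal NumPy sketch. It assumes a zero-skew pinhole camera; the Euler-angle convention, the intrinsic values, and the function names `rotation_from_euler` and `project_points` are illustrative assumptions, not the settings used in this work.

```python
import numpy as np

def rotation_from_euler(rx, ry, rz):
    """Rotation matrix R = Rz @ Ry @ Rx from angles in radians (assumed convention)."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def project_points(points_3d, K, R, t):
    """Project (N, 3) points to (N, 2) pixel coordinates with P = K [R | t]."""
    P = K @ np.hstack([R, t.reshape(3, 1)])                        # 3x4 camera matrix
    homog = np.hstack([points_3d, np.ones((len(points_3d), 1))])   # homogeneous (N, 4)
    proj = homog @ P.T                                             # (N, 3)
    return proj[:, :2] / proj[:, 2:3]                              # perspective divide

# Hypothetical intrinsics (pixels) and a camera 200 mm in front of the object.
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R = rotation_from_euler(0.0, 0.0, 0.0)
t = np.array([0.0, 0.0, 200.0])
pixels = project_points(np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]), K, R, t)
```

With the identity rotation, the object origin maps to the optical centre, and moving the camera along its viewing axis changes only the apparent scale, which is how scale is absorbed into the camera position.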
3D liver model contour prior dataset
To find the relationship between camera pose and the projected contour of a specific 3D model, we create a contour-prior set unique to each model. This set consists of projected contours from an object-centred view, as shown in Figure 4. Initially, the 3D liver model is placed in a frontal view, displaying its anterior ridge, ligament, and silhouette prominently to the camera. We then rotate the viewpoint about the x, y, and z axes within a range of 0–45 degrees, with a step size of 3 degrees, to capture various perspectives of the 3D liver model.
For translational parameters, even in perspective projection, the camera’s position does not significantly affect the contour’s shape, allowing us to use larger steps to sample positions along the x, y, and z axes.
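The viewpoint sampling can be sketched as a simple grid. The sketch assumes the 0–45 degree range is inclusive at both ends; the translation offsets are hypothetical placeholders, since the actual translation step is not specified here.

```python
import itertools
import numpy as np

# Rotation angles per axis: 0, 3, ..., 45 degrees (inclusive range is an assumption).
angles_deg = np.arange(0, 45 + 1, 3)
rotation_grid = list(itertools.product(angles_deg, repeat=3))

# Hypothetical coarse translation offsets in mm (placeholder values).
translation_steps_mm = np.array([-20.0, 0.0, 20.0])
translation_grid = list(itertools.product(translation_steps_mm, repeat=3))

n_rotations = len(rotation_grid)
n_poses = n_rotations * len(translation_grid)
```

A full grid of this kind grows quickly, so in practice a subset of poses or a narrower range informed by anatomical landmarks may be preferable.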
In some scenarios, the camera pose relative to the liver can be roughly determined based on anatomical landmarks, permitting a smaller range of view angles and positions. We then perform scan-based contour extraction to generate contours from the sampled images. Each contour is represented as a series of points, typically exceeding 2,000 points per contour.
To ensure statistically significant results, the dataset for each liver model contains at least 500 contours captured from different camera viewpoints, with a total of 270 liver models. We divide the dataset into three parts: 74.07% (200 models) for training, 11.11% (30 models) for validation, and 14.81% (40 models) for testing. This ensures that the neural network is trained effectively to map 2D contours to 3D camera poses. However, for the method’s generalizability and robustness, it is crucial to validate and test on data from unseen models that were not part of the training set. Given that patient anatomy is specific, the method should be tested on unseen, real patient data to confirm its applicability to diverse cases in clinical settings.
LCENN
We propose a novel approach for laparoscopic camera pose regression using a combination of long short-term memory (LSTM) units and a fully connected layer. While LSTMs are typically used for temporal sequences, they are also effective for spatial data due to their memory capabilities (33). In our method, contours extracted from laparoscopic images are treated as sequences, with each point on the contour representing a time step, as shown in Figure 5.
To standardize input length, we truncate contour sequences to 240 two-dimensional points, which are modelled as B-splines. These points are transformed into 512-dimensional hyper contour features b using a B-spline neural network (BSNN) (34). However, the high dimensionality of b risks overfitting. To address this, LSTMs are employed to learn spatial dependencies, capturing the most relevant feature correlations for pose estimation. The LSTM output z is passed to a fully connected layer with three neurons to predict orientation coordinates.
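One way to obtain a fixed-length contour is sketched below. Note that it uses uniform arc-length resampling rather than truncation, which is a substituted technique chosen for illustration; the function name `resample_contour` and the synthetic ellipse are assumptions.

```python
import numpy as np

def resample_contour(contour, n_points=240):
    """Resample an ordered (N, 2) contour to n_points evenly spaced by arc length."""
    contour = np.asarray(contour, dtype=float)
    seg = np.linalg.norm(np.diff(contour, axis=0), axis=1)   # segment lengths
    s = np.concatenate([[0.0], np.cumsum(seg)])              # cumulative arc length
    target = np.linspace(0.0, s[-1], n_points)               # evenly spaced positions
    x = np.interp(target, s, contour[:, 0])
    y = np.interp(target, s, contour[:, 1])
    return np.stack([x, y], axis=1)

# Synthetic dense contour (an ellipse with 2,000 points) standing in for a liver silhouette.
theta = np.linspace(0.0, 2.0 * np.pi, 2000)
dense = np.stack([100.0 * np.cos(theta), 60.0 * np.sin(theta)], axis=1)
fixed = resample_contour(dense)
```

Resampling keeps the whole outline at lower density, whereas truncation keeps only the first part of the sequence; which is preferable depends on how the contour points are ordered.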
The use of LSTMs is particularly advantageous for two reasons. First, the dataset, derived from virtual models, is transformed into sequential data through the process of capturing contours from various camera viewpoints. This sequential structure allows LSTMs to be effective for learning spatial dependencies in the contour data, making them a natural fit for the task. Second, LSTMs are differentiable, enabling efficient gradient-based optimization during training. We train the model using the Adam optimizer (β1=0.9, β2=0.9999), a batch size of 16, and mean square error (MSE) loss.
As illustrated in Figure 5, the architecture processes contour sequences through BSNN and LSTM layers, culminating in pose estimation. This approach leverages the strengths of both B-spline networks and LSTMs, providing a robust solution for laparoscopic camera pose regression.
Deformation field
Since there are deformations of the human organ, we use the weighted average deformation field to describe. Given that the 3D surface is in a discrete manner and the assumption that the deformation is continuous, the displacement at position is , there exists:
where , is relative to the distance between points, , are the nodes with known displacement , as illustrated in Figure 6. The displacement matrix is defined as .
The deformation field can also be projected onto 2D space, and the displacement of point on 2D plane, , is
Therefore, the contour of the deformed shape is estimated as:
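A weighted-average deformation field of this kind can be sketched as follows. Gaussian distance weights and the bandwidth `sigma` are assumptions made for illustration; the text only states that the weights depend on the distance between points.

```python
import numpy as np

def interpolate_displacement(points, nodes, node_disp, sigma=20.0):
    """Displacement at each point as a normalised, distance-weighted average
    of the known node displacements (Gaussian weights are an assumption)."""
    d2 = ((points[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)  # (P, N) squared distances
    w = np.exp(-d2 / (2.0 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)        # weights sum to 1 for every point
    return w @ node_disp                     # (P, 3) interpolated displacements

nodes = np.array([[0.0, 0.0, 0.0], [50.0, 0.0, 0.0], [0.0, 50.0, 0.0]])
node_disp = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
surface = np.array([[10.0, 10.0, 0.0], [25.0, 25.0, 5.0]])
disp = interpolate_displacement(surface, nodes, node_disp)
deformed = surface + disp
```

Because the weights are normalised, a displacement shared by all nodes is reproduced exactly everywhere, so a rigid translation is a special case of the field.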
Non-rigid 2D-3D registration
The iterative closest point (ICP)-like registration is performed in 2D space. The purpose is to obtain the deformation field and the extrinsic parameters of the camera. The idea is to minimize the shape difference using several terms as follows:
Distance term
In each iteration, the correspondence between the 2D image pixels and the projected 3D point cloud is assumed to be the closest pairs in 2D space, denoted as , where and represent the corresponding 2D and 3D points, respectively. The number of corresponding pairs is . The distance term can be described as the sum of all the Euclidean distances in the 2D space:
where is the weight value, which can be represented in matrix form .
and are corresponding pairs. The function is the deformed position matrix of .
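The closest-point pairing and a weighted distance energy can be sketched as below. Squared Euclidean distances are used, as is common in ICP-style methods; that choice, the uniform default weights, and the function names are assumptions.

```python
import numpy as np

def closest_pairs(image_pts, projected_pts):
    """For each 2D image point, the index of the nearest projected model point."""
    d = np.linalg.norm(image_pts[:, None, :] - projected_pts[None, :, :], axis=2)
    return d.argmin(axis=1)

def distance_term(image_pts, projected_pts, weights=None):
    """Weighted sum of squared 2D distances over closest-point pairs."""
    idx = closest_pairs(image_pts, projected_pts)
    residual = np.linalg.norm(image_pts - projected_pts[idx], axis=1)
    if weights is None:
        weights = np.ones(len(image_pts))    # uniform weights by default
    return float(np.sum(weights * residual ** 2)), idx

image_pts = np.array([[0.0, 0.0], [10.0, 0.0]])
projected_pts = np.array([[1.0, 0.0], [9.0, 0.0], [5.0, 5.0]])
energy, idx = distance_term(image_pts, projected_pts)
```

Raising the weight of landmark pairs in `weights` mimics the higher influence given to anatomical landmarks.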
Initially, correspondences are established by pairing the closest points between the 2D image and the projected 3D point cloud. As the iterative optimization progresses and the camera pose is updated, these correspondences are refined. Specifically, each iteration adjusts the correspondences based on the updated camera pose, ensuring that the point pairs are increasingly accurate. In many cases, there are landmarks, including anatomical features, that can be detected both on the 3D mesh and the 2D image. The correspondence between these landmarks in 2D and 3D spaces is determined, and their weights are higher than other points. The landmark vertices are denoted by and and their correspondence is used to enhance the accuracy of the alignment. The landmark term is defined as:
Stiffness term
The stiffness term penalizes deviations between the transformation matrices assigned to adjacent vertices. To represent this in matrix notation, we use the node-arc incidence matrix (35), which is crucial for defining the mesh topology in directed graphs. This matrix features one row for each edge and one column for each vertex. To construct the node-arc incidence matrix , we number the edges and vertices of the mesh. Each edge is directed from the vertex with the lower number to the vertex with the higher number. For an edge r connecting vertices i and j, the matrix entries for this edge are and . With where represents a stiffness parameter, the stiffness term is defined as:
where is the symbol for Kronecker product.
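The construction of the node-arc incidence matrix and its Kronecker product can be sketched as follows. The sign convention (−1 at the lower-numbered vertex, +1 at the higher-numbered one) and the 3×3 diagonal weighting matrix `G` are assumptions for illustration.

```python
import numpy as np

def node_arc_incidence(edges, n_vertices):
    """One row per edge, one column per vertex; each edge is directed from the
    lower-numbered vertex (-1) to the higher-numbered vertex (+1)."""
    M = np.zeros((len(edges), n_vertices))
    for r, (i, j) in enumerate(edges):
        lo, hi = min(i, j), max(i, j)
        M[r, lo] = -1.0
        M[r, hi] = 1.0
    return M

edges = [(0, 1), (1, 2), (0, 2)]             # a single triangle
M = node_arc_incidence(edges, 3)

gamma = 1.0                                   # stiffness parameter (hypothetical value)
G = np.diag([1.0, 1.0, gamma])                # assumed per-vertex weighting block
MG = np.kron(M, G)                            # (3 edges * 3, 3 vertices * 3)

# Identical per-vertex blocks stacked vertically: adjacent differences vanish.
block = np.array([[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]])
X = np.tile(block, (3, 1))
stiffness_energy = float(np.linalg.norm(MG @ X) ** 2)   # squared Frobenius norm
```

Each block row of `MG @ X` is the weighted difference between the transformations of two adjacent vertices, so the energy is zero when neighbouring vertices share the same transformation.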
Contour term
The contour term measures the discrepancy between the 2D contour and its projection from the 3D mesh. The contour is represented as a sequence of points , where . The camera pose is first estimated using the LCENN.
The energy function quantifying the difference between the projected and observed contours is:
where is the correspondence matrix mapping the 2D contour to the projected 3D contour.
Optimization for registration
The final objective function is formulated as a weighted sum of multiple energy terms, designed to minimize the shape difference between the observed and model contours:
where are positive weighting parameters that balance the contributions of the data term , the landmark term , the smoothness term , and the camera pose regularization term . The data term measures the discrepancy between the observed and model contours, while the landmark term ensures alignment of specific anatomical landmarks. The smoothness term penalizes irregular deformations, and regularizes the camera pose to prevent unrealistic transformations.
The optimization is performed using gradient descent, starting from an initial solution where the deformation field is set to zero. The gradients of the objective function with respect to the camera pose and the deformation field are computed as follows:
where , , represent the rotational components and , , represent the translational components of the camera pose. The gradient with respect to the deformation field is given by:
where , , , represents the deformation of the i-th node along the j-th axis. The combined gradient for optimization is:
The registration process is iterative. In the initial step, the deformation field is set to zero, and the camera pose is estimated using the LCENN. The gradient descent algorithm then iteratively updates both the camera pose and the deformation field to minimize the objective function . This process continues until convergence, ensuring an accurate alignment between the observed and model contours while maintaining smooth and realistic deformations.
The differentiability of the LCENN ensures efficient gradient computation for optimization. The gradient of with respect to the camera pose is:
This allows the contour term to guide the registration process, ensuring accurate alignment between the 2D and 3D contours through gradient-based optimization. The differentiability of the LSTM network is particularly crucial, as it allows the gradients of the objective function to be backpropagated through the network. This enables the model to learn spatial dependencies in the contour features effectively. Mathematically, the gradient of the objective function with respect to the LSTM parameters can be expressed as:
where is the output of the LSTM, and represents the gradient of the LSTM output with respect to its parameters. This differentiability ensures that the LSTM can be fine-tuned during the optimization process, improving the overall accuracy of the pose estimation.
Similarly, the B-spline representation is differentiable with respect to the input contour points, allowing for the computation of gradients with respect to the deformation field . This ensures that the deformation field can be optimized to minimize the shape difference while maintaining smoothness.
In summary, the differentiability of the LSTM network, B-spline representation, and objective function enables efficient gradient-based optimization, ensuring robust and accurate registration for laparoscopic camera pose estimation. The use of gradient descent ensures that the solution converges to a local minimum, providing a reliable framework for this task. The non-rigid registration steps are outlined in Algorithm 1.
| Require: |
| Ensure: |
| Initialization: |
| 1: , |
| 2: LOOP Process: |
| 3: while not small enough do |
| 4: Eq. [16] |
| 5: |
| 6: |
| 7: |
| 8: end while |
| 9: return |
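The overall loop structure (zero initial deformation, joint gradient updates of pose and deformation until the gradient is small) can be illustrated on a toy problem. The objective below is a deliberately simplified stand-in, a 2D translation plus per-point displacements with a quadratic penalty, and is not the objective function of this work; `register_toy`, `lam`, and `lr` are hypothetical names and values.

```python
import numpy as np

def register_toy(src, tgt, lam=10.0, lr=0.05, tol=1e-8, max_iter=1000):
    """Minimise sum ||src + t + D - tgt||^2 + lam * sum ||D||^2 by gradient descent."""
    t = np.zeros(2)                  # toy "pose": a 2D translation
    D = np.zeros_like(src)           # deformation field initialised to zero
    for _ in range(max_iter):
        r = src + t + D - tgt                    # residuals per point
        grad_t = 2.0 * r.sum(axis=0)             # gradient w.r.t. the pose
        grad_D = 2.0 * r + 2.0 * lam * D         # gradient w.r.t. the deformation
        grad_norm = np.sqrt((grad_t ** 2).sum() + (grad_D ** 2).sum())
        if grad_norm < tol:                      # stop when the gradient is small enough
            break
        t -= lr * grad_t
        D -= lr * grad_D
    return t, D

src = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
tgt = src + np.array([5.0, -3.0])
t_est, D_est = register_toy(src, tgt)
```

In this toy case the target is a pure translation of the source, so the penalty on the displacements drives the solution toward explaining the motion with the pose alone, leaving the deformation near zero.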
Results
Simulations and experiments were conducted to validate the performance of our approach.
Validation on LCENN
A total of 4 virtual models were used to validate the performance of LCENN. These virtual models were reconstructed in 3D from CT scans of healthy and abnormal livers. For each model, 200 pairs of contours and corresponding camera rotations were randomly selected. The contour images serve as input, and the output camera pose is compared with the ground-truth camera pose, taken as the position of the camera in the virtual environment. The brute-force (BF) method is used as a baseline for registration speed and accuracy. There are two different levels of search steps in the BF method. The smaller search step is 0.1°, 0.1°, and 0.1° for each angle, and 1 mm, 1 mm, 1 mm for x-, y-, z-axis translations, while the larger search step uses 1°, 1°, and 1° for each angle, and 3 mm, 3 mm, 3 mm for x-, y-, z-axis translations. We also compared our approach with two state-of-the-art methods: PoseCNN (36) and PVNet (37). PoseCNN is a convolutional neural network designed for 6D pose estimation, directly predicting the 3D translation and rotation of objects from RGB images. PVNet, on the other hand, is a keypoint-based approach that predicts 2D keypoints of an object and utilizes the Perspective-n-Point algorithm to estimate the 6D pose. Given the lack of texture in our virtual liver model, we used the T-LESS dataset (38), which is tailored for 6D pose estimation of texture-less industrial objects, along with silhouettes of the liver models.
The performance of LCENN was compared to state-of-the-art methods PoseCNN, PVNet, and BF approaches with different search step sizes. The mean translational error (TE) and mean rotational error (RE) were evaluated across three degrees of freedom (DOF). As shown in Table 1, LCENN achieved an average TE of 0.51±0.31 mm and an RE of 0.35±0.44°, with a time cost of 0.51±0.32 s. This accuracy is comparable to the BF method with the smaller search step, which required 8,950±1,027 s, more than 17,000 times the computation time of LCENN. The BF method with the larger search step reduced computation time to 782±294 s, still about 1,500 times slower than LCENN, and at the cost of significantly lower accuracy. The other deep learning methods showed lower performance on this dataset. PVNet, a keypoint-based method, struggled due to the lack of distinct keypoints in the texture-less liver model, achieving a TE of 2.07±0.63 mm and an RE of 2.11±0.79°. PoseCNN, which relies on texture, was also less effective, with a TE of 2.50±0.97 mm and an RE of 3.48±1.48°. LCENN outperformed these methods, combining high accuracy with efficient computation, demonstrating its suitability for texture-less models and challenging pose estimation tasks in surgical applications.
Table 1
| Methods | Metrics | Liver 1 | Liver 2 | Liver 3 | Liver 4 | Average |
|---|---|---|---|---|---|---|
| LCENN | TE (mm) | 0.48±0.34 | 0.44±0.24 | 0.64±0.15 | 0.47±0.49 | 0.51±0.31 |
| | RE (°) | 0.45±0.42 | 0.44±0.52 | 0.75±0.50 | 0.46±0.35 | 0.35±0.44 |
| | Time (s) | 0.59±0.31 | 0.45±0.38 | 0.49±0.27 | 0.53±0.33 | 0.51±0.32 |
| BF(a) | TE (mm) | 0.45±0.34 | 0.67±0.45 | 0.65±0.34 | 0.56±0.23 | 0.58±0.37 |
| | RE (°) | 0.39±0.18 | 0.56±0.55 | 0.49±0.23 | 0.55±0.42 | 0.50±0.29 |
| | Time (s) | 9,131±1,344 | 8,483±948 | 8,942±1,230 | 9,245±1,039 | 8,950±1,027 |
| BF(b) | TE (mm) | 1.25±0.53 | 1.18±0.43 | 1.37±0.34 | 1.28±0.56 | 1.27±0.48 |
| | RE (°) | 1.36±0.26 | 1.05±0.31 | 1.17±0.39 | 1.55±0.62 | 1.28±0.39 |
| | Time (s) | 823±492 | 785±292 | 843±122 | 679±434 | 782±294 |
| PoseCNN (36) | TE (mm) | 2.83±0.76 | 2.12±0.85 | 2.45±1.24 | 2.58±0.76 | 2.50±0.97 |
| | RE (°) | 2.95±1.26 | 3.45±1.37 | 3.97±1.48 | 3.55±1.75 | 3.48±1.48 |
| | Time (s) | 0.45±0.25 | 0.36±0.12 | 0.45±0.14 | 0.51±0.16 | 0.44±0.17 |
| PVNet (37) | TE (mm) | 2.14±0.51 | 2.02±0.45 | 1.75±0.74 | 2.38±0.58 | 2.07±0.63 |
| | RE (°) | 2.45±0.86 | 1.95±0.77 | 1.89±0.57 | 2.15±0.78 | 2.11±0.79 |
| | Time (s) | 0.69±0.21 | 0.67±0.32 | 0.57±0.45 | 0.57±0.34 | 0.63±0.38 |
Data are presented as mean ± standard deviation. BF(a) is the brute-force method with the smaller search step (1° for rotation and 1 mm for translation). BF(b) is the brute-force method with the larger search step (3° for rotation and 3 mm for translation). TE represents the mean translational error over three degrees of freedom, and RE represents the mean rotational error over three degrees of freedom. BF, brute-force; LCENN, long short-term memory-based camera estimation neural network; NRIVR, non-rigid image-volume registration.
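The runtime ratios implied by the average times in Table 1 can be checked with a few lines of arithmetic. The sketch uses only the mean values from the "Average" column and ignores the standard deviations.

```python
# Mean time cost in seconds, taken from the "Average" column of Table 1.
mean_time_s = {
    "LCENN": 0.51,
    "BF(a)": 8950.0,
    "BF(b)": 782.0,
    "PoseCNN": 0.44,
    "PVNet": 0.63,
}

# Ratio of each method's mean time to that of LCENN.
ratio_to_lcenn = {name: t / mean_time_s["LCENN"] for name, t in mean_time_s.items()}

for name, ratio in ratio_to_lcenn.items():
    print(f"{name}: {ratio:.1f}x the LCENN time")
```

The ratios show that both brute-force variants are several orders of magnitude slower than LCENN, while the two learning-based baselines run in comparable time.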
The simulations were carried out with various configurations, involving different levels of deformation and noise injected into the contour of the silhouette, as outlined in Table 2. Landmarks are utilized at varying levels, including boundary points, landmark points, and curves. Additionally, different levels of noise are introduced to the landmarks to simulate errors encountered during feature detection. Deformation is quantified using the initial Hausdorff distance between the original and the deformed object. These deformations are local, eliminating the need for initial alignment. A virtual camera captures images of the deformed liver. To assess performance under occlusion, a portion of the image is deliberately removed. Each group was validated with 10 different cases. In each case, five landmarks are selected both on the image and the 3D liver model.
Table 2
| Group | Deformation | Noise† | Landmarks | Number of cases |
|---|---|---|---|---|
| A1 | 1 mm | N (0.2, 0.5) | 5 | 10 |
| A2 | 1 mm | N (1.0, 0.5) | 5 | 10 |
| A3 | 1 mm | N (2.0, 0.5) | 5 | 10 |
| B1 | 3 mm | N (0.2, 0.5) | 5 | 10 |
| B2 | 3 mm | N (1.0, 0.5) | 5 | 10 |
| B3 | 3 mm | N (2.0, 0.5) | 5 | 10 |
| C1 | 10 mm | N (0.2, 0.5) | 5 | 10 |
| C2 | 10 mm | N (1.0, 0.5) | 5 | 10 |
| C3 | 10 mm | N (2.0, 0.5) | 5 | 10 |
†, data are presented as N (μ, σ).
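The two quantities that define the simulation groups, the Hausdorff distance and Gaussian landmark noise, can be sketched as below. How the noise is applied (here, independently to every coordinate) is an assumption, as are the landmark coordinates and the random seed.

```python
import numpy as np

def hausdorff(a, b):
    """Symmetric Hausdorff distance between point sets a (N, d) and b (M, d)."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))

original = np.array([[0.0, 0.0, 0.0], [100.0, 0.0, 0.0]])
shifted = original + np.array([3.0, 0.0, 0.0])      # a 3 mm displacement
deformation_level = hausdorff(original, shifted)

# Five hypothetical landmarks perturbed with N(mu=0.2, sigma=0.5) noise, as in group A1.
rng = np.random.default_rng(0)
landmarks = np.array([[0.0, 0.0], [20.0, 5.0], [40.0, 10.0], [60.0, 5.0], [80.0, 0.0]])
noisy_landmarks = landmarks + rng.normal(loc=0.2, scale=0.5, size=landmarks.shape)
```

Note that a non-zero mean adds a systematic offset to the landmarks in addition to the random scatter, so the groups with larger μ test robustness to biased detections as well.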
Simulation on 3D liver model
In this simulation, we evaluate our method using synthetic liver examples with sizes falling within a 180×150×150 mm³ oriented bounding box.
Given that we have access to the virtual model, including the extrinsic parameters of the camera and the shape of the deformed mesh, we validate our method using the mean squared distance (MSD) (39) in 2D space as the criterion for registration error.
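A minimal sketch of a mean squared distance between registered and ground-truth 2D points is given below. It assumes known point-to-point correspondences, which are available in simulation; the exact definition used in (39) may differ, and the function name is hypothetical.

```python
import numpy as np

def mean_squared_distance(registered_2d, ground_truth_2d):
    """Mean of squared Euclidean distances between corresponding 2D points."""
    diff = np.asarray(registered_2d, dtype=float) - np.asarray(ground_truth_2d, dtype=float)
    return float(np.mean(np.sum(diff ** 2, axis=1)))

registered = np.array([[0.0, 0.0], [1.0, 1.0]])
ground_truth = np.array([[3.0, 4.0], [1.0, 1.0]])
msd = mean_squared_distance(registered, ground_truth)
```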
The results, shown in Figure 7, demonstrate that the meshes can align with the liver in laparoscopic images, even when parts of the contour are obscured. As the completeness of the contour decreases, the MSD increases. Remarkably, even with only 40% of the contour visible, the mesh and image still align effectively. Figure 8 illustrates the MSD across different groups, with the average MSD being 2.74 mm within these 9 groups. As the level of deformation increases, the MSD also increases. Additionally, higher levels of noise in the landmarks negatively affect the registration accuracy.
Registration results on laparoscopic images
In this experiment, we used real clinical laparoscopic images to validate the performance of our approach. The input consists of a liver mesh paired with its corresponding laparoscopic images. A deformation exists between the mesh and the image, as the mesh is typically reconstructed from preoperative CT scans. The initial pose parameters for the registration are estimated by using anatomical landmarks identified on the liver surface. These landmarks provide sufficient information to compute an initial rigid transformation aligning the 3D liver model with the 2D laparoscopic image.
We conducted experiments on 10 cases, each with 200 different views of laparoscopic images. The laparoscopic images used for validation in the registration process were sourced from the DKFZ dataset (40). This dataset includes preoperative CT scans and laparoscopic images of the liver, providing a robust foundation for evaluating the alignment of preoperative models with intraoperative visuals. We tested 10 different DKFZ models to assess the method’s adaptability to different liver shapes. However, the model was not trained on this dataset. Instead, we used pre-trained weights, which are suitable for various liver shapes, and did not fine-tune the model for the DKFZ dataset. We used this dataset because it contains paired laparoscopic images and CT scans, which were critical for evaluating our method. The results of the 2D-3D registration on laparoscopic images are presented in Figure 9. The figure shows that, across the four cases, the point cloud aligns well with the laparoscopic images despite the presence of deformation. The alignment remains accurate even when parts of the liver are not visible, underscoring the robustness of our approach.
The mean MSD of NRIVR is 4.26 mm, based on 2,000 laparoscopic images. Figure 10A shows the alignment errors between the point cloud and the images. Figure 10B illustrates the effects of contour length and surface area on the registration accuracy. Since the actual contour length cannot be directly measured from the laparoscopic images, we use the known size of the mesh as a reference. After aligning the mesh with the image, we convert the pixel length to meters. This conversion enables us to accurately compare the contour length and surface area of the 3D mesh with the corresponding features in the laparoscopic images. Notably, as both the contour length and surface area increase, the registration error decreases, demonstrating the effectiveness of our approach in accurately aligning the preoperative model with the intraoperative visuals.
Figure 11 shows a registration result with and without considering the deformation field. Without incorporating deformation, rigid registration struggles to align the preoperative mesh with the deformed laparoscopic image. The average MSD for our method without accounting for the deformation field is 8.34 mm, roughly twice the 4.26 mm achieved with non-rigid registration.
Discussion
The results from the LCENN demonstrate that our method efficiently determines the camera pose using the contour of the silhouette. The differentiable nature of LCENN provides a significant advantage in non-rigid 2D-3D registration, facilitating both efficient and accurate alignment.
In validating LCENN, our results show that it can estimate the true camera pose in a very limited time. The LSTM-based network leverages sequential information, as the dataset was collected by gradually varying the parameters. Compared to conventional neural networks, LCENN benefits from its ability to capture and utilize temporal dependencies, leading to more effective training. Our method offers notable advantages over BF techniques, which involve extensive search steps and are incompatible with gradient descent methods like those used in (13). LCENN efficiently encapsulates shape information and performs structured dimensionality reduction on the feature vector, significantly improving localization performance and enabling effective use of contour data for camera pose estimation. The additional time required for generating virtual data and training the neural network is manageable when done preoperatively and after CT scanning. Even with incomplete contour data, LCENN can still identify the optimal camera pose, as the loss function is based on the Hausdorff distance between contours. However, a limitation is that symmetric models may pose challenges for accurate camera pose estimation due to multiple valid solutions. Fortunately, liver asymmetry in laparoscopic surgery generally mitigates this issue.
Simulation experiments provide ground truth for camera pose and 3D mesh, enabling accurate evaluation of our method’s performance. The results show that both deformation levels and landmark noise affect registration accuracy. Landmark noise impacts the optimization of E_l in Eq. (13). Our method exhibits robustness to increasing deformation levels and maintains consistent results across varying deformations, largely due to stable correspondences. For deformation between preoperative and intraoperative liver images, the optimization iteration in Eq. (13) starts with a rigid registration guess. Deformations introduce registration errors during optimization. Simulations indicate that incomplete contours lead to higher registration errors because the rank of the matrix in Eq. (12) decreases, increasing the likelihood of local optima and reducing the weighting in Eq. (13). Additionally, short contour lengths can cause registration failures.
In terms of computational performance, the LCENN-based camera pose estimation runs in approximately 0.5 seconds per frame on a standard GPU (NVIDIA RTX 3090). Since the input contour data are encoded into a fixed-size representation, the forward inference time remains constant, corresponding to a time complexity of O(1) with respect to the number of contour points. During non-rigid registration, the optimization process typically converges within 10 iterations, adding an average of 2–3 seconds of computation per frame. Thus, the overall runtime per frame is around 2.5–3.5 seconds, which meets near-real-time requirements for intraoperative guidance, where updates are typically needed at a slower pace than video frame rates. This performance highlights the practical viability of our method in surgical applications.
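The section does not describe how the contour is encoded into a fixed-size representation. One common way to obtain a fixed-length input, assumed here purely for illustration, is to resample the contour at evenly spaced arc-length positions. The function name `resample_contour` and the sample count of 64 are hypothetical.

```python
import numpy as np

def resample_contour(points, n_samples=64):
    """Resample an open 2D polyline to n_samples points evenly spaced by arc length."""
    points = np.asarray(points, dtype=float)
    # Cumulative arc length at every input vertex, starting at 0.
    seg = np.linalg.norm(np.diff(points, axis=0), axis=1)
    s = np.concatenate([[0.0], np.cumsum(seg)])
    # Target arc-length positions, then linear interpolation per coordinate.
    targets = np.linspace(0.0, s[-1], n_samples)
    x = np.interp(targets, s, points[:, 0])
    y = np.interp(targets, s, points[:, 1])
    return np.stack([x, y], axis=1)

# Two contours with very different point counts map to the same input shape.
short_contour = resample_contour([[0.0, 0.0], [10.0, 0.0]])
t = np.linspace(0.0, np.pi, 500)
long_contour = resample_contour(np.stack([np.cos(t), np.sin(t)], axis=1))
```

With such a step the resampling itself is linear in the number of contour points, but the network always receives an array of the same shape, which is consistent with a forward pass whose cost does not depend on contour length.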
A quantitative evaluation of our algorithm was conducted using a laparoscopic dataset that encompasses diverse cases, deformations, and realistic liver poses. Our method shows promise for AR applications in laparoscopic liver surgery across all four cases. Sources of error include contour length, natural organ deformation, and landmark inaccuracies, as observed in simulations. Errors from liver segmentation also contribute, particularly when parts of the liver are occluded by other organs, leading to partial contours that may not accurately represent the true shape due to the viewing angle. We also assessed the impact of the deformation field, noting its significant influence on registration error, especially in the presence of large deformations.
Our method assumes known camera intrinsics. In practice, camera calibration typically requires only one to two minutes and can be performed during surgery. Even if the focal length is adjusted during the procedure, the calibration can be updated without significant disruption. No lengthy intraoperative training is therefore needed, and the calibrated camera parameters can be obtained quickly, so the method remains feasible in a real surgical environment.
While our method demonstrates robust performance on liver models under various deformations and occlusions, certain limitations remain. First, generalization to other organs with different geometries, such as kidneys or spleen, may require retraining or adaptation of the network, as organ-specific shape characteristics could affect the contour-based registration. Second, although silhouette contours are generally resilient to lighting changes, extreme lighting variations or shadows may impact contour extraction quality and subsequently registration accuracy. Third, heavy occlusions, especially when large portions of the organ are missing from view, can degrade performance due to reduced available features for matching. Future work will focus on enhancing robustness to these challenges, such as integrating multi-view information and domain adaptation techniques to better handle different organs and imaging conditions.
Conclusions
We propose an NRIVR method for aligning a 3D liver mesh, derived from a preoperative CT scan, with the surface reconstruction from an intraoperative laparoscopic video feed. We have validated its performance on synthetic data, a phantom dataset, and retrospective clinical data, addressing the challenges associated with laparoscopic surfaces. Our results suggest that this method could provide an automatic solution for achieving accurate initial alignment between the two surfaces, given the appropriate features. Additionally, it does not require advanced hardware, making it accessible and relatively straightforward to implement in a clinical setting. Future work will focus on integrating the deformation field with the LCENN to enhance the method further.
Acknowledgments
None.
Footnote
Data Sharing Statement: Available at https://qims.amegroups.com/article/view/10.21037/qims-2025-387/dss
Funding: This work was supported by the National Natural Science Foundation of China (Nos. 62473258, W2521022, 62133009, 62211540723); National Key R&D Program of China (Nos. 2024YFB4708802, 2024YFE0198200); Laboratory Open Fund of Key Technology and Materials in Minimally Invasive Spine Surgery (No. 2024JZWC-ZDA03); and The Fundamental Research Funds for the Central Universities (Nos. YG2023ZD05, YG2023ZD14); the Project of Shanghai Key Laboratory of Flexible Medical Robotics; and the Project of Shanghai Jiao Tong University Medical Robotics Institute Research.
Conflicts of Interest: All authors have completed the ICMJE uniform disclosure form (available at https://qims.amegroups.com/article/view/10.21037/qims-2025-387/coif). The authors have no conflicts of interest to declare.
Ethical Statement: The authors are accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/.
References
- Giannone F, Felli E, Cherkaoui Z, Mascagni P, Pessaux P. Augmented Reality and Image-Guided Robotic Liver Surgery. Cancers (Basel) 2021.
- Gholizadeh M, Bakhshali MA, Mazlooman SR, Aliakbarian M, Gholizadeh F, Eslami S, Modrzejewski A. Minimally invasive and invasive liver surgery based on augmented reality training: a review of the literature. J Robot Surg 2023;17:753-63. [Crossref] [PubMed]
- Golse N, Petit A, Lewin M, Vibert E, Cotin S. Augmented Reality during Open Liver Surgery Using a Markerless Non-rigid Registration System. J Gastrointest Surg 2021;25:662-71. [Crossref] [PubMed]
- Luo H, Yin D, Zhang S, Xiao D, He B, Meng F, Zhang Y, Cai W, He S, Zhang W, Hu Q, Guo H, Liang S, Zhou S, Liu S, Sun L, Guo X, Fang C, Liu L, Jia F. Augmented reality navigation for liver resection with a stereoscopic laparoscope. Comput Methods Programs Biomed 2020;187:105099. [Crossref] [PubMed]
- Ramalhinho J, Yoo S, Dowrick T, Koo B, Somasundaram M, Gurusamy K, Hawkes DJ, Davidson B, Blandford A, Clarkson MJ. The value of Augmented Reality in surgery - A usability study on laparoscopic liver surgery. Med Image Anal 2023;90:102943. [Crossref] [PubMed]
- Jud L, Fotouhi J, Andronic O, Aichmair A, Osgood G, Navab N, Farshad M. Applicability of augmented reality in orthopedic surgery - A systematic review. BMC Musculoskelet Disord 2020;21:103. [Crossref] [PubMed]
- Sun Q, Mai Y, Yang R, Ji T, Jiang X, Chen X. Fast and accurate online calibration of optical see-through head-mounted display for AR-based surgical navigation using Microsoft HoloLens. Int J Comput Assist Radiol Surg 2020;15:1907-19. [Crossref] [PubMed]
- Vijayan S, Reinertsen I, Hofstad EF, Rethy A, Hernes TA, Langø T. Liver deformation in an animal model due to pneumoperitoneum assessed by a vessel-based deformable registration. Minim Invasive Ther Allied Technol 2014;23:279-86. [Crossref] [PubMed]
- Wise PA, Preukschas AA, Özmen E, Bellemann N, Norajitra T, Sommer CM, Stock C, Mehrabi A, Müller-Stich BP, Kenngott HG, Nickel F. Intraoperative liver deformation and organ motion caused by ventilation, laparotomy, and pneumoperitoneum in a porcine model for image-guided liver surgery. Surg Endosc 2024;38:1379-89. [Crossref] [PubMed]
- Johnsen SF, Thompson S, Clarkson MJ, Modat M, Song Y, Totz J, Gurusamy K, Davidson B, Taylor ZA, Hawkes DJ, Ourselin S. Database-based estimation of liver deformation under pneumoperitoneum for surgical image-guidance and simulation. Medical Image Computing and Computer-Assisted Intervention--MICCAI 2015: 18th International Conference. 2015:450-8.
- Deng B, Yao Y, Dyke RM, Zhang J. A survey of non-rigid 3d registration. Comput Graph Forum 2022;41:559-89.
- Hu J, Jones D, Dogar MR, Valdastri P. Occlusion-robust autonomous robotic manipulation of human soft tissues with 3-D surface feedback. IEEE Trans Robot 2024;40:624-38.
- Hu J, Liu J, Guo Y, Cao Z, Chen X, Zhang C. A collaborative robotic platform for sensor-aware fibula osteotomies in mandibular reconstruction surgery. Comput Biol Med 2023;162:107040. [Crossref] [PubMed]
- Yang Z, Simon R, Linte CA. Learning feature descriptors for pre- and intra-operative point cloud matching for laparoscopic liver registration. Int J Comput Assist Radiol Surg 2023;18:1025-32. [Crossref] [PubMed]
- Zhang Y, Zou Y, Liu PX. Point Cloud Registration in Laparoscopic Liver Surgery Using Keypoint Correspondence Registration Network. IEEE Trans Med Imaging 2025;44:749-60. [Crossref] [PubMed]
- Labrunie M, Ribeiro M, Mourthadhoi F, Tilmant C, Le Roy B, Buc E, Bartoli A. Automatic preoperative 3d model registration in laparoscopic liver resection. Int J Comput Assist Radiol Surg 2022;17:1429-36. [Crossref] [PubMed]
- Koo B, Robu MR, Allam M, Pfeiffer M, Thompson S, Gurusamy K, Davidson B, Speidel S, Hawkes D, Stoyanov D, Clarkson MJ. Automatic, global registration in laparoscopic liver surgery. Int J Comput Assist Radiol Surg 2022;17:167-76. [Crossref] [PubMed]
- Robu MR, Ramalhinho J, Thompson S, Gurusamy K, Davidson B, Hawkes D, Stoyanov D, Clarkson MJ. Global rigid registration of CT to video in laparoscopic liver surgery. Int J Comput Assist Radiol Surg 2018;13:947-56. [Crossref] [PubMed]
- Pei J, Cui R, Li Y, Si W, Qin J, Heng PA. Depth-Driven Geometric Prompt Learning for Laparoscopic Liver Landmark Detection. Proceedings of Medical Image Computing and Computer Assisted Intervention – MICCAI 2024. doi: 10.1007/978-3-031-72089-5_15
- Plantefève R, Haouchine N, Radoux JP, Cotin S. Automatic alignment of pre and intraoperative data using anatomical landmarks for augmented laparoscopic liver surgery. In: Bello F, Cotin S, editors. Biomedical Simulation. ISBMS 2014. 2014:58-66.
- Espinel Y, Özgür E, Calvet L, Le Roy B, Buc E, Bartoli A. Combining Visual Cues with Interactions for 3D-2D Registration in Liver Laparoscopy. Ann Biomed Eng 2020;48:1712-27. [Crossref] [PubMed]
- Clements LW, Chapman WC, Dawant BM, Galloway RL Jr, Miga MI. Robust surface registration using salient anatomical features for image-guided liver surgery: algorithm and validation. Med Phys 2008;35:2528-40. [Crossref] [PubMed]
- Koo B, Özgür E, Le Roy B, Buc E, Bartoli A. Deformable Registration of a Preoperative 3D Liver Volume to a Laparoscopy Image Using Contour and Shading Cues. Medical Image Computing and Computer Assisted Intervention − MICCAI 2017;2017:326-34.
- Espinel Y, Calvet L, Botros K, Buc E, Tilmant C, Bartoli A. Using multiple images and contours for deformable 3D-2D registration of a preoperative CT in laparoscopic liver surgery. Int J Comput Assist Radiol Surg 2022;17:2211-9. [Crossref] [PubMed]
- Schmidt A, Mohareri O, DiMaio S, Yip MC, Salcudean SE. Tracking and mapping in medical computer vision: A review. Med Image Anal 2024;94:103131. [Crossref] [PubMed]
- Adagolodjo Y, Trivisonne R, Haouchine N, Cotin S, Courtecuisse H. Silhouette-based pose estimation for deformable organs application to surgical augmented reality. 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). 2017:539-44.
- Özgür E, Koo B, Le Roy B, Buc E, Bartoli A. Preoperative liver registration for augmented monocular laparoscopy using backward-forward biomechanical simulation. Int J Comput Assist Radiol Surg 2018;13:1629-40. [Crossref] [PubMed]
- Nakao M, Nakamura M, Matsuda T. Image-to-Graph Convolutional Network for 2D/3D Deformable Model Registration of Low-Contrast Organs. IEEE Trans Med Imaging 2022;41:3747-61. [Crossref] [PubMed]
- Zhao Q, Chou CR, Mageras G, Pizer S. Local metric learning in 2D/3D deformable registration with application in the abdomen. IEEE Trans Med Imaging 2014;33:1592-600. [Crossref] [PubMed]
- Pfeiffer M, Riediger C, Leger S, Kühn JP, Seppelt D, Hoffmann RT, Weitz J, Speidel S. Non-rigid volume to surface registration using a data-driven biomechanical model. Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference 2020. 2020:724-34.
- Mhiri I, Pizarro D, Bartoli A. Neural patient-specific 3D-2D registration in laparoscopic liver resection. Int J Comput Assist Radiol Surg 2025;20:57-64. [Crossref] [PubMed]
- Labrunie M, Pizarro D, Tilmant C, Bartoli A. Automatic 3D/2D deformable registration in minimally invasive liver resection using a mesh recovery network. Medical Imaging with Deep Learning 2023;227:1104-23.
- Walch F, Hazirbas C, Leal-Taixe L, Sattler T, Hilsenbeck S, Cremers D. Image-based localization using lstms for structured feature correlation. Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV) 2017:627-37.
- Zhang X, Zhao Y, Guo K, Li G, Deng N. An Adaptive B-Spline Neural Network and Its Application in Terminal Sliding Mode Control for a Mobile Satcom Antenna Inertially Stabilized Platform. Sensors (Basel) 2017;17:978. [Crossref] [PubMed]
- Amberg B, Romdhani S, Vetter T. Optimal Step Nonrigid ICP Algorithms for Surface Registration. 2007 IEEE Conference on Computer Vision and Pattern Recognition; 2007:1-8.
- Xiang Y, Schmidt T, Narayanan V, Fox D. PoseCNN: A convolutional neural network for 6D object pose estimation in cluttered scenes. arXiv:1711.00199.
- Peng S, Liu Y, Huang Q, Zhou X, Bao H. PVNet: Pixel-Wise Voting Network for 6DoF Pose Estimation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); 2019:4561-70.
- Hodaň T, Haluza P, Obdržálek Š, Matas J, Lourakis M, Zabulis X. T-LESS: An RGB-D Dataset for 6D Pose Estimation of Texture-less Objects. IEEE Winter Conference on Applications of Computer Vision (WACV); 2017:880-8.
- Li M, Kambhamettu C, Stone M. Automatic contour tracking in ultrasound images. Clin Linguist Phon 2005;19:545-54. [Crossref] [PubMed]
- Pfeiffer M, Funke I, Robu MR, Bodenstedt S, Strenger L, Engelhardt S, Roß T, Clarkson MJ, Gurusamy K, Davidson BR, Maier-Hein L, Riediger C, Welsch T, Weitz J, Speidel S. Generating Large Labeled Data Sets for Laparoscopic Image Processing Tasks Using Unpaired Image-to-Image Translation. Medical Image Computing and Computer Assisted Intervention – MICCAI 2019;119-27.




