Last updated: 05/10/2023 06:42PM
This project intends to evaluate the accuracy of an existing image-based 3D reconstruction pipeline of the sinus anatomy by implementing a framework for global and local registration to the ground-truth CT scan. The initial evaluation of the pipeline will then serve as a baseline for subsequent changes to investigate the influence of depth uncertainties and integrate robot kinematics.
Sample Dense Reconstruction and CT Scan

Nasal obstructions are a common clinical problem that can significantly impact patients' quality of life. Septoplasty and turbinate reduction are two of the most common surgical interventions used to address this problem. Clinical studies have provided evidence that these interventions improve patients' quality of life. However, these assessments rely on subjective measures such as patient-reported symptoms, and there is limited quantitative evidence to support the effectiveness of these surgeries. This creates a need for objective evaluation methods that inform clinicians about which patients are most likely to benefit from surgical intervention and that provide tools for longitudinal assessment of surgical outcomes.
Quantitative evaluation of the patient's sinus anatomy requires geometric information about the nasal cavity. Computed tomography (CT) scans provide this information; however, these scans are expensive and expose patients to harmful radiation. This motivates an image-centered approach that leverages the routine endoscopy procedures physicians already use to examine patients. An accurate 3D model of the sinus anatomy is required to retrieve clinically relevant parameters, such as aperture and volume, which can be used for quantitative assessment of patient anatomy.
Liu et al. developed a pipeline to generate a dense 3D reconstruction of the sinus anatomy from endoscopic video, as shown in Figure 1.
Figure 1. Dense reconstruction pipeline adapted from X. Liu et al., “Reconstructing sinus anatomy from endoscopic video–towards a radiation-free approach for quantitative longitudinal assessment,” MICCAI 2020.
The pipeline uses a Structure from Motion (SfM) algorithm, which matches points that are visible in multiple input images to recover the 3D structure of the imaged object. The points are matched using dense feature descriptors extracted by a deep learning model. These dense descriptors are integrated into SfM to produce a dense point cloud of the anatomy and the camera trajectory for each frame in the input sequence. The sequence is also used to estimate the depth of each image from the camera using a deep learning model. The depth estimates, point cloud, and camera trajectories are then used in a depth fusion method to produce a 3D reconstruction of the sinus cavity. Currently, there is no framework for assessing the correctness of the image-based 3D reconstruction.
The main goal of this project is to implement a quantitative framework to evaluate the accuracy of the dense reconstruction based on the ground truth CT. The development of this assessment and framework would enable further research towards the usage of the sinus reconstruction pipeline in clinical settings. The specific aims of this project are listed as follows:
This project requires input data of endoscopic video sequences and CT scans of the same sinus anatomy. The data utilized for the maximum goal requires sequences obtained using the Galen robot to retrieve corresponding robot kinematics at the time of video capture. The endoscopic sequence for the dense reconstruction pipeline and the resulting 3D structure will be used for the registration with the corresponding CT scan to report an accuracy evaluation.
The input data must be pre-processed for use in this project. The endoscopic video and robot kinematics are collected using a Robot Operating System (ROS) based platform and need to be extracted for use as input to the dense reconstruction pipeline. The image frames of the video sequence will be used in the pipeline to generate the 3D reconstruction. The CT scans will also need to be segmented using 3D Slicer. The 3D reconstruction and segmented CT will then be used for the rigid registration.
The general approach is to first implement the registration framework and evaluation methods, analyze uncertainties in the dense reconstruction, and finally integrate robot kinematics in the registration. The proposed workflow of the project is shown in Figure 2.
Figure 2. Proposed workflow of planned modifications and implementations shown in blue.
Based on initial registration results and changes in the deliverables, the final registration workflow is shown in Figure 3.
Figure 3. Final registration workflow.
Registration depends on three inputs: the output of the dense reconstruction (DRECO) pipeline (processed image sequence, estimated camera trajectories from SfM, and fused mesh of the sinus anatomy), the ground-truth anatomical structure of the sinus from the corresponding CT scan, and optical tracking data of the endoscope camera. The collected video sequence was preprocessed for use in the DRECO pipeline by curating the images to isolate the subsequence of frames that capture the sinus cavity. The input image frames were also downsampled and undistorted based on a checkerboard camera calibration. The preprocessed images were then used as input to the DRECO pipeline. The CT scans were processed using 3D Slicer to segment both the attached marker spheres and the sinus anatomy. Instructions for segmentation can be found in the project's GitHub repository.
I plan to integrate rigid registration methods, including the iterative closest point (ICP) algorithm and the iterative most likely point (IMLP) algorithm, to register the dense reconstruction to the corresponding CT image. The IMLP algorithm and its variations will be integrated using the cisstICP library available on GitHub. These methods will be used to perform both local and global registrations. At the global scale, the entire reconstruction will be used for the registration. To perform local registration, I plan to implement methods that isolate specific anatomical regions of interest by selecting a subset of frames from the input video sequence, so that only a portion of the sinus anatomy is reconstructed. The resulting reconstruction will then be registered locally to the CT.
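To make the ICP step concrete, here is a minimal point-to-point ICP sketch in NumPy. It is an illustration only, not the cisstICP implementation: the function names `best_rigid_transform` and `icp`, the brute-force nearest-neighbour search, and the synthetic point clouds are all assumptions chosen to keep the example self-contained.

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping paired src onto dst."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = dst_c - R @ src_c
    return R, t

def icp(src, dst, iters=50, tol=1e-8):
    """Point-to-point ICP with brute-force matching (suitable for small clouds)."""
    R_total, t_total = np.eye(3), np.zeros(3)
    moved = src.copy()
    prev_err = np.inf
    for _ in range(iters):
        # Pair every moved source point with its closest destination point.
        d2 = ((moved[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
        nearest = dst[d2.argmin(axis=1)]
        R, t = best_rigid_transform(moved, nearest)
        moved = moved @ R.T + t
        # Accumulate the incremental update into the overall transform.
        R_total, t_total = R @ R_total, R @ t_total + t
        err = np.sqrt(d2.min(axis=1)).mean()  # mean distance before this update
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return R_total, t_total, moved

# Synthetic demo (hypothetical data): recover a small, known misalignment.
rng = np.random.default_rng(0)
ct_points = rng.uniform(-0.5, 0.5, size=(200, 3))
angle = np.deg2rad(1.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
t_true = np.array([0.005, -0.005, 0.005])
recon_points = (ct_points - t_true) @ R_true  # R_true @ p + t_true gives the CT point
R_est, t_est, aligned = icp(recon_points, ct_points)
```

ICP only converges to the correct alignment from a good starting pose, which is why an initial transform matters for partial reconstructions.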
Additionally, I plan to develop methods to report evaluation metrics and visualizations for both quantitative and qualitative assessment of the registration. This will include a summary of the error magnitude between the projected points of the reconstruction and the ground-truth CT points, as well as an overlay of the registered reconstruction and CT. I plan to color the points in the overlay by error so that variation in error magnitude is easier to compare. This framework will allow me to produce a baseline evaluation of the accuracy of the 3D sinus reconstruction to use as a point of comparison for subsequent changes.
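One simple way to summarize the error magnitude is the distance from each registered reconstruction point to its closest CT point. The sketch below assumes both surfaces are available as sampled point arrays; the function name `residual_summary` and the choice of statistics are illustrative, not part of the project code.

```python
import numpy as np

def residual_summary(recon_pts, ct_pts):
    """Closest-point distance from each reconstruction point to the CT points."""
    d2 = ((recon_pts[:, None, :] - ct_pts[None, :, :]) ** 2).sum(axis=2)
    dist = np.sqrt(d2.min(axis=1))
    summary = {
        "mean": float(dist.mean()),
        "median": float(np.median(dist)),
        "rms": float(np.sqrt((dist ** 2).mean())),
        "p95": float(np.percentile(dist, 95)),
        "max": float(dist.max()),
    }
    return summary, dist

# Hypothetical data: a flat 5 x 5 CT grid and a reconstruction offset by 0.5.
grid = np.stack(np.meshgrid(np.arange(5.0), np.arange(5.0), indexing="ij"),
                axis=-1).reshape(-1, 2)
ct_points = np.column_stack([grid, np.zeros(len(grid))])
recon_points = ct_points + np.array([0.0, 0.0, 0.5])
summary, per_point = residual_summary(recon_points, ct_points)
```

The per-point distances returned alongside the summary can be mapped to colors for the overlay visualization.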
Direct rigid registration of point clouds sampled from the DRECO and CT meshes fails because the dense reconstruction covers only a portion of the segmented anatomy present in the CT mesh. Therefore, this framework required additional registration options, including camera pose, keypoint, and Coherent Point Drift registration, to align the meshes.
The segmented anatomy marker spheres are required to obtain the ground-truth positions of the endoscope camera in CT space. Based on the segmentation, the centers of these spheres are extracted using a sphere-fitting algorithm and used to register the CT to the tracked marker geometry of the anatomy recorded by the NDI Polaris tracker. A checkerboard hand-eye calibration was also performed to determine the transformation between the endoscope camera and the endoscope marker sphere geometry. These transformations, along with the tracked endoscope pose, can then be used to compute the ground-truth position of the camera in CT space.
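The text does not say which sphere-fitting algorithm was used. As one possibility, an algebraic least-squares fit recovers the center and radius from surface points of a segmented marker; the function name `fit_sphere` and the synthetic sphere below are assumptions for illustration.

```python
import numpy as np

def fit_sphere(points):
    """Algebraic least-squares sphere fit: |p|^2 = 2 c.p + (r^2 - |c|^2)."""
    A = np.hstack([2.0 * points, np.ones((len(points), 1))])
    b = (points ** 2).sum(axis=1)
    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    center = x[:3]
    radius = np.sqrt(x[3] + center @ center)
    return center, radius

# Hypothetical marker: points sampled on a sphere of radius 6 at a known center.
rng = np.random.default_rng(1)
directions = rng.normal(size=(500, 3))
directions /= np.linalg.norm(directions, axis=1, keepdims=True)
true_center, true_radius = np.array([10.0, -4.0, 2.5]), 6.0
surface = true_center + true_radius * directions
center, radius = fit_sphere(surface)
```

The fitted centers of all marker spheres could then serve as paired points for registering the CT to the tracked marker geometry.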
The tracked positions were also manually adjusted by comparing the recorded images and CT renderings when large errors were observed (camera position was outside of the anatomy). This alignment transformation was applied at the beginning of the chain in Equation 1.
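Equation 1 is not reproduced in this text, so the following is only a sketch of one plausible transform chain consistent with the description. The frame names, the function `camera_in_ct`, and the choice to left-multiply the manual adjustment (that is, to apply it in CT space) are all assumptions.

```python
import numpy as np

def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

def camera_in_ct(T_ct_from_anat, T_tracker_from_anat,
                 T_tracker_from_scope, T_scope_from_cam, T_adjust=None):
    """Compose camera -> scope marker -> tracker -> anatomy marker -> CT."""
    T = (T_ct_from_anat @ np.linalg.inv(T_tracker_from_anat)
         @ T_tracker_from_scope @ T_scope_from_cam)
    if T_adjust is not None:
        T = T_adjust @ T  # assumed placement of the manual alignment
    return T

# Hypothetical pure-translation example so the result is easy to check by hand.
I3 = np.eye(3)
T_cam = camera_in_ct(
    make_transform(I3, [1.0, 0.0, 0.0]),   # CT from anatomy marker (sphere registration)
    make_transform(I3, [0.0, 2.0, 0.0]),   # tracker from anatomy marker
    make_transform(I3, [0.0, 2.0, 3.0]),   # tracker from endoscope marker
    make_transform(I3, [0.0, 0.0, 0.5]),   # endoscope marker from camera (hand-eye)
)
```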
The DRECO pipeline estimates the camera trajectory for each input image frame in the SfM algorithm, which can be matched to the corresponding ground-truth position of the endoscope for rigid registration. The resulting transformation was then used to transform the DRECO mesh to CT space and to initialize the iterative closest point algorithm for Video-CT registration.
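Because each estimated camera position has a known ground-truth counterpart, the alignment can be solved in closed form from the paired positions. The sketch below uses the Umeyama method and optionally estimates a scale factor, since monocular SfM trajectories are generally defined only up to scale. Whether the project estimated scale at this step is not stated, and the function name `similarity_from_pairs` and the synthetic trajectory are assumptions.

```python
import numpy as np

def similarity_from_pairs(src, dst, with_scale=True):
    """Closed-form fit of dst ~ scale * R @ src + t from paired 3D points."""
    n = len(src)
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - mu_s, dst - mu_d
    cov = dst_c.T @ src_c / n
    U, S, Vt = np.linalg.svd(cov)
    sign = np.ones(3)
    if np.linalg.det(U) * np.linalg.det(Vt) < 0:
        sign[-1] = -1.0  # avoid returning a reflection
    R = U @ np.diag(sign) @ Vt
    var_src = (src_c ** 2).sum() / n
    scale = (S * sign).sum() / var_src if with_scale else 1.0
    t = mu_d - scale * R @ mu_s
    return scale, R, t

# Hypothetical trajectory: SfM camera centers and their tracked counterparts.
rng = np.random.default_rng(3)
sfm_centers = rng.normal(size=(50, 3))
angle = np.deg2rad(30.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
s_true, t_true = 2.5, np.array([4.0, -1.0, 7.0])
tracked_centers = s_true * sfm_centers @ R_true.T + t_true
scale, R_est, t_est = similarity_from_pairs(sfm_centers, tracked_centers)
```

Applying the estimated transform to the mesh vertices would place the reconstruction in CT space as a starting pose for ICP.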
The estimated camera trajectories were observed to have large variations compared to the ground-truth, reducing the accuracy of the overall registration. Due to these errors, local reconstructions of the sinus anatomy were also evaluated based on a section of the original input sequence as shown in Figure 4.
Figure 4. Camera pose registration.
In addition to rigid registration methods, the Coherent Point Drift (CPD) registration algorithm was also investigated. CPD is a probabilistic method that was integrated for both rigid and affine point cloud registration between points sampled from the dense reconstruction and CT meshes. This method optimizes the registration based on the most likely shape of the DRECO mesh within the CT structure, taking into account that the computed mesh represents only the section of the sinus anatomy visible in the input video.
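For intuition about how CPD works, the following is a compact sketch of a rigid CPD loop without scale estimation. The moving points act as Gaussian mixture centroids and an extra uniform component with weight `w` absorbs unmatched points. This is a simplified illustration rather than the implementation used in the project; the function name `cpd_rigid`, the parameter defaults, and the synthetic data are assumptions.

```python
import numpy as np

def cpd_rigid(X, Y, w=0.1, iters=150, tol=1e-10):
    """Rigidly align moving points Y (M x D) to fixed points X (N x D)."""
    N, D = X.shape
    M = Y.shape[0]
    R, t = np.eye(D), np.zeros(D)
    # Initial isotropic variance: mean squared distance over all point pairs.
    sigma2 = ((X[None, :, :] - Y[:, None, :]) ** 2).sum() / (D * N * M)
    for _ in range(iters):
        TY = Y @ R.T + t
        # E-step: P[m, n] is the posterior that X[n] belongs to centroid TY[m].
        d2 = ((X[None, :, :] - TY[:, None, :]) ** 2).sum(axis=2)
        num = np.exp(-d2 / (2.0 * sigma2))
        c = (2.0 * np.pi * sigma2) ** (D / 2.0) * w / (1.0 - w) * M / N
        P = num / (num.sum(axis=0, keepdims=True) + c)
        # M-step: weighted Procrustes solve for rotation and translation.
        Np = P.sum()
        Pt1, P1 = P.sum(axis=0), P.sum(axis=1)
        mu_x, mu_y = Pt1 @ X / Np, P1 @ Y / Np
        Xc, Yc = X - mu_x, Y - mu_y
        A = Xc.T @ P.T @ Yc
        U, _, Vt = np.linalg.svd(A)
        C = np.eye(D)
        C[-1, -1] = np.linalg.det(U @ Vt)  # keep det(R) = +1
        R = U @ C @ Vt
        t = mu_x - R @ mu_y
        sigma2 = (Pt1 @ (Xc ** 2).sum(axis=1) - 2.0 * np.trace(A.T @ R)
                  + P1 @ (Yc ** 2).sum(axis=1)) / (Np * D)
        if sigma2 < tol:
            break
    return R, t

# Hypothetical data: an anisotropic cloud and a slightly rotated, shifted copy.
rng = np.random.default_rng(2)
fixed = rng.uniform(-0.5, 0.5, size=(80, 3)) * np.array([1.0, 0.6, 0.3])
angle = np.deg2rad(5.0)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
moving = (fixed - np.array([0.10, -0.06, 0.04])) @ R_true
R_est, t_est = cpd_rigid(fixed, moving)
before = np.linalg.norm(moving - fixed, axis=1).mean()
after = np.linalg.norm(moving @ R_est.T + t_est - fixed, axis=1).mean()
```

Unlike ICP, each moving point is softly assigned to all fixed points, which makes the method less sensitive to the initial pose but does not guarantee anatomically correct correspondences.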
Table 1 displays the mean errors of the various registration algorithms. Rigid camera pose registration using ICP had the lowest translational error, ranging from 1 to 3 mm, whereas both Coherent Point Drift algorithms had translational errors on the order of centimeters for the transformed camera poses. All three registration results have significant rotational error, ranging from 12 to 21 degrees. These rotational errors are the same for each registration type because only the camera position was transformed.
Table 1. Comparison of various registration types using the entire image sequence (indexes 0 - 1059) and multiple sections for local registration reported as the mean across poses, pixels, and sampled points for camera pose, scale invariant depth, and mesh distance errors, respectively.
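The text does not define how the pose errors were computed. A common convention, assumed here, is the Euclidean distance between camera positions for the translational error and the angle of the relative rotation for the rotational error. The function name `pose_errors` and the example poses are illustrative.

```python
import numpy as np

def pose_errors(T_est, T_gt):
    """Translational error (input units) and rotational error (degrees)."""
    trans_err = np.linalg.norm(T_est[:3, 3] - T_gt[:3, 3])
    R_delta = T_est[:3, :3].T @ T_gt[:3, :3]
    # Angle of the relative rotation; clip guards against rounding outside [-1, 1].
    cos_angle = np.clip((np.trace(R_delta) - 1.0) / 2.0, -1.0, 1.0)
    return trans_err, np.degrees(np.arccos(cos_angle))

# Hypothetical poses: the estimate is off by 15 degrees about z and 5 units.
angle = np.deg2rad(15.0)
T_gt = np.eye(4)
T_est = np.eye(4)
T_est[:3, :3] = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                          [np.sin(angle),  np.cos(angle), 0.0],
                          [0.0, 0.0, 1.0]])
T_est[:3, 3] = [3.0, 4.0, 0.0]
trans_err, rot_err_deg = pose_errors(T_est, T_gt)
```

Averaging these two values over all frames in a sequence would give per-sequence means of the kind reported in a table.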
The CPD rigid and affine registration algorithms have lower mean distances between meshes. However, visual inspection of the layered meshes and depth renderings suggests that this does not mean the anatomy is better aligned. The smaller magnitudes may be a result of the intricate sinus anatomy, as the closest points between the meshes do not necessarily correspond to the same points within the sinus cavity.
It is expected that the camera pose + ICP registrations have lower pose errors and the CPD registrations have lower mesh errors because these algorithms compute the transformation by minimizing those quantities. The scale invariant depth error serves as a metric independent of the registration. Since the camera pose + ICP registration type had the smallest error in the depth renderings, this algorithm was used to further investigate adjustments to the depth fusion step in the dense reconstruction pipeline.
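The exact formula for the scale invariant depth error is not given here. One common log-space definition, assumed in this sketch, is the variance of the per-pixel log-depth difference, which is zero whenever two depth maps differ only by a global scale factor. The function name and the synthetic depth maps are illustrative.

```python
import numpy as np

def scale_invariant_depth_error(pred, target, mask=None):
    """Variance of log-depth differences over valid (positive) pixels."""
    valid = (pred > 0) & (target > 0)
    if mask is not None:
        valid &= mask
    d = np.log(pred[valid]) - np.log(target[valid])
    return float(np.mean(d ** 2) - np.mean(d) ** 2)

# Hypothetical depth maps: a pure rescaling versus a local deviation.
rng = np.random.default_rng(4)
target = rng.uniform(5.0, 50.0, size=(4, 4))
err_rescaled = scale_invariant_depth_error(3.0 * target, target)
perturbed = target.copy()
perturbed[0, 0] *= 2.0
err_perturbed = scale_invariant_depth_error(perturbed, target)
```

A metric of this form is convenient for monocular reconstructions because their global scale is not fixed by the images alone.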
The dense reconstruction pipeline uses depth estimates in addition to the SfM point cloud and camera trajectories to generate the 3D structure. This information is integrated into a fusion method that resolves variation between the estimates of common points in multiple frames of the input sequence. The fusion method currently weights every estimate equally; however, points in the sinus anatomy that are farther from the camera when the image is captured are shown to have more uncertainty, as seen in Figure 5.
Figure 5. Heat map of mean and standard deviation of depth estimates with corresponding input image. Regions with larger mean depth estimates, meaning farther away from the camera at capture, exhibit higher uncertainty. Adapted from X. Liu et al., “Dense depth estimation in monocular endoscopy with self-supervised learning methods,” IEEE transactions on medical imaging, 2019.
We hypothesize that this uncertainty may be introducing errors that propagate into the reconstruction. To investigate further, I generated reconstructions using inverse weighting based on uncertainty to reduce the influence of uncertain estimates on the final depth values. I also removed outliers outside of the 68th percentile (one standard deviation) of uncertainty and mean depth estimations. These alternate reconstructions are shown in Figure 6 and were also evaluated using the implemented registration framework.
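The two adjustment schemes can be illustrated on plain arrays. This is not the pipeline's fusion code: the function names, the `1 / sigma^2` weighting, and reading the 68th percentile as an upper cutoff are assumptions made for this sketch.

```python
import numpy as np

def fuse_inverse_variance(depths, sigmas, eps=1e-12):
    """Fuse K estimates per point (arrays of shape K x P) with weights 1 / sigma^2."""
    weights = 1.0 / (sigmas ** 2 + eps)
    return (weights * depths).sum(axis=0) / weights.sum(axis=0)

def keep_within_percentile(values, pct=68.0):
    """Boolean mask of entries at or below the given percentile of `values`."""
    return values <= np.percentile(values, pct)

# Hypothetical example: two frames observe two points; the second frame's
# estimate of the first point is less certain and so contributes less.
depths = np.array([[10.0, 20.0], [14.0, 20.0]])
sigmas = np.array([[1.0, 1.0], [2.0, 1.0]])
fused = fuse_inverse_variance(depths, sigmas)

# Hypothetical per-pixel statistics: keep pixels that pass both cutoffs.
uncertainty = np.arange(1.0, 101.0)
mean_depth = np.arange(1.0, 101.0)[::-1]
keep = keep_within_percentile(uncertainty) & keep_within_percentile(mean_depth)
```

With equal weights the first point would fuse to 12.0; down-weighting the uncertain estimate pulls the result toward the more confident value.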
Figure 6. The resulting dense reconstructions with different adjustment schemes (from left to right): original, weighting by uncertainty, large depths removed, and outliers removed.
Robot kinematic data corresponding to the endoscopic video sequences and CT scans has been collected. This data provides information about the location of the endoscope camera. The SfM step in the current reconstruction pipeline produces camera trajectories, which are used as input to the depth fusion and surface extraction steps to produce the 3D structure. I plan to analyze this step further to develop a stronger understanding of how the current implementation uses the camera trajectory input. I will then work towards integrating the additional robot kinematic data to fine-tune the SfM trajectories. This integration will produce another reconstruction, which will also be evaluated using the rigid registration methods.
[1] M. Lavinsky‐Wolff et al., “Effect of turbinate surgery in rhinoseptoplasty on quality‐of‐life and acoustic rhinometry outcomes: a randomized clinical trial,” The Laryngoscope, vol. 123, no. 1, pp. 82-89, 2013.
[2] M. L. Hytönen, M. Lilja, A. A. Mäkitie, H. Sintonen, and R. P. Roine, “Does septoplasty enhance the quality of life in patients?,” European archives of oto-rhino-laryngology, vol. 269, pp. 2497-2503, 2012.
[3] D. Roblin and R. Eccles, “What, if any, is the value of septal surgery?,” Clinical Otolaryngology & Allied Sciences, vol. 27, no. 2, pp. 77-80, 2002.
[4] X. Liu et al., “Reconstructing sinus anatomy from endoscopic video–towards a radiation-free approach for quantitative longitudinal assessment,” in Medical Image Computing and Computer Assisted Intervention–MICCAI 2020: 23rd International Conference, Lima, Peru, October 4–8, 2020, Proceedings, Part III, Springer, 2020, pp. 3-13.
[5] J. L. Schonberger and J.-M. Frahm, “Structure-from-motion revisited,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 4104-4113.
[6] X. Liu et al., “Dense depth estimation in monocular endoscopy with self-supervised learning methods,” IEEE transactions on medical imaging, vol. 39, no. 5, pp. 1438-1447, 2019.
[7] R. Kikinis, S. D. Pieper, and K. G. Vosburgh, “3D Slicer: a platform for subject-specific image analysis, visualization, and clinical support,” Intraoperative imaging and image-guided therapy, pp. 277-289, 2014.
[8] S. D. Billings, E. M. Boctor, and R. H. Taylor, “Iterative most-likely point registration (IMLP): A robust algorithm for computing optimal shape alignment,” PloS one, vol. 10, no. 3, p. e0117688, 2015.
[9] A. Sinha, “cisstICP Library,” GitHub, 2019. https://github.com/AyushiSinha/cisstICP
[10] B. Curless and M. Levoy, “A volumetric method for building complex models from range images,” in Proceedings of the 23rd annual conference on Computer graphics and interactive techniques, 1996, pp. 303-312.