In endoscopic surgery, it is often beneficial to reconstruct the 3D geometry of the internal anatomy. Structure from motion (SfM) treats each video frame as a projection of the scene and recovers the 3D geometry of the entire scene from the relative motion between frames. Attempts to use structure from motion to reconstruct 3D anatomy from endoscopic video data have had some success. The goal of this project is to create a robust SfM pipeline that can compute the geometry of the internal anatomy from long videos recorded in a variety of endoscopic surgeries.
While structure from motion is a well-developed technique, endoscopic images introduce a number of challenges, including low texture, abundant specularities, extreme illumination changes from the light source attached to the endoscope, and blurring from the movement of the endoscope. Wang et al. at Johns Hopkins University developed a motion estimation and structure recovery algorithm that integrated a robust estimator able to handle as many as 80% outliers in the reconstructed 3D points. The algorithm used scale-invariant feature transform (SIFT) feature extraction and could reconstruct the main structure of the tissues surrounding the sinus from a 30 fps sinus endoscopy video. This work demonstrated that a structure from motion algorithm can be made at least partially robust to significant blurring, illumination changes, and geometric distortion.
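To illustrate how consensus-based estimation can tolerate a large outlier fraction, the sketch below fits a line with classic RANSAC. This is only an illustration of the general idea, not Wang et al.'s adaptive scale kernel consensus estimator; the function name and all parameter values are my own illustrative choices.

```python
import numpy as np

def ransac_line(points, iters=500, thresh=0.05, rng=None):
    """Fit a line y = a*x + b to points, tolerating heavy outlier contamination.

    Classic RANSAC: repeatedly fit a model to a minimal sample (2 points),
    keep the model with the largest inlier consensus set, then refit to
    that set by least squares.
    """
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if x1 == x2:
            continue  # degenerate sample, skip
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        residuals = np.abs(points[:, 1] - (a * points[:, 0] + b))
        inliers = residuals < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Least-squares refit on the winning consensus set.
    a, b = np.polyfit(points[best_inliers, 0], points[best_inliers, 1], 1)
    return a, b, best_inliers
```

Even with 80 random outliers against 20 true inliers (the same 80% contamination level reported above), the consensus step reliably recovers the underlying model.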
Work by Leonard et al. at Johns Hopkins University demonstrated an SfM and iterative closest point (ICP) pipeline that registered video to CT data with an average registration error of 1.21 mm for erectile tissue and 0.91 mm for non-erectile tissue. This work supports the feasibility of such a pipeline for Functional Endoscopic Sinus Surgery (FESS); it registers 70% of the 3D points within the average error, but it was not shown to work for surgeries other than FESS. Additionally, the videos analyzed in this work were typically 15-30 frames at 30 fps (0.5-1 s); longer video sequences introduce new challenges, including features that are visible in only a small subset of the frames and features that look vastly different between frames with very different poses and illumination.
Feature Extraction and Matching: SIFT
The first step of the SfM algorithm is to find matchable points between all (or many) frames in the video. A Scale-Invariant Feature Transform (SIFT) method was developed by David G. Lowe at the University of British Columbia. SIFT is able to find common points in images with varying pose, illumination, orientation, and scale. The features found are highly distinctive, so they can be easily matched between frames. For a standard image of size 500×500, about 2000 stable features are found on average using this method. This number will likely be lower for our endoscopic data, as there are many regions in the videos that are low-texture and solid-colored.
From early observations, many of the problems with the current SfM pipeline reside in feature detection and matching. Large, smooth surfaces are hard to track, especially under changing pose and illumination. For this reason, matching between features will be a major focus of this project, and will require an in-depth analysis of the current matching protocol to determine what causes feature mismatches. For long video sequences, many features that were visible in early frames move off-screen. The algorithm must recognize that these features have left the field of view so that it does not attempt to pair them with incorrect matches, and it must continuously add new features as different structures come into view.
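One standard guard against the ambiguous matches that low-texture, solid-colored regions produce is Lowe's ratio test: a match is accepted only when the nearest descriptor is clearly closer than the second-nearest. Below is a minimal brute-force sketch; the function name and the 0.75 ratio are illustrative defaults, not the pipeline's actual settings.

```python
import numpy as np

def ratio_test_match(desc_a, desc_b, ratio=0.75):
    """Match descriptors from frame A to frame B using Lowe's ratio test.

    A candidate is kept only when its nearest neighbour in B is significantly
    closer than the second-nearest, discarding ambiguous matches.
    Returns a list of (index_in_A, index_in_B) pairs.
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)  # distance to every B descriptor
        order = np.argsort(dists)
        nearest, second = order[0], order[1]
        if dists[nearest] < ratio * dists[second]:
            matches.append((i, int(nearest)))
    return matches
```

A feature lying midway between two candidates fails the test and is dropped, which is exactly the behaviour wanted on large smooth surfaces where many patches look alike.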
Zone-Matching SIFT
A team at the Beijing Institute of Technology led by Pengfei Du developed a novel improvement to the SIFT algorithm. In their paper "Improved SIFT Matching Algorithm for 3D Reconstruction from Endoscopic Images," they provide the framework for this improvement, which they call Zone Matching SIFT. Their goal was to increase the total number of matched pairs while reducing computation time, and they show that the improvement is beneficial for endoscopic video data.
The core matching method in Zone Matching SIFT relies on the endoscopic camera's high frame rate and small field of view (FOV). It considers only potential matches within a small patch surrounding the location of the initial point. This increases the accuracy of the match and decreases the total number of points that need to be checked, which speeds up computation. The paper presents four methods for finding matching features within a reasonable patch (40×40 pixels) around the initial feature.
I implement the sort-by-x method: features are sorted by x-coordinate in ascending order so that features in the range (x-20, x+20) can be retrieved quickly with a binary search. A matching feature is then found by narrowing the pool of candidates by similarity of y-coordinate and searching through the remaining potential matches.
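The sort-by-x lookup can be sketched as follows, using binary search over the sorted x-coordinates. The function names and the (x, y, descriptor) tuple layout are my own; the 40×40 zone follows the patch size from the paper.

```python
import bisect

def build_index(features):
    """Sort features of one frame by x-coordinate for fast range queries.

    `features` is a list of (x, y, descriptor) tuples. Returns the sorted
    x list (for bisection) alongside the sorted feature list.
    """
    feats = sorted(features, key=lambda f: f[0])
    xs = [f[0] for f in feats]
    return xs, feats

def zone_candidates(xs, feats, x, y, half=20):
    """Return features inside the 40x40 zone centred on (x, y).

    Binary search bounds the x-range; the y-range is filtered from the
    (small) surviving slice, mirroring the narrow-by-y step above.
    """
    lo = bisect.bisect_right(xs, x - half)
    hi = bisect.bisect_left(xs, x + half)
    return [f for f in feats[lo:hi] if abs(f[1] - y) < half]
```

The final match would then be chosen from the returned candidates by descriptor distance; restricting the search to the zone is what cuts both the mismatch rate and the computation time.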
Camera Pose Reconstruction
Once the 3D geometry of the surgical area is reconstructed, the relative pose of the camera in each frame can be calculated. Given a good initial guess (e.g., from a magnetic tracker), the absolute position and orientation of the camera can be computed.
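When reconstructed 3D points have been paired with points in a reference frame (for example, CT space during an ICP-style registration such as Leonard et al.'s), the rigid transform between the two frames has a closed-form least-squares solution via SVD (the Kabsch/Umeyama method). The sketch below is illustrative only and is not the project's actual implementation.

```python
import numpy as np

def rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) such that dst ~= R @ src + t.

    Closed-form Kabsch/Umeyama solution via SVD, the alignment step used
    inside each ICP iteration. src and dst are (N, 3) corresponding points.
    """
    src_c = src - src.mean(axis=0)          # centre both clouds
    dst_c = dst - dst.mean(axis=0)
    H = src_c.T @ dst_c                     # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    # Correct an improper (reflection) solution so det(R) = +1.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t
```

In a full pipeline, the initial guess from the magnetic tracker would seed the correspondence search, and this closed-form step would refine the pose at each ICP iteration.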
- Past work from Dr. Leonard in C++ (resolved)
- OpenCV, OpenMVG (resolved)
- ROS (resolved)
- Ubuntu 14.04 VM (resolved)
- Access to Galen robot prototype 0 in mock-OR (in progress)
1. Leonard, et al. (2017). Image-Based Navigation for Functional Endoscopic Sinus Surgery Using Structure from Motion.
2. Lowe, David G. (2004). Distinctive Image Features from Scale-Invariant Keypoints.
3. Wang, et al. (2008). Robust Motion Estimation and Structure Recovery from Endoscopic Image Sequences With an Adaptive Scale Kernel Consensus Estimator. IEEE.
4. Du, P., Zhou, Y., Xing, Q., and Hu, X. (2011). Improved SIFT Matching Algorithm for 3D Reconstruction from Endoscopic Images. VRCAI: Virtual Reality Continuum and its Applications in Industry, pp. 561-564.
* SfM Code Package
* Documentation