Structure From Motion From Long Video Sequences

Last updated: April 7


In endoscopic surgery, it is often beneficial to reconstruct the 3D geometry of the internal anatomy. Structure from motion (SfM) is a technique that treats each video frame as a single projection of the scene and recovers the geometry of the entire scene from many such projections. Attempts to use structure from motion to reconstruct 3D anatomy from endoscopic video data have had some success. The goal of this project is to create a robust SfM pipeline that can compute the geometry of the internal anatomy from long videos recorded during various endoscopic surgeries.

  • Students: Zach Heiman
  • Mentor(s): Xingtong Liu

Background, Specific Aims, and Significance

While structure from motion is a well-developed algorithm, endoscopic images introduce a number of challenges, including low texture, abundant specularities, extreme illumination changes from the light source attached to the endoscope, and blurring from the movement of the endoscope. A motion estimation and structure recovery algorithm developed by Wang et al. at Johns Hopkins University integrated an estimator that could tolerate as many as 80% outliers among the reconstructed 3D points. This algorithm used scale-invariant feature transform (SIFT) feature extraction and could reconstruct the main structure of the tissues surrounding the sinus from a 30 fps sinus endoscopy video. This work demonstrated that it is possible to develop a structure from motion algorithm that is at least partially robust to significant blurring, illumination changes, and geometric distortion.

Work done by Leonard et al. at Johns Hopkins University demonstrated a SfM and iterative closest point (ICP) algorithm that was able to register video to CT data with an average registration error of 1.21 mm for erectile tissue and 0.91 mm for non-erectile tissue. This work supports the use and feasibility of such a pipeline for Functional Endoscopic Sinus Surgery (FESS); it registers 70% of the 3D points within the average error, but it was not shown to work for surgeries other than FESS. Additionally, the videos analyzed in this work were typically 15-30 frames at 30 fps (0.5-1 s); longer video sequences introduce new challenges, including features that are visible in only a small subset of the frames and features that look vastly different between frames with largely different poses and illumination.


Deliverables

  • Minimum: (Completed)
    1. Updated matching algorithm with zone-matching SIFT implementation
    2. Documentation on dependencies and file structure for setting up SfM pipeline
  • Maximum: (Expected by 5/15)
    1. Improved density of reconstruction using improved matching

Technical Approach

Feature Extraction and Matching: SIFT

The first step of the SfM algorithm is to find matchable points between all (or many) frames in the video. A Scale-Invariant Feature Transform (SIFT) method was developed by David G. Lowe at the University of British Columbia. SIFT is able to find common points in images with varying pose, illumination, orientation, and scale. The features found are highly distinctive, so they can be easily matched between frames. For a standard image of size 500×500, about 2000 stable features are found on average using this method. This number will likely be lower for our endoscopic data, as there are many regions in the videos that are low-texture and solid-colored.
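To keep matches between frames unambiguous, Lowe recommends accepting a nearest-neighbor match only when it is clearly better than the second-best candidate. The sketch below is an illustrative implementation of this ratio test over descriptor arrays, not the pipeline's actual code; the function name and the 0.8 threshold are assumptions drawn from Lowe's paper.

```python
import numpy as np

def ratio_test_match(desc_a, desc_b, ratio=0.8):
    """Match descriptors from frame A to frame B using Lowe's ratio test.

    desc_a: (N, D) array, desc_b: (M, D) array with M >= 2.
    Returns a list of (i, j) index pairs. A match is kept only when the
    nearest neighbour is clearly better than the second nearest, which
    filters the ambiguous matches that low-texture tissue tends to produce.
    """
    matches = []
    for i, d in enumerate(desc_a):
        # Euclidean distance from this descriptor to every descriptor in B
        dists = np.linalg.norm(desc_b - d, axis=1)
        j, k = np.argsort(dists)[:2]
        if dists[j] < ratio * dists[k]:
            matches.append((i, j))
    return matches
```

In the endoscopic setting a stricter ratio may be warranted, since smooth, solid-colored regions yield many near-duplicate descriptors.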

From early observations, many of the problems with the current SfM pipeline reside in feature detection and matching. Large, smooth surfaces are hard to track, especially under changing pose and illumination. For this reason, matching between features will be a major focus of this project; it will require an in-depth analysis of the current matching protocol to determine what is causing feature mismatching. In long video sequences, many features that were visible in the early frames move off-screen. The algorithm must recognize that these features have left the field of view so it does not pair them with what would be incorrect matches, and it must continuously add new features as different structures come into view.
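One way to handle features that leave the field of view is explicit track bookkeeping. The class below is a minimal sketch under my own naming (not the pipeline's actual data structure): a track that goes unmatched for a few consecutive frames is retired so the matcher stops trying to pair it, and unmatched detections in the newest frame seed fresh tracks.

```python
class TrackManager:
    """Illustrative feature-track bookkeeping for long sequences."""

    def __init__(self, max_misses=3):
        self.max_misses = max_misses
        self.tracks = {}        # track id -> consecutive miss count
        self.next_id = 0

    def update(self, matched_ids, new_detections):
        """Advance one frame.

        matched_ids: set of track ids matched in this frame.
        new_detections: number of detections that matched no existing track.
        """
        # Count a miss for every active track not matched in this frame
        for tid in list(self.tracks):
            if tid in matched_ids:
                self.tracks[tid] = 0
            else:
                self.tracks[tid] += 1
                if self.tracks[tid] > self.max_misses:
                    # Feature has likely left the field of view: retire it
                    del self.tracks[tid]
        # Start a new track for each unmatched detection
        for _ in range(new_detections):
            self.tracks[self.next_id] = 0
            self.next_id += 1
```

The miss threshold trades robustness to momentary blur (a feature lost for one frame should survive) against the risk of matching anatomy that is genuinely off-screen.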

Zone-Matching SIFT

A team at Beijing Institute of Technology led by Pengfei Du developed a novel improvement to the SIFT algorithm. In their paper, Improved SIFT Matching Algorithm for 3D Reconstruction from Endoscopic Images, they provide the framework for this improvement, which they call Zone Matching SIFT. Their goal was to increase the total number of matched pairs while reducing computation time, and they show that the improvement is beneficial for endoscopic video data.

The core matching method in Zone Matching SIFT relies on the endoscopic camera's high frame rate and small field of view (FOV): between consecutive frames, a feature can move only a short distance in the image, so only potential matches within a small patch surrounding the feature's initial location are considered. This increases the accuracy of the match and decreases the total number of points that need to be checked, increasing computation speed. The paper presents four methods to find matching features in a reasonable patch (40×40 pixels) around the initial feature.

I implement the sort-by-x method: features are sorted by x-coordinate in ascending order so that all features in the range (x-20, x+20) can be located with a binary search rather than a full scan. A matching feature is then found by reducing the pool of potential matches by similarity of y-coordinate and searching through the remaining candidates by descriptor distance.
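A minimal sketch of the sort-by-x search, assuming features are (x, y, descriptor) tuples; the function name and zone half-width parameter are my own, and in practice the x-coordinate list would be precomputed once per frame rather than rebuilt per query.

```python
import bisect
import numpy as np

def zone_match(feat_a, feats_b_sorted, half=20):
    """Find the best match for one feature within a (2*half)x(2*half) zone.

    feat_a: (x, y, descriptor). feats_b_sorted: list of (x, y, descriptor)
    sorted ascending by x. Binary search on x narrows the candidate pool,
    the y test narrows it further, and only the survivors are compared by
    descriptor distance -- far fewer comparisons than exhaustive matching.
    Returns the index of the best candidate, or None if the zone is empty.
    """
    x, y, desc = feat_a
    xs = [f[0] for f in feats_b_sorted]
    lo = bisect.bisect_left(xs, x - half)
    hi = bisect.bisect_right(xs, x + half)
    best, best_d = None, float("inf")
    for j in range(lo, hi):
        bx, by, bdesc = feats_b_sorted[j]
        if abs(by - y) > half:
            continue                      # outside the zone vertically
        d = float(np.linalg.norm(np.asarray(desc) - np.asarray(bdesc)))
        if d < best_d:
            best, best_d = j, d
    return best
```

Restricting the search zone also acts as a geometric sanity check: a descriptor that is similar but lies far from the expected position is never even considered.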

Camera Pose Reconstruction

Once the 3D geometry of the surgical area is reconstructed, the relative pose of the camera in each frame can be calculated. Given a good initial guess (e.g., from a magnetic tracker), the absolute position and orientation of the camera can be computed.
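Given point correspondences between the reconstruction and the tracker's coordinate frame, the registration step reduces to estimating a rigid transform. Below is a minimal sketch using the Kabsch algorithm (my own illustration, not the pipeline's code); note that an SfM reconstruction is only defined up to scale, so in practice a similarity transform (e.g. Umeyama's method) would be estimated instead.

```python
import numpy as np

def rigid_align(P, Q):
    """Kabsch: find R, t minimising sum ||R p_i + t - q_i||^2.

    P, Q: (N, 3) arrays of corresponding points (reconstruction vs tracker).
    Returns (R, t) with R a proper rotation (det(R) = +1).
    """
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)             # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    D = np.diag([1.0, 1.0, d])            # guard against a reflection
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    return R, t
```

The reflection guard matters: without it, a degenerate or noisy correspondence set can yield an improper "rotation" with determinant -1.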


Dependencies

- Past work from Dr. Leonard in C++ (resolved)

- OpenCV, OpenMVG (resolved)

- ROS (resolved)

- Ubuntu 14.04 VM (resolved)

- Access to Galen robot prototype 0 in mock-OR (in progress)

Milestones and Status

  1. Milestone name: Sort through existing code, set up ROS, OpenCV, OpenMVG C++ environment; explore and compare different feature extraction and matching methods
    • Planned Date: 3/16
    • Status: completed
  2. Milestone name: Improve matching algorithm with Zone-matching implementation
    • Planned Date: 4/14
    • Status: completed
  3. Milestone name: Update pipeline for improved reconstruction from matching algorithm
    • Planned Date: 5/6
    • Expected Date: 5/20
    • Status: in progress

Reports and presentations

Project Bibliography

1. Leonard, et al. (2017). Image-based navigation for functional endoscopic sinus surgery using structure from motion.

2. Lowe, David G. (2004). Distinctive Image Features from Scale-Invariant Keypoints.

3. Wang, et al. (2008). Robust Motion Estimation and Structure Recovery from Endoscopic Image Sequences With an Adaptive Scale Kernel Consensus Estimator. IEEE.

4. Du, P., Zhou, Y., Xing, Q., and Hu, X. (2011). Improved SIFT Matching Algorithm for 3D Reconstruction from Endoscopic Images. VRCAI: Virtual Reality Continuum and its Applications in Industry, pp. 561-564, New York, USA.

Other Resources and Project Files

* SfM Code Package

  • Read documentation to learn how to set up environment and build package

* Documentation
