Automated Segmentation of Temporal Bone CT Imaging for Robot-Assisted Microsurgery

Last updated: 3/31/2021 8:41

Summary

Surgery within the temporal bone involves maneuvering around small and geometrically complex anatomy, which poses a high risk of accidental injury. This has motivated several groups to build cooperative control robotic systems for temporal bone surgery in an effort to reduce hand tremor, increase economy of motion, and ultimately augment surgeon skill in this space. This project aims to provide an automated system for creating high-quality segmentations of patient temporal bone CTs, which can be used to inform semiautonomous surgical systems about critical patient anatomy that should be avoided.

Students: Andy Ding (andy.ding@jhmi.edu) & Jessica Soong (jsoong1@jhu.edu)
Mentor(s): Dr. Francis X. Creighton, Dr. Russell H. Taylor, Dr. Mathias Unberath, Max Zhaoshuo Li

Background, Specific Aims, and Significance

Background

Operating in the temporal bone and lateral skull base is technically challenging. This region contains a complex geometry of nerves, arteries, veins, the end-organs for both hearing and balance, as well as the cranial nerves responsible for speech and swallowing. To access this region, surgeons drill through varying densities of bone to identify surgical landmarks. In addition to the limited visibility of the surgical field and complex anatomical geometry in this space, critical anatomical structures are often within millimeters of each other.

Due to these conditions, free-hand temporal bone surgery poses a high risk of accidental damage to surrounding structures. For example, after cochlear implantation, 45% of patients experience changes in taste, and 20% of those patients still have unresolved symptoms at the end of their follow-up period. In rarer cases, patients are also at risk of facial paralysis from accidental damage to the facial nerve. Accidental damage to the brain or to the membrane surrounding the brain (dura) can lead to cerebrospinal fluid (CSF) leakage. Damage to the sigmoid sinus, which drains blood from the brain to the jugular vein, can lead to abnormal closure or even clotting of the sinus itself.

One possible way to mitigate accidental damage to surrounding structures is to use a cooperative control robot intraoperatively. Our group in the LCSR has previously developed such a robot, which holds the surgical drill while the surgeon freely guides it. Robot-assisted surgery has the potential to reduce hand tremor and limit movement around sensitive structures, thereby increasing patient safety and improving long-term outcomes. However, a key requirement for bringing this technology into the operating room is providing meaningful information about patient anatomy so that the robot can safely guide the surgeon throughout the procedure. In practice, this means delineating important structures on patient CT imaging that can then be registered to the robotic system.

Previous Work

Our previous work in the LCSR has focused on segmenting CTs through registration. After manually segmenting a template CT, we apply deformable registration to map (propagate) the template segmentations onto target CTs that have not been segmented before. These propagated segmentations are then locally optimized to produce a final segmentation for the target CT. With this method, we achieve submillimeter accuracy when segmenting inner and middle ear structures, with an average surface distance of < 0.2 mm and almost 90% overlap with ground-truth segmentations.
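The propagation and evaluation steps above can be illustrated with a minimal sketch. It is not our registration pipeline: it assumes NumPy/SciPy, a dense backward ("pull") displacement field in voxel units that some deformable registration has already produced, and hypothetical function names (`propagate_labels`, `dice`, `average_surface_distance`). It omits the local optimization step and only shows how template labels could be warped and then scored for overlap (Dice) and average surface distance against a ground-truth mask.

```python
import numpy as np
from scipy import ndimage


def propagate_labels(template_labels, displacement):
    """Warp an integer label volume with a dense displacement field.

    displacement has shape (3, Z, Y, X): for every target voxel it gives the
    offset (in voxels) to the matching template location (backward mapping).
    order=0 is nearest-neighbour interpolation, so label values never blend.
    """
    grid = np.indices(template_labels.shape, dtype=float)
    return ndimage.map_coordinates(template_labels, grid + displacement,
                                   order=0, mode="nearest")


def dice(a, b):
    """Volumetric overlap between two binary masks (1.0 means identical)."""
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(a, b).sum() / denom


def surface(mask):
    """Boundary voxels: the mask minus its binary erosion."""
    return mask & ~ndimage.binary_erosion(mask)


def average_surface_distance(a, b, spacing=(1.0, 1.0, 1.0)):
    """Symmetric mean distance (in the units of spacing, e.g. mm) between mask surfaces."""
    sa, sb = surface(a.astype(bool)), surface(b.astype(bool))
    # Distance from every voxel to the nearest surface voxel of the other mask.
    dist_to_b = ndimage.distance_transform_edt(~sb, sampling=spacing)
    dist_to_a = ndimage.distance_transform_edt(~sa, sampling=spacing)
    return (dist_to_b[sa].sum() + dist_to_a[sb].sum()) / (sa.sum() + sb.sum())
```

Nearest-neighbour interpolation is the usual choice when warping label maps, because linear interpolation would create meaningless in-between label values at structure boundaries.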

Specific Aims

Generate realistic temporal bone CT data using statistical shape modeling on registered deformation fields.
Implement state-of-the-art deep learning methods for semantic segmentation of the temporal bone.
Build the largest comprehensively annotated temporal bone CT database to date.

Significance

Successful completion of this project will allow for more complete virtual safety barriers in robot-assisted temporal bone surgery. The resulting patient-specific segmentations can also serve as learning cases for junior otologists. Finally, our project has the potential to create the most complete annotated temporal bone CT dataset for model training and research.

Aside from the NIH OpenEar dataset, our segmentations of the temporal bone are more complete than those of any other group that has published in this area. Of the anatomical boundaries of a mastoidectomy, which is the first step in virtually all temporal bone procedures, previous groups have labeled only one: the sigmoid sinus. Our datasets label not only the sigmoid sinus but also the surrounding brain and the external auditory canal, which are the remaining mastoidectomy boundaries. With these areas segmented, a cooperative control robotic system can apply virtual safety barriers at each boundary, enabling safe drilling throughout the procedure.
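As a concrete illustration of how a segmented boundary could become a virtual safety barrier, here is a minimal sketch. It assumes a binary mask per structure, a drill-tip position already registered into CT voxel coordinates, and a hypothetical linear velocity-scaling rule (`speed_scale`, `margin_mm`); the actual control law of our cooperative control robot is not described on this page.

```python
import numpy as np
from scipy import ndimage


def safety_distance_map(structure_mask, spacing_mm):
    """Distance (mm) from every voxel to the nearest voxel of a segmented structure."""
    return ndimage.distance_transform_edt(~structure_mask.astype(bool),
                                          sampling=spacing_mm)


def speed_scale(distance_map, tip_voxel, margin_mm=2.0):
    """Hypothetical barrier rule: 1.0 (unrestricted) at or beyond margin_mm from
    the structure, falling linearly to 0.0 (motion blocked) at the structure."""
    d = distance_map[tuple(tip_voxel)]
    return float(np.clip(d / margin_mm, 0.0, 1.0))
```

The distance map is computed once per structure before surgery, so the per-step check during drilling is a single array lookup.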

Deliverables

Minimum: (Expected by 4/2/2021)
1. Statistical shape model of temporal bone CTs.

Expected: (Expected by 5/4/2021)
1. Code for nnU-Net implementation + trained models & documentation.
2. Internal Validation Results
3. External Validation Results

Maximum: (Expected by 5/11/2021)
1. Implementation of a GAN label refinement model for CT segmentation + trained models & documentation.
2. Final manuscript to be submitted.
3. High quality segmented temporal bone CT dataset built from trained models.

Technical Approach

Data Generation through Statistical Shape Modeling
1. Because our labeled dataset is relatively small, we will generate synthetic data that simulates additional anatomical variation.
2. We will do this by building a statistical shape model (SSM) of the deformation fields obtained by registering a template CT to many other (unlabeled) CTs.
3. As a result, we can sample hundreds of new deformable registrations from the model, effectively creating as many data points as needed.
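The SSM step above can be sketched as follows, assuming a linear, PCA-based shape model over flattened displacement fields (a common SSM formulation, but the model actually used in this project may differ). Function names are hypothetical. Full-resolution fields are very large, so in practice they would likely be downsampled or restricted to a region of interest before fitting.

```python
import numpy as np


def fit_deformation_ssm(fields):
    """Fit a PCA statistical shape model to template-to-subject deformation fields.

    fields: array of shape (n_subjects, 3, Z, Y, X).
    Returns the mean field (flattened), the principal modes (one per row),
    and the variance captured by each mode.
    """
    n = fields.shape[0]
    X = fields.reshape(n, -1)
    mean = X.mean(axis=0)
    # Thin SVD of the centred data: the rows of vt are orthonormal principal modes.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    variances = s ** 2 / (n - 1)
    return mean, vt, variances


def sample_deformation(mean, modes, variances, shape, n_modes=None, rng=None):
    """Draw a new deformation field; mode k's coefficient ~ N(0, variances[k])."""
    rng = np.random.default_rng() if rng is None else rng
    k = len(variances) if n_modes is None else n_modes
    coeffs = rng.standard_normal(k) * np.sqrt(variances[:k])
    return (mean + coeffs @ modes[:k]).reshape(shape)
```

Each sampled field could then be applied to the labeled template CT and its segmentation to produce a synthetic image-label pair; limiting `n_modes` to the leading modes keeps samples close to the variation actually observed in the data.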

nnU-Net

3D patch-based network; nnU-Net also provides a 2D network, which we did not use.
3D convolutions have two advantages over a 2D ensemble: spatial context across slices is retained, and there are fewer redundant convolutions, since a 2D ensemble must convolve the same voxels separately in multiple slice orientations.
This is especially advantageous under extreme label imbalance, where the added context may reduce floating islands and holes in the predicted labels.
Developed as a new, standardized benchmarking pipeline for medical image segmentation.
Achieved 33 top leaderboard results across 53 different datasets.
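To illustrate what "3D patch-based" means at inference time, here is a minimal sketch of sliding-window prediction: overlapping 3D patches are predicted independently and blended with a Gaussian weight that trusts patch centers more than patch borders, similar in spirit to nnU-Net's default inference. `predict_patch` is a hypothetical stand-in for a trained network that returns per-class probabilities for one patch; the 50% overlap and the sigma of 1/8 of the patch size are assumed values, and the sketch requires every volume dimension to be at least the patch size (a real pipeline would pad smaller inputs).

```python
import itertools

import numpy as np


def gaussian_weight(patch_size, sigma_scale=0.125):
    """Importance map that down-weights voxels near the patch borders."""
    grids = np.meshgrid(*[np.arange(p) - (p - 1) / 2 for p in patch_size], indexing="ij")
    sq = sum((g / (sigma_scale * p)) ** 2 for g, p in zip(grids, patch_size))
    return np.exp(-0.5 * sq)


def patch_starts(length, patch, step):
    """Start indices covering [0, length); the last patch is flush with the edge."""
    starts = list(range(0, length - patch + 1, step))
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts


def sliding_window_predict(volume, predict_patch, patch_size, n_classes, overlap=0.5):
    """Blend overlapping per-patch class probabilities into a full-volume label map."""
    assert all(L >= p for L, p in zip(volume.shape, patch_size)), "pad volume first"
    steps = [max(1, int(p * (1 - overlap))) for p in patch_size]
    w = gaussian_weight(patch_size)
    probs = np.zeros((n_classes,) + volume.shape)
    norm = np.zeros(volume.shape)
    axes = [patch_starts(L, p, s) for L, p, s in zip(volume.shape, patch_size, steps)]
    for z, y, x in itertools.product(*axes):
        sl = (slice(z, z + patch_size[0]), slice(y, y + patch_size[1]),
              slice(x, x + patch_size[2]))
        # predict_patch returns probabilities of shape (n_classes, *patch_size).
        probs[(slice(None),) + sl] += predict_patch(volume[sl]) * w
        norm[sl] += w
    return np.argmax(probs / norm, axis=0)
```

Weighting patch centers more heavily reduces seams between neighbouring patches, where the network has seen the least context.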

GAN Label Refinement

The raw nnU-Net output contains many floating islands. Post-processing can remove these, but to keep the method end-to-end, we propose a GAN-style setup that refines the labels after the generator (nnU-Net) has been pretrained.
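For comparison, the conventional post-processing mentioned above can be as simple as keeping only the largest connected component of each structure. A minimal sketch, assuming SciPy and a hypothetical helper name; this is a common heuristic (nnU-Net can itself apply a similar largest-component step), not the end-to-end approach proposed here.

```python
import numpy as np
from scipy import ndimage


def keep_largest_component(labels, structure_ids):
    """Remove floating islands from the listed structures.

    For each structure id, keep only its largest connected component
    (face connectivity); its other voxels are set to background (0).
    Structures not listed are left unchanged.
    """
    cleaned = labels.copy()
    for sid in structure_ids:
        mask = labels == sid
        comps, n = ndimage.label(mask)
        if n <= 1:
            continue
        sizes = np.bincount(comps.ravel())
        sizes[0] = 0  # index 0 is the background of `comps`, never a candidate
        cleaned[mask & (comps != sizes.argmax())] = 0
    return cleaned
```

This only suits structures expected to form a single connected region; a structure that is legitimately split into several pieces would lose valid voxels.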
The nnU-Net output, a noisy label map, is fed into the discriminator along with the ground truth. The discriminator learns to distinguish the ground truth from the noisy labels, and the resulting adversarial loss is added to nnU-Net's overall loss. nnU-Net is then updated by backpropagation to further refine its weights.
The general idea is that as the generator improves, its noisy labels look more like the ground truth labels, which in turn forces its adversary (the discriminator) to improve as well.
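The loss composition described above can be sketched in NumPy as follows. This only shows how the terms combine; actual training would use an autograd framework to backpropagate through nnU-Net and the discriminator. The segmentation term here is soft Dice alone for brevity, and the adversarial weight `adv_weight=0.1` and all function names are hypothetical.

```python
import numpy as np

EPS = 1e-7


def bce(p, target):
    """Mean binary cross-entropy between probabilities p and a 0/1 target."""
    p = np.clip(p, EPS, 1 - EPS)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))


def soft_dice_loss(pred, gt):
    """1 - soft Dice between predicted probabilities and a one-hot ground truth."""
    inter = np.sum(pred * gt)
    return float(1 - 2 * inter / (np.sum(pred) + np.sum(gt) + EPS))


def discriminator_loss(d_real, d_fake):
    """Discriminator should output 1 on ground-truth labels and 0 on nnU-Net labels."""
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))


def generator_loss(pred, gt, d_fake, adv_weight=0.1):
    """Segmentation loss plus an adversarial term that is low when the
    discriminator mistakes the nnU-Net output for ground truth."""
    return soft_dice_loss(pred, gt) + adv_weight * bce(d_fake, np.ones_like(d_fake))
```

The two losses pull in opposite directions on `d_fake`: the discriminator is rewarded for pushing it toward 0, the generator for pushing it toward 1, which is the adversarial game described above.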

Dependencies


Dr. Unberath Supervision Agreement (2/12)
- We need a deep learning expert to consult on the project. If this fell through, Max, who has deep learning experience, could continue as the lead.
- The risk was that we might run into problems beyond Max's expertise.
- This was resolved on 2/12, as Dr. Unberath has agreed to supervise the project.
Label/Annotation Finalization (2/12 → 4/4)
- This was not finished on time because our two annotators differed substantially in both annotation quality and labeling conventions. The new expected date is 4/4.
- We have significantly restructured the project goals, partly because of this delay and partly because of hardware issues. Our current models must be retrained because they were trained on inconsistent labels. This is reflected in the updated Gantt chart, but the new maximum deliverables are still achievable. In the meantime, we can use the current model to develop workflows for GAN label refinement and for the external dataset and high-quality (HQ) dataset labeling tasks.
Workstation Arrival (2/27 → 4/15)
- Due to the delay and MARCC hardware debugging issues, project goals and timelines have been restructured. Fortunately, training requires very little hyperparameter tuning, but because the needed hardware on MARCC has limited availability, training takes up to 2 weeks.
- We originally anticipated that this delay would limit how many networks we could train and therefore compare against each other; this has proven true.
- However, the project still shows promising results and remains on track to produce an interesting manuscript.

Milestones and Status

Finalize Ground Truth Labels
- Planned Date: 2/12
- Expected Date: 4/4
- Status: 75%
Setup Environment
- Planned Date: 2/19
- Expected Date: 2/19
- Status: 100%
Build Statistical Shape Model
- Planned Date: 2/27
- Expected Date: 3/17
- Status: 100%
Finish Data Generation
- Planned Date: 3/17
- Expected Date: 4/2
- Status: 90%
nnU-Net Implementation
- Planned Date: 3/25
- Expected Date: 4/18
- Status: 90%
GAN Label Refinement Implementation
- Planned Date: 4/30
- Expected Date: 4/30
- Status: 0%
External Dataset Validation
- Planned Date: 5/4
- Expected Date: 5/4
- Status: 10%
High Quality Dataset Labeling
- Planned Date: 4/15
- Expected Date: 5/4
- Status: 0%
Final Technical Report & Code
- Planned Date: 5/4
- Expected Date: 5/11
- Status: 0%

Reports and presentations

Project Plan
- Project plan presentation
- Project plan proposal
Project Background Reading
- See Bibliography below for links.
Project Checkpoint
- Project checkpoint presentation
Paper Seminar Presentations

Project Final Presentation
Project Final Report
- Final Report
- links to any appendices or other material

Project Bibliography

Cousins VC. Lateral skull base surgery: a complicated pursuit? The Journal of Laryngology & Otology. 2008;122(3):221-229. doi:10.1017/s0022215107000436.
Lloyd S, Meerton L, Cuffa RD, Lavy J, Graham J. Taste change following cochlear implantation. Cochlear Implants International. 2007;8(4):203-210. doi:10.1179/cim.2007.8.4.203.
Fayad JN, Wanna GB, Micheletto JN, Parisier SC. Facial nerve paralysis following cochlear implant surgery. The Laryngoscope. 2003;113(8):1344-1346. doi:10.1097/00005537-200308000-00014.
Zanoletti E, Cazzador D, Faccioli C, Martini A, Mazzoni A. Closure of the sigmoid sinus in lateral skull base surgery. Acta Otorhinolaryngol Ital. 2014;34(3):184-188.
Razavi CR, Wilkening PR, Yin R, et al. Image-guided mastoidectomy with a cooperatively controlled ENT microsurgery robot. Otolaryngology–Head and Neck Surgery. 2019;161(5):852-855. doi:10.1177/0194599819861526.
Sinha A, Leonard S, Reiter A, Ishii M, Taylor RH, Hager GD. Automatic segmentation and statistical shape modeling of the paranasal sinuses to estimate natural variations. In: Proceedings of SPIE; 2016. doi:10.1117/12.2217337.
Neves CA, Tran ED, Kessler IM, Blevins NH. Fully automated preoperative segmentation of temporal bone structures from clinical CT scans. Scientific Reports. 2021;11(1). doi:10.1038/s41598-020-80619-0.
Nikan S, Van Osch K, Bartling M, et al. PWD-3DNet: a deep learning-based fully-automated segmentation of multiple structures on temporal bone CT scans. IEEE Transactions on Image Processing. 2021;30:739-753. doi:10.1109/tip.2020.3038363.
Isensee F, Jaeger PF, Kohl SAA, Petersen J, Maier-Hein KH. nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods. 2021;18(2):203-211. doi:10.1038/s41592-020-01008-z.
Li X, Gong Z, Yin H, Zhang H, Wang Z, Zhuo L. A 3D deep supervised densely network for small organs of human temporal bone segmentation in CT images. Neural Networks. 2020;124. doi:10.1016/j.neunet.2020.01.005.
Gibson E, et al. NiftyNet: a deep-learning platform for medical imaging. Computer Methods and Programs in Biomedicine. 2018;158:113-122.
Liu S, et al. 3D anisotropic hybrid network: transferring convolutional features from 2D images to 3D anisotropic volumes. In: Frangi A, Schnabel J, Davatzikos C, Alberola-López C, Fichtinger G, eds. Medical Image Computing and Computer Assisted Intervention – MICCAI 2018. Lecture Notes in Computer Science, vol 11071. Springer, Cham; 2018. doi:10.1007/978-3-030-00934-2_94.
Fauser J, et al. Toward an automatic preoperative pipeline for image-guided temporal bone surgery. International Journal of Computer Assisted Radiology and Surgery. 2019;14(6):967-976.

Other Resources and Project Files

