Last updated: 18:21, April 3, 2022
Eustachian tube dysfunction is a disorder resulting from impaired middle ear ventilation and pressure regulation. Affected patients experience a range of symptoms such as ear pain, pressure, clicking, and difficulty hearing. Eustachian tube dilation allows for surgical management of this condition, but to date there is no registration-based image-guided surgical system that uses CT segmentations. As a first step toward such a system, we aim to use deep learning to develop a platform for automated segmentation of the eustachian tube.

Figure 1 - Anatomy of the Eustachian Tube with Nearby Critical Structures
Eustachian tube dilation is a procedure approved for the surgical management of eustachian tube dysfunction (ETD). ETD results from impairment of middle ear ventilation and pressure regulation. As a result, patients experience symptoms such as ear pain, pressure, cracking, or difficulty hearing, which have a significant impact on their quality of life [1]. Given the proximity of the eustachian tube to critical structures such as the internal carotid artery, it is important to carefully review the preoperative scans and assess for anatomical variations among patients. Existing registration-segmentation propagation pipelines have varying accuracy and can be computationally expensive. Furthermore, there is no image-registration surgical navigation system that uses automated segmentations of preoperative CTs for this procedure. Therefore, we aim to develop a deep learning pipeline, and assess its utility, for automated segmentation of the eustachian tube and definition of nearby critical structures, establishing the first pipeline that can be integrated into a surgical navigation system.
The specific aims for developing the deep learning pipeline include:
The proposed pipeline will utilize a ground truth dataset which will be co-registered as part of the preprocessing. The data will then serve as input to the deep learning pipeline for performing semantic segmentation. The proposed deep learning algorithms include nnUNet, VoxelMorph, and DeepReg. The predicted segmentations will then be validated in comparison with the ground truth (Figure 1).
i. Data Preprocessing
To make the data compatible with the nnUNet, VoxelMorph, and DeepReg frameworks, the raw images must be co-registered. We will use ANTsPy, a Python library that provides image IO, registration, segmentation, statistical learning, and visualization functionality, among others [1]. We will take the first image as the template (an arbitrary choice), register the remaining images to it, and apply the ‘forward’ deformation field to each image so that all images in the dataset are co-aligned. This also ensures that the images share the same orientation and voxel spacing.
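The co-registration step described above could be sketched roughly as follows. This is a minimal illustration and not the project's actual script: the function names `split_template` and `coregister`, the output directory layout, and the choice of the `"SyN"` transform are assumptions made for the example.

```python
from pathlib import Path


def split_template(paths, template_index=0):
    """Return (template_path, moving_paths) for a list of image paths."""
    paths = list(paths)
    template = paths[template_index]
    moving = paths[:template_index] + paths[template_index + 1:]
    return template, moving


def coregister(paths, out_dir, template_index=0, transform="SyN"):
    """Register every image to the template and save the warped images.

    The transform type "SyN" is an assumption for this sketch; the source
    does not state which ANTs transform the project uses.
    """
    import ants  # ANTsPy; imported lazily so split_template works without it

    template_path, moving_paths = split_template(paths, template_index)
    fixed = ants.image_read(str(template_path))
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    for path in moving_paths:
        moving = ants.image_read(str(path))
        result = ants.registration(fixed=fixed, moving=moving,
                                   type_of_transform=transform)
        # Apply the forward transforms (moving image -> template space).
        warped = ants.apply_transforms(fixed=fixed, moving=moving,
                                       transformlist=result["fwdtransforms"])
        ants.image_write(warped, str(out_dir / Path(path).name))
```

If label maps are warped with the same transforms, passing `interpolator="nearestNeighbor"` to `ants.apply_transforms` would likely be needed so that label values are not blended.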
ii. nnU-Net Algorithm
The nnU-net algorithm is described by its authors as the first segmentation method designed to deal with the dataset diversity found in the medical image segmentation domain [2]. For our project, we will use nnU-net as the basis for our deep learning model for semantic segmentation of CT images (Figure 2). The workflow of nnU-net is as follows. nnU-net first extracts a data fingerprint, a set of dataset-specific properties, from the training data set. Heuristic rules then derive the inferred parameters (such as image resampling and batch size) from the data fingerprint, and these, together with the fixed blueprint parameters (such as loss function and network architecture), form the pipeline fingerprint. The pipeline fingerprint drives network training for the 2D, 3D, and 3D-Cascade U-Net configurations using the hyperparameters determined so far. Using post-processing and ensembling strategies (i.e., assigning weights to each model and combining them), nnU-net then selects the best configuration to produce the final prediction.
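To illustrate the ensembling idea mentioned above, the following sketch averages per-voxel foreground probabilities from several models. It is not nnU-net's own code: the function names, the flat-list representation of a probability map, and the 0.5 threshold are assumptions made for the example.

```python
def ensemble_probabilities(prob_maps, weights=None):
    """Weighted average of per-voxel foreground probabilities.

    prob_maps: list of equally long lists, one per model (hypothetical layout).
    weights:   one weight per model; equal weights if omitted.
    """
    if weights is None:
        weights = [1.0] * len(prob_maps)
    total = sum(weights)
    n_voxels = len(prob_maps[0])
    return [
        sum(w * p[i] for w, p in zip(weights, prob_maps)) / total
        for i in range(n_voxels)
    ]


def to_mask(probabilities, threshold=0.5):
    """Binarize fused probabilities into a segmentation mask."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Toy example: two models disagree on the second voxel.
probs_2d = [0.9, 0.4, 0.2]
probs_3d = [0.7, 0.8, 0.1]
fused = ensemble_probabilities([probs_2d, probs_3d])
mask = to_mask(fused)
```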
One motivation for using nnU-net is its ability to handle a wide variety of target structures [2]. Unlike many other deep learning models, nnU-net is not a specialized solution for one type of data set, but a generalizable algorithm that has been reported to surpass most existing approaches on segmentation tasks [2]. Furthermore, because nnU-Net is self-configuring, we can train and use the model quickly and at a feasible computational cost. Finally, the results can serve as a benchmark to improve upon if the training is not successful.
The proposed workflow is shown in Figure 3. First, we will obtain and preprocess our training data (as discussed in the previous section), manually generate test data sets, and feed them into nnU-net. We will then check whether the trained model generalizes well to the test data (i.e., low Mean Surface Distance). If it does (which is unlikely on the first trial), we can move on to building and training other unsupervised models such as VoxelMorph and DeepReg and compare the results. If training with nnU-net is not successful, we will treat our initial result as a baseline and attempt the following three modifications to the nnU-net algorithm.
First, we will try CT-specific pre-processing, which includes denoising, CT data interpolation with different splines, CT data registration, and windowing to increase the contrast across a region of interest, each of which can be tuned for CT data sets. Second, we will try manual adaptation of the loss function. nnU-net builds its loss from a Dice loss (region-based) and a cross-entropy loss (distribution-based); we can try different ways of combining and weighting the two terms, or try giving weights to the background area of the label, which softens the hard labels used in the loss functions. This can have a regularizing effect, increasing the robustness of the model and lowering the chance of overfitting, as suggested in recent work on improving the Dice loss for segmentation tasks [4]. Finally, we can extend or modify the heuristics used in nnU-net, as suggested in the original nnU-net paper, because the current heuristics may not generalize well enough to our domain-specific head CT scans.
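The loss adaptation discussed above could look roughly like the sketch below, which combines a soft Dice term with a binary cross-entropy term and optionally softens the hard labels. It is written in plain Python on flat lists for readability. The weights `w_dice` and `w_ce`, the smoothing parameter `epsilon`, and the particular smoothing rule are assumptions for illustration, not values taken from nnU-net or from [4].

```python
import math


def smooth_labels(labels, epsilon=0.1):
    """Soften hard 0/1 labels: 1 -> 1 - epsilon, 0 -> epsilon (assumed rule)."""
    return [y * (1.0 - epsilon) + (1.0 - y) * epsilon for y in labels]


def soft_dice_loss(probs, targets, eps=1e-6):
    """Region-based term: 1 - soft Dice overlap."""
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets)
    return 1.0 - (2.0 * intersection + eps) / (denominator + eps)


def cross_entropy_loss(probs, targets, eps=1e-7):
    """Distribution-based term: mean binary cross-entropy."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(probs)


def combined_loss(probs, labels, w_dice=1.0, w_ce=1.0, epsilon=0.0):
    """Weighted sum of the two terms, with optional label smoothing."""
    targets = smooth_labels(labels, epsilon) if epsilon > 0 else list(labels)
    return (w_dice * soft_dice_loss(probs, targets)
            + w_ce * cross_entropy_loss(probs, targets))
```

In a real training run the same idea would be expressed on tensors in the framework nnU-net uses; the list-based version only shows how the terms relate.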
iii. VoxelMorph Algorithm
VoxelMorph is a fast, unsupervised, learning-based framework for deformable, pairwise medical image registration [3]. Unlike traditional registration methods, it formulates registration as a function that maps a pair of input images to a deformation field that aligns them. This function is parameterized by a convolutional neural network, which is trained to optimize a registration objective function. In the first setting, the model is trained to maximize standard image-matching objective functions based on image intensities. In the second setting, auxiliary segmentations available in the training data are leveraged, which increases accuracy when predicting on test datasets.
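For reference, the two training settings can be written as loss functions, following the notation of the VoxelMorph paper [3] as we understand it. Here $f$ and $m$ are the fixed and moving images, $\phi$ is the predicted deformation field, $s_f$ and $s_m$ are the corresponding segmentations, and $\lambda$ and $\gamma$ are weighting hyperparameters:

```latex
% Setting 1: unsupervised, intensity-based similarity plus smoothness
\mathcal{L}_{us}(f, m, \phi) = \mathcal{L}_{sim}(f, m \circ \phi) + \lambda \, \mathcal{L}_{smooth}(\phi)

% Setting 2: adds an auxiliary segmentation overlap term
\mathcal{L}_{a}(f, m, s_f, s_m, \phi) = \mathcal{L}_{us}(f, m, \phi) + \gamma \, \mathcal{L}_{seg}(s_f, s_m \circ \phi)
```

The specific choices of $\mathcal{L}_{sim}$, $\mathcal{L}_{smooth}$, and $\mathcal{L}_{seg}$ for this project are not stated in this document and are left open here.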
iv. nnUNet Model Validation
One common measure for model validation is the Dice similarity coefficient (DSC), a score that measures the volumetric overlap between two segmentations. However, because the eustachian tube is a long, narrow structure, small boundary errors cause large drops in overlap, so the conventional use of DSC is not appropriate for our project. We will therefore use the following metrics, which better capture structural similarity.
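A small toy example illustrates why DSC is harsh on thin structures. The voxel coordinates and shapes below are invented for illustration only: a one-voxel shift wipes out the overlap of a one-voxel-wide tube, while the same shift barely affects a thick block.

```python
def dice_coefficient(pred_voxels, truth_voxels):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    pred, truth = set(pred_voxels), set(truth_voxels)
    if not pred and not truth:
        return 1.0
    return 2.0 * len(pred & truth) / (len(pred) + len(truth))


# Thin tube: one voxel wide, 20 voxels long (hypothetical shape).
tube = {(5, 5, z) for z in range(20)}
shifted_tube = {(6, 5, z) for z in range(20)}

# Thick block: 10 x 10 x 20 voxels (hypothetical shape).
block = {(x, y, z) for x in range(10) for y in range(10) for z in range(20)}
shifted_block = {(x + 1, y, z) for (x, y, z) in block}

thin_dsc = dice_coefficient(shifted_tube, tube)      # no overlap at all
thick_dsc = dice_coefficient(shifted_block, block)   # most voxels still overlap
```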
Heat Map
For each vertex of the predicted mesh, compute the distance to the closest point on the ground truth mesh, then render these distances as a heat map on the predicted mesh.
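The per-vertex distances behind such a heat map could be computed as in the sketch below. It uses a brute-force search over vertex coordinates and approximates the ground truth mesh by its vertices; both are simplifying assumptions, and the function names are hypothetical.

```python
import math


def closest_distances(pred_vertices, truth_vertices):
    """Distance from each predicted vertex to its nearest ground truth vertex."""
    return [
        min(math.dist(p, t) for t in truth_vertices)
        for p in pred_vertices
    ]


def normalize(values):
    """Scale distances to [0, 1] so they can be mapped onto a color scale."""
    top = max(values)
    return [v / top if top > 0 else 0.0 for v in values]


# Toy example: the third predicted vertex is 3 units off the ground truth.
pred = [(0, 0, 0), (1, 0, 0), (2, 0, 3)]
truth = [(0, 0, 0), (1, 0, 0), (2, 0, 0)]
distances = closest_distances(pred, truth)
colors = normalize(distances)
```

For realistic mesh sizes a spatial index such as a KD-tree would be preferable to the quadratic search shown here.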
Mean Surface Distance
The mean surface distance, dmean, measures the distance between the surface (S) and the reference surface (Sref). Here d(S,Sref) is the mean of the distances between every surface voxel in S and the closest surface voxel in Sref, and d(Sref,S) is computed analogously.
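A minimal sketch of this metric follows. It assumes that dmean is the average of the two directed distances d(S,Sref) and d(Sref,S), which is a common convention but is not spelled out in the text above; surfaces are represented as lists of voxel coordinates.

```python
import math


def directed_mean_distance(surface, reference):
    """Mean distance from each voxel in `surface` to its nearest voxel in `reference`."""
    return sum(
        min(math.dist(p, q) for q in reference) for p in surface
    ) / len(surface)


def mean_surface_distance(surface, reference):
    """Assumed symmetric form: average of the two directed mean distances."""
    return 0.5 * (directed_mean_distance(surface, reference)
                  + directed_mean_distance(reference, surface))


# Toy example: the reference has an extra far-away voxel, so the two
# directed distances differ.
S = [(0, 0, 0), (1, 0, 0)]
S_ref = [(0, 0, 1), (1, 0, 1), (5, 0, 1)]
d_forward = directed_mean_distance(S, S_ref)
d_backward = directed_mean_distance(S_ref, S)
d_mean = mean_surface_distance(S, S_ref)
```

The example shows why both directions matter: measuring only from S to Sref would miss the stray reference voxel.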
Weighted Hausdorff Distance (WHD)
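The maximum Hausdorff distance (HD) is the maximum distance from a point in one set to the nearest point in the other set. More formally, the maximum Hausdorff distance from set X to set Y is a max-min function, defined as:

$$ d_H(X, Y) = \max_{x \in X} \min_{y \in Y} d(x, y) $$

where d(x, y) is the distance between points x and y.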
WHD is similar to the maximum HD, but it is weighted by a probability map over the region of interest: regions with larger weights contribute more to the distance, and vice versa. The purpose of WHD is to let the clinician focus on the most important regions, or the regions they expect to observe.
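The max-min definition can be sketched directly in code. The weighted variant below is only one possible reading of WHD, in which each point's nearest-neighbor distance is scaled by a weight taken from the probability map. That weighting rule is an assumption for illustration, since the exact WHD formula is not given here.

```python
import math


def directed_hausdorff(xs, ys):
    """Max over x in xs of the distance to the nearest y in ys."""
    return max(min(math.dist(x, y) for y in ys) for x in xs)


def hausdorff(xs, ys):
    """Symmetric Hausdorff distance: the larger of the two directed distances."""
    return max(directed_hausdorff(xs, ys), directed_hausdorff(ys, xs))


def weighted_directed_hausdorff(xs, ys, weights):
    """Hypothetical weighting: scale each point's nearest distance by its weight."""
    return max(
        w * min(math.dist(x, y) for y in ys)
        for x, w in zip(xs, weights)
    )


# Toy example: the outlier at x = 4 dominates the unweighted distance,
# but a low weight reduces its influence.
X = [(0, 0, 0), (4, 0, 0)]
Y = [(0, 0, 0), (1, 0, 0)]
hd_xy = directed_hausdorff(X, Y)
hd_sym = hausdorff(X, Y)
whd_xy = weighted_directed_hausdorff(X, Y, weights=[1.0, 0.5])
```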
| Dependency | Solution | Alternative | Status | Deadline | Effect if not met |
|---|---|---|---|---|---|
| Computation | Remote GPU access at Homewood | Google Colab or MARCC | Obtained GPU Access | Feb 15 | Will not be able to train neural network |
| Imaging Dataset | Access to De-identified Head CTs | Public Dataset | Obtained access to JHU Dataset | Mar 15 | Will not be able to segment ROI |
| Imaging Labels | Manual Segmentations via 3D Slicer | Public Dataset with Labels | Currently performing manual segmentation of the CTs | Mar 25 | Will not be able to train neural network |
Reading Material
References
Source + Version Control (GitHub): https://github.com/mikami520/CIS2-EustachianTube
Eustachian tube Dataset: https://teams.microsoft.com/l/team/19%3aM09Sep-UUQloVvhUHp_OQk7sKNRhfUd6s2LqZgAPBIg1%40thread.tacv2/conversations?groupId=fef5d618-9991-4439-9a14-fedc6818965b&tenantId=9fa4f438-b1e6-473b-803f-86f8aedf0dec
nnUNet (Source Code): https://github.com/MIC-DKFZ/nnUNet
VoxelMorph (Source Code): https://github.com/voxelmorph/voxelmorph
DeepReg (Source Code): https://github.com/DeepRegNet/DeepReg
MONAI (Source Code): https://github.com/Project-MONAI/MONAI