
Predicting Hemorrhage Related Outcomes with CT Volumetry for Traumatic Hemothorax

Last updated: May 6, 2021

Summary

We are developing an automated method of quantifying blood volume from CT scans of patients with hemothoraces (accumulation of blood in the pleural sac surrounding the lungs). To do this, deep segmentation networks extract hemothorax pixels from each image slice, and the segmented voxels are then summed to give a volume.

Students: Benjamin Albert, Chang Yan, Gary Yang
Mentor(s): Dr. David Dreizin, Dr. Mathias Unberath

Introduction

Patients with hemothorax (HTX), an accumulation of blood in the pleural cavity around the lungs, are currently treated based on qualitative estimates of blood volume from CT scans. To automate this analysis, deep neural networks are employed to segment hemothoraces from patient CT scans. The network segmentation is converted to an estimated volume, yielding an adjusted R of 0.91 compared with volumes manually segmented by radiologists. This predicted volume is then used as a predictor for a composite outcome variable: the patient requires massive transfusion or dies. Combined with clinical data, a random forest classifier achieves an auROC of 0.944, indicating strong predictive capability for the composite variable.

Background

The current standard for estimating hemothorax volume is a qualitative grading done by radiologists. However, such measurement is subjective, and its reliability depends on the radiologist's level of experience. Even expert radiologists often disagree on the qualitative estimates. In contrast, manual segmentation produces a precise, quantifiable blood volume, but it takes too long to be practical, especially for trauma cases. Because an accurate hemothorax volume estimate can assist physicians in predicting patient outcomes, such as the need for massive transfusion and mortality, there is a need for a fast and reliable method to segment hemothorax CT scans and estimate the corresponding volume.

There has been no prior attempt at automatic hemothorax volumetry, but researchers have tried to semi-automate volumetry for pleural effusion, a comparable condition in which excess fluid, including water and blood, accumulates around the lung [4]. However, the methods used are generally rule-based or atlas-based, and cannot sufficiently handle anatomical distortion, heterogeneity of attenuation, and traumatic lung scenes.

We chose to develop a deep learning model because deep learning has been shown to perform well in other segmentation tasks. U-Net [1] and U-Net 3D [2] are two of the most widely used deep neural networks in medical image segmentation.

Deliverables

Minimum: (February 8th - March 22nd)
- An application that takes a set of axial CT scans as input and outputs estimated hemothorax volume (0 if not present).
- A benchmark report evaluating the performance of all the open-source models we used
Expected: (March 8th - April 5th)
- Implemented PIPO-FAN deep network and other testing networks
- Full program using the combined UNet-FAN deep network to predict hemothorax volume
- Result and visualization report for the performance of our program
Maximum: (March 29th - April 30th)
- A trained model to predict the composite outcome of each patient given their hemothorax volume and clinical data
- Result report for the performance of our prediction

Technical Approach

Preprocessing

Dr. Dreizin manually segmented the hemothorax volumes on 94 patient CT scans. Of these, 79 are suitable for the task; the other 15 have corrupt metadata in which voxel dimensions are distorted or nonsensical (e.g. some patients appear to be 12 m tall or impossibly proportioned).

After removing these 15 scans, the remaining data are manually cropped to the chest. Most of the original data are full-body CT scans, so approximately 70 GB of the 90 GB total is removed through cropping. Scans are cropped on the z-axis from below the liver up to the neck. The axial perspectives are preserved.

The cropped data are then converted from NIfTI format to 3D numpy arrays with isotropic voxels. To transform the NIfTI data, Python scripts interpolate the data to 1 mm cubic voxels, after which they are saved to disk in compressed format, requiring 7-8 GB in total for the compressed scans and segmentation masks. Decompressed, the data are approximately 18 GB.
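The interpolation step can be illustrated with a minimal numpy-only sketch. It assumes the scan is already loaded as a 3D array and that the per-axis voxel spacing in millimetres is known from the NIfTI header; `resample_axis` and `resample_isotropic` are hypothetical names, and the project's own scripts may use a different interpolation routine.

```python
import numpy as np

def resample_axis(volume, axis, spacing_mm, target_mm=1.0):
    """Linearly interpolate `volume` along one axis to `target_mm` spacing."""
    n_old = volume.shape[axis]
    extent_mm = (n_old - 1) * spacing_mm            # distance between first and last voxel centers
    n_new = int(np.floor(extent_mm / target_mm + 1e-9)) + 1
    old_pos = np.arange(n_old) * spacing_mm         # physical positions of the original samples
    new_pos = np.arange(n_new) * target_mm          # physical positions of the resampled grid
    return np.apply_along_axis(
        lambda line: np.interp(new_pos, old_pos, line), axis, volume
    )

def resample_isotropic(volume, spacing_mm, target_mm=1.0):
    """Apply linear interpolation axis by axis (separable trilinear interpolation)."""
    out = np.asarray(volume, dtype=np.float64)
    for axis, spacing in enumerate(spacing_mm):
        out = resample_axis(out, axis, spacing, target_mm)
    return out.astype(np.float32)

# Toy example: a 5x4x3 volume with (1.5, 1.0, 2.0) mm voxels.
toy = np.arange(5, dtype=np.float64)[:, None, None] * 1.5 * np.ones((5, 4, 3))
iso = resample_isotropic(toy, spacing_mm=(1.5, 1.0, 2.0))
print(iso.shape)
```

Linear interpolation is appropriate for the scan intensities; a segmentation mask resampled this way would need to be re-thresholded (or resampled with nearest-neighbour interpolation) to stay binary.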

The input is then padded to a fixed size for use in neural networks. To do this, the maximum shape over all 79 input volumes is found, after which each volume is padded along its borders so that the unpadded volume sits in the center of the padded volume. Data are padded with -1024 to approximate the Hounsfield unit value of air.
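The centered padding described above can be sketched as follows. This is an illustration under assumptions, not the project's code: `pad_to_shape` is a hypothetical helper, and the target shape is taken to be the per-axis maximum over all volumes.

```python
import numpy as np

AIR_HU = -1024  # fill value used in the text to approximate air

def pad_to_shape(volume, target_shape, fill=AIR_HU):
    """Pad `volume` with `fill` so that it sits in the center of `target_shape`."""
    pads = []
    for size, target in zip(volume.shape, target_shape):
        if size > target:
            raise ValueError("volume is larger than the target shape")
        total = target - size
        before = total // 2                  # any odd remainder goes on the far side
        pads.append((before, total - before))
    return np.pad(volume, pads, mode="constant", constant_values=fill)

# Toy example with two small volumes of different shapes.
volumes = [np.zeros((2, 3, 4)), np.ones((4, 3, 2))]
target = tuple(int(m) for m in np.max([v.shape for v in volumes], axis=0))
padded = [pad_to_shape(v, target) for v in volumes]
print(target, [p.shape for p in padded])
```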

Lastly, the data are rescaled in two ways for experimentation: normalization and standardization. Normalization is a simple linear scaling to the bounds [0, 1]. Standardization shifts the mean to zero and scales the data to a standard deviation of 1; this method does not bound the data.
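The two rescaling schemes can be written in a few lines of numpy. Whether the statistics are computed per volume or over the whole dataset is not stated in the text; the sketch below assumes per-volume statistics, and the function names are illustrative.

```python
import numpy as np

def normalize(volume):
    """Linear scaling of the intensities to the bounds [0, 1]."""
    lo, hi = float(volume.min()), float(volume.max())
    if hi == lo:                       # constant volume: avoid division by zero
        return np.zeros_like(volume, dtype=np.float64)
    return (volume - lo) / (hi - lo)

def standardize(volume):
    """Zero mean and unit standard deviation; the result is unbounded."""
    std = float(volume.std())
    if std == 0.0:
        return np.zeros_like(volume, dtype=np.float64)
    return (volume - volume.mean()) / std

scan = np.array([-1024.0, -500.0, 0.0, 40.0, 1000.0])   # toy Hounsfield values
norm = normalize(scan)
stdz = standardize(scan)
print(norm.min(), norm.max())
```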

The dataset consists of axial CT scans with 1.5 mm voxel resolution from 94 patients. In total, the dataset is approximately 90 GB before preprocessing. Preprocessing involves three primary stages: smoothing, interpolation, and construction of sagittal and coronal perspectives. Smoothing is necessary to fill holes that are erroneously present from noisy manual labelling; it is applied only to the segmentation masks. Trilinear interpolation is used to generate 1 mm voxels so that additional sagittal/coronal slices can be generated. This is useful for network training as it smooths the objective functions. However, in total, the preprocessing multiplies the dataset size in memory 4.5-fold, reaching approximately 400 GB and averaging 4 GB per patient.

3/16 update: 15 of the 94 instances have corrupt metadata that makes them unusable; the dimensions of each voxel are unknown. These instances are removed. Cropping is performed on the remaining instances so that each is only a chest CT scan: the axial perspective is preserved while the z-axis is cropped to the neck and below the liver. Additionally, the 1 mm cubic voxels are downsampled to 2 mm to reduce the memory footprint, and the volumes are padded to 256x192x256. Each instance, including both the scan and the mask, is then 144 MB. Larger networks will require downsampling to 8 mm cubic voxels.
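The memory figures in this update can be sanity-checked with a short calculation. Reading "144 MB" as 144 MiB is an assumption; under that reading the numbers work out to exactly 12 bytes per voxel, which would be consistent with, for example, three 4-byte arrays per instance, although the storage layout is not stated in the text.

```python
# Padded volume shape from the text.
shape = (256, 192, 256)
voxels = shape[0] * shape[1] * shape[2]

# Assumption: "144 MB" means 144 MiB (144 * 1024**2 bytes).
instance_bytes = 144 * 1024**2
bytes_per_voxel = instance_bytes / voxels

# Doubling the voxel edge length divides the voxel count by 2**3.
reduction_1mm_to_2mm = (2 / 1) ** 3
reduction_2mm_to_8mm = (8 / 2) ** 3

print(voxels)                  # 12582912
print(bytes_per_voxel)         # 12.0
print(reduction_1mm_to_2mm)    # 8.0
print(reduction_2mm_to_8mm)    # 64.0
```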

4/29 update: Of the 94 patients, only 78 had valid CT imagery as some had corrupted metadata which prevented the calculation of the ground truth hemothorax volume. After removing these erroneous instances, another case was removed for not having all clinical variables. Therefore, the prognostics dataset used 77 patient cases.

Deep Learning

Three deep networks are evaluated: UNet (2.5D) [1], UNet 3D [2], and UNet-FAN. UNet-FAN, the architecture of which is illustrated below, was developed as a combination of UNet (2.5D) and PIPO-FAN [3]. PIPO-FAN yielded poor validation performance, so the trained UNet models were used in place of the PIPO module to train the FAN scale-invariant attention module post hoc. This transfer learning approach allowed the FAN module to apply attention mechanisms to the multiscale features learned by UNet, slightly improving the Dice score.

It was observed that training deep segmentation networks on the left and right lungs individually yielded better Dice scores than training on the union of the left and right masks. The final predicted volume is computed from the union of the left and right prediction masks.
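As a minimal sketch of how two prediction masks could be merged and converted to a volume, the snippet below takes the voxel-wise union and multiplies the voxel count by the voxel size. The function name and the default 1 mm voxel size are assumptions for illustration; the voxel size must match whatever resolution the masks were predicted at.

```python
import numpy as np

def hemothorax_volume_ml(left_mask, right_mask, voxel_mm=(1.0, 1.0, 1.0)):
    """Volume in mL of the union of two binary masks with the given voxel size."""
    union = np.logical_or(left_mask, right_mask)
    voxel_mm3 = float(np.prod(voxel_mm))
    return float(union.sum()) * voxel_mm3 / 1000.0   # 1 mL = 1000 mm^3

# Toy masks: two 1000-voxel blocks that overlap in 500 voxels.
left = np.zeros((20, 20, 20), dtype=bool)
right = np.zeros((20, 20, 20), dtype=bool)
left[:10, :10, :10] = True
right[5:15, :10, :10] = True

vol_1mm = hemothorax_volume_ml(left, right)                       # 1500 voxels of 1 mm^3
vol_2mm = hemothorax_volume_ml(left, right, voxel_mm=(2, 2, 2))   # 1500 voxels of 8 mm^3
print(vol_1mm, vol_2mm)   # 1.5 12.0
```

Taking the union rather than adding the two volumes avoids double-counting any voxel that both networks label as hemothorax.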

Below is a box plot of the Dice scores of all deep network models used. UNet-FAN achieves a slightly higher Dice score than UNet and a much higher one than UNet 3D for all data: left lung, right lung, and the union of the lungs.
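For reference, the Dice score between a predicted mask and a ground-truth mask is twice the overlap divided by the sum of the two mask sizes. The sketch below is a generic implementation; treating two empty masks as a perfect match is a convention chosen here, not something specified in the text.

```python
import numpy as np

def dice_score(pred, truth):
    """Dice similarity coefficient between two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0          # convention: both masks empty counts as perfect agreement
    return 2.0 * np.logical_and(pred, truth).sum() / denom

# Toy masks: two 1000-voxel blocks sharing 500 voxels.
pred = np.zeros((20, 20, 20), dtype=bool)
truth = np.zeros((20, 20, 20), dtype=bool)
pred[:10, :10, :10] = True
truth[5:15, :10, :10] = True

print(dice_score(pred, truth))   # 0.5
```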

Machine Learning

All work was conducted on the cluster outlined below:

Machine learning was applied to predict a composite variable: whether a patient needed a massive transfusion and/or died in the hospital. Univariate analysis was conducted to ascertain the predictive power of the expert volume estimation and the automatically predicted hemothorax volumes for the composite outcome variable. Logistic regression was applied to each of the independent volume variables: qual (manual expert volume estimation), U-Net 3D, U-Net (2.5D), and a model called U-Net-FAN, which is a combination of U-Net with PIPO-FAN. The logistic regression results, averaged over 5-fold cross-validation, are outlined in the tables below:

In addition to univariate analysis, clinical features are used for multivariate predictions. Eight machine learning models [6] are evaluated: logistic regression, Bayesian network with global tabu architecture search, discrete naive Bayes, Gaussian naive Bayes, decision table, linear support vector machine, RBF support vector machine, and random forest. The random forest models performed best, demonstrating comparable performance between the manual and automatic features for predicting the composite variable. The random forest results are detailed in the tables below:

Visualization

Results

Figure A: Dot matrix plot with best-fit line and 95% CI shows a correlation between automated volume (vol.) and manual hemothorax volume. The predictions from human experts and our deep learning model are consistent.

Figure B: Bland-Altman plot shows 95% limits of agreement and measurement bias. On average, there is a 0.6-mL underestimation by the deep learning algorithm. The bias is relatively small and the standard deviation is 155.6 mL.
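The quantities in a Bland-Altman analysis can be computed as in the sketch below, which uses the common convention of bias ± 1.96 standard deviations of the paired differences for the 95% limits of agreement. The function name and the toy volumes are illustrative only and are not project data.

```python
import numpy as np

def bland_altman(auto_ml, manual_ml):
    """Return (bias, lower limit, upper limit) for paired volume measurements."""
    diff = np.asarray(auto_ml, dtype=float) - np.asarray(manual_ml, dtype=float)
    bias = diff.mean()                 # negative values mean the automated method underestimates
    sd = diff.std(ddof=1)              # sample standard deviation of the differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd

# Toy paired volumes in mL (made up for illustration).
auto = [10.0, 20.0, 30.0]
manual = [12.0, 19.0, 31.0]
bias, lower, upper = bland_altman(auto, manual)
print(bias, lower, upper)
```

If the reported 155.6 mL is the standard deviation of the differences, this convention would place the limits of agreement at roughly -305.6 mL and +304.4 mL around the -0.6 mL bias.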

Figure C: Distribution of Dice similarity coefficients (DSCs). The box plot in C2 shows that DSC improves and its variance decreases with increasing volume over the range 0-600 mL (Levene's test, p < 0.00001), explaining the low DSCs in rows 4 and 5 (image left). Above 600 mL there are only 7 instances, some of which are outliers, so the deep network does not learn this range well and its behavior there is unclear.

Figure D: Clustered box and whisker plots show prediction of a composite outcome of the need for massive transfusion and in-hospital mortality. Manual and HTXvol-auto volumes both have a significant association with the composite outcome (MT + IHM), with p = 0.0003 and 0.015 respectively.

In general, the volume predicted by the deep network and the manually segmented volume are highly associated, with adjusted R = 0.91, and the bias is very low at -0.6 mL. The Dice similarity coefficient improves and its variance decreases as volume increases. Small hemothoraces, which have lower Dice scores, are clinically insignificant compared with larger accumulations of blood, so it is most important that the automated volume estimates perform well for larger hemothoraces. Both manual and predicted volumes have a significant association with the requirement for massive transfusion and in-hospital mortality. Using the automatically predicted volume and 6 patient variables (age, sex, HR, BP, lactate, and injury type: blunt or penetrating), a random forest model predicts the composite outcome of MT+IHM with an auROC of 0.9440. This is at least as good as using expert information from 2 radiologists. The results suggest that the automated methods can replace expert analysis with comparable performance, thereby reducing cost and labor and improving access to accurate prognostics.
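The auROC quoted above can be read as the probability that a randomly chosen positive case (massive transfusion or in-hospital death) receives a higher predicted score than a randomly chosen negative case. A minimal numpy sketch of that pairwise definition follows; the function name and the toy labels and scores are illustrative and are not the project's data or evaluation code.

```python
import numpy as np

def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison; ties count as one half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[labels], scores[~labels]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("both classes must be present")
    diff = pos[:, None] - neg[None, :]          # every positive compared with every negative
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))

# Toy example: 1 = composite outcome occurred, scores = predicted probabilities.
y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(auroc(y, p))   # 0.75
```

The pairwise form is quadratic in the number of cases, which is not a concern for a cohort of this size.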

Dependencies

Milestones and Status

Preparation: (February 8th - March 1st)
- Literature and model selection
  - Feb 8 - Feb 22
  - Status: Done
- Environment setup
  - Feb 15 - Feb 22
  - Status: Done
- Data preprocessing
  - Feb 19 - Mar 1
  - Status: Done
Minimum: (February 8 - March 21)
- Interpolate CT scans and convert data to PyTorch tensor type
  - Feb 22 - Mar 1
  - Status: Done
- Build a network framework consisting of Python classes
  - Feb 22 - Mar 1
  - Status: Done
- Implement and train the open-source models selected
  - Mar 1 - Mar 15
  - Status: Done
- Benchmark existing open-source models with Dice/Jaccard
  - Mar 8 - Mar 21
  - Status: Done
Expected: (March 21 - April 11)
- Implement several multiscale models
  - ~~Mar 8 - Mar 15~~ Mar 21 - Apr 4
  - Status: Done
- Implement a combined model (UNet-FAN) that outperforms others
  - ~~Mar 15 - Mar 29~~ Apr 4 - Apr 11
  - Status: Done
- Documentation
  - Mar 22 - Apr 5
  - Status: Done
Maximum: (March 29th - April 30th)
- Build a model to predict the composite outcome of patients
  - Apr 15 - May 4
  - Status: Done
- An algorithm that computes and visualizes certainty for segmentation
  - ~~Mar 29~~ Apr 11 - Apr 30
  - Status: Cancelled
- ~~A GUI-program that incorporates the framework + models~~
  - ~~Apr 12 - Apr 30~~
  - ~~Status: Planned~~

Reports and presentations

Project Plan
- plan_proposal.pdf
- plan_presentation.pdf
Project Background Reading
- See Bibliography below for links.
Project Checkpoint
- checkpoint_presentation.pdf
Paper Seminar Presentations
Project Final Presentation
Project Final Report
- final_report.pdf

GITHUB REPO

https://github.com/benjaminalbert/CIS_2

Bibliography

Dreizin, D., Zhou, Y., Zhang, Y., Tirada, N., & Yuille, A. L. (2020). Performance of a Deep Learning Algorithm for Automated Segmentation and Quantification of Traumatic Pelvic Hematomas on CT. Journal of digital imaging, 33(1), 243–251. https://doi.org/10.1007/s10278-019-00207-1

Dreizin, D., Zhou, Y., Fu, S., Wang, Y., Li, G., Champ, K., Siegel, E., Wang, Z., Chen, T., & Yuille, A. L. (2020). A Multiscale Deep Learning Method for Quantitative Visualization of Traumatic Hemoperitoneum at CT: Assessment of Feasibility and Comparison with Subjective Categorical Estimation. Radiology. Artificial intelligence, 2(6), e190220. https://doi.org/10.1148/ryai.2020190220

Zeiler, J., Idell, S., Norwood, S., & Cook, A. (2020). Hemothorax: A Review of the Literature. Clinical pulmonary medicine, 27(1), 1–12. https://doi.org/10.1097/CPM.0000000000000343

Sangster, G. P., González-Beicos, A., Carbo, A. I., Heldmann, M. G., Ibrahim, H., Carrascosa, P., Nazar, M., & D'Agostino, H. B. (2007). Blunt traumatic injuries of the lung parenchyma, pleura, thoracic wall, and intrathoracic airways: multidetector computer tomography imaging findings. Emergency radiology, 14(5), 297–310. https://doi.org/10.1007/s10140-007-0651-8

Yao, J., Bliton, J., & Summers, R. M. (2013). Automatic segmentation and measurement of pleural effusions on CT. IEEE transactions on bio-medical engineering, 60(7), 1834–1840. https://doi.org/10.1109/TBME.2013.2243446

B. A. Albert, “Deep Learning From Limited Training Data: Novel Segmentation and Ensemble Algorithms Applied to Automatic Melanoma Diagnosis,” in IEEE Access, vol. 8, pp. 31254-31269, 2020, https://doi.org/10.1109/ACCESS.2020.2973188

Çiçek Ö., Abdulkadir A., Lienkamp S.S., Brox T., Ronneberger O. (2016) 3D U-Net: Learning Dense Volumetric Segmentation from Sparse Annotation. In: Ourselin S., Joskowicz L., Sabuncu M., Unal G., Wells W. (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2016. MICCAI 2016. Lecture Notes in Computer Science, vol 9901. Springer, Cham. https://doi.org/10.1007/978-3-319-46723-8_49

F. Milletari, N. Navab and S. Ahmadi, “V-Net: Fully Convolutional Neural Networks for Volumetric Medical Image Segmentation,” 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 2016, pp. 565-571, https://doi.org/10.1109/3DV.2016.79.

Chen, S., Ma, K., & Zheng, Y. (2019). Med3D: Transfer Learning for 3D Medical Image Analysis. ArXiv, abs/1904.00625. https://arxiv.org/abs/1904.00625

Wang Y, Dou H, Hu X, Zhu L, Yang X, Xu M, Qin J, Heng PA, Wang T, Ni D. Deep attentive features for prostate segmentation in 3D transrectal ultrasound. IEEE transactions on medical imaging. 2019 Apr 25;38(12):2768-78.

Pawlowski N, Castro DC, Glocker B. Deep structural causal models for tractable counterfactual inference. arXiv preprint arXiv:2006.06485. 2020 Jun 11.

