Last Updated: 5/09/2019
Our objective is to determine whether we can identify VAP risk early by analyzing time-series data and other identifying markers gathered in the PICU. A multidisciplinary team is working on this issue; we are working specifically with the radiology team to focus on X-ray image changes associated with VAP risk. Our goal is to develop an algorithm that can accurately predict whether a patient has VAP by analyzing time-series images collected over the course of the patient's hospital stay.
Clinical Background: Ventilator-associated pneumonia (VAP) "specifically refers to pneumonia developing in a mechanically ventilated patient more than 48 hours after tracheal intubation." (1) Mechanical ventilation is a critical, life-sustaining ICU therapy; while ventilated, the body is in a fragile state and susceptible to disease, including the bacterial infections attributed to VAP. Other risks include further disease progression, volume overload, iatrogenic infection, and ventilator-induced injury. VAP's toll in the ICU is such that it is now the leading cause of mortality among nosocomial infections and the leading cause of nosocomial morbidity (1). Specifically, acquiring VAP increases the risk of morbidity by 30% for any given patient. This is compounded by the fact that 10-20% of ICU patients are diagnosed with VAP annually. Ventilator-associated complications also correlate with a much greater length of stay and time under ventilation, which places greater strain on the entire healthcare value chain: the provider, the insurer, and, most importantly, the patient.
Motivation and significance: While a multidisciplinary team is working on the issues arising from VAP in the ICU (including the PICU team working on identifying biomarkers and the ID team working on appropriate cultures and antibiotics), there has not been a comprehensive study connecting the radiology component of monitoring patients with VAP and/or VAP risk. This is mainly due to the clinical data-collection process. The X-ray images are collected across many different hospitals, at different patient orientations, on different machines, and at either inspiration or expiration. There is no standardized collection process, which has led to an aggregation of thousands of X-ray images that are not easily ingestible by an algorithm meant to readily classify a specific patient's VAP risk. For example, the largest chest X-ray dataset of adult images, MIMIC-CXR, contains over 224,000 images of over 60,000 patients across various studies. Furthermore, when applying an algorithm to pediatric patients, we must account for the fact that these patients grow quickly over time, so any algorithm must handle the resulting changes in image dimensions.
Therefore, there is a need for a protocol that can aggregate X-ray information and an algorithm that can accurately help predict the occurrence of VAP based on image-related features. This manifests specifically in the following:
We hope our project will lead to a renewed focus in the ICU on these high-risk patients, while avoiding unnecessary therapy for low-risk patients.
The project workflow is split into the following main components:
The bulk of our time will be focused on the second component, testing models and determining the best-performing neural network to classify VAP occurrence on a static image.
Data Aggregation and Screening
First, we will assemble a database of chest X-ray images hosted through MARCC (Maryland Advanced Research Computing Center). This database will comprise data collected from publicly available datasets such as MIMIC-CXR, the NIH chest X-ray dataset, etc. We are focusing first on the publicly available datasets while we await IRB approval for the pediatric data from JHU. These datasets will be screened for abnormalities and outliers and cleaned accordingly. Initially, we believed we had access to time-series X-ray data; while this turned out not to be the case, there is still a wide scope of work we can accomplish with classification techniques.
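As a minimal sketch of the screening step, the fragment below flags images whose overall intensity is a statistical outlier relative to the rest of the batch (e.g., badly over- or under-exposed films). The z-score threshold and the synthetic data are illustrative assumptions, not part of our finalized protocol.

```python
import numpy as np

def screen_images(images, z_thresh=3.0):
    """Flag images whose mean intensity is an outlier relative to the batch.

    images: list of 2-D numpy arrays (grayscale X-rays).
    Returns the indices of images to review or exclude.
    """
    means = np.array([img.mean() for img in images])
    mu, sigma = means.mean(), means.std()
    if sigma == 0:
        return []
    z = np.abs(means - mu) / sigma
    return [i for i, score in enumerate(z) if score > z_thresh]

# Synthetic example: 20 "normal" films plus one badly over-exposed one.
rng = np.random.default_rng(0)
normal = [rng.normal(0.5, 0.05, (64, 64)) for _ in range(20)]
overexposed = [np.ones((64, 64))]
flagged = screen_images(normal + overexposed)  # flags only the last image
```

In practice the same pattern extends to other per-image statistics (contrast, aspect ratio, DICOM metadata checks) before anything enters the training pipeline.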
Assembling Input/Output Module – Convolutional Neural Network Classification
To first get a sense of how our system will analyze chest X-ray images, we will need to create a static-image predictor: choosing an apt neural network to correctly classify a patient's occurrence of VAP from a single image. Once we can train and identify the best-performing neural network for static-image prediction, we will be able to use the time-series images for a more accurate prognosis. Potential neural networks to consider include:
The main goal here is to reduce the number of connections between layers for more efficient and accurate processing.
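To make the static-image classifier concrete, here is a toy numpy forward pass of the basic CNN building blocks (convolution, ReLU, global average pooling, sigmoid head) producing a single probability from one image. The kernel and weights are arbitrary stand-ins, not trained values; a real model would be a deep pretrained network rather than this one-layer sketch.

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

def tiny_cnn_score(img, kernel, weight, bias):
    """Conv -> ReLU -> global average pool -> sigmoid, returning a probability."""
    fmap = np.maximum(conv2d(img, kernel), 0.0)             # conv + ReLU
    pooled = fmap.mean()                                    # global average pooling
    return 1.0 / (1.0 + np.exp(-(weight * pooled + bias)))  # sigmoid head

# Toy forward pass: an image with a vertical edge and a kernel that detects
# left-to-right intensity steps.
img = np.zeros((16, 16)); img[:, 8:] = 1.0
kernel = np.array([[-1.0, 1.0]])
p = tiny_cnn_score(img, kernel, weight=4.0, bias=-1.0)  # a value in (0, 1)
```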
Before ingesting the data into the training pipeline, we will analyze each neural network and its strengths (based on image properties). We will start by training the networks on a specified training dataset and subsequently testing them on a specified test set. After setting an accuracy threshold with our mentors and clinical collaborators, we can identify the best-performing neural network. We will then fine-tune its parameters to best fit our data, and perhaps split the data more rigorously (i.e., keeping multiple images of the same patient out of opposite sides of the split) so that we have the best-performing model possible.
Performing Saliency Mapping and Class Activation on Image Features In computer vision, a saliency map is an image that shows each pixel's unique quality. The goal of a saliency map is to simplify and/or change the representation of an image into something more meaningful and easier to analyze. This is highly important when applying our model to chest X-ray images. Beyond prediction alone, it is extremely helpful for physicians (and can make them more efficient) to highlight where exactly the source of pneumonia (or any other thoracic pathology) is on a given image. Previous work has been done to identify regions of interest for the physician. Thus, our aim is to apply saliency-mapping frameworks to highlight the relevant regions on the chest X-ray images after our model has correctly identified a case of pneumonia.
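One simple, model-agnostic way to build such a map is occlusion sensitivity: slide a patch over the image, re-score the occluded image, and record how much the classifier's score drops at each location. Large drops mark regions the model relies on. The sketch below uses a toy stand-in "classifier" (a callable returning a score) rather than our actual trained network; gradient-based saliency and class activation maps would replace it in the real pipeline.

```python
import numpy as np

def occlusion_saliency(model_score, image, patch=8, stride=8):
    """Occlusion-based saliency map for a single-channel image.

    model_score: any callable image -> float (the classifier's score).
    Returns a coarse grid of score drops; large values = important regions.
    """
    base = model_score(image)
    h, w = image.shape
    sal = np.zeros((h // stride, w // stride))
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = image.mean()  # gray out a patch
            sal[i // stride, j // stride] = base - model_score(occluded)
    return sal

# Toy "classifier" whose score is just the brightness of the top-left corner,
# so the saliency map should peak there.
def toy_score(img):
    return float(img[:8, :8].mean())

img = np.zeros((32, 32)); img[:8, :8] = 1.0
sal = occlusion_saliency(toy_score, img)  # maximum at the top-left cell
```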
Using Unsupervised Learning Methods to Cluster Images The most widely used convolutional neural networks are trained on ImageNet, which contains roughly 1 million images covering up to 1,000 labels. To increase performance, one could feasibly increase the dataset's size by a factor of 10-100x, but that would require far more manual annotation, placing a burden on human effort well beyond what is currently available in the data science community. Thus, it is imperative to produce a model that can learn generalizable visual features from any large-scale dataset without supervision. This is critical to the classification task in my project because of the complexity of X-ray data when applied to pneumonia diagnoses across different demographics (specifically ages), ventilation settings, and machines. One goal of our collaborators is to apply the model in the PICU; however, most of the publicly available data covers only adult chest X-rays. Applying unsupervised clustering methods might unlock a generalization of pneumonia features that was previously unavailable with supervised learning. Thus, for my maximum deliverable, I will apply the DeepCluster technique developed by Caron et al. to cluster the features produced by the CNN. The insight here is to cluster features instead of direct labels, which could prove more important in thoracic pathology classification.
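DeepCluster alternates two steps: (1) run k-means on the CNN's features to obtain pseudo-labels, then (2) train the CNN to predict those pseudo-labels, repeating each epoch. The sketch below shows only step (1), with random stand-in features in place of real network activations; a plain Lloyd's k-means with farthest-point initialization is used here instead of the paper's full pipeline.

```python
import numpy as np

def kmeans(X, k, n_iter=20):
    """Plain Lloyd's k-means with farthest-point initialization.

    Returns one pseudo-label per row of X.
    """
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])  # next center = farthest remaining point
    centers = np.array(centers)
    for _ in range(n_iter):
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Stand-in "CNN features": two well-separated 16-dimensional clusters. A real
# pipeline would use activations from the network's penultimate layer instead.
rng = np.random.default_rng(0)
features = np.vstack([
    rng.normal(0.0, 0.1, (50, 16)),
    rng.normal(1.0, 0.1, (50, 16)),
])
pseudo_labels = kmeans(features, k=2)  # pseudo-labels for the training step
```

In the full DeepCluster loop, these pseudo-labels become classification targets for the next training epoch, and the features are re-extracted and re-clustered afterward.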
Technical Appendix and Source Code: https://github.com/surajshah980/CISIIProject