Multivariate time series analysis of ICU mortality

Last updated: 2012/05/10 20:12 by Danning


  • Students: Danning He
  • Mentor(s): Dr. Jim Fackler, Dr. Harold Lehmann

The modern ICU is a complex, expensive and resource-intensive environment. Admitted patients are usually in life-threatening conditions that require advanced medical care and invasive or noninvasive monitoring. Since the adjusted mortality rate is a useful marker of ICU quality, tools that quickly and accurately predict a patient’s mortality risk are of great significance. They allow better clinical decision-making by physicians and help control hospital expenses and manage medical resources.

The objective of this project is to develop a patient-specific mortality prediction model based on physiologic derangement during the first 48h of an ICU stay. Under a probabilistic framework, the risk features are defined as log likelihood ratios and aggregated by a logistic function to generate a probability score. The classification performance of the method is evaluated, the contribution of each individual feature is analyzed, and finally the limitations and possible extensions of this study are discussed.

Background, Specific Aims, and Significance

Time series modeling offers a rich set of methods, using either the time domain approach, which predicts the future value of a series as a parametric function of current and past observations, or the frequency domain approach, which characterizes periodic variations of interest. Real-world processes produce series of measurable observations as a function of underlying hidden states. Just as a clinical diagnosis is inferred from several observations with a significant degree of uncertainty, modeling multivariate physiologic profiles as generated by latent disease states or signatures can help reveal the manifestation of disease. For feature discovery or latent signature detection in univariate, continuous time series, various unsupervised learning approaches are available [1,2]. Their underlying assumption is that a fixed set of disease topics is common to the collection of time series (such as heart rate, HR) distributed among patient samples, and that each disease topic is in turn a distribution over the vocabulary of all ‘words’ in the corpus. The topic proportions can then be used as features for an explanatory grading task.

However, a common difficulty with multivariate physiological data is that measurements are irregular in time and frequency from patient to patient (variables are measured anywhere from once every 30 minutes to once every several hours, and not all of them are taken for each individual). Figure 1 shows 4 of the 37 physiological variables extracted from one patient in the data set. Therefore, for outcome prediction in the ICU, the majority of existing acuity models and severity scoring systems are based on supervised algorithms such as logistic regression or artificial neural networks, trained with static variables on admission [3], sequential assessment of organ dysfunction [4], daily adverse events [5], 24h acuity scores [6] and log likelihood ratios [7].


  • Minimum: (Expected by Week 7)
  1. Logistic regression with log odds ratios as risk features
  2. Performance evaluation: ROC, AUC
  • Expected: (Expected by Week 10)
  1. Minimum deliverables
  2. Feature analysis (Done)
  3. Incorporate dependencies between observations (Substitute with PCA)
  4. Try features constructed from standard HMM, Kalman Filter (On hold due to limitations of the data itself)
  • Maximum: (Expected by the end of the course)
  1. Expected deliverables
  2. Optimize features to achieve better classification performance (Done: separate counts and values)
  3. Documentation (Done)
  4. partial AUC (Substitute with ROC threshold analysis)

Technical Approach

I) Log likelihood ratio as risk features

The figure above shows the distributions of values (right panel) and counts (left panel) of three hepatic tests (ALT, ALP, AST) in cases vs. controls.

We adopted the previously developed Bayesian modeling paradigm [7] to capture the nonlinear relationships between the observed values/counts (physiological variables or laboratory tests) and the outcome (death/survival). For each risk factor xi, a parametric distribution of the observed values/counts is fitted for each class of patients, P(xi|death) and P(xi|survive), by maximum likelihood, choosing among five candidate long-tail probability distributions: exponential, Weibull, lognormal, normal and gamma. To also incorporate observations that are missing not at random, a discrete (Poisson) distribution of the measurement count T is fitted to obtain P(T|death) and P(T|survive). The log likelihood ratios of the risk imposed by the observed values and by the counts of measurements are both incorporated into the model. When a risk factor is completely missing in a patient (T=0), its log likelihood ratio is defined as log P(T=0|death)/P(T=0|survive).

where T is the total number of measurements for risk factor xi and
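The feature construction described above can be sketched in Python as follows. This is a minimal illustration and not the project's MATLAB code: all function names are hypothetical, only two of the five candidate distributions (exponential and normal, whose maximum likelihood estimates have closed forms) are included, and it assumes positive Poisson rates and non-constant training values.

```python
import math

def poisson_logpmf(t, lam):
    # log P(T = t) for a Poisson distribution with rate lam (assumes lam > 0)
    return t * math.log(lam) - lam - math.lgamma(t + 1)

def fit_poisson(counts):
    # The maximum likelihood estimate of the Poisson rate is the sample mean
    return sum(counts) / len(counts)

def count_llr(t, lam_death, lam_survive):
    # log P(T=t|death) - log P(T=t|survive); also covers the missing case T = 0
    return poisson_logpmf(t, lam_death) - poisson_logpmf(t, lam_survive)

def fit_exponential(values):
    # Closed-form MLE: rate = 1 / mean. Returns (log-likelihood, logpdf)
    rate = len(values) / sum(values)
    logpdf = lambda x: math.log(rate) - rate * x
    return sum(logpdf(v) for v in values), logpdf

def fit_normal(values):
    # Closed-form MLE: sample mean and (biased) sample variance
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    logpdf = lambda x: -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var)
    return sum(logpdf(v) for v in values), logpdf

def best_fit(values):
    # Keep the candidate with the highest maximized log-likelihood
    candidates = (fit_exponential(values), fit_normal(values))
    return max(candidates, key=lambda result: result[0])[1]

def value_llr(x, values_death, values_survive):
    # log P(x|death) - log P(x|survive) under the best-fitting class densities
    return best_fit(values_death)(x) - best_fit(values_survive)(x)

# Toy data (made up for illustration only)
lam_d = fit_poisson([1, 2, 3])   # measurement counts among patients who died
lam_s = fit_poisson([4, 5, 6])   # measurement counts among survivors
print(count_llr(0, lam_d, lam_s))                          # feature when T = 0
print(value_llr(10, [8, 9, 10, 11, 12], [1, 2, 3, 4, 5]))  # feature for a value
```

Weibull and gamma fits need numerical optimization, so a fuller version would typically rely on a statistics library rather than closed forms.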

In classical logistic regression, the probability of a particular class conditional on the data is defined via a logistic function that aggregates the individual risk features:

where n is the number of features and wi represents the weight of the contribution of xi to patient outcome.
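For reference, the standard logistic form consistent with these definitions can be written as below. The intercept w_0 and the symbol f(x_i), standing for the log likelihood ratio feature of risk factor x_i, are notational assumptions and may differ from the notation originally used.

```latex
P(\text{death} \mid x_1, \dots, x_n)
  = \frac{1}{1 + \exp\!\left(-\left(w_0 + \sum_{i=1}^{n} w_i \, f(x_i)\right)\right)}
```

With this form, a positive weight w_i means that a larger log likelihood ratio for x_i raises the predicted probability of death.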

The nonlinear transformation function associating each parameter with the risk of death vs. survival. The y-axis corresponds to P(death|value) and the x-axis corresponds to the observed values.

The left figure shows the ROC curves and the associated area under the curve (AUC) values for our method (10-fold cross-validation) and for SAPS-I and SOFA (based on all available samples). The overall classification performance of our method (AUC=0.801) is better than that of SAPS-I (AUC=0.638) and SOFA (AUC=0.628), although the latter two methods use fewer variables. The mortality rate in our data is 554/4000=0.1385. A threshold of 0.05 achieves a sensitivity of 0.945 and a specificity of 0.351: if we predict that patients with P(death)>0.05 die and those with P(death)<0.05 survive, we correctly identify 94.5% of the patients who died and 35.1% of the patients who survived. Alternatively, a threshold of 0.5, which predicts that patients with P(death)>0.5 die and those with P(death)<0.5 survive, correctly identifies 98.0% of the patients who survived but only 23.6% of the patients who died. The lower threshold improves sensitivity at the price of specificity, falsely rejecting some of the patients whose life-threatening conditions could have been reversed with intensive care, while the higher threshold increases specificity at the price of sensitivity, falsely accepting some of the patients who do not have a good chance of survival even with advanced treatment.
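The threshold trade-off discussed above can be reproduced on any set of predicted probabilities with a few lines of Python. This is a minimal sketch on made-up toy data; the function names are hypothetical, and the AUC is computed with the rank-based (Mann-Whitney) formulation rather than by integrating an ROC curve.

```python
def sensitivity_specificity(probs, died, threshold):
    # Predict death when P(death) > threshold, survival otherwise
    tp = sum(1 for p, d in zip(probs, died) if d and p > threshold)
    fn = sum(1 for p, d in zip(probs, died) if d and p <= threshold)
    tn = sum(1 for p, d in zip(probs, died) if not d and p <= threshold)
    fp = sum(1 for p, d in zip(probs, died) if not d and p > threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(probs, died):
    # Probability that a random patient who died is scored above a random
    # survivor; ties count as one half
    pos = [p for p, d in zip(probs, died) if d]
    neg = [p for p, d in zip(probs, died) if not d]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Toy predictions (illustrative only, not the project's data)
probs = [0.02, 0.04, 0.10, 0.30, 0.60, 0.80]
died = [0, 0, 0, 1, 0, 1]

print(sensitivity_specificity(probs, died, 0.05))  # low threshold
print(sensitivity_specificity(probs, died, 0.5))   # high threshold
print(auc(probs, died))
```

On the toy data the low threshold catches every death but misclassifies half of the survivors, while the high threshold reverses the balance, which mirrors the pattern reported for the real data.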

For the right figure, the Hosmer-Lemeshow (HL) test statistic asymptotically follows the chi-squared distribution with D-2 degrees of freedom. An ideal but unattainable score is 0, which corresponds to a cumulative probability of 0. Our H statistic equals 3.10, corresponding to a cumulative probability of 0.0719; the maximum accepted H statistic for this model is 15.507, corresponding to a cumulative probability of 0.95.
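The calibration check can be sketched as follows. The value 15.507 is the 0.95 quantile of the chi-squared distribution with 8 degrees of freedom, which is consistent with D = 10 risk groups. The code below is an illustrative sketch with hypothetical function names: it groups patients into equal-sized bins of sorted predicted risk, assumes no bin has a mean predicted risk of exactly 0 or 1, and uses a closed-form chi-squared CDF that is valid only for an even number of degrees of freedom.

```python
import math

def chi2_cdf_even_df(x, df):
    # Chi-squared CDF via the closed form for even df:
    # 1 - exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 0.0
    for j in range(df // 2):
        total += term
        term *= half / (j + 1)
    return 1.0 - math.exp(-half) * total

def hosmer_lemeshow(probs, died, groups=10):
    # H = sum over groups of (observed - expected)^2 / (expected * (1 - mean_p))
    pairs = sorted(zip(probs, died), key=lambda pair: pair[0])
    n = len(pairs)
    h = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)   # expected number of deaths
        observed = sum(d for _, d in chunk)   # observed number of deaths
        mean_p = expected / len(chunk)
        h += (observed - expected) ** 2 / (expected * (1.0 - mean_p))
    return h

print(chi2_cdf_even_df(3.10, 8))    # cumulative probability of the reported H
print(chi2_cdf_even_df(15.507, 8))  # cumulative probability of the cutoff
print(hosmer_lemeshow([0.1, 0.3, 0.7, 0.9], [0, 1, 1, 1], groups=2))
```

Because the reported H is rounded to 3.10, the first value agrees with the quoted 0.0719 only to about three decimal places.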

II) Discussion

The observed physiological signals and their dynamics over time are affected by many factors, from the intrinsic state of disease and the setup of the monitoring instruments to the medical interventions received by the patients. Most importantly, our data is a mixture of different patient cohorts with different baseline characteristics, but detailed information regarding their origins is not available. Also, the disease state manifests itself through physiology, which is measured by various digital equipment, so our observations are, again, a mixture of all these confounding variables. However, the logistic model adopts the strongest assumptions: there is only one latent variable controlling the patient’s binary outcome, all the physiological variables are independent and directly affect the outcome, and the model remains exactly the same over time and across patients. Further investigation could relax some of these assumptions and generalize the model.

The accuracy of an outcome-prediction model is also strongly affected by population characteristics and healthcare delivery systems, which change continuously and are becoming more and more important, justifying the need to ‘reinvent the wheel’ from time to time. Given the diversity and complexity of the medical interventions we can offer today, the physiological impact is actually much lower than it used to be, justifying the definition of ‘death’ as those whose extreme conditions cannot be reversed. Moreover, in terms of pragmatic usage, prediction models that differ in the number and types of variables required impose different data collection burdens. All of the above points are worth further investigation in the future.



Milestones and Status

  1. Milestone name: Burn-in
    • Planned Date: 2/23
    • Expected Date: 2/23
    • Status: finished
  2. Milestone name: Minimum expectation
    • Planned Date: 2/23
    • Expected Date: 3/26
    • Status: finished
  3. Milestone name: Expected expectation
    • Planned Date: 2/23
    • Expected Date: 5/1
    • Status: partially finished
  4. Milestone name: Project Report
    • Planned Date: 2/23
    • Expected Date: 5/8
    • Status: done

Reports and presentations

Project Bibliography

  1. Saria S, Koller D, Penn AA. (2010) Learning individual and population level traits from Clinical Temporal data. Neural Information Processing Systems.
  2. Imhoff M, Kuhls S. (2006) Alarm algorithms in critical care monitoring. Anesth Analg 102: 1525-1537.
  3. Zimmerman JE, Kramer AA, McNair DS, Malila FM. (2006) Acute Physiology and Chronic Health Evaluation (APACHE) IV: hospital mortality assessment for today's critically ill patients. Crit Care Med 34: 1297-1310.
  4. Ferreira FL, Bota DP, Bross A, Mélot C, Vincent JL. (2001) Serial Evaluation of the SOFA Score to Predict Outcome in Critically Ill Patients. JAMA 286: 1754-1758.
  5. Silva A, Cortez P, Santos MF, Gomes L, Neves J. (2006) Mortality assessment in intensive care units via adverse events using artificial neural networks. Artif Intell Med 36: 223-234.
  6. Hug CW, Szolovits P. (2009) ICU Acuity: Real-time Models versus Daily Models. AMIA Annu Symp Proc: 260-264.
  7. Saria S, Rajani AK, Gould J, Koller D, Penn AA. (2010) Integration of Early Physiological Responses Predicts Later Illness Severity in Preterm Infants. Sci Transl Med 2: 48ra65. (The definition of risk features in this project is mainly based on this paper.)
  8. Hug CW, Clifford GD, et al. (2010) Clinician blood pressure documentation of stable intensive care patients: an intelligent archiving agent has a higher association with future hypotension. Crit Care Med 39(5): 1006-1014.
  9. Nabney IT. (2002) Netlab: Algorithms for Pattern Recognition. Springer.

Other Resources and Project Files

MATLAB code related to fitting distributions

courses/446/2012/446-2012-08/project08.txt · Last modified: 2012/12/10 17:36 (external edit)