Large Scale Probabilistic Phenotyping Applied to Patient Record Summarization

This project creates novel methods and tools for the analysis of large-scale Electronic Health Record (EHR) data. Models of disease, or phenotypes, are derived from a large collection of patient characteristics, as recorded in the EHR. To assess their value and robustness in a clinical application, the phenotypes are incorporated into HARVEST, a longitudinal patient record summarization system for clinicians at the point of patient care. This is a collaboration among Columbia University Biomedical Informatics, Columbia University Applied Physics and Applied Mathematics, and NewYork-Presbyterian Hospital. The project is supported by the National Science Foundation as part of the Smart and Connected Health program (award 1344668; 2/2014-1/2018).
The research for this project contributes to two inter-related outcomes. (i) A probabilistic graphical model of a patient record and the patient's latent phenotypes. We investigate models that can handle the heterogeneous data types in the EHR, along with their challenges, such as sparseness and artificial redundancy. For the models to be useful in the clinical world, they must be interpretable by humans, easily adaptable for EHR-driven applications, and clinically relevant. This is achieved by encoding prior clinical knowledge in the models and by learning automatically from clinicians' feedback. (ii) A patient record summarizer for clinicians at the point of patient care. HARVEST, our summarization system, leverages the probabilistic patient model and learns new models of salience through the clinicians' interactions with the deployed summarizer, in essence learning the relevance of different patient phenotypes. In evaluating the phenome model and the summarizer, particular care is given to assessing their value in a real-world clinical setting.

People

Noémie Elhadad (PI)
Chris Wiggins (Co-PI)
Sharon Lipsky Gorman, MS
Gal Levy Fix
Adler Perotte, PhD
Rajesh Ranganath
Bharat Srikishan
John Angiolillo, MD (alum)
Edouard Grave, PhD (alum)
Jamie Hirsch, MD, MA (alum)
Rimma Pivovarov, PhD (alum)

Relevant Papers

Rajesh Ranganath, Adler Perotte, Noémie Elhadad, David Blei.
Deep Survival Analysis.
2016. Machine Learning in Healthcare (MUCMD). Los Angeles, CA. [arXiv]
Rimma Pivovarov, Adler Perotte, Edouard Grave, John Angiolillo, Chris Wiggins, Noémie Elhadad.
Learning Probabilistic Phenotypes from Heterogeneous EHR Data.
2015. Journal of Biomedical Informatics (JBI). [html]
Rajesh Ranganath, Adler Perotte, Noémie Elhadad, David Blei.
The Survival Filter: Joint Survival Analysis with a Latent Time Series.
2015. UAI. Amsterdam, Netherlands. [pdf]
Edouard Grave, Noémie Elhadad.
A Convex and Feature-rich Discriminative Approach to Dependency Grammar Induction.
2015. ACL. Beijing, China. [pdf]
Rimma Pivovarov, Noémie Elhadad.
Automated Methods for the Summarization of Electronic Health Records.
2015. Journal of the American Medical Informatics Association (JAMIA). [html]
Adler Perotte, Rajesh Ranganath, Jamie Hirsch, David Blei, Noémie Elhadad.
Risk Prediction for Chronic Kidney Disease Progression Using Heterogeneous Electronic Health Record Data and Time Series Analysis.
2015. Journal of the American Medical Informatics Association (JAMIA). [html]
Jamie Hirsch, Jessica Tanenbaum, Sharon Lipsky Gorman, Connie Liu, Eric Schmitz, Dritan Hashorva, Artem Ervits, David Vawdrey, Marc Sturm, Noémie Elhadad.
HARVEST, a Longitudinal Patient Record Summarizer.
2015. Journal of the American Medical Informatics Association (JAMIA). 22(2):263-274. [html]
Noémie Elhadad, Sharon Lipsky Gorman, Jamie Hirsch, Connie Liu, David Vawdrey, Marc Sturm.
HARVEST, a Holistic Patient Record Summarizer at the Point of Care.
2014. AMIA Fall Symposium. [pdf]
Rimma Pivovarov, David Albers, Jorge Sepulveda, Noémie Elhadad.
Identifying and Mitigating Biases in EHR Laboratory Tests.
2014. Journal of Biomedical Informatics (JBI). 51:24-34. [html]
David Albers, Noémie Elhadad, Esteban Tabak, Adler Perotte, George Hripcsak.
Dynamical Phenotyping: Using Temporal Analysis of Clinically Collected Physiological Data to Stratify Populations.
2014. PLoS ONE. 9(6):e96443. [html]
Raphael Cohen, Iddo Aviram, Michael Elhadad, Noémie Elhadad.
Redundancy-Aware Latent Dirichlet Allocation for Patient Record Notes.
2014. PLoS ONE. 9(2):e87555. [html]

Last Updated: 7/2016
noemie.elhadad @ columbia.edu