Large Scale Probabilistic Phenotyping Applied to Patient Record Summarization

This project creates novel methods and tools for the analysis of large-scale Electronic Health Record (EHR) data. Models of disease, or phenotypes, are derived from a large collection of patient characteristics, as recorded in the EHR. To assess their value and robustness in a clinical application, the phenotypes are incorporated into HARVEST, a longitudinal patient record summarization system for clinicians at the point of patient care. This is a collaboration between Columbia University Biomedical Informatics, Columbia University Applied Physics and Applied Mathematics, and NewYork-Presbyterian Hospital. The project is supported by the National Science Foundation as part of the Smart and Connected Health program (award 1344668; 2/2014-1/2018).

The research for this project contributes to two interrelated outcomes. (i) A probabilistic graphical model of a patient record and the patient's latent phenotypes. We investigate models that can handle the heterogeneous data types in the EHR, along with the challenges they pose, such as sparseness and artificial redundancy. For the models to be useful in the clinical world, they must be interpretable by humans, easily adaptable for EHR-driven applications, and clinically relevant. This is achieved by encoding prior clinical knowledge in the models and by learning automatically from clinicians' feedback. (ii) A patient record summarizer for clinicians at the point of patient care. HARVEST, our summarization system, leverages the probabilistic patient model and learns new models of salience through clinicians' interactions with the deployed summarizer, in essence learning the relevance of different patient phenotypes. In evaluating both the phenome model and the summarizer, particular care is given to assessing their value in a real-world clinical setting.

Figure: Probabilistic phenotyping approach.


Relevant Papers