George Hripcsak, MD, MS


Vivian Beaumont Allen Professor of Biomedical Informatics, Columbia University


Chair, Department of Biomedical Informatics, Columbia University


Director, Medical Informatics Services, NewYork-Presbyterian Hospital/Columbia





George Hripcsak, MD, MS, is Vivian Beaumont Allen Professor and Chair of Columbia University’s Department of Biomedical Informatics and Director of Medical Informatics Services for NewYork-Presbyterian Hospital. Dr. Hripcsak is a board-certified internist with degrees in chemistry, medicine, and biostatistics. He led the effort to create the Arden Syntax, a language for representing health knowledge that has become a national standard. Dr. Hripcsak’s current research focuses on the clinical information stored in electronic health records and on the development of next-generation health record systems. Using nonlinear time series analysis, machine learning, knowledge engineering, and natural language processing, he is developing the methods necessary to support clinical research and patient safety initiatives. As Director of Medical Informatics Services, he oversees a 12,000-user, 4-million-patient clinical information system and data repository. He co-chaired the Meaningful Use Workgroup of the U.S. Department of Health and Human Services’ Office of the National Coordinator for Health Information Technology; the Workgroup defines the criteria by which health care providers collect incentives for using electronic health records. Dr. Hripcsak was elected fellow of the American College of Medical Informatics in 1995 and served on the Board of Directors of the American Medical Informatics Association (AMIA). As chair of the AMIA Standards Committee, he coordinated the medical-informatics community response to the U.S. Department of Health and Human Services for the health-informatics standards rules under the Health Insurance Portability and Accountability Act of 1996. Dr. Hripcsak chaired the U.S. National Library of Medicine’s Biomedical Library and Informatics Review Committee, and he is a fellow of the National Academy of Medicine, the American College of Medical Informatics, and the New York Academy of Medicine.
He has served on several National Academy of Medicine and National Academy of Sciences committees, and he has published over 250 papers.



What Is Informatics?

Biomedical Informatics is the study of information and computation in biology and health. Its researchers study and manage information, study behavior related to decisions, and develop computational methods and use them to generate knowledge.

The Columbia University Department of Biomedical Informatics is among the oldest in the nation. Our goals are discovery and impact: to discover new information methods, to augment the biomedical knowledge base, and to improve the health of the population. Our 30 faculty members and 60 students work in a highly collaborative environment, applying informatics from the atomic level to global populations. Our areas of application include:

    * CLINICAL CARE, such as designing clinical information systems and mining the electronic health record

    * BIOLOGY, including systems biology, structural biology, and virology, in partnership with the Center for Computational Biology and Bioinformatics

    * PUBLIC HEALTH, such as designing systems to promote and protect the health of communities, improving public health systems, and deploying information technology internationally

    * TRANSLATIONAL RESEARCH, including integrating biological and clinical knowledge and facilitating multidisciplinary science



My research focuses on understanding and using the clinical information stored in the electronic health record. This theme has several components:


1. Data mining and knowledge discovery (see “Discovering and applying knowledge in clinical databases” project web site). Machine learning and visualization are examples of techniques for uncovering knowledge in vast clinical databases. My work focuses on testing and extending existing discovery methods to improve their performance on clinical databases. Important issues include training set size, data accuracy, data completeness, and representation (e.g., how to accommodate diagnostic data, which are nominal with many categories). Recent work includes the use of nonlinear time series analysis to characterize the electronic health record. Here is a study of serum glucose, where predictability, quantified as mutual information, reveals the diurnal variation of glucose (evidenced by the ridges). See [Albers DJ, Hripcsak G. A statistical dynamics approach to the study of human health data: resolving population scale diurnal variation in laboratory data. Physics Letters A 2010;374:1159-64] for a related study on creatinine.




2. Natural language processing. In most institutions, the vast majority of the richly detailed clinical information is stored as narrative text, which is not generally amenable to automated analysis. Natural language processing can parse the narrative text, converting it to a structured and coded format. See [Hripcsak G, Elhadad N, Chen C, Zhou L, Morrison FP. Using empirical semantic correlation to interpret temporal assertions in clinical texts. J Am Med Inform Assoc 2009;16:220-7] for a study of the degree to which the true time of an event varies from what is stated in the patient record. It is illustrated below, where 1 marks the time of the writing of the note, and 0 marks the stated time.



3. Evaluation methodology. The complexity of clinical data, the presence of inaccurate and missing values, and the large but heterogeneous collection of patients conspire to make it difficult to draw conclusions using traditional statistical methods. Bias that would not affect a traditional randomized trial can overwhelm the true effect in a retrospective study of the electronic medical record. Here is a short piece on measuring agreement [Hripcsak G, Rothschild AS. Agreement, the F-measure, and reliability in information retrieval. J Am Med Inform Assoc 2005;12:296-8].


4. Clinical demonstration. Demonstrating the usefulness of the above methods is critical to gather support and to focus new work in important areas. The methods can be applied to clinical research (largely hypothesis refinement) and clinical care (by generating timely advice and monitoring patient safety). Recent work has included syndromic surveillance and pharmacovigilance. See [Hripcsak G, Soulakis ND, Li L, Morrison FP, Lai AM, Friedman C, Calman NS, Mostashari F. Syndromic surveillance using ambulatory electronic health records. J Am Med Inform Assoc 2009;16:354-61] for a study of surveillance.
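
The idea in the first component above, predictability quantified as mutual information, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of the cited study: the function name delayed_mutual_information, the histogram (binned) estimator, the bin count, and the synthetic, regularly sampled "glucose-like" series are all hypothetical choices made for the example. Real laboratory data are irregularly sampled and need more careful handling.

```python
import numpy as np


def delayed_mutual_information(values, lag, bins=8):
    """Estimate mutual information (in bits) between x(t) and x(t + lag).

    Uses a simple 2-D histogram estimate of the joint distribution.
    The estimator and bin count are illustrative assumptions.
    """
    x = np.asarray(values, dtype=float)
    if lag < 1 or lag >= len(x):
        raise ValueError("lag must be between 1 and len(values) - 1")
    earlier, later = x[:-lag], x[lag:]
    joint_counts, _, _ = np.histogram2d(earlier, later, bins=bins)
    p_joint = joint_counts / joint_counts.sum()
    p_earlier = p_joint.sum(axis=1, keepdims=True)  # marginal of x(t)
    p_later = p_joint.sum(axis=0, keepdims=True)    # marginal of x(t + lag)
    p_product = p_earlier @ p_later                 # independence baseline
    nonzero = p_joint > 0
    return float(np.sum(p_joint[nonzero] * np.log2(p_joint[nonzero] / p_product[nonzero])))


# Synthetic hourly data (an assumption, not patient data): a 24-hour cycle
# plus noise, compared with a series that is noise only.
rng = np.random.default_rng(0)
hours = np.arange(24 * 200)
glucose_like = 100 + 15 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 5, hours.size)
noise_only = rng.normal(100, 5, hours.size)

mi_cycle = delayed_mutual_information(glucose_like, lag=24)
mi_noise = delayed_mutual_information(noise_only, lag=24)
print(f"24-hour lag, cyclic series: {mi_cycle:.3f} bits")
print(f"24-hour lag, noise only:    {mi_noise:.3f} bits")
```

In this toy setting the cyclic series carries far more information about its own value 24 hours later than the noise-only series does, which is the kind of structure that shows up as ridges when mutual information is plotted against the time lag.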

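The agreement measures named in the evaluation methodology component above can also be made concrete. The sketch below is a minimal illustration, assuming two raters who each label items as positive or negative; the function names and the example counts are hypothetical. It shows numerically that the F-measure, computed with one rater treated as the reference, equals positive specific agreement, and that Cohen's kappa moves toward the same value as the number of jointly negative cases grows.

```python
def positive_specific_agreement(both_pos, only_first, only_second):
    """Agreement on positive ratings for two raters: 2a / (2a + b + c)."""
    return 2 * both_pos / (2 * both_pos + only_first + only_second)


def f_measure(true_pos, false_pos, false_neg):
    """Balanced F-measure: harmonic mean of precision and recall."""
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


def cohen_kappa(both_pos, only_first, only_second, both_neg):
    """Chance-corrected agreement for a 2x2 table of two raters."""
    total = both_pos + only_first + only_second + both_neg
    observed = (both_pos + both_neg) / total
    expected = (
        (both_pos + only_first) * (both_pos + only_second)
        + (only_second + both_neg) * (only_first + both_neg)
    ) / total**2
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 40 items marked positive by both raters,
# 10 only by the first rater, 10 only by the second.
a, b, c = 40, 10, 10

psa = positive_specific_agreement(a, b, c)
# Treat the second rater as the reference standard for the first:
# a = true positives, b = false positives, c = false negatives.
f = f_measure(a, b, c)
print(f"positive specific agreement = {psa:.4f}, F-measure = {f:.4f}")

for negatives in (40, 1_000, 1_000_000):
    print(f"kappa with {negatives:>9,} joint negatives: {cohen_kappa(a, b, c, negatives):.4f}")
```

This matters in tasks such as information retrieval or text annotation, where the number of negative cases is very large or not well defined, so a measure that does not depend on that count is easier to interpret.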

In addition, I am studying next-generation electronic health records. Current technology supports individual clinician tasks, such as documenting and ordering, in a manner that is largely similar to that of traditional paper records. Improved understanding of workflow, information needs, cognition, and the science of collaboration can lead to improved systems that exploit human abilities, facilitate teams, and disseminate expertise. Here is a recent editorial [Shea S, Hripcsak G. Accelerating the use of electronic health records in physician practices. NEJM 2010;362:192-5].



As Director of Medical Informatics Services for NewYork-Presbyterian Hospital/Columbia, I oversee the clinical data warehouse, terminology, iNYP, immunization, infection control, and physician outreach, and I collaborate on clinician documentation, health information exchange, and patient portals.



We offer programs at all levels of informatics training, including PhDs, master's degrees, postdoctoral fellowships, certificate training, and education for students in medicine, nursing, dentistry, and public health.


Further information

Publications via PubMed

Curriculum vitae


George Hripcsak, MD, MS
622 West 168th Street, PH20
New York, NY 10032