Publications for “Discovering and Applying Knowledge in Clinical Databases” (funded by LM006910)



1.             Elkins JS, Friedman C, Boden-Albala B, Sacco RL, Hripcsak G. Coding neuroradiology reports for the Northern Manhattan Stroke Study: a comparison of natural language processing and manual review. Comput Biomed Res 2000;33:1–10.

2.            Wilcox A, Hripcsak G, Friedman C. Using knowledge sources to improve classification of medical text reports. Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2000 August 20; Boston; 2000:115–6.

3.            Wilcox A, Hripcsak G. Medical text representations for inductive learning. Proc AMIA Symp 2000:923–7.

4.            Yu H, Hripcsak G. Hereditary disease discovery from a clinical data warehouse. Proc AMIA Symp 2000:1161.

5.            Yu H, Hripcsak G. A large scale, cross-disease family health history data set. Proc AMIA Symp 2000:1162.

6.            Cai J, Hripcsak G, Johnson S. Generic data modeling for home telemonitoring of chronically ill patients. Proc AMIA Symp 2000:116–20.

7.            Liu H, Friedman C. A method for vocabulary development and visualization based on medical language processing and XML. Proc AMIA Symp 2000:502–6.

8.            Chuang J, Hripcsak G, Jenders RA. Considering clustering: a methodological review of clinical decision support system studies. Proc AMIA Symp 2000:146–50.

9.            Wilcox A. Automated classification of medical text reports (dissertation). Columbia University, Department of Medical Informatics, 2000.


10.         Stetson PD, McKnight LK, Bakken S, Curran CC, Kubose TT, Cimino JJ. Development of an ontology to model medical errors, information needs and the clinical communication space. Proc AMIA Symp 2001:672–6.

11.          Krauthammer M, Hripcsak G. A knowledge model for the interpretation and visualization of NLP-parsed discharge summaries. Proc AMIA Symp 2001:339–43.

12.         Campbell DA, Johnson SB. Comparing syntactic complexity in medical and non-medical corpora. Proc AMIA Symp 2001:90–4.

13.         Friedman C, Liu H, Shagina L, Johnson S, Hripcsak G. Evaluating the UMLS as a source of lexical knowledge for medical language processing. Proc AMIA Symp 2001:189–93.


14.         Hripcsak G, Heitjan D. Measuring agreement in medical informatics reliability studies. J Biomed Inform 2002;35:99–110.

15.         Hripcsak G, Austin JHM, Alderson PO, Friedman C. Use of natural language processing to translate clinical information from a database of 889,921 chest radiographic reports. Radiology 2002;224:157–63.

16.         Hripcsak G, Wilcox A. Reference standards, judges, comparison subjects: roles for experts in evaluating system performance. J Am Med Inform Assoc 2002;9:1–15.

17.         Chuang JH, Hripcsak G, Heitjan DF. Design and analysis of controlled trials in naturally clustered environments: implications for medical informatics. J Am Med Inform Assoc 2002;9:230–8.

18.         Yu H, Hripcsak G, Friedman C. Mapping abbreviations to full forms in biomedical articles. J Am Med Inform Assoc 2002;9:262–72.

19.         Stetson PD, Johnson SB, Scotch M, Hripcsak G. The sublanguage of cross-coverage. Proc AMIA Symp 2002:742–6.

20.        McKnight LK, Wilcox A, Hripcsak G. The effect of sample size and disease prevalence on supervised machine learning of narrative data. Proc AMIA Symp 2002:519–22.

21.         Krauthammer M, Johnson SB, Hripcsak G, Campbell DA, Friedman C. Representing nested semantic information in a linear string of text using XML. Proc AMIA Symp 2002:405–9.

22.        Chuang JH, Friedman C, Hripcsak G. A comparison of the Charlson comorbidities derived from medical language processing and administrative data. Proc AMIA Symp 2002:160–4.

23.        Malhotra R, Austin JHM, Mujoomdar A, Powell CA, Pearson GDN, Shiau MC, Raftopoulos H, Hripcsak G. Non-small cell carcinoma of lung: size of the primary cancer, gender, and age of the patient as predictors of metastasis to thoracic lymph nodes, lung, bone, liver, and adrenal glands (abstract). RSNA 2002.

24.        Wilcox A, Hripcsak G, Knirsch C. Knowledge discovery using the electronic medical record (poster). Proc AMIA Symp 2002:1198.

25.        Bu D, Hripcsak G. Case-based reasoning for medical risk stratification: contrast associated nephropathy (poster). Proc AMIA Symp 2002:986.


26.        Hripcsak G, Bakken S, Stetson PD, Patel VL. Mining complex clinical data for patient safety research: a framework for event discovery. J Biomed Inform 2003;36:120–30.

27.        Cao H, Stetson P, Hripcsak G. Assessing explicit error reporting in the narrative electronic medical record using keyword searching. J Biomed Inform 2003;36:99–105.

28.        Murff HJ, Patel VL, Hripcsak G, Bates DW. Detecting adverse events for patient safety research: a review of current methodologies. J Biomed Inform 2003;36:131–43.

29.        Wilcox AB, Hripcsak G. The role of domain knowledge in automating medical text report classification. J Am Med Inform Assoc 2003;10:330–8.

30.        Cao H, Stetson P, Hripcsak G. Assessing explicit error reporting in the narrative electronic medical record using keyword searching (poster). Proc AMIA Symp 2003:803.

31.         Johnson SB, Campbell DA, Krauthammer M, Tulipano PK, Mendonca EA, Friedman C, Hripcsak G. A native XML database design for clinical document research (poster). Proc AMIA Symp 2003:883.

32.        Bates DW, Evans RS, Murff H, Stetson PD, Pizziferri L, Hripcsak G. Detecting adverse events using information technology. J Am Med Inform Assoc 2003;10(2):115–28.


33.        Friedman C, Shagina L, Lussier Y, Hripcsak G. Automated encoding of clinical documents based on natural language processing. J Am Med Inform Assoc 2004;11:392–402.

34.        Hripcsak G, Stetson PD, Gordon P. Using the FCIM curricular guide and administrative codes to assess internal medicine resident breadth of experience. Acad Med 2004;79:557–63.

35.        Zhou L, Hripcsak G, Parsons S, Das AK, Johnson SB. Reasoning about time in electronic discharge summaries using temporal constraint satisfaction techniques (abstract). Intelligent Data Analysis in Medicine and Pharmacology Workshop (IDAMAP-2004); 2004 September 6; Stanford, CA; 2004.

36.        Yu A, Stetson PD, Hripcsak G. Detection of adverse events using conflicts in the electronic medical record (poster). Medinfo 2004:1923.

37.        Cao H, Chiang MF, Cimino JJ, Friedman C, Hripcsak G. Automatic summarization of patient discharge summaries to create problem lists using medical language processing (poster). Medinfo 2004:1540.


38.        Markatou M, Tian H, Biswas S, Hripcsak G. Analysis of variance of cross-validation estimators of the generalization error. Journal of Machine Learning Research 2005;6:1127–68.

39.        Melton GB, Hripcsak G. Automated detection of adverse events using natural language processing in discharge summaries. J Am Med Inform Assoc 2005;12:448–57.

40.       Hripcsak G, Rothschild AS. Agreement, the F-measure, and reliability in information retrieval. J Am Med Inform Assoc 2005;12:296–8.

41.         Hripcsak G, Zhou L, Parsons S, Das AK, Johnson SB. Modeling electronic discharge summaries as a simple temporal constraint satisfaction problem. J Am Med Inform Assoc 2005;12:55–63.

42.        Cao H, Markatou M, Melton GB, Chiang MF, Hripcsak G. Mining a clinical data warehouse to discover disease-finding associations using co-occurrence statistics. Proc AMIA Symp 2005:106–10.

43.        Zhou L, Friedman C, Parsons S, Hripcsak G. System architecture for temporal information extraction, representation and reasoning in clinical narrative reports. Proc AMIA Symp 2005:869–73.

44.       Stetson PD, Keselman A, Rappaport D, Van Vleck T, Cooper M, Boyer A, Hripcsak G. Electronic discharge summaries (abstract). Proc AMIA Symp 2005:1121.


45.        Melton GB, Parsons S, Morrison FP, Rothschild AS, Markatou M, Hripcsak G. Inter-patient distance metrics using SNOMED CT defining relationships. J Biomed Inform 2006;39:697–705.

46.       Zhou L, Melton GB, Parsons S, Hripcsak G. A temporal constraint structure for extracting temporal information from clinical narrative. J Biomed Inform 2006;39:424–39.

47.        Zhou L, Parsons S, Hripcsak G. Handling implicit and uncertain temporal information in medical text (poster). Proc AMIA Symp 2006:1158. PMC1839470

48.       Chen ES, Hripcsak G, Friedman C. Disseminating natural language processed clinical narratives. Proc AMIA Symp 2006:126-30. PMC1839529

49.       Hripcsak G, Bamberger A, Friedman C. Fever detection in clinic visit notes using a general purpose processor (abstract). Fifth Annual Syndromic Surveillance Conference; 2006 September 19–20; Baltimore, MD: International Society for Disease Surveillance, 2006.

50.        Chen ES, Wajngurt D, Qureshi K, Hyman S, Hripcsak G. Automated real-time detection and notification of positive infection cases (poster). Proc AMIA Symp 2006:883. PMC1839567


51.         Cao H, Hripcsak G, Markatou M. A statistical methodology for analyzing co-occurrence data from a large sample. J Biomed Inform 2007;40:343-52. PMC2041889

52.        Hripcsak G, Knirsch C, Zhou L, Wilcox A, Melton GB. Using discordance to improve classification in narrative clinical databases: an application to community-acquired pneumonia. Comput Biol Med 2007;37:296–304.

53.        Zhou L, Hripcsak G. Temporal reasoning with medical data—A review with emphasis on medical natural language processing. J Biomed Inform 2007;40:183-202. (republished in the IMIA Yearbook of Medical Informatics)

54.        Xu H, Fan JW, Hripcsak G, Mendonca EA, Markatou M, Friedman C. Gene symbol disambiguation using knowledge-based profiles. Bioinformatics 2007;23:1015–22.

55.        Chen ES, Stetson PD, Lussier YA, Hripcsak G, Friedman C. Detection of practice pattern trends through natural language processing of clinical narratives and biomedical literature. Proc AMIA Symp 2007.

56.        Hripcsak G, Kaushal R, Johnson KB, Ash JS, Bates DW, Block R, Frisse ME, Kern LM, Marchibroda J, Overhage JM, Wilcox AB. The United Hospital Fund Meeting on Evaluating Health Information Exchange. J Biomed Inform 2007;40(6 Suppl):S3–10. PMC2140082

57.        Hripcsak G. Automated public health reporting: a familiar but cantankerous friend. Advances in Disease Surveillance 2007;3(4):1-2.

58.        Hripcsak G, Sengupta S, Wilcox A, Green RA. Emergency department access to a longitudinal medical record. J Am Med Inform Assoc 2007;14:235-8. PMC2213459


59.        Chen ES, Hripcsak G, Xu H, Markatou M, Friedman C. Automated acquisition of disease-drug knowledge from biomedical and clinical documents: an initial study. J Am Med Inform Assoc 2008;15:87-98. PMC2274872

60.       Zhou L, Parsons S, Hripcsak G. The evaluation of a temporal reasoning system in processing clinical discharge summaries. J Am Med Inform Assoc 2008;15:99–106. PMC2274869

61.         Chapman WW, Dowling JN, Hripcsak G. Evaluation of training with an annotation schema for manual annotation of clinical conditions from emergency department reports. Int J Med Inform 2008;77:107–13.

62.        Sengupta S, Calman NS, Hripcsak G. A model for expanded public health reporting in the context of HIPAA. J Am Med Inform Assoc 2008;15:569-74.

63.        Cao H, Melton GB, Markatou M, Hripcsak G. Use of abstracted patient-specific features to assist an information-theoretic measurement to assess similarity between medical cases. J Biomed Inform 2008;41:882-8. PMC2584163

64.       Lai AM, Parsons S, Hripcsak G. Fuzzy temporal constraint networks for clinical information. Proc AMIA Symp 2008:374-8. PMC2655952

65.        Gold S, Elhadad N, Zhu X, Cimino JJ, Hripcsak G. Extracting structured medication event information from narrative clinical notes. Proc AMIA Symp 2008:237-41. PMC2655993

66.       Morrison FP, Li L, Lai A, Hripcsak G. Repurposing the clinical record: can an existing natural language processing system de-identify clinical notes? J Am Med Inform Assoc 2008;16:37-9. PMC2605586


67.        Hripcsak G, Elhadad N, Chen C, Zhou L, Morrison FP. Using empirical semantic correlation to interpret temporal assertions in clinical texts. J Am Med Inform Assoc 2009;16:220-7. PMC2649319

68.       Wang X, Hripcsak G, Markatou M, Friedman C. Active computerized pharmacovigilance using natural language processing, statistics, and electronic health records: a feasibility study. J Am Med Inform Assoc 2009;16:328-37. PMC2732239

69.       Sinha A, Hripcsak G, Markatou M. Large data sets in biomedicine: analytical issues. J Am Med Inform Assoc 2009;16:759-67. PMC3002128

70.        Hripcsak G, Soulakis ND, Li L, Morrison FP, Lai AM, Friedman C, Calman NS, Mostashari F. Syndromic surveillance using ambulatory electronic health records. J Am Med Inform Assoc 2009;16:354-61. PMC2732227

71.         Wang X, Hripcsak G, Friedman C. Characterizing environmental and phenotypic associations using information theory and electronic health records. BMC Bioinformatics 2009;10(Suppl 9):S13. PMC2745684

72.        Morrison FP, Sengupta S, Hripcsak G. Using a pipeline to improve de-identification performance. Proc AMIA Symp 2009:447-51. PMC2815438

73.        Albers DJ, Hripcsak G. An information-theoretic approach to the phenome (abstract). AMIA Summit on Translational Bioinformatics; 2009 March 15-17; San Francisco, CA; 2009.

74.        Albers DJ, Hripcsak G. Characterizing EHR laboratory data with information theoretic predictability (poster). Proc AMIA Symp 2009:759.

75.        Weng C, Smiley R, Flood P, Cheng B, Friedman C, Hripcsak G. A phenome-wide association study through secondary uses of clinical and research data (poster). AMIA Summit on Translational Bioinformatics; 2009.

76.        Zhu X, Gold S, Lai A, Hripcsak G, Cimino JJ. Using timeline displays to improve medication reconciliation. International Conference on eHealth, Telemedicine, and Social Medicine (eTELEMED 2009); 2009 February 1-7; Cancun, Mexico; 2009.


77.        Albers DJ, Hripcsak G. A statistical dynamics approach to the study of human health data: resolving population scale diurnal variation in laboratory data. Physics Letters A 2010;374:1159-64. PMC2882798

78.        Khiabanian H, Holmes AB, Kelly BJ, Gururaj M, Hripcsak G, Rabadan R. Signs of the 2009 influenza pandemic in the New York-Presbyterian electronic health records. PLoS ONE 2010;5(9):e12658. PMC2936568

79.        Botsis T, Anagnostou V, Hartvigsen G, Hripcsak G, Weng C. Developing a multivariable prognostic model for pancreatic endocrine tumors using the clinical data warehouse resources of a single institution. Appl Clin Inform 2010;1:38-49. PMC3087306

80.       Albers DJ, Hripcsak G. EHR dynamics and stratification (abstract). AMIA Summit on Translational Bioinformatics; 2010 March 10-12; San Francisco, CA; 2010.

81.         Albers DJ, Hripcsak G. Aggregating EHR subpopulations to leverage EHR population size for calculating temporal information theoretic quantities (abstract). Proc AMIA Symp 2010.

82.        Wang X, Chase H, Markatou M, Hripcsak G, Friedman C. Selecting information in electronic health records for knowledge acquisition. J Biomed Inform 2010;43:595-601. PMC2902678

83.        Perotte A, Hripcsak G. Using the entropy of ICD9 documentation across patients to characterize disease chronicity (abstract). Proc AMIA Symp 2010.


84.       Hripcsak G, Albers DJ, Perotte A. Exploiting time in electronic health record correlations. J Am Med Inform Assoc 2011;18(Suppl 1):i109-i115. Published Online First 2011 Nov 23. doi:10.1136/amiajnl-2011-000463.

85.        Hripcsak G, Knirsch C, Zhou L, Wilcox A, Melton GB. Bias associated with mining electronic health records. J Biomed Discov Collab 2011;6:48-52. PMC3149555

86.       Kleinberg S, Hripcsak G. A review of causal inference for biomedical informatics. J Biomed Inform 2011;44:1102-12.

87.        Wilcox AB, Chen YH, Hripcsak G. Minimizing electronic health record patient-note mismatches. J Am Med Inform Assoc 2011;18(4):511-4. PMC3128397

88.       Hripcsak G, Vawdrey DK, Fred MR, Bostwick SB. Use of electronic clinical documentation: time spent and team interactions. J Am Med Inform Assoc 2011;18:112-7. PMC3116265

89.       Harpaz R, Perez H, Chase HS, Rabadan R, Hripcsak G, Friedman C. Biclustering of adverse drug events in FDA’s spontaneous reporting system. Clin Pharmacol Ther 2011;89:243-50. PMC3282185

90.       Albers DJ, Schmidt M, Hripcsak G. Population physiology: conjoining EHR dynamics with physiological modeling (abstract). AMIA Summit on Translational Bioinformatics; 2011 March 7-9; San Francisco, CA; 2011.

91.         Kleinberg S, Hripcsak G. Understanding variable representation for causal inference in EHRs (poster). AMIA Summit on Translational Bioinformatics; 2011 March 7-9; San Francisco, CA; 2011.

92.        Perotte A, Hripcsak G. Using density estimates to aggregate patients and summarize disease evolution (poster). AMIA Summit on Translational Bioinformatics; 2011 March 7-9; San Francisco, CA; 2011:138.

93.        Hripcsak G, Albers DJ. Electronic health record (EHR) dynamics: an introduction (abstract). Society for Industrial and Applied Mathematics Conference on Applications of Dynamical Systems; 2011 May 22-26; Snowbird, UT; 2011.

94.       Hripcsak G, Albers DJ, Perotte A. Using lagged linear correlation to find relationships between laboratory values and clinician concepts (abstract). AMIA Summit on Translational Bioinformatics; 2011 March 7-9; San Francisco, CA; 2011.

95.        Albers DJ, Hripcsak G. Macroscopic physiology (abstract). Society for Industrial and Applied Mathematics Conference on Applications of Dynamical Systems; 2011 May 22-26; Snowbird, UT; 2011.


96.       Hripcsak G, Albers DJ. Next-generation phenotyping of electronic health records. J Am Med Inform Assoc 2012;20:117-21.

97.        Vawdrey D, Hripcsak G. Publication bias in clinical trials of electronic health records. J Biomed Inform 2012;46:139-41.

98.       Mamykina L, Vawdrey D, Stetson P, Zheng K, Hripcsak G. Clinical documentation: composition or synthesis? J Am Med Inform Assoc 2012;19:1025-31.

99.       Albers DJ, Hripcsak G, Schmidt M. Population physiology: leveraging electronic health record data to understand human endocrine dynamics. PLoS ONE 2012;7(12):e48058. doi:10.1371/journal.pone.0048058.

100.    Albers DJ, Hripcsak G. Using time-delayed mutual information to discover and interpret temporal correlation structure in complex populations. Chaos 2012;22:013111. Published 2012 Jan 24. doi:10.1063/1.3675621.

101.      Albers DJ, Hripcsak G. Estimation of time-delayed mutual information and bias for irregularly and sparsely sampled time-series. Chaos, Solitons & Fractals 2012;45:853-60.

102.     Hripcsak G. Visualizing the operating range of a classification system. J Am Med Inform Assoc 2012;19:529-32.

103.     Hripcsak G, Albers D. Interpreting lagged linear correlation and using range to prioritize (abstract). AMIA Summit on Translational Bioinformatics; 2012 March 19-21; San Francisco, CA; 2012.

104.    Albers D, Claassen J, Perotte A, Kleinberg S, Hripcsak G. Using NICU data to understand physiology and identify damage in patients with acute brain injury (poster). AMIA Summit on Translational Bioinformatics; 2012 March 19-21; San Francisco, CA; 2012.

105.     Kleinberg S, Hripcsak G. Automated temporal causal inference from EHR data (abstract). AMIA Summit on Translational Bioinformatics; 2012 March 19-21; San Francisco, CA; 2012.

106.    Maurer MS, Albers D, Perotte A, Chen C, Hripcsak G. Hemoconcentration is associated with lower mortality post hospitalization for heart failure (poster). American College of Cardiology 61st Annual Scientific Session and ACC-i2 with TCT; 2012 March 24-27; Chicago, IL; 2012.

107.     Sedigh-Sarvestan M, Albers DJ, Gluckman BJ. Data assimilation of glucose dynamics for use in the intensive care unit. 34th Annual International Conference of the Engineering in Medicine and Biology Society; 2012 Aug 28 – Sep 1; San Diego, CA; 2012.


109.    Albers DJ, Claassen J, Schmidt M, Hripcsak G. A methodology for detecting and exploring non-convulsive seizures in patients with SAH. 2013 International Symposium on Nonlinear Theory and its Applications (NOLTA2013); 2013 September 8-12; Santa Fe, NM; 2013.

110.      Overby CL, Weng C, Haerian K, Perotte A, Friedman C, Hripcsak G. Evaluation considerations for EHR-based phenotyping algorithms: a case study for drug-induced liver injury. AMIA Summit on Translational Bioinformatics; 2013 March 18-20; San Francisco, CA; 2013.

In press

111.       Perotte A, Hripcsak G. Temporal properties of diagnosis code time series in aggregate. IEEE Transactions on Information Technology in Biomedicine, in press.

112.      Collins SA, Cato K, Albers D, Scott K, Stetson PD, Bakken S, Vawdrey DK. Relationship between nursing documentation and mortality. J Am Med Inform Assoc, accepted for publication.

113.      Boland MR, Hripcsak G, Albers DJ, Wei Y, Wilcox AB, Wei J, Li J, Lin S, Breene M, Myers R, Zimmerman J, Weng C. Discovering medical conditions associated with periodontitis using linked electronic health records. Journal of Clinical Periodontology, in press.

114.      Claassen J, Perotte A, Albers D, Kleinberg S, Schmidt JM, Tu B, Badjatia N, Lantigua H, Hirsch LJ, Mayer SA, Connolly ES, Hripcsak G. Nonconvulsive seizures after subarachnoid hemorrhage: multimodality detection and outcomes. Annals of Neurology, in press.