Carol Friedman, Ph.D.
My primary research interests are natural language text processing (NLP), data mining, clinical knowledge representation, and developing clinical applications utilizing NLP.
My current work in natural language processing falls mainly into two areas. The first is the development of methods for processing narrative patient reports so that clinical data becomes accessible to other automated procedures; current projects include developing a statistical parser for the biomedical field, corpus annotation, and word sense disambiguation. The second is research, in collaboration with other faculty members and trainees, on applications that use information in clinical reports; current projects include developing novel methods for discovering unknown adverse drug events, data mining to discover associations between entities in clinical reports and to discover information concerning phenotypes, using NLP to summarize the patient record, quality assurance for patients with asthma, and using NLP for clinical trials recruitment.
Coded data is needed for applications such as decision support, quality assurance, data mining, patient management, automated SNOMED and UMLS encoding, vocabulary development tools, clinical research, and error detection. Although different types of clinical data are presently available online in textual form, data cannot be reliably retrieved from text. To be accessed appropriately, data must be in a structured form consisting of well-defined controlled vocabulary terms. The function of natural language processing is to extract, structure, and encode the underlying clinical information in the reports. The system we have developed is called MedLEE, and it has been integrated into the New York Presbyterian Hospital (NYPH) Clinical Information System. It was initially applied to radiological examinations of the chest and to mammograms, and has been in operation since 1995. MedLEE has since been extended to all of radiology, and also to pathology, echocardiograms, electrocardiograms, discharge summaries, office visits, and progress notes. MedLEE has been independently evaluated numerous times and has been shown to perform effectively for clinical applications.
Another area of my NLP research involves the biomolecular domain. An NLP system, BioMedLEE, has been developed by adapting MedLEE. It extracts and encodes biomedical entities and relations from the literature. The goal is to use this information as part of a tool to assist in genomics research. Continued development will address numerous interesting research issues. The most exciting research challenge will be to further drug discovery and the understanding of genetic causes of diseases by linking information in the clinical patient record to genomic information obtained from the literature.
A third area of research involves knowledge representation, because the clinical information in the patient reports and the biomedical information in the literature must be represented using a well-defined structure and well-defined symbolic terms. Knowledge representation issues involve balancing opposing requirements: completeness of expression and ease of access to the data once it has