Student Projects


The following are examples of projects on which I would be happy to work. Interested students should contact me (noemie @ dbmi.columbia.edu).


HARVEST: An Intelligent Summarization System for the Electronic Health Record

The goal of this project is to build a tool that automatically generates a comprehensive and up-to-date summary of the information in a patient record. It harvests information both from the structured part of the EHR, such as lab values, and from the narrative text entered by physicians. We are looking for P&S students to help us conduct studies with physicians to evaluate our prototype, better understand the information needs of clinicians using the EHR, and ultimately improve our tools. Working with the HARVEST team will give interested students in-depth experience with informatics and its impact on medicine.


Multi-label Classification of Topic and Attitude of User-written Web Posts

Patients and health consumers increasingly rely on online communities for both emotional and informational support. Unfortunately, mailing lists and forums are not always equipped with tools that make information easy to access. The goal of this project is to identify the internal structure of these posts, such as the topics they cover and the attitudes they express, by casting the task as a multi-label classification problem.
Preference will go to students with experience in natural language processing, computational modeling, and statistical learning.
Keywords: health consumers, sentiment analysis, topic classification, discourse representation, multi-label classification.
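One way to picture the multi-label framing: each post receives a set of labels (here, one topic and one attitude), and a simple binary-relevance baseline trains an independent classifier for every label. The sketch below is a minimal illustration under stated assumptions: bag-of-words features, one perceptron per label, and a hypothetical four-post training set with invented labels. It is not the project's method.

```python
from collections import Counter


def features(text):
    """Bag-of-words features: lowercase whitespace-separated token counts."""
    return Counter(text.lower().split())


class BinaryRelevance:
    """Multi-label classifier that trains one independent perceptron per label."""

    def __init__(self, labels, epochs=10):
        self.labels = list(labels)
        self.epochs = epochs
        self.weights = {lab: Counter() for lab in self.labels}
        self.bias = {lab: 0.0 for lab in self.labels}

    def _score(self, lab, feats):
        w = self.weights[lab]  # Counter returns 0 for unseen features
        return self.bias[lab] + sum(w[f] * v for f, v in feats.items())

    def fit(self, posts, label_sets):
        data = [(features(p), set(ls)) for p, ls in zip(posts, label_sets)]
        for _ in range(self.epochs):
            for feats, gold in data:
                for lab in self.labels:
                    y = 1 if lab in gold else -1
                    # Perceptron update only when this label is mispredicted.
                    if y * self._score(lab, feats) <= 0:
                        for f, v in feats.items():
                            self.weights[lab][f] += y * v
                        self.bias[lab] += y
        return self

    def predict(self, post):
        """Return the set of labels whose perceptron fires for this post."""
        feats = features(post)
        return {lab for lab in self.labels if self._score(lab, feats) > 0}


# Hypothetical toy posts, each tagged with one topic and one attitude label.
posts = [
    "new dose of metformin works great",
    "metformin dose made me feel awful",
    "diagnosis came back and I feel awful",
    "diagnosis was early so things look great",
]
label_sets = [
    {"medication", "positive"},
    {"medication", "negative"},
    {"diagnosis", "negative"},
    {"diagnosis", "positive"},
]

clf = BinaryRelevance(["medication", "diagnosis", "positive", "negative"]).fit(posts, label_sets)
print(sorted(clf.predict("metformin works great")))  # ['medication', 'positive']
```

Binary relevance ignores correlations between labels (for example, between a topic and an attitude); methods that model those dependencies would be a natural next step.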


Extraction of Medication Information from Free-Text Clinical Notes

The goal of this project is to implement a tagger that identifies medication information in clinical notes. While lexicons of drug names exist, they are not always complete or up to date. Furthermore, drug names are often misspelled in clinical notes. In addition to the drug name, full medication information includes a route, a dosage, and a frequency.
We have already implemented a parser that relies on hand-built rules. For this project, we want to apply a more sophisticated machine learning approach to identify full medication information in different contexts.
Preference will go to students with experience in natural language processing, computational modeling, and statistical learning.
Keywords: electronic patient record, named entity recognition, sequence labeling.
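The description does not commit to a particular learner. As an illustrative assumption, the task can be framed as sequence labeling with BIO tags, the output format a CRF or similar tagger would produce. The minimal sketch below shows two pieces around such a model: token features, including a fuzzy lexicon match (Python's difflib) that tolerates misspelled drug names, and a decoder that turns per-token tags into a medication record. The lexicon, the tag set (DRUG, DOSE, ROUTE, FREQ), and the helper names are hypothetical, and the model itself is not shown.

```python
import difflib

# Hypothetical, deliberately tiny drug lexicon. Real lexicons are larger
# but still incomplete.
LEXICON = ["metformin", "lisinopril", "aspirin"]


def token_features(tokens, i):
    """Features for token i of the kind a CRF-style sequence labeler could use."""
    tok = tokens[i]
    low = tok.lower()
    return {
        "lower": low,
        "is_digit": tok.isdigit(),
        "suffix3": low[-3:],
        # Fuzzy lexicon lookup tolerates misspellings such as "metfromin".
        "near_lexicon": bool(difflib.get_close_matches(low, LEXICON, n=1, cutoff=0.8)),
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i + 1 < len(tokens) else "</s>",
    }


def decode_bio(tokens, tags):
    """Turn BIO tags (B-DRUG, I-DOSE, O, ...) into a field -> text record.

    Assumes at most one medication per sentence; a repeated field overwrites.
    """
    record, field, span = {}, None, []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == field:
            span.append(tok)  # continue the field currently being read
            continue
        if field is not None:  # close the previous field, if any
            record[field] = " ".join(span)
        if tag == "O":
            field, span = None, []
        else:  # B-X, or a stray I-X, starts a new field X
            field, span = tag[2:], [tok]
    if field is not None:
        record[field] = " ".join(span)
    return record


# Hypothetical note fragment with a misspelled drug name and gold BIO tags.
tokens = "Metfromin 500 mg PO twice daily".split()
tags = ["B-DRUG", "B-DOSE", "I-DOSE", "B-ROUTE", "B-FREQ", "I-FREQ"]

print(token_features(tokens, 0)["near_lexicon"])  # True
print(decode_bio(tokens, tags))
# {'DRUG': 'Metfromin', 'DOSE': '500 mg', 'ROUTE': 'PO', 'FREQ': 'twice daily'}
```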


Alignment of Simple and Complex Medical Texts

This project focuses on health-domain articles from Wikipedia and Simple English Wikipedia. Both sites contain pages targeted at health consumers, but the language and content of Simple English Wikipedia articles are simpler (hence the name). There are two parts to this project: (1) We want to build a corpus of article pairs about diseases. This is a challenge in itself, since we want to pair the articles automatically in order to obtain a large number of pairs. (2) Once the corpus is built, we want to align segments of text within each pair based on how much content they share. This project is useful because it lets us study how information can be conveyed at different levels of linguistic complexity. From an informatics standpoint, it will help us better understand the challenges of health literacy.
Preference will go to students with experience in natural language processing, computational modeling, and statistical learning.
Keywords: health literacy, paraphrasing, text similarity metrics, alignment methods, Wikipedia.
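Part (2) can be illustrated with a deliberately simple alignment method, offered as an assumption rather than the project's approach: split both articles into sentences, represent each sentence as a bag of content words, and pair each Simple English sentence with its most similar sentence in the standard article when cosine similarity reaches a threshold. The stopword list, the 0.3 threshold, the greedy one-best strategy, and the example sentences are all hypothetical.

```python
import math
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "and", "or", "in", "to", "by", "it", "that"}


def sentences(text):
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def bag(sentence):
    """Bag of lowercase content words."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[w] for w, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def align(complex_text, simple_text, threshold=0.3):
    """Greedily pair each simple sentence with its most similar complex sentence."""
    complex_sents = sentences(complex_text)
    if not complex_sents:
        return []
    complex_bags = [bag(c) for c in complex_sents]
    pairs = []
    for s in sentences(simple_text):
        sb = bag(s)
        score, best = max((cosine(sb, cb), c) for cb, c in zip(complex_bags, complex_sents))
        if score >= threshold:
            pairs.append((s, best, round(score, 3)))
    return pairs


# Hypothetical article excerpts.
complex_text = ("Asthma is a chronic inflammatory disease of the airways. "
                "Common symptoms include wheezing, coughing and shortness of breath.")
simple_text = "Asthma is a disease of the airways. Symptoms include wheezing and coughing."

pairs = align(complex_text, simple_text)
print([p[2] for p in pairs])  # [0.775, 0.756]
```

A greedy one-best match cannot align one simple sentence to several complex ones (or vice versa); richer alignment methods would relax that restriction.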


Last Updated: 06/2009
noemie @ dbmi.columbia.edu