The following are examples of projects on which I would be happy to work.
Interested students should contact me (noemie @ dbmi.columbia.edu).
HARVEST - An Intelligent Summarization System for the Electronic Health Record.
The goal of this project is to build a tool that automatically generates a comprehensive and up-to-date summary of the information present in a patient record. It harvests information from both the structured part of the EHR, such as lab values, and the narrative text entered by physicians. We are looking for P&S students to help us conduct studies with physicians to evaluate our prototype, better understand the information needs of clinicians using the EHR, and ultimately improve our tools. Working with the HARVEST team will give interested students in-depth experience of informatics and its impact on medicine.
Patients and health consumers rely more and more on online communities to
obtain both emotional and informational support. Unfortunately, mailing lists
and forums are not always equipped with appropriate tools to access information
easily. The goal of this project is to identify the internal structure of the
posts in these communities, casting the task as a multi-label classification problem.
Preference will go to students with experience in natural language processing,
computational modeling, and statistical learning.
Keywords: health consumers, sentiment analysis, topic classification, discourse
representation, multi-label classification.
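To make the multi-label framing concrete, here is a minimal sketch under stated assumptions; it is not the project's method. It uses binary relevance, meaning one independent yes/no classifier per label, with a small naive Bayes model for each. The label names (`info_request`, `treatment`, `emotional`) and the four toy posts are invented for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class BinaryNB:
    """Two-class multinomial naive Bayes with add-one smoothing."""

    def fit(self, docs, flags):
        self.counts = {True: Counter(), False: Counter()}
        self.n_docs = {True: 0, False: 0}
        for tokens, flag in zip(docs, flags):
            self.counts[flag].update(tokens)
            self.n_docs[flag] += 1
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        return self

    def _score(self, tokens, flag):
        # Smoothed log prior plus smoothed log likelihood of known tokens.
        score = math.log((self.n_docs[flag] + 1) / (sum(self.n_docs.values()) + 2))
        total = sum(self.counts[flag].values()) + len(self.vocab)
        for token in tokens:
            if token in self.vocab:
                score += math.log((self.counts[flag][token] + 1) / total)
        return score

    def predict(self, tokens):
        return self._score(tokens, True) > self._score(tokens, False)

def train_binary_relevance(posts, label_sets):
    """Train one independent binary classifier per label."""
    docs = [tokenize(p) for p in posts]
    labels = sorted(set().union(*label_sets))
    return {lab: BinaryNB().fit(docs, [lab in s for s in label_sets]) for lab in labels}

def predict_labels(models, post):
    """A post receives every label whose classifier says yes."""
    tokens = tokenize(post)
    return {lab for lab, model in models.items() if model.predict(tokens)}

# Toy training data (invented); each post may carry several labels.
posts = [
    "what dose of tamoxifen did your doctor suggest",
    "so scared about my scan results tomorrow",
    "thank you all for the kind support",
    "tamoxifen gave me hot flashes and i am scared",
]
label_sets = [
    {"info_request", "treatment"},
    {"emotional"},
    {"emotional"},
    {"treatment", "emotional"},
]
models = train_binary_relevance(posts, label_sets)
print(predict_labels(models, "i am scared about tamoxifen"))
```

Binary relevance ignores correlations between labels, which is one reason a real system would likely need a richer model and far more annotated data than this toy example.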
The goal of this project is to implement a tagger for identifying medication
information from clinical notes. While there are lexicons for drug names, they
are not always complete and up to date. Furthermore, drug names are often
misspelled in clinical notes. In addition to the drug name, full medication
information includes a route, a dosage, and a frequency. We have already
implemented a parser that relies on hand-built rules. For this project, we want
to apply a more complex machine learning approach to identify full medication
information in different contexts.
Preference will go to students with experience in natural language processing,
computational modeling, and statistical learning.
Keywords: electronic patient record, named entity recognition, sequence
labeling.
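For orientation, the sketch below shows what a small rule-based baseline for this task might look like. It is a hypothetical illustration and not our existing parser. It assumes a tiny toy lexicon, a few common abbreviations for route and frequency, and fuzzy string matching (Python's `difflib`) to catch misspelled drug names. Every list and the `tag_tokens` function are invented for the example.

```python
import difflib
import re

# Toy resources (assumptions for illustration; real lexicons are far larger).
DRUG_LEXICON = ["metformin", "lisinopril", "warfarin"]
ROUTES = {"po", "iv", "im", "sc"}
FREQUENCIES = {"daily", "bid", "tid", "qid", "prn"}
DOSAGE = re.compile(r"^\d+(\.\d+)?(mg|mcg|g|ml)$")

def tag_tokens(note):
    """Assign one tag per whitespace-separated token of a clinical note."""
    tags = []
    for token in note.lower().split():
        word = token.strip(".,;:")
        if DOSAGE.match(word):
            tag = "DOSAGE"
        elif word in ROUTES:
            tag = "ROUTE"
        elif word in FREQUENCIES:
            tag = "FREQUENCY"
        # Fuzzy lookup tolerates small misspellings of lexicon entries.
        elif difflib.get_close_matches(word, DRUG_LEXICON, n=1, cutoff=0.85):
            tag = "DRUG"
        else:
            tag = "O"
        tags.append((word, tag))
    return tags

for word, tag in tag_tokens("Started metformn 500mg po bid."):
    print(word, tag)
```

The output is one tag per token, which is also the shape a sequence labeling model would predict. Rules like these break down when drugs are missing from the lexicon or when dosages are written in unexpected ways, which is the motivation for a learned tagger.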
This project focuses on articles from Wikipedia and Simple English Wikipedia in the health
domain. Both sites contain pages targeted at health consumers, but the
language and the content of Simple English Wikipedia articles are simpler (hence
the name...). There are two parts to this project: (1) We want to build a
corpus of pairs of articles specific to diseases. This is a challenge in
itself, as we want to pair the articles automatically so we can get a large
number of pairs. (2) Once the corpus is built, we want to align segments of
texts from the pairs based on how much content they share. This project is
useful because it enables us to study how information can be conveyed at
different levels of linguistic complexity. From an informatics standpoint, the
project will help us better understand the challenges of health literacy.
Preference will go to students with experience in natural language processing,
computational modeling, and statistical learning.
Keywords: health literacy, paraphrasing, text similarity metrics, alignment
methods, Wikipedia.
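As a rough illustration of part (2), the sketch below aligns each segment of a simple article with the most similar segment of the standard article. It uses cosine similarity over bags of words and keeps a pair only when the score clears a threshold. This is one simple baseline among many possible similarity metrics, not a method the project has settled on. The function names, the threshold of 0.3, and the toy sentences are all assumptions made for the example.

```python
import math
import re
from collections import Counter

def bag_of_words(text):
    """Lowercased word counts; punctuation and digits are ignored."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def align_segments(simple_segments, standard_segments, threshold=0.3):
    """Return (simple_index, standard_index, score) for each confident match."""
    standard_bags = [bag_of_words(s) for s in standard_segments]
    alignments = []
    for i, segment in enumerate(simple_segments):
        bag = bag_of_words(segment)
        scores = [cosine(bag, other) for other in standard_bags]
        if not scores:
            continue
        j = max(range(len(scores)), key=scores.__getitem__)
        if scores[j] >= threshold:
            alignments.append((i, j, round(scores[j], 2)))
    return alignments

# Toy segments (invented), standing in for a paired article.
simple = [
    "Asthma makes it hard to breathe.",
    "Doctors treat asthma with inhalers.",
]
standard = [
    "Treatment of asthma usually involves inhalers.",
    "People with asthma often find it hard to breathe during an attack.",
]
for i, j, score in align_segments(simple, standard):
    print(i, j, score)
```

Plain word overlap misses paraphrases such as "breathe" versus "breathing", so a realistic aligner would likely need stemming, term weighting, or a learned similarity measure.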
Last Updated: 06/2009