About
I am a postdoctoral Computing Innovation Fellow at Columbia University, working with George Hripcsak in the Department of Biomedical Informatics.
I completed my PhD in Computer Science at NYU in May 2010, where I was advised by Bud Mishra and was a member of the Bioinformatics Group. Before that, I was an undergraduate at NYU in Computer Science and Physics.
My broad research interests are causality, bioinformatics, logic and reasoning. At the core of all of these, though, is a fascination with time and temporal data. My current work combines these areas, uniting temporal logic and tools from computer science with philosophical theories of causality to solve biomedical problems. While I have previously applied these methods to stock return time series as well as to political speeches and popularity ratings, I currently work on inference using electronic health record data (in collaboration with Columbia University Medical Center and Geisinger Health System). These longitudinal records allow insight into the health of large populations across their lifespan and could allow us to find not just whether a disease like lung cancer is caused by smoking, but exactly how long it will take to develop.
Research overview
When we make causal inferences -- whether these are in finance, politics, or bioinformatics -- we generally aim to use the resulting relationships to predict future behavior or guide interventions that will produce (or prevent) particular outcomes. However, to trade stocks based on causal relationships we need to know when to take a position in the market; to engineer speeches we need to know what combination of phrases will sway voters; and to develop pharmaceuticals we need to understand the complex sets of factors influencing a disease as well as their temporal progression. In all of these cases, we must account for the complex and temporal nature of the causal relationships. We also need to know that our inferences are truly causal, since manipulation of a factor that is only correlated with an effect will not influence it. Finally, while we often want to infer general relationships such as that between smoking and lung cancer, we also aim to explain events that have already occurred, such as determining whether a particular patient's symptoms are due to lung cancer and whether that cancer was caused by smoking.
To address these problems, I have developed a new approach to 1) identifying complex temporal causal relationships from observational time series data and 2) finding causes of particular events. The approach centers on representation of causal relationships using probabilistic temporal logic formulas. At the type level, this allows explicit description of the time between cause and effect and automated testing of arbitrarily complex relationships using methods I developed for testing formulas directly in traces (without first inferring a model). After computing the average impact of a cause on its effect, we can use techniques for false discovery control to help determine which of the inferred causes are significant. At the token level, I have recently shown that we may use the significance of the general (type-level) relationships to reason about and assess the significance of potential token causes in a way that allows for incomplete information.
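To give a rough sense of what testing a time-windowed relationship directly in a trace can look like, here is a minimal Python sketch. It is a deliberately simplified illustration and not the method itself: the trace format (a list of sets of propositions that are true at each time step), the function names, and the window parameters `r` and `s` are all assumptions made for this example. It only compares how often an effect follows within a window when a candidate cause is present versus absent; it does not condition on other potential causes and does not apply false discovery control.

```python
from typing import List, Set


def prob_effect_in_window(trace: List[Set[str]], cause: str, effect: str,
                          r: int, s: int, cause_present: bool) -> float:
    """Estimate P(effect occurs r..s steps after t | cause is present/absent at t)."""
    hits = 0
    total = 0
    # Only consider time points whose full window [t + r, t + s] lies inside the trace.
    for t in range(len(trace) - s):
        if (cause in trace[t]) != cause_present:
            continue
        total += 1
        if any(effect in trace[t + d] for d in range(r, s + 1)):
            hits += 1
    return hits / total if total else float("nan")


def window_impact(trace: List[Set[str]], cause: str, effect: str,
                  r: int, s: int) -> float:
    """Difference in the effect's windowed probability with versus without the cause."""
    with_cause = prob_effect_in_window(trace, cause, effect, r, s, True)
    without_cause = prob_effect_in_window(trace, cause, effect, r, s, False)
    return with_cause - without_cause


# Hypothetical toy trace: "c" is followed by "e" exactly two steps later.
trace = [{"c"}, set(), {"e"}, set(), {"c"}, set(), {"e"},
         set(), set(), set(), set(), set()]
print(window_impact(trace, "c", "e", 1, 2))  # 0.75
```

In this toy trace, the effect follows the cause within one to two steps every time the cause occurs (probability 1.0) and in only 2 of the 8 time points without it (0.25), giving a difference of 0.75. Scores of this kind, computed for many candidate relationships, are what a false discovery control step would then assess for significance.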
622 W 168th St, VC-5
New York, NY 10032
212.305.4510
samantha@dbmi.columbia.edu
News
- My book Causality, Probability, and Time is now under contract with Cambridge University Press.
- I will be presenting Automated Temporal Causal Inference from EHR Data at the AMIA Translational Summit in San Francisco.
- Our paper reviewing causal inference in biomedical informatics was just published in JBI (Journal of Biomedical Informatics) and is available online.
- My CI fellowship was renewed for a second year.
- I will be presenting Temporal Token Causal Explanation at the Causality and Explanation in the Sciences conference in Ghent.
- My paper A Logic for Causal Inference in Time Series with Discrete and Continuous Variables will be presented as both a talk and a poster at IJCAI 2011 in Barcelona.
- We (Columbia & Geisinger) have received an NLM Computational Thinking contract for work on inference that will link structured and unstructured (text) EHR data, and apply these methods to CHF and CKD data.
- I have been awarded an NSF/CRA Computing Innovation (CIFellow) Fellowship. I will be working with George Hripcsak at Columbia University.