Probability and Uncertainty
(Introduction to Medical Informatics)
(http://www.cpmc.columbia.edu/edu/textbook)
LAST REVIEWED: 26 November 1997

NEED FOR PROBABILITY

Tversky A, Kahneman D. Judgment under uncertainty: heuristics and biases. Science 1974;185:1124-31.

try to go from findings to diagnosis or diagnosis to therapy
  no definite answer
    each patient is different
    too many parameters and combinations
    non-deterministic anyway
people use heuristics to make decisions despite uncertainty
  medicine is very uncertain
    eg, heart disease - should you operate or use medicines
  heuristic: empiric method (esp rule of thumb) to find solution
    vs. rigorous algorithm
  good up to a point
    but occasionally, things go wrong
three common heuristics
  1. representativeness
    most basic pattern matching
    if A looks like a B, then A probably is a B
      eg, A is gregarious and enthusiastic; what is the probability that A is a marketing rep
    problems:
      insensitive to prior probability
        given 70% engineers and 30% lawyers, or the reverse, what is the chance that someone good at math is an engineer
      insensitive to sample size
        two hospitals, one delivers 100 babies per day and another delivers 10; which one has a greater chance that 60% of babies will be boys
      misconceptions about chance
        which is more likely: boy-girl-boy or boy-boy-boy
      insensitive to difficulty of predictability
        try to predict future (competence of student teacher in 5 years) based on unreliable data (one trial lesson)
      illusion of validity
        too much confidence in result despite known problems; confidence is often based on internal consistency (a record of all Bs is thought more predictable than a mix of As and Cs), which usually just reflects redundant information
      misconceptions of regression
        "regression toward the mean" - if you pick outliers, they will be closer to average on retesting (eg, flight training found students did better after punishment for a bad landing, but worse after praise for a good landing)
  2. availability
    lets your heuristic evolve with experience
    assess frequency by ease of remembering examples
      eg, how often do people have heart attacks
    problems:
      biases due to the retrievability of instances
        more men or women in a list of celebrities - will tend to pick the gender with more famous people; or probability of accident based on seeing one vs reading about one
      biases due to effectiveness of a search set
        are there more words with r as first or third letter - people find it easier to recall by first letter
      biases of imaginability
        how many distinct subcommittees of size N given 10 people total - it is easier to imagine 2 person subcommittees than 8 person ones, but the number of possible subcommittees is the same
      illusory correlation
        pre-existing bias selects instances for recall; shown pictures drawn by mental patients, people see shifty eyes in paranoid patients' drawings
  3. adjustment and anchoring
    relate the unfamiliar to the familiar
    assess initial value then adjust for this case
    problems:
      insufficient adjustment
        people tend not to adjust far enough; eg, asked percentage of African nations in UN and given a random number as starting point, 10 -> 25 but 65 -> 45, even though subjects knew it was random
      biases in evaluation of conjunctive & disjunctive events
        ORs are underestimated
        ANDs are overestimated
        eg, underestimate failure of complex system (reactor)
      anchoring in the assessment of subjective probability distributions
        people tend to pick overly narrow confidence limits
risk
  people are risk averse for gains, risk taking for losses
  disease will kill 600, try two therapies
    (save 200) vs (1/3 chance save 600, 2/3 chance save none): chosen by 72% vs 28%
    (400 die) vs (1/3 chance no one dies, 2/3 chance 600 die): chosen by 22% vs 78%
therefore attempt to use rigorous methods to come to decisions

BASIC PROBABILITY

elementary events occur in some sample space S
P() = probability distribution on S
  = mapping of events in S to real numbers such that
    P(A) >= 0 for every event A in S
    P(S) = 1
    P(A or B) = P(A) + P(B) for 2 mutually exclusive events
probability of an event A happening is P(A)
  eg, on coin flip, P(heads) = 0.5
probability of an event not occurring is P(not A)
  P(not A) = 1 - P(A)
  eg, P(heads) = 1 - P(tails) = 1 - 0.5 = 0.5
probability of either of 2 events happening is P(A or B)
  eg, on 2 flips, P(first heads or second heads) = 0.75
  P(A or B) = P(A) + P(B) - P(A and B)
  if A and B are disjoint (can never occur together)
    P(A or B) = P(A) + P(B)
    eg, P(head or tail) = 0.5 + 0.5 = 1.0
probability of both of two events happening is P(A and B)
  eg, on 2 flips, P(first head and second head) = 0.25
  if A and B are disjoint, P(A and B) = 0
  A and B are independent iff P(A and B) = P(A) P(B)
conditional probability
  P(A|B) is the probability that A will occur given that B has occurred
    eg, P(2 heads | first head) = 0.5
  P(A|B) = P(A and B) / P(B)
  if A and B are independent
    P(A|B) = P(A)
    ie, the fact that B occurred has no effect on A
    eg, P(second head | first head) = P(second head) = 0.5

DIAGNOSIS

no test is perfect, no diagnosis is certain
can only estimate probability of disease (prognosis...)
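The coin-flip examples in the basic probability rules above can be checked by brute-force enumeration. The following is a minimal sketch, assuming a fair coin and exactly two flips; the helper name prob and the event names are hypothetical, introduced only for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space S for two flips of a fair coin: four equally likely outcomes.
S = list(product("HT", repeat=2))

def prob(event):
    """P(event): fraction of outcomes in S for which event(outcome) is true."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def first_head(s):
    return s[0] == "H"

def second_head(s):
    return s[1] == "H"

p_a = prob(first_head)
p_b = prob(second_head)
p_and = prob(lambda s: first_head(s) and second_head(s))
p_or = prob(lambda s: first_head(s) or second_head(s))

# P(A or B) = P(A) + P(B) - P(A and B)
assert p_or == p_a + p_b - p_and

# A and B are independent iff P(A and B) = P(A) P(B)
assert p_and == p_a * p_b

# Conditional probability: P(B|A) = P(A and B) / P(A)
p_second_given_first = p_and / p_a

print(p_or, p_and, p_second_given_first)  # 3/4 1/4 1/2
```

Exact fractions are used instead of floats so that the identities hold with strict equality.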
definitions and terminology
  disease D
  test T
    findings, manifestations
  how good is a test (D=heart disease, T=positive stress test)
    TP = true positive = D and T
    FP = false positive = not D and T
    TN = true negative = not D and not T
    FN = false negative = D and not T
  accuracy = fraction of patients correctly classified
    (TP + TN) / (TP + FP + TN + FN)
    one measure of how good a test is
    but accuracy is dependent on patient population
      patient population varies enormously
      eg, test always says false
        if none of your patients have disease, test is perfectly accurate; if all do, test has zero accuracy
    therefore use sensitivity and specificity
  sensitivity
    what proportion of patients WITH disease will have a POSITIVE result
    "true positive rate"
    P(T|D)
    TP / (TP + FN)
  specificity
    what proportion of patients WITHOUT disease will have a NEGATIVE result
    "true negative rate"
    P(not T|not D)
    TN / (TN + FP)
  if test is always negative, sensitivity = 0 and specificity = 1
    values are the same regardless of patient population
  prior probability
    prob of patient having disease given no other info
    "prevalence"
    P(D) = (TP + FN) / (TP + FP + TN + FN)
  want probability of having disease D given a test result T:
  posterior probability of D
    what proportion of patients with POSITIVE result HAVE disease
    "predictive value positive"
    P(D|T)
    TP / (TP + FP)
  posterior probability of not D
    what proportion of patients with NEGATIVE result DO NOT have disease
    "predictive value negative"
    P(not D|not T)
    TN / (TN + FN)
  but textbooks give probability of result given disease (ie, sensitivity and specificity)
    "most patients with pneumonia have fever"
Bayes Theorem (simple form)
  convert from P(T|D) to P(D|T)
  P(D|T) = P(D and T) / P(T)           ...defn of cond prob
  P(D and T) = P(D|T) P(T)             ...rearrangement
             = P(T|D) P(D)
  P(D|T) = [P(D) P(T|D)] / P(T)        ...one form of Theorem
  given event B = (B and A) or (B and not A)   ...mut excl
  then P(T) = P(T and D) + P(T and not D)
            = P(D) P(T|D) + P(not D) P(T|not D)
  then substitute for P(T) in above Theorem form...
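The identities used in this derivation can be checked numerically on a 2x2 table of counts. The sketch below uses made-up counts (hypothetical, not from any study) and hypothetical variable names; it only confirms that Bayes Theorem applied to sensitivity, specificity, and prevalence gives the same value as the predictive value positive TP / (TP + FP).

```python
import math

# Hypothetical counts for illustration only (not from any study).
TP, FP, TN, FN = 90, 50, 850, 10
n = TP + FP + TN + FN

sensitivity = TP / (TP + FN)   # P(T|D)
specificity = TN / (TN + FP)   # P(not T|not D)
prevalence = (TP + FN) / n     # P(D), the prior probability
ppv = TP / (TP + FP)           # P(D|T), predictive value positive
npv = TN / (TN + FN)           # P(not D|not T), predictive value negative
accuracy = (TP + TN) / n

# P(T) = P(D) P(T|D) + P(not D) P(T|not D)
p_t = prevalence * sensitivity + (1 - prevalence) * (1 - specificity)
assert math.isclose(p_t, (TP + FP) / n)

# P(D|T) = [P(D) P(T|D)] / P(T)
posterior = prevalence * sensitivity / p_t
assert math.isclose(posterior, ppv)

print(f"sens={sensitivity:.3f} spec={specificity:.3f} prior={prevalence:.3f}")
print(f"PPV={ppv:.3f} via Bayes={posterior:.3f} NPV={npv:.3f} acc={accuracy:.3f}")
```

Changing the counts so that the disease is rarer (for example, multiplying TN and FP by 10) leaves sensitivity and specificity unchanged but lowers the predictive value positive.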
  post prob = [(sens) (prior)] / [(sens) (prior) + (1 - spec) (1 - prior)]
  P(D|T) = [P(T|D) P(D)] / [P(T|D) P(D) + P(T|not D) P(not D)]
  simple form of Bayes Theorem assumes "conditional independence"
    T1 and T2 must be independent given D, ie P(T1 and T2|D) = P(T1|D) P(T2|D), for all T1,T2
    otherwise you are counting information twice
    most tests are not conditionally independent
      eg, anemia and fatigue are related; a patient with one will tend to have the other
  simple form of Bayes Theorem assumes diseases are mutually exclusive
    ie, the patient has one disease at a time
    ie, assume all abnormalities come from one disease
  the more general form of Bayes Theorem circumvents both problems
    P(D|T1 and ... and Tk) = [P(T1 and ... and Tk|D) P(D)] / P(T1 and ... and Tk)
    which requires an exponential number of probabilities, therefore simplify
    can simplify by assuming conditional independence where
      P(T1|T2, E) = P(T1|E)
      for some evidence E that shows that T1 and T2 really are independent
  Example (simple form):
    HIV test has sensitivity of 98%, specificity of 99%; assuming 1/1000 people are actually HIV+, what is the probability of being HIV+ given a positive test
      P(D) = 0.001
      P(T|D) = 0.98
      P(not T|not D) = 0.99
      P(T|not D) = 1 - 0.99 = 0.01
      P(D|T) = [(0.98) (0.001)] / [(0.98) (0.001) + (0.01) (0.999)] = 0.089
      only about 1 out of 11 people with a positive test are HIV+
    eg, what if 1/10 people are HIV+
      P(D|T) = [(0.98) (0.1)] / [(0.98) (0.1) + (0.01) (0.9)] = 0.92
      roughly 10 out of 11 people with a positive test are HIV+
graphical interpretation of sensitivity and specificity
  [figure: test values for normal patients (o, upper half) and diseased patients (x, lower half), split by a vertical threshold into test- (left) and test+ (right); the four regions are TN (normal, test-), FP (normal, test+), FN (diseased, test-), and TP (diseased, test+)]
  by changing the +/- threshold, one can increase the sensitivity at the expense of specificity, or the converse
    if it is more important to get everyone who might have disease despite FPs, then move threshold to left (eg, in bacterial meningitis, do not want to miss anyone)
    if it is more important to avoid false diagnoses despite missing some true ones, then move threshold to right (eg, deciding whether to give a dangerous therapy)
  how do you tell which test is better, when you can change the sensitivity and specificity on both of them?
receiver operating characteristic (ROC) curve
  plot sensitivity vs 1 - specificity
  [figure: ROC curves for two tests, x and o, with sensitivity on the vertical axis and 1 - specificity on the horizontal axis; the x curve lies closer to the upper left corner]
  whichever is closer to the upper left corner is better
    test x is better than test o
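The HIV example earlier in this section can be reproduced with a few lines of code. This is a minimal sketch of the simple form of Bayes Theorem, using the sensitivity, specificity, and prior probabilities given in the example; the function name posterior and its parameter names are hypothetical.

```python
def posterior(sensitivity, specificity, prior):
    """P(D|T) for a positive test, by the simple form of Bayes Theorem."""
    true_pos = sensitivity * prior                # P(T|D) P(D)
    false_pos = (1 - specificity) * (1 - prior)   # P(T|not D) P(not D)
    return true_pos / (true_pos + false_pos)

low = posterior(0.98, 0.99, 0.001)   # 1/1000 people are HIV+
high = posterior(0.98, 0.99, 0.1)    # 1/10 people are HIV+

print(round(low, 3), round(high, 2))  # 0.089 0.92
```

The test itself is identical in both calls; only the prior probability changes, which is why the same positive result means something very different in the two populations.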