Clinical Databases and Database Models (Introduction to Medical Informatics) (http://www.cpmc.columbia.edu/edu/textbook) LAST REVIEWED: 1 October 1997 database = collection of related data DATABASES FOR HEALTH CARE large 4 gigabyte laboratory result table therefore need performance (most important factor) complex nested data especially history, physical exam therefore need rich representation many different types of concepts not merely many instances of same things (eg, simple accounting vs. managing investments) but 1000s of types of things with different structures therefore need rich representation also need tools for maintenance keep vocabulary separate new concepts are added regularly not only new examples of old concepts (new lab test) but also whole new paradigms (care plans) therefore need flexible and extensible structure multiple views: fundamental differentiation clinical care - within a patient clinical research - across patients (this difference usually results in 2 DBs) storage and retrieval are very different want to store clusters of data based upon how and when it was collected want to retrieve based upon medical relevance: chronologic order by type of data (eg, electrolytes) by problem (eg, anything about pneumonia) but it is not all bad news easy to partition the database for clinical care can achieve high performance (millisecond access times) INDEXED FILES organization a field contains a data element records are made of fields key fields index the records files are made of records each file contains records of similar structure files are identified by filenames indexed file data access is via programs (procedural) that specify the filename, constraints on the key fields, and which fields are of interest within a record DATABASE MANAGEMENT SYSTEMS want to store data without worrying about computer issues (eg, not want to worry about how paper is manufactured, ...) DBMS=tool (set of programs and definitions) that insulate the users of data (store and retrieval, programmers and clinicians, ...) from the details of implementation main benefit: independence from database implementation what is stored rather than how it is stored can add a field without changing a program how does a DBMS work? It uses a: "schema" computer-readable definition of the contents and organization of the database change one schema, not many programs (picture program or user -> schema -> disk) (model = type of database like relational, hierarchical, OO) SCHEMA vs VOCABULARY vocabulary tells you what things mean (represents concepts and relations among them) organization is based on conceptual definitions used for selecting terms and for self-maintenance schema tells you how data is stored in the database organization is based on storage and retrieval used for storage and retrieval of data uses terms defined in the vocabulary (as data) schema vs instance: schema = database structure instance = particular population of data in that structure example: vocabulary medical entity battery chem7 test potassium test sodium test substances potassium sodium specimen blood urine schema patient results (lab) chem7 primary time update time provider ID status sodium test's value potassium test's value ... chem7 ... note that schema can leave out information that is defined in the vocabulary sodium test => specimen=blood 1. in the database, only need to fill in what is specific to the patient 2. but the schema must include space for patient-specific information not defined in the vocabulary (update time) 3. example: sodium test => specimen=blood on retrieval vocabulary to find out what you want is called (in the schema) schema to find it in the database vocabulary to interpret the result tabular vs generic (name-value pair) schema (show diagram of two schemata) 1. example chem7 vs. MRN | Na | K | Cl code value ----------------- code value | | | ... ----------------- 2. one table for each conceptual medical relation simplest approach queries consist of straightfoward joins waste space due to sparse tables large number of tables (1000s) this is bad for maintenance and performance not very flexible (overall, may be better for clinical research) 3. use of generic tables table of name-value pairs parent-child pointers join table on itself to create nested data flexible - can add new data structures as needed but queries are more complex: must compare more rows separate header vs. component table allows some rigid columns along with flexibility (overall, may be better for clinical care) logical vs physical model user's or designer's view vs. how it is stored OTHER BENEFITS OF DBMS wrt medicine better organization of the data guide to how DB is arranged, force documentation richer set of relations among data make relations explicit, not implicit in programs well-defined operations over the relations (eg, lab test always belongs to a patient) simplify actions on the database offers declarative rather than procedural access multiple views there is a single physical organization but users require different views view = model, perspective, structure eg, MD wants lab tests by patient but billing wants lab tests by type for charging use of a subschema for each view simultaneous access = concurrency read one record in middle of its update read several records while half are updated (eg, bank) A read -> B update -> A update therefore use of locking (many medical systems do not worry about sophisticated locking) recovery what if problem in middle of complex update error, bug, system crash logical unit of work = one transaction always commit or rollback database integrity check data within fields (eg, type) check relations (eg, referential integrity) eg, lab test must belong to a patient security granting of privileges for specific parts of the database (records, fields) eg, only some can see salaries for specific operations (read, write) for specific views optimization for performance eg, caching tools forms tools for data entry report generation backup and recovery after crash CHOICES OF DBMSs network: collection of records (itself a collection of fields) connected to each other by links hierarchical: collection of rooted trees relational: relational algebra basic intro to SQL select from where normalization: decompose large tables to avoid redundancy, improve maintenance demographics table mrn gender_code dob name select dob, name, gender_code from demographics where mrn = 123 gender_translate table gender_code gender_text -------------------------- M male ... select gender_translate.gender_text from demographics, gender_translate where demographics.mrn = 123 and demographics.gender_code=gender_translate.gender_code show CPMC generic tables header, component tables show query for chem7 battery show query for single sodium test value show query for sodium > 141 show queries for CIS, step through result display screens demographics query, batteries query, test query object-oriented: uses methods, data hiding, inheritance CLINICAL EXAMPLES chem7 on 1993-01-01 sodium = 141 potassium = 4.3 chloride = 101 bicarbonate = 25 BUN = 12 creatinine = 1.0 glucose = 95 blood culture on 1993-01-02 direct exam: >100K per ml gram positive cocci in clusters <10K per ml gram negative rods 10K per ml E. coli AMP T/S TIM PIP CIP CFZ CFX CAZ A/S CFT >16 S >64 >64 <1 >16 >16 >16 R 16 R S R R S R R R R S 10K per ml E. coli AMP T/S TIM PIP CIP CFZ CFX CAZ A/S CFT <1 S <1 >64 <1 >16 >16 >16 R 16 S S S R S R R R R S >100K per ml Klebsiella pneumoniae AMP T/S TIM PIP CIP CFZ CFX CAZ A/S CFT >16 R >64 >64 <1 >16 >16 >16 R 16 R R R R S R R R R S patient-physician relations MD1 taking care of pt 123 on 1993-01-01 MD2 taking care of pt 123 on 1993-02-02 MD1 stops taking care of pt 123 on 1993-03-03 MD2 transfer pt 123 to MD3 on 1993-04-04 [pt 123 is discharged on 5/5 => implicit stop on MD3] MD2 taking care of which patients? pt 123 is cared for by which MDs? related reading: Blois MS. Medical records and clinical databases: what is the difference? M.D. Computing 1984;1(3):24-8.