Interpretable Deep Learning Model [viz demo]
Edward Choi, Mohammad Taha Bahadori, Andy Schuetz, Walter F. Stewart, Jimeng Sun. "RETAIN: Interpretable Predictive Model in Healthcare using Reverse Time Attention Mechanism", NIPS'16 [code] [video]
Accuracy and interpretability are two goals of any successful predictive model. Most existing work suffers from a tradeoff between the two, either picking complex black-box models such as recurrent neural networks (RNN) or relying on less accurate traditional models with better interpretation such as logistic regression. To address this dilemma, we present the REverse Time AttentIoN model (RETAIN) for analyzing EHR data, which achieves high accuracy while remaining clinically interpretable. RETAIN is a two-level neural attention model that can find influential past visits and significant clinical variables within those visits (e.g., key diagnoses). RETAIN mimics physician practice by attending to the EHR data in reverse time order so that more recent clinical visits will likely get higher attention. Experiments on a large real EHR dataset of 14 million visits from 263K patients over 8 years confirmed predictive accuracy and computational scalability comparable to state-of-the-art methods such as RNN. Finally, we demonstrate the clinical interpretation with concrete examples from RETAIN.
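The two attention levels described above can be illustrated with a minimal sketch. It is not the RETAIN implementation: the recurrent networks that read visits in reverse time order and produce the attention values are omitted, and the function name, the raw visit scores, and the per-variable gates are hypothetical inputs assumed here for illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def two_level_context(visit_vectors, visit_scores, variable_gates):
    """Combine visits into one context vector with two levels of weights.

    visit_vectors:  one vector per visit (hypothetical visit embeddings).
    visit_scores:   one raw score per visit; softmax gives visit-level weights.
    variable_gates: one vector per visit with a weight for each dimension,
                    standing in for variable-level attention.
    In a full model both would come from networks reading the visits in
    reverse time order; here they are assumed to be given.
    """
    alphas = softmax(visit_scores)
    dim = len(visit_vectors[0])
    context = [0.0] * dim
    for alpha, gate, vector in zip(alphas, variable_gates, visit_vectors):
        for d in range(dim):
            # Contribution of dimension d of this visit to the context.
            context[d] += alpha * gate[d] * vector[d]
    return alphas, context
```

Because the context vector is a weighted sum, each term `alpha * gate[d] * vector[d]` can be read as the contribution of one variable in one visit, which is what makes this kind of model inspectable.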
Heart Failure Onset Prediction using Recurrent Neural Network
* Edward Choi; Andy Schuetz; Walter F Stewart; Jimeng Sun "Using recurrent neural network models for early detection of heart failure onset" Journal of the American Medical Informatics Association 2016; doi: 10.1093/jamia/ocw112 [abstract, pdf, code]
Objective: We explored whether use of deep learning to model temporal relations among events in electronic health records (EHRs) would improve model performance in predicting initial diagnosis of heart failure (HF) compared to conventional methods that ignore temporality.
Materials and Methods: Data were from a health system’s EHR on 3884 incident HF cases and 28,903 controls, identified as primary care patients, between May 16, 2000, and May 23, 2013. Recurrent neural network (RNN) models using gated recurrent units (GRUs) were adapted to detect relations among time-stamped events (e.g., disease diagnosis, medication orders, and procedure orders) with a 12- to 18-month observation window of cases and controls. Model performance metrics were compared to regularized logistic regression, neural network, support vector machine, and K-nearest neighbor classifier approaches.
Results: Using a 12-month observation window, the area under the curve (AUC) for the RNN model was 0.777, compared to AUCs for logistic regression (0.747), multilayer perceptron (MLP) with 1 hidden layer (0.765), support vector machine (SVM) (0.743), and K-nearest neighbor (KNN) (0.730). When using an 18-month observation window, the AUC for the RNN model increased to 0.883 and was significantly higher than the 0.834 AUC for the best of the baseline methods (MLP).
Conclusion: Deep learning models adapted to leverage temporal relations appear to improve performance of models for detection of incident heart failure with a short observation window of 12–18 months.
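For readers less familiar with the metric reported above, the AUC can be read as the probability that a randomly chosen case receives a higher risk score than a randomly chosen control. The sketch below computes it by direct pairwise comparison. The function name and the toy labels and scores are assumptions for illustration, not data or code from the study.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for a case, 0 for a control.
    scores: predicted risk, higher means more likely to be a case.
    Ties between a case and a control count as half a win.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Toy example: three of the four case/control pairs are ordered correctly.
print(auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.3]))  # 0.75
```

The pairwise loop is quadratic in the number of patients, so it is only suitable for small illustrations; rank-based formulas give the same value more efficiently.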
Med2Vec: Multi-layer Representation learning for medical Concepts
* Edward Choi, Mohammad Bahadori, Elizabeth Searles, Catherine Coffey, Michael Thompson, James Bost, Javier Tejedor-Sojo, and Jimeng Sun. “Multi-layer Representation Learning for Medical Concepts.” KDD 16. (code)
Proper representations of medical concepts such as diagnosis, medication, procedure codes and visits from Electronic Health Records (EHR) have broad applications in healthcare analytics.
Patient EHR data consists of a sequence of visits over time, where each visit includes multiple medical concepts, e.g., diagnosis, procedure, and medication codes.
This hierarchical structure provides two types of relational information, namely sequential order of visits and co-occurrence of the codes within a visit.
In this work, we propose Med2Vec, which not only learns representations for both medical codes and visits from large EHR datasets with over a million visits, but also allows us to interpret the learned representations, which were confirmed positively by clinical experts.
In the experiments, Med2Vec shows significant improvement in prediction accuracy in clinical applications compared to baselines such as Skip-gram, GloVe, and stacked autoencoder, while providing clinically meaningful interpretation.
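The two kinds of relational information mentioned above (co-occurrence of codes within a visit, and the sequential order of visits) can be made concrete with a small sketch that extracts both from one patient record. This is only an illustration of the data structure, not the Med2Vec training objective. The function names, the `window` parameter, and the example codes are hypothetical.

```python
from itertools import permutations


def intra_visit_pairs(visit):
    """Ordered (target, context) pairs of distinct codes in one visit."""
    return list(permutations(sorted(set(visit)), 2))


def neighbor_visit_pairs(visits, window=1):
    """Index pairs (t, u) of visits that lie within `window` steps of each other."""
    pairs = []
    for t in range(len(visits)):
        for offset in range(-window, window + 1):
            u = t + offset
            if offset != 0 and 0 <= u < len(visits):
                pairs.append((t, u))
    return pairs


# One patient as a time-ordered list of visits, each a list of codes
# (made-up code names for illustration).
patient = [["D1", "M1"], ["D2"], ["D1", "P1", "M2"]]

print(intra_visit_pairs(patient[0]))   # [('D1', 'M1'), ('M1', 'D1')]
print(neighbor_visit_pairs(patient))   # [(0, 1), (1, 0), (1, 2), (2, 1)]
```

A representation learner could use the first kind of pair to place co-occurring codes near each other and the second kind to make each visit predictive of its neighbors.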
Can Machines Beat Humans in Diagnosing Disease?
Large amounts of Electronic Health Record (EHR) data have been collected on millions of patients over multiple years. This rich longitudinal EHR data documents the collective experience of physicians, including diagnoses, medication prescriptions, and procedures. We argue that it is now possible to leverage EHR data to model how physicians behave, and we call our model Doctor AI. Toward this goal of modeling the clinical behavior of physicians, we develop a successful application of Recurrent Neural Networks (RNN) to jointly forecast future disease diagnoses and medication prescriptions along with their timing. Unlike traditional classification models where a single target is of interest, our model can assess the entire history of patients and make continuous and multilabel predictions based on patients' historical data. We evaluate the performance of the proposed method on a large real-world EHR dataset of over 260K patients over 8 years. We observed that Doctor AI can perform differential diagnosis with accuracy similar to that of physicians. In particular, Doctor AI achieves up to 79% recall@30, significantly higher than several baselines. Moreover, we demonstrate the generalizability of Doctor AI by applying the resulting models to data from a completely different medical institution and achieving comparable performance.
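The recall@30 figure above is a top-k recall for multilabel prediction: of the codes that actually occur at the next visit, what fraction appears among the k highest-scored predictions. A minimal sketch of that computation follows. The function name, the dictionary-based input format, and the example codes are assumptions for illustration and are not taken from the Doctor AI code.

```python
def recall_at_k(true_codes, scores, k):
    """Fraction of the true codes found among the k highest-scored codes.

    true_codes: codes that actually occurred (non-empty).
    scores:     dict mapping every candidate code to its predicted score.
    """
    truth = set(true_codes)
    if not truth:
        raise ValueError("true_codes must not be empty")
    ranked = sorted(scores, key=scores.get, reverse=True)
    top_k = set(ranked[:k])
    return len(top_k & truth) / len(truth)


# Made-up scores over four candidate codes; the true codes are A and D.
predicted = {"A": 0.9, "B": 0.1, "C": 0.5, "D": 0.3}
print(recall_at_k({"A", "D"}, predicted, k=2))  # 0.5
print(recall_at_k({"A", "D"}, predicted, k=3))  # 1.0
```

With k=2 only A is retrieved, so half of the true codes are recovered; widening the list to k=3 also retrieves D.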
SPACESHIP: Scalable Health Analytic Systems
* Yuyu Zhang, Mohammad Bahadori, Hang Su, and Jimeng Sun, “FLASH: Fast Bayesian Optimization for Data Analytic Pipelines”, KDD 16 [code]
* Chen, Robert, Hang Su, Yi Zhen, Mohammed Khalilia, Daniel Hirsch, Michael Thompson, Tod Davis, Yue Peng, Sizhe Lin, Javier Tejedor-Sojo, Elizabeth Searles and Jimeng Sun. Cloud-based Predictive Modeling System and its Application to Asthma Readmission Prediction, AMIA 2015
* Kenney Ng, Amol Ghoting, Steven R. Steinhubl, Walter F. Stewart, Bradley Malin, and Jimeng Sun. “PARAMO: A PARAllel Predictive MOdeling Platform for Healthcare Analytic Research Using Electronic Health Records.” Journal of Biomedical Informatics. Accessed January 7, 2014. doi:10.1016/j.jbi.2013.12.012.
Phenotyping from Electronic Health Records (project page)
* RL Richesson, Jimeng Sun, J Pathak, AN Kho, and JC Denny, A survey of clinical phenotyping in selected national networks: demonstrating the need for high-throughput, portable, and computational methods, Artificial Intelligence in Medicine 71, 57-61
* Chen, You, Joydeep Ghosh, Cosmin Adrian Bejan, Carl A. Gunter, Siddharth Gupta, Abel N. Kho, David M. Liebovitz, Jimeng Sun, Joshua C. Denny, and Bradley Malin. “Building Bridges across Electronic Health Record Systems through Inferred Phenotypic Topics.” Journal of Biomedical Informatics 55 (2015): 82–93. doi:10.1016/j.jbi.2015.03.011
* Wang, Yichen, Robert Chen, Joydeep Ghosh, Joshua C. Denny, Abel Kho, You Chen, Bradley A. Malin, and Jimeng Sun. “Rubik: Knowledge Guided Tensor Factorization and Completion for Health Data Analytics.” In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’15. ACM, 2015.
* Joyce Ho, Joydeep Ghosh, Jimeng Sun. Marble: High-throughput Phenotyping from Electronic Health Records via Sparse Nonnegative Tensor Factorization. ACM SIGKDD, 2014
* Ho, Joyce C., Joydeep Ghosh, and Jimeng Sun. “Extracting Phenotypes from Patient Claim Records Using Nonnegative Tensor Factorization.” In Brain Informatics and Health - International Conference, BIH 2014, Warsaw, Poland, August 11-14, 2014, Proceedings, 8609:142–51. Lecture Notes in Computer Science. Springer, 2014. doi:10.1007/978-3-319-09891-3_14.
* Ho, Joyce C., Joydeep Ghosh, Steven R. Steinhubl, Walter F. Stewart, Joshua C. Denny, Bradley A. Malin, and Jimeng Sun. “Limestone: High-Throughput Candidate Phenotype Generation via Tensor Factorization.” Journal of Biomedical Informatics 52 (2014): 199–211.
Summary: As the adoption of electronic health records (EHRs) has grown, EHRs are now composed of a diverse array of data, including structured information (e.g., diagnoses, medications, and lab results), molecular sequences, unstructured clinical progress notes, and social network information. There is mounting evidence that EHRs are a rich resource for clinical research, but they are notoriously difficult to leverage because of their orientation to healthcare business operations, heterogeneity across commercial systems, and high levels of missing or erroneous entries. Moreover, the interactions among different data sources within an EHR are challenging to model, hampering our ability to leverage traditional analytic frameworks. In recognition of this problem, various efforts have been undertaken to transform EHR data into concise and meaningful concepts, or phenotypes. Yet, to date, these efforts have been ad hoc and labor intensive, resulting in specific phenotypes for specific environments. There is an urgent need for scalable phenotyping methods, but several major challenges must be addressed. The goal of this project is to address these challenges by developing a general computational framework for transforming EHR data into meaningful phenotypes with only modest levels of expert guidance.
* Sun, Jimeng, Fei Wang, Jianying Hu, and Shahram Ebadollahi. “Supervised Patient Similarity Measure of Heterogeneous Patient Records.” ACM SIGKDD Explorations Newsletter 14, no. 1 (December 10, 2012): 16. doi:10.1145/2408736.2408740.
* Wang, Fei, Jianying Hu, and Jimeng Sun. “Medical Prognosis Based on Patient Similarity and Expert Feedback.” In ICPR, 1799–1802, 2012.
* Wang, Fei, Jimeng Sun, and Shahram Ebadollahi. “Composite Distance Metric Integration by Leveraging Multiple Experts’ Inputs and Its Application in Patient Similarity Assessment.” Statistical Analysis and Data Mining 5, no. 1 (2012): 54–69.
* ———. “Integrating Distance Metrics Learned from Multiple Experts and Its Application in Inter-Patient Similarity Assessment.” In SDM, 59–70, 2011
* Wang, Fei, Jimeng Sun, Jianying Hu, and Shahram Ebadollahi. “iMet: Interactive Metric Learning in Healthcare Applications.” In SDM, 944–55, 2011.
Summary: Patient similarity assessment is an important task in the context of patient cohort identification for comparative effectiveness studies and clinical decision support applications. The goal is to derive a clinically meaningful distance metric to measure the similarity between patients represented by their key clinical indicators. How can physician feedback on the retrieval results be incorporated? How can the underlying similarity measure be updated interactively based on that feedback? Moreover, different physicians often have different understandings of patient similarity based on their patient cohorts. The distance metric learned for each individual physician often leads to a limited view of the true underlying distance metric. How can the individual distance metrics from each physician be integrated into a globally consistent unified metric?
We describe a suite of supervised metric learning approaches that answer the above questions. In particular, we present Locally Supervised Metric Learning (LSML) to learn a generalized Mahalanobis distance that is tailored toward physician feedback. Then we describe the interactive metric learning (iMet) method, which can incrementally update an existing metric based on physician feedback in an online fashion. To combine multiple similarity measures from multiple physicians, we present the Composite Distance Integration (Comdi) method. In this approach we first construct discriminative neighborhoods from each individual metric, then combine them into a single optimal distance metric. Finally, we present a clinical decision support prototype system powered by the proposed patient similarity methods, and evaluate the proposed methods using real EHR data against several baselines.
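As background for the generalized Mahalanobis distance mentioned above: for two patient vectors x and y and a positive semidefinite matrix M, the distance is sqrt((x - y)^T M (x - y)). The sketch below only evaluates this formula for a given M. How LSML, iMet, or Comdi learn or combine M is not shown, and the function name and example values are illustrative assumptions.

```python
import math


def mahalanobis(x, y, M):
    """Generalized Mahalanobis distance sqrt((x - y)^T M (x - y)).

    x, y: patient feature vectors of equal length.
    M:    square matrix (list of rows), assumed positive semidefinite;
          otherwise the quadratic form can be negative and sqrt fails.
    """
    diff = [a - b for a, b in zip(x, y)]
    # Matrix-vector product M @ diff.
    m_diff = [sum(row[j] * diff[j] for j in range(len(diff))) for row in M]
    # Quadratic form diff^T (M @ diff).
    return math.sqrt(sum(d * md for d, md in zip(diff, m_diff)))


identity = [[1.0, 0.0], [0.0, 1.0]]
weighted = [[4.0, 0.0], [0.0, 1.0]]

print(mahalanobis([0.0, 0.0], [3.0, 4.0], identity))  # 5.0 (plain Euclidean)
print(mahalanobis([1.0, 0.0], [0.0, 0.0], weighted))  # 2.0 (first feature counts more)
```

With the identity matrix the distance reduces to the Euclidean distance; a learned M reweights and correlates features so that patients judged similar end up closer together.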
Visual Analytics for Epilepsy Treatment Analysis
Anti-Epileptic Drug (AED) usage pattern of population (AED Line graph)
Anti-epileptic drug usage on individual patients (AEDviz)
Risk distribution of Epilepsy patient population (t-SNE plot)
Deep learning methods for predictive modeling in healthcare are starting to show promising performance, but two important challenges remain:
• Data insufficiency: Often in healthcare predictive modeling, the sample size is insufficient for deep learning methods to achieve satisfactory results.
• Interpretation: The representations learned by deep learning models should align with the ground truth medical knowledge.
To address these challenges, we propose GRAM, a GRaph-based Attention Model that combines electronic health records and a medical ontology. By dynamically combining the ancestors from the medical ontology via an attention mechanism, GRAM learns interpretable representations for medical concepts, leveraging both the medical concepts in the EHR and the hierarchical structure of the ontology.
We conduct predictive modeling experiments for disease progression and heart failure prediction using GRAM and other baselines. Compared to the basic recurrent neural network (RNN), GRAM achieves 10% improved accuracy for predicting less common diseases and 3% improved area under the curve for predicting heart failure with small training data. Unlike other baseline methods, the resulting concept representations of GRAM are clinically meaningful and well aligned with the structure of the ontology. Finally, GRAM can exhibit intuitive attention behaviors by adaptively generalizing to higher level concepts when facing data insufficiency at the lower level concepts.
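The idea of combining a concept with its ontology ancestors through attention can be sketched as follows. This is a simplified illustration under stated assumptions, not the GRAM implementation: the ontology is assumed to be a tree given as a child-to-parent mapping, the attention score is a plain dot product standing in for whatever learned scoring function the model uses, and all names and vectors are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ancestors_of(code, parent):
    """Return the code followed by its ancestors up to the root.

    parent: child -> parent mapping; assumed to be a tree (no cycles).
    """
    chain = [code]
    while code in parent:
        code = parent[code]
        chain.append(code)
    return chain


def attention_over_ancestors(code, parent, basic):
    """Final embedding as an attention-weighted sum over a code and its ancestors.

    basic: dict mapping every ontology node to a basic embedding vector.
    The dot-product score is an assumption made for this sketch.
    """
    chain = ancestors_of(code, parent)
    scores = [sum(a * b for a, b in zip(basic[code], basic[node])) for node in chain]
    weights = softmax(scores)
    dim = len(basic[code])
    final = [sum(w * basic[node][d] for w, node in zip(weights, chain)) for d in range(dim)]
    return dict(zip(chain, weights)), final
```

Because the weights sum to 1, they can be inspected directly: a concept whose weight sits mostly on a high-level ancestor is being represented by the more general category, which matches the behavior described above for concepts with little data.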