A computational phenotype is a set of clinically relevant and interesting characteristics that describe patients with a given condition. Various machine learning methods have been proposed to derive phenotypes in an automatic, high-throughput manner. Among these methods, computational phenotyping through tensor factorization has been shown to produce clinically interesting phenotypes. However, few of these methods incorporate auxiliary patient information into the phenotype derivation process. In this work, we introduce Phenotyping through Semi-Supervised Tensor Factorization (PSST), a method that leverages knowledge of the disease status of a subset of patients to generate computational phenotypes from tensors constructed from patients' electronic health records. We demonstrate the potential of PSST to uncover predictive and clinically interesting computational phenotypes through case studies on type 2 diabetes and resistant hypertension. PSST yields more discriminative phenotypes than unsupervised methods and more meaningful phenotypes than a supervised method.

Learning Objective 1: To show how to use semi-supervised constraints to encourage patients with different disease statuses to belong to different phenotypes.

Learning Objective 2: To show how incorporating partial auxiliary information, such as disease status known for only some patients, yields more discriminative phenotypes.


Jette Henderson (Presenter)
The University of Texas at Austin

Huan He, Emory University
Bradley Malin, Vanderbilt University
Joshua Denny, Vanderbilt University
Abel Kho, Northwestern University
Joydeep Ghosh, The University of Texas at Austin
Joyce Ho, Emory University
