De-identification of patient data has been proposed as a way to facilitate secondary uses of clinical data while protecting patient privacy. Automated approaches based on natural language processing (NLP) have been evaluated and can de-identify text much faster than manual approaches. This pilot study evaluates three versions of a new text de-identification application, using pattern matching, machine learning, and ensemble methods. A new annotated corpus and the 2014 i2b2 challenge corpus were used for the evaluation.
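To make the three families of methods more concrete, the sketch below shows one generic way they can fit together. It is not the application evaluated in this study. A pattern-matching detector flags protected health information (PHI) with regular expressions. A lexicon-based detector is a hypothetical stand-in for a trained machine learning model. A simple union ensemble merges overlapping spans from both detectors. All names (`pattern_detector`, `make_lexicon_detector`, `ensemble`, `redact`), the regular expressions, and the union strategy are illustrative assumptions.

```python
import re
from typing import Callable, List, Set, Tuple

# A detected PHI span: (start offset, end offset, PHI category)
Span = Tuple[int, int, str]
Detector = Callable[[str], List[Span]]

# Hypothetical, deliberately incomplete regular expressions for a few PHI categories
PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "MRN": re.compile(r"\bMRN:?\s*\d{6,10}\b"),
}

def pattern_detector(text: str) -> List[Span]:
    """Pattern matching: report every regex match as a PHI span."""
    return [(m.start(), m.end(), label)
            for label, rx in PATTERNS.items()
            for m in rx.finditer(text)]

def make_lexicon_detector(names: Set[str]) -> Detector:
    """Hypothetical stand-in for a trained model: flags capitalized words found in a name list."""
    def detect(text: str) -> List[Span]:
        return [(m.start(), m.end(), "NAME")
                for m in re.finditer(r"\b[A-Z][a-z]+\b", text)
                if m.group() in names]
    return detect

def ensemble(text: str, detectors: List[Detector]) -> List[Span]:
    """Union ensemble: keep spans from all detectors; overlapping spans are merged,
    keeping the label of the earliest-starting span."""
    spans = sorted(s for d in detectors for s in d(text))
    merged: List[Span] = []
    for start, end, label in spans:
        if merged and start < merged[-1][1]:
            prev_start, prev_end, prev_label = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end), prev_label)
        else:
            merged.append((start, end, label))
    return merged

def redact(text: str, spans: List[Span]) -> str:
    """Replace each span with a [CATEGORY] placeholder, right to left so offsets stay valid."""
    for start, end, label in sorted(spans, reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text
```

For example, with a made-up note:

```python
note = "Seen by Dr. Smith on 03/14/2019, MRN: 12345678, call 843-555-0100."
detectors = [pattern_detector, make_lexicon_detector({"Smith"})]
print(redact(note, ensemble(note, detectors)))
# Seen by Dr. [NAME] on [DATE], [MRN], call [PHONE].
```

A union ensemble favors recall, since a span missed by one detector can still be caught by another, which usually matters more than precision in de-identification. Voting or a learned combiner are common alternatives.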

Learning Objective 1: Discover and understand various methods for text de-identification and their characteristics.


Stephane Meystre (Presenter)
Medical University of South Carolina

Paul Heider, Medical University of South Carolina
Youngjun Kim, Medical University of South Carolina
Andrew Trice, Clinacuity, Inc.
Gary Underwood, Clinacuity, Inc.
