We propose a frame-based natural language processing (NLP) method that extracts cancer-related information from clinical narratives. We focus on three frames: cancer diagnosis, cancer therapeutic procedure, and tumor description. We use a deep learning-based approach, a bidirectional Long Short-Term Memory (LSTM) network with a Conditional Random Field (CRF) output layer, which uses both character and word embeddings. The system consists of two sequence classifiers: a frame identification (lexical unit) classifier and a frame element classifier. The system achieves F1 scores of 93.70 for cancer diagnosis, 96.33 for therapeutic procedure, and 87.18 for tumor description. These are improvements of 10.72, 0.85, and 8.04 points, respectively, over a heuristic baseline. Additionally, we show that combining GloVe and MIMIC-III embeddings yields the best performance among the embedding configurations tested. Overall, this study demonstrates the effectiveness of deep learning methods for extracting frame semantic information from clinical narratives.
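The frame element classifier above is a BiLSTM-CRF sequence tagger. The sketch below covers only the CRF decoding step and is not the authors' code. It assumes BIO-encoded labels and uses hand-set toy emission scores in place of real BiLSTM outputs. The `Site` label, the transition scores, and the function names `viterbi_decode` and `bio_to_spans` are hypothetical. Viterbi decoding selects the highest-scoring tag sequence, and the resulting BIO tags are then grouped into frame element spans.

```python
from typing import List, Optional, Sequence, Tuple


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   tags: Sequence[str]) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain CRF.

    emissions[t][j]: score of tag j at token t (in a BiLSTM-CRF, produced by the BiLSTM).
    transitions[i][j]: learned score for moving from tag i to tag j.
    """
    n_tags = len(tags)
    # score[j] = best total score of any path ending in tag j at the current token
    score = list(emissions[0])
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, bp = [], []
        for j in range(n_tags):
            # best previous tag i for reaching tag j at token t
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            bp.append(best_i)
        score = new_score
        backpointers.append(bp)
    # follow backpointers from the best final tag to recover the full path
    path = [max(range(n_tags), key=lambda j: score[j])]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    path.reverse()
    return [tags[j] for j in path]


def bio_to_spans(tokens: Sequence[str], labels: Sequence[str]) -> List[Tuple[str, str]]:
    """Group BIO labels into (element_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    cur_type: Optional[str] = None
    cur_tokens: List[str] = []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-") or (lab.startswith("I-") and lab[2:] != cur_type):
            if cur_type:
                spans.append((cur_type, " ".join(cur_tokens)))
            cur_type, cur_tokens = lab[2:], [tok]  # start a new span
        elif lab.startswith("I-"):
            cur_tokens.append(tok)  # continue the current span
        else:
            if cur_type:
                spans.append((cur_type, " ".join(cur_tokens)))
            cur_type, cur_tokens = None, []
    if cur_type:
        spans.append((cur_type, " ".join(cur_tokens)))
    return spans


# Toy example (hypothetical scores): "Site" is an illustrative frame element label.
tags = ["O", "B-Site", "I-Site"]
tokens = ["invasive", "ductal", "carcinoma", "of", "left", "breast"]
emissions = [[2.0, 0.0, 0.0]] * 4 + [[0.0, 2.0, 0.0], [0.0, 1.5, 1.0]]
transitions = [[0.0, 0.0, -10.0],   # from O: O -> I-Site is heavily penalized
               [0.0, 0.0, 1.0],     # from B-Site
               [0.0, 0.0, 1.0]]     # from I-Site
labels = viterbi_decode(emissions, transitions, tags)
spans = bio_to_spans(tokens, labels)
print(labels)  # ['O', 'O', 'O', 'O', 'B-Site', 'I-Site']
print(spans)   # [('Site', 'left breast')]
```

The last token's emission scores alone favor `B-Site`, but the B-Site to I-Site transition score makes `I-Site` the better choice for the whole sequence. This kind of label-consistency modeling is the usual motivation for placing a CRF on top of the BiLSTM.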

Learning Objective 1: After participating in this session, the learner should be better able to:

1. Learn about current trends in deep learning-based natural language processing (NLP) systems for extracting concepts and relations, and become familiar with the challenges and potential methods for improving the performance of NLP systems.
2. Learn the concept of frame semantics and how to apply it to the development of NLP systems.
3. Understand current progress in extracting cancer-related information from clinical narratives, and be able to suggest ways to broaden the scope of cancer-related knowledge.


Yuqi Si (Presenter)
The University of Texas Health Science Center at Houston

Kirk Roberts, The University of Texas Health Science Center at Houston
