Health question answering systems often depend on an initial question type classification step. Practitioners face several modeling choices for this component alone. We evaluate how different choices of embeddings and of the classifier's architectural hyperparameters affect performance. In the process, we improve on previous methods, reaching a new best 5-fold cross-validation accuracy of 85.3% on the GARD dataset. This work contributes an evaluation of sentence classification methods on the task of consumer health question type classification and a dataset of 2,882 medical questions annotated for question type.
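
For readers unfamiliar with how a 5-fold accuracy figure is typically obtained, the following is a minimal sketch, not the authors' evaluation code. It assumes unshuffled, unstratified contiguous folds and reports the mean of the per-fold accuracies; the abstract does not say which splitting or averaging scheme was used. The helper names (`k_fold_indices`, `majority_baseline`, `cross_validated_accuracy`) and the toy labels are hypothetical, and a majority-class baseline stands in for the actual sentence classifiers.

```python
from collections import Counter
from typing import Callable, List, Sequence

# A "fit" function takes training questions and labels and returns a predictor.
Predictor = Callable[[str], str]
Fit = Callable[[Sequence[str], Sequence[str]], Predictor]


def k_fold_indices(n: int, k: int = 5) -> List[List[int]]:
    """Split indices 0..n-1 into k contiguous folds whose sizes differ by at most one."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def majority_baseline(train_questions: Sequence[str], train_labels: Sequence[str]) -> Predictor:
    """Hypothetical stand-in classifier: always predict the most frequent training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return lambda question: most_common


def cross_validated_accuracy(questions: Sequence[str], labels: Sequence[str],
                             fit: Fit, k: int = 5) -> float:
    """Train on k-1 folds, test on the held-out fold, and average the k accuracies."""
    scores = []
    for held_out in k_fold_indices(len(questions), k):
        held = set(held_out)
        train_idx = [i for i in range(len(questions)) if i not in held]
        predict = fit([questions[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        correct = sum(predict(questions[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / k


# Toy data for illustration only; these are not GARD questions.
toy_questions = [f"question {i}" for i in range(10)]
toy_labels = ["treatment"] * 8 + ["cause"] * 2
toy_accuracy = cross_validated_accuracy(toy_questions, toy_labels, majority_baseline)
```

Any classifier can be swapped in by passing a different `fit` function with the same signature. In practice the data would usually be shuffled or stratified by question type before splitting, which this sketch omits for brevity.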

Learning Objective 1: Learn the challenges of applying question answering and dialog systems in a health setting, and gain insight into how certain modeling choices affect a system's ability to account for linguistic features.


William Kearns (Presenter)
University of Washington

Jason Thomas, University of Washington
