Medication doses, one of the determining factors in medication safety and effectiveness, are present in the literature, but only in free-text form. We set out to determine whether systems developed for extracting drug prescription information from clinical text would yield comparable results on the scientific literature, and whether sequence-to-sequence learning with neural networks could improve on the current state of the art. We built a collection of 694 PubMed Central documents annotated with drug dose information using the i2b2 schema. We found that fewer than half of the drug doses are present in the MEDLINE/PubMed abstracts; the full text is needed to identify the other half. We also identified differences in the scope and formatting of drug dose information between the literature and clinical text that require new dose extraction approaches. Finally, we achieved 83.9% recall, 87.2% precision, and an 85.5% F1 score in extracting complete drug prescription information from the literature.
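As a quick consistency check on the reported figures: the F1 score is the harmonic mean of precision and recall. This assumes the standard, equally weighted F1 measure, since the abstract does not state the formula. The minimal Python sketch below recomputes F1 from the reported precision and recall. The function name `f1_score` is illustrative only and is not taken from the authors' system.

```python
def f1_score(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    # Guard against division by zero when both inputs are zero.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures reported in the abstract, expressed as fractions.
precision = 0.872
recall = 0.839

f1 = f1_score(precision, recall)
print(f"F1 = {f1:.1%}")  # prints "F1 = 85.5%"
```

The recomputed value, 85.5%, matches the reported F1 score.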

Learning Objective 1: Understand the current issues in drug prescription information extraction from the literature.

Learning Objective 2: Learn about the challenges of, and possible solutions for, drug prescription information extraction from the literature.


Dina Demner-Fushman (Presenter)

James Mork, NLM
Willie Rogers, NLM
Sonya Shooshan, NLM
Laritza Rodriguez, NLM
Alan Aronson, NLM
