Common Data Elements (CDEs) are defined as "data elements that are common to multiple data sets across different studies" and provide structured, standardized definitions so that data may be collected and used across different datasets. CDE collections are traditionally developed prospectively by subject-matter and domain experts. However, there has been little systematic research and evidence to demonstrate how CDEs are used in real-world datasets and the subsequent impact on data discoverability. Our study builds upon previous mapping work to investigate the number of CDEs that could be identified using a varying level of commonness threshold in a real-world data repository, the Database of Phenotypes and Genotypes (dbGaP). In an analyzed collection of mapped variables from 426 dbGaP studies, only 1,414 PhenX variables are observed (out of all 24,938 defined PhenX variables). Results include CDEs that are identified with varying levels of commonness thresholds. After the semantic grouping of 68 PhenX variables collected in at least 15 studies (n=15), we observed 32 truly "common" common data elements. We discuss benefits of post-hoc mapping of study data to a CDE framework for purposes of findability and reuse, as well as the informatics challenges of pre-populating clinical research case report forms with data from Electronic Health Record that are typically coded in terminologies aimed at routine healthcare needs.

Learning Objective 1: Understand in detail what are research common data elements


Vojtech Huser (Presenter)

Liz Amos, NIH

Presentation Materials: