Genomic dataset is the world’s largest, most diverse of its kind, integrated with other robust data
The National Institutes of Health's All of Us Research Program has significantly expanded its data to now include nearly a quarter million whole genome sequences for broad research use. About 45% of the data was donated by people who self-identify with a racial or ethnic group that has been historically underrepresented in medical research. The data expansion provides registered researchers access to the world’s largest and most diverse dataset of its kind, paving the way to help advance health equity and uncover health care approaches better tailored to people's genes, lifestyles, and environments.
“When All of Us began national enrollment five years ago, we were excited about the promise of how we could advance health research,” said Josh Denny, M.D., M.S., chief executive officer of the All of Us Research Program. “Now, through a partnership with participants, researchers, and diverse communities across the country, we are seeing incredible progress towards powering scientific discoveries that can lead to a healthier future for all of us.”
In total, the expanded dataset provides researchers information from more than 413,450 participants. In addition to the whole genome sequences, this resource also includes data from surveys, electronic health records, physical measurements and Fitbit devices. Fitbit device data will now include information on sleep, in addition to activity, step count and heart rate. Sleep data, when used alongside participants’ electronic health record data, could be useful for studying how sleep patterns affect overall health and disease progression, including for conditions such as heart disease, high blood pressure, diabetes, depression and dementia.
The program has also released more than 1,000 detailed long-read sequenced genomes. Long-read sequences are a novel data type that has the potential to provide a more complete picture of the genome. Through the program’s secure, cloud-based platform for data analysis, the Researcher Workbench, registered researchers can use these data to study genetic variation and potentially identify genetic variants tied to certain conditions. Access to the Researcher Workbench is available to researchers from eligible organizations after they complete training and other requirements.
“For years, the lack of diversity in genomic datasets has limited our understanding of human health,” said Andrea Ramirez, M.D., M.S., chief data officer of the All of Us Research Program. “By engaging participants from diverse backgrounds and sharing a more complete picture of their lives – through genomic, lifestyle, clinical and social environmental data – All of Us enables researchers to begin to better pinpoint the drivers of disease.”
The program’s expansive dataset also allows researchers to better define important genetic risks for diseases. The whole genome sequence dataset alone includes variation at more than 1 billion locations, which is nearly one-third of the entire human genome. These small genetic variations could offer important clues to develop predictive markers for disease risk or explain differences in the effectiveness of certain drugs for different people.
The genomic data are also used to develop personalized health-related DNA results for participants. So far, more than 25,000 participants have requested to receive one or more of these reports detailing whether they have an increased risk for specific health conditions or how their body might process certain medications. These health-related DNA results are part of the program’s commitment to share information and return value to participants, while also making their data broadly available for research use.
More than 5,000 researchers have registered to use the All of Us Researcher Workbench. Registered researchers have access to an expanded set of tools for working with the available data types. Additionally, featured workspaces provide researchers with examples of ways to segment and analyze the data, which they can use to kickstart or enhance their work.
All of Us aims to ultimately make data available from at least 1 million research participants across the U.S. and to engage more researchers over time. For more information on anticipated future data releases, please see the newly released All of Us Data Roadmap.
Learn more about becoming involved in this historic effort to advance medical breakthroughs and precision medicine at JoinAllofUs.org. Go to ResearchAllofUs.org to register to use the All of Us dataset.
Available data types include:
- More than 413,350 survey responses,
- More than 337,500 physical measurements,
- More than 312,900 genotyping arrays,
- More than 287,000 electronic health records (EHRs),
- More than 245,350 whole genome sequences,
- More than 15,600 Fitbit records, and
- More than 1,000 long-read whole genome sequences.
This article appears in the April 2023 issue of All of Us Research Roundup.
All of Us is a registered service mark of the U.S. Department of Health & Human Services (HHS).