275 Million New Genetic Variants Identified in NIH Precision Medicine Data

February 19, 2024

All of Us Researcher Workbench includes nearly 250,000 whole-genome sequences for broad research use  

Study details the unprecedented scale, diversity, and power of the All of Us Research Program


Researchers have discovered more than 275 million previously unreported genetic variants, identified from data shared by nearly 250,000 participants of the National Institutes of Health’s All of Us Research Program. Half of the genomic data are from participants of non-European genetic ancestry. The unexplored cache of variants provides researchers new pathways to better understand the genetic influences on health and disease, especially in communities who have been left out of research in the past. The findings are detailed in Nature, alongside three other articles in Nature journals.

Nearly 4 million of the newly identified variants are in areas that may be tied to disease risk. The genomic data detailed in the study are available to registered researchers in the Researcher Workbench, the program’s platform for data analysis.

“As a physician, I’ve seen the impact the lack of diversity in genomic research has had in deepening health disparities and limiting care for patients,” said Josh Denny, M.D., M.S., chief executive officer of the All of Us Research Program and an author of the study. “The All of Us dataset has already led researchers to findings that expand what we know about health – many that may not have been possible without our participants' contributions of DNA and other health information. Their participation is setting a course for a future where scientific discovery is more inclusive, with broader benefits for all.”

To date, more than 90% of participants in large genomics studies have been of European genetic ancestry. NIH Institute and Center directors noted in an accompanying commentary article in Nature Medicine that this has led to a narrow understanding of the biology of diseases, and impeded the development of new treatments and prevention strategies for all populations. They emphasize that many researchers are now utilizing the All of Us dataset to advance precision medicine for all.

For example, in a companion study published in Communications Biology, a research team led by Baylor College of Medicine, Houston, reviewed the frequency of genes and variants recommended by the American College of Medical Genetics and Genomics across different genetic ancestry groups in the All of Us dataset. These genes and variants mirror those in the program’s Hereditary Disease Risk research results offered to participants. The authors found significant variability in the frequency of variants associated with disease risk between different genetic ancestry groups and compared with other large genomic datasets.

While more research is needed before these findings can be used to tailor genetic testing recommendations for specific populations, researchers believe the difference in the number of these variants may be influenced by past studies’ limited diversity and their disease-focused approach to participant enrollment, rather than a difference in the prevalence of the variants.

In a separate study, investigators with the eMERGE program tapped the All of Us dataset to calibrate and implement 10 polygenic risk scores for common diseases across diverse genetic ancestry groups. These scores calculate an individual’s risk of disease by taking into account genetic and family history factors. Polygenic risk scores that do not account for diversity could produce false results that misrepresent a person’s risk for disease and create inequitable genetic tools. Without the diversity of the All of Us data, these scores would have been applicable to only some of the population.

“All of Us values intentional community engagement to ensure that populations historically underrepresented in biomedical research can also benefit from future scientific discoveries,” said Karriem Watson, D.H.Sc., M.S., M.P.H., chief engagement officer of the All of Us Research Program. “This starts with building awareness and improving access to medical research so that everyone has the opportunity to participate.”

More than 750,000 people have enrolled in All of Us to date. Ultimately, the program plans to engage at least one million people who reflect the diversity of the United States and contribute data from DNA, electronic health records, wearable devices, surveys, and more over time. The program regularly expands and refreshes the dataset as more participants share information. 

To learn more about All of Us’ scientific resources, visit researchallofus.org.


February 23, 2024

Statement from Josh Denny, MD, MS, Chief Executive Officer of All of Us Research Program

Our focus in the All of Us Research Program is to advance health equity by engaging participants who reflect the diversity of the United States, including those historically underrepresented in medical research, to ensure scientific advances can be applied and distributed more equitably and reliably. We are grateful for the partnership of more than 760,000 participants nationwide who enable the progress we are making toward this goal together. We have more work to do and All of Us will continue to be on the leading edge.

A recent publication in Nature, authored by leaders within All of Us, has prompted conversation around how genetic ancestry and self-identified race and ethnicity should be analyzed and represented in large diverse research datasets. In this study, our team examined the program’s genomic dataset through several different lenses to convey its dimensionality and diversity, the degree of which has not previously been reported. Many excellent points have been raised, and this sort of dialogue is crucial to advance the entire scientific enterprise.

Race and ethnicity are social constructs yet are often conflated with genetic similarity. The attempt in the study to represent both genetic similarities and self-identified race and ethnicity in Figure 2 raised this concern. More work is needed to address the intersecting constructs that exist when examining self-identified race and ethnicity, genetic similarity and factors like social determinants of health. As a research program, we strive to disentangle and accurately represent these dimensions of diversity.

The feedback highlights how quickly this field of research is evolving, as well as its complexity. We appreciate being a part of these discussions, as it is essential that we hear from leaders across the many domains of biomedical research and health. We look forward to continuing to engage with communities, have these challenging conversations and advance the understanding of health and disease, for the benefit of all.