Researchers Invited to Give Feedback on Initial Dataset and Tools
In partnership with our participants—now nearly 350,000 and counting—we’re working to build one of the world’s largest and most diverse datasets to advance health research. Today, I’m happy to announce that we’ve opened our research platform, the All of Us Researcher Workbench, for beta testing. Now, researchers can begin using our initial dataset and tools in studies and tell us what’s working and what we can improve. This moment is an important step in our effort to accelerate new discoveries.
During this beta testing phase, researchers may notice that All of Us does some things, including data access, a little differently than other research programs. For starters, we aren’t waiting to share data until after participant recruitment and data collection have ended. We launched national enrollment just two years ago and continue to enroll new participants each week, on our way to our goal of one million. We’re also making regular updates to our protocol to add new data types over time, with data curation (a way of organizing data) ongoing throughout. This approach reflects our program’s iterative design. By sharing data early and often, we can get useful feedback to help make the All of Us resource more valuable as we go.
We’ve also adopted a “data passport” model to make the data broadly accessible. After researchers register with the program, agree to our rules, and complete our training on the responsible conduct of research, we will grant them permission to explore All of Us data for a wide range of studies, rather than deciding access on a project-by-project basis. Ultimately, we expect All of Us to support thousands of studies on many different aspects of health and disease, leading to more individualized treatments and prevention strategies in the future.
Features and Limitations of the Beta Platform
This early version of our Researcher Workbench includes data generously shared with us by nearly 225,000 of our first participants, 75% of whom are from communities that are historically underrepresented in research and more than 45% of whom are of diverse races and ethnicities. Researchers will find information from electronic health records (leveraging the OMOP Common Data Model); six initial surveys covering demographics, lifestyle factors, and overall health; and baseline physical measurements taken by program staff.
The platform uses a Jupyter Notebook environment to power in-depth analyses, with tools to help researchers set up collaborative workspaces and build customized cohorts. At this time, researchers (or their team members) will need experience with the R or Python programming language to conduct analyses on the platform. We do not yet support integrations with other statistical programs or software, but we are working to expand analysis tools for future iterations of the Researcher Workbench.
The platform will grow more robust over time with additional data and tools, including genomics, wearable device data, and linkages to other datasets. We’re planning regular releases of new data.
The current version of our platform has some key limitations. Because participants take part in the program at different paces and we are still enrolling, we don’t yet have every variable for every participant; in particular, survey completion rates vary, and the collection and harmonization of electronic health record data remain a work in progress. We have done some preliminary testing on the biological plausibility of the data; other curation efforts are still underway.
In addition, we’ve blurred some of the data to protect participant privacy. While we already strip out names and other identifiers from participant data at the outset, we’ve made additional adjustments in the curation process. These include shifting dates and hiding or grouping the records of small clusters of participants to further reduce the risks of reidentification. These modifications may pose challenges for epidemiological studies or research on specific subcategories of people.
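For readers who want a concrete picture of these two adjustments, here is a minimal Python sketch of date shifting and small-group handling. It is an illustration under stated assumptions, not the program's actual curation code: the function names, the shift window (`MAX_SHIFT_DAYS`), the group-size threshold (`MIN_CELL_SIZE`), and the hash-based offset are all hypothetical choices made for this example.

```python
import hashlib
from collections import Counter
from datetime import date, timedelta

MAX_SHIFT_DAYS = 365  # hypothetical upper bound on the shift
MIN_CELL_SIZE = 20    # hypothetical threshold for a "small cluster"


def shift_days(person_id: int, secret: str) -> int:
    """Derive a stable per-participant offset in the range 1..MAX_SHIFT_DAYS."""
    digest = hashlib.sha256(f"{secret}:{person_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % MAX_SHIFT_DAYS + 1


def shift_date(person_id: int, original: date, secret: str) -> date:
    """Move a date back in time by the participant's own offset."""
    return original - timedelta(days=shift_days(person_id, secret))


def group_small_cells(values, min_size=MIN_CELL_SIZE, label="Other"):
    """Replace any category with fewer than min_size records by one label.

    Note: the merged group can itself be small; a real pipeline would
    need to check for that and suppress it if necessary.
    """
    counts = Counter(values)
    return [v if counts[v] >= min_size else label for v in values]
```

Because every date for a given participant moves by the same offset, the intervals between that participant's events are preserved, while calendar dates are no longer comparable across participants. That is one reason studies that depend on exact timing, such as seasonal analyses, may be harder under this kind of scheme.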
As another privacy measure, we require researchers to analyze data within the secure cloud-based All of Us platform. Researchers may not download individual-level program data to local computers.
As with any beta testing period, there will be technical bugs, and we’ll work through them. We rely on researchers to help us identify usability issues, iterate with us to improve the data and tools, and plan future enhancements.
Currently, researchers with NIH eRA Commons accounts may apply for access if their institutions have signed a data use agreement with the program. Any U.S.-based academic, nonprofit, or health care organization can enter into that agreement.
Bioinformatics and health services researchers will likely find the most value in our initial dataset, particularly for studies that evaluate the frequency of certain diseases or conditions. Researchers with a focus on health disparities and underrepresented populations will also find the current dataset useful, given its size and diversity.
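As a rough illustration of a frequency study, the Python sketch below estimates the share of a cohort with at least one record of a given condition. The column names `person_id` and `condition_concept_id` follow the OMOP Common Data Model's `condition_occurrence` table; everything else is an assumption made for this example, including the `prevalence` helper, the use of plain dictionaries as rows, and the toy concept IDs. It does not show how the Workbench itself exposes data.

```python
def prevalence(person_ids, condition_rows, concept_id):
    """Share of the cohort with at least one record of concept_id."""
    cohort = set(person_ids)
    if not cohort:
        raise ValueError("cohort is empty")
    # Count each participant once, however many records they have.
    cases = {
        row["person_id"]
        for row in condition_rows
        if row["condition_concept_id"] == concept_id
        and row["person_id"] in cohort
    }
    return len(cases) / len(cohort)


# Toy data with made-up concept IDs (111 and 222).
toy_cohort = [1, 2, 3, 4]
toy_rows = [
    {"person_id": 1, "condition_concept_id": 111},
    {"person_id": 1, "condition_concept_id": 111},  # repeat visit
    {"person_id": 2, "condition_concept_id": 222},
    {"person_id": 3, "condition_concept_id": 111},
    {"person_id": 9, "condition_concept_id": 111},  # not in the cohort
]

print(prevalence(toy_cohort, toy_rows, 111))  # prints 0.5
```

In practice the choice of denominator matters: since not every participant has electronic health record data yet, a cohort restricted to participants with such records will usually give a more meaningful estimate than one based on everyone enrolled.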
After this initial stage of our beta phase, we will add other means of identity verification beyond eRA Commons and open the platform more broadly. Our program is committed to serving researchers of all kinds, and we’re already planning workshops and user studies to gather more input from additional research communities, including citizen and community scientists and researchers in the private sector.
We extend a warm welcome to researchers interested in exploring the Researcher Workbench. We want to design this platform with you, to make it the best resource it can be. Your comments and suggestions are central to that effort. Thank you in advance for being generous with feedback so we can improve.
If you’d like to learn more about the Researcher Workbench, please visit ResearchAllofUs.org. There you’ll get full details about our initial dataset and tools, along with more information about the data access process. You can also sign up for our Research Hub newsletter to receive regular emails on the Workbench and other news from the All of Us Data and Research Center.
Thanks for your support and stay tuned for more updates ahead!
Josh Denny, M.D., M.S.
Chief Executive Officer
All of Us Research Program