>

Biomedical data repositories require governance for artificial intelligence/machine learning applications at every step

Clayton, E. W., Rose, S., Nebecker, C., Novak, L., Bensoussan, Y. E., Chen, Y., Collins, B. X., Cordes, A., Evans, B. J., Ferryman, K. S., Hurst, S., Jiang, X., Lee, A. Y., McWeeney, S., Parker, J., Bélisle-Pipon, J.-C., Rosenthal, E. S., Yin, Z., Yracheta, J. M., & Malin, B. A. (2025). Biomedical data repositories require governance for artificial intelligence/machine learning applications at every stepJAMIA Open8(6), ooaf134. https://doi.org/10.1093/jamiaopen/ooaf134

This article examines the experience of the NIH’s Bridge2AI Program, which funded four large biomedical and behavioral datasets designed to be well documented and ready for use with artificial intelligence (AI) and machine learning (ML). The goal of these datasets is to encourage responsible and effective use of AI in research, but building them raised many ethical, legal, social, and practical challenges. The authors describe the key steps involved in creating and managing these AI-ready datasets, including deciding which data to collect and why, responding to public concerns, handling participant consent based on how the data were obtained, ensuring responsible future use, determining where and how data are stored, clarifying how much control participants have over data sharing, and setting rules for data access and downloading.

Across these steps, the projects faced important questions about long-term data storage, future uses of the data, and how to balance openness with privacy and participant protection. The authors highlight the different choices made by the four projects, such as how they gathered public input, selected data storage solutions, and defined criteria for who can access and download the data. Although the governance approaches varied, common themes emerged, suggesting shared best practices.

Overall, the article summarizes key lessons learned from the Bridge2AI Program about how to collect, manage, and govern large datasets intended for AI and ML. These insights can guide future initiatives in designing datasets that are not only technically useful for AI, but also ethically sound, socially responsible, and trustworthy.

Figure 1.

Steps in governance of data collection and decision-making and responsible use for the development of AI with greater attention to public concerns throughout. The first 2 steps—promoting responsible selection—address the primary work of the DGPs, while the remaining 4 steps—promoting responsible use—are crucial factors the DGPs must consider.

Explore Story Topics