NIH Data Management and Sharing Plans

All National Institutes of Health (NIH) research grant proposals, both new and competing renewals, need to include a Data Management and Sharing plan, have costs accounted for in the budget, and have progress shared with NIH in annual reports. This page describes the Data Management and Sharing policy and makes recommendations regarding how our faculty can implement it for their NIH projects. You can send questions to ric@vanderbilt.edu.

Policy and Guidance

The new policy and guidance were presented to schools, centers, and staff throughout the winter of 2023. To request a presentation for your department or research group, please contact ric@vanderbilt.edu.

Overview

This policy applies only to research grants. It does not apply to training grants, fellowships, infrastructure grants, instrument grants, or non-competitive renewals.

The policy dictates how data generated using support from these grants must be managed and shared. Scientific Data is defined as “recorded factual material commonly accepted in the scientific community as of sufficient quality to validate and replicate research findings, regardless of whether the data are used to support scholarly publications.” The NIH definition excludes “data not necessary (or of sufficient quality) to validate and replicate research findings,” laboratory notebooks, preliminary analyses, and physical objects.

All new and competing grant proposals must include a plan based on the NIH DMS form template.

The plan must cover six elements:

Data Type
Related tools, software, and/or code
Standards
Data Preservation, access, and associated timelines
Access, distribution, and reuse considerations
Oversight of Data Management & Sharing

PIs are expected to maximize sharing, which typically means depositing data and associated metadata in publicly accessible repositories. In all cases, the deposited data must be associated with a “persistent unique identifier,” usually a web-based DOI. The goal is for data to be deposited in a way that complies with the “F.A.I.R.” principles (findable, accessible, interoperable, and reusable).

Data must be shared at time of first publication or at the end of the project period, whichever comes first. Unpublished data (that meet the definition above) must be deposited and reported at end of project period even if they will end up in a published paper.
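
The timing rule above can be expressed as a small calculation. The following is a minimal Python sketch, not an NIH tool; the function name and the example dates are hypothetical. If the grant has a no-cost extension, pass the end date of the extension as the project end date.

```python
from datetime import date
from typing import Optional


def sharing_deadline(project_end: date, first_publication: Optional[date] = None) -> date:
    """Return the earlier of first publication and the end of the project period."""
    if first_publication is None:
        # Unpublished data are due at the end of the project period.
        return project_end
    return min(first_publication, project_end)


# Hypothetical dates for illustration only.
print(sharing_deadline(date(2027, 6, 30), date(2026, 3, 15)))
print(sharing_deadline(date(2027, 6, 30)))
```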

Best Practices for Sharing

Some grant programs, Institutes, Offices, or FOAs may indicate particular data repositories to be used – follow any special instructions.
Prioritize established discipline-specific or data-type-specific repositories (e.g., PDB, GenBank) to make it easy for people in your field to find your data. This may include data that are part of a publication or its supporting/supplementary materials, as well as online pre-prints (e.g., bioRxiv). NIH supports many Scientific Data Repositories, and you can also search the Registry of Research Data Repositories.
For small data sets generated by a graduate student, you may be able to include data in their Ph.D. dissertation (including appendix), which eventually will be assigned a DOI.
Otherwise, use “generalist” repositories.
Not all repositories are the same, so make sure you select one that has desirable repository characteristics (see FAQ).
If the repository charges a fee for storing your data, it will typically have a one-time data publishing cost that can be paid during the NIH project period and allow for sharing beyond the project period.

Selecting a Generalist Repository

We have identified 5 generalist repositories that meet the basic requirements of the NIH and are suitable to recommend: Harvard Dataverse, Figshare, Open Science Framework (OSF)*, Dryad, and Zenodo. The three in bold are recommended based on ease of use. OSF is distinct in that it is both easy to use and provides for planning throughout the data life cycle. The Vanderbilt Library provides additional information comparing repositories, and Zenodo published a comparison chart of generalist repositories.

Harvard Dataverse
Pros
- Free
- Vanderbilt Institutional login is available
- Each dataset can be associated with personal identifiers (persistent digital identifiers that you own and control, e.g., ORCID iD)
- Very detailed User Guide
- Great dashboard for tracking your datasets
- Retain complete control over your data and customize access to it. Metadata is always public, whatever level of restriction you set, so people can search for and find your data. When data are restricted, people can request access, and you can grant or deny it.
- Automatically generates a data citation to use in publications
Cons
- File size limit < 2.5 GB
- Total storage limit of 1 TB/user
Figshare
Pros
- Dedicated dashboard
- Can create projects and collections
- Has extensive tutorials and How-To articles for help using the service
- Integration with lab archives (electronic lab notebook)
- Has Figshare+ for larger datasets >20 GB (5 TB file size limit). Pricing is based on storage, with an "up to 100 GB" tier and then tiered increments of 250 GB ($585 per 250 GB), plus a review fee ($160) per deposit
Cons
- File size limit 20 GB
- Free dataset limit of 20 GB
- Additional storage must be purchased separately for each new dataset deposited.
Open Science Framework (OSF)
Pros
- Curation of data from the start with an all-in-one project management system
- Integration with preprint services
- Compatible with some reference management services (Mendeley, Zotero)
- Can purchase storage space from 100 GB ($500) up to 1 TB ($4,000), with additional storage tiers available upon request.
- Costs are associated with the user account (unlike for Figshare+ which charges by dataset)
- Can add-on storage applications to avoid storage caps (Dropbox, Figshare, Google Drive, OneDrive, etc.).
- There are also many add-ons for code (Bitbucket, GitHub, GitLab)
- Links with ORCID
Cons
- File size limit < 5 GB
- Storage limits of 5 GB for private data and 50 GB for public data, so add-on storage applications are necessary for larger datasets
Dryad
Pros
- Quality control and assistance (internal curators)
- Linked to ORCID (must log in with these credentials)
- Requires the addition of a “README.md” file for each dataset
Cons
- The interface isn’t intuitive
- Not free (fee covers curation and preservation)
- Data publishing charge of $120 per dataset under 50 GB, plus $50 per additional 10 GB
Zenodo
Pros
- Free data-hosting initiative associated with the European Organization for Nuclear Research (CERN)
- No limit on the number of 50GB datasets you upload
- Lots of communities to choose from to upload your data into. Some are OA journals and discipline-specific curated communities as well.
- Will notify funding agencies if you enter the grant number
Cons
- No obvious dashboard for your data
- File size limit 50 GB, can upload multiple files.
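
The per-file limits and the Dryad charge listed above can be turned into a quick screening check. The following is a minimal Python sketch, not an official tool. The function names are hypothetical, the figures are copied from this page and may change, limits are treated as strict upper bounds, and a partial 10 GB increment is assumed to be billed as a full increment.

```python
import math
from typing import List

# Per-file limits in GB as listed on this page (Dryad's per-file limit is not stated here).
FILE_LIMIT_GB = {
    "Harvard Dataverse": 2.5,
    "Figshare": 20,
    "OSF": 5,
    "Zenodo": 50,
}


def repositories_accepting(largest_file_gb: float) -> List[str]:
    """List repositories whose free per-file limit exceeds the largest file."""
    return [name for name, limit in FILE_LIMIT_GB.items() if largest_file_gb < limit]


def dryad_fee_usd(dataset_gb: float) -> int:
    """Estimate Dryad's charge: $120 base, plus $50 per 10 GB (or part) beyond 50 GB."""
    extra_gb = max(0.0, dataset_gb - 50)
    return 120 + 50 * math.ceil(extra_gb / 10)


print(repositories_accepting(10))  # repositories that can take a 10 GB file
print(dryad_fee_usd(75))           # estimated charge for a 75 GB dataset
```

Check the repository's current limits and pricing before budgeting, since they change over time.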

When depositing data into a generalist repository, you will be required to include metadata and, typically, an associated README.txt file to explain the data in the directory. You will also want to cite your data in publications and as a research product in your CV or biosketch. See the FAQs for recommendations on developing each of those items.

FAQs

How does this compare to other federal data management policies?

NIH previously required data management and sharing for projects with budgets >$500,000/year and for genomic research. Additionally, the National Science Foundation (NSF) requires Data Management Plans at time of proposal, although with different policy details.
This new policy applies to all NIH research grants and requires more detail than previous policies.
Many institutes, centers, and research programs have instituted additional, specific data sharing policies. If your research was previously subject to NIH Genomic Data Sharing Policy (GDS Policy), please see the Implementation Changes for Genomic Data Sharing Plans Included with Applications Due on or after January 25, 2023.
If you are unsure of which NIH policies apply to your research, you can use the NIH decision tool.
What data are included in the policy and what is excluded?
Included
- All Scientific Data must be managed (see NIH definition of “data”). Researchers decide what constitutes data, how to maximize sharing, and how to justify what is and isn’t shared.
- Data that led to null findings
- Data sets of all sizes
- Data generated with SBIR support (a 20-year delay is allowable)
- Data for which there is no known repository
- Qualitative Data, unless there are justifiable limitations to sharing (For example: field reports and ethnographic writings that contextualize and interpret rich participant-observation data.)
- Data that requires a Data Use Agreement for sharing (in other words, data still has to be shared, but with appropriate restrictions on public access).
Excluded
- Non-research grants: training, fellowship, conferences, and infrastructure,
- Data not necessary for or of sufficient quality to validate and replicate research findings,
- Laboratory notebooks,
- Preliminary analyses,
- Completed case report forms,
- Drafts of scientific papers,
- Plans for future research,
- Peer reviews,
- Communications with colleagues, or
- Physical objects (e.g., laboratory specimens)
How do you budget for data management and sharing?
Allowable Costs:
- Costs specific to the project are allowable
  - Curating data, de-identifying data, developing supporting documentation, metadata, and formatting for repository deposition
  - Preserving/sharing data through repositories (data deposition fees)
  - Local data management considerations
- Costs must be incurred during the performance period (you can only spend grant funds while the grant is active or in the no-cost extension period).
- Do not include general infrastructure costs not associated with the specific project or costs associated with gaining access to research data.
How Much to Budget:
- Jessica Logan, Vanderbilt Associate Professor of Special Education, has published on data sharing and recommends budgeting 5% to 10% of a research grant for preparing the collected data to be shared.
- The Realities of Academic Data Sharing (RADS) Initiative Report on the Expenses of Making Data Publicly Accessible finds:
  - the average percentage of the overall grant award used by researchers for Data Management and Sharing is 6%
  - the average cost directly incurred by researchers per funded research project for Data Management and Sharing is $29,800. By funding agency, it is:
    $36,000—US National Institutes of Health
    $19,000—US National Science Foundation
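
The 5% to 10% recommendation above translates into a simple range calculation. The following is a minimal Python sketch under that assumption only; the function name and the $500,000 award are hypothetical, and actual costs should be itemized as described under Allowable Costs.

```python
from typing import Tuple


def dms_budget_range(total_award_usd: float, low: float = 0.05, high: float = 0.10) -> Tuple[float, float]:
    """Return the low and high DMS budget estimates for a given total award."""
    return total_award_usd * low, total_award_usd * high


# Hypothetical award amount for illustration only.
low_usd, high_usd = dms_budget_range(500_000)
print(f"${low_usd:,.0f} to ${high_usd:,.0f}")
```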
How to Submit:
- DMS costs must be requested in the appropriate cost category, e.g., personnel, equipment, supplies, and other expenses, following the instructions for the R&R Budget Form or PHS 398 Modular Budget Form, as applicable.
- In the R&R budget form, estimated costs must be outlined in the budget justification attachment in a section clearly labeled "Data Management and Sharing Justification". Provide a summary of the type and amount of scientific data to be preserved and shared and the name of the established repository(ies) where they will be preserved and shared. Indicate general cost categories such as curating data and developing supporting documentation, local data management considerations, preserving and sharing data through established repositories, etc., including an amount for each category and a brief explanation. The recommended length of the justification should be no more than half a page.
- In the modular budget form, use the Additional Narrative Justification attachment and include a section clearly labeled "Data Management and Sharing Justification" followed by the requested dollar amount.
For more information on allowable costs and how to submit line items and budget justifications, see the NIH page on Budgeting for Data Management & Sharing and the NIH FAQs.
Are there some data that need to be managed but don't need to be shared?
Yes, some data that must be managed do not need to be shared:
- Data already public
- Data for which there are justifiable ethical, legal, and technical reasons for limiting and/or delaying sharing. These include, but are not limited to:
  - Informed consent will not permit or limits scope of sharing or use
  - Privacy or safety of research participants would be compromised and available protections are insufficient (see the Supplemental Information on Protecting Privacy When Sharing Human Research Participant Data)
  - Explicit federal, state, local, or Tribal law, regulation, or policy prohibits disclosure
  - Restrictions imposed by existing or anticipated agreements with other parties
  - Datasets cannot practically be digitized with reasonable efforts
How do you handle human subjects data?
The DMS Policy encourages respect for participants by encouraging researchers and award recipients to:
- Address data management and sharing plans during the informed consent process to ensure prospective participants understand how their data will be managed and shared. The Vanderbilt Informed Consent Template includes language about how deidentified information may be shared for research purposes.
- Outline steps they will take for protecting the privacy, rights, and confidentiality of prospective participants (i.e., through de-identification, Certificates of Confidentiality, and other protective measures);
- Assess limitations on subsequent use of data and communicate these limitations to the individuals or entities (e.g., repositories) preserving and sharing the data; and
- Consider whether access to shared scientific data derived from humans should be controlled, even if de-identified and lacking explicit limitations on subsequent use. Sharing via controlled access may be specified by certain funding opportunity announcements (FOAs) or the funding NIH ICO(s).
- Some data may not be able to be shared due to justifiable ethical, legal, and technical reasons.
For more information, see
How do you handle VUMC BioVU/Synthetic Derivative data?

VUMC provides a guide for sharing data that are generated from BioVU, including sample language for each section of the Data Management and Sharing Plan Template. If you have any questions, please email biovu@vumc.org.
VUMC also includes guidance in StarBRITE. Under the Data Management menu, select “NIH Data Management and Sharing (DMS) Policy”. The right side menu has a NIH DMS Policy-FAQ, which will have the most up-to-date recommendations regarding sharing Synthetic Derivative data.
How do renewals, no-cost extensions, pre-prints, and patents factor into sharing deadlines?

Data must be reported irrespective of whether a competitive renewal application for the relevant NIH grant is being prepared or has been approved. However, according to NIH FAQs, "When a competitive renewal is successful, it may be appropriate to extend the sharing of scientific data into the new competitive renewal period (for example, for longstanding research projects with an established data deposition schedule that has been approved by the funding ICO)."
If the grant cycle includes a period of no-cost extension (NCE), the deadline for posting unpublished data is the end date of the NCE.
Pre-Prints are not considered “papers” under NIH DMS policy. They may, however, be used as data repositories.
When evaluating an invention for patent protection or filing a patent application, a delay of 60 days beyond DMS Policy data sharing timelines is generally viewed as reasonable.
How long do data need to be shared?

Data needs to be stored and made available for the full duration of the grant (including possible future renewals) plus 3 years.
What are desirable repository characteristics?
- Unique Persistent Identifiers
- Long-Term Sustainability
- Metadata
- Curation and Quality Assurance
- Includes user Dashboard
- Free and Easy Access
- Broad and Measured Reuse
- Clear Use Guidance
- Security and Integrity
- Confidential
- Common Format
- Provenance
- Retention Policy
How do you submit a plan?
- NIH provides a suggested format. The Data Management and Sharing Plan format page will be added to list of Format Pages and incorporated into FORMS-H application instructions. Open Science Framework has a checklist to help researchers complete the NIH Form.
- Plans are recommended to be 2 pages or less and may not have hyperlinks.
- A new “Other Plan(s)” field will be added to the PHS 398 form to collect a single PDF attachment.
- Data Sharing Plans and Genomic Data Sharing Plans will no longer be submitted to the “Resource Sharing Plan(s)” field.
How do you address NIH Form Element 3 and what are common data standards?
Element 3 says, “State what common data standards will be applied to the scientific data and associated metadata to enable interoperability of datasets and resources, and provide the name(s) of the data standards that will be applied and describe how these data standards will be applied to the scientific data generated by the research proposed in this project. If applicable, indicate that no consensus standards exist.”
Data standards help to support the exchange of accurate information and are developed to ensure that data is collected similarly to guarantee the interoperable aspect of the FAIR principles. Standards may be applied in four broad areas:
- Standard Metadata schemas for describing datasets
  - Example: Dublin Core – Dublin Core Metadata Initiative, https://www.dublincore.org/specifications/dublin-core/dcmi-terms/
- Standard Terminologies, Controlled Vocabulary, and Ontologies
  - Example: NIH Health Data Standards, https://www.nlm.nih.gov/healthit/index.html
- Common Data Elements
  - Example: 0 - 10 Numeric Pain Rating Scale. NIH hosts a Common Data Elements repository, https://cde.nlm.nih.gov/home
- Content / Encoding Standards, including for storing and transmitting data
  - Example: DICOM – Digital Imaging and Communications in Medicine, https://www.dicomstandard.org/concepts
Open Science Framework explains each of these data terms.
When there is no standard, indicate that no consensus standards exist. Then a detailed data dictionary describing the data fields and format of the data should be provided (typically in a README.txt file).
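
A data dictionary of this kind can be generated from a short list of field descriptions. The following is a minimal Python sketch, not a required format; the `Field` class, the function name, the column layout, and the two example fields are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Field:
    name: str
    dtype: str
    units: str
    description: str


def data_dictionary_text(fields: List[Field]) -> str:
    """Render field descriptions as a plain-text table suitable for a README.txt."""
    header = f"{'Field':<16}{'Type':<10}{'Units':<10}Description"
    lines = ["DATA DICTIONARY", header, "-" * len(header)]
    for f in fields:
        lines.append(f"{f.name:<16}{f.dtype:<10}{f.units:<10}{f.description}")
    return "\n".join(lines)


# Hypothetical fields for illustration only.
fields = [
    Field("participant_id", "string", "n/a", "De-identified participant code"),
    Field("pain_score", "integer", "0-10", "Numeric Pain Rating Scale at visit"),
]
print(data_dictionary_text(fields))
```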
How do you address NIH Form Element 6 about Oversight?

Element 6 says, “Describe how compliance with this Plan will be monitored and managed, frequency of oversight, and by whom at your institution (e.g., titles, roles).”
We recommend that the “compliance manager/overseer” should be either the PI or another permanent senior member of the lab (for example, the Lab Manager or a Ph.D. level staff member). If this individual is not the PI of the project, consider whether or not their biosketch might need to be included as key personnel and/or whether to describe their role within the application, even if there is no effort allocated to that individual in the budget. Remember that you can allocate budget to pay for the effort for whoever will curate and manage the data.
SAMPLE LANGUAGE
[Name of Grant PI or senior member of lab (give title)] will be responsible for verifying management, storage, retention, and dissemination of project data. [NAME] will review data management and sharing activity annually and compare it to this plan. If discrepancies are noted, [NAME] will adjust study procedures or submit a revised Data Management and Sharing Plan to NIH.
*If you are using human subjects data, then some oversight is provided by the Human Research Protections Program. You may include an additional statement such as "Data management is a part of the IRB approved protocol. Under the Human Research Protections Program, the study is subject to post-approval monitoring and deviations from the plan would be reportable to the IRB."
**Note that in cases where there are multiple investigators included in the project (and possibly a subcontract), this statement will have to be altered and extended to explain who will be responsible for managing the data generated by each participating lab and who will be responsible for checking on compliance.
Where can you learn good data management practices?

The DMT Clearing House provides links to many data management training options. DataONE has skill-building and training options.
Should you be using an Electronic Lab Notebook?
Lab PIs will need to establish rigorous scientific record keeping practices in their laboratories to satisfy Data Management & Sharing requirements. PIs may want to consider switching to the use of electronic lab notebooks (ELN) in their lab to facilitate the organizing of data they aren’t publishing.
Here are a few resources for selecting an electronic lab notebook:
- Best ELN Software in 2023: Key Features of the Top Electronic Lab Notebooks
- ELN-Scorecard.pdf (labfolder.com)
How are plans reviewed?

NIH program staff will review the Plan and determine whether it is acceptable or unacceptable. Unacceptable plans are returned to the PI during the Just-in-Time period for revision and resubmission by the PI and re-review by the NIH PO. Funding of grants may be delayed if an acceptable Data Management & Sharing plan is not submitted and approved in a timely manner. Peer reviewers might make recommendations concerning the appropriateness of plans, but plans are not factored into scoring.
How is compliance monitored?

NIH reviews the progress of plans as presented by PI in annual progress reports (RPPR). NIH is revising the RPPR templates to include Data Management & Sharing reporting. NIH Program Managers are still learning how to implement the policy. In addition, institutions/labs are expected to monitor data management and include details in section 6 of their Plans.
How do you change an approved plan?

To change an approved Data Management and Sharing Plan, you must submit a timely formal prior approval request to the funding NIH Institute, Center, or Office (ICO) through the Prior Approval Module in eRA Commons.
What metadata should you include in a generalist repository?
Minimum Metadata Required
- Title: a succinct summary of both the data and study or focus (usually 8-10 words that adequately describe the content of the dataset)
- Author(s): Name, email, institutional affiliation of the main researcher
- Abstract: Brief summary of the structure and concepts of the dataset (should focus on the information relevant to the data itself)
- Research domain: primary research domain(s), which may be drawn from the OECD Fields of Science and Technology classification
- Journal Name (if associated with a manuscript)
Recommended Metadata
- Funding information: funder, grant number
- Keyword(s): minimum of 5 descriptive words to help with data discovery (more is better)
- Methods: special chemicals or specific antibodies/reagents necessary to replicate dataset
- Usage Notes: programs and/or software required to open the files
- Related Works: resources associated with the data (publications, related datasets, etc.)
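
The minimum and recommended elements above can be checked before deposit. The following is a minimal Python sketch, not any repository's schema; the class, field names, function name, and example values are hypothetical, and journal name is omitted because it applies only when a manuscript exists.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetMetadata:
    title: str = ""
    authors: List[str] = field(default_factory=list)
    abstract: str = ""
    research_domain: str = ""
    keywords: List[str] = field(default_factory=list)
    funder: str = ""
    grant_number: str = ""


def metadata_problems(meta: DatasetMetadata) -> List[str]:
    """Return a list of missing minimum fields and unmet recommendations."""
    problems = []
    for name in ("title", "authors", "abstract", "research_domain"):
        if not getattr(meta, name):
            problems.append(f"missing required field: {name}")
    if len(meta.keywords) < 5:
        problems.append("recommended: at least 5 keywords")
    if not (meta.funder and meta.grant_number):
        problems.append("recommended: funder and grant number")
    return problems


# Hypothetical record for illustration only.
draft = DatasetMetadata(
    title="Example imaging dataset from a pilot study",
    authors=["Example Researcher"],
    abstract="Brief summary of the dataset structure.",
    research_domain="Medical and health sciences",
    keywords=["imaging", "pilot", "example"],
)
for problem in metadata_problems(draft):
    print(problem)
```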
What should you put into a README.txt file?

README.txt files should be written in English, have no password restriction, and have no personal or sensitive data. Here is an example README file, courtesy of Cornell University. Some repositories refer you to their own example README files.
How do you cite data?
Most repositories have an automatic citation generator on each page. These are the minimum elements required for dataset identification and retrieval. Different elements may be requested by author guidelines or style manuals. Be sure to include as many elements as needed to precisely identify the dataset you have used.
- Author: Name(s) of each individual or organizational entity responsible for the creation of the dataset.
- Date of Publication: Year the dataset was published or disseminated.
- Title: Complete title of the dataset, including the edition or version number, if applicable.
- Publisher and/or Distributor: Organizational entity that makes the dataset available by archiving, producing, publishing, and/or distributing the dataset.
- Electronic Location or Identifier: Web address or unique, persistent, global identifier used to locate the dataset (such as a DOI). Append the date retrieved if the title and locator are not specific to the exact instance of the data you used.
Here are a few examples of citation formats.
APA (6th edition)
- Smith, T.W., Marsden, P.V., & Hout, M. (2011). General social survey, 1972-2010 cumulative file (ICPSR31521-v1) [data file and codebook]. Chicago, IL: National Opinion Research Center [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor]. doi: 10.3886/ICPSR31521.v1
MLA (7th edition)
- Smith, Tom W., Peter V. Marsden, and Michael Hout. General Social Survey, 1972-2010 Cumulative File. ICPSR31521-v1. Chicago, IL: National Opinion Research Center [producer]. Ann Arbor, MI: Inter-university Consortium for Political and Social Research [distributor], 2011. Web. 23 Jan 2012. doi:10.3886/ICPSR31521.v1
Chicago (16th edition) (author-date)
- Smith, Tom W., Peter V. Marsden, and Michael Hout. 2011. General Social Survey, 1972-2010 Cumulative File. ICPSR31521-v1. Chicago, IL: National Opinion Research Center. Distributed by Ann Arbor, MI: Inter-university Consortium for Political and Social Research. doi:10.3886/ICPSR31521.v1
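
When a repository does not generate a citation, one can be assembled from the minimum elements listed above. The following is a minimal Python sketch; the function name and the layout are hypothetical and deliberately simplified, so the result does not match APA, MLA, or Chicago exactly and should be adjusted to the style your publisher requires.

```python
from typing import Optional


def cite_dataset(authors: str, year: int, title: str, publisher: str,
                 identifier: str, version: Optional[str] = None) -> str:
    """Join the minimum citation elements into one line (simplified layout)."""
    title_part = f"{title} ({version})" if version else title
    return f"{authors} ({year}). {title_part} [Data set]. {publisher}. {identifier}"


# Elements taken from the example dataset cited above.
print(cite_dataset(
    authors="Smith, T.W., Marsden, P.V., & Hout, M.",
    year=2011,
    title="General social survey, 1972-2010 cumulative file",
    publisher="Inter-university Consortium for Political and Social Research",
    identifier="doi:10.3886/ICPSR31521.v1",
    version="ICPSR31521-v1",
))
```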
What resources does NIH provide?
- Website: NIH Scientific Data Management & Sharing
- NIH Data Management & Sharing Policy FAQs
- Format Page
- NIH-supported repositories
- Special guidance regarding the handling of Human Subjects and Genomics data
- Email Box: Sharing@nih.gov
- NIH Grant Program Managers
What resources does VU provide?
- Library – data curation and repositories
- Creative Data Solutions (CDS) – a Vanderbilt Shared Resource Core that can assist with data annotation, structured and unstructured data deposition, open source code, FAIR, and more.
- Advanced Computing Center for Research and Education (ACCRE) – local storage for data and consulting on research data needs
- SPA – Data Use Agreements
- Research Integrity & Compliance – interpreting the policy and best practices
- Human Research Protections Program – addressing data sharing with informed consent
In addition, Vanderbilt Associate Professor of Special Education Jessica Logan and Florida State University Professor Sara Hart feature NIH Data Management and Sharing in episode S4E2 of their podcast Within & Between, a Developmental Science Podcast.
Are there any example plans or recommended templates?
- NIH posted several plans.
- The Federal Demonstration Partnership (FDP) is piloting two templates and evaluating their effectiveness and usability.
- DMPTool uses a series of questions and guidance to create an exportable data management plan that meets your sponsor's requirements. When you log in and select Vanderbilt as your institution, the tool will use your VUnetID to create an account for you. The FDP Alpha and Bravo templates are available in DMPTool.
Please contact us if you are awarded an NIH grant under this new policy and are willing to add your plan to a set of examples for use by Vanderbilt investigators.

Special thanks to the Research Integrity & Compliance Subcommittee on NIH Data Management and Sharing Plans; Chuck Sanders, Vice Dean, School of Medicine Basic Sciences; Selene Colon, Assistant Dean of Research Logistics and Compliance, School of Medicine Basic Sciences; Jon Shaw, University Librarian; and Steven Baskauf, Data Science and Data Curation Specialist, Jean and Alexander Heard Libraries.

Last updated 7/1/2024
