Sharing De-identified Data in Repositories


As part of a push to improve research reproducibility, an increasing number of health science funders and publishers are asking researchers to share the de-identified data underlying their research. This includes the new NIH Data Management and Sharing Policy, which expects researchers to "maximize the appropriate sharing of scientific data." This page explains how to plan for data sharing so that you can meet these requirements while also following UCSF guidance on privacy and data security.

Note that sharing P4 data types, including identified PHI/PII/RHI and limited data sets, does not fall under this process. Researchers interested in sharing this kind of data will need to follow separate UCSF guidance.

If you have questions about sharing models or algorithms, please reach out to Industry Contracts at [email protected].

Project Planning + Grant Writing Stage

  1. Check your sponsor agreements for any guidance on, or restrictions to, data sharing.
  2. Research potential data repositories, paying attention to recommended data formats, access restrictions, costs, and submission timelines.
    1. UCSF recommends that de-identified human subjects data be shared in controlled access repositories.
    2. The UCSF Library can help you select a relevant data repository.  
  3. Include information about data sharing in your IRB paperwork and consent forms.
    1. UCSF consent form templates include appropriate sample language.
  4. Include data sharing costs in your grants, such as data curation work and de-identification charges, which can be substantial.
  5. If you are funded by NIH, write your Data Management and Sharing (DMS) plan.
    1. See sample NIH DMS template language.

Data Submission Stage

  1. Prepare your research data and documentation for sharing.
    1. Prepare a de-identified version of your dataset following appropriate de-identification methods.
      1. CTSI’s data de-identification service can provide advice on de-identification and connect you with third party de-identification validation services.
      2. The UCSF Privacy Office is another resource for questions about HIPAA and data de-identification.
    2. Gather all necessary data documentation and format your data to meet the standards of your repository, using open file formats whenever possible.
  2. Get institutional approvals to share your de-identified human subjects data (not needed for non-human data). Note that this step can take 1-2 weeks.
    1. If you are sharing in an approved controlled access repository (recommended), the repository may ask you to sign a Data Sharing Agreement. If it does, submit a Material and Data Transfer Agreement Form to UCSF Industry Contracts so they can work with you to evaluate and sign the agreement.
    2. If you wish to share your data in an open repository, or in a controlled access repository that is not on the approved list, email [email protected] with a short description of your data and the name of your chosen repository. The EIA Steering Committee will review and approve your request.
  3. Upload your de-identified data to your selected data repository and get your DOI or accession number to include in your grant reports and CV.


Contact [email protected] or [email protected] for help and guidance.