How to Get De-identified Clinical Data for Cohort Studies, Pattern Recognition, and More

Why get de-identified data? Examples:

  • Develop a predictive model to identify patients at high risk for hypoglycemia, with covariates and outcome data. 
  • Quickly search clinical notes to check whether specific terms appear in the free text, even when they are not captured in the structured data.
  • Pattern recognition and machine learning to understand why patients with seemingly identical risk factors demonstrate such a wide variety of cardiac disease manifestations and outcomes.
  • Explore the data yourself when you don't need patient identifiers and/or have very limited funding.

Your options:

TIP! Once you are approved for access to UCSF's de-identified clinical data, you automatically gain access to the tools and applications. Start by requesting data access for research.


  • De-identified Data Tools
    • These include both point-and-click tools (PatientExploreR and EMERSE) and programmable options such as the De-identified Clinical Data Warehouse (CDW) and access to the de-identified clinical notes via the Solr API.
  • Information Commons
    • Clinical data at scale in a high-performance computing (HPC) environment
    • An environment suited to pattern recognition and machine learning (Apache Spark)
    • Currently provides de-identified structured data; de-identified images and de-identified clinical notes are coming soon.
  • Data Extraction Consultation
    • Get guidance from a UCSF expert who will help you define the data you need and clarify the data that is available.
    • After the initial free hour, recharge fees will apply.
    • UC Health data may be requested, but processing time can be lengthy.
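
For the programmable route through the Solr API listed under De-identified Data Tools, a notes search might be assembled along the lines below. This is a minimal sketch, not UCSF's actual interface: the host, the core name `notes`, and the field name `note_text` are hypothetical placeholders. Only the `/select` handler and the `q`, `rows`, and `wt` parameters are standard Solr.

```python
from urllib.parse import urlencode


def build_notes_query(base_url, core, term, rows=10):
    """Build a Solr /select URL that searches a notes field for an exact phrase."""
    # Escape backslashes and double quotes so the term stays a single quoted phrase.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "q": f'note_text:"{escaped}"',  # note_text is a hypothetical field name
        "rows": rows,                   # maximum number of documents to return
        "wt": "json",                   # ask Solr for a JSON response
    }
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Hypothetical host and core; substitute the values provided with your access.
url = build_notes_query("https://solr.example.org/solr", "notes", "hypoglycemia")
print(url)
```

Building the URL separately from sending the request makes it easy to review exactly which terms will be searched before anything is run against the notes index.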

Compliance requirements

IRB approval not required

Online attestation AND training required

Data De-identification Validation required if sharing with external partners

Working with clinical data? Preparation is key.

Be ready with adequate computing capabilities and tools for:

Use the APeX Pick List or the ZSFG Pick List (large Excel files, available via UCSF Box) to identify variables for your research and to define your cohort.

  • Diagnoses
  • Meds
  • Labs
  • Procedures
  • Flowsheet
  • Departments
  • Smart Data Elements
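
Because the pick lists are large, it can help to filter them programmatically instead of scrolling. The sketch below assumes a pick list sheet has been exported to CSV and searches one column for a keyword. The column names `Category` and `Name` and the sample rows are invented for illustration and will not match the real files.

```python
import csv
import io


def find_variables(rows, keyword, column="Name"):
    """Return rows whose `column` contains `keyword`, ignoring case."""
    keyword = keyword.lower()
    return [row for row in rows if keyword in (row.get(column) or "").lower()]


# Made-up sample standing in for a pick list sheet exported to CSV.
sample = io.StringIO(
    "Category,Name\n"
    "Labs,GLUCOSE\n"
    "Labs,HEMOGLOBIN A1C\n"
    "Meds,INSULIN GLARGINE\n"
)
rows = list(csv.DictReader(sample))

matches = find_variables(rows, "glucose")
for row in matches:
    print(row["Category"], row["Name"])
```

With a real export, replace the `io.StringIO` sample with `open("your_export.csv", newline="")` and adjust `column` to the header used in that sheet.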