How to Get De-identified Clinical Data for Cohort Studies, Pattern Recognition, and More

Why get de-identified data? Examples:

  • Develop a predictive model, using covariates and outcome data, to identify patients at high risk for hypoglycemia.
  • Apply pattern recognition and machine learning to understand why patients with seemingly identical risk factors show such a wide variety of cardiac disease manifestations and outcomes.

Your options:

  • First-time user? Request access to Research Data and Tools
  • De-identified Data Applications
    • These include both point-and-click (RDB) and programmable options (De-identified Clinical Data Warehouse (CDW), flat files).
  • Information Commons
    • Clinical data at scale in a high-performance computing (HPC) environment
    • An environment suited to pattern recognition and machine learning (Apache Spark)
    • Currently provides de-identified structured data (RDB). De-identified images and de-identified clinical notes are coming soon.
  • Data Extraction Consultation
    • Get guidance from a UCSF expert who will help you define the data you need and clarify the data that is available.
    • The first hour is free; after that, recharge fees apply.
    • UC Health data may be requested, but processing time can be lengthy.

Compliance requirements

IRB approval not required

Online attestation required

Data De-identification Validation required if sharing with external partners

Working with clinical data? Preparation is key.

Be ready with adequate computing capabilities and tools for:

Use the APeX Pick List (Excel, 26 MB) to identify variables for your research and to define your cohort. Find codes for:

  • Diagnoses
  • Meds
  • Labs
  • Procedures
  • Flowsheet
  • Departments
  • Smart Data Elements
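
As a rough illustration of how a pick list can feed a cohort definition, the sketch below searches a small stand-in table for diagnosis codes by keyword. It does not reflect the actual layout of the APeX Pick List: the column names (category, code, name), the sample rows, and the find_codes helper are all assumptions made for this example, and the lab code is a placeholder. In practice you would export the relevant sheet from the Excel file and adapt the column names to match.

```python
import csv
import io

# Hypothetical stand-in for one pick list sheet exported to CSV.
# The column names (category, code, name) are assumptions for illustration;
# the real APeX Pick List layout may differ.
SAMPLE_PICK_LIST = """category,code,name
Diagnoses,E16.2,"Hypoglycemia, unspecified"
Diagnoses,E11.649,Type 2 diabetes mellitus with hypoglycemia without coma
Diagnoses,I10,Essential (primary) hypertension
Labs,EXAMPLE-LAB-1,Glucose (placeholder code)
"""


def find_codes(rows, category, keyword):
    """Return (code, name) pairs in one category whose name contains keyword."""
    keyword = keyword.lower()
    return [
        (row["code"], row["name"])
        for row in rows
        if row["category"] == category and keyword in row["name"].lower()
    ]


# Parse the sample text into a list of dicts, one per pick list row.
rows = list(csv.DictReader(io.StringIO(SAMPLE_PICK_LIST)))

# Collect candidate diagnosis codes for a hypoglycemia cohort.
hypoglycemia_dx = find_codes(rows, "Diagnoses", "hypoglycemia")
for code, name in hypoglycemia_dx:
    print(code, name)
```

The resulting list of codes is only a starting point; a keyword match can miss relevant entries or pick up unrelated ones, so review the codes before using them to define a cohort.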