How to Get De-identified Clinical Data for Cohort Studies, Pattern Recognition, and More

Why get de-identified data? Examples:

Develop a predictive model, using covariate and outcome data, to identify patients at high risk for hypoglycemia.
Quickly search clinical notes to verify or determine the use of specific terms that appear in free text but may not be captured in the structured data.
Use pattern recognition and machine learning to understand why patients with seemingly identical risk factors show such wide variation in cardiac disease manifestations and outcomes.
You want to explore the data yourself, do not need patient identifiers, and/or have very limited funding.

Your options:

First-time user? Request Data Access for Research
Already using the De-identified Clinical Data Warehouse or De-id OMOP? Join the active User Group!

TIP! Once you are approved and granted access to UCSF's de-identified clinical data, you automatically have access to the tools and applications listed below. Start by requesting data access for research.

De-identified Data Tools
- These include point-and-click tools such as PatientExploreR and EMERSE, as well as programmable options such as the De-identified Clinical Data Warehouse (CDW) and access to de-identified clinical notes via the Solr API.
Information Commons
- Clinical data at scale in a high-performance computing (HPC) environment
- An environment suited to pattern recognition and machine learning (Apache Spark)
- Currently provides de-identified structured data; de-identified images and clinical notes are coming soon
Data Extraction Consultation
- Get guidance from a UCSF expert who will help you define the data you need and clarify the data that is available.
- The first hour is free; recharge fees apply after that.
- UC Health data may be requested, but processing time can be lengthy.

Compliance requirements

IRB approval not required

Online attestation AND training required

Data De-identification Validation required if sharing with external partners

Working with clinical data? Preparation is key.

Be ready with adequate computing capabilities and tools for:

Large file transfers
Secure, ample storage
Data management
Data analysis

Use the APeX Pick List or the ZSFG Pick List (large Excel files on UCSF Box) to identify variables for your research and to define your cohort. The pick lists cover the following categories:

Diagnoses
Meds
Labs
Procedures
Flowsheet
Departments
Smart Data Elements