UCSF electronic health record data: What's available for research?
UCSF has identified and de-identified data available for research.
* Identified data: requires an IRB-approved protocol and typically requires funding to work with centralized data experts who extract data on your behalf. FAQs & Learn more!
* De-identified (DeID) data: does not require IRB approval and is self-serve via SQL server or point-and-click tools. Learn more!
Many access paths, one point of entry
Need help or guidance? Do you need data for your research with identifiers like birth dates or medical record numbers? Want to make sure you are compliant with regulations? Your first hour of consultation is free. Make sure you head in the right direction.
- First-time user? Request Data Access for Research
- Not sure what option is best for your project? Request a free brief consultation for advice.
- Already using the data for research? Join the active User Group!
- If you'd like to learn more about what happens behind the scenes, read about the process.
About the UCSF electronic health record (EHR) data:
- APeX data dating back to 2012
- STOR data dating back to 1988
- Benioff Children's Hospital (BCH) Oakland data dating from March 2020 (with additional select historical data)
- Images
- Clinical notes
Plus additional data, such as:
- Geocoded address data
- CA Death Registry data
- ZSFG and other Department of Public Health data
- UC Health data (EHR data from UC Davis, UC Irvine, UCLA, UCSD, UCSF)
>> COVID-19 specific data for research is also available
There’s a big difference between "identified" and "de-identified (DeID)" data.
Comparing de-identified data
De-identified Clinical Data Warehouse (DeID CDW and DeID OMOP) | Information Commons AWS Cluster |
---|---|
Learn more & access: De-ID CDW Knowledge Base (login req'd) | Learn more and access (login req'd) |
Additional data, including: | Data based on DeID CDW, machine-redacted clinical notes, plus: |
Access via SQL server | In cloud (AWS) |
Suited for high-speed queries & data mining | |
Large files; need analytics tool skills for queries | Berkeley Spark based; need SQL, Python, or R skills |
Includes de-identified data from APeX: | |
Does not require IRB approval | |
Point & click interface available | |
** Requires IRB approval currently, but "certified" de-identified versions are coming.
First-time user? Request Data Access for Research
Not sure what option is best for your project? Request a free brief consultation for advice.
Already using the DeID CDW or DeID OMOP? Join the active User Group!
Request identified clinical data; you need a consultation
Identified data is provided by consultation only. The first hour of your consultation is free!
- Clarity - closest data to APeX; clinical notes available
- Clinical Data Warehouse (CDW) - concise, pulls common data in Clarity into one field
- OMOP - uses a national common data model on data derived from PCORnet pSCANNER
Data is further from original state and there is potential to lose information
The consultant will help you define a data specification. The APeX Pick List and/or ZSFG Pick List (Large Excel files via UCSF Box) are helpful tools for this work - see more information below.