Data Sets

Automated Image Retrieval (AIR) - PACS

PACS AIR is a self-service platform that enables Automated Image Retrieval (AIR) from UCSF’s clinical and research picture archiving and communication system (PACS). AIR can automatically de-identify header data, but it cannot identify or remove PHI in the pixel data of images. Investigators using this service must have IRB approval and must make their own arrangements for creating discs or transmitting images to central repositories.

CMS Data (Re-Use)

UCSF has a 20% sample of Centers for Medicare and Medicaid Services (CMS) Research Identifiable Files (RIF) available to UCSF investigators for re-use. These data are hosted on UCSF RAE secure servers and can be accessed only after completing a re-use application with CMS. Contact SOM Tech for help filing a re-use application.

Dryad (formerly Dash)

Self-service online public data repository that allows researchers to:

  • Publish and share research data using any file format
  • Get a permanent DOI for their data
  • Search for and download public research data sets
  • Meet funder and publisher requirements for data sharing

Information Commons

Clinical data at scale with very high performance, in an environment suited to pattern recognition and machine learning. This high-performance compute cluster on AWS offers:

  • Access to de-identified structured EHR data; additional data sets coming soon, including de-identified clinical notes and images
  • Spark analytics engine, which enables fast data queries via Spark SQL, machine learning via Spark MLlib, and R via SparkR
  • Query data using PatientExploreR  
Free for UCSF Community

OptumLabs Data Warehouse (OLDW)

UCoP is partnering with OptumLabs, a collaborative center for research and innovation.

  • Access 160 million de-identified records across claims and clinical information to conduct investigations on populations
  • Annual opportunity for funded projects - sign up for notifications

Population Health and Health Services Research Datasets

A searchable database with more than 100 dataset resources for population health, health services research and health equity research.

  • Local, California, national & global data
  • Resources for geocoding and health equity

SFHN / ZSFG Clinical Data

De-identified SFHN / ZSFG Data

De-identified structured clinical data from the San Francisco Health Network (SFHN), which includes Zuckerberg San Francisco General (ZSFG) and Laguna Honda hospitals and SFHN community clinics, are now available for direct self-service access along with our UCSF Health clinical data!

  • Patient identities are matched across UCSF and SFHN in most cases
  • Data are combined in the OMOP data model, with encounters from both UCSF and SFHN available in a single database (they are also available separately)
  • As of December 2024, combined patient population = 6.8+ million patients
    • UCSF = 6+ million patients
    • SFHN = 1.1+ million patients
    • Nearly 400K patients are in both systems
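
The patient counts above follow the usual overlap arithmetic: patients who appear in both systems are counted once in the combined population. The Python sketch below illustrates this with made-up patient IDs. The set names and the combined_population helper are hypothetical and are not part of any UCSF or SFHN tool. The published figures are rounded lower bounds, so they will not sum exactly.

```python
# Toy illustration only: made-up patient IDs, not real UCSF or SFHN data.
ucsf_patients = {"p1", "p2", "p3", "p4", "p5", "p6"}
sfhn_patients = {"p5", "p6", "p7"}


def combined_population(a, b):
    """Return (distinct patients overall, patients present in both sets)."""
    in_both = a & b          # patients matched across both systems
    everyone = a | b         # each matched patient is counted once
    return len(everyone), len(in_both)


total, overlap = combined_population(ucsf_patients, sfhn_patients)

# Inclusion-exclusion: combined = UCSF + SFHN - patients in both systems
assert total == len(ucsf_patients) + len(sfhn_patients) - overlap
print(total, overlap)
```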

​How to get started: 


How to get help: 

You do not need IRB approval, and you do not need to submit a ZSFG Research Protocol Application, to analyze the data. However, if you plan to formally present or publish results from the SFHN data, you will need to complete a brief form and submit the abstract, manuscript, or slides to SFDPH thirty days before the first time the results are presented or published. More information about this requirement is coming soon.


Looking to use Identified SFHN / ZSFG Data? 

UCSF Clinical Data

Research access to UCSF electronic medical record data (APeX) - Research Data Browser (RDB), Clinical Data Warehouse (CDW), and more.

  • Summary statistics
  • Generation of condition-specific patient populations for study recruitment / chart review
  • Outcomes research using historical data in hospital databases

VA Data Core Consultation

Access a central data repository with health information from the electronic medical records of over 9 million US Veterans.

  • Provides consultation regarding available data
  • Facilitates necessary paperwork, approvals and regulatory compliance
  • Assists with identifying, extracting, and merging variables of interest
Hourly Recharge, first hour free per project