Data Sets

UCSF Clinical Data

Research access to UCSF electronic medical record data (APeX) - Research Data Browser (RDB), Clinical Data Warehouse (CDW), and more.

  • Summary statistics
  • Generation of condition-specific patient populations for study recruitment / chart review
  • Outcomes research using historical data in hospital databases

Information Commons

Clinical data at scale with very high performance, in an environment suited to pattern recognition and machine learning. This high-performance compute cluster on AWS offers:

  • Access to de-identified structured EHR data; additional data sets coming soon, including de-identified clinical notes and images
  • Spark analytics engine that enables fast data queries via Spark SQL, machine learning via Spark MLlib, and R via SparkR
  • Query data using PatientExploreR  
Free for UCSF Community

Population Health and Health Services Research Datasets

A searchable database with more than 100 dataset resources for population health, health services research and health equity research.

  • Local, California, national & global data
  • Resources for geocoding and health equity

VA Data Core Consultation

Access a central data repository with health information from the electronic medical records of over 9 million US Veterans.

  • Provides consultation regarding available data
  • Facilitates necessary paperwork, approvals and regulatory compliance
  • Assists with identifying, extracting, and merging variables of interest
Hourly Recharge, first hour free per project

OptumLabs Data Warehouse (OLDW)

UCoP is partnering with OptumLabs, a collaborative center for research and innovation.

  • Access 160 million de-identified records across claims and clinical information to conduct investigations on populations
  • Annual opportunity for funded projects - sign up for notifications


A self-service online tool that allows researchers to:

  • Describe, upload and share research data using any file format
  • Search for and download research data sets

Automated Image Retrieval (AIR) - PACS

PACS AIR is a self-service platform that enables Automated Image Retrieval (AIR) from UCSF’s clinical and research picture archiving and communication system (PACS). AIR can automatically de-identify header data, but it cannot identify or remove PHI in the pixel data of images. Investigators using this service must have IRB approval and must arrange on their own to create discs or transmit images to central repositories.