Data Sets

Automated Image Retrieval (AIR) - PACS

PACS AIR is a self-service platform for Automated Image Retrieval (AIR) from UCSF’s clinical and research picture archiving and communication system (PACS). AIR can automatically de-identify header data, but it cannot identify or remove PHI in the pixel data of images. Investigators using this service must have IRB approval and must make their own arrangements for creating discs or transmitting images to central repositories.

CMS Data (Re-Use)

UCSF has a 20% sample of Centers for Medicare & Medicaid Services (CMS) Restricted Identifiable Files (RIF) available to UCSF investigators for re-use. These data are hosted on UCSF RAE secure servers and can be accessed only after completing a re-use application with CMS. Contact SOM Tech for help filing a re-use application.

Dryad (formerly Dash)

Self-service online public data repository that allows researchers to:

  • Publish and share research data using any file format
  • Get a permanent DOI for their data
  • Search for and download public research data sets
  • Meet funder and publisher requirements for data sharing

Information Commons

Clinical data at scale in a very high-performance environment suited to pattern recognition and machine learning. This high-performance compute cluster on AWS offers:

  • Access to de-identified structured EHR data; additional data sets coming soon, including de-identified clinical notes and images
  • Spark analytics engine that enables fast data queries via Spark SQL, machine learning via Spark MLlib, and R via SparkR
  • Query data using PatientExploreR

Free for the UCSF community

OptumLabs Data Warehouse (OLDW)

UCOP is partnering with OptumLabs, a collaborative center for research and innovation.

  • Access 160 million de-identified records across claims and clinical information to conduct investigations on populations
  • Annual opportunity for funded projects - sign up for notifications

Population Health and Health Services Research Datasets

A searchable database with more than 100 dataset resources for population health, health services research and health equity research.

  • Local, California, national & global data
  • Resources for geocoding and health equity

UCSF Clinical Data

Research access to UCSF electronic medical record data (APeX) - Research Data Browser (RDB), Clinical Data Warehouse (CDW), and more.

  • Summary statistics
  • Generation of condition-specific patient populations for study recruitment / chart review
  • Outcomes research using historical data in hospital databases

VA Data Core Consultation

Access a central data repository with health information from the electronic medical records of over 9 million US Veterans.

  • Provides consultation regarding available data
  • Facilitates necessary paperwork, approvals and regulatory compliance
  • Assists with identifying, extracting, and merging variables of interest

Hourly recharge, first hour free per project