Data Repository

The CHOICE Institute hosts a variety of healthcare datasets that are available to the UW community under varying permission structures. The available datasets and their access details are described below.

TriNetX offers access to two distinct databases containing de-identified data on patients aged 40-90 years from networks of healthcare organizations (HCOs) and other data providers around the US: 1) DATABASE A, open claims-based data from TriNetX’s 200 million patient Diamond Network with selected laboratory data; and 2) DATABASE B, an electronic medical record (EMR)-based dataset from TriNetX’s 50 million patient Dataworks network.

These networks were developed and curated by TriNetX to ensure accuracy and capture nearly 100% of all utilization for patients within their networks. CHOICE faculty leads can request customizable longitudinal cohorts from these databases. Data includes:

  • Demographics
  • Diagnoses
  • Medications
  • Genetic Sequencing
  • Procedures
  • Labs
  • ZIP code (for claims)

The CHOICE Institute has access to customizable datasets available for internal and collaborative efforts. Data dictionaries are available for internal use upon request.

Cohorts from each database will be considered separate requests. Please follow these steps:

  1. All requests (project request form) should be submitted to Connor Henry and should come from CHOICE faculty leads.
  2. Connor Henry will submit the request to TriNetX for approval. Obtaining IRB approval is the CHOICE faculty lead’s responsibility (IRB approval is not required to obtain the data).
  3. Once the request is approved, the CHOICE faculty lead will sign the TriNetX DUA/order form, work with Connor to construct the requested cohort on the TriNetX online platform, and send a request to TriNetX for downloading.
  4. Once the requested cohort data become available on the TriNetX platform, they can be downloaded onto the secure server.
  5. Cohort data should be kept, and all analyses of them carried out, on the secure server.

Data Availability:

Data is most reliable after 2008, through the current year (updated monthly)

Data Location:

TriNetX Servers


Connor Henry

The IBM MarketScan® Commercial Claims and Encounters Research Databases contain longitudinal inpatient, outpatient, and pharmacy claims and insurance coverage data for patients across the U.S. who are covered by commercial insurance plans. The inpatient and outpatient claims databases include procedure- and visit-level details from medical claims, such as ICD-9-CM diagnosis and procedure codes, Current Procedural Terminology (CPT) medical procedure codes, dates of service, and variables describing financial expenditures. The pharmacy claims database provides details including National Drug Codes (NDC) of the drugs dispensed, dates dispensed, quantity and days’ supply, and payments made for each claim. A separate eligibility and demographics file includes additional information about each subject, such as age, gender, insurance plan type, employment status and classification, geographic location, and enrollment status by month.

Paul K. Kraegel
Connor Henry

Data availability:
2007 – (Current year – 2)

Data location:
HSERV & UW Data Collaborative (UWDC) Servers


Access Renewal Cycle: Annual

Accessibility: CHASE Alliance / CHOICE consortium members

SEER-Medicare linked database (2007-2014)

The SEER-Medicare data reflect the linkage of two large population-based sources of data that provide detailed information about Medicare beneficiaries with cancer. The data come from the Surveillance, Epidemiology and End Results (SEER) program of cancer registries that collect clinical, demographic and cause of death information for persons with cancer and the Medicare claims for covered health care services from the time of a person’s Medicare eligibility until death.

The linkage of these two data sources results in a unique population-based source of information that can be used for an array of epidemiological and health services research. For example, investigators using this combined dataset have conducted studies on patterns of care for persons with cancer before a cancer diagnosis, over the period of initial diagnosis and treatment, and during long-term follow-up. Investigators have also examined the use of cancer tests and procedures and the costs of cancer treatment.

Connor Henry 

Data availability:
2007 – 2014 (breast, prostate, lung, colorectal, leukemia, melanoma, non-Hodgkin lymphoma, CML, endometrial, bladder, kidney, and liver)

Data location:
UWDC Servers




Access Renewal Cycle: Annual

Accessibility: UW researchers (with permission from Prof. Anirban Basu)

All investigators must be aware of the following:

Access to these data is permitted on an annual basis only. To renew access, if needed, a request must be submitted to a sponsoring CHOICE faculty member and UWDC.

All investigators must understand requirements for publications, including manuscript review from SEER-Medicare, prior to submitting a manuscript for any journal review; additionally, proper acknowledgements must be included in the manuscript.

Investigators must ensure that the proposed research analysis aligns with the aims stated in the CHOICE SEER-Medicare proposal.

All manuscript drafts must be submitted to the CHOICE faculty sponsor for review prior to publishing; your CHOICE faculty sponsor will work directly with SEER-Medicare contacts.

No individual-level data can be exported from UWDC servers.

Requesting Data Access

To request access to these data via UW Data Collaborative (UWDC) remote desktops, please review and complete the steps below:

Fill out the UWDC Request for Data Access form:

  • Please note: you will need to provide a brief description of the proposed research.

Review and sign the UWDC DUA

Review and sign the SEER-Medicare DUA

  • Please return the SEER-Medicare DUA once completed; CHOICE staff will store and track DUAs.