National Cancer Institute, U.S. National Institutes of Health
Cancer Genome Anatomy Project



Data Access

Open and controlled-access tiers of CGCI datasets are available for each cancer studied. Open-access tier data are currently available through the CGCI Data Matrix listed below. To protect patient privacy and confidentiality, certain data are available only through controlled access. To gain access to controlled-access tier data, investigators need to obtain specific permission by filling out the Data Access Request (controlled-access tier dataset instructions and form).

Requests for access to protected data will only be considered for those research projects using the data for research relevant to the biology, causes, treatment and late complications of treatment of HIV-related, lymphoid or pediatric cancers. Access to protected medulloblastoma data will be granted solely for those research projects that can only be conducted using pediatric data (i.e., the research objectives cannot be accomplished using data from adults) and that focus on the development of more effective treatments, diagnostic tests, or prognostic markers for childhood cancers. Applications proposing methods, software, or other tool development would not be considered acceptable uses of the pediatric data.

The datasets for the Cancer Genome Characterization Initiative are coordinated by the Data Coordinating Center (DCC). The DCC provides resources for project teams actively involved in OCG research initiatives, as well as for the scientific community at large. It is a single location where molecular characterization data from NCI OCG large-scale genomic projects are stored and can be easily accessed. As such, the DCC provides:

  • Central storage area for data generated for all OCG projects
  • Definition of the data types available
  • Uniformity of data files
  • Inclusion of metadata
  • Quality control of data files
  • Version control and a version index to provide accurate history of changes
  • Connection between the data and various analytical tools
Below is the data matrix for CGCI project datasets:

Examples of Analysis Performed:
  • Copy number analysis of tumor DNA
  • Targeted exome (and miRNA gene) sequencing of tumor DNA
  • Targeted resequencing of tumor/normal DNA
Data Access (Open): Available
Data Access (Controlled): Coming Soon

Cancer Type: Non-Hodgkin Lymphoma
Examples of Analysis Performed:
  • Whole genome sequencing of tumor/normal DNA
  • mRNA-seq of tumor RNA
Data Access (Open): Coming Soon
Data Access (Controlled): Available

Cancer Type:
  • Diffuse large B-cell lymphoma
  • Lung cancer
  • Cervical cancer
Examples of Analysis Performed:
  • Whole genome sequencing of tumor/matched normal (blood) DNA
  • mRNA-seq of tumor RNA
Data Access: This project is currently collecting samples. Data will be available in the future.