National Cancer Institute, U.S. National Institutes of Health


CGCI Projects

Non-Hodgkin Lymphoma
HIV+ Tumor Molecular Characterization Project
Lung Carcinogenesis

Non-Hodgkin Lymphoma

Project Overview

Non-Hodgkin lymphoma (NHL), a common malignancy among North American adults, is a diverse group of cancers that originate from the white blood cells of the immune system. Follicular lymphoma (FL) and diffuse large B-cell lymphoma (DLBCL) are the two most prevalent types of NHL, together comprising about 60% of all NHL cases. FL is an indolent (slow-growing) disease that derives exclusively from germinal center B-cells. A subset of FLs transform into DLBCL; the cause is not yet known. DLBCL, on the other hand, is an aggressive (fast-growing) cancer that derives from either germinal center B-cells or activated B-cells. The cell of origin is used to classify DLBCL into at least two distinct subtypes, each of which responds differently to standard treatment.

Understanding the molecular mechanisms of the two most abundant forms of NHL will improve their diagnosis and treatment, leading to better outcomes for a considerable number of patients. Previous research revealed a few recurrent genetic abnormalities in FL and DLBCL, but many questions remain about the underlying genetic events. CGCI initiated the non-Hodgkin lymphoma project to advance new genomic methods and apply them to the identification of novel genetic alterations in FL and DLBCL, which may ultimately yield viable diagnostic markers and/or therapeutic targets.

Experimental Approach

Next-generation sequencing technologies, including whole genome sequencing and RNA sequencing (mRNA-seq), were used to survey NHL tumors for somatic mutations, chromosomal alterations (e.g. translocations) and gene expression levels. The tumor samples used for sequencing came from biopsies taken from NHL patients who were uniformly staged, treated and monitored in British Columbia, Canada.

As of 2012, investigators had completed the following sequencing analyses on NHL tumor samples and some additional cell lines:

  • Whole Genome/Exome Sequencing: 14 NHL tumors total (matched constitutional DNA sequenced to comparable depths)
    • 13 DLBCL cases
    • 1 FL case
  • Transcriptome Sequencing (mRNA-seq): 117 NHL tumors and 10 cell lines total
    • 92 DLBCL cases
    • 13 FL cases
    • 8 B-cell NHL cases
    • 10 DLBCL-derived cell lines (both germinal center B-cell (GCB) and activated B-cell (ABC) subtypes)

Data available through CGCI Data Coordinating Center (DCC) and dbGaP


Scott, DW et al. (2012) TBL1XR1/TP63: a novel recurrent gene fusion in B-cell non-Hodgkin lymphoma. Blood 119, 4949-4952 (PMID: 22496164)

Morin, RD et al. (2011) Frequent mutation of histone-modifying genes in non-Hodgkin lymphoma. Nature 476, 298-303 (PMID: 21796119)

Morin, RD et al. (2010) Somatic mutations altering EZH2 (Tyr641) in follicular and diffuse large B-cell lymphomas of germinal-center origin. Nature Genetics 42, 181-185 (PMID: 20081860)


All investigators are affiliated with the British Columbia Cancer Agency in Vancouver, BC, Canada.

Role                    Name                       Center/University
Principal Investigator  Marco A. Marra, Ph.D.      Genome Sciences Centre
Co-Investigator         Steven Jones, Ph.D.        Genome Sciences Centre
Co-Investigator         Randy D. Gascoyne, M.D.    Centre for Lymphoid Cancer / University of British Columbia
Co-Investigator         Joseph M. Connors, M.D.    Centre for Lymphoid Cancer / University of British Columbia


Medulloblastoma

Project Overview

Medulloblastoma (MB) is the most common malignant brain tumor in children, accounting for approximately 20% of all pediatric brain tumors. Despite significant progress in treatment over the last several decades, about 50% of patients do not live more than 5 years after diagnosis. Moreover, treatment of the disease is very aggressive and often causes serious long-term neurological side effects. To improve survival and quality of life for MB patients, new therapeutic strategies must be developed. The design of better drugs and clinical markers depends on insights into the molecular events driving MB tumorigenesis. Several causative genetic alterations have already been identified in some MB patients, including alterations in Hedgehog and Wnt pathway genes. However, causative alterations have not been determined for all MB patients, so a more comprehensive genome-wide analysis is needed.

CGCI developed the medulloblastoma project to apply newly emerging genomic methods to the discovery of novel genetic alterations in MB patients. Investigators examined a number of MB tumors using high-density microarrays and Sanger sequencing. The bioinformatics analysis uncovered several frequently altered genes that had not previously been identified as mutated in MB, many of which have significant biological implications. The alteration profiles of pediatric and adult tumors differ, suggesting that their transformation mechanisms differ as well. The completion of this project opens new doors for research and is an initial step towards finding improved therapeutic strategies for patients suffering from medulloblastoma.

Experimental Approach

CGCI investigators performed two phases of genome-wide analysis, an initial "Discovery Screen" followed by a "Prevalence Screen," to identify genes that were frequently altered in MB tumors.

Discovery Screen

Investigators searched for single-base mutations and small insertions or deletions by Sanger sequencing of over 20,000 protein-coding and 715 microRNA genes. They also searched for alterations in copy number using high-density microarrays. They verified the tumor-specific association of each mutation by re-sequencing these genes using either Illumina GAII or Sanger sequencing.

  • Samples: 22 pediatric MB samples total, with 1 matched normal blood sample
    • 17 primary tumors
    • 4 xenograft mouse models
    • 1 cell line
  • Technologies:
    • Protein-coding and microRNA gene sequencing (Sanger)
    • High-density microarrays (Illumina SNP arrays)

Prevalence Screen

Investigators sequenced the genes that were frequently mutated in the Discovery Screen to determine the frequency of each mutation in a larger cohort of MB tumors. They then applied bioinformatic algorithms to the sequence mutations to predict how likely each mutation was to disrupt protein function. In total, 66 pediatric and adult tumors were analyzed, none of which had been included in the Discovery Screen.
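
As a rough illustration of the frequency calculation behind a prevalence screen, the Python sketch below counts the fraction of tumors in a cohort that carry at least one mutation in each gene. It is only a sketch under stated assumptions: the tumor identifiers, the gene names, and the `mutation_prevalence` helper are hypothetical and are not taken from the project's data or analysis pipeline.

```python
from collections import defaultdict


def mutation_prevalence(calls, cohort_size):
    """Return {gene: fraction of tumors with at least one mutation in that gene}.

    calls: iterable of (tumor_id, gene) pairs, one pair per observed mutation.
    cohort_size: total number of tumors screened (mutated or not).
    """
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    tumors_by_gene = defaultdict(set)
    for tumor_id, gene in calls:
        # A set ensures a tumor with several hits in one gene is counted once.
        tumors_by_gene[gene].add(tumor_id)
    return {gene: len(tumors) / cohort_size
            for gene, tumors in tumors_by_gene.items()}


# Hypothetical toy data: four tumors screened, three of them carry mutations.
calls = [
    ("T1", "GENE_A"),
    ("T2", "GENE_A"),
    ("T2", "GENE_A"),  # second mutation in the same tumor and gene
    ("T3", "GENE_B"),
]
prevalence = mutation_prevalence(calls, cohort_size=4)
print(prevalence)
```

Counting tumors rather than individual mutations keeps a single heavily mutated sample from inflating a gene's apparent frequency.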

Data available through CGCI Data Coordinating Center (DCC) and dbGaP


Parsons, DW et al. (2010) The Genetic Landscape of the Childhood Cancer Medulloblastoma. Science 331, 435-439 (PMID: 21163964)

Role                    Name                               Institute
Principal Investigator  Victor Velculescu, M.D., Ph.D.     The Johns Hopkins University Sidney Kimmel Cancer Center, Baltimore, MD, USA
Co-Investigator         D. William Parsons, M.D., Ph.D.    Texas Children's Hospital, Houston, Texas

HIV+ Tumor Molecular Characterization Project (HTMCP)

Project Overview

Acquired immunodeficiency syndrome (AIDS), caused by infection with human immunodeficiency virus (HIV), is a complex and devastating disease brought about by the systematic destruction of the immune system's helper T cells. The advent of highly active anti-retroviral therapy (HAART) has considerably slowed disease progression from HIV to full-blown AIDS, thereby increasing the number of people living with HIV. Despite this success in survivorship, certain types of cancers are becoming more prevalent in the expanding pool of HIV-infected individuals. This poses a challenge to global health, since approximately 34.2 million people are living with HIV worldwide (22.9 million in Sub-Saharan Africa and over 1 million in the US, according to UNAIDS World AIDS Day Report, 2011).

While co-infecting viruses and, possibly, immunodeficiency may play a role in the pathogenesis of HIV-associated cancers, our understanding of their etiology is inadequate.

The Office of Cancer Genomics, along with the Office of HIV and AIDS Malignancies (OHAM), initiated the HTMCP to gain insight into the genetic events driving HIV-associated cancers and to determine why certain cancers, but not others, have higher incidences in HIV-positive patients. A deeper understanding of the molecular causes of these tumors holds potential for the development of effective therapies for a growing population of patients doubly afflicted with HIV and cancer.

Based on their high rates of incidence and morbidity among HIV-positive patients, the following three cancers were selected for study in HTMCP:

Cervical Cancer: In the general population, cervical cancer is primarily caused by human papillomavirus (HPV) infection. Cervical cancer is classified as an AIDS-defining malignancy, like Kaposi's sarcoma, because of its prevalence in AIDS patients. Women infected with HIV are 5 times more likely to develop cervical cancer than women not infected with HIV.

Diffuse Large B-Cell Lymphoma (DLBCL): Diffuse Large B-cell Lymphoma is the most common form of non-Hodgkin lymphoma (NHL), a cancer of the white blood cells. DLBCL is an AIDS-defining malignancy that is common in AIDS and HIV-positive patients. People infected with HIV are over 20 times more likely to be diagnosed with NHL, largely in the form of DLBCL, than those without HIV infection.

Lung Cancer: Lung cancer is the deadliest type of cancer for both men and women. Cigarette smoking is established as the leading cause of squamous cell carcinoma of the lung; lung adenocarcinoma, however, also develops in people who have never smoked. HIV-infected individuals have a 3-fold increased risk of lung cancer, mostly in the form of adenocarcinoma, compared with the general population. This rate is expected to rise over time. Notably, the molecular subtypes of lung cancers in HIV-positive patients differ from those seen in HIV-negative cases, suggesting differences in the underlying biological mechanisms.

Experimental Approach

HTMCP will generate a comprehensive molecular characterization of the three tumor types from both HIV-positive and HIV-negative patients using state-of-the-art genomic methods. Comparing molecular profiles between the two sets of patients with the same tumor type will enable identification of potentially causative mutations and pathways in HIV-associated cancers, which may respond to novel therapies.

HTMCP is presently accruing cervical cancer, diffuse large B-cell lymphoma, and lung cancer tissues, along with matched controls, from HIV-positive patients at centers across the world. The Genome Sciences Centre, British Columbia Cancer Agency in Vancouver, Canada is generating a sequence-based characterization of each case:

  • Whole Genome: provides the DNA sequence of the entire genome of tumor and matched normal
  • Transcriptome (mRNA-seq and miRNA-seq): provides sequences from transcribed RNA

The analysis will uncover somatic mutations, genomic rearrangements, copy number alterations, gene expression differences, chimeric transcripts, and single-nucleotide polymorphisms (SNPs) that can be used to detect loss of heterozygosity.
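
To make the role of the matched normal sample concrete, the Python sketch below treats a variant as a candidate somatic mutation when it appears in the tumor but not in the matched normal from the same patient. This is a deliberately simplified assumption for illustration only. Real somatic callers weigh read depth, allele fractions, and sequencing error, and nothing here is taken from the project's actual pipeline. The `somatic_candidates` helper and the variant tuples are hypothetical.

```python
def somatic_candidates(tumor_variants, normal_variants):
    """Return variants seen in the tumor but absent from the matched normal.

    Each variant is a (chromosome, position, ref_allele, alt_allele) tuple.
    Variants shared with the normal sample are presumed germline and dropped.
    """
    return set(tumor_variants) - set(normal_variants)


# Hypothetical toy data for one patient.
tumor = {
    ("chr1", 1000, "A", "T"),  # also present in normal, so presumed germline
    ("chr2", 5000, "G", "C"),
    ("chr7", 200, "C", "T"),
}
normal = {
    ("chr1", 1000, "A", "T"),
}
candidates = somatic_candidates(tumor, normal)
print(sorted(candidates))
```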



Lung Carcinogenesis: Challenge Grant (RC1)

Project Overview

Lung cancer is the second most common type of cancer in North America. Because it causes more deaths than any other cancer, an abundance of genomics research has focused on understanding its genetic causes and susceptibilities. Advances in high-throughput genomic technologies now allow the systematic characterization of tumor transcriptomes and epigenomes, opening up new frontiers in cancer research. CGCI supported a pilot effort to apply these new technological approaches to the study of lung carcinogenesis, looking specifically for novel alterations in lung epithelial cancer cells. Investigators experimentally tested candidate transcriptomic and epigenomic alterations to determine their functional role in lung cancer transformation. Additionally, investigators compared candidate alterations among the various pathologically determined epithelial phenotypes, including normal, dysplastic, neoplastic and malignant, to identify transcriptome and epigenome alterations associated with those phenotypes. Alterations that correlate with early-stage phenotypes (e.g. dysplastic and neoplastic) are likely to play a role in the initiation of lung cancer. Completion of this comparative analysis may implicate a new set of regulatory pathways in lung carcinogenesis, which can be pursued in future prospective biomarker studies aimed at risk assessment, early diagnosis, and targeted therapies.

Experimental Approach

To generate a more comprehensive molecular landscape of lung cancer, investigators analyzed lung tumor tissue using three distinct genome-wide approaches: gene expression profiling, microRNA sequencing (a type of transcriptome sequencing), and epigenomic sequencing. The human lung tumor and adjacent non-tumor tissues came from more than 25 donors accrued by the Spivack laboratory with the support of other funding mechanisms.

  • Gene Expression Profiling: Surveyed mRNA levels using expression microarrays and verified by mRNA-seq.
  • microRNA Sequencing (miRNA-seq): Sequenced all known microRNAs using parallel sequencing and verified by both microRNA-qPCR and microRNA pull-down.
  • Epigenomic Sequencing: Surveyed epigenetic markers methylome-wide using the HpaII tiny fragment Enrichment by Ligation-mediated PCR assay with massively parallel sequencing (HELP-seq), verified phenotype-associated alterations using mass spectrometry-based sequencing (MassArray), and re-verified them using tagged bisulfite genomic sequencing (tBGS).

Principal Investigator  Simon Spivack, M.D., M.P.H.  Albert Einstein College of Medicine, New York, NY, USA