SAGE Genie


General Definitions


  • The rank of a virtual tag database is the percentage of its virtual tags that are represented in the confident SAGE tag list. A higher percentage yields a higher rank and indicates that, on average, the database contains more commonly expressed (confident) SAGE tags. A higher rank therefore predicts that entries in the database are, on average, more likely to predict a SAGE tag correctly.

  • A virtual tag database is a collection of 10 bp sequences extracted from cDNA or EST sequences, which are found in a variety of public sources including the Mammalian Gene Collection, NCBI Reference Sequences, GenBank, and UniGene. A sequence is assigned to a particular virtual tag database depending on the presence or absence of a polyA tail and/or polyadenylation signal, and the tag's position in the transcript.


  • An association is the link between a transcript sequence and a SAGE tag sequence, and is used to determine which genes are producing which SAGE tags.

  • A reliable association is a tag-to-gene association from a virtual tag database ranked greater than 67%, excluding internally primed tags.

  • The weight of an association is the rank of the database times the absolute frequency of the tag, unless the tag is a repetitive tag. For an association with a repetitive tag, the weight is defined to be 0.


  • A confident tag is a tag that has been reliably observed in human mRNA.

  • The confident SAGE tag list contains over 194,000 tags repeatedly observed in the human transcriptome and is used as a means to rank virtual tag databases by their percent representation in this list.

  • The frequency of a tag is the sum of its counts across all the SAGE libraries currently available in SAGE Genie.

  • A repetitive tag (identified by '#', e.g., CCTGTAATCC#) is one of 50 different tags that were present in 20 or more gene clusters. These tags are flagged since they likely represent the combined expression of many genes.

  • A SAGE tag is the 10 bp immediately downstream of the 3'-most Nla III restriction site in a transcript.

  • A tag highlighted in pink is the best tag.

  • A virtual tag is a prediction of the 10 bp tag that would be observed in a SAGE experiment if the transcript were expressed, cloned, and sequenced by SAGE.

  • The virtual tag classification describes the position of the virtual tag in relation to the longest reliable sequence in the database and a possible means by which the virtual tag was generated, such as internal priming or alternative polyadenylation. Undefined 3' ends are also noted when no polyA tail or polyadenylation signal is found to define the 3' end.
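
The definitions of a SAGE tag and of database rank can be illustrated with a short Python sketch. It is not SAGE Genie's implementation. It assumes the standard Nla III recognition sequence CATG, takes the 10 bp that follow the 3'-most site, and computes the rank over distinct tags. The function names `virtual_tag` and `database_rank` are hypothetical, and polyA tails, polyadenylation signals, and tag classification are ignored.

```python
from typing import Iterable, Optional, Set

NLA3_SITE = "CATG"   # Nla III recognition sequence
TAG_LENGTH = 10      # SAGE tag length in bp


def virtual_tag(transcript: str) -> Optional[str]:
    """Return the 10 bp immediately downstream of the 3'-most CATG.

    Returns None when the sequence has no CATG or when fewer than
    10 bp follow the 3'-most site (an assumption of this sketch).
    """
    seq = transcript.upper()
    site = seq.rfind(NLA3_SITE)          # 3'-most occurrence
    if site == -1:
        return None
    start = site + len(NLA3_SITE)
    tag = seq[start:start + TAG_LENGTH]
    return tag if len(tag) == TAG_LENGTH else None


def database_rank(virtual_tags: Iterable[str], confident_tags: Set[str]) -> float:
    """Percentage of distinct virtual tags found in the confident tag list."""
    tags = set(virtual_tags)
    if not tags:
        return 0.0
    return 100.0 * len(tags & confident_tags) / len(tags)
```

For example, `virtual_tag("AAACATGCCCCCATGACGTACGTAATTT")` skips the first CATG and returns the 10 bp after the second one.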
Rules for Selecting the Best Tag for a Gene or Accession

Apply these rules in order:

  1. If the gene or accession is among the reliable associations, choose the association with the greatest weight. If there are multiple reliable associations with the greatest weight, then choose the mapping from the database with the highest rank.
  2. If there is no reliable association, choose the unreliable association from the database with the highest rank.
  3. If there are multiple unreliable associations from the same highest ranked database, then choose the tag that has the lowest level of ambiguity (that is, the tag that is associated with the smallest number of different genes or unclustered accessions).
  4. If there are multiple unreliable associations from the same highest ranked database and with the lowest level of ambiguity, then choose the tag that has the highest frequency.
  5. If there are multiple unreliable associations from the same highest ranked database, the lowest level of ambiguity, and the highest frequency, then choose randomly among these associations.
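
The five rules above can be sketched in Python as follows. This is a minimal illustration, not SAGE Genie's code: the `Association` record and its field names are hypothetical, the 67% threshold and the zero weight for repetitive tags come from the general definitions, and ties that remain after rule 1 are resolved by list order because the rules do not specify further.

```python
import random
from dataclasses import dataclass

RELIABLE_RANK_THRESHOLD = 67.0  # percent, per the definition of a reliable association


@dataclass(frozen=True)
class Association:
    tag: str
    db_rank: float                   # rank of the source virtual tag database, in percent
    frequency: int                   # tag count summed over all libraries
    ambiguity: int = 1               # genes or unclustered accessions sharing the tag
    repetitive: bool = False
    internally_primed: bool = False


def is_reliable(a: Association) -> bool:
    return a.db_rank > RELIABLE_RANK_THRESHOLD and not a.internally_primed


def weight(a: Association) -> float:
    # Weight is rank times frequency, or 0 for a repetitive tag.
    return 0.0 if a.repetitive else a.db_rank * a.frequency


def select_best(associations, rng=random) -> Association:
    if not associations:
        raise ValueError("no associations to choose from")
    reliable = [a for a in associations if is_reliable(a)]
    if reliable:
        # Rule 1: greatest weight, then highest database rank.
        return max(reliable, key=lambda a: (weight(a), a.db_rank))
    # Rule 2: keep only associations from the highest-ranked database.
    top_rank = max(a.db_rank for a in associations)
    candidates = [a for a in associations if a.db_rank == top_rank]
    # Rule 3: lowest level of ambiguity.
    least = min(a.ambiguity for a in candidates)
    candidates = [a for a in candidates if a.ambiguity == least]
    # Rule 4: highest frequency.
    most = max(a.frequency for a in candidates)
    candidates = [a for a in candidates if a.frequency == most]
    # Rule 5: choose randomly among what remains.
    return rng.choice(candidates)
```

Passing a seeded `random.Random` instance as `rng` makes the rule 5 choice reproducible.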
Find the Best Tag for a Gene, Accession, or UniGene Cluster
  • Search by gene symbol or keyword:
    1. Enter a gene symbol, an alias, or a keyword.
    2. Use "*" as a wild card, e.g., GST*.
    3. The search term can generate one gene or a list of genes; select one.
    4. When one gene is matched, the query is identical to query by UniGene cluster number (see below).

  • Search by accession number.
    1. Finds the best associations if the complete database of tag-to-gene associations contains a mapping that was derived explicitly from the accession number.
    2. Generates an identical result to a query on the UniGene cluster number (see below) of the cluster containing the accession.
    3. Note: The best tag for a given accession may be different from the best tag for the gene represented by that accession since the association that explicitly mentions the given accession may derive from a database with a lower rank. For example, compare a search on NM_002969 with a search on gene MAPK12.

  • Search by cluster number
    1. Finds the best tag for the UniGene cluster.
    2. If the best tag is a reliable association and the tag has reliable associations with other UniGene clusters, then the most confident alternative association is shown.
    3. If the best tag is an unreliable association and the tag has associations with other UniGene clusters, the most confident alternative association (reliable or unreliable) is shown. When alternative tag associations are displayed, the tag appears with an appended '*', e.g., CAGAGGAAGG*.
Find the Best Gene for a Tag
  • A search for the best gene(s) for a tag produces all reliable associations for the tag or, in the absence of reliable associations, all associations. Redundant, less reliable associations to the same gene are masked.
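
The fallback described above (reliable associations if any exist, otherwise all associations) might be sketched like this in Python. The `TagGeneLink` record and `genes_for_tag` function are hypothetical names, the reliability test follows the general definitions, and masking of redundant associations is left out of this sketch.

```python
from typing import List, NamedTuple

RELIABLE_RANK_THRESHOLD = 67.0  # percent, per the definition of a reliable association


class TagGeneLink(NamedTuple):
    tag: str
    gene: str
    db_rank: float                   # rank of the source database, in percent
    internally_primed: bool = False


def is_reliable(link: TagGeneLink) -> bool:
    return link.db_rank > RELIABLE_RANK_THRESHOLD and not link.internally_primed


def genes_for_tag(tag: str, links: List[TagGeneLink]) -> List[TagGeneLink]:
    """Return reliable associations for the tag, or all of them if none is reliable."""
    matching = [link for link in links if link.tag == tag]
    reliable = [link for link in matching if is_reliable(link)]
    return reliable if reliable else matching
```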

"All Tags" Report
  • Tags
    1. If the best tag is from a reliable association, then the "all tags" screen gives information on all alternative reliable associations for the gene (or accession).
    2. If the best tag is not from a reliable association, then the details screen gives information on all associations (reliable or not) for the gene (or accession).

  • Masking of redundant associations.
    1. If an association at a given level of reliability between a tag and an accession or UniGene cluster is displayed, then a less-reliable association between the same tag, the same accession and the same cluster is not displayed.
    2. Masking of redundant associations applies to associations with the queried gene as well as to alternative associations.
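
One way to picture the masking rule is the Python sketch below. It assumes that "less reliable" can be approximated by a lower database rank, which the page does not state explicitly. The `Row` record and `mask_redundant` function are hypothetical names. For each combination of tag, accession, and cluster, only the highest-ranked row is kept.

```python
from typing import Dict, List, NamedTuple, Tuple


class Row(NamedTuple):
    tag: str
    accession: str
    cluster: str
    db_rank: float   # assumed proxy for the reliability of the association


def mask_redundant(rows: List[Row]) -> List[Row]:
    """Keep one row per (tag, accession, cluster): the one with the highest rank."""
    best: Dict[Tuple[str, str, str], Row] = {}
    for row in rows:
        key = (row.tag, row.accession, row.cluster)
        if key not in best or row.db_rank > best[key].db_rank:
            best[key] = row
    # Dicts preserve insertion order, so output follows first appearance of each key.
    return list(best.values())
```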
SAGE Library Nomenclature

A SAGE Genie library is named according to the following convention:

  • SAGE_Organ_histology_code_unique identifier, e.g., SAGE_Colon_adenocarcinoma_CL_Caco2

  • Codes: B = bulk; CL = cell line; CS = short-term cell culture; MD = micro-dissected; AP = antibody purified.
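
As an illustration of the convention, the Python sketch below splits a library name into its parts. It assumes that organ and histology contain no underscores, as in the example above, and lets the unique identifier keep any further underscores. Real library names may not always fit that assumption. `parse_library_name` is a hypothetical helper, not part of SAGE Genie.

```python
CODES = {
    "B": "bulk",
    "CL": "cell line",
    "CS": "short-term cell culture",
    "MD": "micro-dissected",
    "AP": "antibody purified",
}


def parse_library_name(name: str) -> dict:
    """Split SAGE_Organ_histology_code_identifier into named fields."""
    parts = name.split("_", 4)   # at most five fields; the identifier keeps extra underscores
    if len(parts) != 5 or parts[0] != "SAGE":
        raise ValueError(f"not a recognized SAGE library name: {name!r}")
    _, organ, histology, code, identifier = parts
    if code not in CODES:
        raise ValueError(f"unknown preparation code: {code!r}")
    return {
        "organ": organ,
        "histology": histology,
        "code": code,
        "preparation": CODES[code],
        "identifier": identifier,
    }
```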
