National Cancer Institute, U.S. National Institutes of Health
SAGE Genie

Genomic Location Finder for Long SAGE Tags

What the Genomic Location Finder Tool Can Do

The Genomic Location Finder tool finds the genomic location (chromosome, strand, and starting position) for one or more long SAGE tags. The genomic locations of long SAGE tags have been determined by finding all the NlaIII sites (CATG) in the genome. In the human genome there are approximately 27.3 million NlaIII sites. Of these locations, about 19.4 million sites define a long SAGE tag that is unique in the genome. The approximately 7.9 million other sites define about 1.0 million tags that occur in multiple locations in the genome. 563 human long SAGE tags have more than 1,000 genomic locations each. The complete list of genomic positions for all long SAGE tags is available on the SAGE Genie download site.
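The site-scanning idea described above can be sketched in a few lines. This is only an illustration, not the SAGE Genie pipeline: the function name `find_tags`, the default tag length of 17 bases (chosen to match the 17-base example tags on this page), and the convention of reporting the 1-based position of the CATG site as the start are all assumptions.

```python
# Illustrative sketch only; tag length and coordinate convention are assumptions.
ANCHOR = "CATG"  # NlaIII recognition site
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_tags(chrom, sequence, tag_len=17):
    """Yield (tag, chromosome, start, strand) for each CATG site in `sequence`.

    `start` is the 1-based position of the CATG site (an assumed convention).
    """
    sequence = sequence.upper()
    pos = sequence.find(ANCHOR)
    while pos != -1:
        # Forward strand: the tag is the `tag_len` bases right after the site.
        fwd = sequence[pos + len(ANCHOR): pos + len(ANCHOR) + tag_len]
        if len(fwd) == tag_len and set(fwd) <= set("ACGT"):
            yield fwd, chrom, pos + 1, "+"
        # Reverse strand: CATG is its own reverse complement, so the same site
        # also anchors a tag that reads in the opposite direction.
        if pos >= tag_len:
            upstream = sequence[pos - tag_len: pos]
            if set(upstream) <= set("ACGT"):
                yield reverse_complement(upstream), chrom, pos + 1, "-"
        pos = sequence.find(ANCHOR, pos + 1)
```

Collecting the yielded tuples per tag and counting them would then separate tags with a single genomic location from those that occur at several.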

To find the location of a small number of tags, use the first of the following methods, "By entering tags". If you use this method, separate multiple tags with a comma. This method returns all genomic matches for the queried tag(s) in a text file with up to 2 sections. The first section gives all matches. This section has four columns: tag, chromosome, start, and strand. The second section, having only one column, identifies those tags that have no match to the genome.
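As a rough sketch of how the four-column match rows might be post-processed, the snippet below groups rows by tag and separates unique matches, multi-location tags with their frequency, and unmatched tags. It assumes the rows have already been parsed into `(tag, chromosome, start, strand)` tuples; the function name `summarize_matches` and the sample values in the usage note are hypothetical, and the exact layout of the downloaded text file is not assumed here.

```python
from collections import Counter


def summarize_matches(matches, queried_tags):
    """Split match rows into unique hits, multi-location counts, and misses.

    matches: iterable of (tag, chromosome, start, strand) tuples.
    queried_tags: the tags that were submitted.
    """
    matches = list(matches)
    counts = Counter(tag for tag, _chrom, _start, _strand in matches)
    # Tags with exactly one genomic location keep their full row.
    unique = [row for row in matches if counts[row[0]] == 1]
    # Tags with several locations are reduced to (tag, frequency).
    multiple = sorted((tag, n) for tag, n in counts.items() if n > 1)
    # Tags that never appear in the match rows have no genomic match.
    unmatched = [tag for tag in queried_tags if tag not in counts]
    return unique, multiple, unmatched
```

For example, with made-up rows `[("AAA", "1", 10, "+"), ("CCC", "2", 5, "-"), ("CCC", "3", 7, "+")]` and queried tags `["AAA", "CCC", "GGG"]`, the function returns the single `AAA` row, `[("CCC", 2)]`, and `["GGG"]`.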

If you want to find the genomic location of a large number of tags, you should use the second of the following methods, "By uploading a file of tags". If you use this method, you should upload a file consisting of a single column of long SAGE tags. If your file contains fewer than 200,000 tags, results should be available in 4 minutes or less. Larger sets of tags should be divided into sets of fewer than 200,000 tags each. This method returns a text file with up to 3 sections. The first section gives all matches that are unique in the genome. This section has four columns: tag, chromosome, start, and strand. The second section identifies all tags that map to multiple locations in the genome, but it does not enumerate those multiple locations. Instead, this section has only two columns: tag and frequency (i.e., the number of genomic locations to which the tag maps). The third section, having only one column, identifies those tags that have no match to the genome.
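One way to divide a large tag list into upload-sized pieces is sketched below. The function name `split_tag_file`, the output file naming pattern, and the default limit of 199,999 tags per file (chosen to stay under 200,000) are assumptions made for illustration.

```python
from pathlib import Path

MAX_TAGS = 199_999  # assumed limit, chosen to stay below 200,000 tags per file


def split_tag_file(path, out_dir, max_tags=MAX_TAGS):
    """Split a single-column tag file into parts of at most `max_tags` tags.

    Returns the list of part files written, in order.
    """
    # Read one tag per line, ignoring blank lines.
    lines = Path(path).read_text().splitlines()
    tags = [line.strip() for line in lines if line.strip()]

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    written = []
    for i in range(0, len(tags), max_tags):
        part = out_dir / f"tags_part{i // max_tags + 1:03d}.txt"
        part.write_text("\n".join(tags[i:i + max_tags]) + "\n")
        written.append(part)
    return written
```

Each part file can then be uploaded separately, and the resulting text files combined afterwards.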

It is not uncommon to find that some experimentally observed tag sequences do not occur anywhere in the genome adjacent to an NlaIII site.

Human genome build:
NCBI Build Number: 37
Version: 1
Release date: 04 August 2009

Mouse genome build:
NCBI Build Number: 37
Version: 1
Release date: 05 July 2007

Use the Genomic Location Finder


By entering tags
By uploading a file of tags

Upload File Format

  • The file should NOT contain the same tag more than once.
  • Tags should be all uppercase.
  • The text file must list the tags in a vertical column like this:
    ATGAAGATGGAATGGGT
    CTGCTTGCGTGAGATTC
    TAATTCTCATCGTCTGC
    GCTGATATTTAAAAGAG
    Or
    AAAAAAAAAAAAAAAAAAACT
    AAAAAAAAAAAAAAACTCCTG
    AAAAAAAAAAAAAAAAAAAGC
    TCACCTGGCCTAGCCTGCCCT
    Or
    GAACAGCACCCCCACTCACAGGTGAT
    TCATATGTTACACCCTGAAATTTGTG
    TTTAAAAAATCCATTGCGGCGGCAGC
    ATGTCTCGGTGGGGCTCAGGTATCAG
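A tag list can be brought into this format before upload with a small helper such as the one below. It is a minimal sketch: the function name `prepare_tags` is hypothetical, and rejecting tags that contain characters other than A, C, G, and T is an added assumption, not a rule stated on this page.

```python
def prepare_tags(raw_tags):
    """Uppercase, de-duplicate, and filter tags for upload.

    Returns (cleaned, rejected): cleaned tags in first-seen order, and any
    inputs containing characters other than A, C, G, T (assumed invalid).
    """
    seen = set()
    cleaned = []
    rejected = []
    for raw in raw_tags:
        tag = raw.strip().upper()
        if not tag:
            continue  # skip blank lines
        if set(tag) - set("ACGT"):
            rejected.append(raw.strip())
            continue
        if tag in seen:
            continue  # the file should not contain the same tag twice
        seen.add(tag)
        cleaned.append(tag)
    return cleaned, rejected


def format_upload(tags):
    """Return the tags as a single vertical column, one tag per line."""
    return "\n".join(tags) + "\n"
```

Writing the output of `format_upload(cleaned)` to a text file gives a single column of unique, uppercase tags.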