Genomic Location Finder for Long SAGE Tags
What the Genomic Location Finder Tool Can Do
The Genomic Location Finder tool finds the genomic location (chromosome, strand, and starting position) for one or more long SAGE tags. The genomic locations of long SAGE tags were determined by finding every NlaIII site (CATG) in the genome. The human genome contains approximately 27.3 million NlaIII sites. About 19.4 million of these sites define a long SAGE tag that is unique in the genome. The remaining sites, approximately 7.9 million, define about 1.0 million tags that occur at multiple locations in the genome. Of these, 563 human long SAGE tags have more than 1,000 genomic locations each. The complete list of genomic positions for all long SAGE tags is available on the SAGE Genie download site.
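The site-scanning idea described above can be sketched in a few lines of Python. This is only an illustration, not the tool's actual implementation: the function name `index_tags`, the tag length of 17 bases, the choice to report the 1-based position of the C in CATG as the start, and the rule that skips tags containing bases other than A, C, G, or T are all assumptions made for the example. Because CATG is its own reverse complement, each occurrence found on the forward sequence also anchors a tag on the opposite strand.

```python
from collections import defaultdict

SITE = "CATG"          # NlaIII recognition sequence
TAG_LEN = 17           # assumed tag length; not stated in the text above
_BASES = set("ACGT")
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def index_tags(chrom, seq, tag_len=TAG_LEN):
    """Map each tag to the list of (chromosome, start, strand) sites defining it."""
    seq = seq.upper()
    index = defaultdict(list)
    pos = seq.find(SITE)
    while pos != -1:
        start = pos + 1  # 1-based position of the C in CATG (assumed convention)
        # Forward strand: the tag is the tag_len bases immediately after CATG.
        fwd = seq[pos + 4:pos + 4 + tag_len]
        if len(fwd) == tag_len and set(fwd) <= _BASES:
            index[fwd].append((chrom, start, "+"))
        # Reverse strand: CATG is palindromic, so the same site anchors a tag
        # that reads away from it on the opposite strand.
        if pos >= tag_len:
            rev = reverse_complement(seq[pos - tag_len:pos])
            if set(rev) <= _BASES:
                index[rev].append((chrom, start, "-"))
        pos = seq.find(SITE, pos + 1)
    return index
```

With such an index, a tag is unique in the genome when its list holds exactly one location, and it maps to multiple locations when the list is longer.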
To find the location of a small number of tags, use the first of the following methods, "By entering tags". Separate multiple tags with commas. This method returns all genomic matches for the queried tag(s) in a text file with up to two sections. The first section lists every match in four columns: tag, chromosome, start, and strand. The second section has a single column and lists the tags that have no match to the genome.
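To make the two-section result concrete, here is a minimal sketch of the lookup logic, assuming a prebuilt dictionary that maps each tag to its genomic locations. The names `lookup_entered_tags` and `TOY_INDEX` are hypothetical, and the coordinates in the toy index are invented purely for illustration; they are not real genomic positions.

```python
# Hypothetical toy index: tag -> list of (chromosome, start, strand).
# The coordinates are invented for illustration only.
TOY_INDEX = {
    "ATGAAGATGGAATGGGT": [("chr1", 1000, "+")],
    "CTGCTTGCGTGAGATTC": [("chr2", 500, "-"), ("chrX", 42, "+")],
}


def lookup_entered_tags(query, index):
    """Return (matches, unmatched) for a comma-separated string of tags."""
    tags = [t.strip().upper() for t in query.split(",") if t.strip()]
    matches = []     # section 1 rows: (tag, chromosome, start, strand)
    unmatched = []   # section 2 rows: tags with no genomic location
    for tag in tags:
        locations = index.get(tag, [])
        if locations:
            for chrom, start, strand in locations:
                matches.append((tag, chrom, start, strand))
        else:
            unmatched.append(tag)
    return matches, unmatched


matches, unmatched = lookup_entered_tags(
    "ATGAAGATGGAATGGGT,CTGCTTGCGTGAGATTC,TAATTCTCATCGTCTGC", TOY_INDEX
)
for row in matches:
    print("\t".join(str(field) for field in row))
print("No match:", ", ".join(unmatched))
```

Note that a tag with several locations contributes one row per location to the first section, which is why this method suits small queries.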
To find the genomic location of a large number of tags, use the second of the following methods, "By uploading a file of tags". Upload a file consisting of a single column of long SAGE tags. If your file contains fewer than 200,000 tags, results should be available in 4 minutes or less. Divide larger sets into files of fewer than 200,000 tags each. This method returns a text file with up to three sections. The first section lists all matches that are unique in the genome, in four columns: tag, chromosome, start, and strand. The second section identifies all tags that map to multiple locations in the genome, but it does not enumerate those locations. Instead, it has only two columns: tag and frequency (i.e., the number of genomic locations to which the tag maps). The third section has a single column and lists the tags that have no match to the genome.
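The three-section result of the upload method can be illustrated with a short sketch. It assumes a dictionary that maps each tag to its list of genomic locations; the function name `summarize_uploaded_tags` and the toy data are hypothetical, and the coordinates are invented for the example.

```python
def summarize_uploaded_tags(tags, index):
    """Split tags into unique matches, multi-location tags, and unmatched tags."""
    unique = []      # section 1 rows: (tag, chromosome, start, strand)
    multiple = []    # section 2 rows: (tag, frequency)
    unmatched = []   # section 3 rows: tag only
    for tag in tags:
        locations = index.get(tag, [])
        if len(locations) == 1:
            chrom, start, strand = locations[0]
            unique.append((tag, chrom, start, strand))
        elif locations:
            # Only the count is reported, not the individual locations.
            multiple.append((tag, len(locations)))
        else:
            unmatched.append(tag)
    return unique, multiple, unmatched


# Hypothetical toy index with invented coordinates.
toy_index = {
    "ATGAAGATGGAATGGGT": [("chr1", 1000, "+")],
    "CTGCTTGCGTGAGATTC": [("chr2", 500, "-"), ("chrX", 42, "+")],
}
uploaded = ["ATGAAGATGGAATGGGT", "CTGCTTGCGTGAGATTC", "GCTGATATTTAAAAGAG"]
unique, multiple, unmatched = summarize_uploaded_tags(uploaded, toy_index)
```

Reporting only a frequency for multi-location tags keeps the result compact, which matters when a single tag can have more than 1,000 genomic locations.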
It is not uncommon to find that some experimentally observed tag sequences do not occur anywhere in the genome adjacent to an NlaIII site.
Human genome build:
NCBI Build Number: 37
Release date: 04 August 2009
Mouse genome build:
NCBI Build Number: 37
Release date: 05 July 2007
Use the Genomic Location Finder
Upload File Format
- The file should NOT contain the same tag more than once.
- Tags should be all uppercase.
- The text file must list the tags in a vertical column like this:
ATGAAGATGGAATGGGT
CTGCTTGCGTGAGATTC
TAATTCTCATCGTCTGC
GCTGATATTTAAAAGAG

AAAAAAAAAAAAAAAAAAACT
AAAAAAAAAAAAAAACTCCTG
AAAAAAAAAAAAAAAAAAAGC
TCACCTGGCCTAGCCTGCCCT

GAACAGCACCCCCACTCACAGGTGAT
TCATATGTTACACCCTGAAATTTGTG
TTTAAAAAATCCATTGCGGCGGCAGC
ATGTCTCGGTGGGGCTCAGGTATCAG
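A tag list can be brought into this format before uploading. The sketch below is one possible way to do it, not part of the tool: it converts tags to uppercase, drops blank entries and repeated tags, and splits the result into batches of fewer than 200,000 tags, each rendered as a single vertical column. The function names `prepare_upload_batches` and `format_upload_text` are hypothetical.

```python
MAX_TAGS_PER_FILE = 199_999  # each upload should hold fewer than 200,000 tags


def prepare_upload_batches(raw_tags, max_per_file=MAX_TAGS_PER_FILE):
    """Uppercase and de-duplicate tags, then split them into upload-sized batches."""
    seen = set()
    cleaned = []
    for raw in raw_tags:
        tag = raw.strip().upper()
        if not tag or tag in seen:
            continue  # skip blank entries and tags already listed
        seen.add(tag)
        cleaned.append(tag)
    return [
        cleaned[i:i + max_per_file]
        for i in range(0, len(cleaned), max_per_file)
    ]


def format_upload_text(batch):
    """Render one batch as a single vertical column, one tag per line."""
    return "\n".join(batch) + "\n"
```

Each string returned by `format_upload_text` can be written to its own text file and uploaded separately.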