
GABOS/GAFEP Help Page


GABOS (Get A Bit Of Sequence) is a tool for retrieving sequence data for various genomic objects from a range of genomes.
GAFEP (Get A Few Exon Primers) is a tool for managing primer creation.

GABOS

GABOS requires a local data set consisting of chromosome FASTA files and, optionally, a set of annotation files for each genome. The installation of GABOS and the data files required are described in the GABOS Installation document.

GABOS retrieves sequence from a range of genomes that is determined by the data available. The GABOS web page reads the list of genome directories in the data folder and allows the user to select the required genome.
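
As an illustration of how a genome list can be derived from the data folder, here is a minimal Python sketch. It assumes one sub-directory per genome; the function name `list_genomes` and the directory layout are hypothetical and are not taken from the GABOS source.

```python
import os


def list_genomes(data_dir):
    """Return the sorted names of the sub-directories of data_dir.

    Assumption: each genome has its own directory (for example "hg19")
    directly under data_dir; plain files in data_dir are ignored.
    """
    return sorted(
        entry
        for entry in os.listdir(data_dir)
        if os.path.isdir(os.path.join(data_dir, entry))
    )
```

With this layout, adding a new genome directory to the data folder is enough for it to appear in the list.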

GABOS retrieves the list of annotation files available for all genomes and, when a genome is selected, displays the annotation files for that genome. The annotation files can be gene definition files, such as RefGene/refFlat (RefSeq genes), Genscan (predicted genes), mgcGenes (Mammalian Gene Collection genes) or knownGenes (determined by UCSC). They can also be mRNA gene definitions.

GABOS will retrieve sequence for various genomic objects. These are:

1. Exons.
2. Introns.
3. 5'UTRs = Exonic bases from Transcript Start to CDS Start (can be zero bases).
4. 3'UTRs = Exonic bases from CDS End to Transcript End (can be zero bases).
5. CDSs = Coding Sequence = All Exon bases between CDS Start and CDS End.
6. Transcripts = The join of all Exon bases.
7. Genomic Sequence = Every base between Transcript Start (Exon 1, base 1) and Transcript End (last Exon, last base).
8. DNA Chromosome Sequence = All the bases between a specified start and end co-ordinate on a chromosome.
9. GLUs = Genic Locus Units (see below for details).
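
The definitions above can be sketched in Python as interval arithmetic on exon co-ordinates. This is an illustration only, not the GABOS implementation. It assumes 0-based, half-open (start, end) co-ordinates in genomic order, as used in UCSC gene tables, and it assumes that on the minus strand the 5'UTR lies at the high-co-ordinate end of the gene. The names `clip`, `gene_objects` and `total_length` are hypothetical.

```python
def clip(exons, lo, hi):
    """Return the parts of each (start, end) exon that fall inside [lo, hi)."""
    parts = []
    for start, end in exons:
        s, e = max(start, lo), min(end, hi)
        if s < e:
            parts.append((s, e))
    return parts


def gene_objects(exons, cds_start, cds_end, strand="+"):
    """Derive the genomic objects of one gene from its exon list."""
    exons = sorted(exons)
    tx_start, tx_end = exons[0][0], exons[-1][1]
    low_utr = clip(exons, tx_start, cds_start)   # exonic bases before the CDS
    high_utr = clip(exons, cds_end, tx_end)      # exonic bases after the CDS
    introns = [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]
    return {
        "exons": exons,
        "introns": introns,
        # Assumption: the UTRs swap ends for genes on the minus strand.
        "utr5": low_utr if strand == "+" else high_utr,
        "utr3": high_utr if strand == "+" else low_utr,
        "cds": clip(exons, cds_start, cds_end),
        "genomic": (tx_start, tx_end),
    }


def total_length(segments):
    """Number of bases covered by a list of (start, end) segments."""
    return sum(end - start for start, end in segments)


# Example gene: three 100-base exons, CDS from 150 to 550.
objs = gene_objects([(100, 200), (300, 400), (500, 600)], 150, 550)
# The transcript (join of all exons) is the 5'UTR, CDS and 3'UTR together.
transcript_length = total_length(objs["exons"])
```

A UTR of zero bases simply comes out as an empty list, which matches the "can be zero bases" note in the definitions.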

When genes (or Repeat Sequences) are determining the sequence to be retrieved, their definitions are taken from the appropriate UCSC annotation data file.

All data files are downloaded from the UCSC site.

Select Genome

From the drop-down list of genomes, select the required genome. The number at the end of each genome name is the version, as determined by UCSC. The genome names are the abbreviations used by UCSC. Commonly used genomes are:

bosTau    Cow
calJac    Marmoset
canFam    Dog
caePb     C. brenneri
ce        C. elegans
caeRem    C. remanei
danRer    Zebrafish
dm        D. melanogaster
droSim    D. simulans
droYak    D. yakuba
equCab    Horse
fr        Fugu
galGal    Chicken
hg        Human
mm        Mouse
monDom    Opossum
oryLat    Medaka
panTro    Chimpanzee
ponAbe    Orangutan
priPac    P. pacificus
rheMac    Rhesus
rn        Rat
taeGut    Zebra Finch
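
To illustrate the naming scheme, the following Python sketch splits a genome name into its abbreviation and version number. The function name `split_genome_name` is hypothetical, and the sketch assumes every name is a run of letters followed by a run of digits.

```python
import re


def split_genome_name(name):
    """Split a genome name such as "hg19" into ("hg", 19)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", name)
    if match is None:
        raise ValueError(f"not an abbreviation followed by a version: {name!r}")
    return match.group(1), int(match.group(2))


abbreviation, version = split_genome_name("bosTau4")
```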


Select Exon to retrieve sequence for all exons in the specified genes. Select DNA to retrieve the DNA sequence specified by the Chromosome Name, Sequence Range and Strand selection entries. Select Transcript to retrieve all transcribed bases for the genes listed in the Gene Name box or, if no gene names are given, for all genes (as defined in the annotation file) that lie within the region specified by the Chromosome Name, Sequence Range and Strand selection entries.

Select Genome

From the drop-down list, select the genome you are interested in.

Annotation File Selection

Select Annotation File
From the drop-down list, select the annotation file you are interested in. Not all annotation files are available for all genomes.

Gene Name List

Specify Gene Name List
Type or paste a list of gene names. Gene names take precedence over the Chromosome Name and Sequence Range for Exon and Transcript data. The strand option may limit the genes selected from this list.
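
The precedence rule can be sketched in Python as follows. This is an illustration under stated assumptions, not the GABOS code: genes are represented as dictionaries with hypothetical keys `name`, `chrom`, `start`, `end` and `strand`, and a gene is taken to lie in the range only when it is fully contained in it.

```python
def select_genes(genes, names=None, chrom=None, start=None, end=None, strand=None):
    """Choose genes by name if names are given, otherwise by region."""
    if names:
        # A gene name list takes precedence over chromosome and range.
        wanted = set(names)
        chosen = [g for g in genes if g["name"] in wanted]
    else:
        lo = 0 if start is None else start
        hi = float("inf") if end is None else end
        # Assumption: "lies in the range" means fully contained.
        chosen = [
            g for g in genes
            if g["chrom"] == chrom and g["start"] >= lo and g["end"] <= hi
        ]
    if strand in ("+", "-"):
        # The strand option can narrow either kind of selection.
        chosen = [g for g in chosen if g["strand"] == strand]
    return chosen


genes = [
    {"name": "A", "chrom": "chr1", "start": 100, "end": 500, "strand": "+"},
    {"name": "B", "chrom": "chr1", "start": 900, "end": 1500, "strand": "-"},
    {"name": "C", "chrom": "chr2", "start": 100, "end": 500, "strand": "+"},
]
by_name = select_genes(genes, names=["B", "C"], chrom="chr1", start=0, end=600)
```

In the example, the chromosome and range are ignored because a name list is present, so genes B and C are selected.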

Data Object Range

Data object range
The data object range applies to the Exon and Intron (when available) data choices. A range of exons may be specified using numbers or the letter "e", which represents the last (or ending) exon. Typical entries are:
"1" - just exon number 1 from each gene
"1 - 3" - the exons numbered 1, 2 and 3 from each gene
"e" - just the last exon from each gene
"1 - e" - all exons from each gene (this is the default value)
More complicated expressions will be available in later versions of GABOS.
For example, "[e-2] - e" would specify the last three exons in each gene (NOT available yet).
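
The simple range forms listed above could be interpreted along the following lines. This Python sketch is an assumption about the behaviour, not the GABOS parser: the function name `parse_exon_range` is hypothetical, ranges are clamped to the exons a gene actually has, and the bracketed form is rejected because it is not available yet.

```python
def parse_exon_range(text, exon_count):
    """Return the 1-based exon numbers selected by text for one gene."""

    def position(token):
        token = token.strip().lower()
        if token == "e":
            return exon_count  # "e" stands for the last exon
        return int(token)      # raises ValueError for anything else

    parts = text.split("-")
    if len(parts) == 1:
        first = last = position(parts[0])
    elif len(parts) == 2:
        first, last = position(parts[0]), position(parts[1])
    else:
        raise ValueError(f"unsupported range expression: {text!r}")
    # Assumption: clamp to the exons that exist in this gene.
    first, last = max(first, 1), min(last, exon_count)
    return list(range(first, last + 1))


selected = parse_exon_range("1 - e", 5)
```

For a gene with five exons, the default entry "1 - e" selects every exon, while "1 - 3" on a gene with only two exons would select just exons 1 and 2 under the clamping assumption.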


If you are having trouble using the interface, have any suggestions on how to improve its usability, or have any suggestions regarding the functionality of the program, please contact Keith Satterley at keith@wehi.edu.au.