prettyseq

 

Function

Output sequence with translated ranges

Description

This writes out a nicely formatted display of the sequence with the translation (within specified ranges) displayed beneath it.

The translated nucleic acid region is shown in lower-case letters; in the sample output below, the rest of the sequence is shown in upper case.

The base and residue numbers of the sequences are shown beside the sequences in the output.

Slightly unusually, this application translates the codons using a codon usage table (Ehum.cut by default; see the -cfile qualifier).
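
The layout described above can be sketched in a few lines of Python. This is an illustration only, not prettyseq's implementation: it uses the standard genetic code instead of a codon usage file, the name `pretty_rows` is hypothetical, and it follows the sample output in showing untranslated bases in upper case and placing each residue beneath the first base of its codon.

```python
from itertools import product

# Standard genetic code, codons ordered t, c, a, g at each position.
# Assumption: prettyseq itself takes the mapping from a codon usage file.
BASES = "tcag"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def pretty_rows(seq, start, end, width=60):
    """Yield (dna_row, protein_row) pairs; start and end are 1-based, inclusive."""
    seq = seq.lower()
    # Translated range in lower case, everything else in upper case.
    dna = seq[:start - 1].upper() + seq[start - 1:end] + seq[end:].upper()
    protein = [" "] * len(seq)
    # Walk complete codons only; each residue sits under its codon's first base.
    for i in range(start - 1, end - 2, 3):
        protein[i] = CODON_TABLE.get(seq[i:i + 3], "X")
    protein = "".join(protein)
    for offset in range(0, len(seq), width):
        yield dna[offset:offset + width], protein[offset:offset + width]


# Bases 121-190 of the example entry; base 135 is position 15 of this fragment.
fragment = ("aggagaggaaacggatgggatcgcaccaggagcggccgctgatcggcctgctgttctccg"
            "aaaccggcgt")
for dna_row, protein_row in pretty_rows(fragment, 15, 70):
    print(dna_row)
    print(protein_row)
```

Keeping the protein row the same length as the DNA row means both can be cut at the same offsets, so a codon that straddles a line break still has its residue on the line holding its first base.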

Usage

Here is a sample session with prettyseq:


% prettyseq 
Output sequence with translated ranges
Input sequence: tembl:paamir
Range(s) to translate [1-2167]: 135-1292
Output file [paamir.prettyseq]: 


Command line arguments

   Standard (Mandatory) qualifiers:
  [-sequence]          sequence   Sequence USA
   -range              range      Range(s) to translate
  [-outfile]           outfile    Output file name

   Additional (Optional) qualifiers:
   -[no]ruler          boolean    Add a ruler
   -[no]plabel         boolean    Number translations
   -[no]nlabel         boolean    Number DNA sequence

   Advanced (Unprompted) qualifiers:
   -cfile              codon      Codon usage table name
   -width              integer    Width of screen

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1             integer    Start of the sequence to be used
   -send1               integer    End of the sequence to be used
   -sreverse1           boolean    Reverse (if DNA)
   -sask1               boolean    Ask for begin/end/reverse
   -snucleotide1        boolean    Sequence is nucleotide
   -sprotein1           boolean    Sequence is protein
   -slower1             boolean    Make lower case
   -supper1             boolean    Make upper case
   -sformat1            string     Input sequence format
   -sdbname1            string     Database name
   -sid1                string     Entryname
   -ufo1                string     UFO features
   -fformat1            string     Features format
   -fopenfile1          string     Features file name

   "-outfile" associated qualifiers
   -odirectory2         string     Output directory

   General qualifiers:
   -auto                boolean    Turn off prompts
   -stdout              boolean    Write standard output
   -filter              boolean    Read standard input, write standard output
   -options             boolean    Prompt for standard and additional values
   -debug               boolean    Write debug output to program.dbg
   -verbose             boolean    Report some/full command line options
   -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning             boolean    Report warnings
   -error               boolean    Report errors
   -fatal               boolean    Report fatal errors
   -die                 boolean    Report deaths


   Qualifier                    Description              Allowed values                          Default

   Standard (Mandatory) qualifiers
   [-sequence] (Parameter 1)    Sequence USA             Readable sequence                       Required
   -range                       Range(s) to translate    Sequence range                          Whole sequence
   [-outfile] (Parameter 2)     Output file name         Output file                             <sequence>.prettyseq

   Additional (Optional) qualifiers
   -[no]ruler                   Add a ruler              Boolean value Yes/No                    Yes
   -[no]plabel                  Number translations      Boolean value Yes/No                    Yes
   -[no]nlabel                  Number DNA sequence      Boolean value Yes/No                    Yes

   Advanced (Unprompted) qualifiers
   -cfile                       Codon usage table name   Codon usage file in EMBOSS data path    Ehum.cut
   -width                       Width of screen          Integer 10 or more                      60

Input file format

prettyseq reads any nucleic acid sequence USA.

Input files for usage example

'tembl:paamir' is a sequence entry in the example nucleic acid database 'tembl'

Database entry: tembl:paamir

ID   PAAMIR     standard; DNA; PRO; 2167 BP.
XX
AC   X13776; M43175;
XX
SV   X13776.1
XX
DT   19-APR-1989 (Rel. 19, Created)
DT   17-FEB-1997 (Rel. 50, Last updated, Version 22)
XX
DE   Pseudomonas aeruginosa amiC and amiR gene for aliphatic amidase regulation
XX
KW   aliphatic amidase regulator; amiC gene; amiR gene.
XX
OS   Pseudomonas aeruginosa
OC   Bacteria; Proteobacteria; gamma subdivision; Pseudomonadaceae; Pseudomonas.
XX
RN   [1]
RP   1167-2167
RA   Rice P.M.;
RT   ;
RL   Submitted (16-DEC-1988) to the EMBL/GenBank/DDBJ databases.
RL   Rice P.M., EMBL, Postfach 10-2209, Meyerhofstrasse 1, 6900 Heidelberg, FRG.
XX
RN   [2]
RP   1167-2167
RX   MEDLINE; 89211409.
RA   Lowe N., Rice P.M., Drew R.E.;
RT   "Nucleotide sequence of the aliphatic amidase regulator gene of Pseudomonas
RT   aeruginosa";
RL   FEBS Lett. 246:39-43(1989).
XX
RN   [3]
RP   1-1292
RX   MEDLINE; 91317707.
RA   Wilson S., Drew R.;
RT   "Cloning and DNA seqence of amiC, a new gene regulating expression of the
RT   Pseudomonas aeruginosa aliphatic amidase, and purification of the amiC
RT   product.";
RL   J. Bacteriol. 173:4914-4921(1991).
XX
RN   [4]
RP   1-2167
RA   Rice P.M.;
RT   ;
RL   Submitted (04-SEP-1991) to the EMBL/GenBank/DDBJ databases.
RL   Rice P.M., EMBL, Postfach 10-2209, Meyerhofstrasse 1, 6900 Heidelberg, FRG.
XX
DR   SWISS-PROT; P10932; AMIR_PSEAE.
DR   SWISS-PROT; P27017; AMIC_PSEAE.
DR   SWISS-PROT; Q51417; AMIS_PSEAE.


  [Part of this file has been deleted for brevity]

FT                   phenotype"
FT                   /replace=""
FT                   /gene="amiC"
FT   misc_feature    1
FT                   /note="last base of an XhoI site"
FT   misc_feature    648..653
FT                   /note="end of 658bp XhoI fragment, deletion in  pSW3 causes
FT                   constitutive expression of amiE"
FT   conflict        1281
FT                   /replace="g"
FT                   /citation=[3]
XX
SQ   Sequence 2167 BP; 363 A; 712 C; 730 G; 362 T; 0 other;
     ggtaccgctg gccgagcatc tgctcgatca ccaccagccg ggcgacggga actgcacgat        60
     ctacctggcg agcctggagc acgagcgggt tcgcttcgta cggcgctgag cgacagtcac       120
     aggagaggaa acggatggga tcgcaccagg agcggccgct gatcggcctg ctgttctccg       180
     aaaccggcgt caccgccgat atcgagcgct cgcacgcgta tggcgcattg ctcgcggtcg       240
     agcaactgaa ccgcgagggc ggcgtcggcg gtcgcccgat cgaaacgctg tcccaggacc       300
     ccggcggcga cccggaccgc tatcggctgt gcgccgagga cttcattcgc aaccgggggg       360
     tacggttcct cgtgggctgc tacatgtcgc acacgcgcaa ggcggtgatg ccggtggtcg       420
     agcgcgccga cgcgctgctc tgctacccga ccccctacga gggcttcgag tattcgccga       480
     acatcgtcta cggcggtccg gcgccgaacc agaacagtgc gccgctggcg gcgtacctga       540
     ttcgccacta cggcgagcgg gtggtgttca tcggctcgga ctacatctat ccgcgggaaa       600
     gcaaccatgt gatgcgccac ctgtatcgcc agcacggcgg cacggtgctc gaggaaatct       660
     acattccgct gtatccctcc gacgacgact tgcagcgcgc cgtcgagcgc atctaccagg       720
     cgcgcgccga cgtggtcttc tccaccgtgg tgggcaccgg caccgccgag ctgtatcgcg       780
     ccatcgcccg tcgctacggc gacggcaggc ggccgccgat cgccagcctg accaccagcg       840
     aggcggaggt ggcgaagatg gagagtgacg tggcagaggg gcaggtggtg gtcgcgcctt       900
     acttctccag catcgatacg cccgccagcc gggccttcgt ccaggcctgc catggtttct       960
     tcccggagaa cgcgaccatc accgcctggg ccgaggcggc ctactggcag accttgttgc      1020
     tcggccgcgc cgcgcaggcc gcaggcaact ggcgggtgga agacgtgcag cggcacctgt      1080
     acgacatcga catcgacgcg ccacaggggc cggtccgggt ggagcgccag aacaaccaca      1140
     gccgcctgtc ttcgcgcatc gcggaaatcg atgcgcgcgg cgtgttccag gtccgctggc      1200
     agtcgcccga accgattcgc cccgaccctt atgtcgtcgt gcataacctc gacgactggt      1260
     ccgccagcat gggcggggga ccgctcccat gagcgccaac tcgctgctcg gcagcctgcg      1320
     cgagttgcag gtgctggtcc tcaacccgcc gggggaggtc agcgacgccc tggtcttgca      1380
     gctgatccgc atcggttgtt cggtgcgcca gtgctggccg ccgccggaag ccttcgacgt      1440
     gccggtggac gtggtcttca ccagcatttt ccagaatggc caccacgacg agatcgctgc      1500
     gctgctcgcc gccgggactc cgcgcactac cctggtggcg ctggtggagt acgaaagccc      1560
     cgcggtgctc tcgcagatca tcgagctgga gtgccacggc gtgatcaccc agccgctcga      1620
     tgcccaccgg gtgctgcctg tgctggtatc ggcgcggcgc atcagcgagg aaatggcgaa      1680
     gctgaagcag aagaccgagc agctccagga ccgcatcgcc ggccaggccc ggatcaacca      1740
     ggccaaggtg ttgctgatgc agcgccatgg ctgggacgag cgcgaggcgc accagcacct      1800
     gtcgcgggaa gcgatgaagc ggcgcgagcc gatcctgaag atcgctcagg agttgctggg      1860
     aaacgagccg tccgcctgag cgatccgggc cgaccagaac aataacaaga ggggtatcgt      1920
     catcatgctg ggactggttc tgctgtacgt tggcgcggtg ctgtttctca atgccgtctg      1980
     gttgctgggc aagatcagcg gtcgggaggt ggcggtgatc aacttcctgg tcggcgtgct      2040
     gagcgcctgc gtcgcgttct acctgatctt ttccgcagca gccgggcagg gctcgctgaa      2100
     ggccggagcg ctgaccctgc tattcgcttt tacctatctg tgggtggccg ccaaccagtt      2160
     cctcgag                                                                2167
//

You can specify a file of ranges to translate by giving the '-range' qualifier the value '@' followed by the name of the file containing the ranges (e.g. '-range @myfile').

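The format of the range file is:

Lines beginning with '#' are comments.
Each range is given on its own line as two integers, start then end, separated by white-space; the line may begin with white-space.
Optional annotation text may follow the two numbers.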

An example range file is:


# this is my set of ranges
12   23
 4   5       this is like 12-23, but smaller
67   10348   interesting region
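
A range file of this kind is easy to read programmatically. The sketch below is a minimal Python reader, not EMBOSS code; `parse_range_file` is a hypothetical name, and the rules it applies (comment lines start with '#', two white-space-separated integers per line, optional trailing annotation) are inferred from the example above.

```python
def parse_range_file(text):
    """Return a list of (start, end, annotation) tuples from range-file text."""
    ranges = []
    for line in text.splitlines():
        line = line.strip()
        # Assumption: blank lines and lines starting with '#' carry no range.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 2)  # start, end, then the rest of the line
        start, end = int(parts[0]), int(parts[1])
        note = parts[2].strip() if len(parts) > 2 else ""
        ranges.append((start, end, note))
    return ranges


EXAMPLE = """\
# this is my set of ranges
12   23
 4   5       this is like 12-23, but smaller
67   10348   interesting region
"""
for entry in parse_range_file(EXAMPLE):
    print(entry)
```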

Output file format

Output files for usage example

File: paamir.prettyseq

PRETTYSEQ of PAAMIR from 1 to 2167

           ---------|---------|---------|---------|---------|---------|
         1 GGTACCGCTGGCCGAGCATCTGCTCGATCACCACCAGCCGGGCGACGGGAACTGCACGAT 60
                                                                        

           ---------|---------|---------|---------|---------|---------|
        61 CTACCTGGCGAGCCTGGAGCACGAGCGGGTTCGCTTCGTACGGCGCTGAGCGACAGTCAC 120
                                                                        

           ---------|---------|---------|---------|---------|---------|
       121 AGGAGAGGAAACGGatgggatcgcaccaggagcggccgctgatcggcctgctgttctccg 180
         1               M  G  S  H  Q  E  R  P  L  I  G  L  L  F  S  E 16

           ---------|---------|---------|---------|---------|---------|
       181 aaaccggcgtcaccgccgatatcgagcgctcgcacgcgtatggcgcattgctcgcggtcg 240
        17   T  G  V  T  A  D  I  E  R  S  H  A  Y  G  A  L  L  A  V  E 36

           ---------|---------|---------|---------|---------|---------|
       241 agcaactgaaccgcgagggcggcgtcggcggtcgcccgatcgaaacgctgtcccaggacc 300
        37   Q  L  N  R  E  G  G  V  G  G  R  P  I  E  T  L  S  Q  D  P 56

           ---------|---------|---------|---------|---------|---------|
       301 ccggcggcgacccggaccgctatcggctgtgcgccgaggacttcattcgcaaccgggggg 360
        57   G  G  D  P  D  R  Y  R  L  C  A  E  D  F  I  R  N  R  G  V 76

           ---------|---------|---------|---------|---------|---------|
       361 tacggttcctcgtgggctgctacatgtcgcacacgcgcaaggcggtgatgccggtggtcg 420
        77   R  F  L  V  G  C  Y  M  S  H  T  R  K  A  V  M  P  V  V  E 96

           ---------|---------|---------|---------|---------|---------|
       421 agcgcgccgacgcgctgctctgctacccgaccccctacgagggcttcgagtattcgccga 480
        97   R  A  D  A  L  L  C  Y  P  T  P  Y  E  G  F  E  Y  S  P  N 116

           ---------|---------|---------|---------|---------|---------|
       481 acatcgtctacggcggtccggcgccgaaccagaacagtgcgccgctggcggcgtacctga 540
       117   I  V  Y  G  G  P  A  P  N  Q  N  S  A  P  L  A  A  Y  L  I 136

           ---------|---------|---------|---------|---------|---------|
       541 ttcgccactacggcgagcgggtggtgttcatcggctcggactacatctatccgcgggaaa 600
       137   R  H  Y  G  E  R  V  V  F  I  G  S  D  Y  I  Y  P  R  E  S 156

           ---------|---------|---------|---------|---------|---------|
       601 gcaaccatgtgatgcgccacctgtatcgccagcacggcggcacggtgctcgaggaaatct 660
       157   N  H  V  M  R  H  L  Y  R  Q  H  G  G  T  V  L  E  E  I  Y 176

           ---------|---------|---------|---------|---------|---------|
       661 acattccgctgtatccctccgacgacgacttgcagcgcgccgtcgagcgcatctaccagg 720
       177   I  P  L  Y  P  S  D  D  D  L  Q  R  A  V  E  R  I  Y  Q  A 196



  [Part of this file has been deleted for brevity]

      1441 GCCGGTGGACGTGGTCTTCACCAGCATTTTCCAGAATGGCCACCACGACGAGATCGCTGC 1500
                                                                        

           ---------|---------|---------|---------|---------|---------|
      1501 GCTGCTCGCCGCCGGGACTCCGCGCACTACCCTGGTGGCGCTGGTGGAGTACGAAAGCCC 1560
                                                                        

           ---------|---------|---------|---------|---------|---------|
      1561 CGCGGTGCTCTCGCAGATCATCGAGCTGGAGTGCCACGGCGTGATCACCCAGCCGCTCGA 1620
                                                                        

           ---------|---------|---------|---------|---------|---------|
      1621 TGCCCACCGGGTGCTGCCTGTGCTGGTATCGGCGCGGCGCATCAGCGAGGAAATGGCGAA 1680
                                                                        

           ---------|---------|---------|---------|---------|---------|
      1681 GCTGAAGCAGAAGACCGAGCAGCTCCAGGACCGCATCGCCGGCCAGGCCCGGATCAACCA 1740
                                                                        

           ---------|---------|---------|---------|---------|---------|
      1741 GGCCAAGGTGTTGCTGATGCAGCGCCATGGCTGGGACGAGCGCGAGGCGCACCAGCACCT 1800
                                                                        

           ---------|---------|---------|---------|---------|---------|
      1801 GTCGCGGGAAGCGATGAAGCGGCGCGAGCCGATCCTGAAGATCGCTCAGGAGTTGCTGGG 1860
                                                                        

           ---------|---------|---------|---------|---------|---------|
      1861 AAACGAGCCGTCCGCCTGAGCGATCCGGGCCGACCAGAACAATAACAAGAGGGGTATCGT 1920
                                                                        

           ---------|---------|---------|---------|---------|---------|
      1921 CATCATGCTGGGACTGGTTCTGCTGTACGTTGGCGCGGTGCTGTTTCTCAATGCCGTCTG 1980
                                                                        

           ---------|---------|---------|---------|---------|---------|
      1981 GTTGCTGGGCAAGATCAGCGGTCGGGAGGTGGCGGTGATCAACTTCCTGGTCGGCGTGCT 2040
                                                                        

           ---------|---------|---------|---------|---------|---------|
      2041 GAGCGCCTGCGTCGCGTTCTACCTGATCTTTTCCGCAGCAGCCGGGCAGGGCTCGCTGAA 2100
                                                                        

           ---------|---------|---------|---------|---------|---------|
      2101 GGCCGGAGCGCTGACCCTGCTATTCGCTTTTACCTATCTGTGGGTGGCCGCCAACCAGTT 2160
                                                                        

           -------
      2161 CCTCGAG 2167
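
The ruler and position labels in this output follow a simple pattern that can be reproduced in Python. The sketch below is illustrative only: `ruler` and `format_block` are hypothetical names, and the label width of 10 columns is read off the sample output rather than taken from the program source.

```python
def ruler(length):
    """Return a ruler with '|' at every tenth column and '-' elsewhere."""
    return "".join("|" if col % 10 == 0 else "-" for col in range(1, length + 1))


def format_block(chunk, first_pos, label_width=10):
    """Return the ruler line and the numbered sequence line for one block."""
    last_pos = first_pos + len(chunk) - 1
    pad = " " * (label_width + 1)  # aligns the ruler with the first base
    return [pad + ruler(len(chunk)),
            f"{first_pos:>{label_width}} {chunk} {last_pos}"]


# Reproduce the final, short block of the sample output.
for line in format_block("CCTCGAG", 2161):
    print(line)
```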
                   

Data files

The codon usage table is read by default from "Ehum.cut" in the 'data/CODONS' directory of the EMBOSS distribution. If the name of a codon usage file is specified on the command line, then this file will first be searched for in the current directory and then in the 'data/CODONS' directory of the EMBOSS distribution.
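
To illustrate how a codon usage table can drive translation, the Python sketch below builds a codon-to-residue lookup from table text. It is not EMBOSS code. The layout it assumes (comment lines starting with '#', then one codon per line followed by its one-letter amino acid and numeric columns) and the sample values are illustrative assumptions, so check a real file such as Ehum.cut before relying on it. `load_codon_map` and `translate` are hypothetical names.

```python
def load_codon_map(text):
    """Map each codon (lower case) to the residue named in the second column."""
    codon_map = {}
    for line in text.splitlines():
        fields = line.split()
        # Assumption: '#' starts a comment; data lines have at least two fields.
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        codon_map[fields[0].lower()] = fields[1]
    return codon_map


def translate(seq, codon_map):
    """Translate complete codons from the first base; unknown codons give 'X'."""
    seq = seq.lower()
    return "".join(codon_map.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


# Hypothetical excerpt: the numeric columns are made-up placeholders.
SAMPLE = """\
# Codon AA Fraction Frequency Number
ATG    M    1.000    22.0    100
GGA    G    0.250    16.5     75
TCG    S    0.060     4.4     20
"""
print(translate("atgggatcg", load_codon_map(SAMPLE)))
```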

EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.

To see the available EMBOSS data files, run:

% embossdata -showall

To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:


% embossdata -fetch -file Exxx.dat

Users can provide their own data files in their own directories. Project specific files can be put in the current directory, or for tidier directory listings in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata".

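The directories are searched in the following order:

. (the current directory)
.embossdata (under the current directory)
~/ (the user's home directory)
~/.embossdata (under the user's home directory)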
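
The lookup described above amounts to returning the first directory that contains the requested file. Here is a minimal Python sketch, assuming the order is the current directory, its ".embossdata" subdirectory, the home directory, then the home ".embossdata" subdirectory. `find_data_file` is a hypothetical helper, not an EMBOSS function, and it does not model the fallback to the EMBOSS distribution's own data directory.

```python
from pathlib import Path


def find_data_file(name, cwd=None, home=None):
    """Return the first existing path for name in the user directories, else None."""
    cwd = Path(cwd) if cwd else Path.cwd()
    home = Path(home) if home else Path.home()
    # Assumed search order, taken from the description of user data files.
    candidates = [cwd, cwd / ".embossdata", home, home / ".embossdata"]
    for directory in candidates:
        path = directory / name
        if path.is_file():
            return path
    return None


# 'Exxx.dat' is the placeholder file name used in the text above.
print(find_data_file("Exxx.dat"))
```

Because the project-level directories come first, a file placed there shadows a same-named file in the home directory.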

Notes

None.

References

None.

Warnings

None.

Diagnostic Error Messages

"Range outside length of sequence" - this is self explanatory. You should specify a range of sequences to translate that is within the length of the input sequence.

Exit status

It always exits with a status of 0.

Known bugs

None.

See also

Program name  Description
abiview       Reads ABI file and display the trace
backtranseq   Back translate a protein sequence
cirdna        Draws circular maps of DNA constructs
coderet       Extract CDS, mRNA and translations from feature tables
lindna        Draws linear maps of DNA constructs
pepnet        Displays proteins as a helical net
pepwheel      Shows protein sequences as helices
plotorf       Plot potential open reading frames
prettyplot    Displays aligned sequences, with colouring and boxing
remap         Display a sequence with restriction cut sites, translation etc
seealso       Finds programs sharing group names
showalign     Displays a multiple sequence alignment
showdb        Displays information on the currently available databases
showfeat      Show features of a sequence
showorf       Pretty output of DNA translations
showseq       Display a sequence with features, translation etc
sixpack       Display a DNA sequence with 6-frame translation and ORFs
textsearch    Search sequence documentation text. SRS and Entrez are faster!
transeq       Translate nucleic acid sequences

showseq has more options for specifying various ways of displaying a sequence, with or without various ways of translating it.

Author(s)

Alan Bleasby (ableasby © rfcgr.mrc.ac.uk)
MRC Rosalind Franklin Centre for Genomics Research, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK

History

Written (1999) - Alan Bleasby

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None