STRIDE: Protein secondary structure assignment from atomic coordinates
Dmitrij Frishman & Patrick Argos
European Molecular Biology Laboratory, Postfach 102209, Meyerhofstr. 1, 69012 Heidelberg, Germany
FRISHMAN@EMBL-HEIDELBERG.DE, ARGOS@EMBL-HEIDELBERG.DE
1. About the method
STRIDE is a program that recognizes secondary structure elements in proteins from their atomic coordinates. It performs the same task as DSSP by Kabsch and Sander, but uses both hydrogen bond energy and main-chain dihedral angles rather than hydrogen bonds alone. Its recognition parameters are derived from a database, with the crystallographers' own secondary structure definitions taken as the standard of truth. See Frishman and Argos for a detailed description of the algorithm.
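STRIDE's ASG records report the main-chain phi and psi angles that enter the assignment. For illustration only, the sketch below computes a signed dihedral angle from four atom positions with the standard atan2 formula; it is not taken from the STRIDE source, and the name `dihedral` is hypothetical. By the usual convention, phi of residue i is the dihedral C(i-1)-N(i)-CA(i)-C(i) and psi is N(i)-CA(i)-C(i)-N(i+1).

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Signed dihedral angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b1 = _sub(p1, p0)    # first bond
    b2 = _sub(p2, p1)    # central bond: the rotation axis
    b3 = _sub(p3, p2)    # last bond
    n1 = _cross(b1, b2)  # normal of the plane p0-p1-p2
    n2 = _cross(b2, b3)  # normal of the plane p1-p2-p3
    norm_b2 = math.sqrt(_dot(b2, b2))
    # atan2(sine term, cosine term) yields the IUPAC sign convention.
    y = norm_b2 * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Four points twisted by a quarter turn about the z axis.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 6))  # 90.0
```

Using atan2 rather than an arccosine of the normalized dot product keeps the sign of the angle and stays numerically stable near 0 and 180 degrees.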
2. Copyright notice
All rights reserved, whether the whole or part of the program is concerned. Permission to use, copy, and modify this software and its documentation is granted for academic use, provided that:
i. this copyright notice appears in all copies of the software and related documentation;
ii. the reference given below (Frishman and Argos, 1995) must be cited in any publication of scientific results based in part or completely on the use of the program;
iii. bugs will be reported to the authors.
The use of the software in commercial activities is not allowed without a prior written commercial license agreement.
WARNING: STRIDE is provided "as-is" and without warranty of any kind, express, implied or otherwise, including without limitation any warranty of merchantability or fitness for a particular purpose. In no event will the authors be liable for any special, incidental, indirect or consequential damages of any kind, or any damages whatsoever resulting from loss of data or profits, whether or not advised of the possibility of damage, and on any theory of liability, arising out of or in connection with the use or performance of this software.
For calculating residue solvent accessible area, the program uses NSC [3,4], kindly provided by Dr. F. Eisenhaber (EISENHABER@EMBL-HEIDELBERG.DE). Please direct all questions specifically concerning accessibility calculations to him.
3. Availability
Executables of STRIDE for several UNIX platforms, VAX/VMS, OpenVMS, DOS and Mac, together with documentation and source code, are available by anonymous FTP from ftp.ebi.ac.uk (directories /pub/software/unix/stride, /pub/software/dos/stride, /pub/software/vms/stride, /pub/software/mac/stride). We are willing to compile the program for other architectures if an interested user can grant us temporary access to them.
Data files with STRIDE secondary structure assignments for the current release of the PDB databank are in the directory /pub/databases/stride on the same site. Atomic coordinate sets can be submitted for secondary structure assignment by e-mail to email@example.com; a message containing HELP in the first line will be answered with instructions. See also the WWW page http://www.embl-heidelberg.de/stride/stride_info.html.
4. Installation
For UNIX, DOS and Mac no installation is needed: just download the executable for your platform. For VAX/VMS and OpenVMS you only need to define a foreign command symbol for the executable, for example:
yourlogicalname := $yourdiskname:[your.directory.name]stride.exe
and then use yourlogicalname as the program name.
5. Using STRIDE
The only required parameter for STRIDE is the name of a file containing atomic coordinates in PDB format. By default STRIDE writes to standard output, i.e. your screen; on systems that support output redirection, you can redirect it to a disk file. Typing STRIDE without parameters displays a help message. The following options are accepted:
-fFilename Write output to the file "Filename" rather than to stdout.
-h Report hydrogen bonds. By default no hydrogen bond information is included in the output.
-o Report secondary structure summary only.
-rId1Id2.. Read only chains Id1, Id2 etc. of the PDB file *). All other chains will be ignored. By default all valid protein chains are read.
-cId1Id2.. Process only chains Id1, Id2, etc. *). Secondary structure assignment will be produced only for these chains, but other chains that are present will be taken into account when calculating residue accessible surface and detecting inter-chain hydrogen bonds and, possibly, inter-chain beta-sheets. By default all protein chains read are processed.
-mFilename Generate a Molscript file. With the program Molscript by Per Kraulis you can create a PostScript picture of your structure. You can manually edit the Molscript file produced by STRIDE to achieve the desired orientation and to include additional details.
-q[Filename] Generate a sequence file in FASTA format and exit. Filename is optional; if no file name is specified, standard output is used.
All options are case- and position-insensitive.
Examples:
1. Calculate secondary structure assignment for 1ACP including hydrogen bond information:
stride 1acp.brk -h
2. Calculate secondary structure assignment for 4RUB and write the output to the file 4rub.str
stride 4rub.brk -f4rub.str
3. Calculate secondary structure assignment for chain B of 4RUB. Ignore all other chains. Generate a Molscript file 4rub.mol.
stride 4rub.brk -rb -m4rub.mol
4. Calculate secondary structure assignment for chain C of 2GLS in the presence of chains A and B. Report secondary structure summary only.
stride 2gls.brk -rabc -cc -o
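The four examples above can also be driven from a script. The following is a minimal Python sketch, not part of the STRIDE distribution: `build_stride_argv`, `stride_chain_ids` and `run_stride` are hypothetical helper names, and the executable is assumed to be installed on the PATH as `stride`. It uses only the options documented above and follows the chain identifier note, writing a blank chain identifier as '-'.

```python
import subprocess
from typing import Optional


def stride_chain_ids(chains: str) -> str:
    """Spell chain identifiers as STRIDE expects: a blank (' ') chain is written '-'."""
    return chains.replace(" ", "-")


def build_stride_argv(pdb_file: str,
                      hbonds: bool = False,
                      read_chains: Optional[str] = None,
                      process_chains: Optional[str] = None,
                      summary_only: bool = False,
                      molscript_file: Optional[str] = None,
                      out_file: Optional[str] = None,
                      executable: str = "stride") -> list[str]:
    """Build a STRIDE command line; option values are attached without a space."""
    argv = [executable, pdb_file]
    if hbonds:
        argv.append("-h")                                     # report hydrogen bonds
    if read_chains:
        argv.append("-r" + stride_chain_ids(read_chains))     # read only these chains
    if process_chains:
        argv.append("-c" + stride_chain_ids(process_chains))  # assign only these chains
    if summary_only:
        argv.append("-o")                                     # summary records only
    if molscript_file:
        argv.append("-m" + molscript_file)                    # also write a Molscript file
    if out_file:
        argv.append("-f" + out_file)                          # write to a file, not stdout
    return argv


def run_stride(argv: list[str]) -> str:
    """Run STRIDE and return what it printed to standard output."""
    # Assumption: the executable named in argv[0] can be found on the PATH.
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.stdout


# Example 4: chain C of 2GLS in the presence of chains A and B, summary only.
print(build_stride_argv("2gls.brk", read_chains="abc", process_chains="c",
                        summary_only=True))  # ['stride', '2gls.brk', '-rabc', '-cc', '-o']
```

Because STRIDE options are position-insensitive, the fixed order used here only matters for readability; it was chosen to reproduce the example command lines verbatim.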
6. Output format
STRIDE produces output that is easy to read both visually and with computer programs. The price of this convenience is a larger file size for individual STRIDE entries. Every record is 79 characters long and has the following general format:
Position Description
1-3      Record code
4-5      Not used
6-73     Data
74-75    Not used
76-79    Four-letter PDB code (if available)
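Because every record has this fixed layout, a record can be split by column position alone. A minimal Python sketch, assuming 1-based inclusive column ranges as in the layout above and taking the four-letter PDB code from the last four of the 79 columns (76-79); `split_record` and `StrideRecord` are hypothetical names.

```python
from typing import NamedTuple, Optional


class StrideRecord(NamedTuple):
    code: str                # columns 1-3, e.g. "HDR" or "ASG"
    data: str                # columns 6-73, left unparsed here
    pdb_code: Optional[str]  # columns 76-79, None when absent


def split_record(line: str) -> StrideRecord:
    """Split one STRIDE output line into record code, data and PDB code."""
    # Pad lines whose trailing blanks were stripped back to the full 79 columns.
    line = line.rstrip("\n").ljust(79)
    code = line[0:3]                 # 1-based columns 1-3
    data = line[5:73]                # 1-based columns 6-73
    pdb_code = line[75:79].strip()   # 1-based columns 76-79
    return StrideRecord(code, data, pdb_code or None)


print(split_record("REM" + " " * 72 + "1ACP").pdb_code)  # 1ACP
```

Python slices are 0-based and end-exclusive, so the 1-based inclusive range a-b becomes `line[a - 1:b]`.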
Below follows the description of each record type.
Code Description and format of data
REM Remarks and blank lines
HDR Header. Protein name, date of file creation and PDB code
CMP Compound. Full name of the molecule and identifying information
SRC Species, organ, tissue, and mutant from which the molecule has been obtained
AUT Names of the structure authors
CHN File name and PDB chain identifier *).
Format: File name beginning from position 6 followed by one space and one-letter chain identifier
SEQ Amino acid sequence
Format:
 6-9   First residue PDB number
 11-60 Sequence
 62-65 Last residue PDB number
STR Secondary structure summary
Format:
 11-60 Secondary structure assignment **)
LOC Location of secondary structure elements
Format:
 6-17  Element name
 19-21 First residue name
 23-26 First residue PDB number
 28-28 First residue chain identifier
 36-38 Last residue name
 42-45 Last residue PDB number
 47-47 Last residue chain identifier
ASG Detailed secondary structure assignment
Format:
 6-8   Residue name
 10-10 Protein chain identifier
 12-15 PDB residue number
 17-20 Ordinal residue number
 25-25 One-letter secondary structure code **)
 27-39 Full secondary structure name
 43-49 Phi angle
 53-59 Psi angle
 65-69 Residue solvent accessible area
DNR Donor residue
Format:
 6-8   Donor residue name
 10-10 Protein chain identifier
 12-15 PDB residue number
 17-20 Ordinal residue number
 26-28 Acceptor residue name
 30-30 Protein chain identifier
 32-35 PDB residue number
 37-40 Ordinal residue number
 42-45 N..O distance
 47-52 N..O=C angle
 54-59 O..N-C angle
 61-66 Angle between the planes of donor complex and O..N-C
 68-73 Angle between the planes of acceptor complex and N..O=C
ACC Acceptor residue
Format:
 6-8   Acceptor residue name
 10-10 Protein chain identifier
 12-15 PDB residue number
 17-20 Ordinal residue number
 26-28 Donor residue name
 30-30 Protein chain identifier
 32-35 PDB residue number
 37-40 Ordinal residue number
 42-45 N..O distance
 47-52 N..O=C angle
 54-59 O..N-C angle
 61-66 Angle between the planes of donor complex and O..N-C
 68-73 Angle between the planes of acceptor complex and N..O=C
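As an illustration of the column layout, the sketch below parses one ASG record into typed fields. It is a minimal Python sketch, assuming 1-based inclusive column ranges exactly as listed for ASG above; `parse_asg`, `AsgRecord`, `col` and `SAMPLE_ASG` are hypothetical names, and the sample line is synthetic (built to the listed columns), not real STRIDE output.

```python
from dataclasses import dataclass


def col(line: str, start: int, end: int) -> str:
    """Return 1-based inclusive columns start..end with surrounding blanks removed."""
    return line[start - 1:end].strip()


@dataclass
class AsgRecord:
    res_name: str   # columns 6-8
    chain: str      # column 10 ('-' stands for a blank chain identifier)
    pdb_num: str    # columns 12-15, kept as text as copied from the PDB file
    ordinal: int    # columns 17-20
    ss_code: str    # column 25, one-letter code
    ss_name: str    # columns 27-39
    phi: float      # columns 43-49
    psi: float      # columns 53-59
    area: float     # columns 65-69, solvent accessible area


def parse_asg(line: str) -> AsgRecord:
    if not line.startswith("ASG"):
        raise ValueError("not an ASG record: " + line[:3])
    return AsgRecord(
        res_name=col(line, 6, 8),
        chain=col(line, 10, 10),
        pdb_num=col(line, 12, 15),
        ordinal=int(col(line, 17, 20)),
        ss_code=col(line, 25, 25),
        ss_name=col(line, 27, 39),
        phi=float(col(line, 43, 49)),
        psi=float(col(line, 53, 59)),
        area=float(col(line, 65, 69)),
    )


# Synthetic line laid out to the ASG columns (not real STRIDE output).
SAMPLE_ASG = (f"{'ASG':<5}{'ALA':<4}{'A':<2}{'12':>4} {'1':>4}    H "
              f"{'AlphaHelix':<13}   {'-57.00':>7}   {'-47.00':>7}     {'23.4':>5}")
print(parse_asg(SAMPLE_ASG))
```

The same `col` helper applies unchanged to LOC, DNR and ACC records; only the list of column ranges differs.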
HDR, CMP, SRC and AUT records are copied directly from the PDB file, if supplied by the authors. If only the secondary structure summary is requested (option -o), only CHN, SEQ, STR and LOC records are output. Hydrogen bond information (DNR and ACC records) is deliberately redundant to make it easier to read, and is not reported unless option -h is given.
*) IMPORTANT NOTE: if the protein chain identifier is ' ' (space), it is replaced by '-' (dash) everywhere in the STRIDE output. The same applies to command-line parameters involving chain identifiers: specify '-' instead of ' '.
**) The one-letter secondary structure code is nearly the same as that used in DSSP (see Frishman and Argos for details):
H       Alpha helix
G       3-10 helix
I       PI-helix
E       Extended conformation
B or b  Isolated bridge
T       Turn
C       Coil (none of the above)
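When one-letter codes are post-processed, a small lookup table mirroring the list above is convenient. The following Python sketch (hypothetical names `SS_NAMES` and `ss_composition`) computes the fraction of residues in each class from a string of one-letter codes, for example the codes collected from column 25 of ASG records. It treats 'B' and 'b' as one class, as the list does, and rejects any other character.

```python
from collections import Counter

# One-letter STRIDE codes and their meanings, as listed above.
SS_NAMES = {
    "H": "Alpha helix",
    "G": "3-10 helix",
    "I": "PI-helix",
    "E": "Extended conformation",
    "B": "Isolated bridge",
    "T": "Turn",
    "C": "Coil",
}


def ss_composition(codes: str) -> dict[str, float]:
    """Fraction of residues per secondary structure class; 'b' is counted as 'B'."""
    normalized = codes.replace("b", "B")
    unknown = set(normalized) - set(SS_NAMES)
    if unknown:
        raise ValueError(f"unexpected codes: {sorted(unknown)}")
    if not normalized:
        return {}
    counts = Counter(normalized)
    # Report only classes that occur, in the order of the table above.
    return {code: counts[code] / len(normalized) for code in SS_NAMES if counts[code]}


print(ss_composition("CCHHHHEEbT"))
```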
Every record type (data line) except REM and STR has a consistent number of fields, so the output is readily processed with external tools such as awk or Perl.
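This fixed field count is what makes whitespace splitting safe; it also relies on the chain identifier note above, since a blank chain identifier would otherwise make a field disappear. The sketch below, in Python rather than awk or Perl, lists secondary structure elements from LOC records. It assumes element names contain no blanks (which the consistent field count implies); `loc_elements`, `Element` and the sample lines are hypothetical, and the sample is synthetic rather than real STRIDE output.

```python
from typing import Iterable, Iterator, NamedTuple


class Element(NamedTuple):
    kind: str                   # element name from the LOC record
    first: tuple[str, str, str]  # (residue name, PDB number, chain) of first residue
    last: tuple[str, str, str]   # (residue name, PDB number, chain) of last residue


def loc_elements(lines: Iterable[str]) -> Iterator[Element]:
    """Yield one Element per LOC record, splitting fields on whitespace."""
    for line in lines:
        fields = line.split()
        if not fields or fields[0] != "LOC":
            continue  # skip every other record type
        if len(fields) < 8:
            raise ValueError("short LOC record: " + line.rstrip())
        # A ninth field, if present, is the four-letter PDB code and is ignored.
        _, kind, fname, fnum, fchain, lname, lnum, lchain = fields[:8]
        yield Element(kind, (fname, fnum, fchain), (lname, lnum, lchain))


sample = ["LOC  AlphaHelix   ALA     12 A      GLY     20 A",
          "REM  ------------------------------------------------"]
for element in loc_elements(sample):
    print(element)
```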
7. Bug reports and user feedback
Please send your suggestions, questions and bug reports to FRISHMAN@EMBL-HEIDELBERG.DE. Send your contact address to get information on updates and new features.