DelPhi- A Macromolecular Electrostatics Modelling Package

DelPhi is a software package which calculates electrostatic potentials in and around macromolecules. It incorporates the effects of ionic strength through the (non)linear Poisson Boltzmann equation. Any value for the dielectric constant of the molecule and solvent may be specified. Periodic boundary conditions can be used to model long periodic molecules such as DNA. The output from the programs can be used to calculate interactions, changes in pKa, solvation energies and many other properties of interest. Potentials may be displayed as 3-Dimensional isopotential contours, 2-Dimensional contour slices or as color coded molecular surfaces

Authors:

Kim A. Sharp, Anthony Nicholls Dept. of Biochemistry and Molecular Biophysics, Columbia University, 630 W168th St. New York, NY 10032 Phone: (212) 305-6884 Bitnet: SHARP@CUMBG Internet: sharp@128.59.98.1 nicholls@128.59.98.1

References

The following references may be useful for any use of this software in research that will be published. In particular references 1, 3 and 4 if read in detail will provide a basic tutorial in the way we do electrostatics, and provide a guide to how various electrostatic quantities can be calculated.

1) Klapper, I., Hagstrom, R., Fine, R., Sharp, K., Honig, B. (1986). Focusing of electric fields in the active site of Cu-Zn Superoxide Dismutase: Effects of ionic strength and amino-acid modification. Proteins 1, p47.

2) K.A. Sharp, M.K.Gilson, R.M.Fine and B.H. Honig. (1987). Electrostatic interactions in proteins. UCLA Symposium on Molecular and Cell Biology, Vol 69: Protein Structure, Folding and Design, Ed. D.L. Oxender, p235.

3) Gilson, M., Sharp, K., Honig, B. Calculating electrostatic interactions in bio-molecules: Method and error assessment. J. Computational Chem. 9, pp327-335.

4) Gilson, M., Honig, B. Total Electrostatic Energy of a Protein. Proteins, 4, p7 (1988).

5) B. Jayaram, K.A.Sharp and B.H.Honig. The electrostatic potential of B-DNA. Biopolymers, 28, p975 (1989).

6) K. Sharp, and B. Honig. Lattice Models of Electrostatic Interactions: The Finite Difference Poisson-Boltzmann Method. Chemica Scripta, In Press.

7) K. Sharp, and B. Honig. Electrostatic Interactions in Macromolecules: Theory and Applications. Ann. Rev. Biophys. Biopys. Chem. 19:301-32 (1990). This review has references to all known applications of the finite-difference Poisson Boltzmann method to macromolecules at the time of writing.

The original reference to the use of the finite difference method for macromolecular electrostatics is:

J. Warwicker and H.C. Watson, J. Mol. Biol., 157, p671 (1982).


Table of Contents

_*1_*.INTRODUCTION
_*1_*._*7_*.DIRECTORIES
_*2_*._*3_*.SCRIPT FILES
_*2_*._*4_*.EXAMPLES
_*4_ .FILES
_*4_*._*1_*. FILE NAMING
_*4_*._*2_*. FILE FORMATS
_*4_*._*2_*._*1_*. FILE.pdb:
_*4_*._*2_*._*2_*. FILE.prm:
_*4_*._*2_*._*3_*. FILE.siz:
_*4_*._*2_*._*4_*. FILE.crg:
_*4_*._*2_*._*5_*. FILE.frc:
_*4_*._*2_*._*6_*. CONTOUR.KT:
_*4_*._*2_*._*7_*. FILE.phi:
_*4_*._*2_*._*8_*. FILE.eps:
_*5_*. PROGRAMS
         QDIFF/QDIFFX
         CONTOUR2D
         GCSHP.A
         PHITOINS
         PHITOMBK
         PHITOPDB
         ASCIIPHI
         MBKBIN
         NMPTCH
         LOOKEPS
         REFORM
_*6_*. PROGRAM QDIFF
_*6_*._*1_*. BRIEF DESCRIPTION
_*6_*._*2_*. ALGORITHM
_*6_*._*3_*. INPUT FILES
_*6_*._*4_*. OUTPUT FILES
_*6_*._*5_*. INPUT FORMAT
_*6_*._*5_*._*1_*. UNIT 10
_*6_*._*5_*._*2_*. UNIT 11
_*6_*._*5_*._*3_*. UNIT 12
_*6_*._*5_*._*4_*. UNIT 13
_*6_*._*5_*._*5_*. UNIT 15
_*6_*._*5_*._*6_*. UNIT 18
_*6_*._*6_*. OUTPUT FORMATS
_*6_*._*6_*._*1_*. UNIT 6
_*6_*._*6_*._*2_*. UNIT 14
_*6_*._*6_*._*3_*. UNIT 16
_*6_*._*6_*._*4_*. UNIT 17
_*6_*._*6_*._*5_*. UNIT 19
_*6_*._*7_*. OVERALL PROGRAM FLOW:
_*6_*._*8_*. NOTES ON INPUT PARAMETERS
_*6_*._*8_*._*1_*. IGRID
_*6_*._*8_*._*2_*. PERFIL
_*6_*._*8_*._*3_*. OFFSET
_*6_*._*8_*._*4_*. EPSIN, EPSOUT
_*6_*._*8_*._*5_*. IBCTYP
_*6_*._*8_*._*6_*. IPER
_*6_*._*8_*._*7_*. NLIT
_*6_*._*8_*._*8_*. ISPH
_*7_*. PROGRAM QDIFFX
_*7_*._*1_*. BRIEF DESCRIPTION
_*7_*._*2_*. UNFORMATTED VERSIONS OF INPUT FILES for QDIFFX
_*7_*._*3_*. UNFORMATTED OUTPUT FILES for QDIFFX
_*7_*._*4_*. FORM OF UNFORMATTED OUTPUT FILES
_*7_*._*5_*. INPUT PARAMETER FILE FORMAT
_*7_*._*5_*._*1_*. EXAMPLE OF PARAMETER FILE USE
_*7_*._*5_*._*2_*. EXAMPLE PARAMETER FILE
_*8_*. PROGRAM QDIFFXS
_*8_*._*1_*. BRIEF DESCRIPTION
_*9_*. WORKED EXAMPLES
_*9_*._*1_*. THE POTENTIAL AROUND CU,ZN SUPEROXIDE DISMUTASE
_*9_*._*2_*. TEST CHARGES IN A SPHERE- Comparison with analytical solutions 
_*9_*._*3_*. SOLVATION ENERGY OF THE ACETATE ION 
_*9_*._*4_*. PK SHIFTS IN SUBTILISIN
_*9_*._*5_*. ELECTROSTATIC POTENTIALS AND ION CONCENTRATIONS AROUND B-DNA
_*9_*._*6_*. INTRINSIC PK CALCULATION IN TRYPSIN
_*9_*._*7_*. PACKING ENERGIES FOR HEMERYTHRIN HELICES
_*1_*0_*. HINTS


_*1_*. INTRODUCTION

This is a collection of programs to take a brookhaven data base format coordinate file of a molecule and calculate the potential in and around the molecule, using a finite differ ence solution to the non-linear POISSON-BOLTZMANN equation for any given ionic strength, solvent and molecule dielectrics.

Briefly, it consists of the following steps, explained in more detail below:

_*1_*._*1_*. Prepare three files containing: a) run parameters b) atomic radii c) atomic charges.

_*1_*._*2_*. Modify the brookhaven format atomic coordinate file as necessary: eg. if a full partial charge set on all atoms is required it will be necessary to build in hydrogens using a suitable modelling program. Also if the output of potentials at various specified atoms are required, a second atomic coordinate file may be needed.

_*1_*._*3_*. Assign logical units to files.

_*1_*._*4_*. Run the program QDIFF (finite difference algorithm), in batch or interactively directing the output to unit 6 into a log file if required. QDIFFX is a much faster algo rithm which supersedes QDIFF, except when the non-linear option is required. QDIFFXS is the latest version of QDIFFX, which provides a more accurate representation of the surface, hence gives better solvation energies, particularly at small scales (less than 2 grids/angstrom).

_*1_*._*5_*. Analyse the results. Two things that are commonly required are to read out and display the potentials. Several programs are provided to do this:

A conversion utility PHITOINS will convert the poten tial map output into a format that can be read by INSIGHT, Biosym's molecular modelling package. Alter natively, QDIFF can be made to output INSIGHT compati ble files directly.

The program PHITOMBK will convert the potential map output into Peter Goodford's brickmap format which can be read for example by QUANTA or FRODO to produce 3D contour displays of the potential.

The program PHITOPDB will write the potentials from the DelPhi output into the B factor, or temperature factor field of a brookhaven data base format coordinate file. The atoms may then be displayed and colored by poten tial by any display program that can color atoms by B, or Temperature factor, eg QUANTA.

_*1_*._*6_*. Transferring files.

The primary output file from the program is a three dimen sional array of potentials calculated at the lattice points. This is a large file (65**3) and is written in binary to save space and time. In some cases it will be necessary to transfer files between different machines. This can be done in two ways:

The program ASCIIPHI converts a binary potential map file to ascii or vice-versa. The ascii form may be reconverted to the binary form on the receiving machine. Some truncation of potentials and loss of pre cision occurs during the conversion so these files should only be used for display purposes.

The program MBKBIN converts a binary brickmap file to ascii or vice-versa. The ascii form may be reconverted to the binary form on the receiving machine. Some trun cation of potentials and loss of precision occurs dur ing the conversion so these files should only be used for display purposes.

Note that the real world enters the picture through three data files: 1) atom coodinates, in file.pdb; 2) atom radii, in file.siz; 3) atom charges, in file.crg.

_*1_*._*7_*. DIRECTORIES

Upon installation, a directory structure should be created in order to group files in some kind of order. The directories should be arranged as:

parent_directory/
   bin/ all executables and script files
   data/ atomic radii and charge data files
   docs/ documentation
   examples/
     ex1/ protein potentials: SUPEROXIDE DISMUTASE
     ex2/ test results: SPHERE
     ex3/ solvation energy: ACETATE
     ex4/ interaction energy: pK change in SUBTILISIN
     ex5/ non-linear case: DNA potentials
     ex6/ intrinsic pKa calculation on rat trypsin
     ex7/ binding/packing energy on haemerythrin 
   source/
     gcs/ Graphics library
     cont2d/ 2-D contour program
     qdiff_v2/ original finite difference program (does non-linear)
     qdiff_v3/ faster version of qdiff (~60 times). 
     qdiffxs/ more accurate version of qdiffx for solvation energy 
              calculations using a more accurate dielectric boundary 
              representation
     utility/ Conversion programs, etc

The older veresions of qdiff in directories qdiff_v2 and qdiff_v3 are included so for compatability and as a fallback if bugs/limitations are found in latest version (qdiffxs)

_*2_*._*3_*. SCRIPT FILES

The directory bin contains various Cshell scripts. assqdiffx generates a parameter file for QDIFFX, and makes logical unit assignments for QDIFFX. The script qdefault takes a *.pdb file name, uses default values for everything and runs QDIFFX, then invokes 2 other script files, mkcont and sur face. QDEFAULT is really aimed at getting you up and running quickly by providing a complete run through using defaults for everything- qdefault, mkcont and surface in effect pro vide template scripts which you can use to modify and learn how the programs work. The path name in qdefault will have to be changed to that on your system before running it. The script may need extensive modification (for other shells like Bourne, other versions of Unix), and only run under Unix of course. Other script files are: SURFACE, which start with an *.eps map, reformats it with REFORM, and con tours it with BIOSYM's CONTOUR program (provided with INSIGHT) to produce a *.grd file. When displayed in INSIGHT this provides a nice representation of the molecule's sol vent accessible surface; MKCONT starts with a *.phi file, and generates potential contours at +/- 0.25, 0.5, 1., 2., 4 kT/e (levels read in from file continp1) using CONTOUR. Output is in a *.grd file which can be displayed in INSIGHT;

_*2_*._*4_*. EXAMPLES

The best way to check the installation is by working through the examples (see note on script file QDEFAULT above). In particular, it is recommended that example 2, on charges in a sphere be worked through carefully, and the results com pared to those in section 9.2: The logfiles and *.frc output files should be compared with those provided from the test run on the convex (*cvx.log, *cvx.frc). This will provide a good check that the program is working, and also is a tuto rial on how delphi can be used to calculate the various com ponents of an electrostatic system. It is no exaggeration to say that if you understand the case of two charges in a sphere then that is 90% of continuum electrostatics! The input files, and the output files that are not in binary form (ie not *.phi, *.eps) are provided in exam ples/ex1...ex7. Explanation and analysis is also provided in Section 9. Plot files that were generated from some of the *.phi potential map files are given in *.plt. Type (or cat) these out to a Hewlett Packard HP7475A or similar plot ter to give plots that can be compared to those generated by CONTOUR2D. These examples are taken from published papers that used this or similar forms of DelPhi, and these should be referred to for more details. The calculations are typi cal of those that can be made with this package, although for the purpose of illustration, several shortcuts were taken. Also read section 10, Hints.

_*4_ .FILES

_*4_*._*1_*. FILE NAMING

Most file types have default extensions to simplify things:

file.pdb: Atomic coordinates, input to QDIFF, CON TOUR2D, PHITOPDB. Ouput from QDIFF and PHITOPDB.

file.siz: Atomic radii, input to QDIFF.

file.crg: Atomic charges, input to QDIFF.

file.prm: Parameters for QDIFF.

file.log: Logfile from QDIFF.

contour.kt: Contour levels in kT/e, input to CONTOUR2D.

file.phi: Binary potential map, output of QDIFF, input to CONTOUR2D, PHITOPDB, PHITOINS, PHITOMBK, ASCI IPHI.

file.big: Ascii version of file.phi, input/output of ASCIIPHI.

file.ins: Binary output from PHITOINS, input to Biosym's INSIGHT contour routine.

file.mbk: Binary brickmap form of potential map. Output from PHITOMBK, and input to Polygen's QUANTA pack age

file.mba: Ascii form of file.mbk file. Input/Output of MBKBIN.

file.frc: Potentials and fields, output from QDIFF.

file.plt: Plot file for HP75 plotter, output from CON TOUR2D.

file.eps: Dielectric bit map, output of QDIFF, input of CONTOUR3D, LOOKEPS.

_*4_*._*2_*. FILE FORMATS

_*4_*._*2_*._*1_*. FILE.pdb:

(6A1,I5,1X,A4,A1,A3,1X,A1,I4,A1,3X,3F8.3,2F6.2,1X,I3) for atom record, where fields contain 'ATOM--' or 'HETATM' atom serial number, atom name, alternate location indicator, residue name, chain identifier, residue sequence number, residue insertion code, x,y,z coordinates, occupancy, tem perature factor, footnote number. QDIFF reads the records somewhat differently, treating the records before the coor dinates as all ascii- See Section 6.5.4 for more details.

_*4_*._*2_*._*2_*. FILE.prm:

parameters for QDIFF in free format (See Section 6.5.1 for more details):

igrid== (odd) number of points in cubic lattice, min=5,max=65.

perfil== percentage of grid box filled by molecule.

offset== x,y,z of molecule centre in grid units from lattice centre.

epsin, epsout== dielectric constants for molecule and solvent.

rionst== ionic strength in moles/litre

exrad, radprb== ion exclusion and probe radii in Angstroms.

ibctyp== integer flag for boundary condition .

iper(3)== logical flags for periodic boundary condi tions.

nlit== number of iterations with linear P-B equation.

nnit== number of iterations with non-linear P-B equa tion.

iconc,ibios== logical flags for mode of output.

isite== logical flag for site potential output.

iatout== logical flag for modified *.pdb file output.

toplbl== 60 characters of ascii title information for run.

isph== logical flag for charge distribution algorithm.

_*4_*._*2_*._*3_*. FILE.siz:

Van der Waals radii to be assigned to each atom/residue type.

1: any number of lines with ! in the first position are treated as comments.

2: a single line of ascii information that DOESN'T have ! in the first column, which marks the positions of the various fields for human readability and indicates the start of the radius records, but is otherwise not used.

3: an indefinite number of lines each containing an atom label record a residue label record and a radius in angstroms in the format (A6,A3,F8.3). For example:

aaaaaannnrrrrrrrr
c             1.8 
ca            1.7 
ca    pro     1.6

Using this file, all proline ca atoms will have radius 1.6A, all other ca atoms will have radius 1.7, and all other types of carbon atoms will have radius 1.8.

_*4_*._*2_*._*4_*. FILE.crg:

Charges to be assigned to each atom/residue/#/chain type.

1: any number of lines with ! are ignored as comments.

2: a single line of ascii information that DOESN'T have ! in the first column which marks the positions of the various fields for human readability and indicates the start of the charge records, but is otherwise not used.

3: an indefinite number of lines each containing an atom label record, a residue label record, a residue number record, a chain label record and a charge in the format: (A6,A3,A4,A1,F8.3). For example:

 !example of charge file
 aaaaaarrrnnnncqqqqqqqq
 ca                -0.2
 ca    pro        -0.23
 ca    pro   1    -0.25
 ca    pro   1y   -0.20

In this fictitious example all alpha carbons will have a charge of -0.2, except for proline C alpha's, which will have a charge of -0.2 on the y chain PRO 1, a charge of -0.25 on all other chain PRO 1 atoms, and a charge of -0.25 on all other PRO's. Atoms that do not find a match in the *.crg file will be neutral

_*4_*._*2_*._*5_*. FILE.frc:

List of potentials and fields at cooordinates in *.pdb file 12 lines of ascii header information, followed by a variable number of records written as:

 230    format(8G10.3)
        write(16,230)xo,chrgv,phiv,fx,fy,fz

where xo(3) are x,y,z coordinates of charge, chrgv is the charge value, phiv is the potential in kT/e at that point, and fx,fy,fz are the field components in kT/a/A. (1 kT/e = 25.6mV = 0.593 kcal/mole/e).

_*4_*._*2_*._*6_*. CONTOUR.KT:

Free format, one value per line, any number of lines where values are contour levels in kT/e

_*4_*._*2_*._*7_*. FILE.phi:

Unformatted file

 character*20 toplabel
 character*10 head,character*60 title
 real*4 phi(65,65,65)
 character*16 botlabel
 real*4 scale, oldmid(3)

where phi is potential map, scale, oldmid relate grid cordi nates back to real space coords. other variables are text information.

_*4_*._*2_*._*8_*. FILE.eps:

Unformatted file

 real*4 oldmid(3)
 integer*4 imap
 integer*2 eps(5,65,65,3)

 write (10) imap, scale, oldmid
 write (10) eps 

where eps is the dielectric map, scale,oldmid relate grid coords. to real space coords, imap is unused flag

_*5_*. PROGRAMS

QDIFF/QDIFFX

Subroutines: elb.f up.f cent.f cfind.f ichash.f irhash.f rent.f qdiff2.f rfind.f phintp.f scaler.f chrgit.f setbc2.f conplt.f itit2.f expand.f mkeps.f wrteps.f qdiffpar.h

Purpose: solves finite difference form of non-linear Poisson- Boltzmann equation.

Input: file.pdb, file.prm, file.siz, file.crg (file1.pdb, file1.phi)

Output: file.log, file.phi, file.eps (file.frc, file2.pdb)

CONTOUR2D

Subroutines: cont2dsub.f, read_atom.f, gcshp.a (graph ics library)

Purpose: interactive production of 2D plots of any arbitrary slice through potential maps, and out line of slice through molecule. The program is totally menu driven, so the best way to learn this program is to play with it.

Input: file.phi, file.pdb, contour.kt. Menu driven

Output: colour plots on AED512, HP75A, mono-chrome on Tektronix 4014 emulators such as the VT241, Visual 550. Output to HP75 is written to unit 98 for sub sequent tranfer to plotter.

GCSHP.A

GCS graphics package library required by CONTOUR2D.

PHITOINS

Subroutines: none

Purpose: convert output potential map format from QDIFF (file.phi) to INSIGHT input format (file.ins) and vice-versa.

Input: file.phi (file.ins). Prompts for file names.

Output: file.ins (file.phi)

PHITOMBK

Subroutines: none

Purpose: convert output potential map format from QDIFF (file.phi) to Peter Goodfords brickmap format which can be read by QUANTA (file.mbk), and I think also by FRODO.

Input: file.phi. Prompts for file names

Output: file.mbk

PHITOPDB

Subroutines: none

Purpose: takes output potentials from QDIFF output (file1.phi) and writes the potentials at the coor dinates of an input brookhaven protein data bank format file (file2.pdb), to the B factor field of a modified protein data bank file (file3.pdb). The protein can then be displayed and colored by potential by any display program that can color atoms by B, or temperature factor.

Input: file1.phi file2.pdb

Output: file3.pdb

ASCIIPHI

Subroutines: none

Purpose: Enables transfer of binary potential files: takes output potentials from QDIFF output (file.phi) and reformats and compresses potentials to integer*4 format, writing result as an ascii file, and vice-versa. The ascii file may then be transferred between machines, and reformatted as a binary *.phi file for display.

Input: file.phi/big. Prompts for file name

Output: file.big/phi.

Notes: i) a version of ASCIIPHI must exist on both machines. ii) On compression to an integer, the potentials are truncated so loss of precision results- the resultant files should only be used for display, not high accuracy calculations.

MBKBIN

Subroutines: none

Purpose: Enables transfer of binary potential files: takes output potentials from PHITOMBK output (file.mbk) writing result as an ascii file, and vice-versa. The ascii file may then be trans ferred between machines, and reformatted as a binary *.mbk file for display programs.

Input: file.mbk/mba. Prompts for file name

Output: file.mba/mbk.

Notes: i) a version of MBKBIN must exist on both machines. ii) On conversion the potentials are truncated so loss of precision results- the resul tant files should only be used for display, not high accuracy calculations.

NMPTCH

A collection of subroutines that rename or provide for some non-existent intrinsic functions- to improve portatibility without changing source code.

LOOKEPS

Subroutines: none

Purpose: to display a slice trhough a dielectric map on terminal in a line plot form, in order to check the dielectric map.

Input: file.eps. Prompts for files names.

Output: to alpha-numerics terminal.

REFORM

Subroutines: none

Purpose: reformats a *.eps file into *.phi form- this enables the solvent accessible surface to be dis played using the same methods as used for the potentials ('potential' values range from

Input: file.eps. Prompts for files names.

Output: file.phi

_*6_*. PROGRAM QDIFF

last revision: 11 nov 1988

All arrays, including the maximum lattice dimension, are dimensioned in an include file: qdiffpar.h.

_*6_*._*1_*. BRIEF DESCRIPTION

QDIFF is a program to solve the (non-linear) Poisson Boltzmann equation by finite difference methods on a NSZxN SZxNSZ cubical lattice. It can handle both the linear and non-linear forms of the equation. The program can use: peri odic or focussing boundary conditions on the lattice edge; a variable ion exclusion (or Stern) layer around the molecule; a variable probe radius to define the solvent accessible surface; a variable bulk solvent ionic strength; any required dielectric constant values for the molecule and solvent.

For input the program requires four files, containing i) Run parameters, ii) atom coordinates iii) Rules for assigning atomic radii, and iv) Rules for assigning atomic charges. For focussing boundary conditions a fifth file containing potentials calculated from a previous run of the program is also needed.

The output is a 3D array giving the potential (or net sol vent ion concentration) at each lattice point and (optional) a list of potentials at charge locations contained in another atom coordinate file,

_*6_*._*2_*. ALGORITHM

QDIFF relaxes the difference equation form of the nonlinear Poisson-Boltzmann equation (Equation 1)

 Pn = (1-W)Po + W.sum(Ei.Pi) + 4PI.Qo/
 (sum Ei + Do(1 + Po**2/6 + Po**4/120...)
 i = 1,6

where W is a relaxation parameter between 0 and 2 that con trols the rate of iteration.

Po is the current estimate of the potential at the o'th lat tice point.

Pn is next estimate of the potential at the o'th lattice point.

Pi is potl. at 6 lattice points surrounding the o'th.

Ei is the dielectric constant assigned to the lattice lines connecting the grid point to its 6 neighbours.

Q is charge at the o'th lattice point.

D is debye huckel term at the o'th lattice point.

Terms in powers of Po in the rhs denominator represent suc cessive non-linear corrections to the linear equation. The current version of the program includes two such correction terms, which appears to be sufficient for the types of potentials encountered in aqueous macromolecular systems.

The largest permissible grid size is NGRID (default=65) CUBED. The difference equation is iterated until the change in potential at the grid ponts is smaller than some required value. Grid points on the boundary of the lattice do not have a full set of six neighbors unless periodic boundary conditions are applied to them, hence they are not relaxed, and initial values of the potential must be provided for them through some kind of boundary condition.

_*6_*._*3_*. INPUT FILES

logical units for read/write should be assign to file names at the system level. Use of certain default file types will simplify things

 UNIT  NAME
 -------------------------------------------
 10    *.prm    input parameters
 11    *.siz    atomic van der Waals radii
 12    *.crg    atomic charges
 13    *.pdb    brookhaven format atom coordinates
 (15)  *.pdb    list of atom coordinates if site potential output is required
 (18)  *.phi    binary potential needed if focussing boundary conditions are selected.
 -------------------------------------------

Units 15 and 18 are optional depending on values for the site potential output flag, and boundary condition option respectively, given in the parameter file.

_*6_*._*4_*. OUTPUT FILES

 UNIT  NAME
 -------------------------------------------
 6     *.log    logfile when run in batch
 14    *.phi    output binary potential map 
(16)   *.frc    list of potentials at coordinates contained in the unit 15 pdb file
 17    *.eps    binary dielectric map output
 (19)  *.pdb    output atom coordinate file with radii and charge records added
 -------------------------------------------

Units 16 and 19 are not always written, depending on values assigned in the parameter file for the site potential output flag, and modified pdb file output flag options respec tively.

_*6_*._*5_*. INPUT FORMAT

A description of the input required by QDIFF is given below, with some suggested parameter values. However no hard and fast rules can be given since in many cases a tradeoff is involved. More details are given in the notes, Section 6.8.

_*6_*._*5_*._*1_*. UNIT 10

Default extension .prm. Contains input parameters read in free format. Values are real unless stated otherwise:

line 1== IGRID ---(odd) integer number of points per side of the cubic lattice, min=5, max=65 (=NGRID). A larger grid size will in general mean a better resolution representation of the molecule on the lattice, and hence more accurate potentials, but will require more time (see note 1).

line 2== PERFIL ---percentage (> 0) of the lattice that the largest of the x,y or z linear dimensions of the 'molecule' will fill. This will determine the scale of the lattice (grids/angstrom). The per centage fill of the lattice will depend on the application (see note 2).

line 3== OFFSET(3) ---x,y,z position in GRID UNITS from centre of lattice where the centre of molecule will be positioned. ie 0,0,0 will put the centre of molecule at the centre of the grid; x,x,x {x = (IGRID+1)/2.} will put it at one corner; -x,-x,-x will put it at opposite corner. See note 3.

line 4== EPSIN, EPSOUT ---dielectric constants (>=1.0) for the molecule and surrounding solvent.

line 5== RIONST ---ionic strength of solvent, in moles/litre (>= 0.).

line 6== EXRAD, RADPRB ---thickness of the ion exclusion layer around molecule; and the radius of the solvent probe molecule that will define sol vent accessible surface in Lee and Richard's sense (both in angstroms,>0.0). See note 4.

line 7== IBCTYP ---integer flag specifying the type of boundary condition imposed on the edge of the lat tice- allowed options:

1== potential is zero.

2== Approximated by the Debye/Huckel potential of the equivalent dipole to the molecular charge dis tribution, where

phi = q+.exp(-r/d)/(diel*r) + q-.exp(-r/d)/(diel*r)

where phi is the potential estimate at a given lattice boundary point, q+ (q-) is the sum of all positive (negative) charges, and r is distance from the point to the centre of positive (nega tive) charge, d is the debye length, and diel is the solvent dielectric constant.

3== focussing, where a potential map from a previous calculation is read in on unit 18, and values for the potential at the lattice edge are interpolated from this map- clearly the first map should have been generated with a coarser grid (greater dis tance between lattice points) and positioned such that current lattice lies completely within old lattice or program will protest. (See note 5)

4== Approximated by the sum of Debye/Huckel poten tials of all the charges.

phi = sum over i [qi.exp(-r/d)/(diel*r)]

where now qi is the i'th charge, and r is distance from the lattice boundary point to the charge.

line 8== IPER(3) ---three logical flags (t/f) for peri odic boundary conditions for the x,y,z edges of the lattice respectively. Note that periodic boundary conditions will overide other boundary conditions on edges to which they are applied. See note 6.

line 9== NLIT ---integer number (> 3) of iterations with linear equation. See note 7.

line 10== NNIT ---integer number (> = 0) of non-linear iterations. If linear PB equation only is required, set NNIT=0.

line 11== ICONC,IBIOS ---two logical flags (t/f) con trolling the output format of the potential map on unit 14.

iconc=f produces standard potential output in kT/e (aproximately equal to 25.6 mV at 25oC, or to 0.593 kcal/mole of charge).

iconc=t will give net solvent ion concentration output in M/l, where for every lattice point inside the molecule the concentration is 0, and the outside concentration is obtained from the ionic strength*2*sinh(potential).

ibios=t INSIGHT output format *.ins file is out put.

ibios=f DELPHI output format *.phi file is output.

line 12== ISITE ---logical flag (t/f). isite=t will cause potentials and fields at specified coordi nates to be written to unit 16. Coordinates must be provided in a *.pdb file which will be read in on unit 15. This pdb file may be the same as that assigned to unit 13.

line 13== IATOUT ---logical flag (t/f). iatout=t will produce a modified PDB file written on unit 19, containing the radius and charge assigned to each atom written after the coordinates, in the fields used for occupancy and B factor. It is recom mended that this option be set initially so that you can check that all the radius and charge assignments are correct. An additional check on the charge assignment can be made by looking at the total charge written to the log file. Also this modified PDB file is required for displaying cross-sections of the molecule surface in the 2D potential contour program CONTOUR2D

line 14== TOPLBL ---60 characters of ascii information that will be written as a header in potential map output for identification purposes.

line 15== ISPH ---logical flag (t/f) controlling the algorithm by which the charge is assigned to the lattice points. Set this to false initially, and read note 8 for further details.

_*6_*._*5_*._*2_*. UNIT 11

Default extension *.siz. List of rules describing the van der Waals radii to be assigned to each atom/residue pdb record type. The format is described in Section 4.2.3 on files. Note the atom and residue fields ignore case and leading blanks. The residue field may be left blank (wild card), causing a match with the given atom type of any residue. ONLY if the residue field is left blank, the LAST 5 characters of the atom record maybe left blank. In this case all atom types beginning with the letter in column 1 will be matched. Records of greater specificity overide those of less specificity. Beware of ambiguities like calcium (ca) and alpha carbon! All atoms of an input *.pdb file must be assigned a radius through the *.siz file, even if it is 0, or an error will be flagged.

_*6_*._*5_*._*3_*. UNIT 12

Default extension *.crg. List of rules describing the atomic charges to be assigned to each atom/residue/number/chain pdb record type. The format is described in Section 4.2.4 on files. The ascii fields for atom, residue, number and chain ignore case and leading blanks. Any field except the atom name may be left blank and will be treated as a wild card. Records of greater specificity overide those of less speci ficity as for the *.siz file above.

search order:

atom_res_num_chain

atom_res_num______

atom_res_____chain

atom_res__________

atom_____num_chain

atom_____num______

atom_________chain

atom______________

Atoms that do not find a match in the *.crg file will be neutral (q=0.0)

_*6_*._*5_*._*4_*. UNIT 13

Standard format Brookhaven protein data bank file containing atom labels and coordinates. Only records starting with ATOM or HETATM are used. Default extension *.pdb. Format (as assumed by the program: beware many minor and not so minor differences abound in the literature and on disk!): (A6,5X,A5,X,A3,X,A1,A4,4X,3F8.3) header, atom name, residue name, chain name, residue number, x,y,z. coordinates. Note that the program treats the residue number as an ascii string, not an integer.

_*6_*._*5_*._*5_*. UNIT 15

Default extension: *.pdb. list of coordinates where site potentials are output, format as for unit 13

_*6_*._*5_*._*6_*. UNIT 18

default extension *.phi potential map for focussing boundary conditions. Potentials are in kT/e (25.6mV, 0.593 kcal/mole at 25oC). Format:

 unformatted (binary file)
 character*20 uplbl
 character*10 nxtlbl,character*60 toplbl
 real*4 phi(65,65,65)
 character*16 botlbl
 real*4 scale,oldmid(3)

uplbl,nxtlbl,toplbl,botlbl are ascii information. Phi is the 3D array conatining values of potential for all the lattice points. Index order is x,y,z. Scale is lattice scale in grid squares/angstrom. Oldmid is the x,y,z coordinates in real space (angstroms) of the centre of the lattice: thus the real space coordinates x,y,z of the lattice point for phi(IX,IY,IZ), for the case where IGRID = 65, are:

 x = (IX - 33)/scale + oldmid(1)
 y = (IY - 33)/scale + oldmid(2)
 z = (IZ - 33)/scale + oldmid(3)

where 33 = (65+1)/2 is the middle point of the grid.

_*6_*._*6_*. OUTPUT FORMATS

_*6_*._*6_*._*1_*. UNIT 6

Output from program, including error messages and conver gence history. When run interactively, appears on standard output. Default extension *.log when run in batch

_*6_*._*6_*._*2_*. UNIT 14

If flag IBIOS is false, then output is in DELPHI format, default extension *.phi. Output potential map or concentra tion map, with format same as for unit 18 above. When igrid<65(NGRID), then potentials are interpolated to produce 65X65X65 density lattice for compatibility with dis play/analysis programs If flag IBIOS is true, then output is in INSIGHT format, default extension *.ins. This is an unformatted (binary)file

character*132 toplbl !ascii header 
integer*4 ivary !0 => x index varys most rapidly
integer*4 nbyte !=4, # of bytes in data
integer*4 inddat !=0, floating point data
real*4 xang,yang,zang !=90,90,90 unit cell angles
integer*4 intx,inty,intz !=igrid-1, # of intervals/grid side
real*4 extent !maximum extent of grid
real*4 xstart,xend !beginning, end of grid sides
real*4 ystart,yend !in fractional
real*4 zstart,zend !units of extent
write(14)toplbl write(14)ivary, nbyte, intdat, extent, extent, extent, xang, yang, zang, xstart, xend, ystart, yend, zstart, zend, intx, inty, intz 
do k = 1,igrid 
   do j = 1,igrid 
      write(14)(phimap(i,j,k),i=1,igrid) 
   end do 
end do

Note that for grid sizes less than 65, INSIGHT format files will occupy less disk space than the corresponding DELPHI files. *.ins files are designed as input to a Biosym Corp. stand alone utility called CONTOUR, supplied with INSIGHT Version 2.4. This program will produce contour files for display with INSIGHT.

_*6_*._*6_*._*3_*. UNIT 16

Default extension *.frc. A list of potentials and fields at coordinates in *.pdb file read on unit 15. Format: 12 lines of ascii header information, followed by a variable number of records written as:

 230 format(8G10.3)
 write(16,230)xo,chrgv,phiv,fx,fy,fz

where xo(3) are x,y,z coordinates of charge, chrgv is the charge value, phiv is the potential (in kT/e) at that point, and fx,fy,fz are the field components (in kT/e/Angstrom). The last line of the file is the sum of chrgv*phiv/2 over all the charges in the file. This quantity can be used for calculating solvation and interation energies- see examples 2 and 3 in Section 7.

_*6_*._*6_*._*4_*. UNIT 17

Dielectric bit map, default extension: *.eps. There are 3*65*65*65 lines joining neighbouring grid points, 65*65*65 each in of the x,y,z directions. The midpoint of each line is given a value of 1 if it lies within the solvent accessi ble volume of the molecule, 0 if outside. By this means the shape of the molecule is defined, and space is separated into two regions with different dielectrics. For compact output purposes the array of REAL*4, epsmap(65,65,65,3), is compressed into an INTEGER*2 array, neps(5,65,65), by bitmapping: the first index of epsmap, range 1-65 is com pressed into the first index of neps, range 1-5, where the indices 1-16 go into bits 0-15 of the word with index 1, indices 17-32 -> bits 0-15 of word with index 2 etc. The array neps is then written to an unformatted binary file:

 write (17) imap, scale, oldmid
 write (17) neps

where imap is an unused integer*4 flag and scale, oldmid(3) are real*4 scaling information as above.

_*6_*._*6_*._*5_*. UNIT 19

List of input coordinates read from unit 13, default exten sion *.pdb, with the assigned atomic radius and charge of each atom placed in columns 55-60 and 61-67 in formats F6.2,F7.3 respectively. Other formats are same as for unit 13

_*6_*._*7_*. OVERALL PROGRAM FLOW:

In general outline, the program flow is as follows:

Header with time, date written.

Parameters are read from unit 10 and echoed to output.

Radius data read from unit 11 and stored in hash table for efficient look up.

Charge data read from unit 12 and stored in hash table for efficient look up.

Atomic coordinates are read from unit 13 and scaling is computed.

Arrays that describe 3D distribution of dielectric and ion accessibility in space are initialized.

Atomic coordinate and label data read from unit 13 again, and radii and charge assigned to each atom. Dis tribution of dielectric values, ionic strength parame ters and charge values over the lattice are determined from the coordinate/charge/radius data

Atom file with charge and radii records outputted on unit 19 if requested.

Centres of + and - charge distributions, and net charge calculated for check on charge distribution.

Arrays are set up for the difference eqn. iteration.

Boundary values are set- either through analytical expressions or from interpolation into the potential map read on unit 18.

Linear then non-linear iterative relaxations are done and convergence histories are displayed as simple log/lin line plots.

Potentials are converted to concentrations if requested.

Potentials and fields are calculated at the coordinates of the atoms read on unit 15, and outputted to unit 16 if requested.

Grid of potentials is increased in fineness to 65x65x65 if necessary and outputted on unit 14.

The dielectric map is outputted on unit 17.

_*6_*._*8_*. NOTES ON INPUT PARAMETERS

_*6_*._*8_*._*1_*. IGRID

The number of iterations required to reach a certain conver gence will increase approximately linearly with parameter IGRID. Since the time per iteration will go up as the cube of this parameter the amount of calculation will thus increase at about the fourth power of IGRID. For some applications a smaller value of IGRID will be adequate (eg 45 to 55), but it is recommended that the largest value pos sible be used (currently set at 65 by the parameter NGRID in the qdiffpar.h file).

_*6_*._*8_*._*2_*. PERFIL

A large % fill will provide a more detailed mapping of the molecular shape onto the lattice. On the other hand it will bring the dielectric boundary of the molecule closer to the lattice edge. This will cause larger errors arising from the boundary potential estimates, which are set to zero or approximated by coulombic/debye-huckel type functions using a uniform solvent dielectric. At higher salt concentra tions, and for weakly charged molecules the potentials at the boundary, and consequently, the error, will be smaller. Smaller percentages will increase the accuracy of the bound ary conditions, but result in a coarser representation of the molecule. For a single run, a reasonable compromise seems to be about 60% for looking at potentials outside a protein, and 90% for solvation energy calculations. The tradoff involved in scaling can at the cost of extra comput ing be avoided by FOCUSSING. Start with a small percentage, say 10-20%, using zero or debye-huckel boundary conditions, and then focus in to say 90%, in one (or two) stages, using focussing boundary conditions for the second (and third) runs.

It is not necessary for the molecule to lie completely within the grid although then the potential boundary condi tions must be generated by focussing. However when calcu lating solvation energies with box fills of > 100% remember that unexpected results may be obtained since parts of the surface, (and perhaps some charges) are not included in the grid (see notes on QDIFFXS regarding energy calculations).

In any quantitative calculation, it is advised that the largest scale possible be used, preferably above 2 grids/angstrom (for larger molecules this may not be possi ble without increasing IGRID, and thus NGRID in the *.h include file, above 65). Whatever the grid scale, calcula tions should be repeated at different scales to assess the size of lattice resolution errors.

_*6_*._*8_*._*3_*. OFFSET

This parameter is used to specify the grid point at which the centre of the molecule [pmid(3)] is placed, or con versely, what point of the molecule [oldmid(3)] is placed at the centre or the grid. The relationship between real space r(i) and grid g(i) coordinates for a grid size of igrid, with a scale of gpa grids/angstrom is as follows:

The centre of the grid is midg = (igrid+1)/2

oldmid(i) = pmid(i) - OFFSET(i)/gpa

g(i) = (r(i) - pmid(i))*gpa + midg + OFFSET(i)

r(i) = (g(i) - midg)/gpa + oldmid(3)

The scale, gpa, and oldmid are printed in the logfile.

Note that a certain error inevitably results from the map ping of the molecule onto the grid. By moving the molecule slightly (changing OFFSET between 0,0,0 and 1,1,1) and repeating the calculations it is possible to see whether the results are sensitive to the particular position on the grid, and if so, to improve the accuracy by averaging (This is related to rotational averaging, discussed in the J. Comp Chem paper of Gilson et al.). However using a larger scale is a more effective way of improving accuracy than averag ing.

To check on the positioning of the molecule within the grid as determined by PERFIL and OFFSET, the output dielectric bit map, FILE.eps, can be examined by taking slices using the utility program LOOKEPS.

_*6_*._*8_*._*4_*. EPSIN, EPSOUT

A value of epsout=1 corresponds to the molecule in vacuum, epsout=80 to the molecule in water, and epsout=epsin to a reference state where there is no dielectric boundary. Depending on the application runs with epsout equal to either of these values may be used to represent different states in a thermodynamic cycle. A value of epsin=1 corre sponds to a molecule with no polarizability- a state of affairs assumed in most molecular mechanics applications. Epsin=2 represents a molecule with only electronic polariz ability (ie assuming no re-orientation of fixed dipoles peptide bonds etc). A value of 2 is based on the experimen tally observed high frequency dielectric behavior of essen tially all organic materials. Epsin=4-6 represents a pro cess where some small reorganization of molecular dipoles occurs which is not represented explicitly (for example in modelling the effects of site directed mutageneis experiments, when the structure of the wild type, but not mutant protein is known). This value is based on a number of experimental and theoretical papers (eg M.K. Gilson and B. Honig, Biopolymers, 25:2097 (1986)) which indicate that a macroscopic material which consisted of similar dipole den sity, moment and flexibility as globular proteins would have a dielectric of 4-6. In modelling any process where a large reorientation of dipoles, or large conformational change occurs, ie upon folding or denaturation, then using a dielectric constant for the molecule would be inappropriate, and the change in conformation should be modelled explic itly.

EXRAD,RADPRB

The first parameter, EXRAD, in combination with the atomic van der Waals radii in the *.siz file, determines the regions of space, and hence the lattice points, which are inaccessible to solvent ions (ie where Do=0 in equation 1).

The second parameter, RADPRB, in combination with the atomic van der Waals radii in the *.siz file, determines the regions of space, and hence the lattice points, which are inaccessible to solvent molecules (water) (ie where Ei=EPSIN in equation 1).

Suggested values are EXRAD = 2.0 for sodium chloride, and RADPRB = 1.4-1.8 for water. To understand how these parame ters work, you should be familiar with the concepts of con tact and solvent accessible surface, as discussed by Lee and Richards, and by Mike Connolly.

For the purpose of QDIFF, a solvent ion is considered as a point charge, which can approach no closer than its ionic radius, EXRAD, to any atoms van der Waals surface. The ion excluded volume is thus bounded by the contact surface, which is the locus of the ion centre when in van der Waals contact with any accessible atom of the molecule. A zero value for EXRAD will just yield the van der Waals volume. A non zero value of EXRAD will thus introduce a Stern, or ion exclusion layer around the molecule where the solvent ion concentration will be zero, and whose dielectric constant is that of the solvent, EPSOUT.

For the purpose of QDIFF, any region of space that is acces sible to any part of a solvent (water) molecule is consid ered as having a dielectric of EPSOUT.

A value of zero for RADPRB used with a *.siz file containing the standard van der Waals radii values will assign any region of space not inside any atom's van der Waals sphere to the solvent. Note that this can include regions of the molecule not actually accessible to the solvent, such as the interstices where the spherical atoms pack together in the molecule's interior, which would erroneously be assigned the solvent dielectric. Thus this combination of parameters should generally not be used.

If a constant distance (say equal to the solvent molecule radius) is added to each radius in the *.siz file, and RAD PRB is set to 0, then any point lying within the contact surface (as defined by by Lee and Richards) will be assigned the molecule dielectric. Note that on the surface a shell of space that is in fact accessible to the solvent will be included as part of the molecule dielectric- This could be used for example to model a shell of immobilized water around a molecule that had a low dielectric constant.

If RADPRB is not zero, and standard van der Waals radii are used (default situation), then the solvent accessible volume will be generated. This is the region inaccessible to any part of a solvent probe sphere of radius RADPRB. Its surface as define by Lee and Richards, consists (Connolly) of convex portions composed of the atomic van der Waals surfaces when the probe sphere is in contact with one atom, and concave portions composed of the probe sphere surface, when the lat ter is in contact with two or more atoms.

_*6_*._*8_*._*5_*. IBCTYP

Unless focussing is being used, it is recommended that the coulombic boundary condition (ibctyp=4) be used. For focussing boundary conditions, the program reads in a poten tial map from a previous run, and compares the scale of the focussing map with that for the current run. If they are the same, it assume that this is a continuation of a previous run, and iteration of the potentials contained in the previ ous potential map is continued. If the scales are not the same, it checks to ensure that the new lattice lies com pletely within the old lattice before interpolating the boundary conditions.

_*6_*._*8_*._*6_*. IPER

Periodic boundary conditions can be applied in one or more of the x,y or z directions. When applied, the potential at each periodic lattice boundary point is iterated by equation 1 by supplying its missing neighbor(s) from the correspond ing point on the opposite edge of the lattice.

This can be used for example to model an infinite length of DNA. Assume that the helical axis of the DNA in the *.pdb file is aligned along the Z axis. The periodic boundary flags are set to false,false,true, and the percent fill of the box, PERFIL, is adjusted so that an integral number of turns just fill the box in the Z direction. Normal boundary conditions are applied to the X,Y boundaries.

By setting two, or three of the boundary flags to true, one can simulate 2 dimensional or 3 dimensional cubic lattices of molecules.

_*6_*._*8_*._*7_*. NLIT

The convergence behaviour of the finite difference procedure is reported in the log file as both the mean and maximum absolute change in potential at the grid points between suc cessive iterations. The latter is probably more important since it puts an upper bound on how much the potential is changing at the grid points. It is suggested that sufficient iterations be performed to give a final maximum change of less than 0.001 kT/e. The number of iterations per se is not important, as long as its sufficient to give the required convergence. The convergence behavior can also be judged from the slope of the semi-log plot of the mean and max changes given in the log file. NLIT is best determined by experience, since the convergence rate depends on several factors. Start with say 100 iterations, and then increase the number of iterations until sufficient. Note that a run can be restarted by using focussing boundary conditions with exactly the same PERFIL and OFFSET values (see note 5). Some guidelines are: The number of iterations needed will increase with grid size. It will decrease with decreasing PERFIL, since the potentials converge more rapidly in the solvent. It will decrease with increasing ionic strength. The number is fairly insensitive to the size and number of charges on the molecule.

Convergence will slow with use of the non-linear equation (NNIT not zero). It is suggested that for more rapid con vergence and greater numerical stability do 30% linear iter ations, then refine the solution with 70% non- linear itera tions.

_*6_*._*8_*._*8_*. ISPH

If an atomic charge does not lie exactly on a grid point, then it must somehow be distributed onto the grid points. If this flag is set false, the standard algorithm is used which distributes charge to nearest 8 grid points (quick and simple, see the Proteins paper of Klapper et al.). If this flag is set true, then an algorithm is used which gives a more spherically symmetric charge distribution, although the charge is now spread over a wider region of space. For cer tain cases this gives higher accuracy for potentials less than 3 grid units from a charge (see J.Comp. Chem paper), although this point has not been exhaustively explored.

_*7_*. PROGRAM QDIFFX

last revision: 22nd may 89

All arrays, including the maximum lattice dimension, are dimensioned in an include file: QDIFFpar4.h. source files specific to QDIFFX have the postfix 4.

_*7_*._*1_*. BRIEF DESCRIPTION

The program QDIFFX supersedes QDIFF except when you bomb out when using the non-linear Poisson-Boltzmann equation option. QDIFFX uses a superior algorithm for solving the finite difference matrix equation, plus a speeded up setup routine. It should run 20-30 times faster. Input and out put are similar to that of QDIFF. Parameter files generated to run QDIFF are compatible with QDIFFX, and apart from the increase in speed, changeover should be transparent to the user. However to take advantage of QDIFFX, users should take note of the differences between the programs, described below.

The assignment file for QDIFFX is ASSQDIFFX, as ASSQDIFF is for QDIFF. It handles the assignment of file names to logi cal unit numbers, and allows you to create an extended parameter file.

The differences between QDIFF and QDIFFX that affect the user refer mainly to the unformatted I/O options now avail able. Since charges and radii have to be assigned to a molecules atoms before the Poisson Boltzmann equation can be solved, the program has assignment routines. Since many runs are often done on the same protein/dna/whatever, it seemed sensible to be able to avoid the repetition of this by writing out a file with the coordinates, radii and charges. Secondly, since unformatted read/writes are much quicker, we store the information in an unformatted file, typically with extension

Further, since assignments of charge are made if a site potential file is to be written, one might want to avoid this too, so there is an option to store the frc-type pdb file in an unformatted file, so as to be read in rapidly on successive runs.

Finally, the site potential file itself can be written in unformatted form.

Unformatted versions of the files required for input do not initially exist- only formatted *.pdb files exist. The unformatted versions are generated by QDIFFX itself, by set ting certain output flags in the parameter file (lines 16 and 17). Old parameter files that do not contain these lines will by default not write unformatted files. Once the unformatted files exist, they can be linked to the appropri ate input units. On subsequent runs of QDIFFX, the program will automatically read the files as formatted/unformatted, depending on what type of file is linked.

_*7_*._*2_*. UNFORMATTED VERSIONS OF INPUT FILES for QDIFFX

 UNIT   NAME
 -------------------------------------------
 13     *.bb1 unformatted input atom coordi nates:
     generated from a previous run writing to unit 20
 15     *.bb2 unformatted list of atom coordi nates
     if site potential output is required: gener ated
     from a previous run writing to unit 21
 -------------------------------------------

_*7_*._*3_*. UNFORMATTED OUTPUT FILES for QDIFFX

 UNIT   NAME
 -------------------------------------------
 20     *.bb1 unformatted output atom coordinate
     file with radii and charge records added for faster,
     unformatted input on unit 13 in subsequent QDIFFX runs
 21     *.bb2 unformatted list of coordinates for site
     potentials1 generated from the formatted pdb file attached to 15
 22     *.bb2 unformatted list of potentials at coordinates
     read from the unformatted coordinate file attached to unit 15
 -------------------------------------------

_*7_*._*4_*. FORM OF UNFORMATTED OUTPUT FILES

The format of files written on units 21 and 22 are both:

inum,xo,chrgv,pot,ex,ey,ez

where inum= atom number

xo = three entry array with atom coordinates

chrgv=any charge associated with the atom

pot=the potential

ex,ey,ez = the field at the atom.

For the file on unit 21, pot, Ex, Ey and Ez are all set at zero.

The format for the unformatted pdb file is:

xo,rad,chrgv

where rad is the atom radius, other quantities as before.

_*7_*._*5_*. INPUT PARAMETER FILE FORMAT

line 9== NLIT ---integer number (> 3) of iterations with linear equation. If the number is replace by 0, or the letter A, (automatic) the optimal number of iterations required for convergence as judged from the spectral radius of the finite difference matrix will be used. Generally between 80-150 iterations will be sufficient. Some experimenta tion and an examination of the convergence behav ior recorded in the logfile will soon show the correct value to use.

line 10== NNIT ---integer number (> = 0) of non-linear iterations. Generally 100-200 iterations will be sufficient. It is possible that for highly charged molecules, and high salt concentrations that the non-linear part of QDIFFX will not con verge, in which case QDIFF should be used. In any case, when using the non-linear form, the conver gence behavior should be closely monitored.

line 16== logical flag, TRUE: write out an unformatted pdb file with radii, charges, on unit 20.

line 17== logical flag, TRUE: write out an unformatted pdb file for generating site potentials, on unit 21

line 18== upto three letters which switch on the calcu lation of certain electrostatic energy quantities by the program: G grid energy (now called total energy). C coulombic energy, S for induced sur face, or solvation energy (now called reaction field energy). See example 2, and the references for a detailed description of these quantities.

The grid energy is that obtained from the poten tial at each charge WITHIN the grid, multiplying by that charge, and summing over all such sites. This is not a meaningful number in itself, but can be useful in extracting solvation energies, salt effects etc.

The coulombic energy is calculated via coulombs law, and is the energy to bring the charges pre sent from infinity to their final resting place, in a dielectric of that specified as being inter nal. This includes all charges, not just those in the grid.

The third energy term is obtained by calculating the induced surface charge at each surface point within the box, then using these charges to calcu late the potential at every charge, not just those in the box. If the molecule lies entirely within the box, and there is no salt present, this corre sponds to the energy of taking the molecule from a solvent of dielectric equal to that of its inte rior, to that of the exterior. Results are not accurate if the grid spacings are less than one grid per angstrom (see QDIFFXS, which calculates this quantity more accurately).

Warning: The later two calculations require square roots. If the number of charges is large (e.g. if one is using a partial charge set), then one may be better off doing two runs for the sol vation energy, using the grid energy approach, which needs no square roots (see example 3).

Line 19== two logical flags, two integers. controls log file output characteristics. The first logical flag toggles the plot in the log file of the con vergence. The second logical flag toggles the lengthy potential output to the log file. This output is mainly for debugging and reproducibility checks. The two integer parameters deal with how the convergence criteria are calculated. The first decides at what iteration interval conver gence is checked, the second what fraction of grid points are used in assessing this (1=all, 2=half, 5=fifth etc). If in doubt, there are default val ues for these of 10 and 1. The idea behind these parameters is to allow convergence to be checked less frequently to reduce the amount of time spent.

_*7_*._*5_*._*1_*. EXAMPLE OF PARAMETER FILE USE

(a) Producing an unformatted pdb file (but with radii and charge information):

1) Run QDIFFX with the write flag set true for unformatted pdb (line 16) This results in a file being written to fortran unit 20.

2) Rename this file as appropriate, and link this to unit 13, the unit previously used to read the formatted atom coordinate information.

3) Rerun QDIFFX.

4)Notice the increase in speed

(b) Producing an unformatted frc-type pdb file:

1) Run QDIFFX with the write flag set for the frc pdb file (line 17) This results in a file being written to unit 21.

2) Rename this file and link it to unit 15, nor mally used to read in the formatted coordinates for the site potential file

3)Rerun QDIFFX. This results in an unformatted file being written to unit 22.

_*7_*._*5_*._*2_*. EXAMPLE PARAMETER FILE

1. 65        ! grid size

2. 70        ! percentage box fill

3. 2,2,5     ! x,y, and z offsets for center of box

4. 2,80      ! dielectric in and dielectric out

5. 0.145     ! salt concentration in mM

6. 2.0,1.4   ! salt exclusion radius, probe radius

7. 2         ! dipolar coulombic type of boundary con dition

8. f,f,t     ! periodic boundary conditions in z direction only

9. A         ! number of linear iterations, set here to automatic

10. 0        ! number of nonlinear iteration, not used by QDIFFX 

11. f,f      ! flags for phi-map output format

12. t        ! site potential output

13. f        ! no formatted modified output pdb file

14. Sample parameter file  ! label

15. f        ! spherical charge distribution algo rithm- not used in QDIFFX

16. t        ! unformatted pdb file output with radii, charges

17. t        ! site potential pdb file to be written in unformatted form

18. GC       ! grid and coulombic energies calculated

19. f,t,20,2 ! log file, and convergence date: Graph ouput off, potential output on, check for convergence every 20 iterations at every other grid point. 

_*8_*. PROGRAM QDIFFXS

last revision: 2nd may 90

_*8_*._*1_*. BRIEF DESCRIPTION

The program QDIFFXS supersedes QDIFFX. QDIFFXS has a better description of the dielectric boundary surface, and so pro duces more accurate estimates of reaction field and solva tion energies, particularly at scales of less than 2 grids/angstrom. Input is identical to QDIFFX. The major difference is in the output log file, where various energy terms are outputted. The calculation of these terms is con trolled by line 18 of the parameter file, as for QDIFFX, but their names have been changed to be more accurate and reflect previous usage in published papers: On line 18 upto four letters switch on the calculation of certain electro static energy quantities by the program: G total energy. C coulombic energy, S for induced surface, or reaction field energy. A analytical estimate of the grid energy.

The COULOMBIC ENERGY is calculated via coulombs law, and is the energy to bring the charges present from infinity to their final resting place, in a dielectric of that specified as being internal. This includes all charges, not just those in the grid.

The TOTAL ENERGY (formally called the grid energy) is that obtained from the potential at each charge WITHIN the grid, multiplying by that charge, and summing over all such sites. It contains all the real electrostatic energy terms, plus the self energy of the grid. This latter is not a meaning ful number in itself, but can be subtracted out to yield meaningful quantities such as solvation eneries etc.

The ANALYTICAL GRID ENERGY is an accurate estimate of the grid and coulombic energies. This term is thus the total energy that would be obtained from a QDIFFXS run with a uni form dielectric equal to that of the interior. Subtracting the analytical grid energy from the total energy thus gives an estimate of the reaction field energy, ie that due to the dielectric boundary. However a better estimate of the reac tion field energy is obtined directly, see below, so this value should not be used.

The REACTION FIELD ENERGY term (formally called the solva tion energy term) is obtained by calculating the induced surface charge at each surface point within the box, then using these charges to calculate the potential at every charge, not just those in the box. If the molecule lies entirely within the box, and there is no salt present, this corresponds to the energy of taking the molecule from a solvent of dielectric equal to that of the interior, to that of the exterior. Depending on the physical process, this may be the required solvation energy, but in general the solvation energy is obtained by taking the difference in reaction field energies between suitable final and initial reference states that define the required process- hence the name change. In QDIFFX the reaction field energy was not accurate if the scale was less than 2 grids/angstrom. QDIF FXS however by using a better representation of the dielec tric surface, and by repositioning the induced surface charge at its correct position before calculating the reac tion potential back at the charge, gives much more accurate estimates, even at 0.5 to 1 grids/angstrom. Thus solvation energy calculations should always be done via differences in reaction field energy, rather than via differences in total energy as was done before. NOTE however that the best way to assess the inaccuracies arising from the lattice representa tion is by repeating calculations at different scales or box fills, and observing the change in the calculated quantity.

_*9_*. WORKED EXAMPLES

_*9_*._*1_*. THE POTENTIAL AROUND CU,ZN SUPEROXIDE DISMUTASE

OBJECTIVE

produce a potential map showing the potential distribu tion in and around CU,ZN Superoxide Dismutase

REFERENCES

Klapper et al, Proteins, 1:47 (1986) for parameters of the run and pictures of potentials in a slice through the two active site coppers.

Getzoff, Tainer et al., Nature 306:284,287 (1983) for structural details.

FILES

contour.kt   list of contour values in kT, 1 per line, input to 
             CONTOUR2D. 

sod.siz      atomic radii, input to qdiff.

sod.crg      atomic charges, input to qdiff. 

sod.eps      dielectric map=protein 'shape'. 

sod.log      log file of qdiff run (finite difference calculation). 

sod.pdb      atomic coordinates, input to qdiff. 

sod.phi      output potential map, input to con tour2d. 

sod.prm      input parameters for qdiff. 

sodout.pdb   output atomic coordinates with assigned radii and charges,
             input for contour2d. sod.plt output plot file from 
             contour2d for HP75 plotter showing potentials in slice
             thru 2 coppers.

NOTES

The grid size in this example is set to 33 for speed. For greater accuracy 45-65 would be better. The protein fills 66% of box- a tradeoff between accurate boundary values (smaller % ), and detail in protein (larger %). For further accuracy, focussing can be used (see exam ple 5). 400 iterations were perform, using the linear PB equation, since no salt is present (even with salt, SOD is not highly charged, so we could probably still use the linear equation). For large grids or better convergence, iterations should be increased. A max change in potential of < 0.001kT/e is reasonable. Look ing at the charge file sod.crg, note that only 'full' charges have been assigned-for potentials far out in solution, dipole charges (or partial charges) make little difference. A complete set of partial charges (including polar H's) will require building of H atoms into sod.pdb using some modelling package.

_*9_*._*2_*. TEST CHARGES IN A SPHERE- Comparison with analytical solutions

OBJECTIVE

Determine the potential and field at each of 3 points in a 10A sphere of dielectric 2, surrounded by solvent with a dielectric of 80, containing 1-1 electrolyte , concentration 0.145M (physiological). Find the total electrostatic energy, and contributions from self energy (interaction of each charge with the potential it induces by polarizing solvent, a.k.a the image or solvation terms) and the interaction between charges. Results from the finite difference method (FDPB) are compared with analytical solutions from the Tanford Kirkwood equations (TK). Potentials are in kT/e (== 25.6mV, or 0.593 kcal/mole at 25oC). Fields are in kT/e/A, and energies are in kT.

REFERENCES

Gilson Sharp and Honig, J. Comp. Chem 9:327 (1987), for method and error assessments in the finite difference procedure for solving the Poisson-Boltzmann eqn. (FDPB method).

Gilson, M., Sharp, K., Honig, B. Calculating electro static interactions in bio-molecules: Method and error assessment. J. Computational Chem. 9, pp327-335. For a definition of the various electrostatic terms calcu lated here.

Tanford and Kirkwood, JACS 79:533 (1957), T.L. Hill, J.Phys.Chem. 60:253 (1956) for analytical series solu tions to the Poisson-Boltzmann equation for charges in a sphere.

OUTPUT FILES

Files with extension *.frc contain potentials, fields and sum of 1/2.(potential*charge) from runs as follows: For prefix sph#, # refers to which of the three charges was present. No number means all three charges were present. suffix _2_80 refers to dielectric 2 for the sphere, 80 for the solvent, and ionic strength of 0.145M. Suffix _2_2 refers to uniform dielectric of 2 inside and out, and no ionic strength. Analytical solu tions from the Tanford-Kirkwood equations for the 2/80 case are in files with suffix _anal:

sph1_2_2.frc
sph1_2_80.frc
sph1_anal.frc
sph2_2_2.frc
sph2_2_80.frc
sph2_anal.frc
sph3_2_2.frc
sph3_2_80.frc
sph3_anal.frc sph_2_2.frc
sph_2_80.frc
sph_anal.frc
sph_anal_slf.frc

NOTES

Grid size in this example is 45 for speed. For greater accuracy 45-65 would be better. Sphere actually fills 66% of box-(note this is large than the nominal value of 20% requested in the parameter file since this did not include the 'radius' of the 10A 'atom' See section 6.8.2). This should be born in mind when specifying the desired fraction of box to be filled. 66% is a tradeoff between accurate boundary values (smaller % ), and detail in representing the sphere boundary (larger %). For further accuracy, focussing can be used (exam ple 5). 800 iterations were perform, using the linear PB, since Tanford-Kirkwood solutions are for linear case only. For uniform dielectric 2 cases, with no ionic strength, 1500 iterations were used, since con vergence is slower with higher potentials and with no salt.

BACKGROUND TO ANALYSIS OF RESULTS

The image or reaction field: Consider a single charge in a low dielectric (Di) spherical cavity surrounded by high dielectric (Do) solvent. Its field polarizes the material in the cavity less than the surrounding sol vent. The mathematical statement of this is that there is a discontinuity in the normal component of the field at the sphere boundary, such that Di.Eo = Do.Eo. Alter natively, a surface charge density is induced at the sphere boundary. This surface charge produces a field (the reaction field), which when added to the field from the charge yields the total field. The potential at the charge due to the reaction field is the self, or image potential, Ps, and the change in 'solvation energy' of the charge is q.Ps/2.

Calculation of the reaction potential via the Finite Difference Procedure: When a FDPB calculation is per formed, the potential at the site of a charge is not singular, since mapping onto a lattice essentially dis tributes the point charge over some volume element of order of one little grid box. The potential at the charge is then Pd = Pg + Ps, where Pg is some grid potential term, whose values has no-intrinsic meaning, Fri Jan 20 10:57:43 1995 a2ps_delphi.rno Page 49 DelPhi V3.0 -49- but which is constant for a given charge location, grid size and boundary condition. The term Ps can be evalu ated by subtracting the potential, Pu, at the charge with Do=Di (when Ps==0 by definition, and P = Pg), from the potential, Pd, with Do not = Di. Thus:

Ps = Pd - Pu,

Also the solvation energy change is:

dGs = q.(Pd - Pu)/2.

Similarly, the field at the charge is Ed = Eg + Es, and since (ignoring the error due to boundary conditions, Eg ==0), a good approx. to the field at the charge can be obtained directly from grad(Pd). This is provided in the *.frc file.

Many Charges in a Sphere: When there is more one charge, the potential at charge qi is

Pid = Piis + Pig + sum(Pjis + Pjic), i.ne.j

where Piis is the potential at i due to qi's polariza tion of solvent, Pig is grid potential at qi, Pjis is the potential at qi due to qj's polarization of the solvent, Pjic is the direct Coulombic potential due to qj at qi [ie Pjic = qj/(Di.rji)]. For a uniform dielectric case (Di=Do), Pijs==0 for all i,j, and:

Piu = Pig + sum(Pjic), i.ne.j

Similarly for the fields:

Eid = Eis + sum(Ejis + Ejic), i.ne.j

where E's are vectorial quantities, and Eig==0. The total solvation energy is then

dGs = sum[qi.(Pid - Piu)/2.] = sumi[qi.(Piis + sumj{Pjis},i.ne.j)]/2.

ANALYSIS OF RESULTS

In the first run, dielectrics of 2 and 80 are used and all three charges are present. The total field at each charge can be obtained directly with all 3 charges pre sent (sph_2_80.frc), giving:

inner,outer dielectric: 2.0 80.0
ionic strength (M): 0.145

      coordinates    | charge| field (kT/e/Ang.)
 #| x    | y    | z  | (e)   | Ex     | Ey     | Ez
 -|------|------|----|-------|--------|--------|-----
 1|-3.00 | 0.   | 0. | 1.00  | 6.02   | -4.36  | 0.
 2| 3.00 | 0.   | 0. | 2.00  | -7.60  | -67.0  | 0.
 3| 3.00 | 3.00 | 0. | -2.00 | -4.42  | -69.2  | 0.
 -|------|------|----|-------|--------|--------|----- 

while the TK analytical results (in sph_anal.frc) are:

 coordinates      | charge| field (kT/e/Ang.)
 x    | y    | z  | (e)   | Ex     |     Ey | Ez
 -----|------|----|-------|--------|--------|-----
-3.00 | 0.   | 0. | 1.00  | 5.84   | -4.20  | 0. 
 3.00 | 0.   | 0. | 2.00  | -7.55  | -68.0  | 0.
 3.00 | 3.00 | 0. | -2.00 | -4.48  | -70.2  | 0. 
 -----|------|----|-------|--------|--------|----- 

with around 5% error or less. The total solvation energy is obtained from summing q.phi/2. for the three charges for the D=2/80 (last line in sph_2_80.frc) and subtracting the same sum for the D=2/2 case (sph_2_2.frc):

inner,outer dielectric: 2.0 80.0 ionic strength (M): 0.145

total energy = 2869.8 kt

inner,outer dielectric: 2.0 2.0 ionic strength (M): 0.0

total energy = 2892.0 kt

giving a difference of -22.2 kT. The analytic value for D=2/80 from sph_anal.frc is -386.6543, while from Coulombic potentials for D=2/2 it is -364.11, yielding a difference of -22.5 kT, for a 1% error.

Thus fields at all the charges can be calculated simul taneously with all charges present, with one run. The total solvation energy can be calculated with all charges present, with two runs, the first of these, with Do = Di, being a reference state run.

To calculate the individual pairwise interactions, three sets of runs must be done, each with only one charge present in turn (file prefixes sph1, sph2, sph3), with D=2/80 (suffix _2_80). Analytical results are from *_anal.frc files.

For charge 1, FDPB calculated fields (sph1_2_80.frc) and analytical fields are:

        FDPB         ||     TK analytical
 Ex   | Ey     | Ez  || Ex    | Ey    | Ez 
------|--------|-----||-------|-------|------ 
 1.01 | -0.002 | 0.  ||  0.988| 0.    | 0. 
-7.36 | -0.0007| 0.  || -7.3  | 0.    | 0. 
-4.93 | -2.76  | 0.  || -4.95 | -2.67 | 0.
------|--------|-----||-------|-------|------ 

for charge 2:

         FDPB          ||     TK analytical
 Ex    | Ey       | Ez || Ex    | Ey    | Ez
 ------|----------|----||-------|-------|------
  14.7 | -0.0014  | 0. || 14.6  | 0.    | 0.
 -2.01 | -0.004   | 0. || -1.98 | 0.    | 0.
 -1.99 | -69.0    | 0. || -1.95 | -69.9 | 0.
 ------|----------|----||-------|-------|------

for charge 3:

         FDPB        ||     TK analytical
 Ex    | Ey     | Ez || Ex    | Ey     | Ez
 ------|--------|----||-------|--------|----
 -9.76 |  -4.38 | 0. || -9.80 |  -4.20 | 0.
  1.79 | -67.0  | 0. ||  1.75 | -68.0  | 0.
  2.50 |   2.48 | 0. ||  2.42 |   2.42 | 0.
 ------|--------|----||-------|--------|---- 

Interaction potentials between different charges are also obtained from these runs:

charge  |  potential (kT/e)
 pair   | FDPB     | TK
 -------|----------|---------
 1-3    |   16.3   |   16.4
 2-1    |   42.7   |   42.5
 2-3    | -129.    | -126.
 -------|----------|--------- 

For individual self energy or image interactions of each charge it is necessary to do 3 pairs of runs, each pair with only one charge present, with D=2/80 (suffix _2_80.frc) and D=2/2 (suffix _2_2.frc), and subtract the latter potential at the charge itself from the former (TK results are from sph_anal_slf.frc):

        |    Dielectric     | Energy difference
 Charge | D=2/80  | D=2/2   | FDPB    | TK
 -------|---------|---------|---------|---------
 1      |   694.3 |   723.8 |  -29.52 | -30.4
 2      |  1388.5 |  1447.6 |  -59.1  | -60.9
 3      | -1381.8 | -1447.6 |   65.8  | 67.4
 -------|---------|---------|---------|--------- 

_*9_*._*3_*. SOLVATION ENERGY OF THE ACETATE ION

OBJECTIVE

To calculate the electrostatic contribution to the sol vation energy of the acetate ion in water. The acetate molecule is assumed to have a dielectric of 2, account ing for electronic polarizability, and the water to have dielectric 80 (actually 78.6).

REFERENCES

Gilson, Sharp and Honig, J. Comp. Chem. 9:327 (1988). For general method.

Rashin and Namboodri, J. Phys. Chem., 91:6000 (1987), and Gilson and Honig, Proteins, 3:32 (1988) for solva tion energy calculations- the values for acetate are taken from the last reference.

Kang, Nemethy and Scheraga, J.Phys.Chem. 91:4118 (1987). Wolfenden, Anderson and Cullis, Biochem. 20:849 (1981), for experimental solvation energies.

FILES

 ace.crg      charge file with CHARMM partial charges for acetate.
 ace.pdb      input atom coordinates for acetate built by CHARMM.
 ace.prm      parameter file for qdiff. ace.siz radius file, with charmm radii.
 ace_2_1.frc  reference run with vacuum dielectric outside (D=2/1).
 ace_2_80.frc solvation energy run, with solvent (dielectric 80 present). (D=2/80).
 ace_2_80.log log file.
 ace_2_1.log   "    ".
 aceout.pdb   output atom coordinate file including charges and radii.

NOTES

The basic idea here is to calculate the sum of (charge*potential)/2. at all the charged atoms in acetate. In the case with solvent present (two dielec tric, D=2/80), this energy contains four terms: the energy of interaction of each charge with all the oth ers, the interaction of each charge with the reaction field induced by its own field, the interaction with the reaction field of other charges, and the self energy of the grid. For the vacuum dielectric case D=2/1, only the grid self energy and the direct charge charge interaction are present. Subtracting the energy in the D=2/1 run from the energy in the D=2/80 run, we are left with only the charge/solvent interaction energy, or solvent energy.

RESULTS

              Total energy (kT)
 Run          from *.frc or *.log file
 -----------------------------
 D=2/80       1673.44
 D=2/1        1793.24
 Difference   -119.8    == -71.0 kcal/mol
 ----------------------------- 

Alternatively, a more accurate solvation energy may be obtained using QDIFFXS, in which case the values for the 'corrected reaction field energy', written in the log file, are used:

                   Reaction field energy (kT)
 Run from          *.log file
 -----------------------------
 D=2/80            -60.60
 D=2/1              58.11
 Difference       -118.7     == -70.3 kcal/mol
 -----------------------------

Note that in this example, the first method is almost as accurate as the second, since with a small molecule the scale is > 2 grids/angstrom but in general this will not be the case.

Experimental values range from -79 to -87 kcal/mole. Previously calculated value from Gilson and Honig was -70 kcal/mole.

_*9_*._*4_*. PK SHIFTS IN SUBTILISIN

OBJECTIVE

Calculate the expected change in pK of the active site histidine in subtilisin due to site directed mutagene sis of asp99->ser and glu156->ser at two different ionic strengths.

REFERENCES

Gilson and Honig, Nature 330:84 (1987)

Sternberg et al, Fersht, Nature 330:86 (1987) for experimental data and similar theoretical calculations (but not including ionic strength).

FILES

 sbt.crg         atom charges
 sbt.pdb         atom coordinates from brookhaven data bank
 sbt.siz         atom radii
 sbt1.prm        run at 0.1M ionic strength
 sbt1.log                 "
 sbt1_99_156.frc          "
 sbt2.prm        run at 0.01M ionic strength
 sbt2.log                 "
 sbt2_99_156.frc          "
 sbt_99_156.pdb  atomic coordinates where potentials outputted

NOTES

The change in pK of a group due to to the electrostatic potential of another group is the change in work neces sary to bring a proton from solution to that group, ie dpK = phi/2.303, where phi is the potential at one group due to the other, in units of kT/e. Since reciprocity holds, the charge can be put on either group, and the potential calculated at the other group. Thus for this case, it is more efficient to put the charge on the histidine 64, and get the potential at all other charged groups simultaneously. Note that where site-directed mutagenesis being performed, the change in bulk of the side chain could also have an effect on the potential at the titrating group, and so runs should be done with both side chains present, and the difference in potential taken. Also this procedure only yields CHANGES in pK, since the intrinsic pK of the group is not known (unless measured or calculated independently). The intrinsic pK being the pK of the group with all the other titratable groups neutral, ie the difference in intrinsic pK of an isolated side chain and one in a folded protein depends on the dif ference in solvation energy of the charged specie when incorporated into the protein.

The brookhaven data bank file for subtilisin contains oxygen coordinates for several water oxygens. In this example these waters are included as part of the molecule (ie with dielectric 2). What the dielectric behavior of crystallographically bound waters is, and whether they shold be included as part of the molecule is an open question.

In this example the centre of the protein is not set at the grid centre, but is offset 0,5,0.5,0.5, angstroms. This is merely to bring attention to the fact that when doing quantitative work, it is advisable to check that the results are not sensitive to the precise position of the molecule with respect to the grid. This applies particularly when conclusions are based on calculations from a few potential values at selected coordinates, such as for the pK changes in this example, and with the solvation eenrgies in example 3. If sensitivity to grid position is found results from runs with different positions of the molecule upon the grid should be aver aged (eg see discussion of rotational averaging in Gilson,Sharp and Honig, (1988) J.Comp Chem, 9:327).

RESULTS

The data is taken from sbt1_99_156.frc (0.1M) and sbt2_99_156.frc (0.01M) and average potentials at the asp 99 and glu 156 carboxyl oxygens computed. This gives the histidine pK changes when these groups are mutated to serines:

       |        |                   pK Change
 ------|--------|---------------------------------
       |Ionic   |Experimental  |Gilson |calculated
 Case  |Strength|(Fersht et al)|& Honig|(example 4)
 ------|--------|--------------|-------|----------
 Asp99 |(0.1M)  |0.23-0.29     |0.18   | 0.17
       |(0.01M) |0.42          |0.29   | 0.25
 ------|--------|--------------|-------|---------- 
 Glu156|(0.1M)  |0.25          |0.20   | .30
       |(0.01M) |0.42          |0.37   | 0.38
 ------|--------|--------------|-------|----------

Differences from calculations of Gilson & Honig arise since focussing boundary conditions and rotational averaging were not used here.

_*9_*._*5_*. ELECTROSTATIC POTENTIALS AND ION CONCENTRATIONS AROUND B-DNA

OBJECTIVE

To calculate the electrostatic potential and mobile ion charge density distributions around DNA, using the non linear PB equation, periodic boundary conditions along the axis of DNA, and focussing.

REFERENCES

Gilson, Sharp and Honig, J. Comp. Chem. 9:327 (1988), for general method.

Jayaram, Sharp and Honig, Biopolymers, 28:975 (1988), for non-linear PB, periodic boundary conditions and application to DNA..

FILES

contour.cnc     contour values for concentration profile.

contour.kt      contour values for potential pro file.

dna.pdb         input atom coordinates.

dna.siz         atom radii.

dna_out.log     run with coarse grid to get bound ary conditions.

dna_out.phi                    ".

dna_out.prm                    ".

dna_in.log      run with fine grid to get poten tials.

dna_in.phi                     ".

dna_in.prm                     ".

dna_in_conc.log run with fine grid to get charge concentrations.

dna_in_conc.phi                ".

dna_in_conc.prm                ".

nuca_full.crg   full atom charges (-0.5 on each PO4 oxygen only). 

NOTES

The atom coordinate file contains 3 turns of poly A, poly T DNA in the B form. The first run ( file suffix _out) is done with a coarse grid (of 25), including all 3 turns, the helix axis aligned along the z- direction, with 100% of the box filled in this direction. This puts the x,y edges of the box about 40A from the dna, where the coulombic boundary condition should be fairly good, even for highly charged DNA. Boundary conditions on the Z ends are periodic, simulating an infinite piece of DNA. These potentials are only used to produce more accurate boundary conditions for a finer run.

For the inner runs (file suffix _in), a finer grid is used (45), and one turn included in the box only (per cent fill 300%). Boundary conditions are taken from the coarse run ('focussing', option #3). Periodic boundary conditions are still applied to Z edge, although this is not strictly necessary since the focussing boundary conditions also impose Z periodic ity. Note that linear PB iterations are then followed by non-linear iterations-this provides good convergence for less iterations than just non-linear iterations alone.

In the first fine run the potentials are outputted, on the second, the concentration flag is set to true (t), and net ion concentrations (ie. sum of positive and negative charges) are outputted (file suffix _conc). In the latter case, the concentrations are 0 within the molecule (the region occupied by the van de Waals spheres plus the ion exclusion radius, 2A). When con tour output of this concentration map is produced, closely spaced contour lines appear at the ion exclusion surface due to the abrupt drop in concentra tion to 0. Note that the ion exclusion surface lies outside the the solvent acessible surface depicted on the plots.

PLOTS

Plot files in dna_pot.plt and dna_conc.plt for the HP75 plotter show potential and charge density contours plotted on top of the solvent accessible surface of DNA. Plots are produced by CONTOUR2D. The plot range is set at 170., the probe radius at 2A. For the poten tials, contours are at -0.5,-1.,-2.,-3.,-4 kT/e. For the charge densities, contours are at -0.5,-1,-2,-3 moles/litre.

_*9_*._*6_*. INTRINSIC PK CALCULATION IN TRYPSIN

OBJECTIVE

Calculate the the intrinsic pK of the active site his tidine of the catalytic triad in rat trypsin, his 40 (res # 57 in bovine numbering). Incorporates desolva tion effects, protein dipoles and charges.

REFERENCES

Soman, Yang, Honig and Fletterick, Biochem. 28:9918 (1989) for calculations, Sprang et al Science 237:905 (1987) for structure.

FILES

 tryp.prm           parameters
 tryp.out           summary of key numbers 
 def.siz            radii for whole protein
 his40.siz          radii for his40 in reference state (alone in solution)
 his40.crg          charges for his+40
 his40_neut.crg     charges for unprotonated his40
 tryp_all.crg       dipoles and charges for protein
 tryp_full.crg      formal charges for protein
 rat_ptb.pdb        protein coordinates
 rat_ptb_his40.pdb  his40 coords for input charges
 rat_ptb_his40n.pdb his40 coords for output potentials
 tryp0_his40.frc    his+40 in reference state 
 tryp0_his40n.frc   neutral his40 in reference state
 tryp_his40.frc     his+40 in chargeless, dipoless protein
 tryp_his40n.frc    neutral his40 in chargeless, dipoless protein
 tryp_all.frc       potential at his+40 from all pro tein dipoles, charges
 tryp_alln.frc      potential at neut. his40 from all protein dipoles, charges
 tryp_full.frc      potential at his+40 from all pro tein charges
 tryp_fulln.frc     potential at neut. his40 from all protein charges

NOTES

The pK of a group is given by dGdeprotonate/1.366 (dG in kcal, or dG/2303 in kT). With respect to the refer ence state pKa, for the isolated histidine in water (pK6.8), the change in dG on putting the histidine into the folded protein is ddGsolv + dGdip + dGcharge, where ddGsolv is the difference in solvation energy of the charged v. uncharged form of the histidine in position in the protein, wrt. the reference, isolated state. dGdip is the difference in energies of the charged and neutral histidines arising from the protein dipole potentials. Similarly, dGcharge comes from the poten tials of the other formal charged groups, including of course the other titratable groups. Note that dGcharge depends on the protonation state of all the other groups, and thus is pH dependent. To calculate ddG, the solvation energy of the histidine charges is calcu lated for the isolated histidine G1 (tryp0_his40), ie assigning radii and charges only to his40; for the his tidine in the protein G2 (tryp_his40), ie. assigning charges to his 40 only, and radii to both his and the protein. The same calculation is done on the neutral (unprotonated) form of his40 to give G3 (tryp0_his40n) and G4 (tryp_his40n). Then ddGsolv = (G2-G1) - (G4-G3). Energies G2-G1 and G4-G3 are examples of self energy differences, and are calculated from the total energy, sum(q.phi/2), where the sum is over all the histidine atomic charges, and the potentials phi are those produced by the very same charges.

Note that the scale and position of the charges must be exactly the same for each set of runs for these differ ences to be meaningful- remember that the 'artifactual' grid energy must be subtracted out. One way to hold the scale constant is to place a pair of dummy atoms of zero radius and charge in all the pdb files to fix the min and max extents which control the scaling and posi tioning of the molecule (see the first two entries in the pdb files rat_ptb.pdb or rat_ptb_his40.pdb).

When QDIFFXS becomes available, instead of using sum(q.phi/2), the program can calculate the reaction field energy as sum(q.phix/2) over the histidine charges, where phix is now the reaction potential, arising from the induced surface charge at the dielec tric boundary of the molecule. In this case the scales need not be the same and one can maximize the percent box fill to achieve the best accuracy for each molecule

dGcharge is obtained (assuming usual protonation states for all other groups at pH 7 (glu, asp =-1) (arg,lys = +1) (his=0.5)), by assigning charges to all ionizable groups except his 40, radii to all protein atoms, and evaluating the energy of the his+ charges G5 (tryp_all) v. the neutral his charges G6 (tryp_alln) in the resulting potential. dGcharge = G5-G6. Since G5,G6 are charge-charge interaction energies they are given by sum(Qhis.Phi) where Phi is the potential arising from all the other protein charges. Similarly, dGdip is obtained by an identical calculation where only the protein dipoles are charged. In practice, since charge sets for neutral forms of ionizable groups may not be available, it is easier to calulate the sum of dGcharge and dGdip by assigning all charges to the protein atoms, and to get dGdip if required, by subtracting dGcharge. Thus in these runs (tryp_all, tyrp_full), charges are assigned to the protein for the input, using rat_ptb.pdb, and potentials are read out at the histidine using rat_ptb_his40n.pdb.

RESULTS

For ddGsolv, 1/2sum q.phi from either the log file (as total energy) or from the bottom of the frc file gives:

 file                energy (kT)
 -------------------------------
 tryp0_his40         -55.326
 tryp0_his40n         37.420
 tryp_his40           74.870
 tryp_his40n         -39.315
 -------------------------------
 ddGsolv              17.64 kT = 10.58 kcal/mol

With QDIFFXS, the reaction field energies from the log files give:

 file                energy (kT)
 -------------------------------
 tryp0_his40          31.39
 tryp0_his40n         -2.68
 tryp_his40          -12.53
 tryp_his40n           0.56
 -------------------------------
 ddGsolv              16.83 kT = 9.98 kcal/mol

For dGcharge and dGcharge+dGdip, 1/2sum q.phi from the bottom of the frc file (x2) gives:

 file                energy (kT)
 -------------------------------
 tryp_full           -26.98
 tryp_fulln           -8.48
 tryp_all            -26.70
 tryp_alln            -8.84
 -------------------------------
 dGcharge = -11.1 dGdip+crg = -10.59 dGdip = 0.38 (kcal/mole) 

dpK = (10.59-9.98)/1.366 = 0.45, pK = 7.25. Note the sign of the pH change, the charged form of the histi dine is stabilized over the neutral form by the pro tein, hence the pK is raised.

_*9_*._*7_*. PACKING ENERGIES FOR HEMERYTHRIN HELICES

OBJECTIVE

Calculate the electrostatic energy to assemble the four helices of hemerythrin, assuming that the isolated helices have the same conformation as in the final pro tein structure. Includes both the helix-helix charge interactions, and the desolvation energy of each helix. The goal is to calculate the contribution from the charges that produce the helix 'dipole', ie. the pep tide backbone atoms.

REFERENCES

Gilson and Honig, PNAS 86:1524 (1989) for Stenkamp et al, Acta Cryst. Sect B 39:697 (1983), S.Sheriff, W.A.Hendrickson, J.L.Smith, J.Mol.Biol. 197:273 (1987) for structures.

FILES

suffixes a, b, c, d refer to sections of the protein containing helices A(19-37) ie res 1-38, B(41-64) res 39-67, C(70-85) res 68-88, and D(91-103) res 89-end.

 helix.prm         parameters
 def.siz           radii
 helix.crg         charges on all peptide backbone atoms
 helixa/b/c/d.crg  charges on backbone of one helix only
 helix4.pdb        whole protein coordinates
 helixa/b/c/d.pdb  coords. of one helix containing fragment
 helixt.pdb        whole protein coords, used to read out potentials
 helixa/b/c/d.frc  potentials at all backbone atoms due to 1 helix
 helixa/b/c/d0.frc potentials at helix backbone due to its own charges

NOTES

The work of assembling the four helices in hemerythrin due to the peptide backbone charges consists of two type of interactions: 1) the desolvation of each helice as it is placed against the low dielectric region formed by the other 3 helices. 2) the helix - helix charge interactions. nb. Binding energy calculations are done in an identical way. The disassembled state is taken to be each isolated fragment of the protein containing one helix, with the same conformation as in the assembled protein. PDB files are prepared for the whole protein helix4.pdb (from 2mhr.pdb in the data bank which contains hydrogens, and is more convenient than the file 1hmq.pdb used by Gilson and Honig). This file is split into four files helixa/b/c/d.pdb between residues 38-39, 67-69 and 88-89. In four separate sets of calculations, charges are assigned to the peptide backbone atoms of one helix alone (19-37, 41-64, 70-85 or 91-103), using CHARMM potentials (Brooks et al, J. Comp. Chem. 4:187 (1983)), ie C: 0.5, O: -0.5, H: 0.25, N: -0.35, Calpha: 0.1. The potentials are calculated back at the charges of the same helix, for the isolated state (helix0a/b/c/d), and next to the three other chargeless helices (helixa/b/c/d). The difference in total energy, sum(q.phi/2) for each helix in the assem bled protein with respect to the isolated helix in water gives the desolvation cost for that helix. Note that the scale and position of the charges must be exactly the same for each set of runs for these differ ences to be meaningful- remember that the 'artifactual' grid energy must be subtracted out. One way to hold the scale constant is to place a pair of dummy atoms of zero radius and charge in all the pdb files to fix the min and max extents which control the scaling and posi tioning of the molecule (see the first two entries in the pdb files.

When QDIFFXS becomes available, instead of using sum(q.phi/2), the program can calculate the reaction field energy as sum(q.phix/2) over the helix charges, where phix is now the reaction potential, arising from the induced surface charge at the dielectric boundary of the molecule. In this case the scales need not be the same and one can maximize the percent box fill to achieve the best accuracy for each molecule

For each charged helix in the assembled protein the potentials at the peptide backbone atoms of each of the other three helices is determined, and the helix-helix charge interaction energy, sum (Qx.PHIy) etc calculated for each helix pair x/y = a/b, a/c, a/d, b/c, b/d, c/d.

RESULTS

For the helix desolvation energies, the total energies (from the log file) give:

                  energy (kT)
 file       total            desolvation 
 --------------------------------------- 
 helixa     1582.16              6.25
 helixa0    1575.91
 helixb     2177.09             10.25
 helixb0    2166.84
 helixc     1153.79              5.23
 helixc0    1148.56
 helixd      918.20              8.48
 helixd0     909.72
 ---------------------------------------

Or from QDIFFXS, using the reaction energies from the log file:

                  energy (kT)
 file       total            desolvation
 ---------------------------------------
 helixa      -30.68              5.27
 helixa0     -35.95
 helixb      -29.71              6.44
 helixb0     -36.15
 helixc      -25.88              3.88
 helixc0     -29.76
 helixd      -24.07              7.60
 helixd0     -31.67
 ---------------------------------------

For the helix-helix charge interactions, the sums of q.phi are calculated from the *.frc files, helix a charged: helixa.frc, helix b charged: helixb.frc, helix c charged: helixc.frc, helix d charged: helixd.frc. For the potential at helix a: summing over lines 333-682, for helix b: summing over lines 720-1081, helix c: sum ming over lines 1176-1463, and helixd: summing over lines1525-1743.

The results of the desolvation and interaction energies can be summarized in a table (energies in kcal/mole)

 HELIX                     target
 source   |    a    |    b    |     c   |    d    |
 --------------------------------------------------
  a       |  3.12   | -0.077  |   0.066 | -0.273  |
 --------------------------------------------------
  b       | -0.077  |  3.82   |  -0.124 |  0.061  |
 -------------------------------------------------- 
  c       |  0.066  | -0.125  |   2.30  | -0.179  |
 --------------------------------------------------
  d       | -0.273  |  0.061  | -0.178  |  4.51   |
 -------------------------------------------------- 

The diagonal terms are the desolvation energies for each helix. The off diagonal terma are helix helix charge interactions. The table should be symmetric, due to reciprocity, ie. it should make no difference whether helix A is charged and sum(q.phi) evaluated at helix B, or vice versa. The slight differences arise from numerical errors in the algorithm, are negligable. Note that the helix-helix interactions alternate in sign, parallel helices giving positive values, and antiparallel ones negative values. There are twice as many favourable interactions as unfavourable, since hemerythrin is folded to give the optimum, 'battery pack' arrangement +- -+ of helix orientations, with the unfavourable interac tions occurring across the diagonal where it is weak est. The total interaction is -0.526 kcal/mole, and the total desolvation is 13.75 kcal/mole, with the total packing energy of 13.22 kcal/mole. Thus inspite of the best helix 'dipole' packing, the assembly is electrostatically destabilized by about 3.3 kcal/helix. The figures from Gilson and Honig are 11.8 kcal/moel for the desolvation, and -0.32 kcal/mole for the helix helix charges interactions.

_*1_*0_*. HINTS

Caveat Emptor...

When starting off, or running a new molecule, always use the option to output a PDB file with the assigned radii and charges in it. This will help check that the assignments were made as you intended them.

Always check the number of charges assigned, net positive, negative and total charge, as printed in the log file. Espe cially check 'problem' residues such as histidines. Several charge states are included in some of the *.crg files, but these are by no means gospel.

Check the alignment of fields in the PDB file against those used by DelPhi (section 6.5.4). Pay attention especially to: the atom name, special characters (such as ' or * in nucleic acid naming conventions), hydrogen names (eg H12 v. H2 v. HC1 etc). The program is case INsensitive. As long as the name lies totally within the field, right or left justi fication is irrelevent. However included blanks will cause problems.

Check the dependence of your results on all numerical param eters, especially the grid scale, or box fill, and the num ber of non-linear iterations, the orientation and position of the molecule on the grid. Variation of more than a few percent indicates poor choice of parameters inappropriate application of the method, or limitations of the method due to the finite size and resolution of the lattice. Try using averaging and focussing to extend the latter.

Check the dielectric map visually, especially when running a new molecule. Use LOOKEPS for a crude display on a line ter minal, or reformat the *.eps file as a *.phi file using REFORM, and display it along with the PDB file, using what ever display method (INSIGHT/CONTOUR, QUANTA/brickmaps, CON TOUR2D etc) is available (See also script file: surface).

Errors may arise in the molecular surface generation, and hence in the dielectric boundary, when a non-zero probe radius is used, and the distance from the molecule surface to the grid boundary is less than this radius. The symptoms will be a weird apperance of the dielectric map (see above), whereby the molecule's dielectric surface is attached to the grid boundary by a 'neck', instead of lying completely within the grid. The solution is to use a smaller box fill.

Remember that the units are proton charge (e), for charge, angstroms (A) for distance, moles/litre for ionic strength. The dielectric is usually defined as a ratio relative to the permittivity of vacuum, and so is unitless. However an inspection of Coulomb's law indicates that it really has units of energy x distance / charge squared. The numerical value is about 1/332 Angstroms*kcal/mole per proton charge squared. Ie. working electrostatic calculations in these units, and multiplying the answer by 332 will give poten tials in kcal/mole/charge. Similarly, for units of kT/e at 25 Celsius, the 'natural' units for chemistry in the labora tory, the factor is 561. DelPhi and most of the examples use units of kT/e. Note however this only works at 25 Cel sius. To convert to other temperature, use 561x(300/Temp), and remember that the dielectric constant is a function of temperature.

Electrostatic energies are FREE ENERGIES, since the dielec tric response includes the entropy of organization in response to a field, ie. the product of charge and potential is the isothermal work to bring that charge from infinite to that point.