|
The analysis of the docking results is performed after the
semi-flexible simulated annealing step and after the
explicit solvent refinement. A number of
standard CNS analysis scripts are automatically run by HADDOCK and the
results are placed in the analysis directory in runX/structures/it1 and
runX/structures/it1/water, respectively. Some of the generated output files are parsed
automatically by HADDOCK to generate, for example, violation statistics (see
violation analysis). Another important step is the
manual analysis of the generated structures and their clusters.
This is the critical step for classifying the docking solutions and
identifying the best cluster(s).
Topics:
Standard analysis performed by HADDOCK
The following CNS analysis scripts are automatically run by HADDOCK:
get_average.inp: This script will calculate an average structure by
superimposing the structures on the backbone atoms of the flexible interface
defined in the run.cns parameter file.
Note1: If fewer than three atoms are selected when using the defined semi-flexible
segments, then the entire backbone will be used. If still fewer than three atoms are selected,
then all heavy atoms will be used for the fitting. This ensures that at least three atoms
are selected for any molecule, including small ligands.
The structures are fitted onto the average structure and written to disk in the analysis
directory. Various average RMSDs calculated over the ensemble of structures, and the RMSD from
the average for each structure, are written to file.
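The fallback described in Note1 can be sketched in Python as follows. This is only an illustration of the selection logic, not the CNS code that HADDOCK runs: the function name `select_fit_atoms`, the `(residue number, atom name, element)` tuple layout and the use of N, CA, C as the backbone definition are assumptions made for this example.

```python
from typing import List, Set, Tuple

# Hypothetical atom record: (residue number, atom name, element symbol)
Atom = Tuple[int, str, str]

# Assumed backbone definition for this sketch (protein backbone heavy atoms)
BACKBONE_NAMES = {"N", "CA", "C"}


def select_fit_atoms(atoms: List[Atom],
                     flexible_residues: Set[int],
                     min_atoms: int = 3) -> List[Atom]:
    """Pick the atoms used for fitting, widening the selection as in Note1."""
    # 1) Backbone atoms of the semi-flexible segments
    interface_bb = [a for a in atoms
                    if a[0] in flexible_residues and a[1] in BACKBONE_NAMES]
    if len(interface_bb) >= min_atoms:
        return interface_bb

    # 2) Fall back to the entire backbone
    backbone = [a for a in atoms if a[1] in BACKBONE_NAMES]
    if len(backbone) >= min_atoms:
        return backbone

    # 3) Fall back to all heavy (non-hydrogen) atoms, e.g. for small ligands
    return [a for a in atoms if a[2] != "H"]
```

For a small ligand without any atoms named N, CA or C, the first two selections are empty and the function returns all heavy atoms.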
Output files:
- fileroot_ave.pdb: average structure
- filerootfit_1.pdb, filerootfit_2.pdb, ...: superimposed structures
Note2: The numbering of the superimposed PDB files does not correspond
to the numbering in the it1 or water directories,
but to the position of the structure in the sorted file.list
file, i.e. structure number 1 in the analysis directory
is the first (best) in file.list and structure number 50 is at
position 50 in that file.
- rmsave.disp: contains the RMSD from the average structure for each structure and
the average values over the ensemble. For this, the structures are superimposed on the backbone
atoms of the flexible interface (see Note1 above) and the following average RMSD
values from the average structure are calculated and written to file:
- RMSD backbone interface of all molecules
- RMSD complete backbone of all molecules
- backbone interface of molecule A
- backbone interface of molecule B
- backbone interface of molecule C
- ...
In addition to the average RMSD calculated from the entire ensemble, the corresponding single-structure
RMSD values are listed in rmsave.disp.
- rmsdseq.disp: per-residue RMSDs for backbone heavy atoms (N,CA,C), extended backbone heavy atoms
(N,CA,CB,C,O), side-chain heavy atoms and all heavy atoms.
- fileroot-reduced.crd: trajectory file containing only the coordinates of the flexible interface
backbone atoms (see Note1 above); this reduced file is used to calculate the
pairwise RMSD matrix and thereby speed up the calculations.
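As an illustration of the values reported in rmsave.disp, the following Python sketch computes an average structure, the RMSD of each structure from that average, and the ensemble mean. It assumes the structures have already been superimposed and that each one is given as a list of (x, y, z) tuples in the same atom order; the function names are hypothetical and the actual calculation is done by the CNS script.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def average_structure(ensemble: Sequence[Sequence[Coord]]) -> List[Coord]:
    """Coordinate-wise mean over pre-superimposed structures."""
    n = len(ensemble)
    n_atoms = len(ensemble[0])
    return [tuple(sum(s[i][k] for s in ensemble) / n for k in range(3))
            for i in range(n_atoms)]


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally ordered atom sets."""
    sq = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(sq / len(a))


def rmsd_from_average(ensemble: Sequence[Sequence[Coord]]):
    """Return (per-structure RMSDs from the average, mean over the ensemble)."""
    avg = average_structure(ensemble)
    per_structure = [rmsd(s, avg) for s in ensemble]
    return per_structure, sum(per_structure) / len(per_structure)
```

Restricting the input coordinates to a given atom subset (for example the interface backbone of one molecule) gives the corresponding entry of the list above.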
rmsd.inp: This script calculates the pairwise RMSD matrix over all
structures. For this, the structures are first superimposed on the flexible interface backbone atoms of molecule A
and the RMSD is then calculated on the flexible interface backbone atoms of the other molecules
(see Note1 above). This RMSD can be termed the "ligand interface RMSD".
Note3: Compared to the previous HADDOCK1.3 version, the ligand interface RMSD values calculated
are much higher and consequently a larger cutoff should be used for the
clustering (e.g. 7.5A).
Note4: Running this script is somewhat time consuming since a large number of structure comparisons
is required (N*(N-1) where N is the number of structures analyzed). However, the reduced
trajectory format significantly reduces the computing time compared to earlier versions.
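The ligand interface RMSD described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the rmsd.inp implementation: each structure is a dictionary mapping a chain ID to its interface backbone coordinates, and the superposition itself is delegated to a user-supplied function `fit_on_receptor` (hypothetical), which must return the mobile structure transformed so that its molecule A overlays molecule A of the reference.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Coord = Tuple[float, float, float]
Structure = Dict[str, List[Coord]]  # chain ID -> interface backbone coordinates


def rmsd(a: Sequence[Coord], b: Sequence[Coord]) -> float:
    """Root-mean-square deviation between two equally ordered atom sets."""
    sq = sum((p[k] - q[k]) ** 2 for p, q in zip(a, b) for k in range(3))
    return math.sqrt(sq / len(a))


def ligand_atoms(structure: Structure, receptor: str) -> List[Coord]:
    """Interface backbone atoms of every molecule except the receptor."""
    return [c for chain in sorted(structure) if chain != receptor
            for c in structure[chain]]


def ligand_interface_rmsd_matrix(
        structures: List[Structure],
        fit_on_receptor: Callable[[Structure, Structure], Structure],
        receptor: str = "A") -> List[List[float]]:
    """Fit each pair on the receptor, then measure the RMSD on the other molecules."""
    n = len(structures)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # N*(N-1) comparisons in total
            moved = fit_on_receptor(structures[j], structures[i])
            matrix[i][j] = rmsd(ligand_atoms(moved, receptor),
                                ligand_atoms(structures[i], receptor))
    return matrix
```

Because the fit uses only molecule A, any displacement of the other molecules relative to A shows up in full in the RMSD value.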
Output files:
energy.inp: this script performs the analysis of bonded and non-bonded energies
per structure and averaged over the ensemble. Various energy terms are calculated:
- over the entire complex
- over the flexible interface only (as defined in the run.cns parameter file)
- only the intermolecular energies (vdw and elec)
In addition, the buried surface area is reported. It is calculated as the difference
between the sum of the solvent accessible surface areas of the separate molecules and the solvent accessible area of
the complex. The solvent accessible area is calculated using a 1.4A water probe radius and an accuracy of 0.075A (in
case of memory problems for very large complexes, increase this value, e.g. to 0.1 or higher).
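The buried surface area definition above amounts to a simple difference, sketched here in Python. The function name and the numbers in the usage line are illustrative only; the solvent accessible areas themselves are assumed to have been computed beforehand.

```python
from typing import Sequence


def buried_surface_area(sasa_separate: Sequence[float],
                        sasa_complex: float) -> float:
    """Sum of the SASA of each isolated molecule minus the SASA of the complex."""
    return sum(sasa_separate) - sasa_complex


# Hypothetical values in square Angstrom for a two-molecule complex
bsa = buried_surface_area([5000.0, 3000.0], 6500.0)
```

With these made-up inputs the buried surface area is 1500.0.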
Output files:
- energies.disp: this file contains the various energy terms per structure and averaged over the ensemble
- Complex statistics: Etot, Ebond, Eangle, Eimpr, Edihed, Evdw, Eelec
- Flexible interface statistics: Etot, Evdw, Eelec
- Intermolecular statistics: Etot, Evdw, Eelec
- Buried surface area
edesolv.inp: this script performs the analysis of the desolvation energy
per structure and averaged over the ensemble. The desolvation energy is calculated using the empirical
atomic solvation parameters from Fernandez-Recio et al. JMB 335:843 (2004). These are
defined in the def_solv_param.cns CNS script in the protocols directory.
Output files:
- edesolv.disp: this file contains the desolvation energy per structure and averaged over the ensemble
ene-residue.inp: this script performs a per-residue intermolecular
interaction energy analysis for all residues which make intermolecular contacts.
A residue is selected for analysis if it makes at least one contact within 5A within the ensemble analysed.
Van der Waals, electrostatic and total interaction energies are reported per structure and as averages over the
ensemble. They are calculated using the default 8.5A cutoff and a dielectric constant of 1 (all defined in the
read_struc.cns CNS script).
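The residue selection rule described above (at least one intermolecular contact within 5A somewhere in the ensemble) can be sketched in Python as follows. The data layout, the function name `contacting_residues` and the inclusive `<=` comparison are assumptions made for this example; the actual selection is performed by the CNS script.

```python
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]
ResidueKey = Tuple[str, int]                 # (chain ID, residue number)
Structure = Dict[ResidueKey, List[Coord]]    # residue -> atom coordinates


def contacting_residues(ensemble: List[Structure],
                        cutoff: float = 5.0) -> Set[ResidueKey]:
    """Residues with an intermolecular contact within `cutoff` in any structure."""
    selected: Set[ResidueKey] = set()
    for structure in ensemble:
        items = list(structure.items())
        for idx, (res_a, atoms_a) in enumerate(items):
            for res_b, atoms_b in items[idx + 1:]:
                if res_a[0] == res_b[0]:
                    continue  # same molecule: not an intermolecular contact
                if any(math.dist(p, q) <= cutoff
                       for p in atoms_a for q in atoms_b):
                    selected.add(res_a)
                    selected.add(res_b)
    return selected
```

A residue that makes a contact in only one structure of the ensemble is still selected, so the per-structure energies are reported for the same residue set throughout.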
Output files:
- ene-residue.disp: this file contains the various energy terms per structure and averaged over
the ensemble
Example:
|