The ModFOLDclust Model Quality Assessment Program
-------------------------------------------------

Version 1.1 (Feb 2009)
(c) Liam J. McGuffin

Description
-----------

You can only use this program if you have MULTIPLE models for your target
sequence.

ModFOLDclust works best if you have several models, if possible built from
alternative target-template alignments, using several different methods.
Clustering methods such as ModFOLDclust are currently the most accurate
methods, but the ModFOLDclust results will not be as reliable if you only
have a few models that have been built using the same target-template
alignment.

Please download the ModFOLD method if you only have a SINGLE model for your
target sequence.

References
----------

This software is free and you may copy it or use it in any other
applications, so long as it is properly referenced. Please cite the
following references:

McGuffin, L. J. (2008) The ModFOLD Server for the Quality Assessment of
Protein Structural Models. Bioinformatics, 24, 586-7.

McGuffin, L. J. (2007) Benchmarking consensus model quality assessment for
protein fold recognition. BMC Bioinformatics, 8, 345.

This version of ModFOLD is dependent on TMscore code:

Yang Zhang, Jeffrey Skolnick, Proteins, 2004 57:702-10.

Please always cite the appropriate references.

Installation
------------

No installation is required for the ModFOLDclust program itself after you
have downloaded the file. The program is provided in the form of an
executable jar file (ModFOLDclust.jar) and is designed to run on Linux
operating systems. This version of the program has been tested on recent
versions of Ubuntu and CentOS, but it should work on most versions of Linux
that have bash installed.

Requirements:

1. A recent version of Java (java.com/getjava/).

2. Please ensure your system environment is set to English, as using other
   languages may cause problems with the ModFOLDclust calculations:

   export LC_ALL=en_US.utf-8

Running the program
-------------------

You can edit the shell script (ModFOLDclust.sh) or you can follow the steps
below.

1. (optional) Set the environment variable for Java, if you have not
   installed it system wide, e.g.

   export JAVA_HOME=/home/Liam/jdk1.6.0/

2. Run ModFOLDclust. For example, if your target is called "T0515", the
   sequence file is "/home/liam/T0515.fasta" and the models directory is
   "/home/liam/T0515_example_models/", then enter the following:

   $JAVA_HOME/bin/java -jar ModFOLDclust.jar T0515 /home/liam/T0515.fasta /home/liam/T0515_example_models/

   Or, if you have java installed system wide:

   java -jar ModFOLDclust.jar T0515 /home/liam/T0515.fasta /home/liam/T0515_example_models/

Please ensure that the models are provided as separate files in PDB format.
The sequence file should be in FASTA format.

IMPORTANT: You must use FULL PATHS for your input file and models
directory, and the models directory path must end with a "/".

Output
------

A number of output files are produced in the models directory (e.g.
"/home/liam/T0515_example_models/") and a log of the progress is written to
the screen as standard output. A description of the output files follows:

1. The QMODE2 output file - the file name consists of the target name plus
   "_ModFOLDclust.out", e.g. "T0515_ModFOLDclust.out". This file conforms
   to the CASP QA QMODE2 data format
   (http://predictioncenter.org/casp8/index.cgi?page=format#QA).

2. The sorted data file - the file name consists of the target name plus
   "_ModFOLDclust.sort", e.g. "T0515_ModFOLDclust.sort". This file contains
   the same data as the QMODE2 file but without the headers and in a more
   convenient machine readable format.

3. B-factor files - these have the extension "*.bfact", e.g.
   "nFOLD3_TS1.bfact". These files contain your original model with the
   predicted per-residue error entered into the B-factor column. If you
   open these files using PyMOL or RasMol you can colour your models
   according to the predicted errors with the b-factor/temperature
   colouring options.

4. Gnuplot files - these have the extension "*.gnuplot", e.g.
   "nFOLD3_TS1.gnuplot". These files contain data for each model which can
   be plotted using gnuplot, for example using the following script:

   set terminal postscript color
   set output "nFOLD3_TS1.ps"
   set boxwidth 1
   set style fill solid 0.25 border
   set ylabel "Predicted residue error (Angstroms)"
   set xlabel "Residue number"
   set yrange [0:15]
   set yzeroaxis
   unset key
   set datafile missing "NaN"
   plot "nFOLD3_TS1.gnuplot" using 1:2 with boxes,\
        "nFOLD3_TS1.gnuplot" using 1:3 with points
   quit

Troubleshooting
---------------

Email me: l.j.mcguffin@reading.ac.uk

I will try to respond to your issue as soon as I can!

Thanks,

Liam
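
Example: checking the input paths
---------------------------------

The path rules given under "Running the program" (full paths for the
sequence file and models directory, and a trailing "/" on the models
directory) are easy to get wrong. The following is a minimal sketch of a
check you could run before launching the program. The function name
check_modfoldclust_args is hypothetical and is not part of the ModFOLDclust
distribution. It only validates the arguments and prints the java command
line described above; it does not run Java or inspect the files themselves.

```shell
#!/bin/bash
# Hypothetical helper, not shipped with ModFOLDclust.
# Validates the three arguments against the rules in "Running the program"
# and prints the command line that would be used.

check_modfoldclust_args() {
    local target="${1:-}" fasta="${2:-}" models_dir="${3:-}"

    # All three arguments are required.
    if [ -z "$target" ] || [ -z "$fasta" ] || [ -z "$models_dir" ]; then
        echo "usage: check_modfoldclust_args TARGET /full/path/seq.fasta /full/path/models/" >&2
        return 1
    fi

    # The sequence file must be given as a full (absolute) path.
    case "$fasta" in
        /*) ;;
        *) echo "error: sequence file is not a full path: $fasta" >&2; return 1 ;;
    esac

    # The models directory must be given as a full (absolute) path.
    case "$models_dir" in
        /*) ;;
        *) echo "error: models directory is not a full path: $models_dir" >&2; return 1 ;;
    esac

    # The models directory must end with "/".
    case "$models_dir" in
        */) ;;
        *) echo "error: models directory must end with \"/\": $models_dir" >&2; return 1 ;;
    esac

    # Use $JAVA_HOME/bin/java when JAVA_HOME is set, else the system-wide java.
    local java_cmd="java"
    if [ -n "${JAVA_HOME:-}" ]; then
        java_cmd="${JAVA_HOME%/}/bin/java"
    fi

    echo "$java_cmd -jar ModFOLDclust.jar $target $fasta $models_dir"
}
```

For example, after sourcing the function:

   check_modfoldclust_args T0515 /home/liam/T0515.fasta /home/liam/T0515_example_models/

prints the command line if the arguments pass the checks, or an error
message on standard error (with a non-zero return status) if they do not.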