************ k2d.read.me ******************
Documentation file for the k2d program for
protein secondary structure prediction.
Last update: 14/4/96
*******************************************

1.- FILES SUPPLIED

k2d.read.me   - This file
k2d.zip       - A zip compressed file containing
   * weights.dat - a file containing the weights of 100 trainings
   * k2d.c       - the k2d program written in C
   * k2d.exe     - the executable k2d program for PC
   * gd.dat      - CD sample (glyceraldehyde-3-phosphate dehydrogenase)
k2d.SUN.tar.Z - A compressed tar file containing
   * weights.dat - a file containing the weights of 100 trainings
   * k2d.c       - the k2d program written in C
   * k2d         - the executable k2d program for SUN
   * gd.dat      - CD sample (glyceraldehyde-3-phosphate dehydrogenase)

2.- HOW TO USE k2d?

- Place the executable program and the weights.dat file in the same
  directory.
- Generate a file with your problem CD spectrum. It must contain 41 CD
  values ranging from 200 nm to 241 nm. You can also experiment with
  the example supplied (gd.dat). The CD values must be given in
  deg cm^2 dmol^-1 multiplied by 0.001.
- Run the k2d program following the instructions on the screen.
- The program generates two files:
   * The output CD file has three columns. The 1st has the wavelength
     values, the 2nd is the CD spectrum of the sample, and the 3rd has
     the mean CD spectrum of the winning neuron over the 100 sets of
     training weights contained in weights.dat.
   * The percentage file gives the predicted alpha, beta and random
     coil values. Additionally, it gives the square of the Euclidean
     distance between the real and the winning neuron CD spectra and,
     according to this distance, an estimate of the mean error in the
     prediction of the three secondary structure values. If the
     distance is too large, the prediction may not be reliable and the
     program cannot give an error estimate. In this case the predicted
     values should be disregarded.

3.- AN EXAMPLE

- Run the k2d program.
- Use the 'gd.dat' file as input file.
- Generate a CD file and a percentage file.
- You can display the CD file using your favourite graphics tool to see
  how the computed spectrum mimics the sample spectrum.
- The resulting percentage values are 0.30, 0.12, 0.58. The square of
  the distance between the two spectra is 32.20. According to this
  distance, the program has given a maximum mean error of 0.080. This
  means that the sum of the errors in the prediction of the alpha, beta
  and random coil percentage values, divided by three, is expected to
  be less than 0.08. Since the secondary structure percentage values of
  glyceraldehyde-3-phosphate dehydrogenase are 0.30, 0.22 and 0.48, the
  sum of the absolute errors in the three predicted values is
  0.00+0.10+0.10=0.20. Since 0.20/3=0.067 < 0.080, the error is below
  the maximal error given by the program.

4.- THE ALGORITHM

- The algorithm has been published in:

  M.A. Andrade, P. Chacon, J.J. Merelo and F. Moran. (1993)
  "Evaluation of secondary structure of proteins from UV circular
  dichroism spectra using an unsupervised learning neural network".
  Protein Engineering. 6: 383-390

  J.J. Merelo, M.A. Andrade, A. Prieto and F. Moran. (1994)
  "Proteinotopic Feature Maps".
  Neurocomputing. 6: 443-454

5.- SEND A MAIL

- If you get the program, please let us know your e-mail address by
  sending a mail to andrade@embl-heidelberg.de
  We will inform you about future versions of the program.
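
The error check described in section 3 can be reproduced in a few lines.
What follows is only a sketch in Python and is not part of the k2d
distribution: the helper name mean_abs_error is made up for
illustration, and the numbers are the ones quoted in the gd.dat example.

```python
# Sketch only: reproduces the arithmetic of the gd.dat example (section 3).
# mean_abs_error is a hypothetical helper, not a function from k2d.c.

def mean_abs_error(predicted, reference):
    """Mean of the absolute differences between two equal-length sequences."""
    if len(predicted) != len(reference):
        raise ValueError("sequences must have the same length")
    total = sum(abs(p - r) for p, r in zip(predicted, reference))
    return total / len(predicted)

# Values quoted in section 3, in the order alpha, beta, random coil.
predicted = (0.30, 0.12, 0.58)   # k2d prediction for gd.dat
reference = (0.30, 0.22, 0.48)   # known values for the protein
max_mean_error = 0.080           # bound reported by the program

error = mean_abs_error(predicted, reference)
within_bound = error < max_mean_error
print(f"mean absolute error = {error:.3f}, within bound: {within_bound}")
```

With the values above the mean absolute error is about 0.067, which is
below the reported bound of 0.080, in agreement with the example.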

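The input requirements given in section 2 (41 CD values, expressed in
deg cm^2 dmol^-1 multiplied by 0.001) can be checked before running
k2d. The sketch below, in Python, assumes the spectrum is already held
in memory as a list of values in deg cm^2 dmol^-1. The name
scale_for_k2d is hypothetical and the sample numbers are invented. The
layout of the input file itself is not described in this document, so
the supplied gd.dat remains the reference for that.

```python
# Sketch only: scale_for_k2d is a hypothetical helper, not part of k2d.
N_VALUES = 41    # number of CD values required by k2d (section 2)
SCALE = 0.001    # factor applied to values in deg cm^2 dmol^-1 (section 2)

def scale_for_k2d(cd_values):
    """Check the number of points and return the values multiplied by 0.001."""
    if len(cd_values) != N_VALUES:
        raise ValueError(f"expected {N_VALUES} CD values, got {len(cd_values)}")
    return [v * SCALE for v in cd_values]

# Invented spectrum, for illustration only (not real measurements).
raw = [-10000.0 + 100.0 * i for i in range(N_VALUES)]
scaled = scale_for_k2d(raw)
print(len(scaled), scaled[0], scaled[-1])
```

Rejecting a spectrum with the wrong number of points before it reaches
the program is a design choice of this sketch; how k2d itself reacts to
such input is not stated here.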