SHREC 2007 Protein Challenge

Representation of protein data

- PDB file format
- atoms
- cartoon
- connected line
- tools to view the data
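
The "atoms" and "connected line" representations can both be derived from the ATOM records of a PDB file. The Python sketch below is a minimal illustration, not one of the challenge tools. It reads the Cα atoms of the first model using the fixed PDB column layout (atom name in columns 13-16, chain identifier in column 22, x, y, z in columns 31-54) and joins consecutive Cα atoms into line segments. The function names `read_ca_trace` and `backbone_segments` are hypothetical. Using only the first model and the first alternate location, and ignoring chain breaks, are simplifying assumptions.

```python
from typing import List, Optional, Tuple

Coord = Tuple[float, float, float]


def read_ca_trace(pdb_text: str, chain: Optional[str] = None) -> List[Coord]:
    """Collect C-alpha coordinates from the ATOM records of the first model.

    Hypothetical helper; relies on the fixed-column PDB layout.
    """
    trace: List[Coord] = []
    for line in pdb_text.splitlines():
        record = line[0:6]
        if record == "ENDMDL":
            break  # stop after the first model (e.g. NMR ensembles)
        if record != "ATOM  " or line[12:16].strip() != "CA":
            continue  # skip HETATM records (calcium is also "CA") and non-CA atoms
        if line[16] not in (" ", "A"):
            continue  # keep only the first alternate location
        if chain is not None and line[21] != chain:
            continue
        # Columns 31-38, 39-46 and 47-54 hold x, y and z in Angstrom.
        trace.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return trace


def backbone_segments(trace: List[Coord]) -> List[Tuple[Coord, Coord]]:
    """'Connected line' view: join consecutive C-alpha atoms (chain breaks are not detected)."""
    return list(zip(trace, trace[1:]))
```

Because chain breaks are not detected, a segment may bridge residues that are missing from the file. A real tool would check residue numbers or Cα-Cα distances before connecting two atoms.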

About SCOP

In the SCOP (Structural Classification of Proteins) database [5], published in 1995, all proteins of known structure are ordered according to their evolutionary and structural relationships. The protein domains are grouped hierarchically into families, superfamilies, folds and classes. The last update of the hierarchy dates from October 2004.
The basic unit in SCOP is the protein domain. A domain is either an entire monomeric protein or a part of a larger protein, and it should correspond to a structure that has been conserved throughout evolution. Because this criterion is very hard to evaluate algorithmically, SCOP relies solely on visual inspection by experts.
Each domain can be addressed either by a unique integer (sunid) or by a concise classification string (sccs). For example, the protein with the PDB identifier 1dlr has the sunid 34906 and the sccs ’c.71.1.1’, where ’c’ denotes the class, ’71’ the fold, the first ’1’ the superfamily and the last ’1’ the family. The file ’dir.des.scop.txt’ lists the sunids and sccs of the domains together with the English names of proteins, families, superfamilies, folds and classes. It also contains the residue numbers at which each domain starts and ends within its chain.
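
As a rough illustration of how these identifiers might be handled, the Python sketch below splits an sccs into its levels and parses tab-separated lines of ’dir.des.scop.txt’. The assumed column order is sunid, level code, sccs, short domain identifier and description, with level codes such as "cl", "cf", "sf", "fa", "dm", "sp" and "px". This follows our understanding of the SCOP parseable-file format and should be checked against the release actually used. The names `parse_sccs`, `ScopEntry` and `read_dir_des` are hypothetical.

```python
from typing import Dict, Iterable, List, NamedTuple, Union

LEVELS = ("class", "fold", "superfamily", "family")


class ScopEntry(NamedTuple):
    sunid: int
    level: str        # assumed level code, e.g. "cl", "cf", "sf", "fa", "px"
    sccs: str         # e.g. "c.71.1.1"; higher levels use prefixes such as "c.71"
    sid: str          # short domain identifier, "-" for non-domain entries (assumed)
    description: str  # English name, or chain and residue range for domains


def parse_sccs(sccs: str) -> Dict[str, Union[str, int]]:
    """Split an sccs such as 'c.71.1.1' into class letter and numeric levels."""
    parts = sccs.split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"not an sccs string: {sccs!r}")
    # The class is a letter; fold, superfamily and family are integers.
    return {name: (p if i == 0 else int(p))
            for i, (name, p) in enumerate(zip(LEVELS, parts))}


def read_dir_des(lines: Iterable[str]) -> List[ScopEntry]:
    """Parse tab-separated dir.des lines, skipping comments and blank lines."""
    entries = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        sunid, level, sccs, sid, description = line.rstrip("\n").split("\t", 4)
        entries.append(ScopEntry(int(sunid), level, sccs, sid, description))
    return entries
```

Filtering the parsed entries by a domain-level code (assumed here to be "px") would give the list of individual domains together with their chain and residue ranges.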
A family consists of proteins that either share residue identities of 30% or more or, despite lower identity, have very similar structures and functions. Globins and triosephosphate isomerase (TIM) are examples of protein families.
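
The 30% threshold refers to the residue identity of two aligned sequences. A minimal sketch follows. It assumes the two sequences are already aligned, with gaps written as '-'. Identity is computed as the fraction of identical residues among the aligned columns that contain no gap. This is only one of several conventions, and others divide by the length of the shorter sequence. `percent_identity` is a hypothetical helper, not a SCOP tool.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over gap-free columns of a pairwise alignment.

    Assumes both strings come from the same alignment (equal length, gaps as `gap`).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where neither sequence has a gap.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)
```

For example, `percent_identity("ACD-E", "ACFGE")` returns 75.0. Four columns are gap-free, and three of them are identical.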
A superfamily consists of proteins with less than 30% sequence identity but a probable common evolutionary origin. The actin-crosslinking proteins are an example of a superfamily. A fold contains proteins that have the same major secondary structures in the same arrangement and with the same topological connections. The most interesting members of a fold are those that have low sequence similarity to the other proteins of the fold and yet share an evolutionary link with them. A class groups folds with similar secondary-structure content (for example, all-alpha or all-beta proteins) and is the most general level of the hierarchy.
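
Because the sccs encodes the path through this hierarchy, the most specific level shared by two domains can be read off by comparing their sccs strings from left to right. The Python sketch below illustrates this; `shared_level` is a hypothetical name. The comparison has to work on prefixes. For example, superfamily ’1’ of fold ’c.71’ and superfamily ’1’ of fold ’c.72’ are different superfamilies.

```python
from typing import Optional

LEVELS = ("class", "fold", "superfamily", "family")


def shared_level(sccs_a: str, sccs_b: str) -> Optional[str]:
    """Most specific SCOP level shared by two domains, or None if the classes differ.

    Hypothetical helper: compares sccs components as prefixes, so a level
    counts as shared only if all higher levels agree as well.
    """
    deepest = None
    for level, x, y in zip(LEVELS, sccs_a.split("."), sccs_b.split(".")):
        if x != y:
            break
        deepest = level
    return deepest
```

For instance, `shared_level("c.71.1.1", "c.71.2.1")` returns "fold". In a retrieval setting, one could count a result as relevant when the query and the result share at least a chosen level. Which level to use is a design decision, not something this sketch settles.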

Existing methods

Many efforts have been made to find a suitable algorithm for protein structure comparison. PRIDE [1] compares the distributions of Cα-Cα distances. In [3], a hierarchical neural-network architecture forms clusters of protein folds from features derived indirectly from the amino-acid composition of the sequence. Gauss integral features [7], which are based on knot theory, are the latest attempt to tackle the protein structure comparison problem.
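
To make the idea behind [1] more concrete, the Python sketch below compares two structures through the distribution of their pairwise Cα-Cα distances. It is a deliberate simplification and not the PRIDE algorithm. The original method uses a probabilistic comparison of distance distributions [1]. This sketch instead pools all pairs into one normalized histogram and scores two histograms by histogram intersection. The 1 Å bin width, the 30 Å cut-off and the function names are illustrative assumptions.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_histogram(ca: Sequence[Coord], bin_width: float = 1.0,
                       max_dist: float = 30.0) -> List[float]:
    """Normalized histogram of all pairwise C-alpha distances below max_dist."""
    n_bins = math.ceil(max_dist / bin_width)
    counts = [0.0] * n_bins
    for p, q in combinations(ca, 2):
        d = math.dist(p, q)
        if d < max_dist:
            # Guard against floating-point rounding at the upper edge.
            counts[min(int(d // bin_width), n_bins - 1)] += 1.0
    total = sum(counts)
    return [c / total for c in counts] if total else counts


def histogram_similarity(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    if len(h1) != len(h2):
        raise ValueError("histograms must use the same binning")
    return sum(min(a, b) for a, b in zip(h1, h2))
```

Pairwise distances do not change under rotation or translation, so the two structures need not be superposed. This is the main appeal of distance-based descriptors. The price is that pooling loses the information about which residues are close to each other.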
One difficulty of the task is gaining acceptance from the biological community, since the quality of a classification algorithm is very difficult to measure. Most molecular biologists use DALI [2] for automatic classification. Furthermore, users are usually also interested in a structural alignment, which is very time-consuming to compute. Alignment techniques such as contact map overlap [4] are nevertheless very popular, even though finding an optimal overlap is NP-hard.
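
The contact map overlap measure can be illustrated on a small scale. In the Python sketch below, a contact map is the set of residue pairs whose Cα atoms lie within a distance threshold. The overlap of a given order-preserving alignment is the number of contacts in the first structure whose aligned partners also form a contact in the second. The 7.5 Å threshold, the minimum sequence separation of 2 and the function names are assumptions for illustration. The sketch only scores one alignment. Searching for the alignment that maximizes this score is the hard optimization problem addressed in [4].

```python
import math
from typing import Dict, Sequence, Set, Tuple

Coord = Tuple[float, float, float]
Contact = Tuple[int, int]


def contact_map(ca: Sequence[Coord], threshold: float = 7.5,
                min_separation: int = 2) -> Set[Contact]:
    """Pairs (i, j), i < j, of residues whose C-alpha atoms are within threshold."""
    contacts = set()
    for i in range(len(ca)):
        for j in range(i + min_separation, len(ca)):
            if math.dist(ca[i], ca[j]) <= threshold:
                contacts.add((i, j))
    return contacts


def overlap(contacts_a: Set[Contact], contacts_b: Set[Contact],
            alignment: Dict[int, int]) -> int:
    """Number of contacts of A that map onto contacts of B under the alignment."""
    pairs = sorted(alignment.items())
    # Contact map overlap requires a one-to-one, order-preserving alignment.
    if any(b1 >= b2 for (_, b1), (_, b2) in zip(pairs, pairs[1:])):
        raise ValueError("alignment must be order-preserving and one-to-one")
    return sum(
        1
        for i, j in contacts_a
        if i in alignment and j in alignment
        and (alignment[i], alignment[j]) in contacts_b
    )
```

Because the alignment is order-preserving, an aligned contact (i, j) with i < j always maps to a pair (k, l) with k < l. It can therefore be looked up directly in the second contact map.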

Literature

[1] O. Carugo and S. Pongor. Protein fold similarity estimated by a probabilistic approach based on Cα-Cα distance comparison. J. Mol. Biol., 315:887–898, 2002.
[2] L. Holm and C. Sander. Touring protein fold space with Dali/FSSP. Nucleic Acids Res., 26:316–319, 1998.
[3] C.-D. Huang, C.-T. Lin, and N. R. Pal. Hierarchical learning architecture with automatic feature selection for multiclass protein fold classification. IEEE Trans. on Nanobioscience, 2(4):221–232, 2003.
[4] G. Lancia, R. Carr, B. Walenz, and S. Istrail. Optimal PDB structure alignments: a branch-and-cut algorithm for the maximum contact map overlap problem. RECOMB, pages 193–202, 2001.
[5] A. Murzin, S. E. Brenner, T. Hubbard, and C. Chothia. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol., 247:536–540, 1995.
[6] C. A. Orengo, A. D. Michie, S. Jones, D. T. Jones, M. B. Swindells, and J. M. Thornton. CATH: a hierarchic classification of protein domain structures. Structure, 5(8):1093–1108, 1997.
[7] P. Røgen and B. Fain. Automatic classification of protein structure by using Gauss integrals. Proc. Natl. Acad. Sci. USA, 100(1):119–124, 2003.
[8] P. Shilane, P. Min, M. Kazhdan, and T. Funkhouser. The Princeton shape benchmark. In Shape Modeling International, Genova, Italy, 2004.
[9] R. C. Veltkamp, R. Ruijsenaars, M. Spagnuolo, R. van Zwol, and F. ter Haar. SHREC2006: 3D shape retrieval contest. Technical Report UU-CS-2006-030, 2006.

This homepage was created by Maja Temerinac for the SHREC 2007 Protein Challenge.
For more information, please contact:
Maja Temerinac temerina(at)informatik.uni-freiburg.de or
Marco Reisert reisert(at)informatik.uni-freiburg.de